diff --git a/docs/planning/otel/OTelWebSdk.md b/docs/planning/otel/OTelWebSdk.md new file mode 100644 index 000000000..5070846bd --- /dev/null +++ b/docs/planning/otel/OTelWebSdk.md @@ -0,0 +1,1344 @@ +# OpenTelemetry Web SDK Specification + +## Executive Summary + +The OpenTelemetry Web SDK is a modern, enterprise-grade telemetry solution designed specifically for web applications. It provides full OpenTelemetry API compliance while addressing the unique challenges of browser environments and multi-team development scenarios. + +**Key Differentiators:** +- **Multi-Instance Architecture**: Multiple SDK instances can coexist without conflicts, enabling team isolation +- **Web-Optimized Performance**: Minimal bundle size, superior browser compatibility (ES2020+ with ES5 fallback) +- **Enterprise Features**: Dynamic configuration, complete cleanup/unload, multi-tenant support +- **Comprehensive Telemetry**: Full distributed tracing, structured logging, basic metrics, plus web-specific instrumentation + +**Primary Use Cases:** +- Large-scale web applications with multiple development teams +- Enterprise applications requiring compliance and governance controls +- Modern web frameworks (React, Angular, Vue, Next.js, etc.) +- Progressive Web Apps and Single Page Applications +- Applications requiring migration from existing telemetry solutions + +## Overview + +The OpenTelemetry Web SDK is designed as a modern, modular implementation that follows the OpenTelemetry specification while providing enhanced flexibility and performance for web applications. It delivers a complete observability solution encompassing distributed tracing, structured logging, and basic metrics collection through a multi-instance SDK factory that enables multiple teams to efficiently share resources while maintaining isolation. 
This factory pattern is particularly beneficial for modern web applications where multiple teams or components need isolated telemetry contexts in the same runtime while sharing underlying infrastructure. + +## Why a Web-Specific OpenTelemetry SDK? + +### **Web Application Requirements & Challenges** + +While standard OpenTelemetry SDKs provide excellent observability for backend services, web applications have distinct requirements that justify a specialized implementation: + +#### **1. Browser Environment Constraints** +- **Bundle Size Sensitivity**: Web applications must minimize JavaScript bundle size for performance +- **Tree-Shaking Requirements**: Dead code elimination is critical for production builds +- **Browser Compatibility**: Runtime requirement with graceful detection and fallback for unsupported browsers + - **Minimum Language Support**: This SDK will target ES2020 features (async/await, optional chaining, nullish coalescing, etc.) + - **Graceful Degradation**: The SDK will provide a loader/initialization **that targets ES5 (or earlier)** to ensure successful loading, parsing, and execution in older runtimes so it can detect unsupported environments and provide fallback behavior (such as reporting why the SDK can't load or falling back to a basic non-OpenTelemetry based SDK) + - **CDN Safety**: Bundle variants MUST avoid syntax errors in older browsers while providing feature detection + - **Runtime Detection**: Automatic capability detection to prevent crashes in unsupported environments +- **Memory Management**: Prevent memory leaks in long-running single-page applications +- **Network Efficiency**: Optimize telemetry transmission for mobile and low-bandwidth scenarios + +#### **2. 
Web-Specific Telemetry Needs** +- **Page View Tracking**: Navigation between SPA routes as well as traditional page loads +- **Dependency Tracking**: Track all Ajax-style requests (XMLHttpRequest, fetch, sendBeacon) from the page +- **Browser Performance Metrics**: Navigation timing, resource timing, frame rates +- **Client-Side Error Tracking**: Unhandled exceptions, promise rejections, console errors +- **User Session Management**: Session boundaries, user identification, device context +- **Real User Monitoring (RUM)**: Actual user performance vs synthetic monitoring +- **User Experience Monitoring**: Core Web Vitals, paint timing, user interactions + +#### **3. Enterprise Multi-Team Development Challenges** +- **Multi-Tenant Architecture**: Multiple teams/libraries need shared telemetry configuration with optimized resource usage +- **Version Isolation**: Teams require independent SDK versions due to different release cycles and upgrade timelines + - **NPM Package Isolation**: Different teams can use different major/minor versions of the SDK package independently + - **CDN Version Independence**: Teams can load different CDN versions without conflicts or global state pollution + - **Semantic Versioning Compliance**: Patch/minor versions maintain backward compatibility; major versions may introduce breaking changes with comprehensive documentation and compatibility requirements + - **Independent Upgrade Paths**: Teams can upgrade SDK versions on their own schedule within semantic versioning constraints +- **Progressive Migration Support**: Teams transition from isolated to multi-tenant instances when ready, without forced coordination +- **Instance Isolation**: Teams need independent telemetry contexts without conflicts between versions or instances +- **Configuration Consistency**: When using a shared SDK factory, teams inherit centralized enterprise defaults and apply team-specific overrides during SDK instance creation +- **Resource Efficiency**: Optimal resource 
sharing when teams use compatible multi-tenant instances, complete isolation when needed +- **Lifecycle Management**: Coordinated initialization and cleanup across components, SDK versions, and deployment modes + +#### **4. Emerging Web Standards and Runtime Diversity** +- **Multi-Runtime Requirements**: Support across diverse JavaScript execution environments beyond traditional browsers +- **Server-Side Partial Rendering (SSPR)**: Modern frameworks that blend server and client rendering requiring unified telemetry +- **Edge Computing Environments**: Cloudflare Workers, Vercel Edge Functions, Deno Deploy, and similar edge runtimes +- **Modern Web Frameworks**: Next.js App Router, SvelteKit, Nuxt.js, Remix, Astro, Qwik, and Solid Start +- **Hybrid Rendering Patterns**: Static Site Generation (SSG), Incremental Static Regeneration (ISR), and streaming SSR +- **Web Standards Evolution**: Web Workers, Service Workers, Shared Array Buffers, and emerging browser APIs + +**Key Framework Requirements:** + +The SDK must provide seamless integration across modern web frameworks and rendering patterns: + +- **Next.js App Router**: Support for Server Components and client-side hydration with correlation +- **SvelteKit Universal Rendering**: Server-side load tracking with client-side correlation +- **Remix Progressive Enhancement**: Loader/action telemetry with SSR and client interaction tracking +- **Edge Runtime Compatibility**: Native support for Cloudflare Workers, Vercel Edge Functions, and Deno Deploy +- **Framework-Agnostic APIs**: Consistent telemetry APIs that work across all supported frameworks + +**Universal Framework Patterns:** + +- **Astro Islands**: Partial hydration with telemetry correlation across islands +- **Qwik Resumability**: Fine-grained hydration tracking and performance monitoring +- **Solid Start**: Streaming SSR with progressive enhancement telemetry +- **Fresh (Deno)**: Island-based architecture with edge runtime support +- **Enhanced SPA 
Frameworks**: Angular Universal, Vue Nuxt.js, React Server Components + +### **OTelWebSdk Capabilities & Implementation Goals** + +The OTelWebSdk addresses these web application challenges through a comprehensive solution that delivers: + +#### **Primary Goals** + +1. **Full OpenTelemetry API Support for Tracing and Logs** + - Complete OpenTelemetry compatible tracing API implementation with web-optimized performance + - Full OpenTelemetry logs API support for structured logging in web applications + - Standards compliance for interoperability with OpenTelemetry ecosystem + - Web-optimized implementation for performance and bundle size + - Browser-specific adaptations without breaking OTel contracts + +2. **Enterprise Multi-Tenant Support** + - **BOTH multi-tenant instances AND version isolation are primary goals** - not alternatives but complementary capabilities + - **Strategic Migration Path**: Primary goal is migrating teams from isolated versions to efficient multi-tenant instances + - **Version Isolation Requirement**: MUST support complete version isolation so teams can use different versions independently + - **Multi-version coexistence**: Different SDK versions (NPM packages or CDN) operate without conflicts within the same environment + - **Semantic Versioning Compatibility**: Following semver - patch/minor versions maintain backward compatibility, major versions may introduce breaking changes + - **Breaking Change Documentation**: All breaking changes in major versions MUST be clearly documented with migration guidance + - **Centralized configuration management**: Works across both multi-tenant instances and isolated versions with team-specific customization + - **Progressive migration support**: Teams can migrate from isolated to multi-tenant instances when ready, without coordination requirements (API surface is either identical or very similar) + +3. 
**Dynamic Configuration Management** + - Runtime configuration updates without application restart (following the existing Application Insights dynamic configuration pattern) + - Includes updates for connection strings, feature flags, etc. + - Configuration validation with rollback on failure + - **Remote Feature Control**: Centralized feature flag management with real-time updates + - **Local Override Capabilities**: Local development and testing overrides for remote configurations + - **Feature A/B Testing**: Built-in experimentation framework for feature rollouts and testing + +4. **Complete Unload and Cleanup Support** + - Complete removal of SDK instances with all associated hooks and timers + - Comprehensive memory leak prevention and resource cleanup + - Graceful shutdown ensuring in-flight telemetry is properly handled + - Automatic cleanup of instrumentation hooks and DOM modifications (includes internal tagging and removal of any hooks) + +5. **Dynamic Instrumentation Loading and Unloading** + - Runtime plugin management + - Lazy loading of modules and instrumentation + - Unloading of individual instrumentations without SDK shutdown + - Memory-efficient instrumentation disposal and garbage collection + - Coordinated module loading to prevent team conflicts (includes load and execution ordering) + - Performance framework available to enable instrumentations to monitor performance + - Granular control over instrumentation lifecycle and resource cleanup + +6. **Compliance** + - Privacy controls, GDPR compliance, data residency options + - Cookie and storage controls with configurable consent management + - Local/session storage usage controls and automatic cleanup + - Data retention policies and automated compliance reporting + - Configurable data scrubbing and anonymization capabilities + - Audit trails and governance controls for enterprise environments + +7. 
**Extensibility** + - Plugin architecture for custom telemetry scenarios + - Extensible processor and exporter framework + - Custom instrumentation hooks and lifecycle management + - Third-party integration support and adapter patterns + +8. **Performance Excellence** + - Minimal impact on application startup and runtime performance + - Optimized memory usage and garbage collection patterns + - Non-blocking telemetry collection with intelligent batching + - Performance budgets and real-time performance monitoring + - **Integrated Performance Framework**: Built on Application Insights `doPerf` and `IPerfManager` for automatic monitoring + - **CPU Time Limits**: Ensures telemetry operations don't block main thread or degrade user experience + - **Memory Thresholds**: Monitors and limits memory usage to prevent performance degradation + - Efficient resource utilization across shared SDK instances + - **Critical Timer Management Requirements**: + - **MUST NOT use interval timers**: No background timers that run continuously without purpose + - **On-Demand Timer Usage**: Timers MUST only be started when there is a specific action to perform + - **Automatic Timer Cleanup**: All timers MUST be stopped immediately when there is no pending work + - **Timer Coalescing**: Within each SDK instance, minimize timer count by coalescing multiple operations + - **Battery and CPU Conservation**: Prevents excessive CPU usage and allows runtime to sleep properly + - **Resource Impact Minimization**: Reduces background resource consumption and wake-up events + +9. 
**Lightweight Telemetry Processors** + - High-performance callback-based telemetry initializers designed for web application performance requirements + - Lightweight processors optimized for common scenarios without full OpenTelemetry processor overhead + - Simplified API specifically designed for telemetry enrichment, filtering, and sanitization + - Performance-optimized processing pipeline for web applications + - First-class SDK feature that provides the Application Insights callback pattern while maintaining OpenTelemetry standards compliance + +#### **Secondary Goals** + +- **Developer Experience**: Type-safe APIs, comprehensive tooling, excellent debugging +- **Rich Web Telemetry**: Comprehensive browser performance monitoring beyond basic OTel +- **Migration Support**: Comprehensive migration tools and guides for existing telemetry solutions +- **Testing Excellence**: Extensive testing framework with performance validation and cross-browser support +- **Performance Optimization**: Advanced performance monitoring and optimization strategies +- **Cloud Integration**: Native Azure integration(s) while maintaining portability +- **Code Splitting for CDN Deployments**: For each released version, the CDN may provide variants that support dynamic imports to assist with page load time optimization - enabled by the foundational architectural goals of avoiding globals and using inversion of control patterns +- **Application Insights API Compatibility**: Backward compatibility support / shim to provide the existing Application Insights API surface while processing through OTelWebSdk + - `trackPageView()`, `trackEvent()`, `trackException()`, `trackDependency()` methods, which essentially emit events. 
+ - Provides a simplified migration path for existing Application Insights implementations + +### **Benefits Over Standard OpenTelemetry SDK** + +#### **Performance Benefits** + +| Aspect | Standard OTel | OTelWebSdk | +|--------|---------------|------------| +| **Bundle Size** | Larger bundle size | Significantly smaller with web optimizations | +| **Tree Shaking** | Limited | Full support | +| **Browser Compatibility** | Modern browsers only | ES2020+ with graceful detection/fallback | +| **Memory Usage** | Backend-optimized | SPA-optimized with support for leak prevention and detection | +| **Network Efficiency** | General purpose | Web-optimized batching/compression | + +#### **Web Telemetry Benefits** + +- **Page View Tracking**: Automatic SPA route change detection and traditional page load monitoring (not available in core OTel) +- **Core Web Vitals Integration**: Built-in Largest Contentful Paint (LCP), First Input Delay (FID; superseded as a Core Web Vital by Interaction to Next Paint, INP, in March 2024), Cumulative Layout Shift (CLS) tracking +- **Browser Performance Metrics**: Navigation Timing API, Resource Timing API, Paint Timing API integration +- **User Experience Monitoring**: User interaction tracking, session management, device context collection +- **Client-Side Error Tracking**: Comprehensive unhandled exception capture, promise rejection monitoring, console error tracking +- **Real User Monitoring (RUM)**: Actual user performance measurement vs synthetic monitoring capabilities +- **Automatic Anonymous Session Management**: Intelligent session boundary detection, anonymous session ID generation, and cross-tab session correlation +- **Anonymous User Tracking**: Privacy-compliant user identification across sessions without PII collection +- **Multi-Runtime Support**: Native support for edge computing (Cloudflare Workers, Vercel Edge, Deno Deploy) +- **Modern Framework Integration**: Purpose-built support for Next.js, SvelteKit, Remix, Astro, Qwik, and other modern frameworks +- **Server-Side Rendering Correlation**: Seamless 
telemetry correlation between SSR and client-side hydration +- **Progressive Web App Support**: Service Worker integration, offline telemetry queuing, background sync capabilities +- **Memory Leak Prevention**: SPA-optimized memory management with automatic cleanup for long-running applications +- **Bundle Size Optimization**: Tree-shaking support and modular architecture for minimal production bundles +- **Browser Compatibility Range**: ES2020+ runtime with ES5-compatible loader for graceful degradation +- **Network-Aware Batching**: Intelligent telemetry batching optimized for mobile and low-bandwidth scenarios +- **Frame Budget Awareness**: 5ms frame budget compliance to prevent UI jank and maintain 60fps performance +- **Dynamic Configuration**: Runtime configuration updates without application restart (hot reload capabilities) +- **Remote Feature Management**: Centralized feature flag control with local override capabilities for development +- **Feature A/B Testing Framework**: Built-in experimentation support for gradual feature rollouts and testing +- **Complete Unload Support**: Full SDK cleanup and resource deallocation for dynamic loading scenarios +- **Multi-Team Isolation**: Independent telemetry contexts for multiple teams without global state conflicts +- **Version Coexistence**: Different SDK versions can operate simultaneously during migration periods +- **Enterprise Compliance**: Built-in GDPR compliance, data residency controls, and audit trail capabilities +- **Cookie and Storage Controls**: Configurable consent management, storage usage controls, and automatic cleanup +- **Privacy-First Design**: Anonymous session/user tracking without PII collection, configurable data anonymization +- **Cross-Browser Consistency**: Unified behavior across all supported browsers/runtimes with automatic feature detection +- **Application Insights Compatibility**: Migration path from existing Application Insights implementations through OpenTelemetry + +#### 
**Multi-Tenant Architecture Benefits** + +The OTelWebSdk supports **two distinct patterns** for multi-tenant scenarios (where "tenants" typically represent different teams, business units, or applications): + +##### **Pattern 1: Multi-Tenant SDK (Same Version, Isolated Tenants)** +Multiple tenants share the same SDK version but with completely isolated configurations and contexts. + +1. **Shared Infrastructure, Isolated Configuration** + - Tenants share the same SDK version and core infrastructure + - Each tenant gets isolated telemetry contexts, processors, and exporters + - Independent tenant-specific configuration with enterprise defaults + - Efficient resource sharing (timers, HTTP clients) with logical isolation + +2. **Configuration Independence per Tenant** + - Tenant-specific overrides are applied on top of the enterprise defaults and never leak into another tenant's configuration + - Independent feature flag management and A/B testing per tenant + - Separate compliance and privacy controls per tenant instance + - Per-tenant performance budgets and resource allocation + +3. **Error Containment Between Tenants** + - SDK failures in one tenant's context don't affect other tenants + - Independent error handling and recovery mechanisms per tenant + - Isolated plugin/instrumentation failures with graceful degradation + - Per-tenant health monitoring and diagnostics + +##### **Pattern 2: Multi-Version Coexistence (Different SDK Versions)** +Different teams can use completely different SDK versions simultaneously without conflicts. + +1. **Version Safety and Compatibility** + - Multiple SDK versions can coexist safely within the same application + - No shared global state or configuration between different versions + - Teams control their own upgrade schedules without coordination requirements + - Complete independence between SDK versions + +2. 
**Independent Migration Paths** + - Tenants can migrate independently without application-wide coordination + - Safe development and testing with production instances running concurrently + - Gradual rollout capabilities with instant fallback options + - Zero-downtime migration path from legacy to modern telemetry + +##### **Shared Benefits (Both Patterns)** + +- **Semantic Versioning Support**: Semver compatibility maintained across all deployment patterns + - Patch/minor versions maintain backward compatibility + - Breaking changes clearly documented in major versions with migration guidance + - Teams can stay on stable versions while others adopt newer versions +- **Efficient Timer Management**: Critical performance optimization for client-side environments + - No interval timers running continuously in the background + - On-demand timer usage only when actions are pending + - Automatic cleanup when no work is queued + - Timer coalescing within each SDK instance to minimize resource usage + - Battery and CPU conservation through proper sleep/wake behavior +- **Minimal Resource Overhead**: Efficient resource sharing where possible (timers, HTTP clients, event listeners) +- **Performance Optimization**: Single telemetry pipeline with isolated processing contexts +- **Enterprise Governance**: Centralized policy enforcement with tenant flexibility + +### **Architecture Comparison** + +| Traditional SDK | Multi-Tenant (Same Version) | Multi-Version Coexistence | +|-----------------|----------------------------|--------------------------| +| Single configuration | Per-tenant configuration with shared infrastructure | Completely independent configurations | +| Version coupling | Same version, isolated tenants | Independent version management | +| Shared failure points | Tenant-isolated fault domains | Version-isolated fault domains | +| Global state conflicts | Clean tenant isolation | Complete version isolation | +| Difficult testing | Safe concurrent tenant testing | 
Safe concurrent version testing | +| Coordinated upgrades | Independent tenant configuration | Independent version migration | + +### **Usage Examples** + +The OTelWebSdk provides multiple usage patterns for different scenarios: + +**Key Usage Patterns:** +- **Basic Instance Creation**: Simple SDK instantiation with isolated configuration +- **Multi-Team Coordination**: Teams working independently with shared resource optimization +- **Instance Isolation**: Complete separation of telemetry contexts and configurations +- **OpenTelemetry + Application Insights**: Combined API usage for comprehensive telemetry +- **Advanced Multi-Instance**: Complex enterprise scenarios with multiple factories +- **Legacy Compatibility**: Backward-compatible patterns for existing applications + +**Benefits Demonstrated:** +- **Instance Isolation**: Teams get independent configurations without global conflicts +- **Multi-Team Support**: Teams can work independently while sharing optimized resources +- **Dynamic Management**: Runtime configuration and lifecycle management +- **Comprehensive Telemetry**: OpenTelemetry standard with Application Insights extensions + +For comprehensive usage examples including code samples, multi-team patterns, and configuration options, see **[OTelWebSdk-UsageExamples.md](./specs/OTelWebSdk-UsageExamples.md)**. 
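As a rough illustration of the "Basic Instance Creation" and "Instance Isolation" patterns listed above, the following is a minimal sketch of a closure-based factory that hands out isolated instances which inherit factory defaults and accept team-specific overrides. Every name here (`SdkConfig`, `createSdkFactory`, `newInst`, `unload`, `instanceCount`) is a hypothetical placeholder chosen for this sketch and is not the OTelWebSdk API; the linked usage examples document remains the reference.

```typescript
// Hypothetical names throughout: SdkConfig, createSdkFactory, newInst and unload
// are illustrative only and are not the OTelWebSdk API.
interface SdkConfig {
    connectionString?: string;
    samplingRate?: number;
    teamName?: string;
}

interface SdkInstance {
    readonly config: Readonly<SdkConfig>;
    readonly isUnloaded: boolean;
    unload(): void;
}

interface SdkFactory {
    newInst(overrides?: SdkConfig): SdkInstance;
    readonly instanceCount: number;
}

// All state lives inside the closure, so two factories never share anything
// and nothing is registered on a global object.
function createSdkFactory(defaults: SdkConfig): SdkFactory {
    const instances: SdkInstance[] = [];

    return {
        newInst(overrides: SdkConfig = {}): SdkInstance {
            // Factory defaults are applied first; team-specific overrides win.
            const config: Readonly<SdkConfig> = Object.freeze({ ...defaults, ...overrides });
            let unloaded = false;

            const inst: SdkInstance = {
                config,
                get isUnloaded() { return unloaded; },
                unload() {
                    if (!unloaded) {
                        unloaded = true;
                        // Drop the factory's reference so the instance can be collected.
                        const idx = instances.indexOf(inst);
                        if (idx >= 0) {
                            instances.splice(idx, 1);
                        }
                    }
                }
            };

            instances.push(inst);
            return inst;
        },
        get instanceCount() { return instances.length; }
    };
}

// Two teams share one factory but receive separate, frozen configurations.
const factory = createSdkFactory({ connectionString: "InstrumentationKey=placeholder", samplingRate: 100 });
const teamA = factory.newInst({ teamName: "team-a" });
const teamB = factory.newInst({ teamName: "team-b", samplingRate: 25 });
```

In this sketch, unloading one team's instance leaves the other untouched, which is the isolation property the patterns above describe; a real implementation would also need to flush in-flight telemetry and remove hooks during `unload()`.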
+ +### **When to Use OTelWebSdk vs Standard OpenTelemetry** + +#### **Use OTelWebSdk When:** +- Building web applications (SPAs, PWAs, static sites) +- **Dynamic Configuration Management**: Need runtime configuration updates without restart +- **Complete Unload Support**: Require full SDK cleanup and memory leak prevention +- **Dynamic Instrumentation Loading**: Want runtime plugin management and A/B testing +- **Multi-Tenant Architecture**: Multiple teams need shared SDK instances with isolation +- **Enterprise Compliance Controls**: Need GDPR compliance, data residency, audit trails +- **Web-Optimized Performance**: Require bundle size optimization and browser-specific adaptations +- **Rich Browser Telemetry**: Want Core Web Vitals, user interactions, session management beyond basic OTel +- **High-Level Delivery Guarantees**: Need built-in reliability patterns beyond what standard OpenTelemetry prioritizes: + - **Guaranteed Telemetry Delivery**: Automatic retry logic with exponential backoff and circuit breaker patterns + - **Data Integrity Assurance**: Built-in validation, sanitization, and corruption detection for telemetry data + - **Reliable Batching**: Smart batching algorithms that guarantee optimal payload sizes and delivery timing + - **Graceful Degradation**: Automatic fallback mechanisms when primary telemetry endpoints fail + - **Resource Protection**: Built-in throttling and resource management to prevent telemetry from impacting application performance + - **Session Continuity**: Persistent telemetry state across page reloads and navigation in SPAs + - **Network Resilience**: Automatic adaptation to varying network conditions and offline scenarios +- Using or migrating from Application Insights with OpenTelemetry compatibility + +#### **Use Standard OpenTelemetry When:** +- Building backend services or APIs +- Working in non-browser environments (for example, Node.js servers) +- Having simple telemetry requirements where basic "best effort" delivery is enough (guaranteed delivery is not needed) +- 
Comfortable with manual implementation of reliability patterns +- Don't need web-specific optimizations or enterprise-grade delivery guarantees +- Want to avoid any vendor-specific extensions +- Working in environments where basic telemetry export is sufficient +- Targeting simple, standalone, resource-constrained runtimes + +## Browser Compatibility Strategy + +The OTelWebSdk implements a comprehensive browser compatibility strategy that balances modern JavaScript features with broad browser support through intelligent detection and graceful degradation. + +### **Runtime Requirements** + +- **Primary Target**: **ES2020+** for main SDK functionality + - Modern async/await, optional chaining (`?.`), nullish coalescing (`??`) + - Dynamic imports, modules, and advanced JavaScript features + - Optimal performance and developer experience + +- **Loader Compatibility**: **ES5-compatible** detection and bootstrapping + - Universal parsing and execution across all browsers + - Safe feature detection without syntax errors + - Graceful fallback mechanism activation + +### **Browser Support Matrix** + +| Browser Category | Language Support | SDK Functionality | Fallback Behavior | +|------------------|------------------|-------------------|-------------------| +| **Modern Browsers** | ES2020+ | Full SDK with all features | N/A | +| **Legacy Browsers** | ES2015+ | Loader only (detection and fallback) | Graceful degradation, optional basic tracking | +| **Unsupported Legacy** | < ES2015 | None | Silent failure or basic image beacon tracking | + +**Specific Browser Support:** +- **Chrome**: 80+ (full SDK), 60+ (loader only), < 60 (graceful degradation) +- **Firefox**: 75+ (full SDK), 60+ (loader only), < 60 (graceful degradation) +- **Safari**: 13+ (full SDK), 11+ (loader only), < 11 (graceful degradation) +- **Edge**: 80+ (full SDK), 18+ (loader only), < 18 (graceful degradation) +- **Internet Explorer**: 11 (graceful degradation only) + +### **Runtime Environment Detection** + +#### 
**Comprehensive Environment Support** +- **Browser Main Thread**: Full functionality with web-specific optimizations +- **Web Workers**: Core telemetry with limited DOM access +- **Service Workers**: Background telemetry with offline capability +- **Node.js SSR**: Server-side rendering support with conditional browser features +- **Edge Runtimes**: Cloudflare Workers, Vercel Edge, Deno Deploy compatibility + +### **Graceful Degradation Patterns** + +#### **Feature-Based Fallbacks** +- **Modern APIs → Legacy APIs**: fetch → XMLHttpRequest → image beacon +- **Advanced Timing → Basic Timing**: PerformanceObserver → performance.timing → Date.now() +- **Rich Context → Simple Context**: Structured logging → basic console logging +- **Complex Processing → Simple Processing**: Full processors → basic initializers + +#### **Polyfill Strategy** +- **Internal Polyfills**: Hidden implementations for missing features +- **No Global Pollution**: SDK never modifies global prototypes or window objects +- **Bundle Size Conscious**: Only essential polyfills included per variant + +### **Development and Testing** + +#### **Bundle Size Optimization** +- **Tree Shaking**: Full dead code elimination for unused features +- **Code Splitting**: Dynamic imports for optional functionality +- **Compression**: Optimal minification and gzip compression +- **Size Budgets**: Automated bundle size monitoring and limits + +### **Migration and Upgrade Path** + +- **Backward Compatibility**: Maintained within semantic versioning +- **Progressive Enhancement**: Gradual adoption of newer browser features +- **Future-Proof**: Architecture ready for emerging web standards +- **Deprecation Policy**: Clear timeline for legacy browser support removal + +This browser compatibility strategy ensures the OTelWebSdk can run effectively across the diverse web ecosystem while providing optimal performance and features for modern browsers. 
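The feature-based fallback chains described above (fetch to XMLHttpRequest to image beacon, and PerformanceObserver to performance.timing to Date.now()) can be sketched as plain capability checks. This is an illustrative sketch only: `RuntimeCaps`, `selectTransport`, `selectClock`, and `isAvailable` are hypothetical names, and the capability object is passed in explicitly so the check never reads globals at module load time. It is written in TypeScript for readability; an actual ES5-targeted loader would need to be compiled down or written without modern syntax.

```typescript
// Hypothetical sketch: RuntimeCaps, selectTransport and selectClock are
// illustrative names, not part of the OTelWebSdk API.
type TransportKind = "fetch" | "xhr" | "image-beacon" | "none";
type ClockKind = "performance-observer" | "performance-timing" | "date-now";

// Minimal shape of the global object that the detection logic inspects.
// Injecting it keeps the checks testable and free of load-time side effects.
interface RuntimeCaps {
    fetch?: unknown;
    XMLHttpRequest?: unknown;
    Image?: unknown;
    PerformanceObserver?: unknown;
    performance?: { timing?: unknown };
}

// Some older host environments report constructors as "object" rather than
// "function", so both are accepted here.
function isAvailable(value: unknown): boolean {
    return typeof value === "function" || (typeof value === "object" && value !== null);
}

// Walks the transport chain from most to least capable.
function selectTransport(env: RuntimeCaps): TransportKind {
    if (isAvailable(env.fetch)) { return "fetch"; }
    if (isAvailable(env.XMLHttpRequest)) { return "xhr"; }
    if (isAvailable(env.Image)) { return "image-beacon"; }
    return "none";
}

// Walks the timing chain from most to least capable.
function selectClock(env: RuntimeCaps): ClockKind {
    if (isAvailable(env.PerformanceObserver)) { return "performance-observer"; }
    if (env.performance && isAvailable(env.performance.timing)) { return "performance-timing"; }
    return "date-now";
}

// Fake environments standing in for a modern browser, a legacy browser,
// and a very old runtime that only supports image requests.
const modernEnv: RuntimeCaps = {
    fetch: function () { return undefined; },
    XMLHttpRequest: function () { return undefined; },
    Image: function () { return undefined; },
    PerformanceObserver: function () { return undefined; },
    performance: { timing: {} }
};
const legacyEnv: RuntimeCaps = {
    XMLHttpRequest: function () { return undefined; },
    Image: function () { return undefined; },
    performance: { timing: {} }
};
const minimalEnv: RuntimeCaps = {
    Image: function () { return undefined; }
};
```

In a page, the caller would pass the real global object (for example `window` or `self`) instead of the fake environments; keeping detection separate from use is what lets the loader report why the full SDK cannot load.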
+ +## Architecture Overview + +The OTelWebSdk implements a modern, scalable architecture emphasizing: + +- **Interface-First Design**: All components implement well-defined TypeScript interfaces with no concrete dependencies +- **Inversion of Control (IoC)**: Complete dependency injection pattern with no global state or singletons +- **Factory-Based Instantiation**: Controlled SDK lifecycle through factory functions +- **Resource Sharing**: Efficient coordination while maintaining instance isolation +- **OpenTelemetry Compliance**: Full adherence to OpenTelemetry specifications +- **Explicit Dependency Management**: All dependencies injected through constructors, enabling testability and modularity + +For comprehensive architectural details including IoC principles, dependency injection patterns, component interactions, multi-instance factory design, and implementation guidelines, see **[OTelWebSdk-Architecture.md](./specs/OTelWebSdk-Architecture.md)**. + +### Anti-Patterns to Avoid + +#### **CRITICAL: Never Import OpenTelemetry Packages Directly** + +**The SDK MUST NOT import or depend on `@opentelemetry/*` packages** as many have side effects that automatically register global implementations, breaking the interface-first/factory function approach. These packages often cause: + +- **Global provider registration** that conflicts with multi-instance isolation +- **Automatic instrumentation setup** that prevents controlled lifecycle management +- **Static configuration loading** that breaks dynamic configuration capabilities +- **Side effect execution at module load time** that interferes with graceful degradation + +**Correct Approach:** The SDK must define its own **interface-only** contracts that are compatible with OpenTelemetry specifications through duck typing. 
This approach preserves OpenTelemetry ecosystem compatibility and provides the following benefits: + +- **Prevents Global State**: Avoids automatic global provider registration +- **Enables Multi-Instance**: Multiple SDK instances can coexist without conflicts +- **Maintains Isolation**: Teams get truly isolated telemetry contexts +- **Interface Compatibility**: Still works with OpenTelemetry ecosystem via duck typing +- **Bundle Size**: Avoids pulling in unused OpenTelemetry infrastructure +- **Tree-Shaking**: Ensures dead code elimination works properly + +### Key Implementation Principles + +The SDK implementation follows specific architectural patterns detailed in the [Architecture document](./specs/OTelWebSdk-Architecture.md): + +- **Interface-First Design**: TypeScript interfaces define all public contracts +- **Factory Function Pattern**: Controlled instantiation with dependency injection +- **IoC Pattern**: No global state, explicit dependency management +- **Closure-Based Implementation**: For bundle size optimization and true private state management +- **High-Performance Architecture**: Minimal overhead design with advanced batching, resource management, and bundle optimization +- **Modular Architecture**: Tree-shakable, plugin-based extensibility + +## Application Insights SDK Requirements + +Building on OpenTelemetry standards, the SDK inherits and extends the proven requirements from the Microsoft Application Insights JavaScript SDK to ensure enterprise-grade reliability and functionality: + +### 7. 
Enterprise Multi-Team Support +- **Dual Primary Goals**: Both multi-tenant instance efficiency AND complete version isolation are equally critical requirements +- **Progressive Migration Strategy**: Teams can migrate from isolated versions to multi-tenant instances for optimal resource usage when they are ready, without forced coordination or version compatibility constraints +- **Complete Team and Instance Isolation**: Each team gets a dedicated SDK instance with its own configuration and telemetry context, preventing team conflicts across SDK versions and deployment modes +- **Complete Version Isolation Support**: Enterprise teams have different release cycles, upgrade timelines, and risk tolerances, so the SDK MUST support complete isolation across all deployment scenarios: teams using different NPM package versions, teams loading different CDN versions, or mixed deployments where some teams use NPM while others use CDN. This isolation prevents any team's SDK version choice from breaking other teams' implementations. 
+- **Semantic Versioning Compliance**: Backward compatibility must be maintained within major versions; breaking changes allowed in major version increments with comprehensive documentation and compatibility matrix for components +- **Resource Sharing Optimization**: Efficient sharing of connections, timers, and processing resources occurs only between SDK instances created by the same shared factory - different SDK versions or separate factory instances operate in complete isolation and never share resources, ensuring perfect version isolation without reliance on globals +- **Configuration Inheritance**: When multiple SDK instances are created from the same shared factory, child instances inherit the factory's default configuration settings (unless explicitly overridden during instance creation) - this enables consistent baseline configuration across teams using the same factory while allowing team-specific customization + +### 8. Bundle Size Optimization +- **Tree-Shaking Friendly**: Full support for dead code elimination with no side effects +- **Modular Loading**: Individual components can be imported to minimize bundle size +- **Compression Optimization**: Optimal minification and compression for production builds +- **Size Validation**: Unit tests that enforce bundle size limits and prevent size regressions in CI/CD pipeline + +### 9. 
Universal JavaScript Runtime Support +- **Platform Detection**: Automatic detection of runtime environment (Browser, Web Worker, SSR, Edge) with graceful degradation for unsupported features - the SDK MUST NOT install global polyfills as this can break end-user applications that have their own polyfills or rely on missing functionality for their own detection logic +- **API Adaptation**: Graceful use of modern browser APIs with feature detection and fallbacks - SDK uses internal polyfills hidden by internal function aliases that support bundle size optimization and code minification while never modifying global scope +- **Browser Requirements**: Main SDK will target ES2020+ language support (via TypeScript) but will provide graceful fallbacks internally +- **Emerging Runtime Support**: Compatibility with modern frameworks (Next.js, SvelteKit, Remix, Nuxt.js, Astro, Qwik, Solid Start), server-side rendering patterns (React Server Components, SSG, ISR, streaming SSR), edge runtimes (Cloudflare Workers, Vercel Edge Functions, Deno Deploy, Bun), and hybrid rendering patterns will be provided by Platform Detection and API Adaptation techniques without requiring runtime-specific code +- **Web Worker Support**: Complete functionality in Web Worker and Service Worker environments + +## Dynamic Instrumentation Management + +The OTelWebSdk provides comprehensive support for dynamic instrumentation loading and unloading, enabling teams to manage telemetry collection granularly without impacting the entire SDK lifecycle. 
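As a rough illustration of the runtime loading, unloading, and hot-swap behavior introduced above, the sketch below keeps all state in a closure rather than in globals. Every name in it (`SketchInstrumentation`, `SketchInstrumentationRegistry`, `createInstrumentationRegistry`) is hypothetical and is not the SDK's actual `IInstrumentationManager` contract, which is defined in the instrumentation specification.

```typescript
// Hypothetical shapes for illustration only; not the SDK's real interfaces.
interface SketchInstrumentation {
    readonly name: string;
    enable(): void;   // install hooks/patches
    disable(): void;  // remove hooks/patches and release resources
}

interface SketchInstrumentationRegistry {
    load(inst: SketchInstrumentation): void;
    unload(name: string): boolean;
    swap(name: string, replacement: SketchInstrumentation): void;
    loaded(): string[];
}

function createInstrumentationRegistry(): SketchInstrumentationRegistry {
    // Closure-held state: nothing is registered on window or any global
    const active = new Map<string, SketchInstrumentation>();

    function load(inst: SketchInstrumentation): void {
        if (active.has(inst.name)) {
            throw new Error(`Instrumentation already loaded: ${inst.name}`);
        }
        inst.enable();
        active.set(inst.name, inst);
    }

    function unload(name: string): boolean {
        const inst = active.get(name);
        if (!inst) {
            return false;
        }
        inst.disable();
        active.delete(name);
        return true;
    }

    // Hot swap: disable the current instrumentation (if any), then enable the replacement
    function swap(name: string, replacement: SketchInstrumentation): void {
        unload(name);
        load(replacement);
    }

    return { load, unload, swap, loaded: () => Array.from(active.keys()) };
}
```

Because the registry is created per SDK instance in this sketch, unloading one team's instrumentation would not touch another team's hooks; whether the real manager is scoped this way is an assumption here.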
+ +**Key Capabilities:** +- **Runtime Loading**: Load individual instrumentations at runtime with configuration options +- **Hot-Swap Support**: Replace instrumentations for A/B testing without SDK restart +- **Resource Cleanup**: Complete instrumentation unloading with proper memory management +- **Third-Party Support**: Registration and management of custom instrumentations +- **Performance Monitoring**: Optional Application Insights performance framework integration + +**Benefits:** +- Zero-downtime updates and safe experimentation +- Performance optimization through selective loading/unloading +- Memory efficiency and error isolation +- Team independence and third-party ecosystem support + +For detailed instrumentation management, see **[OTelWebSdk-Instrumentation.md](./specs/OTelWebSdk-Instrumentation.md)**. + +## Lightweight Telemetry Processors + +The OTelWebSdk prioritizes **OpenTelemetry standards compliance** while providing **lightweight telemetry initializers** as a first-class SDK feature designed for performance-optimized telemetry processing scenarios. 
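To make the callback-based initializer idea concrete, here is a minimal sketch of an initializer chain. It assumes the Application Insights convention that an initializer returning `false` drops the item; the names `SketchTelemetryItem`, `SketchInitializer`, and `createInitializerChain` are hypothetical and do not come from the SDK's interface definitions.

```typescript
// Hypothetical item shape, loosely modelled on the `item.data.customProperties`
// usage shown elsewhere in this document.
interface SketchTelemetryItem {
    name: string;
    data: { customProperties: { [key: string]: string } };
}

// Assumption: returning false drops the item; any other return value keeps it.
type SketchInitializer = (item: SketchTelemetryItem) => boolean | void;

function createInitializerChain() {
    const initializers: SketchInitializer[] = [];

    return {
        add(initializer: SketchInitializer): void {
            initializers.push(initializer);
        },
        remove(initializer: SketchInitializer): void {
            const idx = initializers.indexOf(initializer);
            if (idx !== -1) {
                initializers.splice(idx, 1);
            }
        },
        // Runs initializers in registration order and stops at the first
        // one that returns false. Returns true when the item should be sent.
        run(item: SketchTelemetryItem): boolean {
            for (let i = 0; i < initializers.length; i++) {
                if (initializers[i](item) === false) {
                    return false;
                }
            }
            return true;
        }
    };
}
```

The chain is just an array walk with an early exit, which is the kind of low per-item overhead the lightweight path is aiming for; enrichment mutates the item in place and filtering is a boolean return.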
+ +### OpenTelemetry Standards First + +The SDK implements full **OpenTelemetry processor interfaces** (`SpanProcessor`, `LogRecordProcessor`, `MetricProcessor`) as the primary telemetry processing mechanism, ensuring: + +- **Standards Compliance**: Full compatibility with OpenTelemetry ecosystem +- **Interoperability**: Works with any OpenTelemetry-compatible tooling and exporters +- **Future-Proof**: Aligned with evolving OpenTelemetry standards and best practices +- **Ecosystem Integration**: Seamless integration with OpenTelemetry instrumentation libraries + +### Lightweight Telemetry Initializers + +As a core SDK feature, the OTelWebSdk provides **callback-based telemetry initializers** specifically designed for high-performance, low-overhead telemetry processing scenarios: + +- **Performance Optimized**: ~1-2μs per telemetry item vs ~5-10μs for full OpenTelemetry processors +- **Bundle Size Efficient**: ~0.5KB vs 2-3KB per processor for simple scenarios +- **Low Overhead**: Simple callback functions ideal for data enrichment and filtering +- **Purpose-Built**: Designed specifically for common web application telemetry processing needs +- **Migration Compatible**: Provides familiar patterns for Application Insights users transitioning to OpenTelemetry + +This lightweight processing approach is a strategic SDK feature that addresses the unique performance requirements of web applications while maintaining full OpenTelemetry standards compliance. + +For comprehensive documentation on telemetry processing capabilities, implementation patterns, performance comparisons, and migration guidance, see: + +**[Telemetry Initializers Specification](./specs/OTelWebSdk-TelemetryInitializers.md)** + +### Processing Architecture + +The SDK supports both processing approaches with proper prioritization: + +```typescript +// 1. 
OpenTelemetry processors (recommended for complex processing) +sdk.addSpanProcessor(new BatchSpanProcessor(exporter)); +sdk.addLogRecordProcessor(new SimpleLogRecordProcessor(exporter)); + +// 2. Lightweight telemetry initializers (optimized for performance-critical scenarios) +sdk.addTelemetryInitializer((item) => { + item.data.customProperties['app.version'] = '2.1.0'; + return true; +}); +``` + +The telemetry processing pipeline executes **OpenTelemetry processors first**, followed by lightweight telemetry initializers, ensuring standards compliance while providing performance-optimized processing for common scenarios. + +## Multi-Instance SDK Factory Implementation + +### Core Factory Interface + +The multi-instance SDK factory is implemented through a sophisticated interface that coordinates multiple SDK instances while optimizing resource usage. + +For complete interface definitions, see [OTelWebSdk-Interfaces.md](./specs/OTelWebSdk-Interfaces.md). + + +### Factory Access Patterns + +The SDK provides multiple factory access patterns to support different deployment scenarios: + +- **Synchronous Factory Access**: For immediate SDK access +- **Asynchronous Factory Access**: For CDN loading +- **Named Factory Pattern**: For multi-project scenarios +- **Direct Factory Creation**: For custom scenarios + +### Factory Loading Patterns + +**CRITICAL: Factory Availability and Callback Requirements** + +⚠️ **The `createSdkFactory()` function return value is ONLY immediately available for NPM package imports. 
For all CDN loading scenarios, you MUST use callbacks.** + +| Loading Method | Synchronous Access | Pattern Required | +|----------------|-------------------|------------------| +| **NPM Package** | ✅ `const factory = createSdkFactory()` | Direct assignment valid | +| **CDN Script Tag** | ❌ Return may be null/undefined | **MUST use callbacks** | +| **CDN Dynamic Loading** | ❌ Return may be null/undefined | **MUST use callbacks** | +| **Mixed CDN/NPM** | ❌ Return may be null/undefined | **MUST use callbacks** | + +**Why Callbacks Are Required for CDN:** +- The loader loads first, then dynamically loads the actual SDK +- `createSdkFactory()` returns `null`/`undefined` until the SDK is fully loaded +- Only the `onInit` callback guarantees a valid factory instance + +The factory supports both synchronous and asynchronous loading patterns for different deployment scenarios. + +#### **Factory Access Patterns** + +```typescript +// NPM usage - only valid when SDK is imported via NPM package +// Note: This ONLY works with npm packages, NOT with CDN loading +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const factory = createSdkFactory(); // Safe with NPM - factory and SDK are immediately available +console.log(`Using SDK version: ${factory.info.ver}`); + +// CDN usage with callbacks (REQUIRED pattern for CDN loading) +// Note: The loader provides createSdkFactory() which can then load the actual SDK +// NEVER rely on return value - always use callbacks for CDN scenarios +createSdkFactory({ + src: 'https://js.monitor.azure.com/scripts/otel/otel-web-sdk.min.js', + onInit: (factory) => { + console.log(`SDK loaded successfully, version: ${factory.info.ver}`); + // Initialize SDK here since synchronous call may return null/undefined + const sdk = factory.newInst('my-app', config); + sdk.initialize(); + }, + onError: (error) => { + console.error('Failed to load SDK:', error.message); + // Handle error or fallback logic here + } +}); +// Return value ignored - may be 
null/undefined during CDN loading +``` + +#### **Asynchronous Factory Access** + +```typescript +// Async loading for CDN scenarios +// Note: The loader provides createSdkFactoryAsync() which can then load the actual SDK (The loader is available as an npm package, direct script or from the CDN) + +// Load using the loader's default SDK source +const defaultFactory = await createSdkFactoryAsync(); + +// Load from specific CDN URL +const cdnFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/otel-web-sdk.min.js', + timeout: 10000 // 10 second timeout +}); + +// Handle loading failures +try { + const pinnedFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/v2.0.0/otel-web-sdk.js' + }); +} catch (error) { + console.error('Failed to load from CDN:', error.message); +} +``` + +#### **CDN Loading with Callbacks** + +```typescript +// Load from CDN with callbacks (returns promise) +// Note: The loader provides createSdkFactory() and createSdkFactoryAsync() which can then load the actual SDK (The loader is available as an npm package, direct script or from the CDN) + +const asyncFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/otel-web-sdk.min.js', + onInit: (factory) => { + console.log(`SDK loaded successfully, version: ${factory.info.ver}`); + }, + onError: (error) => { + console.error('Failed to load SDK:', error.message); + } +}); + +// Load from CDN with callbacks only (synchronous if already loaded) +const factory = createSdkFactory({ + src: 'https://js.monitor.azure.com/scripts/otel/otel-web-sdk.min.js', + onInit: (factory) => { + console.log(`SDK loaded successfully, version: ${factory.info.ver}`); + // Initialize SDK here + const sdk = factory.newInst('my-app', config); + sdk.initialize(); + }, + onError: (error) => { + console.error('Failed to load SDK:', error.message); + // Handle error or fallback logic here + } +}); +// Returns factory immediately if already loaded, null/undefined if still loading or 
unavailable +// Note: For NPM usage or when SDK source is not available, may return null/undefined +// Recommended to use onInit callback for reliable initialization +if (factory) { + console.log(`Factory available immediately: ${factory.info.ver}`); +} +``` + +#### **Multi-Instance with Different Sources** + +```typescript +// Application with multiple SDK sources +// Note: The loader provides createSdkFactory() and createSdkFactoryAsync() which can then load the actual SDK + +const legacyFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/v1.4/otel-web-sdk.js' +}); + +const modernFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/latest/otel-web-sdk.js' +}); + +// Each factory manages its own instances with complete isolation +const legacySDK = legacyFactory.newInst('legacy-component', legacyConfig); +const modernSDK = modernFactory.newInst('modern-component', modernConfig); + +// Factories loaded from different sources operate independently +console.log(`Legacy SDK version: ${legacyFactory.info.ver}`); +console.log(`Modern SDK version: ${modernFactory.info.ver}`); +``` + +#### **Multi-Version Coordination** + +```typescript +// Application with multiple SDK sources +const legacyFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/v1.4/otel-web-sdk.js' +}); + +const modernFactory = await createSdkFactoryAsync({ + src: 'https://js.monitor.azure.com/scripts/otel/latest/otel-web-sdk.js' +}); + +// Each factory manages its own instances with complete isolation +const legacySDK = legacyFactory.newInst('legacy-component', legacyConfig); +const modernSDK = modernFactory.newInst('modern-component', modernConfig); + +// Factories loaded from different sources operate independently +console.log(`Legacy SDK version: ${legacyFactory.info.ver}`); +console.log(`Modern SDK version: ${modernFactory.info.ver}`); +``` + +#### **Build Information Access** + 
+```typescript +// NPM usage - synchronous access to SDK and build information +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const factory = createSdkFactory(); // Valid: NPM ensures immediate availability (when using the SDK package) +const info = factory.info; + +console.log({ + id: info.id, // "factory-abc123" - Unique factory instance ID + ver: info.ver, // "1.5.2" - SDK version + loadMethod: info.loadMethod // "npm" | "cdn" | "dynamic" - How factory was created +}); + +// CDN usage - access build information in callback (assumes the loader has already been loaded from the CDN and is available via window) +window.createSdkFactory({ + onInit: (factory) => { + const info = factory.info; + console.log({ + id: info.id, + ver: info.ver, + loadMethod: info.loadMethod + }); + } +}); +``` + +### Instance Creation and Configuration + +```typescript +interface IOTelWebSdk { + // OpenTelemetry API Accessors (Primary) + readonly trace: TraceAPI; + readonly logs: LogsAPI; + readonly metrics: MetricsAPI; // Basic metrics support - simple counters, histograms, gauges + readonly context: ContextAPI; + readonly diag: DiagAPI; + + // Telemetry Processing (OpenTelemetry + Lightweight) + addTelemetryInitializer(initializer: ITelemetryInitializer): void; + removeTelemetryInitializer(initializer: ITelemetryInitializer): void; + addSpanProcessor(processor: SpanProcessor): void; + addLogRecordProcessor(processor: LogRecordProcessor): void; + + // Instrumentation Management + getInstrumentationManager(): IInstrumentationManager; + + // Lifecycle Management + initialize(): Promise<void>; + unload(onDone?: (result: IUnloadResult) => void, timeoutMs?: number): Promise<IUnloadResult>; + flush(): Promise<void>; + + // Instance Information + getInstanceName(): string; + getConfiguration(): IOTelWebSDKConfig; + isInitialized(): boolean; + + // Application Insights Compatibility Layer (Secondary/Optional) + // These are implemented as OpenTelemetry instrumentations + trackPageView?(pageView: 
IPageViewTelemetry): void; + trackEvent?(event: IEventTelemetry): void; + trackException?(exception: IExceptionTelemetry): void; + trackDependency?(dependency: IDependencyTelemetry): void; + trackMetric?(metric: IMetricTelemetry): void; + trackTrace?(trace: ITraceTelemetry): void; + + // Legacy Context Configuration (Optional) + setUserContext?(user: IUserContext): void; + setSessionContext?(session: ISessionContext): void; + setDeviceContext?(device: IDeviceContext): void; + + // Legacy Auto-tracking (Optional) + enableAutoTracking?(config?: IAutoTrackingConfig): void; + disableAutoTracking?(): void; + startTrackingUserAction?(name: string): IUserActionTimer; +} +``` + +### Multi-Manager Support + +For complex enterprise scenarios, the SDK supports multiple named managers: + +```typescript +// NPM usage - synchronous factory access is valid +import { createSdkFactory } from '@microsoft/otel-web-sdk'; + +// Default factory (singleton) +const defaultFactory = createSdkFactory(); // Valid: NPM ensures immediate availability + +// Named factories for different projects/environments +const productionFactory = createSdkFactory('production'); // Valid: NPM usage +const stagingFactory = createSdkFactory('staging'); // Valid: NPM usage +const developmentFactory = createSdkFactory('development'); // Valid: NPM usage + +// Each factory operates independently with its own configuration +console.log(`Production factory: ${productionFactory.info.id}`); +console.log(`Staging factory: ${stagingFactory.info.id}`); + +// Independent lifecycle management +await productionFactory.initializeAllInstances(); +await stagingFactory.unloadAllInstances(); +``` + +### Shared Resource Implementation + +```typescript +class SharedExportQueue implements IExportQueue { + private _queue: ITelemetryItem[] = []; + private _maxSize: number; + private _batchTimer: ITimerHandle | null = null; + private _processing: boolean = false; + + constructor(maxSize: number = 1000) { + this._maxSize = maxSize; + 
} + + enqueue(item: ITelemetryItem): void { + if (this._queue.length >= this._maxSize) { + // Drop oldest items or apply sampling + this._queue.shift(); + } + + this._queue.push(item); + this._scheduleBatch(); + } + + async flush(): Promise<void> { + if (this._processing || this._queue.length === 0) { + return; + } + + this._processing = true; + const batch = this._queue.splice(0, 100); // Send at most 100 items per pass + + try { + await this._processBatch(batch); + } finally { + this._processing = false; + + // Continue processing if more items arrived + if (this._queue.length > 0) { + this._scheduleBatch(); + } + } + } + + private _scheduleBatch(): void { + if (this._batchTimer) return; + + this._batchTimer = setTimeout(() => { + this._batchTimer = null; + this.flush(); + }, 5000); // 5-second batching interval + } + + private async _processBatch(items: ITelemetryItem[]): Promise<void> { + // Group items by endpoint and send efficiently + const groups = this._groupByEndpoint(items); + + const exportPromises = Array.from(groups.entries()).map(([endpoint, groupItems]) => + this._exportToEndpoint(endpoint, groupItems) + ); + + await Promise.all(exportPromises); + } +} + +class SharedConnectionPool implements IConnectionPool { + private _connections: Map<string, IConnection> = new Map(); + private _maxConnections: number; + + constructor(maxConnections: number = 2) { + this._maxConnections = maxConnections; + } + + getConnection(endpoint: string): IConnection { + if (!this._connections.has(endpoint)) { + if (this._connections.size >= this._maxConnections) { + // Evict the least recently used connection to make room + const lruEndpoint = this._findLRUConnection(); + this._connections.delete(lruEndpoint); + } + + this._connections.set(endpoint, this._createConnection(endpoint)); + } + + return this._connections.get(endpoint)!; + } + + private _createConnection(endpoint: string): IConnection { + return new OptimizedConnection(endpoint, { + keepAlive: true, + maxConcurrency: 6, + retryConfig: { + maxRetries: 3, + backoffMs: 1000 + 
} + }); + } +} +``` + +## Factory Setup and Distribution Strategies + +The OTelWebSdk factory requires different strategies for sharing and distribution depending on how teams consume the SDK. This section details how the factory handles NPM vs CDN loading scenarios and prevents conflicts. + +### **NPM Distribution: Isolated Factories** + +When teams import the SDK via NPM, each import creates an isolated factory instance by default. This ensures complete isolation and prevents accidental sharing between teams. + +#### **Default NPM Behavior: Isolated Factories** + +```typescript +// Team A's package - NPM import (synchronous access is valid) +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const teamAFactory = createSdkFactory(); // Valid: NPM ensures factory is immediately available + +// Team B's package - NPM import (synchronous access is valid) +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const teamBFactory = createSdkFactory(); // Valid: NPM ensures factory is immediately available +``` + +**Isolation Benefits:** +- No accidental interference between teams +- Independent versioning and configuration +- Clear ownership boundaries +- Simplified testing and development +- Complete adherence to "no globals" principle + +### **CDN Distribution: Loader-Based Loading** + +When loaded via CDN, teams load the SDK loader which then provides factory creation capabilities. The loader supports two distinct loading patterns. + +**Important URL Distinction:** +- **Script tags** load the **loader** (`sdkldr.min.js`) which provides `createSdkFactory()` and `createSdkFactoryAsync()` functions +- **Factory functions** load the **SDK** (`otel-web-sdk.min.js`) which provides the actual telemetry functionality + +**Loader Version Agnosticism:** +The loader itself is **version-agnostic** and can load any SDK version, including future releases. 
While the loader has a configured default SDK version (used when no `src` parameter is provided), the same loader can dynamically load different SDK versions by specifying the `src` parameter. This design enables: +- **Forward Compatibility**: Older loaders can load newer SDK versions +- **Flexible Deployment**: Teams can use stable loader versions while adopting newer SDK features +- **Migration Support**: Gradual rollout of new SDK versions without loader updates +- **Testing Scenarios**: Load specific SDK versions for testing or comparison + +The Loader itself is available +- via an npm package to allow direct usage +- as a stand-alone script (the same as the Application Insights SDK Loader is today) +- From the CDN by dropping a script tag on the page + +#### **CDN Loading Patterns** + +**Standard CDN Loading (Script Tag - Side effect of Global Registration):** +```html + + + +``` + +**Direct Loader Placement (Inline - No Global Registration):** +```html + +``` + +**Explicit Version Loading:** +```html + + + +``` + +#### **Global Registration Rules** + +**The ONLY case where global state is modified:** +- **Script Tag Loading**: When the loader is loaded via ` +// All teams use: window.createSdkFactory({ onInit: (factory) => { ... 
} }) + +// Option 2: Teams agree to use NPM with shared factory (valid for NPM) +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const sharedFactory = createSdkFactory(); // Valid: NPM ensures immediate availability +export { sharedFactory }; + +// Option 3: Teams use isolated NPM factories (default behavior for npm package) +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const teamFactory = createSdkFactory(); // Valid: NPM ensures immediate availability +``` +``` + +### **Factory Distribution Best Practices** + +#### **For Library Authors** + +```typescript +// Good: Use SDK factory provided by application +export function initializeMyLibrary(factory?: any) { + // Use provided factory or create isolated one (only valid for NPM usage) + const libFactory = factory || createSdkFactory(); // Valid: assumes NPM package import + return libFactory.newInst('my-library', myConfig); +} + +// Alternative: Always create an isolated factory for this library +export function initializeMyLibrary() { + // Create isolated factory for this library (only valid for NPM usage) + const factory = createSdkFactory(); // Valid: assumes NPM package import + return factory.newInst('my-library', myConfig); +} + +// CDN-aware library pattern (handles both NPM and CDN scenarios) +export function initializeMyLibrary(options?: { factory?: any }) { + if (options?.factory) { + // Use provided factory + return options.factory.newInst('my-library', myConfig); + } + + // For CDN scenarios, require application to provide factory + throw new Error('Factory required - use initializeMyLibrary({ factory })'); +} +``` + +#### **For Application Developers** + +```typescript +// Good: Explicit factory strategy decision + +// Strategy 1: CDN sharing - load once, all teams share (use callbacks) +// +// All teams use: window.createSdkFactory({ onInit: (factory) => { ... 
} }) + +// Strategy 2: NPM sharing - create once, export for teams +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const appFactory = createSdkFactory(); // Valid: NPM ensures immediate availability +export { appFactory as sharedTelemetryFactory }; + +// Strategy 3: NPM isolation - each team creates own factory +// Teams call: createSdkFactory() independently (valid for NPM usage) + +// Good: Document the chosen strategy for your application +``` + +### **Debugging Factory Issues** + +#### **Factory Inspection Tools** + +```typescript +// NPM usage - synchronous access is valid +import { createSdkFactory } from '@microsoft/otel-web-sdk'; +const factory = createSdkFactory(); // Valid: NPM ensures immediate availability +const info = factory.info; + +console.log({ + id: info.id, // Unique factory instance ID + ver: info.ver, // SDK version + loadMethod: info.loadMethod // 'npm', 'cdn', 'dynamic' +}); + +// For multiple NPM factories, inspect each one individually +const productionFactory = createSdkFactory('production'); // Valid: NPM usage +const stagingFactory = createSdkFactory('staging'); // Valid: NPM usage + +console.log('Production factory:', { + id: productionFactory.info.id, + version: productionFactory.info.ver, + loadMethod: productionFactory.info.loadMethod +}); + +console.log('Staging factory:', { + id: stagingFactory.info.id, + version: stagingFactory.info.ver, + loadMethod: stagingFactory.info.loadMethod +}); +``` + +#### **Common Issues and Solutions** + +| Issue | Symptom | Solution | +|-------|---------|----------| +| **Multiple Factories** | Potential duplicate telemetry, high resource usage | Compare factory IDs via `factory.info.id` to identify duplicates | +| **Version Conflicts** | Console warnings, inconsistent behavior | Check each factory's version via `factory.info.ver` | +| **Missing Telemetry** | Some teams not sending data | Verify factory configuration and SDK instance creation | +| **Load Method Issues** | Unexpected 
factory behavior | Check `factory.info.loadMethod` to understand how factory was created | + +This distribution strategy ensures teams can work independently when needed while providing clear paths for coordination and resource sharing when beneficial. + +For comprehensive multi-instance usage examples including multi-team patterns, advanced configurations, legacy compatibility, OpenTelemetry + Application Insights usage, and complete code samples, see **[OTelWebSdk-UsageExamples.md](./specs/OTelWebSdk-UsageExamples.md)**. + +## Benefits of Multi-Instance SDK Factory + +### 1. **Standards Compliance** +- Full OpenTelemetry API compatibility +- Interoperability with other OTel libraries +- Future-proof observability strategy + +### 2. **Application Insights Integration** +- Seamless translation to AI telemetry format +- Preserves all AI-specific features +- No migration required for existing AI users + +### 3. **Instance Management** +- Clean object-oriented lifecycle +- Automatic resource cleanup +- Multi-instance support for complex applications + +### 4. **Developer Experience** +- Single SDK for both OTel and AI APIs +- Type-safe interfaces throughout +- Comprehensive configuration options +- Rich web-specific extensions + +### 5. **Performance Optimized** +- Tree-shakable architecture +- Lazy initialization +- Efficient telemetry batching +- Minimal browser runtime overhead + +### 6. **Enterprise Ready** +- Multi-team instance isolation +- Shared resource optimization +- Configurable sampling and filtering +- Privacy-compliant data collection + + +## SDK Lifecycle Flow + +``` +Application Startup Manager Coordination SDK Instance Management + │ │ │ + │ 1. Import SDK │ │ + │ import { createSdkFactory } │ │ + │ │ │ + │ │ │ + │ 2. Get SDK Factory │ │ + │ const factory = │ │ + │ createSdkFactory('default') │ │ + ├─────────────────────────────▶│ │ + │ │ 3. 
Create/Get Factory │ + │ │ - Initialize shared resources │ + │ │ - Setup resource pools │ + │ │ - Configure timer management │ + │ │ │ + │ 4. Create SDK Instance │ │ + │ const sdk = factory.newInst() │ │ + ├─────────────────────────────▶│ │ + │ │ 5. Coordinate Instance Creation│ + │ ├─────────────────────────────▶│ + │ │ │ 6. Create Instance + │ │ │ - Validate configuration + │ │ │ - Register with manager + │ │ │ - Access shared resources + │ │ │ - Setup providers + │ │ │ + │ 7. Initialize SDK │ │ + │ await sdk.initialize() │ │ + ├─────────────────────────────▶│ │ + │ │ 8. Coordinate Initialization │ + │ ├─────────────────────────────▶│ + │ │ │ 9. Initialize Components + │ │ │ - Start processors + │ │ │ - Initialize auto-instruments + │ │ │ - Setup exporters + │ ◀────────────────────────────┤ 10. Report Ready │ - Register with shared pools + │ SDK initialized successfully │ ◀─────────────────────────────┤ + │ │ │ + │ 11. Use SDK │ │ + │ tracer = sdk.traceProvider. │ │ + │ getTracer() ├─────────────────────────────▶│ + │ logger = sdk.loggerProvider. │ │ 12. Runtime Operations + │ getLogger() │ │ - Create telemetry + │ meter = sdk.meterProvider. │ │ - Process through pipelines + │ getMeter() │ │ - Use shared resources + │ │ │ - Batch with other instances + │ │ │ + │ 13. Manager Operations │ │ + │ info = factory.info │ │ + │ instance = factory. │ │ + │ getInst('x') │ │ + │ │ 14. Coordinate Queries │ + │ │ - Query instance registry │ + │ │ - Return instance references │ + │ │ - Provide status information │ + │ │ │ +Application Shutdown Manager Coordination Instance Management + │ │ │ + │ 15. Shutdown Instances │ │ + │ await manager.shutdownAll() │ │ + ├─────────────────────────────▶│ │ + │ │ 16. Coordinate Shutdown │ + │ ├─────────────────────────────▶│ + │ │ │ 17. Instance Shutdown + │ │ │ - Flush pending telemetry + │ │ │ - Stop processors + │ │ │ - Close connections + │ │ │ - Unregister from manager + │ │ │ + │ │ 18. 
Manager Cleanup │ + │ │ - Shutdown shared resources │ + │ │ - Clear instance registry │ + │ │ - Close connection pools │ + │ ◀────────────────────────────┤ 19. Report Complete │ + │ All instances shutdown │ │ + │ │ │ + │ 20. Optional: Full Cleanup │ │ + │ await manager.unloadAll() │ │ + ├──────────────────────────────▶│ │ + │ │ 21. Complete Unload │ + │ │ - Remove all patches/hooks │ + │ │ - Clear all references │ + │ │ - Reset to pristine state │ + │◀─────────────────────────────┤ 22. Manager Unloaded │ + │ All SDK resources removed │ │ +``` + +## Implementation Documentation + +The OTelWebSDK implements the OpenTelemetry specification interfaces with a focus on explicit instance management: + +- **IOTelSdk**: Base interface implemented by OTelWebSdk following the OpenTelemetry specification +- **IOTelApi**: Core API interface supporting explicit context management as per OpenTelemetry standards +- **IOTelTraceApi**: Trace API implementation with non-global trace functionality following OpenTelemetry trace specification +- **IOTelContextManager**: Context management implementation with improved context propagation capabilities per OpenTelemetry context specification + +## Core Components and Architecture + +The OTelWebSdk consists of several core components working together to provide a complete observability solution, all following strict interface-first design principles and coordinated by the SDK Manager. + +For complete interface definitions including SDK Manager, Web SDK instances, and Provider components, see **[OTelWebSdk-Interfaces.md](./specs/OTelWebSdk-Interfaces.md)**. + +### Detailed Architecture and Design Patterns + +For comprehensive architectural details including: +- Component interaction diagrams +- Processing pipeline flows +- Detailed design principles and patterns +- Implementation examples with IoC patterns +- Multi-instance coordination mechanisms + +See the [Architecture Documentation](./specs/OTelWebSdk-Architecture.md). 
+ +## Implementation Documentation + +### Operational Documentation + +- **[Testing Strategy](./specs/OTelWebSdk-Testing.md)** - Comprehensive testing approach including unit, integration, performance, and cross-browser testing +- **[Performance Strategy](./specs/OTelWebSdk-Performance.md)** - Performance optimization techniques, monitoring, and benchmarking +- **[Migration Guide](./specs/OTelWebSdk-Migration.md)** - Migration strategies and tools for existing telemetry solutions +- **[Interface Definitions](./specs/OTelWebSdk-Interfaces.md)** - Complete interface definitions and type contracts + +## Implementation Roadmap + +The OTelWebSdk follows a structured 6-phase implementation approach designed to deliver incremental value while building upon architectural foundations. + +For comprehensive implementation planning including detailed technical tasks, timelines, milestones, and immediate next steps, see **[OTelWebSdk-Implementation.md](./specs/OTelWebSdk-Implementation.md)**. + +### Quick Reference +- **Month 1**: Core SDK with basic tracing +- **Month 2**: Complete telemetry stack +- **Month 3**: Production-ready with Azure Monitor integration + +### Related Documentation +- **[Implementation Planning](./specs/OTelWebSdk-Implementation.md)** - Detailed phases, tasks, and timeline +- **[Architectural Design](./specs/OTelWebSdk-Architecture.md)** - IoC principles and technical requirements diff --git a/docs/planning/otel/specs/OTelWebSdk-Architecture.md b/docs/planning/otel/specs/OTelWebSdk-Architecture.md new file mode 100644 index 000000000..b019c9438 --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Architecture.md @@ -0,0 +1,2123 @@ +# OpenTelemetry Web SDK Architecture + +## Overview + +The OpenTelemetry Web SDK is designed as a modern, modular implementation that follows the OpenTelemetry specification while providing enhanced flexibility and performance for web applications. 
It delivers a complete observability solution encompassing distributed tracing, structured logging, and basic metrics collection without relying on global singletons or static state. + +## Inversion of Control Architecture + +The SDK architecture strictly adheres to the Inversion of Control (IoC) pattern to eliminate dependencies on global state and enhance modularity: + +### Core IoC Principles + +1. **No Global State**: The SDK MUST NOT use any global variables, window properties, or static state +2. **Explicit Dependencies**: All dependencies are explicitly declared and injected through factory functions +3. **Constructor Injection**: Components receive their dependencies through their creation functions +4. **Interface-Based Design**: Components depend on interfaces, not concrete implementations +5. **Lifecycle Management**: Parent components manage the lifecycle of their dependencies + +### Dependency Injection Pattern + +```typescript +// Example: TraceProvider with explicit dependency injection +export function createTraceProvider(config: ITraceProviderConfig): IOTelTraceProvider { + // All dependencies explicitly provided in configuration + const { + resource, // Injected resource information + spanProcessors = [], // Injected span processors + spanExporters = [], // Injected span exporters + sampler, // Injected sampling strategy + idGenerator, // Injected ID generation strategy + contextManager, // Injected context manager + logger, // Injected logger + clock = Date, // Injected clock (for testing) + performanceNow = performance.now.bind(performance) // Injected timing + } = config; + + // Validate all required dependencies are provided + if (!resource) { + throw new Error("Resource must be provided to TraceProvider"); + } + if (!sampler) { + throw new Error("Sampler must be provided to TraceProvider"); + } + if (!contextManager) { + throw new Error("ContextManager must be provided to TraceProvider"); + } + + // Private closure variables - no global state 
+ let _config = { ...config }; + let _tracers = new Map<string, IOTelTracer>(); + let _isShutdown = false; + + let _self = {} as IOTelTraceProvider; + + // Define methods directly on the interface instance + _self.getTracer = (name: string, version?: string, options?: IOTelTracerOptions): IOTelTracer => { + if (_isShutdown) { + logger.warn("TraceProvider is shut down, returning no-op tracer"); + return createNoOpTracer(); + } + + const key = `${name}@${version || 'unknown'}`; + let tracer = _tracers.get(key); + + if (!tracer) { + // Inject dependencies into tracer creation + tracer = createTracer({ + name, + version, + resource, // Injected from provider config + spanProcessors, // Injected from provider config + sampler, // Injected from provider config + idGenerator, // Injected from provider config + contextManager, // Injected from provider config + logger, // Injected from provider config + clock, // Injected from provider config + performanceNow, // Injected from provider config + ...options + }); + _tracers.set(key, tracer); + } + + return tracer; + }; + + // All operations use injected dependencies, never global state + _self.shutdown = async (): Promise<void> => { + if (_isShutdown) return; + + _isShutdown = true; + + // Shutdown injected processors + await Promise.all(spanProcessors.map(processor => + processor.shutdown().catch(err => + logger.error("Error shutting down processor", err) + ) + )); + + _tracers.clear(); + }; + + return _self; +} +``` + +### Benefits of IoC Pattern + +1. **Testability**: Easy to mock dependencies for unit testing +2. **Flexibility**: Different implementations can be injected for different environments +3. **Isolation**: No shared global state between SDK instances +4. **Maintainability**: Clear dependency relationships and easier refactoring +5. **Performance**: No global lookups or singleton access overhead +6. 
**Security**: Reduced attack surface by eliminating global state mutation + +### Anti-Patterns to Avoid + +```typescript +// WRONG: Global singleton access +class BadTraceProvider { + getTracer() { + const globalSdk = window.__OTEL_SDK__; // Global access - forbidden + return globalSdk.createTracer(); + } +} + +// WRONG: Static state access +class BadSpanProcessor { + process(span: IOTelSpan) { + GlobalConfig.exportUrl; // Static access - forbidden + } +} + +// CORRECT: Dependency injection +export function createSpanProcessor(config: ISpanProcessorConfig): IOTelSpanProcessor { + // All dependencies accessed directly from injected config + config.exporter; // Injected exporter + config.batchSize; // Injected batch size + config.exportTimeout; // Injected timeout + config.logger; // Injected logger + // Implementation uses only injected dependencies +} +``` + +## Core Architectural Principles + +The following principles are listed in **priority order**, with #1 being the highest priority foundational requirement and subsequent principles building upon and supporting the earlier ones: + +### 1. 
Interface-First Design + +The OTelWebSDK must strictly adhere to an interface-first design pattern to ensure optimal API design, type safety, and maintainability: + +#### **Public Interface Design** +- **All public components must have dedicated interfaces** with the `I` prefix (e.g., `IUnloadResult`) +- **All OpenTelemetry compatible interfaces must use** the `IOTel` prefix (e.g., `IOTelTraceProvider`, `IOTelLogger`, `IOTelMeterProvider`) +- **Public interfaces must include comprehensive TypeDoc documentation** with detailed descriptions, examples, and default values +- **All properties and methods in interfaces must have explicit types and return types** to ensure type safety +- **Public interfaces are the only exports in the public API** - implementation classes SHOULD never be directly exposed + +#### **Internal Interface Design** +- **Internal interfaces must use `_I` prefix** and be marked with `@internal` TypeDoc tags to exclude them from public API documentation +- **Internal interfaces extend public interfaces when appropriate** to provide additional implementation-specific functionality +- **Internal interfaces are not exported in public API** and provide access to implementation details for internal components only + +#### **Implementation Encapsulation** +- **Implementation classes must be private or internal** and never exported in the public API +- **All implementation details must be encapsulated** behind the interface contracts +- **Implementation classes are hidden behind interfaces** with no direct access to internal state or methods + +#### **Factory Function Pattern** +- **Factory functions must follow the `create*` naming pattern** (e.g., `createOTelWebSdk`, `createTraceProvider`) +- **Factory functions must return interface types, not implementation types** to maintain proper encapsulation +- **Factory functions handle all initialization complexity** including configuration validation, dependency setup, and plugin initialization +- 
**Implementation classes are never exposed directly** - only through their corresponding interfaces + +#### **Enhanced Configuration Interfaces** +- **Configuration interfaces must have detailed TypeDoc documentation** with descriptions, examples, and default values for all properties +- **Dedicated interfaces for specific component types** (e.g., metric types, span options, logger configurations) +- **Nested interfaces for configuration options** to organize related settings and improve readability +- **Proper typing for all configuration options** with backward compatibility for existing configurations + +#### **Interface Benefits Implementation** +- **Improved type safety and compile-time checking** through strict interface contracts +- **Better encapsulation of implementation details** preventing access to internal APIs +- **Easier testing through interface mocking** enabling comprehensive unit testing +- **Better tree-shaking for reduced bundle size** by eliminating unused implementation code +- **Enhanced IDE support with better IntelliSense** through detailed interface documentation + +#### **Naming Conventions** +- **Public interfaces**: `I` prefix (e.g., `ISdkLoader`) +- **Open Telemetry Compatible interfaces**: `IOTel` prefix (e.g., `IOTelTraceProvider`, `IOTelSpan`, `IOTelLogger`) +- **Internal interfaces**: `_I` prefix with `@internal` TypeDoc tags (e.g., `_ISpanProcessor`, `_IExporter`) +- **Factory functions**: `create*` pattern (e.g., `createOTelWebSdk`, `createBatchSpanProcessor`) +- **Const enums**: lowercase `e` prefix for internal use (e.g., `eSpanKind`, `eLogSeverity`, `eSpanStatusCode`) +- **Public enum types**: corresponding types without `e` prefix using `createEnumStyle` pattern (e.g., `SpanKind`, `LogSeverity`, `SpanStatusCode`) + +#### **Enum Design Pattern** + +The SDK follows the ApplicationInsights-JS enum pattern using `createEnumStyle` for optimal tree-shaking and type safety: + +**Internal Const Enums (Not Exported):** +```typescript 
+/** + * Internal const enum for span kinds - used internally by the SDK + * @internal + */ +export const enum eSpanKind { + INTERNAL = 0, + SERVER = 1, + CLIENT = 2, + PRODUCER = 3, + CONSUMER = 4 +} + +/** + * Internal const enum for log severity levels - used internally by the SDK + * @internal + */ +export const enum eLogSeverity { + TRACE = 1, + DEBUG = 5, + INFO = 9, + WARN = 13, + ERROR = 17, + FATAL = 21 +} +``` + +**Public Exported Types (Using createEnumStyle):** +```typescript +import { createEnumStyle } from "@microsoft/applicationinsights-core-js"; + +/** + * Public enum-style object for span kinds + * Provides both key => value and value => key mappings for runtime use + */ +export const SpanKind = (/* @__PURE__ */createEnumStyle({ + INTERNAL: eSpanKind.INTERNAL, + SERVER: eSpanKind.SERVER, + CLIENT: eSpanKind.CLIENT, + PRODUCER: eSpanKind.PRODUCER, + CONSUMER: eSpanKind.CONSUMER +})); + +/** + * Type definition for span kind values + * Allows both numeric values and enum object values + */ +export type SpanKind = number | eSpanKind; + +/** + * Public enum-style object for log severity levels + */ +export const LogSeverity = (/* @__PURE__ */createEnumStyle({ + TRACE: eLogSeverity.TRACE, + DEBUG: eLogSeverity.DEBUG, + INFO: eLogSeverity.INFO, + WARN: eLogSeverity.WARN, + ERROR: eLogSeverity.ERROR, + FATAL: eLogSeverity.FATAL +})); + +/** + * Type definition for log severity values + */ +export type LogSeverity = number | eLogSeverity; +``` + +**Usage Pattern Benefits:** +1. **Internal SDK Code**: Uses const enums directly for optimal performance and tree-shaking +2. **Public API**: Consumers use the exported enum-style objects and types +3. **Tree-Shaking Optimization**: Unused enum values are eliminated from final bundle +4. **Type Safety**: Full TypeScript support with proper type checking +5. **Runtime Flexibility**: Supports object properties, const enum values, and numeric values +6. 
**Developer Experience**: Multiple usage patterns accommodate different developer preferences + +#### **Documentation Standards** +- **Comprehensive TypeDoc comments for all public interfaces** with purpose, usage examples, and implementation notes +- **Interface relationship diagrams** showing inheritance hierarchies and component relationships +- **Mapping interfaces to OpenTelemetry specification concepts** for standards compliance +- **Complete usage examples** demonstrating proper interface usage patterns +- **Enum documentation with value explanations** and mapping to OpenTelemetry specification values +- **Clear distinction between internal const enums and public enum objects** with appropriate `@internal` tags + +### 2. Closure-Based Implementation OR DynamicProto-JS Classes +- **Implementation Choice**: Use EITHER closure pattern for interface implementations OR DynamicProto-JS for class-based implementations +- **Closure Pattern**: Use closures when implementing interfaces - provides private member hiding with direct object property assignment +- **DynamicProto Classes**: Use DynamicProto-JS when you need class inheritance - provides private member hiding with prototype-based inheritance +- **Bundle Size Optimization**: Both patterns provide optimal bundle size through tree-shaking and dead code elimination +- **Private Member Hiding**: Both patterns provide true encapsulation - closures through closure variables, DynamicProto through internal closures +- **Usage Guideline**: Use closures for interface implementations, use DynamicProto only when class inheritance is required + +### 3. 
Factory Function Pattern with Dependency Injection +- **Controlled Instantiation**: All components created through factory functions following `create*` naming convention +- **Inversion of Control**: Dependencies are injected rather than accessed globally or through singletons +- **Dependency Resolution**: Built-in dependency resolution and lifecycle management without global state +- **Configuration Validation**: Comprehensive validation and sanitization of all configuration inputs +- **Error Handling**: Graceful failure modes with detailed error reporting +- **Explicit Dependencies**: All required dependencies must be explicitly provided, eliminating hidden globals + +### 4. Inversion of Control (IoC) Pattern +- **Dependency Injection**: All dependencies are explicitly provided through factory functions and configuration +- **No Global Properties**: The SDK MUST NOT rely on global variables, singletons, or static state +- **Explicit Instance Management**: All SDK instances must be explicitly created and managed by consumers +- **Isolation**: Multiple SDK instances can coexist without interference +- **Context Propagation**: Explicit context passing eliminates hidden global state dependencies +- **Testability**: Enhanced testing capabilities through controlled instance creation and dependency injection + +### 5. Modular Architecture +- **Selective Loading**: Components can be imported individually to minimize bundle size +- **Plugin Architecture**: Extensible design supporting custom exporters, processors, and instrumentations +- **Lazy Loading**: Components are initialized only when needed +- **Version Compatibility**: Backward and forward compatibility through interface versioning + +### 6. 
Performance-First Design +- **Minimal Overhead**: Designed to have negligible impact on application performance +- **Async Operations**: Non-blocking telemetry collection and export +- **Batching**: Intelligent batching strategies to reduce network overhead +- **Resource Management**: Automatic cleanup of the resources the SDK allocates + +### 7. Key Implementation Principles / Features + +The SDK implements a complete set of enterprise-grade features following strict architectural patterns: + +#### **Core Implementation Patterns** +1. **Interface-First Design**: TypeScript interfaces define all public contracts and public APIs +2. **Factory-First Architecture**: Use factory functions to create instances instead of exposing constructors directly, providing explicit instantiation with dependency injection +3. **IoC Pattern**: No global state, explicit dependency management throughout all components +4. **No Global Singletons**: The design avoids using global variables like context, api, trace, and logs - all state is contained within explicitly created instances managed by the SDK Factory +5. **Closure-Based Implementation**: Use closures instead of classes to reduce bundle size and to truly hide internal private properties/state +6. **Modular Architecture**: Supports tree-shaking so that bundles include only the code that is used; all arguments are typed as interfaces to avoid tight coupling, and individual components can be selectively imported + +#### **OpenTelemetry Compliance and Extensions** +7. **Full OTel Compliance**: A complete, **compatible** implementation of OpenTelemetry's trace, logs, and basic metrics APIs following the official specification +8. 
**Multi-Tenant Support**: Support for multiple named instances enables team/project isolation and independent resource management: + - **Shared Resource Optimization**: Multiple SDK instances share timers, connection pools, export queues, and object pools to minimize overhead + - **Instance Isolation**: Each SDK instance maintains its own configuration, processors, and instrumentations while sharing underlying resources + - **Explicit Instance Creation**: Users must explicitly create instances of all components through the Factory, giving them full control over the lifecycle and configuration + +#### **Performance and Extensibility Features** +9. **High-Performance Architecture**: Minimal overhead design with advanced batching, resource management, and bundle optimization +10. **Context Management**: Better control over context propagation with explicit context creation and management +11. **Extensibility**: Support for custom exporters, processors, and samplers to extend functionality +12. **DynamicProto-JS**: When classes are required, use DynamicProto-JS for complex implementations to optimize bundle size while maintaining prototype inheritance benefits +13. **Azure Monitor Integration**: Seamless integration with Azure Monitor through specialized exporters + +## SDK Instance Factory Architecture + +The OTelWebSDK uses an **SDK Instance Factory** as the primary interface for consumers following strict **Inversion of Control (IoC) principles**. The SDK Factory creates every SDK instance and injects all of its dependencies. 
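To make the factory idea concrete, here is a minimal sketch of a closure-based factory that keeps its instance registry private and registers instances automatically on creation. It is an illustration under stated assumptions, not the SDK's implementation: `ISketchSdkConfig`, `ISketchSdk`, `ISketchSdkFactory`, and `createSketchSdkFactory` are hypothetical names invented for this example, and the real contracts (for example `ISdkFactory` and `createSdkFactory`) are expected to be richer.

```typescript
// Hypothetical, simplified shapes for illustration only; the real
// interfaces are defined in OTelWebSdk-Interfaces.md and may differ.
interface ISketchSdkConfig {
    connectionString?: string;
}

interface ISketchSdk {
    readonly name: string;
    readonly config: Readonly<ISketchSdkConfig>;
    shutdown(): Promise<void>;
}

interface ISketchSdkFactory {
    readonly info: { id: string; instanceCount: number };
    newInst(name: string, config: ISketchSdkConfig): ISketchSdk;
    getInst(name: string): ISketchSdk | undefined;
    shutdownAll(): Promise<void>;
}

function createSketchSdkFactory(id: string): ISketchSdkFactory {
    // Closure state: the registry belongs to this factory only, never to a global
    const instances = new Map<string, ISketchSdk>();

    const factory: ISketchSdkFactory = {
        get info() {
            return { id, instanceCount: instances.size };
        },
        newInst(name, config) {
            if (instances.has(name)) {
                throw new Error(`Instance "${name}" already exists in factory "${id}"`);
            }
            // Copy and freeze the config so later caller-side mutation cannot leak in
            const frozenConfig = Object.freeze({ ...config });
            const sdk: ISketchSdk = {
                name,
                config: frozenConfig,
                shutdown: async () => {
                    // Unregister from the owning factory on shutdown
                    instances.delete(name);
                }
            };
            // Automatic registration: callers never register instances manually
            instances.set(name, sdk);
            return sdk;
        },
        getInst(name) {
            return instances.get(name);
        },
        async shutdownAll() {
            // Snapshot the values first because shutdown() mutates the registry
            await Promise.all(Array.from(instances.values()).map(inst => inst.shutdown()));
        }
    };
    return factory;
}
```

Because the registry lives in the closure of each `createSketchSdkFactory` call, two factories on the same page cannot see each other's instances, which is the isolation property the IoC requirements in this document depend on. Shared resources such as timers or connection pools are deliberately left out of the sketch.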
+ +### Key IoC Requirements + +- **No Global Context Dependencies**: Helper functions MUST NOT rely on globals to obtain context or configuration +- **Explicit Dependency Injection**: All required dependencies (context, configuration, managers) must be explicitly passed as arguments +- **Factory as Primary Interface**: All SDK instances MUST be created through `factory.newInst()` +- **No Manual Registration**: Instances are automatically registered during creation + +### IoC-Compliant Consumer Pattern + +```typescript +// CORRECT: IoC-compliant pattern +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// 1. Create factory (no global state dependency) +const factory = createSdkFactory('my-application'); + +// 2. Factory creates instance with dependency injection +const sdk = factory.newInst('my-service', { + connectionString: 'InstrumentationKey=key', + resource: { + 'service.name': 'my-service', + 'service.version': '1.0.0' + } +}); + +// 3. Use SDK normally (dependencies already injected) +await sdk.initialize(); +const tracer = sdk.tracerProvider.getTracer('operations'); + +// CORRECT: Helper function with explicit dependencies +function createNewInst(factory: ISdkFactory, instanceName: string, config: any) { + // GOOD: All dependencies explicitly passed as arguments + // No hidden globals, no implicit context - everything is explicit + return factory.newInst(instanceName, config); +} + +// Usage: Dependencies are clear and explicit +const myInstance = createNewInst(factory, 'my-service', { connectionString: 'key' }); +``` + +### Anti-Pattern (Violates IoC) + +```typescript +// WRONG: Helper function relies on globals to get context +function createNewInst(config: any) { + // BAD: Getting context from global state + const currentContext = globalState.getCurrentContext(); // Violates IoC! 
+ const factory = currentContext.getFactory(); // Uses hidden global dependency + return factory.newInst('service', config); +} + +// WRONG: Function doesn't explicitly declare its dependencies +function createNewInst(config: any) { + // BAD: Hidden dependency on global factory + const factory = getGlobalFactory(); // Where does this come from? Not explicit! + return factory.newInst('service', config); +} +``` + +**IoC Violation**: These functions hide their dependencies, making testing difficult and creating implicit global state requirements. + +### SDK Factory Architecture + +The SDK Factory orchestrates multiple SDK instances while optimizing resource usage: + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ SDK Factory Architecture │ +├─────────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Application Layer │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Team A │ │ Team B │ │ Team C │ │ Team D │ │ +│ │ Application │ │ Application │ │ Application │ │ Application │ │ +│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ +│ │ │ │ │ │ +│ └────────────────┼────────────────┼────────────────┘ │ +│ │ │ │ +│ ┌──────────────────────▼────────────────▼───────────────────────────────────┐ │ +│ │ SDK Factory │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ Instance │ │ Resource │ │ Configuration │ │ │ +│ │ │ Registry │ │ Coordinator │ │ Manager │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ - Create │ │ - Connection │ │ - Policies │ │ │ +│ │ │ - Register │ │ Pooling │ │ - Inheritance │ │ │ +│ │ │ - Lifecycle │ │ - Timer Coord │ │ - Validation │ │ │ +│ │ │ - Cleanup │ │ - Batch Coord │ │ - Updates │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ +│ └──────────────────────┬────────────────────────────────────────────────────┘ │ +│ │ │ +│ 
┌──────────────────────▼─────────────────────────────────────────────────────┐ │ +│ │ SDK Instance Layer │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Team A │ │ Team B │ │ Team C │ │ Team D │ │ │ +│ │ │ SDK Inst │ │ SDK Inst │ │ SDK Inst │ │ SDK Inst │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ - Isolated │ │ - Isolated │ │ - Isolated │ │ - Isolated │ │ │ +│ │ │ Config │ │ Config │ │ Config │ │ Config │ │ │ +│ │ │ - Team │ │ - Team │ │ - Team │ │ - Team │ │ │ +│ │ │ Context │ │ Context │ │ Context │ │ Context │ │ │ +│ │ │ - Telemetry │ │ - Telemetry │ │ - Telemetry │ │ - Telemetry │ │ │ +│ │ │ Namespace │ │ Namespace │ │ Namespace │ │ Namespace │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └──────────────────────┬────────────────────────────────────────────────────┘ │ +│ │ │ +│ ┌──────────────────────▼─────────────────────────────────────────────────────┐ │ +│ │ Shared Resources Layer │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Connection │ │ Timer │ │ Export │ │ Processing │ │ │ +│ │ │ Pool │ │ Scheduler │ │ Coordinator │ │ Pipeline │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ - Optimize │ │ - Sync │ │ - Batch │ │ - Shared │ │ │ +│ │ │ Network │ │ Export │ │ Optimize │ │ Processors│ │ │ +│ │ │ - Pool │ │ - Reduce │ │ - Team │ │ - Parallel │ │ │ +│ │ │ Reuse │ │ Overhead │ │ Isolation │ │ Processing│ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +This architecture enables: + +### Multi-Tenant Benefits +- **Tenant Isolation**: Each team gets their own SDK instance with isolated configuration and telemetry context +- **Resource Efficiency**: Shared connections, timers, and processing resources across instances +- **Centralized Management**: Enterprise-wide 
policies and monitoring capabilities +- **Independent Lifecycle**: Teams can manage their SDK instances independently +- **Namespace Separation**: Automatic telemetry namespacing prevents team conflicts + +### Resource Coordination +- **Connection Pooling**: Efficient sharing of network connections across tenant instances +- **Timer Synchronization**: Coordinated export schedules to reduce browser overhead +- **Batch Optimization**: Intelligent batching that considers all team instances +- **Processing Pipeline Sharing**: Shared processors for common operations while maintaining isolation + +## Dependency Injection Flow Architecture + +Shows how dependencies are injected throughout the SDK without global state: + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ Dependency Injection Architecture │ +├─────────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Configuration Layer │ +│ ┌─────────────────────────────────────────────────────────────────────────────┐ │ +│ │ IOTelWebSdkConfig │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Connection │ │ Exporters │ │ Processors │ │ Resource │ │ │ +│ │ │ Config │ │ Config │ │ Config │ │ Config │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬───────────────────────────────────────────┘ │ +│ │ │ +│ Factory Creation Layer │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────────────┐ │ +│ │ createOTelWebSdk(config) │ │ +│ │ │ │ +│ │ Dependencies Validated & Injected: │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ ✓ Resource │ │ ✓ Logger │ │ ✓ Context │ │ ✓ Clock │ │ │ +│ │ │ Required │ │ Required │ │ Manager │ │ Injectable│ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────────────┘ │ +│ │ │ +│ 
Provider Creation Layer │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────────────┐ │ +│ │ Provider Factory Functions │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ │ │ +│ │ │ createTraceProvider │ │ createLogProvider │ │ createMeter │ │ │ +│ │ │ (traceConfig) │ │ (logConfig) │ │ Provider │ │ │ +│ │ │ │ │ │ │ (meterConfig) │ │ │ +│ │ │ Injects: │ │ Injects: │ │ │ │ │ +│ │ │ • Resource │ │ • Resource │ │ Injects: │ │ │ +│ │ │ • SpanProcessors │ │ • LogProcessors │ │ • Resource │ │ │ +│ │ │ • Sampler │ │ • LogExporters │ │ • MetricReaders │ │ │ +│ │ │ • IdGenerator │ │ • Logger │ │ • MetricExporter │ │ │ +│ │ │ • ContextManager │ │ • ContextManager │ │ • Logger │ │ │ +│ │ │ • Logger │ │ • Clock │ │ • Clock │ │ │ +│ │ │ • Clock │ │ • Performance │ │ • Performance │ │ │ +│ │ └─────────────────────┘ └─────────────────────┘ └──────────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────────────┘ │ +│ │ │ +│ Implementation Layer │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────────────┐ │ +│ │ DynamicProto + Closure Pattern │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ │ │ +│ │ │ TraceProviderImpl │ │ LogProviderImpl │ │ MeterProviderImpl│ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ Closure Variables: │ │ Closure Variables: │ │ Closure Variables│ │ │ +│ │ │ let _resource; │ │ let _resource; │ │ let _resource; │ │ │ +│ │ │ let _processors; │ │ let _processors; │ │ let _readers; │ │ │ +│ │ │ let _sampler; │ │ let _exporters; │ │ let _exporters; │ │ │ +│ │ │ let _tracers; │ │ let _loggers; │ │ let _meters; │ │ │ +│ │ │ let _isShutdown; │ │ let _isShutdown; │ │ let _isShutdown; │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ All dependencies │ │ All dependencies │ │ All dependencies │ │ │ +│ │ │ injected via │ │ injected via │ │ injected via │ │ │ +│ │ │ factory config │ │ factory config │ │ factory config │ │ │ +│ │ └─────────────────────┘ 
└─────────────────────┘ └──────────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────────────┘ │ +│ │ │ +│ Component Creation Layer │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────────────┐ │ +│ │ Component Factory Functions │ │ +│ │ ┌─────────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ │ │ +│ │ │ createTracer() │ │ createLogger() │ │ createMeter() │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ Dependencies │ │ Dependencies │ │ Dependencies │ │ │ +│ │ │ from Provider: │ │ from Provider: │ │ from Provider: │ │ │ +│ │ │ • Resource │ │ • Resource │ │ • Resource │ │ │ +│ │ │ • Processors │ │ • Processors │ │ • Readers │ │ │ +│ │ │ • Sampler │ │ • Exporters │ │ • Exporters │ │ │ +│ │ │ • ContextManager │ │ • ContextManager │ │ • Clock │ │ │ +│ │ │ • Logger │ │ • Logger │ │ • Logger │ │ │ +│ │ └─────────────────────┘ └─────────────────────┘ └──────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Benefits: │ +│ • No global state or singletons │ +│ • All dependencies explicit and testable │ +│ • Clean interfaces with dependency injection │ +│ • True isolation between SDK instances │ +│ • Enhanced performance through closures │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +### Key Dependency Injection Principles + +1. **Explicit Dependencies**: All components declare their dependencies upfront through factory function parameters +2. **No Hidden Globals**: Zero reliance on global variables, window properties, or static state +3. **Constructor Injection**: Dependencies provided at creation time, not accessed later +4. **Interface-Based**: Components depend on interfaces, enabling easy testing and swapping +5. **Lifecycle Clarity**: Parent components manage the lifecycle of their dependencies +6. 
**Performance Optimization**: Closure-based implementation with injected dependencies for optimal runtime performance + +## Application Insights SDK Requirements + +Building on OpenTelemetry standards, the SDK inherits and extends the proven requirements from the Microsoft Application Insights JavaScript SDK to ensure enterprise-grade reliability and functionality: + +### 7. Dynamic Configuration Management +- **Runtime Configuration Updates**: Support for configuration changes after SDK initialization without requiring restart +- **Hot Reload Capability**: Ability to update connection strings, sampling rates, and feature flags dynamically +- **Configuration Validation**: Real-time validation of configuration changes with rollback on failure +- **Configuration Persistence**: Automatic persistence of valid configuration changes across application restarts +- **Gradual Configuration Rollout**: Support for staged configuration deployment and A/B testing scenarios +- **Reactive Configuration Pattern**: Configuration changes propagated through reactive event patterns rather than explicit APIs + +```typescript +// Dynamic configuration implemented through reactive patterns +const sdk = createOTelWebSdk(initialConfig); +await sdk.initialize(); + +// Configuration changes handled reactively via event callbacks +sdk.onConfigChange((newConfig: IOTelWebSdkConfig) => { + console.log('Configuration updated:', newConfig); + // SDK automatically applies validated configuration changes +}); + +// Configuration updates can come from various sources: +// - Remote configuration services +// - Feature flag systems +// - Environment variable changes +// - Runtime API calls + +// Example: Remote configuration update triggers reactive pattern +// External system pushes config change -> SDK validates -> onConfigChange fires -> Components adapt +``` + +### 8. 
Complete Unload and Cleanup Support +- **Instance Unload**: Complete removal of SDK instances with all associated hooks, timers, and event listeners +- **Memory Leak Prevention**: Comprehensive cleanup of closures, event handlers, and DOM modifications +- **Hook Removal**: Automatic removal of all instrumentation hooks (XHR, fetch, unhandled errors, etc.) +- **Resource Cleanup**: Proper disposal of timers, intervals, observers, and background processes +- **Graceful Shutdown**: Coordinated shutdown ensuring in-flight telemetry is properly handled + +```typescript +interface IUnloadResult { + success: boolean; + cleanupItems: string[]; + failedCleanup: string[]; + duration: number; + pendingTelemetry: number; +} + +// Example unload sequence +const sdk = createOTelWebSdk(config); +await sdk.initialize(); + +// ... application usage ... + +// Clean shutdown with optional callback and timeout +const unloadResult = await sdk.unload( + (result: IUnloadResult) => { + if (!result.success) { + console.warn('Some cleanup items failed:', result.failedCleanup); + } + }, + 5000 // 5 second timeout +); + +// Or simple shutdown without callback/timeout +const simpleResult = await sdk.unload(); +``` + +### 9. Enterprise Multi-Tenant Support + +Features provided by the SDK factory model: +- **Tenant Isolation**: Independent SDK instances per team with isolated configuration and telemetry contexts +- **Resource Sharing**: Efficient sharing of connections, timers, and processing resources across tenant instances +- **Namespace Management**: Automatic namespacing of telemetry data to prevent team conflicts +- **Configuration Inheritance**: Team-specific configuration overrides with enterprise-wide default policies +- **Coordinated Management**: Central management capabilities for monitoring and controlling team instances + +### 10. 
Bundle Size Optimization +- **Tree-Shaking Friendly**: Full support for dead code elimination with no side effects +- **Modular Loading**: Individual components can be imported to minimize bundle size +- **Code Splitting**: Automatic code splitting for large applications with dynamic imports +- **Compression Optimization**: Optimal minification and compression for production builds +- **Size Monitoring**: Built-in bundle size analysis and optimization recommendations + +```typescript +// Selective imports for minimal bundle size +import { createTraceProvider } from '@microsoft/otel-web-sdk/trace'; +import { createOTLPExporter } from '@microsoft/otel-web-sdk/exporters/otlp'; + +// Or full SDK import with tree-shaking +import { createOTelWebSdk } from '@microsoft/otel-web-sdk'; + +// Bundle size targets: +// Core SDK: ~15-20KB gzipped +// With all providers: ~25-35KB gzipped +// With auto-instrumentation: ~40-55KB gzipped +// Full feature set: ~60-80KB gzipped (still smaller than most alternatives) +``` + +### 11. 
Universal JavaScript Runtime Support +- **Platform Detection**: Automatic detection of runtime environment (Browser, Web Worker, Node.js, SSR) +- **API Adaptation**: Runtime-specific API usage with graceful fallbacks for missing features +- **SSR Compatibility**: Full support for Server-Side Rendering scenarios with hydration +- **Web Worker Support**: Complete functionality in Web Worker and Service Worker environments +- **Node.js Integration**: Seamless operation in Node.js environments for SSR and build tools + +```typescript +interface IRuntimeAdapter { + readonly platform: 'browser' | 'webworker' | 'node' | 'unknown'; + readonly capabilities: IRuntimeCapabilities; + + detectEnvironment(): IRuntimeEnvironment; + getPerformanceNow(): (() => number); + getStorage(): IStorageAdapter | null; + getNetworkAdapter(): INetworkAdapter; + getTimerAdapter(): ITimerAdapter; +} + +interface IRuntimeCapabilities { + hasPerformance: boolean; + hasStorage: boolean; + hasWebWorkers: boolean; + hasServiceWorkers: boolean; + hasDOM: boolean; + hasConsole: boolean; + hasProcess: boolean; + supportsStreams: boolean; +} + +// Example runtime adaptation +const runtimeAdapter = createRuntimeAdapter(); + +if (runtimeAdapter.platform === 'browser') { + // Use browser-specific optimizations + enableWebVitalsTracking(); + enableUserInteractionTracking(); +} else if (runtimeAdapter.platform === 'node') { + // Use Node.js-specific features for SSR + enableServerSideInstrumentation(); +} + +// SSR example +function renderWithTelemetry(component: ReactComponent): string { + const sdk = createOTelWebSdk({ + platform: 'ssr', + enableClientSideHydration: true, + telemetryEndpoint: process.env.TELEMETRY_ENDPOINT + }); + + return ReactDOMServer.renderToString( + + {component} + + ); +} +``` + +### Integration with OpenTelemetry Standards + +These Application Insights requirements seamlessly integrate with OpenTelemetry standards: + +- **Dynamic Configuration** extends OTel configuration 
management with runtime updates +- **Unload Support** ensures proper cleanup of OTel providers, processors, and exporters +- **Multi-Tenant Support** leverages OTel's modular design for isolated instances +- **Bundle Size** optimization works with OTel's tree-shakable architecture +- **Runtime Support** adapts OTel APIs to different JavaScript environments + +This dual approach provides OpenTelemetry standard compliance while delivering the enterprise-grade features that Application Insights users depend on. + +## Detailed Architecture Diagram + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ OTelWebSDK Architecture │ +├─────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ Application │ │ Framework │ │ User Code │ │ +│ │ Code │ │ Integration │ │ │ │ +│ └─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘ │ +│ │ │ │ │ +│ └──────────────────────┼──────────────────────┘ │ +│ │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────┐ │ +│ │ IOTelWebSDK Interface │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │IOTelTraceProv│ │IOTelLogProv │ │IOTelMeterProv│ │IOTelCtxMgr │ │ │ +│ │ │ │ │ │ │ (Basic) │ │ │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────┘ │ +│ │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────┐ │ +│ │ Factory Function Layer │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │createOTel │ │createTrace │ │createLog │ │ │ +│ │ │WebSdk() │ │Provider() │ │Provider() │ │Provider() │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────┘ │ +│ │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────┐ 
│ +│ │ Implementation Layer (Hidden) │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │OTelWebSdk │ │TraceProvider│ │LogProvider │ │MeterProvider│ │ │ +│ │ │Impl │ │Impl │ │Impl │ │Impl │ │ │ +│ │ │(DynamicProto│ │(DynamicProto│ │(DynamicProto│ │(DynamicProto│ │ │ +│ │ │ + Closures) │ │ + Closures) │ │ + Closures) │ │ + Closures) │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────┘ │ +│ │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────┐ │ +│ │ Processing Pipeline │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │Span │ │Log Record │ │Metric │ │Context │ │ │ +│ │ │Processors │ │Processors │ │Readers │ │Propagators │ │ │ +│ │ │- Simple │ │- Simple │ │- Periodic │ │- W3C │ │ │ +│ │ │- Batch │ │- Batch │ │- Manual │ │- B3 │ │ │ +│ │ │- Sampling │ │- Filter │ │- Push │ │- Jaeger │ │ │ +│ │ │- Custom │ │- Enrichment │ │- Pull │ │- Baggage │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────┘ │ +│ │ │ +│ ┌─────────────────────────────────▼─────────────────────────────────┐ │ +│ │ Export Layer │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │Span │ │Log │ │Metric │ │Custom │ │ │ +│ │ │Exporters │ │Exporters │ │Exporters │ │Exporters │ │ │ +│ │ │- Azure │ │- Azure │ │- Azure │ │- Pluggable │ │ │ +│ │ │ Monitor │ │ Monitor │ │ Monitor │ │- Community │ │ │ +│ │ │- OTLP │ │- OTLP │ │- Prometheus │ │- Enterprise │ │ │ +│ │ │- Console │ │- Console │ │- OTLP │ │- Debug │ │ │ +│ │ │- Jaeger │ │- Splunk │ │- Custom │ │- Test │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └─────────────────────────────────┬─────────────────────────────────┘ │ +│ │ │ +└────────────────────────────────────┼─────────────────────────────────────────┘ + 
│ +┌────────────────────────────────────▼─────────────────────────────────────────┐ +│ Backend Systems │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │Azure Monitor│ │OTLP │ │Prometheus │ │Custom │ │ +│ │- App │ │- Collector │ │- Server │ │- Proprietary│ │ +│ │ Insights │ │- Gateway │ │- Agent │ │- Legacy │ │ +│ │- Log │ │- Vendor │ │- Pushgateway│ │- Future │ │ +│ │ Analytics │ │ Distros │ │- Federation │ │ Systems │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` + +## Implementation Patterns: Closures OR DynamicProto Classes + +The SDK uses two distinct implementation patterns depending on the specific requirements: + +### Pattern 1: Closure-Based Interface Implementation + +For most interface implementations, the SDK uses closures with direct object property assignment: + +### Benefits of Closure Pattern + +1. **Superior Bundle Size Optimization**: Closure variables can be aggressively minified by modern bundlers +2. **True Private State**: Internal variables remain completely inaccessible from outside the closure +3. **Enhanced Tree-Shaking**: More effective dead code elimination for minimal bundle sizes +4. **Browser Compatibility**: Works consistently across all target browsers including legacy environments +5. **Efficient Implementation**: Direct property assignment with optimized execution patterns +6. 
**Runtime Optimization**: V8 and other engines can better optimize closure-based code + +### Implementation Pattern + +```typescript +export function createTraceProvider(config: ITraceProviderConfig): IOTelTraceProvider { + // Private closure variables - completely encapsulated + let _config = { ...defaultConfig, ...config }; + let _tracers = new Map<string, IOTelTracer>(); + let _processors: IOTelSpanProcessor[] = []; + let _isShutdown = false; + let _logger = createLogger("TraceProvider"); + + // Validation and setup + validateConfiguration(_config); + + // Create the interface instance + let _self = {} as IOTelTraceProvider; + + // Define methods directly on the interface instance + // Public method implementations with closure access + _self.getTracer = (name: string, version?: string, options?: IOTelTracerOptions): IOTelTracer => { + if (_isShutdown) { + _logger.warn("TraceProvider is shutdown, returning no-op tracer"); + return createNoOpTracer(); + } + + const key = `${name}@${version || 'unknown'}`; + let tracer = _tracers.get(key); + + if (!tracer) { + tracer = createTracer({ + name, + version, + traceProvider: _self, + resource: _config.resource, + ...options + }); + _tracers.set(key, tracer); + } + + return tracer; + }; + + _self.addSpanProcessor = (processor: ISpanProcessor): void => { + if (_isShutdown) { + throw new Error("Cannot add processor to shutdown TraceProvider"); + } + + _processors.push(processor); + // Notify existing tracers of new processor + _tracers.forEach(tracer => tracer._addProcessor(processor)); + }; + + _self.shutdown = async (): Promise<void> => { + if (_isShutdown) return; + + _isShutdown = true; + + // Shutdown all processors in parallel + await Promise.all(_processors.map(processor => + processor.shutdown().catch(err => + _logger.error("Error shutting down processor", err) + ) + )); + + // Clear references for garbage collection + _tracers.clear(); + _processors.length = 0; + }; + + return _self; +} +``` + +### Pattern 2: DynamicProto-JS Classes (When 
Inheritance is Needed) + +When class inheritance or complex prototype chains are required, use DynamicProto-JS: + +```typescript +import dynamicProto from "@microsoft/dynamicproto-js"; + +export function createComplexProcessor(config: IProcessorConfig): IOTelSpanProcessor { + // Private closure variables + let _config = { ...config }; + let _isShutdown = false; + + // Create class instance + let _self = {} as IOTelSpanProcessor; + + // Use DynamicProto for complex inheritance scenarios + dynamicProto(BaseProcessor, _self, (_self) => { + + _self.onStart = (span: IOTelSpan): void => { + // Implementation with access to closure variables + if (!_isShutdown) { + // Process span using _config + } + }; + + _self.onEnd = (span: IOTelSpan): void => { + // Implementation + }; + + }); + + return _self; +} + +/** + * @DynamicProtoStub + */ +function BaseProcessor() { + // Stub for DynamicProto +} +``` + +### Performance Characteristics + +- **Initialization**: ~2-5ms for full SDK setup including all providers +- **Span Creation**: <0.1ms per span with minimal memory allocation +- **Context Propagation**: <0.05ms per operation with optimized header handling +- **Bundle Size**: 15-25KB gzipped for full SDK (vs 40-60KB for class-based implementations) +- **Memory Usage**: DynamicProto provides 50-70% reduction through shared prototypes; closures optimize per-instance overhead + +## Interface Design and Naming Conventions + +The SDK follows strict architectural patterns for interface design to ensure consistency, maintainability, and future-proofing: + +### Interface Hierarchy and Organization + +```typescript +// Core SDK Interfaces +export interface IOTelWebSdk extends IOTelSdk { + readonly traceProvider: IOTelTraceProvider; + readonly loggerProvider: IOTelLoggerProvider; + readonly meterProvider: IOTelMeterProvider; + readonly contextManager: IOTelContextManager; + + // Lifecycle Management + unload(onDone?: (result: IUnloadResult) => void, timeoutMs?: number): Promise<IUnloadResult>; +} + +// Provider Interfaces +export interface IOTelTraceProvider extends IProvider { + getTracer(name: string, version?: string, options?: IOTelTracerOptions): IOTelTracer; + addSpanProcessor(processor: ISpanProcessor): void; + getActiveSpanProcessors(): ISpanProcessor[]; +} + +export interface IOTelLoggerProvider extends IProvider { + getLogger(name: string, version?: string, options?: IOTelLoggerOptions): IOTelLogger; + addLogRecordProcessor(processor: ILogRecordProcessor): void; + getActiveLogRecordProcessors(): ILogRecordProcessor[]; +} + +// Basic Metrics Provider - focused on simple metric generation only +export 
interface IOTelMeterProvider extends IProvider { + getMeter(name: string, version?: string, options?: IOTelMeterOptions): IOTelMeter; + addMetricReader(reader: IMetricReader): void; + getActiveMetricReaders(): IMetricReader[]; + // Note: Advanced features like metric views, complex aggregations, and + // sophisticated metric processing are intentionally excluded for lightweight implementation +} +``` + +### Naming Convention Standards + +- **Public Interfaces**: `I` prefix (e.g., `IUnloadResult`, `ISdkLoader`) +- **OpenTelemetry Compatible Interfaces**: `IOTel` prefix (e.g., `IOTelTraceProvider`, `IOTelSpan`, `IOTelLogger`) +- **Internal Interfaces**: `_I` prefix with `@internal` TypeDoc tags (e.g., `_ISpanProcessor`, `_IExporter`) +- **Configuration Interfaces**: Descriptive names with `Config` suffix (e.g., `ITraceProviderConfig`, `ISpanOptions`) +- **Factory Functions**: `create*` pattern (e.g., `createOTelWebSdk`, `createBatchSpanProcessor`) +- **Const Enums**: lowercase `e` prefix for internal use (e.g., `eSpanKind`, `eLogSeverity`) +- **Public Enum Objects**: corresponding names without `e` prefix using `createEnumStyle` pattern + +### Interface Evolution Strategy + +1. **Semantic Versioning**: Interface changes follow strict semantic versioning +2. **Deprecation Path**: Clear deprecation warnings with migration guides +3. **Extension Points**: Interfaces designed with future extension capabilities +4. **Backward Compatibility**: New methods added as optional with sensible defaults + +## Module Structure and Organization + +The SDK follows a modular architecture that promotes maintainability, testability, and selective loading: + +```bash +otel-web-sdk/ +├── src/ +│ ├── core/ +│ │ ├── interfaces/ +│ │ │ └── ... +│ │ ├── impl/ +│ │ │ └── ... +│ │ ├── factories/ +│ │ │ └── ... +│ │ └── index.ts +│ ├── trace/ +│ │ ├── interfaces/ +│ │ │ └── ... +│ │ ├── impl/ +│ │ │ ├── processors/ +│ │ │ ├── exporters/ +│ │ │ ├── samplers/ +│ │ │ └── ... 
+│ │ ├── factories/ +│ │ │ └── ... +│ │ └── index.ts +│ ├── logs/ +│ │ ├── interfaces/ +│ │ │ └── ... +│ │ ├── impl/ +│ │ │ ├── processors/ +│ │ │ ├── exporters/ +│ │ │ └── ... +│ │ ├── factories/ +│ │ │ └── ... +│ │ └── index.ts +│ ├── context/ +│ │ ├── interfaces/ +│ │ │ └── ... +│ │ ├── impl/ +│ │ │ ├── propagators/ +│ │ │ ├── storage/ +│ │ │ └── ... +│ │ ├── factories/ +│ │ │ └── ... +│ │ └── index.ts +│ ├── instrumentation/ +│ │ ├── auto/ +│ │ │ └── ... +│ │ ├── manual/ +│ │ │ └── ... +│ │ └── index.ts +│ ├── platform/ +│ │ ├── browser/ +│ │ ├── webworker/ +│ │ ├── node/ +│ │ └── ... +│ ├── utils/ +│ │ ├── performance/ +│ │ ├── validation/ +│ │ ├── serialization/ +│ │ ├── testing/ +│ │ └── ... +│ └── index.ts +├── tests/ +│ ├── unit/ +│ ├── integration/ +│ ├── performance/ +│ ├── browser/ +│ └── ... +├── examples/ +│ ├── basic/ +│ ├── react/ +│ ├── angular/ +│ ├── vue/ +│ └── ... +├── docs/ +│ ├── api/ +│ ├── guides/ +│ ├── migration/ +│ └── ... +├── tools/ +│ ├── build/ +│ ├── test/ +│ ├── size-analysis/ +│ └── ... +├── package.json +├── tsconfig.json +├── rollup.config.js +└── README.md +``` + +### Module Loading Strategy + +1. **Entry Point**: Single main entry point with selective imports +2. **Tree Shaking**: Full support for dead code elimination +3. **Lazy Loading**: Components loaded only when needed +4. **Code Splitting**: Automatic splitting for large applications +5. 
**Platform Detection**: Automatic platform-specific loading + +## Configuration Management Architecture + +The SDK implements a sophisticated configuration system designed for flexibility, validation, and runtime adaptability: + +### Configuration Hierarchy + +```typescript +interface IOTelWebSdkConfig { + // Connection and endpoint configuration + connectionString?: string; + instrumentationKey?: string; // Legacy support + endpoint?: string; + + // Resource identification + resource?: IResourceConfig; + + // Provider-specific configuration + tracing?: ITraceProviderConfig; + logging?: ILoggerProviderConfig; + + // Global settings + disabled?: boolean; + debug?: boolean; + environment?: string; + + // Performance and limits + performance?: IPerformanceConfig; + limits?: ILimitsConfig; + + // Extension points + plugins?: IPluginConfig[]; + instrumentations?: IInstrumentationConfig[]; + + // Runtime behavior + shutdownTimeout?: number; + flushTimeout?: number; + enableAutoInstrumentation?: boolean; +} +``` + +### Configuration Principles + +1. **Hierarchical Override**: Configuration flows from global → provider → component levels +2. **Runtime Validation**: All configuration validated at creation time with detailed error messages +3. **Hot Reloading**: Support for runtime configuration updates where safe +4. **Environment Detection**: Automatic detection of browser, framework, and deployment environment +5. **Secure Defaults**: All defaults chosen for security and performance +6. **Schema Validation**: JSON Schema validation for configuration objects + +### Configuration Sources + +```typescript +// Priority order (highest to lowest): +// 1. Explicit configuration passed to factory functions +// 2. Environment variables (when available) +// 3. Configuration providers (remote config, etc.) +// 4. 
Default configuration + +const configManager = createConfigManager({ + sources: [ + new ExplicitConfigSource(userConfig), + new EnvironmentConfigSource(), + new RemoteConfigSource(configEndpoint), + new DefaultConfigSource() + ], + validation: { + strict: true, + throwOnError: true, + logValidationErrors: true + } +}); +``` + +## Error Handling and Resilience Architecture + +The SDK implements comprehensive error handling to ensure reliability in production environments: + +### Error Categories and Handling + +1. **Configuration Errors**: Validation failures, invalid settings + - Throw immediately during initialization + - Provide detailed error messages with correction suggestions + - Include configuration path and expected values + +2. **Network Errors**: Export failures, connectivity issues + - Implement exponential backoff with jitter + - Circuit breaker pattern for persistent failures + - Fallback to local storage when possible + +3. **Runtime Errors**: Unexpected exceptions during telemetry collection + - Graceful degradation to no-op behavior + - Error reporting to configured error handlers + - Continue operation without affecting application + +4. 
**Resource Errors**: Memory pressure, quota exceeded + - Automatic throttling and backpressure + - Resource cleanup and garbage collection + - Performance monitoring and alerting + +### Error Handling Pattern + +```typescript +interface IErrorHandler { + handleError(error: Error, context: IErrorContext): void; +} + +interface IErrorContext { + component: string; + operation: string; + telemetryType: 'trace' | 'log' | 'metric'; + recoverable: boolean; + metadata?: Record<string, unknown>; +} + +// Example implementation +class SDKErrorHandler implements IErrorHandler { + private _errorCallbacks: ((error: Error, context: IErrorContext) => void)[] = []; + private _circuitBreakers = new Map(); + // Assumption: the logger comes from the same createLogger() helper the providers use + private _logger = createLogger("SDKErrorHandler"); + + handleError(error: Error, context: IErrorContext): void { + // Log error with context + this._logger.error(`Error in ${context.component}.${context.operation}`, { + error: error.message, + stack: error.stack, + context + }); + + // Update circuit breaker state + const breaker = this._circuitBreakers.get(context.component); + if (breaker && !context.recoverable) { + breaker.recordFailure(); + } + + // Notify error callbacks + this._errorCallbacks.forEach(callback => { + try { + callback(error, context); + } catch (callbackError) { + // Prevent callback errors from affecting SDK operation + this._logger.warn("Error in error callback", callbackError); + } + }); + } +} +``` + +## Future-Proofing Strategies + +The architecture is designed to adapt to evolving requirements and emerging standards: + +### OpenTelemetry Specification Evolution + +1. **Specification Tracking**: Automated monitoring of OpenTelemetry specification changes +2. **Feature Flags**: New features hidden behind feature flags for gradual rollout +3. **Version Compatibility**: Support for multiple OpenTelemetry specification versions +4. **Migration Tools**: Automated migration assistance for breaking changes +5. **Preview Features**: Early access to upcoming specification features + +### Technology Evolution Preparedness + +1. 
**Web Standards**: Ready for new browser APIs and web standards +2. **Framework Integration**: Pluggable architecture for new JavaScript frameworks +3. **Platform Expansion**: Design supports expansion to new platforms (IoT, mobile) +4. **Transport Protocols**: Extensible transport layer for new protocols +5. **Data Formats**: Support for emerging telemetry data formats + +### Extensibility Architecture + +```typescript +// Plugin architecture for future extensibility +interface ISDKPlugin { + readonly name: string; + readonly version: string; + readonly dependencies?: string[]; + + initialize(sdk: IOTelWebSdk): Promise<void>; + shutdown(): Promise<void>; + + // Optional extension points + onBeforeSpanStart?(span: IOTelSpan): IOTelSpan; + onAfterSpanEnd?(span: IOTelSpan): void; + onBeforeLogEmit?(logRecord: ILogRecord): ILogRecord; +} + +// Extension point registration +interface IExtensionRegistry { + registerPlugin(plugin: ISDKPlugin): void; + registerInstrumentation(instrumentation: IInstrumentation): void; + registerExporter(type: TelemetryType, exporter: IExporter): void; + registerProcessor(type: TelemetryType, processor: IProcessor): void; + registerPropagator(propagator: IPropagator): void; +} +``` + +### Backward Compatibility Strategy + +1. **Interface Versioning**: Semantic versioning for all public interfaces +2. **Deprecation Timeline**: 12-month deprecation period for breaking changes +3. **Migration Guides**: Comprehensive guides for version upgrades +4. **Compatibility Layers**: Adapters for legacy API compatibility +5. **Feature Detection**: Runtime feature detection for progressive enhancement + +### Performance Future-Proofing + +1. **WebAssembly Ready**: Architecture supports WebAssembly modules for performance-critical operations +2. **Web Workers**: Full support for Web Worker environments +3. **Streaming**: Prepared for streaming telemetry APIs +4. **Compression**: Pluggable compression algorithms for data export +5. 
**Caching**: Intelligent caching strategies for offline scenarios + +### Security and Privacy Evolution + +1. **Privacy Regulations**: Built-in support for emerging privacy regulations +2. **Data Minimization**: Configurable data collection minimization +3. **Encryption**: Support for client-side encryption of sensitive telemetry +4. **Audit Trails**: Comprehensive audit logging for compliance +5. **Data Governance**: Fine-grained control over data collection and export + +## Processing Pipeline Architecture + +Detailed view of how telemetry flows through the processing pipeline: + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ Processing Pipeline Architecture │ +├─────────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Telemetry Generation Layer │ +│ ┌─────────────────────┐ ┌─────────────────────┐ │ +│ │ Traces │ │ Logs │ │ +│ │ │ │ │ │ +│ │ tracer.startSpan() │ │ logger.info() │ │ +│ │ span.setAttribute() │ │ logger.error() │ │ +│ │ span.addEvent() │ │ logger.debug() │ │ gauge.set() │ │ +│ │ span.end() │ │ │ │ │ │ +│ └──────────┬──────────┘ └──────────┬──────────┘ └──────────┬──────────┘ │ +│ │ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────────┐ ┌─────────────────────┐ │ +│ │ Span Creation │ │ 
LogRecord Creation │ │ +│ │ │ │ │ │ +│ │ • Validate data │ │ • Validate data │ │ +│ │ • Apply resource │ │ • Apply resource │ │ +│ │ • Set timestamps │ │ • Set timestamps │ │ +│ │ • Generate IDs │ │ • Set severity │ │ +│ └──────────┬──────────┘ └──────────┬──────────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ Processor Layer │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ Span Processors │ │ Log Processors │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ SimpleProcessor │ │ SimpleProcessor │ │ │ +│ │ │ ├─onStart() │ │ ├─onEmit() │ │ │ +│ │ │ └─onEnd() │ │ └─forceFlush() │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ BatchProcessor │ │ BatchProcessor │ │ │ +│ │ │ ├─buffer spans │ │ ├─buffer logs │ │ │ +│ │ │ ├─batch export │ │ ├─batch export │ │ │ +│ │ │ └─timer flush │ │ └─timer flush │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ SamplingProc. │ │ FilterProcessor │ │ │ +│ │ │ ├─sample decision│ │ ├─level filter │ │ │ +│ │ │ └─drop/keep │ │ └─attribute filt│ │ │ +│ │ │ │ │ │ │ │ +│ │ │ CustomProcessor │ │ EnrichProcessor │ │ │ +│ │ │ ├─transform │ │ ├─add context │ │ │ +│ │ │ └─custom logic │ │ └─add metadata │ │ │ +│ │ └─────────────────┘ └─────────────────┘ │ │ +│ └──────────────────────────────┬──────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ Export Coordination │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ Span Export │ │ Log Export │ │ │ +│ │ │ Coordination │ │ Coordination │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ • Batch sizes │ │ • Batch sizes │ │ │ +│ │ │ • Export timing │ │ • Export timing │ │ │ +│ │ │ • Retry logic │ │ • Retry logic │ │ │ +│ │ │ • Error handling│ │ • Error handling│ │ │ +│ │ │ • Backpressure │ │ • Backpressure │ │ │ +│ │ └─────────────────┘ └─────────────────┘ │ │ +│ 
└──────────────────────────────┬──────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ Exporter Layer │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ │ │ +│ │ │ Span Exporters │ │ Log Exporters │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ AzureMonitor │ │ AzureMonitor │ │ │ +│ │ │ ├─format spans │ │ ├─format logs │ │ │ +│ │ │ ├─compress │ │ ├─compress │ │ │ +│ │ │ └─HTTP POST │ │ └─HTTP POST │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ OTLP Exporter │ │ OTLP Exporter │ │ │ +│ │ │ ├─protobuf │ │ ├─protobuf │ │ │ +│ │ │ ├─gRPC/HTTP │ │ ├─gRPC/HTTP │ │ │ +│ │ │ └─standards │ │ └─standards │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ Console Export │ │ Console Export │ │ │ +│ │ │ ├─debug output │ │ ├─debug output │ │ │ +│ │ │ └─development │ │ └─development │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ Custom Export │ │ Custom Export │ │ │ +│ │ │ ├─extensible │ │ ├─extensible │ │ │ +│ │ │ └─pluggable │ │ └─pluggable │ │ │ +│ │ └─────────────────┘ └─────────────────┘ │ │ +│ └──────────────────────────────┬──────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────────────────────────────────────────────────────────────────┐ │ +│ │ Backend Systems │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │ │ +│ │ │ Azure Monitor │ │ OTLP Collectors │ │ Custom Backends │ │ │ +│ │ │ - Application │ │ - OpenTelemetry │ │ - Prometheus │ │ │ +│ │ │ Insights │ │ Collector │ │ - Jaeger │ │ │ +│ │ │ - Log Analytics │ │ - Vendor │ │ - Zipkin │ │ │ +│ │ │ - Metrics │ │ Distributions │ │ - Splunk │ │ │ +│ │ │ Explorer │ │ - Cloud Vendor │ │ - Datadog │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +## Context Propagation Architecture + +Shows how context flows through the application and 
across service boundaries: + +``` +┌─────────────────────────────────────────────────────────────────────────────────┐ +│ Context Propagation Architecture │ +├─────────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ Application Execution Flow │ +│ ┌─────────────────────────────────────────────────────────────────────────────┐ │ +│ │ Active Context │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Root Span │ │ Child Span │ │ Log Context │ │ Baggage │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ TraceID: │ │ TraceID: │ │ TraceID: │ │ user.id: │ │ │ +│ │ │ abc123... │ │ abc123... │ │ abc123... │ │ user123 │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ SpanID: │ │ SpanID: │ │ SpanID: │ │ session.id: │ │ │ +│ │ │ def456... │ │ ghi789... │ │ ghi789... │ │ sess456 │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ ParentID: │ │ ParentID: │ │ TraceFlags: │ │ team.name: │ │ │ +│ │ │ (none) │ │ def456... │ │ sampled │ │ frontend │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ └──────────────────────────┬────────────────────────────────────────────────┘ │ +│ │ │ +│ Local Context Management │ │ +│ ┌──────────────────────────▼────────────────────────────────────────────────┐ │ +│ │ Context Manager │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │ │ +│ │ │ Context Storage │ │ Context Stack │ │ Context Operations │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ • Active │ │ • Push Context │ │ • context.with(ctx, fn) │ │ │ +│ │ │ Context │ │ • Pop Context │ │ • context.active() │ │ │ +│ │ │ • Async │ │ • Context Chain │ │ • context.setSpan() │ │ │ +│ │ │ Locals │ │ • Parent Links │ │ • context.getSpan() │ │ │ +│ │ │ • Thread │ │ • Nested Scopes │ │ • context.setBaggage() │ │ │ +│ │ │ Storage │ │ │ │ • context.getBaggage() │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │ │ +│ └──────────────────────────┬────────────────────────────────────────────────┘ │ +│ │ │ +│ 
Cross-Service Propagation │ │ +│ ┌──────────────────────────▼────────────────────────────────────────────────┐ │ +│ │ Propagators │ │ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │ │ +│ │ │ W3C TraceContext│ │ W3C Baggage │ │ Custom Propagators │ │ │ +│ │ │ │ │ │ │ │ │ │ +│ │ │ Header: │ │ Header: │ │ • B3 (Zipkin) │ │ │ +│ │ │ traceparent │ │ baggage │ │ • Jaeger │ │ │ +│ │ │ │ │ │ │ • X-Ray │ │ │ +│ │ │ Format: │ │ Format: │ │ • OpenTracing │ │ │ +│ │ │ 00-{traceId}- │ │ key1=value1, │ │ • Custom headers │ │ │ +│ │ │ {spanId}-{flag} │ │ key2=value2 │ │ │ │ │ +│ │ │ │ │ │ │ inject(context, carrier)│ │ │ +│ │ │ Extract/Inject: │ │ Extract/Inject: │ │ extract(carrier) -> ctx │ │ │ +│ │ │ HTTP headers │ │ HTTP headers │ │ │ │ │ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │ │ +│ └──────────────────────────┬────────────────────────────────────────────────┘ │ +│ │ │ +│ Outgoing Request Flow │ │ +│ ┌──────────────────────────▼────────────────────────────────────────────────┐ │ +│ │ HTTP Request │ │ +│ │ │ │ +│ │ Application Code HTTP Headers │ │ +│ │ ┌─────────────────┐ ┌─────────────────────────────┐ │ │ +│ │ │ fetch('/api') │────────────────▶│ traceparent: │ │ │ +│ │ │ │ │ 00-abc123...-def456...-01 │ │ │ +│ │ │ Current Context:│ │ │ │ │ +│ │ │ • TraceID │ │ baggage: │ │ │ +│ │ │ • SpanID │ │ user.id=user123, │ │ │ +│ │ │ • Baggage │ │ session.id=sess456, │ │ │ +│ │ │ │ │ team.name=frontend │ │ │ +│ │ └─────────────────┘ └─────────────────────────────┘ │ │ +│ └──────────────────────────┬────────────────────────────────────────────────┘ │ +│ │ │ +│ Incoming Request Flow │ │ +│ ┌──────────────────────────▼────────────────────────────────────────────────┐ │ +│ │ HTTP Request (server side) │ │ +│ │ │ │ +│ │ Server Code HTTP Headers │ │ +│ │ ┌─────────────────────────────┐ ┌─────────────────────────────┐ │ │ +│ │ │ app.get('/api', (req) => { │◀────│ traceparent: │ │ │ +│ │ │ │ │ 00-abc123...-def456...-01 │ │ │ +│ │ │ const context = │ 
│ │ │ │ +│ │ │ propagation.extract(req)│ │ baggage: │ │ │ +│ │ │ │ │ user.id=user123, │ │ │ +│ │ │ const span = tracer │ │ session.id=sess456, │ │ │ +│ │ │ .startSpan('handler', │ │ team.name=frontend │ │ │ +│ │ │ { parent: context }) │ │ │ │ │ +│ │ │ │ │ │ │ │ +│ │ │ // Extracted Context: │ │ │ │ │ +│ │ │ // • Parent TraceID │ │ │ │ │ +│ │ │ // • Parent SpanID │ │ │ │ │ +│ │ │ // • Inherited Baggage │ │ │ │ │ +│ │ │ }) │ │ │ │ │ +│ │ └─────────────────────────────┘ └─────────────────────────────┘ │ │ +│ └──────────────────────────┬────────────────────────────────────────────────┘ │ +│ │ │ +│ Async Operation Flow │ │ +│ ┌──────────────────────────▼────────────────────────────────────────────────┐ │ +│ │ Async Context Continuity │ │ +│ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Promise Chain │ │ │ +│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ │ │ +│ │ │ │ Initial │ │ Promise │ │ Final │ │ │ │ +│ │ │ │ Context │───▶│ Context │───▶│ Context │ │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ │ TraceID: │ │ TraceID: │ │ TraceID: │ │ │ │ +│ │ │ │ abc123... │ │ abc123... │ │ abc123... 
│ │ │ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ │ │ SpanID: │ │ SpanID: │ │ SpanID: │ │ │ │ +│ │ │ │ original │ │ promise │ │ final │ │ │ │ +│ │ │ └─────────────┘ └─────────────┘ └─────────────────────────┘ │ │ │ +│ │ └───────────────────────────────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ ┌───────────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Event Loop / Callback Chain │ │ │ +│ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ │ │ +│ │ │ │ Callback 1 │ │ Callback 2 │ │ Callback N │ │ │ │ +│ │ │ │ Context │───▶│ Context │───▶│ Context │ │ │ │ +│ │ │ │ Preserved │ │ Preserved │ │ Preserved │ │ │ │ +│ │ │ └─────────────┘ └─────────────┘ └─────────────────────────┘ │ │ │ +│ │ └───────────────────────────────────────────────────────────────────────┘ │ │ +│ └────────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Benefits: │ +│ • End-to-end traceability across services │ +│ • Automatic context inheritance in async operations │ +│ • Standards-compliant propagation (W3C) │ +│ • Baggage for cross-cutting concerns │ +│ • Multiple propagator support for interoperability │ +│ │ +└─────────────────────────────────────────────────────────────────────────────────┘ +``` + +## Implementation Phases + +The SDK implementation follows a structured 6-phase approach designed to deliver incremental value while building upon architectural foundations: + +### Overview of Implementation Phases + +1. **Phase 1: Core Architecture Foundation** - Interface design, factory functions, and development infrastructure +2. **Phase 2: Core Implementation** - Unified SDK implementation with IoC patterns and context management +3. **Phase 3: Tracer Provider Implementation** - Complete distributed tracing capabilities +4. **Phase 4: Logger Provider Implementation** - Structured logging with trace correlation +5. 
**Phase 5: Basic Metric Provider Implementation** - Simple metrics collection (counters, histograms, gauges) +6. **Phase 6: Exporters and Integrations** - Production-ready exporters and framework integrations + +### Detailed Implementation Specifications + +For comprehensive implementation planning including detailed technical tasks, milestones, and timeline, see: + +**[OTelWebSdk-Implementation.md - Implementation Phases](./OTelWebSdk-Implementation.md#implementation-phases)** + +The Implementation document provides: +- Detailed technical tasks and objectives for each phase +- Cross-references to architectural patterns and requirements defined in this document +- Specific interface requirements and implementation patterns +- Timeline and milestone planning +- Resource coordination and dependency management + +## Next Steps and Implementation Roadmap + +### Phase 1: Core Architecture Implementation +- Interface definitions and factory function framework +- Closure + DynamicProto pattern implementation +- **Inversion of Control implementation with dependency injection** +- Basic configuration and error handling +- Core SDK lifecycle management + +### Phase 2: Telemetry Providers +- Trace provider with span management +- Log provider with structured logging +- Metric provider with instrument types +- Context propagation implementation + +### Phase 3: Processing and Export +- Processor pipeline implementation +- Azure Monitor exporters +- OTLP exporters +- Performance optimization + +### Phase 4: Advanced Features +- Auto-instrumentation implementation +- Framework-specific integrations +- Advanced sampling and filtering +- Monitoring and diagnostics + +### Phase 5: Future-Proofing Features +- Plugin architecture implementation +- Extension registry system +- Migration tools and compatibility layers +- Performance monitoring and alerting + +## Quality Assurance and Operations + +### Testing Strategy +Comprehensive testing approach covering: +- **Unit Testing**: 
Component-level validation with DynamicProto patterns +- **Integration Testing**: Cross-component interaction verification +- **Performance Testing**: Benchmarking and regression detection +- **Browser Compatibility**: Cross-browser validation and fallback testing + +For detailed testing strategies and implementation examples, see [Testing Strategy](./OTelWebSdk-Testing.md). + +### Performance Optimization +Advanced performance techniques including: +- **Lazy Initialization**: Minimize startup impact +- **Object Pooling**: Efficient memory management +- **Batch Processing**: Optimized data export +- **Performance Monitoring**: Real-time performance tracking + +For comprehensive performance strategies and optimization techniques, see [Performance Strategy](./OTelWebSdk-Performance.md). + +### Migration Support +Support for migrating from existing telemetry solutions: +- **Application Insights Migration**: API mapping and compatibility layers +- **OpenTelemetry Community Migration**: Configuration and API conversion +- **Custom Telemetry Migration**: Generic migration framework +- **Data Continuity**: Validation and dual-tracking strategies + +For detailed migration guides and tools, see [Migration Guide](./OTelWebSdk-Migration.md). + +## Unified Architecture: Web-Specific Requirements + Multi-Instance SDK Factory + +The OTelWebSdk combines unique web application requirements with a multi-instance SDK factory to deliver enterprise-grade observability. This unified approach addresses both the technical constraints of browser environments and the organizational needs of multi-team development. + +### **Web-Specific Requirements Driving Multi-Instance SDK Factory Design** + +The multi-instance SDK factory directly addresses the unique challenges of web applications: + +#### **1. 
Browser Environment Optimizations** +- **Bundle Size Management**: Factory coordinates selective loading to minimize impact + - Shared module loading across teams reduces redundant code + - Tree-shaking optimization through centralized dependency management + - Progressive loading of instrumentation based on actual team needs + +- **Memory Management**: Factory prevents leaks through coordinated lifecycle management + - Centralized cleanup coordination across all team instances + - Shared object pools and resource reuse to minimize memory footprint + - Automatic garbage collection triggering for long-running SPAs + +- **Network Efficiency**: Factory optimizes telemetry transmission + - Intelligent batching across team boundaries for fewer network calls + - Shared connection pooling and keep-alive management + - Coordinated retry logic and circuit breaker patterns + +```typescript +// Factory optimizes resource usage across teams +const factory = createSdkFactory(); + +// Team A gets access to shared instrumentations automatically +const teamA = factory.newInst('team-a', { + connectionString: 'InstrumentationKey=team-a-key', + samplingRate: 100 +}); + +// Team B also gets shared instrumentations + their own additions +const teamB = factory.newInst('team-b', { + connectionString: 'InstrumentationKey=team-b-key', + instrumentations: ['user-interaction'], // Team-specific instrumentations + samplingRate: 50 +}); + +// Factory coordinates: Single XHR/fetch/errors hooks shared across teams +// Each team gets isolated telemetry context while sharing infrastructure +``` + +#### **2. 
Rich Web Telemetry Through Factory Coordination** +- **Cross-Team Performance Correlation**: Factory enables holistic performance monitoring + - Unified Core Web Vitals tracking across team components + - Cross-team user journey tracking and session correlation + - Global performance budgets and alerting across team boundaries + +- **Shared Context Management**: Factory maintains consistent context across teams + - User session state shared across team components + - Device context and browser capabilities detected once, shared everywhere + - Consistent correlation IDs and distributed tracing across team boundaries + +```typescript +// Factory provides unified web telemetry context +const factory = createSdkFactory(); + +// Teams automatically get rich web context +const ecommerceSDK = factory.newInst('ecommerce', { + contextOverrides: { 'team.domain': 'shopping' }, + // Global web context shared across all teams + webVitalsTracking: true, + userSessionManagement: true, + globalPerformanceBudgets: { + 'core.web.vitals.lcp': 2500, + 'core.web.vitals.fid': 100 + } +}); + +// Automatic Core Web Vitals tracking with team context +ecommerceSDK.trackPageView({ + name: 'Product Listing', + // Automatically includes shared web vitals + team context +}); +``` + +### **Multi-Instance SDK Factory Goals** + +The factory serves as the **primary entry point** for all SDK creation and lifecycle management, directly addressing web application needs: + +#### **3. 
Centralized SDK Factory with Web Optimizations** +- **Smart Instance Creation**: Factory creates SDK instances with web-optimized configurations + - Automatic platform detection (browser, web worker, SSR) and adaptation + - Browser capability detection and graceful degradation + - Optimal configuration inheritance based on environment constraints + +- **Resource Efficiency**: Factory coordinates shared resources for minimal web impact + - Single timer pool shared across all team instances + - Shared network connections with optimal batching strategies + - Coordinated instrumentation hooks to prevent duplicate event listeners + +```typescript +// Factory automatically optimizes for web environment +const factory = createSdkFactory(); + +// Automatic platform detection and optimization +if (factory.platform === 'browser') { + // Browser-specific optimizations automatically applied + console.log('Web Vitals tracking enabled'); + console.log('User interaction monitoring active'); +} else if (factory.platform === 'webworker') { + // Web worker optimizations + console.log('Limited DOM instrumentation for worker context'); +} + +// All team instances inherit optimal web configuration +const teamSDK = factory.newInst('analytics', {}); +``` + +#### **4. 
Multi-Tenant Architecture for Enterprise Web Apps** +- **Shared Infrastructure**: Teams share web-optimized resources without conflicts + - Single DOM modification coordinator to prevent team interference + - Shared performance observers (Intersection, Performance, Mutation) + - Consolidated error handling for unhandled exceptions and promise rejections + +- **Configuration Inheritance**: Web-specific defaults with team customization + - Enterprise-wide web performance standards and compliance settings + - Team-specific sampling rates while maintaining global telemetry coherence + - Consistent privacy and GDPR compliance across all team components + +```typescript +// Enterprise-wide web standards with team flexibility +const factory = createSdkFactory(); + +// Team A - inherits enterprise standards + their own needs +const teamA = factory.newInst('frontend-team', { + instrumentations: ['user-interaction'], + // Enterprise web policy applied to all teams + webStandards: { + enableCoreWebVitals: true, + privacyCompliance: 'GDPR', + performanceBudgets: { lcp: 2500, fid: 100, cls: 0.1 }, + requiredInstrumentations: ['xhr', 'fetch', 'errors'] + }, + samplingRate: 100 // Higher sampling for critical team +}); + +// Team B - different needs, same enterprise compliance +const teamB = factory.newInst('widget-team', { + samplingRate: 10, // Lower sampling for high-volume widgets + excludeUrls: ['/internal/', '/admin/'] +}); +``` + +#### **5. 
Coordinated Web Telemetry Management** +- **Cross-Team Correlation**: Factory enables holistic web application monitoring + - Unified user journey tracking across team boundaries + - Cross-team performance impact analysis and attribution + - Global web application health monitoring and alerting + +- **Intelligent Export Coordination**: Factory optimizes telemetry transmission for web + - Coordinated batching to minimize network impact on user experience + - Adaptive sampling based on network conditions and battery status + - Progressive Web App support with offline telemetry queuing + +```typescript +// Factory enables cross-team web application insights +const factory = createSdkFactory(); + +// Configure global web telemetry coordination +// Teams automatically participate in coordinated telemetry +const checkoutTeam = factory.newInst('checkout', { + crossTeamCorrelation: { + enableUserJourneyTracking: true, + enablePerformanceAttribution: true, + enableGlobalErrorCorrelation: true + }, + exportOptimization: { + adaptiveToNetworkConditions: true, + respectBatteryStatus: true, + enableOfflineQueuing: true, + maxBatchSize: 50 // Optimized for web transmission + } +}); +const paymentTeam = factory.newInst('payment', {}); + +// Factory automatically correlates user journey across teams +// checkout -> payment flow tracked holistically +``` + +#### **6. 
Dynamic Configuration with Web-Specific Adaptations** +- **Runtime Configuration Updates**: Factory supports hot configuration changes for web apps + - Dynamic feature flag updates without application restart + - A/B testing configuration changes with immediate effect + - Performance budget adjustments based on real user monitoring data + +- **Progressive Enhancement**: Factory adapts configuration based on browser capabilities + - Feature detection and graceful fallback for unsupported ES2020+ runtimes + - Progressive enhancement for modern browser APIs + - Adaptive telemetry collection based on device capabilities + +```typescript +// Dynamic web configuration management +const factory = createSdkFactory(); + +// Factory adapts to browser capabilities automatically +factory.onCapabilityChange((capabilities) => { + if (capabilities.supportsPerformanceObserver) { + // Enable advanced performance monitoring + factory.updateConfig({ + enableAdvancedPerformanceTracking: true + }); + } + + if (capabilities.hasServiceWorker) { + // Enable offline telemetry capabilities + factory.updateConfig({ + enableOfflineTelemetry: true + }); + } +}); + +// Hot configuration updates for A/B testing +await factory.updateConfig({ + experimentalFeatures: { + 'new-user-interaction-tracking': true + }, + samplingRateOverrides: { + 'experiment-group-a': 50, + 'experiment-group-b': 100 + } +}); +``` + +#### **7. 
Complete Lifecycle Management with Web Cleanup** +- **Coordinated Unload**: Factory ensures complete cleanup across all team instances + - DOM cleanup coordination to prevent memory leaks + - Event listener removal across all team instrumentations + - Service worker cleanup and cache management + +- **Web-Specific Resource Management**: Factory handles browser-specific cleanup + - Page visibility API integration for resource optimization + - Beacon API usage for reliable telemetry transmission on unload + - Progressive Web App state management + +```typescript +// Comprehensive web lifecycle management +const factory = createSdkFactory(); + +// Page unload coordination +window.addEventListener('beforeunload', () => { + // Flush first: browsers do not wait for async work in unload handlers, + // so the beacon API send must be started synchronously + factory.flushPendingTelemetry(); + + // Factory then coordinates shutdown across all teams (best effort during unload) + void factory.unloadAllInstances(); +}); + +// Automatic cleanup on visibility change +document.addEventListener('visibilitychange', () => { + if (document.hidden) { + factory.pauseNonCriticalTelemetry(); + } else { + factory.resumeTelemetry(); + } +}); +``` + +### **Application Insights SDK Requirements Integration** + +The multi-instance SDK factory seamlessly integrates proven Application Insights requirements: + +#### **8. Enterprise Multi-Team Architecture** +- **Team Isolation**: Independent SDK instances with isolated telemetry contexts +- **Shared Resource Efficiency**: Common infrastructure shared without conflicts +- **Enterprise Governance**: Centralized policy enforcement with team flexibility + +#### **9. Web-Optimized Bundle Management** +- **Intelligent Loading**: Factory coordinates module loading to minimize bundle impact +- **Tree-Shaking Optimization**: Centralized dependency management for better elimination +- **Progressive Enhancement**: Feature detection and adaptive loading + +#### **10. 
Universal Runtime Support with Web Focus** +- **Platform Adaptation**: Factory detects and adapts to runtime environment +- **SSR Compatibility**: Seamless server-side rendering with client-side hydration +- **Web Worker Support**: Full functionality in worker contexts + +### **Benefits of Unified Architecture** + +This combined approach delivers unique value: + +1. **Web-Native Performance**: Factory coordination ensures minimal impact on web application performance +2. **Enterprise Scale**: Multi-team support without sacrificing individual team needs +3. **Comprehensive Observability**: Rich web telemetry with cross-team correlation capabilities +4. **Migration Friendly**: Easy adoption path from existing Application Insights implementations +5. **Universal Compatibility**: Supports all JavaScript environments while optimizing for web + +The multi-instance SDK factory transforms web-specific requirements from constraints into coordinated capabilities, enabling enterprise-scale observability that enhances rather than hinders web application performance. 
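The isolation and sharing described in this section can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration (`createSketchFactory`, `SketchInstance`, `emit`, `hookCount`); it is not the SDK's `createSdkFactory` API. It assumes one shared hook that is installed when the first instance is created and removed when the last one unloads, with each instance keeping a private buffer held in closure state rather than in globals.

```typescript
// Hypothetical sketch: these names and shapes are illustrative, not the SDK's API.
type TelemetryEvent = { source: string; detail: string };

interface SketchInstance {
  readonly name: string;
  readonly samplingRate: number;
  received(): readonly TelemetryEvent[];
  unload(): void;
}

interface SketchFactory {
  newInst(name: string, config?: { samplingRate?: number }): SketchInstance;
  emit(event: TelemetryEvent): void; // stands in for one shared XHR/fetch/error hook
  hookCount(): number;
  unloadAllInstances(): void;
}

function createSketchFactory(): SketchFactory {
  // Closure state: shared by the instances of this factory, never global.
  const buffers = new Map<string, TelemetryEvent[]>();
  let sharedHooks = 0;

  return {
    newInst(name, config = {}) {
      if (buffers.has(name)) {
        throw new Error(`instance "${name}" already exists`);
      }
      if (sharedHooks === 0) {
        sharedHooks = 1; // install the shared hook on first use only
      }
      const buffer: TelemetryEvent[] = [];
      buffers.set(name, buffer);
      return {
        name,
        samplingRate: config.samplingRate === undefined ? 100 : config.samplingRate,
        received: () => buffer.slice(),
        unload: () => {
          buffers.delete(name);
          if (buffers.size === 0) {
            sharedHooks = 0; // last instance gone: remove the shared hook
          }
        }
      };
    },
    emit(event) {
      // Fan one observed event out to every registered instance's private buffer.
      buffers.forEach((buffer) => buffer.push({ ...event }));
    },
    hookCount: () => sharedHooks,
    unloadAllInstances() {
      buffers.clear();
      sharedHooks = 0;
    }
  };
}

// Usage: two teams share one hook but keep separate buffers.
const sketchFactory = createSketchFactory();
const sketchTeamA = sketchFactory.newInst('team-a', { samplingRate: 100 });
const sketchTeamB = sketchFactory.newInst('team-b', { samplingRate: 50 });
sketchFactory.emit({ source: 'fetch', detail: '/api/cart' });
sketchTeamA.unload(); // team A stops receiving events; team B is unaffected
sketchFactory.emit({ source: 'error', detail: 'TypeError' });
```

In this sketch, unloading one team leaves the shared hook in place for the others, and a second `createSketchFactory()` call would get entirely separate state, which is one way to read the "no global state or singletons" goal stated earlier in the document.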
diff --git a/docs/planning/otel/specs/OTelWebSdk-Implementation.md b/docs/planning/otel/specs/OTelWebSdk-Implementation.md new file mode 100644 index 000000000..15f4a2ab8 --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Implementation.md @@ -0,0 +1,282 @@ + +## Implementation Roadmap + +### Phase 1: Core Architecture Foundation +- **Multi-Tenant SDK Factory**: Implement centralized SDK factory with IoC patterns +- **Interface Definitions**: Define comprehensive TypeScript interfaces for all components +- **Closure OR DynamicProto**: Implement bundle-size-optimized patterns (closure for simple implementations, DynamicProto for complex inheritance) +- **Dependency Injection**: Build IoC container with explicit dependency management +- **Basic Configuration**: Implement reactive configuration management system + +### Phase 2: Core Telemetry Providers +- **Trace Provider**: Implement distributed tracing with span management +- **Logger Provider**: Build structured logging with Application Insights extensions +- **Meter Provider**: Create metrics collection with web-specific instruments +- **Context Management**: Implement W3C trace context and baggage propagation + +### Phase 3: Web-Specific Features +- **Auto-Instrumentation**: Implement XHR, fetch, and user interaction tracking +- **Core Web Vitals**: Add performance monitoring with LCP, FID, CLS tracking +- **Application Insights Extensions**: Build AI-specific telemetry APIs +- **Error Handling**: Implement comprehensive error tracking and reporting + +### Phase 4: Enterprise Features +- **Multi-Team Support**: Complete multi-tenant isolation +- **Dynamic Configuration**: Implement hot configuration updates and A/B testing +- **Unload Management**: Build comprehensive cleanup and resource management +- **Bundle Optimization**: Optimize for tree-shaking and minimal bundle size + +### Phase 5: Advanced Features +- **Universal Runtime**: Support Browser, Web Worker, Node.js, SSR environments +- **Plugin 
Architecture**: Build extensible plugin system for custom telemetry +- **Migration Tools**: Create migration utilities from Application Insights v2 +- **Performance Monitoring**: Self-monitoring and diagnostics capabilities + +For detailed implementation specifications, refer to the individual component design documents in this planning directory. + +## Implementation Phases + +### Phase 1: Core Architecture Foundation +**Objective**: Establish the foundational architecture and development environment + +**Key Tasks**: +- **Interface Design**: Define comprehensive interfaces for all components following interface-first design pattern + - Create core OpenTelemetry interface definitions from OpenTelemetry specification + - Design public interfaces with `I` prefix (e.g., `ISdkLoader`, `IUnloadResult`) + - Create OpenTelemetry interfaces with `IOTel` prefix (e.g., `IOTelTraceProvider`, `IOTelLogger`, `IOTelMeterProvider`) + - Design internal interfaces with `_I` prefix and `@internal` TypeDoc tags + - Ensure all interfaces have comprehensive TypeDoc documentation with examples and default values + - *See [Interface-First Design](./OTelWebSdk-Architecture.md#1-interface-first-design) for detailed architectural requirements* +- **Factory Functions**: Create factory functions for component instantiation following `create*` naming pattern + - Implement `createOTelWebSdk`, `createTracerProvider`, `createLoggerProvider`, `createMeterProvider` + - Ensure factory functions return interface types, not implementation types + - Handle all initialization complexity including configuration validation and dependency setup + - *See [Factory Function Pattern](./OTelWebSdk-Architecture.md#3-factory-function-pattern-with-dependency-injection) for architectural details* +- **Project Structure**: Set up npm package configuration and monorepo structure + - Configure TypeScript with strict settings for interface validation + - Set up build tools (Rollup) with multiple output formats and 
tree-shaking optimization + - Configure linting and formatting tools aligned with ApplicationInsights-JS standards +- **Development Infrastructure**: Create build tools and test infrastructure with interface validation + - Set up automated testing framework with interface mocking capabilities + - Create performance testing infrastructure for bundle size monitoring + - Implement continuous integration pipeline with cross-browser testing + +### Phase 2: Core Implementation +**Objective**: Implement the foundational SDK components and configuration management + +**Key Tasks**: +- **Unified SDK Implementation**: Implement the unified OTelWebSdk class behind IOTelWebSdk interface + - Create factory function to return IOTelWebSdk interface implementation using closure pattern for bundle size optimization + - Implement lifecycle management (initialize, shutdown, flush, unload methods) + - Add comprehensive error handling and diagnostics with proper logging + - Implement configuration validation and parsing (connection strings, endpoints) + - *See [Implementation Patterns](./OTelWebSdk-Architecture.md#implementation-patterns-closures-or-dynamicproto-classes) for bundle size optimization patterns* +- **Context Management**: Implement the context manager with explicit context handling via IOTelContextManager interface + - Create factory function to return IOTelContextManager interface implementation with proper context storage and propagation + - Implement W3C Trace Context and Baggage propagation standards + - Add support for async context propagation without global state + - Implement context injection and extraction for HTTP headers + - *See [Inversion of Control Architecture](./OTelWebSdk-Architecture.md#inversion-of-control-architecture) for IoC requirements* +- **Configuration Handling**: Set up configuration handling with dynamic configuration support using interface-based configuration + - Implement IOTelWebSdkConfig interface with comprehensive validation + - Add 
support for Azure Monitor connection string parsing + - Implement runtime configuration updates and hot-reloading capabilities + - Add configuration inheritance and override mechanisms +- **Resource Management**: Implement resource management for telemetry source identification through IResource interface + - Create resource attribute collection and validation + - Implement service identification (name, version, environment detection) + - Add platform-specific resource detection (browser, version, user-agent) + - Implement resource merging and conflict resolution +- **Common Utilities**: Create common utilities for attribute handling and validation with proper interface contracts + - Implement attribute validation and sanitization utilities + - Create telemetry item enrichment and processing utilities + - Add performance monitoring utilities for SDK overhead tracking + - Implement throttling and sampling utilities + +### Phase 3: Tracer Provider Implementation +**Objective**: Complete distributed tracing capabilities with full OpenTelemetry compliance + +**Key Tasks**: +- **Tracer Provider & Tracer**: Implement tracer provider and tracer interfaces (IOTelTraceProvider, IOTelTracer, IOTelSpan) + - Create factory functions to return interface implementations using closure pattern for bundle size optimization + - Implement span lifecycle management (start, end, context linking) + - Add span attribute validation and limits enforcement + - Implement span event recording and exception tracking + - *See [Interface-First Design](./OTelWebSdk-Architecture.md#1-interface-first-design) for interface requirements* +- **Span Management**: Implement span creation and management with proper context via interface-based design + - Create factory function to return IOTelSpan interface implementation with comprehensive span data model + - Implement span context creation and propagation + - Add span relationships (parent-child, links) and trace tree construction + - Implement span 
status codes and error handling +- **Sampling Strategies**: Add sampling strategies including parent-based sampling through ISampler interface + - Implement TraceIdRatioBasedSampler for probabilistic sampling + - Create ParentBasedSampler for distributed sampling decisions + - Add custom sampling strategies with configurable rules + - Implement sampling decision recording and propagation +- **Span Processors**: Implement span processors (simple and batch) via ISpanProcessor interface + - Create SimpleSpanProcessor for immediate processing + - Implement BatchSpanProcessor with configurable batching parameters + - Add span processor chaining and parallel execution + - Implement processor lifecycle management and error handling +- **Span Exporters**: Create span exporters including Azure Monitor exporter through ISpanExporter interface + - Implement AzureMonitorSpanExporter with proper data transformation + - Create OTLPSpanExporter for OpenTelemetry Protocol compliance + - Add ConsoleSpanExporter for development and debugging + - Implement exporter retry logic and failure handling +- **Context Propagation**: Implement W3C trace context propagation via IPropagator interface + - Create W3CTraceContextPropagator for standard propagation + - Implement B3Propagator for compatibility with existing systems + - Add JaegerPropagator for Jaeger tracing integration + - Implement baggage propagation and context injection/extraction + +### Phase 4: Logger Provider Implementation +**Objective**: Structured logging with correlation to traces and comprehensive filtering + +**Key Tasks**: +- **Logger Provider & Logger**: Implement logger provider and logger interfaces (IOTelLogProvider, IOTelLogger, ILogRecord) + - Create factory functions to return interface implementations using closure pattern for bundle size optimization + - Implement log record creation with severity levels and structured data + - Add log correlation with active spans and trace context + - Implement logger 
hierarchy and inheritance + - *See [Implementation Patterns](./OTelWebSdk-Architecture.md#implementation-patterns-closures-or-dynamicproto-classes) for bundle size optimization patterns* +- **Log Record Management**: Implement log record creation with severity levels via interface-based design + - Create factory function to return ILogRecord interface implementation with comprehensive log data model + - Implement log severity mapping and validation (TRACE to FATAL) + - Add structured logging support with nested objects and arrays + - Implement log correlation with traces using span context +- **Filtering Capabilities**: Add filtering capabilities based on severity and attributes through ILogFilter interface + - Implement SeverityLogFilter for level-based filtering + - Create AttributeLogFilter for attribute-based filtering + - Add composite filters with AND/OR logic operations + - Implement dynamic filter configuration and updates +- **Log Processors**: Implement log record processors (simple and batch) via ILogProcessor interface + - Create SimpleLogRecordProcessor for immediate processing + - Implement BatchLogRecordProcessor with configurable batching + - Add log enrichment processors for automatic attribute addition + - Implement log processor chaining and error handling +- **Log Exporters**: Create log record exporters including Azure Monitor exporter through ILogExporter interface + - Implement AzureMonitorLogExporter with proper data transformation + - Create OTLPLogExporter for OpenTelemetry Protocol compliance + - Add ConsoleLogExporter for development and debugging + - Implement log exporter retry logic and failure handling +- **Structured Logging**: Add structured logging support with proper interface contracts + - Implement support for complex object serialization + - Add log template processing and parameter substitution + - Implement log correlation with distributed traces + - Add support for log aggregation and pattern detection + +### Phase 5: Basic 
Metric Provider Implementation +**Objective**: Simple metrics collection with basic instrument types only (no views or complex aggregations) + +**Key Tasks**: +- **Meter Provider & Meter**: Implement basic meter provider and meter interfaces (IOTelMeterProvider, IOTelMeter) + - Create factory functions to return interface implementations using closure pattern for bundle size optimization + - Implement basic meter registry and instrument management + - Add meter versioning and scope management + - Implement meter lifecycle and cleanup + - *See [SDK Instance Factory Architecture](./OTelWebSdk-Architecture.md#sdk-instance-factory-architecture) for factory patterns* +- **Basic Metrics Collection**: Implement simple metrics collection for basic instrument types only + - Create factory function to return ICounter interface implementation for simple monotonic value accumulation + - Create factory function to return IHistogram interface implementation for basic value distribution measurement + - Create factory function to return IGauge interface implementation for current value observations + - Note: Advanced instruments and observable metrics are intentionally excluded +- **Simple Aggregation Support**: Add basic aggregation support for simple metric types + - Implement SumAggregator for counter aggregation + - Create HistogramAggregator with basic bucket boundaries + - Add LastValueAggregator for gauge metrics + - Note: Complex aggregation strategies and custom aggregators are excluded for simplicity +- **Basic Metric Exporters**: Create simple metric exporters through IMetricExporter interface + - Implement basic AzureMonitorMetricExporter with simple data transformation + - Add basic metric exporter configuration and lifecycle management + - Note: Advanced exporters like Prometheus and complex OTLP features are excluded + +### Phase 6: Exporters and Integrations +**Objective**: Production-ready exporters and framework integrations for real-world deployment + +**Key 
Tasks**: +- **Enhanced Azure Monitor Exporters**: Enhance Azure Monitor exporters with advanced features via interface-based extensibility + - Implement connection string parsing and endpoint resolution + - Add authentication and authorization mechanisms + - Implement data compression and batching optimization + - Add retry logic with exponential backoff and circuit breaker patterns + - Implement data transformation and schema mapping + - Add telemetry correlation across spans, logs, and basic metrics (counters, histograms, gauges only) + - *See [Modular Architecture](./OTelWebSdk-Architecture.md#5-modular-architecture) for extensibility patterns* +- **Browser Instrumentations**: Create browser-specific instrumentations (XHR, fetch, user interaction) through IInstrumentation interface + - Implement XMLHttpRequest instrumentation for automatic span creation + - Create Fetch API instrumentation with request/response correlation + - Add user interaction instrumentation (clicks, navigation, page loads) + - Implement resource timing instrumentation for performance monitoring + - Create navigation timing instrumentation for page load metrics + - Add error and exception tracking with stack trace capture +- **Framework Integrations**: Add framework integrations (React, Angular, Vue) with interface-based plugin architecture + - Create React integration with hooks and context providers + - Implement Angular integration with services and decorators + - Add Vue.js integration with mixins and plugins + - Create framework-agnostic base classes for easy extension + - Implement component lifecycle tracking and performance monitoring +- **Automatic Instrumentation**: Implement automatic instrumentation capabilities via IAutoInstrumentation interface + - Create auto-instrumentation loader with dynamic patching + - Implement module detection and selective instrumentation + - Add configuration-driven instrumentation enabling/disabling + - Create instrumentation for popular libraries 
(axios, jquery, etc.) + - Implement database query instrumentation (when applicable) +- **Migration Tools**: Create compatibility layers for easier migration using interface adapters + - Implement ApplicationInsights SDK compatibility layer + - Create OpenTelemetry JS compatibility adapters + - Add migration utilities and data transformation tools + - Implement feature parity mapping and gap analysis tools +- **Documentation & Examples**: Provide comprehensive examples and documentation demonstrating interface usage patterns + - Create comprehensive API documentation with TypeDoc + - Implement interactive examples and tutorials + - Add best practices guides and troubleshooting documentation + - Create migration guides from existing solutions + - Implement sample applications for different frameworks + +### Related Architecture Documents + +For comprehensive architectural specifications and design patterns referenced above: + +- **[Core Architectural Principles](./OTelWebSdk-Architecture.md#core-architectural-principles)** - Interface-first design, factory patterns, IoC principles +- **[Inversion of Control Architecture](./OTelWebSdk-Architecture.md#inversion-of-control-architecture)** - Dependency injection and no-global-state patterns +- **[SDK Instance Factory Architecture](./OTelWebSdk-Architecture.md#sdk-instance-factory-architecture)** - Multi-tenant factory patterns and resource coordination + +## Implementation Timeline + +### Detailed Timeline with Milestones + +**Month 1 - Foundation & Core (Phases 1-2)** +- **Week 1-2**: Infrastructure Setup (Phase 1) + - Interface design and factory function creation + - Project structure and development environment setup + - Build tools and testing infrastructure +- **Week 3-4**: Core Implementation (Phase 2) + - Unified SDK implementation with lifecycle management + - Context manager and configuration handling + - Resource management and common utilities + +**Month 2 - Telemetry Providers (Phases 3-4)** +- **Week 
1-2**: Trace Provider Implementation (Phase 3) + - Trace provider, tracer, and span implementation + - Sampling strategies and span processors + - Span exporters and W3C context propagation +- **Week 3-4**: Log Provider Implementation (Phase 4) + - Log provider, logger, and log record implementation + - Filtering capabilities and log processors + - Log exporters and structured logging support + +**Month 3 - Metrics & Integration (Phases 5-6)** +- **Week 1-2**: Basic Metric Provider Implementation (Phase 5) + - Meter provider and basic instrument types (counter, histogram, gauge) + - Simple aggregation support (sum, histogram, last value) + - Basic Azure Monitor metric exporter +- **Week 3-4**: Exporters and Integrations (Phase 6) + - Enhanced Azure Monitor exporters + - Browser instrumentations and framework integrations + - Automatic instrumentation and migration tools + +### Critical Milestones +- **End of Month 1**: Core SDK functional (lifecycle, context management, configuration) +- **End of Month 2**: Trace and log providers complete +- **End of Month 3**: Basic metrics complete; production-ready with full Azure Monitor integration + diff --git a/docs/planning/otel/specs/OTelWebSdk-Instrumentation.md b/docs/planning/otel/specs/OTelWebSdk-Instrumentation.md new file mode 100644 index 000000000..85612c74a --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Instrumentation.md @@ -0,0 +1,403 @@ +# OpenTelemetry Web SDK - Dynamic Instrumentation Management + +This document describes the dynamic instrumentation system, which lets teams load, configure, and unload individual instrumentations at runtime. + +## Dynamic Instrumentation Management + +The OTelWebSdk supports loading and unloading instrumentations dynamically, so teams can manage telemetry collection granularly without impacting the entire SDK lifecycle. 
+ +### Individual Instrumentation Lifecycle + +Unlike traditional SDKs that require full restart for instrumentation changes, the OTelWebSdk supports fine-grained instrumentation management: + +#### **Loading Instrumentations Dynamically** + +The SDK provides a comprehensive instrumentation management interface that supports: + +**Core Operations:** +- **Runtime Loading**: Load individual instrumentations at runtime with configuration options +- **Status Checking**: Query whether specific instrumentations are currently loaded +- **Instance Access**: Retrieve loaded instrumentation instances for direct manipulation +- **Inventory Management**: List all currently loaded instrumentations for monitoring +- **Selective Unloading**: Unload specific instrumentations without affecting others +- **Hot-Swap Capability**: Replace instrumentations for A/B testing by unloading and loading different variants + +**Configuration Options:** +- **Enable/Disable State**: Control instrumentation activation state +- **Custom Properties**: Add instrumentation-specific metadata and context +- **Performance Monitoring**: Optional Application Insights `doPerf`/`IPerfManager` integration + +**Load Result Information:** +- **Success/Failure Status**: Track instrumentation loading outcomes +- **Error Details**: Access detailed error information for failed loads +- **Performance Metrics**: Monitor load time and memory impact +- **Validation Results**: Ensure instrumentation compatibility and requirements + +#### **Dynamic Instrumentation Usage Patterns** + +**Basic Instrumentation Management:** +Teams can directly load instrumentations through the SDK instance after initialization. The SDK provides methods to dynamically load instrumentations like XHR tracking with full configuration including custom properties for team identification. The SDK tracks all loaded instrumentations and allows selective unloading for performance optimization while maintaining operation. 
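The load, status, inventory, and unload operations described above can be sketched as a small in-memory manager. This is a minimal illustration under stated assumptions, not the SDK's implementation: only `loadInstrumentation(name, config)` appears in this specification's examples, and every other name here (`createInstrumentationManager`, `unloadInstrumentation`, `isLoaded`, `getInstrumentation`, `listLoaded`, `LoadResult`) is hypothetical.

```typescript
// Sketch only: apart from loadInstrumentation(name, config), all names are hypothetical.
interface InstrumentationConfig {
  enabled?: boolean;
  customProperties?: Record<string, string>;
}

interface Instrumentation {
  enable(): void;
  disable(): void;
}

interface LoadResult {
  success: boolean;
  error?: string;
  loadTimeMs: number;
}

type InstrumentationFactory = (config: InstrumentationConfig) => Instrumentation;

function createInstrumentationManager(
  available: Map<string, InstrumentationFactory>,
  now: () => number = () => Date.now()
) {
  const loaded = new Map<string, Instrumentation>();

  function loadInstrumentation(name: string, config: InstrumentationConfig = {}): LoadResult {
    const start = now();
    const factory = available.get(name);
    if (!factory) {
      return { success: false, error: "Unknown instrumentation: " + name, loadTimeMs: now() - start };
    }
    if (loaded.has(name)) {
      return { success: false, error: "Already loaded: " + name, loadTimeMs: now() - start };
    }
    try {
      const instance = factory(config);
      if (config.enabled !== false) {
        instance.enable();
      }
      loaded.set(name, instance);
      return { success: true, loadTimeMs: now() - start };
    } catch (e) {
      // A failing instrumentation reports an error instead of breaking the caller
      return { success: false, error: String(e), loadTimeMs: now() - start };
    }
  }

  function unloadInstrumentation(name: string): boolean {
    const instance = loaded.get(name);
    if (!instance) {
      return false;
    }
    instance.disable(); // stop collection and release hooks for this instrumentation only
    loaded.delete(name);
    return true;
  }

  return {
    loadInstrumentation,
    unloadInstrumentation,
    isLoaded: (name: string): boolean => loaded.has(name),
    getInstrumentation: (name: string): Instrumentation | undefined => loaded.get(name),
    listLoaded: (): string[] => Array.from(loaded.keys())
  };
}

// Usage: load two instrumentations, then unload one without touching the other.
const calls: string[] = [];
const available = new Map<string, InstrumentationFactory>();
available.set("xhr", () => ({
  enable: () => { calls.push("xhr:enable"); },
  disable: () => { calls.push("xhr:disable"); }
}));
available.set("click", () => ({
  enable: () => { calls.push("click:enable"); },
  disable: () => { calls.push("click:disable"); }
}));

const manager = createInstrumentationManager(available);
const xhrResult = manager.loadInstrumentation("xhr", { customProperties: { "team.name": "frontend" } });
manager.loadInstrumentation("click");
manager.unloadInstrumentation("xhr");
```

In this sketch a hot swap is simply an unload followed by a load of a different variant; the clock is injected so that load timing can be tested without real timers.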
+ +**Runtime Monitoring:** +Teams can query the SDK directly for the current state of loaded instrumentations to understand their telemetry collection scope. Individual instrumentations can be unloaded based on performance requirements or business logic while the SDK continues operating with remaining instrumentations. + +**Performance Integration:** +All instrumentation operations may include optional performance monitoring integration with the Application Insights performance monitoring framework for observability. + +### A/B Testing and Experimentation + +The dynamic instrumentation system enables sophisticated A/B testing scenarios through the underlying capabilities of dynamic loading/unloading and configuration management. Teams can implement their own experiment frameworks using these capabilities: + +**Experiment Implementation Capabilities:** +The SDK provides the foundational capabilities that teams can use to implement experiment group management. Teams can determine user experiment groups and load different instrumentations based on their assignment logic. Control groups might use basic click tracking, while variant groups use enhanced interaction tracking or experimental gesture tracking with different experiment metadata. + +**Runtime Experiment Support:** +The system's hot-swapping capabilities enable experiment transitions during active sessions. Teams can implement experiment switching logic that transitions from basic tracking to enhanced tracking mid-session while maintaining experiment attribution and continuity. + +**Experiment Attribution Support:** +All instrumentation configurations support custom properties for experiment group identification, enabling teams to implement proper attribution and analysis of A/B test results across different tracking approaches. 
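As an illustration of the experiment capabilities described above, the following sketch assigns a user to a group deterministically and swaps instrumentations when the group changes. It assumes a loader with `loadInstrumentation` and `unloadInstrumentation` methods; the `ExperimentSdk` interface, the instrumentation names `click-basic` and `click-enhanced`, and the `experiment.*` property keys are hypothetical and not defined by this specification.

```typescript
type ExperimentGroup = "control" | "variant";

// Hypothetical minimal view of the SDK surface needed for experiments
interface ExperimentSdk {
  loadInstrumentation(
    name: string,
    config: { enabled: boolean; customProperties: Record<string, string> }
  ): void;
  unloadInstrumentation(name: string): void;
}

// FNV-1a style hash over UTF-16 code units, so a given user id always maps to the same bucket
function hashString(value: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < value.length; i++) {
    hash ^= value.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// variantPercent is the share of users (0 to 100) placed in the variant group
function assignGroup(userId: string, variantPercent: number): ExperimentGroup {
  return hashString(userId) % 100 < variantPercent ? "variant" : "control";
}

// Hypothetical instrumentation names for each group
const GROUP_INSTRUMENTATION: Record<ExperimentGroup, string> = {
  control: "click-basic",
  variant: "click-enhanced"
};

// Unloads the previous group's instrumentation (if any) and loads the new one,
// tagging telemetry with the experiment id and group for later attribution.
function applyExperiment(
  sdk: ExperimentSdk,
  experimentId: string,
  group: ExperimentGroup,
  previous?: ExperimentGroup
): void {
  if (previous && previous !== group) {
    sdk.unloadInstrumentation(GROUP_INSTRUMENTATION[previous]);
  }
  sdk.loadInstrumentation(GROUP_INSTRUMENTATION[group], {
    enabled: true,
    customProperties: { "experiment.id": experimentId, "experiment.group": group }
  });
}

// Usage with a fake SDK that records the calls it receives
const swapLog: string[] = [];
const fakeSdk: ExperimentSdk = {
  loadInstrumentation: (name, config) => {
    swapLog.push("load:" + name + ":" + config.customProperties["experiment.group"]);
  },
  unloadInstrumentation: (name) => {
    swapLog.push("unload:" + name);
  }
};

applyExperiment(fakeSdk, "click-tracking-v2", "control");
applyExperiment(fakeSdk, "click-tracking-v2", "variant", "control"); // mid-session switch
```

Hashing the user id rather than drawing a random number keeps the assignment stable across page loads without storing any state, which is one reasonable way to preserve experiment continuity.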
+ +### Performance-Aware Loading + +The SDK is designed to be performance-conscious when loading instrumentations, with guidelines and best practices to respect browser performance constraints. Teams should be mindful of the 5ms frame budget guideline to prevent UI jank, ensuring smooth user experience. + +**Optional Performance Monitoring:** +When enabled, instrumentation operations can optionally integrate with the Application Insights `doPerf`/`IPerfManager` framework for performance observability including timing instrumentation, operation monitoring, and resource usage tracking. + +### Performance Metrics and Frame Budget Guidelines + +When performance monitoring is enabled, comprehensive metrics are available: + +**Operation Metrics:** +- Total operation counts and execution duration tracking +- Frame budget compliance monitoring (5ms guideline) +- Memory footprint monitoring to prevent bloat in long-running applications + +**Benefits of Optional Performance Framework:** + +1. **Optional Monitoring**: Performance tracking can be enabled when needed for observability +2. **Performance Metrics**: Built-in observability of SDK performance impact when enabled +3. **Performance Trends**: Historical performance data for optimization when monitoring is active +4. **Detailed Diagnostics**: Precise timing and resource usage tracking when configured +5. **Configurable Overhead**: Teams can choose between minimal overhead or detailed observability +6. **Targeted Analysis**: Enable monitoring for specific instrumentations or scenarios + +### Understanding the 5ms Frame Budget Guideline + +The OTelWebSdk follows a 5ms frame budget guideline to help ensure telemetry operations don't impact user experience. 
This conservative guideline provides significant headroom for smooth animations and interactions: + +#### **Frame Budget vs Frame Rate Relationship** + +| Frame Rate | Frame Duration | Telemetry Budget (5ms) | Remaining for App | +|------------|----------------|------------------------|-------------------| +| **120fps** | 8.33ms | 5ms (60% of frame) | 3.33ms | +| **60fps** | 16.67ms | 5ms (30% of frame) | 11.67ms | +| **30fps** | 33.33ms | 5ms (15% of frame) | 28.33ms | + +#### **Why 5ms is Conservative** + +- **High Refresh Displays**: On 120Hz displays (8.33ms frames), 5ms still leaves 3.33ms for application logic +- **Standard Displays**: On 60Hz displays (16.67ms frames), 5ms leaves 11.67ms for smooth animations +- **Performance Headroom**: Even on slower devices targeting 30fps (33.33ms frames), 5ms uses only 15% of the frame budget +- **Real-World Margin**: Accounts for browser overhead, garbage collection, and other background tasks + +#### **Practical Benefits** + +**Telemetry Operation Breakdown:** +The SDK is designed with a 5ms guideline across all telemetry operations including event capture (~1ms), data processing (~2ms), batching/queuing (~1ms), and buffer time (~1ms). This guideline helps ensure total telemetry time stays within performance boundaries. + +**Application Performance Protection:** +Following the guideline helps preserve frame time for critical application functions including animation rendering, user input handling, business logic execution, and DOM updates. This helps ensure telemetry collection doesn't interfere with user experience. 
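The arithmetic behind the table, and one way to keep queued telemetry work inside the 5ms guideline, can be sketched as follows. The helper names (`frameBudgetShare`, `processWithinBudget`) are hypothetical; the clock is passed in, in the spirit of the injected `performanceNow` dependency, so the sketch runs without a browser.

```typescript
const TELEMETRY_BUDGET_MS = 5;

interface FrameBudgetShare {
  frameMs: number;
  budgetPercent: number;
  remainingMs: number;
}

// Reproduces one table row: frame duration, budget share, and time left for the app
function frameBudgetShare(fps: number, budgetMs: number = TELEMETRY_BUDGET_MS): FrameBudgetShare {
  const frameMs = 1000 / fps;
  return {
    frameMs,
    budgetPercent: (budgetMs / frameMs) * 100,
    remainingMs: frameMs - budgetMs
  };
}

// Drains queued work until the queue is empty or the elapsed time reaches the budget.
// Returns the number of items handled; anything left stays in the queue for a later pass.
function processWithinBudget<T>(
  queue: T[],
  handle: (item: T) => void,
  now: () => number,
  budgetMs: number = TELEMETRY_BUDGET_MS
): number {
  const start = now();
  let processed = 0;
  while (queue.length > 0 && now() - start < budgetMs) {
    handle(queue.shift() as T);
    processed++;
  }
  return processed;
}

// Usage with a fake clock that advances 2ms on every read
let fakeTime = 0;
const fakeNow = (): number => {
  const current = fakeTime;
  fakeTime += 2;
  return current;
};
const pending = ["a", "b", "c", "d", "e"];
const handled: string[] = [];
const processedCount = processWithinBudget(pending, (item) => { handled.push(item); }, fakeNow);
```

In a browser the clock argument would typically be `() => performance.now()`, and the items left in the queue would be picked up on a later tick.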
+ +#### **Conservative Design Philosophy** + +The 5ms guideline helps ensure telemetry is **non-intrusive**: +- Designed to avoid visible UI jank or stuttering +- Helps preserve smooth scrolling and animations +- Designed to maintain responsive user interactions +- Provides safety margin for unexpected performance variations +- Designed to work well across all device performance tiers + +### Coordinated Multi-Team Instrumentation + +The factory coordinates instrumentation loading across teams to prevent conflicts: + +**Shared Resource Coordination:** +When multiple teams load the same instrumentation (like XHR tracking), the factory efficiently reuses existing hooks and event listeners rather than creating duplicates. This prevents DOM pollution and resource waste while maintaining team isolation. + +**Independent Team Management:** +Each team can independently unload their instrumentations without affecting other teams. The factory maintains reference counting to ensure shared instrumentations remain active as long as any team requires them, only cleaning up resources when all teams have unloaded. + +**Conflict Prevention:** +The coordination system prevents teams from interfering with each other's instrumentation while maximizing resource sharing efficiency. + +### Resource Cleanup and Memory Management + +Individual instrumentation unloading ensures proper resource cleanup through SDK methods: + +**SDK Instrumentation Management:** +The SDK provides direct methods for instrumentation lifecycle management including loading, unloading, and status checking. When unloading an instrumentation, the SDK ensures proper cleanup by stopping telemetry collection, removing event listeners and hooks, disposing of resources and timers, and clearing instrumentation-specific memory allocations. 
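The shared-hook coordination described above (install once, keep alive while any team needs it, clean up when the last team releases it) can be sketched with a small reference-counting registry. This is an illustration under assumptions, not the factory's actual mechanism; `createSharedHookRegistry`, `acquire`, `release`, and `refCount` are hypothetical names.

```typescript
type Uninstall = () => void;

function createSharedHookRegistry() {
  // One entry per instrumentation name: the teams using it and how to remove its hooks
  const entries = new Map<string, { teams: Set<string>; uninstall: Uninstall }>();

  // Installs the hook on first use; later teams reuse the existing installation
  function acquire(name: string, teamId: string, install: () => Uninstall): void {
    let entry = entries.get(name);
    if (!entry) {
      entry = { teams: new Set<string>(), uninstall: install() };
      entries.set(name, entry);
    }
    entry.teams.add(teamId);
  }

  // Removes one team's interest; hooks are uninstalled only when no team remains
  function release(name: string, teamId: string): boolean {
    const entry = entries.get(name);
    if (!entry || !entry.teams.delete(teamId)) {
      return false;
    }
    if (entry.teams.size === 0) {
      entry.uninstall();
      entries.delete(name);
    }
    return true;
  }

  return {
    acquire,
    release,
    refCount: (name: string): number => entries.get(name)?.teams.size ?? 0
  };
}

// Usage: two teams share one XHR hook; it is removed only after both release it
let installs = 0;
let uninstalls = 0;
const installXhrHook = (): Uninstall => {
  installs++;
  return () => { uninstalls++; };
};

const hookRegistry = createSharedHookRegistry();
hookRegistry.acquire("xhr", "team-a", installXhrHook);
hookRegistry.acquire("xhr", "team-b", installXhrHook);
hookRegistry.release("xhr", "team-a");
```

Tracking team ids in a set rather than a bare counter makes a repeated `acquire` by the same team harmless, which avoids counts drifting if a team loads the same instrumentation twice.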
+ +**Automatic Resource Management:** +The SDK automatically handles reference counting for shared instrumentations across teams, ensuring resources are only cleaned up when no teams require them. This prevents premature cleanup while ensuring efficient memory management. + +### Third-Party Instrumentation Registration + +The OTelWebSdk supports registration and dynamic loading of custom third-party instrumentations: + +**Third-Party Instrumentation Interface:** +Custom instrumentations must implement a standardized interface including name, version, vendor identification, and optional dependencies. The lifecycle includes initialize, enable, disable, and dispose methods for complete resource management. + +**Framework-Specific Instrumentations:** +Teams can create framework-specific instrumentations like React component tracking or Vue.js lifecycle monitoring. React instrumentations might hook into the component render cycle to track performance and errors, while Vue instrumentations could integrate with Vue's mixin system for lifecycle tracking. + +**Registration and Discovery:** +The factory provides registration mechanisms for custom instrumentations, enabling teams to register and use framework-specific or business logic instrumentations. Multiple instrumentations can be registered simultaneously, and the system provides discovery capabilities to list available instrumentations. + +**Business Logic Instrumentations:** +Custom instrumentations can track domain-specific events like e-commerce operations (cart additions, purchases) or application-specific workflows. These instrumentations integrate with custom events and business logic to provide comprehensive observability. + +**Validation and Loading:** +The system includes validation capabilities to ensure instrumentation compatibility before loading, preventing runtime errors and ensuring smooth operation of custom telemetry collection. 
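The interface, registration, discovery, and validation steps described above can be sketched as follows. The member list mirrors the description (name, version, vendor, optional dependencies, and the four lifecycle methods), but the exact shape is an assumption: `ThirdPartyInstrumentationSketch`, `createInstrumentationRegistry`, `ValidationResult`, and the synchronous `validateInstrumentation` below are hypothetical stand-ins, not the specification's definitions.

```typescript
// Assumed shape, based on the prose description of the third-party interface
interface ThirdPartyInstrumentationSketch {
  readonly name: string;
  readonly version: string;
  readonly vendor: string;
  readonly dependencies?: string[];
  initialize(config: Record<string, unknown>): Promise<void>;
  enable(): Promise<void>;
  disable(): Promise<void>;
  dispose(): Promise<void>;
}

interface ValidationResult {
  valid: boolean;
  problems: string[];
}

// isDependencyAvailable is injected so the check does not rely on global state
function createInstrumentationRegistry(isDependencyAvailable: (dependency: string) => boolean) {
  const factories = new Map<string, () => ThirdPartyInstrumentationSketch>();

  return {
    registerInstrumentation(name: string, factory: () => ThirdPartyInstrumentationSketch): void {
      if (factories.has(name)) {
        throw new Error("Instrumentation already registered: " + name);
      }
      factories.set(name, factory);
    },

    getAvailableInstrumentations(): string[] {
      return Array.from(factories.keys());
    },

    // Checks registration, name consistency, and declared dependencies before loading
    validateInstrumentation(name: string): ValidationResult {
      const factory = factories.get(name);
      if (!factory) {
        return { valid: false, problems: ["not registered: " + name] };
      }
      const instance = factory();
      const problems: string[] = [];
      if (instance.name !== name) {
        problems.push("name mismatch: " + instance.name);
      }
      for (const dependency of instance.dependencies ?? []) {
        if (!isDependencyAvailable(dependency)) {
          problems.push("missing dependency: " + dependency);
        }
      }
      return { valid: problems.length === 0, problems };
    }
  };
}

// Helper for the usage example: an instrumentation whose lifecycle methods do nothing
function makeNoopInstrumentation(id: string, dependencies?: string[]): ThirdPartyInstrumentationSketch {
  return {
    name: id,
    version: "1.0.0",
    vendor: "example",
    dependencies: dependencies ?? [],
    initialize: async () => {},
    enable: async () => {},
    disable: async () => {},
    dispose: async () => {}
  };
}

// Usage: only "vue" is available in this pretend environment
const thirdPartyRegistry = createInstrumentationRegistry((dependency) => dependency === "vue");
thirdPartyRegistry.registerInstrumentation("vue-components", () => makeNoopInstrumentation("vue-components", ["vue"]));
thirdPartyRegistry.registerInstrumentation("react-components", () => makeNoopInstrumentation("react-components", ["react", "react-dom"]));
const reactCheck = thirdPartyRegistry.validateInstrumentation("react-components");
```

Returning a list of problems rather than a bare boolean is a design choice of this sketch; it lets a team report why a custom instrumentation was rejected.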
+ +## Implementation Examples + +### Custom React Instrumentation + +```typescript +// Example: Custom React instrumentation +class ReactInstrumentation implements IThirdPartyInstrumentation { + readonly name = 'react-components'; + readonly version = '1.2.0'; + readonly vendor = 'acme-corp'; + readonly dependencies = ['react', 'react-dom']; + + private _hooks: Set<() => void> = new Set<() => void>(); + private _enabled = false; + + async initialize(config: IOTelInstrumentationConfig): Promise<void> { + // Validate React is available + if (typeof React === 'undefined') { + throw new Error('React is required for ReactInstrumentation'); + } + + // Setup component render tracking + this._setupComponentTracking(config); + } + + async enable(): Promise<void> { + if (this._enabled) return; + + // Patch the component render method + this._installReactHooks(); + this._enabled = true; + } + + async disable(): Promise<void> { + if (!this._enabled) return; + + // Remove all React hooks + this._hooks.forEach(unhook => unhook()); + this._hooks.clear(); + this._enabled = false; + } + + async dispose(): Promise<void> { + await this.disable(); + // Additional cleanup + } + + private _setupComponentTracking(config: IOTelInstrumentationConfig): void { + // Implementation details... 
+ } + + private _installReactHooks(): void { + // Hook into the React render cycle. Illustrative only: React.Component.prototype does not define render() and class components supply their own, so a production instrumentation would need a different hook point. + // _startComponentSpan() is assumed to be implemented elsewhere in this class. + const originalRender = React.Component.prototype.render; + const self = this; + + React.Component.prototype.render = function(this: any) { + const span = self._startComponentSpan(this.constructor.name); + try { + const result = originalRender.call(this); + span.setStatus({ code: SpanStatusCode.OK }); + return result; + } catch (error) { + span.recordException(error as Error); + span.setStatus({ code: SpanStatusCode.ERROR }); + throw error; + } finally { + span.end(); + } + }; + + // Store unhook function + this._hooks.add(() => { + React.Component.prototype.render = originalRender; + }); + } +} + +// Register third-party instrumentation with the factory +const factory = createSdkFactory(); + +// Register custom instrumentation +factory.registerInstrumentation('react-components', () => new ReactInstrumentation()); + +// Teams can now use the custom instrumentation +const sdk = factory.newInst('my-team', { + connectionString: 'InstrumentationKey=your-key-here' +}); + +// Load the custom React instrumentation directly through SDK +await sdk.loadInstrumentation('react-components', { + enabled: true, + customProperties: { + 'team.name': 'frontend', + 'instrumentation.vendor': 'acme-corp' + } +}); +``` + +### Vue.js Instrumentation + +```typescript +// Example: Third-party Vue.js instrumentation +class VueInstrumentation implements IThirdPartyInstrumentation { + readonly name = 'vue-components'; + readonly version = '2.1.0'; + readonly vendor = 'vue-tools-inc'; + readonly dependencies = ['vue']; + + async initialize(config: IOTelInstrumentationConfig): Promise<void> { + if (typeof Vue === 'undefined') { + throw new Error('Vue.js is required for VueInstrumentation'); + } + + // Setup Vue component lifecycle tracking + Vue.mixin({ + beforeCreate() { + this._instrumentationSpan = this.$tracer?.startSpan(`vue.component.${this.$options.name || 'anonymous'}`); + }, + + mounted() { + 
this._instrumentationSpan?.addEvent('mounted'); + }, + + beforeDestroy() { + this._instrumentationSpan?.end(); + } + }); + } + + async enable(): Promise<void> { + // Enable Vue-specific telemetry collection + } + + async disable(): Promise<void> { + // Disable and cleanup Vue instrumentation + } + + async dispose(): Promise<void> { + await this.disable(); + } +} + +// Register multiple third-party instrumentations +factory.registerInstrumentation('vue-components', () => new VueInstrumentation()); +``` + +### Business Logic Instrumentation + +```typescript +// Example: Custom business logic instrumentation +class EcommerceInstrumentation implements IThirdPartyInstrumentation { + readonly name = 'ecommerce-business'; + readonly version = '1.0.0'; + readonly vendor = 'internal'; + + // Keep stable handler references so disable() removes the same listeners that initialize() added + private readonly _onCartAdd = (event: Event): void => this._trackCartAdd(event as CustomEvent); + private readonly _onPurchase = (event: Event): void => this._trackPurchase(event as CustomEvent); + + async initialize(config: IOTelInstrumentationConfig): Promise<void> { + // Setup business event tracking + window.addEventListener('cart-add', this._onCartAdd); + window.addEventListener('purchase-complete', this._onPurchase); + } + + private _trackCartAdd(event: CustomEvent): void { + // _getTracer() is assumed to return the tracer supplied to this instrumentation + const tracer = this._getTracer(); + const span = tracer.startSpan('ecommerce.cart.add'); + span.setAttributes({ + 'product.id': event.detail.productId, + 'product.price': event.detail.price, + 'cart.total_items': event.detail.cartSize + }); + span.end(); + } + + private _trackPurchase(event: CustomEvent): void { + const tracer = this._getTracer(); + const span = tracer.startSpan('ecommerce.purchase.complete'); + span.setAttributes({ + 'purchase.order_id': event.detail.orderId, + 'purchase.total': event.detail.total, + 'purchase.currency': event.detail.currency + }); + span.end(); + } + + async enable(): Promise<void> { + // Business instrumentation is always passive, no additional enable needed + } + + async disable(): Promise<void> { + window.removeEventListener('cart-add', this._onCartAdd); + window.removeEventListener('purchase-complete', this._onPurchase); + } + + async dispose(): Promise<void> { + await 
this.disable(); + } +} + +// Bulk registration of instrumentations +factory.registerInstrumentations({ + 'ecommerce-business': () => new EcommerceInstrumentation(), + 'custom-analytics': () => new CustomAnalyticsInstrumentation(), + 'performance-monitoring': () => new CustomPerformanceInstrumentation() +}); + +// Discovery and validation +console.log('Available instrumentations:', factory.getAvailableInstrumentations()); +// Output: ['xhr', 'fetch', 'errors', 'react-components', 'vue-components', 'ecommerce-business', ...] + +// Validate instrumentation before loading +const isValid = await factory.validateInstrumentation('react-components'); +if (isValid) { + await sdk.loadInstrumentation('react-components'); +} +``` + +### Package-Based Third-Party Instrumentations + +For npm packages, instrumentations can be automatically discovered and registered: + +**Package Registration:** +Teams can register instrumentations from npm packages that follow OpenTelemetry conventions. These packages provide pre-built instrumentations for popular frameworks like React and Vue.js, with configurable options for component tracking, lifecycle monitoring, and privacy controls. + +**Auto-Discovery:** +The system supports automatic discovery of instrumentation packages based on naming conventions and package prefixes. This enables teams to automatically detect and register community or vendor-provided instrumentations without manual configuration. + +**Configuration Options:** +Package-based instrumentations support comprehensive configuration including feature toggles (component tracking, lifecycle monitoring), privacy controls (prop tracking), and discovery timeouts to prevent blocking during package scanning. + +## Benefits of Dynamic Instrumentation Management + +1. **Zero-Downtime Updates**: Change instrumentation without restarting the application +2. **Safe Experimentation**: A/B test different tracking approaches with easy rollback +3. 
**Performance Optimization**: Unload heavy instrumentations during critical user flows +4. **Targeted Collection**: Load specific instrumentations only when needed +5. **Memory Efficiency**: Reclaim memory by unloading unused instrumentations +6. **Error Isolation**: Faulty instrumentations can be unloaded without affecting others +7. **Granular Control**: Fine-tune telemetry collection at the instrumentation level +8. **Team Independence**: Teams can manage their instrumentations independently +9. **Third-Party Ecosystem**: Support for community and vendor-specific instrumentations +10. **Package Integration**: Seamless integration with npm-based instrumentation packages \ No newline at end of file diff --git a/docs/planning/otel/specs/OTelWebSdk-Interfaces.md b/docs/planning/otel/specs/OTelWebSdk-Interfaces.md new file mode 100644 index 000000000..96fe11d68 --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Interfaces.md @@ -0,0 +1,883 @@ +# OpenTelemetry Web SDK Interface Definitions + +This document contains the complete interface definitions for the OpenTelemetry Web SDK, following interface-first design principles with strict Inversion of Control (IoC) architecture. 
+ +## Core Design Principles + +- **No Global State**: All dependencies must be explicitly injected through factory functions +- **Interface-First**: All public APIs defined as TypeScript interfaces before implementation +- **Dependency Injection**: Components receive dependencies through configuration, not global access +- **Closure OR DynamicProto**: Implementation uses EITHER closure pattern OR DynamicProto-JS for complex inheritance +- **Multi-Tenant Support**: Multiple SDK instances can coexist without interference + +## Core SDK Interfaces + +### Main SDK Interface + +```typescript +/** + * Main interface for the OpenTelemetry Web SDK instance + * Provides access to all telemetry providers and SDK management functions + * Created through factory functions with explicit dependency injection + */ +export interface IOTelWebSdk { + /** Get a tracer for creating spans */ + getTracer(name: string, version?: string, options?: IOTelTracerOptions): IOTelTracer; + + /** Get a logger for emitting log records */ + getLogger(name: string, version?: string, options?: IOTelLoggerOptions): IOTelLogger; + + /** Get a meter for recording metrics */ + getMeter(name: string, version?: string, options?: IOTelMeterOptions): IOTelMeter; + + /** Update SDK configuration at runtime (where supported) */ + updateConfig(config: Partial): Promise; + + /** Completely unload the SDK instance and cleanup all resources */ + unload(onDone?: (result: IUnloadResult) => void, timeoutMs?: number): Promise; + + /** Get current SDK configuration (read-only) */ + getConfig(): Readonly; + + /** Get SDK instance metrics and statistics */ + getStats(): ISDKInstanceStats; +} + +/** + * Configuration interface for the OpenTelemetry Web SDK + * All dependencies must be explicitly provided - no global state access + */ +export interface IOTelWebSdkConfig { + /** REQUIRED: Resource information for this SDK instance */ + resource: IOTelResource; + + /** REQUIRED: Logger for SDK internal diagnostics */ + 
logger: IOTelLogger; + + /** REQUIRED: Performance timing function (injected for testability) */ + performanceNow: () => number; + + /** Connection string for telemetry ingestion */ + connectionString?: string; + + /** Sampling rate for traces (0.0 - 1.0) */ + samplingRate?: number; + + /** Custom properties to add to all telemetry */ + customProperties?: Record; + + /** Maximum queue size for batching */ + maxQueueSize?: number; + + /** Export interval in milliseconds */ + exportInterval?: number; + + /** Enable/disable SDK */ + enabled?: boolean; + + /** Debug mode */ + debug?: boolean; + + /** REQUIRED: Span processors (must be explicitly provided) */ + spanProcessors: IOTelSpanProcessor[]; + + /** REQUIRED: Log processors (must be explicitly provided) */ + logProcessors: IOTelLogProcessor[]; + + /** REQUIRED: Metric readers (must be explicitly provided) */ + metricReaders: IOTelMetricReader[]; + + /** REQUIRED: Context manager implementation */ + contextManager: IOTelContextManager; + + /** REQUIRED: Span ID generator implementation */ + idGenerator: IOTelIdGenerator; + + /** REQUIRED: Span sampler implementation */ + sampler: IOTelSampler; + + /** Instrumentation configurations */ + instrumentations?: IOTelInstrumentationConfig[]; +} + +/** + * Result interface for SDK unload operations + */ +export interface IUnloadResult { + /** Number of spans exported during unload */ + spansExported: number; + + /** Number of logs exported during unload */ + logsExported: number; + + /** Number of metrics exported during unload */ + metricsExported: number; + + /** Total cleanup time in milliseconds */ + cleanupTimeMs: number; + + /** Success status */ + success: boolean; + + /** Any errors encountered during unload */ + errors?: Error[]; +} +``` + +### Telemetry Provider Interfaces + +```typescript +/** + * Trace provider interface - created through factory with injected dependencies + */ +export interface IOTelTraceProvider { + /** Get a tracer instance */ + 
getTracer(name: string, version?: string, options?: IOTelTracerOptions): IOTelTracer; + + /** Add a span processor */ + addSpanProcessor(processor: IOTelSpanProcessor): void; + + /** Get active span processors */ + getActiveSpanProcessors(): IOTelSpanProcessor[]; + + /** Force flush all processors */ + forceFlush(): Promise<void>; + + /** Shutdown the provider */ + shutdown(): Promise<void>; +} + +/** + * Log provider interface - created through factory with injected dependencies + */ +export interface IOTelLogProvider { + /** Get a logger instance */ + getLogger(name: string, version?: string, options?: IOTelLoggerOptions): IOTelLogger; + + /** Add a log processor */ + addLogProcessor(processor: IOTelLogProcessor): void; + + /** Get active log processors */ + getActiveLogProcessors(): IOTelLogProcessor[]; + + /** Force flush all processors */ + forceFlush(): Promise<void>; + + /** Shutdown the provider */ + shutdown(): Promise<void>; +} + +/** + * Meter provider interface - created through factory with injected dependencies + */ +export interface IOTelMeterProvider { + /** Get a meter instance */ + getMeter(name: string, version?: string, options?: IOTelMeterOptions): IOTelMeter; + + /** Add a metric reader */ + addMetricReader(reader: IOTelMetricReader): void; + + /** Get active metric readers */ + getActiveMetricReaders(): IOTelMetricReader[]; + + /** Force flush all readers */ + forceFlush(): Promise<void>; + + /** Shutdown the provider */ + shutdown(): Promise<void>; +} + +/** + * Tracer interface for creating and managing spans + */ +export interface IOTelTracer { + /** Start a new span */ + startSpan(name: string, options?: IOTelSpanOptions): IOTelSpan; + + /** Start a span and set it as active in the current context */ + startActiveSpan<T>(name: string, fn: (span: IOTelSpan) => T): T; + startActiveSpan<T>(name: string, options: IOTelSpanOptions, fn: (span: IOTelSpan) => T): T; + startActiveSpan<T>(name: string, options: IOTelSpanOptions, context: IOTelContext, fn: (span: IOTelSpan) => T): T; +} + +/**
+ * Span interface for recording trace data + */ +export interface IOTelSpan { + /** Set an attribute on the span */ + setAttribute(key: string, value: any): void; + + /** Set multiple attributes on the span */ + setAttributes(attributes: Record<string, any>): void; + + /** Add an event to the span */ + addEvent(name: string, attributes?: Record<string, any>): void; + + /** Set the span status */ + setStatus(status: ISpanStatus): void; + + /** Update the span name */ + updateName(name: string): void; + + /** End the span */ + end(endTime?: number): void; + + /** Check if the span is recording */ + isRecording(): boolean; + + /** Get the span context */ + getSpanContext(): ISpanContext; +} + +/** + * Logger interface for emitting log records + */ +export interface IOTelLogger { + /** Emit a log record */ + emit(logRecord: ILogRecord): void; + + /** Log an info message */ + info(message: string, attributes?: Record<string, any>): void; + + /** Log a warning message */ + warn(message: string, attributes?: Record<string, any>): void; + + /** Log an error message */ + error(message: string, attributes?: Record<string, any>): void; + + /** Log a debug message */ + debug(message: string, attributes?: Record<string, any>): void; +} + +/** + * Meter interface for recording basic metrics + * Note: Only includes basic metric types - advanced features like observable metrics are excluded + */ +export interface IOTelMeter { + /** Create a counter metric */ + createCounter(name: string, options?: IMetricOptions): ICounter; + + /** Create a histogram metric */ + createHistogram(name: string, options?: IMetricOptions): IHistogram; + + /** Create a gauge metric */ + createGauge(name: string, options?: IMetricOptions): IGauge; +} +``` + +### Context Management Interfaces + +```typescript +/** + * Context manager interface for managing execution context + */ +export interface IOTelContextManager { + /** Get the active context */ + active(): IOTelContext; + + /** Execute a function with a specific context as active */ + with<T>(context: IOTelContext, fn: () => 
T): T; + + /** Create a new context with a span set as active */ + setSpan(context: IOTelContext, span: IOTelSpan): IOTelContext; + + /** Get the active span from a context */ + getSpan(context: IOTelContext): IOTelSpan | undefined; +} + +/** + * Context interface representing execution context + */ +export interface IOTelContext { + /** Get a value from the context */ + getValue(key: symbol): any; + + /** Set a value in the context */ + setValue(key: symbol, value: any): IOTelContext; + + /** Delete a value from the context */ + deleteValue(key: symbol): IOTelContext; +} +``` + +### Required Dependency Interfaces + +```typescript +/** + * ID generator interface - must be injected into trace provider + */ +export interface IOTelIdGenerator { + /** Generate a new trace ID */ + generateTraceId(): string; + + /** Generate a new span ID */ + generateSpanId(): string; +} + +/** + * Sampler interface - must be injected into trace provider + */ +export interface IOTelSampler { + /** Make sampling decision for a span */ + shouldSample(context: IOTelContext, traceId: string, spanName: string, spanKind: ISpanKind, attributes?: Record, links?: ISpanLink[]): ISamplingResult; +} + +/** + * Sampling result interface + */ +export interface ISamplingResult { + /** Sampling decision */ + decision: ISamplingDecision; + + /** Additional attributes to add to the span */ + attributes?: Record; + + /** Trace state to propagate */ + traceState?: string; +} + +/** + * Sampling decision enumeration + */ +export enum ISamplingDecision { + /** Do not record or export the span */ + NOT_RECORD = 0, + + /** Record the span but do not export */ + RECORD = 1, + + /** Record and export the span */ + RECORD_AND_SAMPLED = 2 +} + +/** + * Metric reader interface - must be injected into meter provider + */ +export interface IOTelMetricReader { + /** Collect metrics */ + collect(): Promise; + + /** Force flush metrics */ + forceFlush(): Promise; + + /** Shutdown the reader */ + shutdown(): Promise; +} 
+ +/** + * SDK instance statistics interface + */ +export interface ISDKInstanceStats { + /** Instance creation timestamp */ + createdAt: number; + + /** Last activity timestamp */ + lastActivityAt: number; + + /** Number of spans created */ + spansCreated: number; + + /** Number of logs emitted */ + logsEmitted: number; + + /** Number of metrics recorded */ + metricsRecorded: number; + + /** Current queue sizes */ + queueSizes: { + spans: number; + logs: number; + metrics: number; + }; + + /** Memory usage in bytes */ + memoryUsageBytes: number; +} +``` + +### Configuration and Options Interfaces + +```typescript +/** + * Options for creating a tracer + */ +export interface IOTelTracerOptions { + /** Schema URL for semantic conventions */ + schemaUrl?: string; + + /** Additional tracer attributes */ + attributes?: Record; +} + +/** + * Options for creating a logger + */ +export interface IOTelLoggerOptions { + /** Schema URL for semantic conventions */ + schemaUrl?: string; + + /** Additional logger attributes */ + attributes?: Record; +} + +/** + * Options for creating a meter + */ +export interface IOTelMeterOptions { + /** Schema URL for semantic conventions */ + schemaUrl?: string; + + /** Additional meter attributes */ + attributes?: Record; +} + +/** + * Options for creating a span + */ +export interface IOTelSpanOptions { + /** Span kind */ + kind?: ISpanKind; + + /** Start time for the span */ + startTime?: number; + + /** Initial attributes for the span */ + attributes?: Record; + + /** Links to other spans */ + links?: ISpanLink[]; + + /** Parent context */ + parent?: IOTelContext; +} + +/** + * Configuration for instrumentations + */ +export interface IOTelInstrumentationConfig { + /** Name of the instrumentation */ + name: string; + + /** Version of the instrumentation */ + version?: string; + + /** Whether the instrumentation is enabled */ + enabled?: boolean; + + /** Instrumentation-specific configuration */ + config?: Record; +} +``` + +### 
Processor and Exporter Interfaces + +```typescript +/** + * Span processor interface + */ +export interface IOTelSpanProcessor { + /** Called when a span is started */ + onStart(span: IOTelSpan, parentContext?: IOTelContext): void; + + /** Called when a span is ended */ + onEnd(span: IOTelSpan): void; + + /** Force flush any pending spans */ + forceFlush(): Promise; + + /** Shutdown the processor */ + shutdown(): Promise; +} + +/** + * Span exporter interface + */ +export interface ISpanExporter { + /** Export spans */ + export(spans: IOTelSpan[]): Promise; + + /** Shutdown the exporter */ + shutdown(): Promise; +} + +/** + * Log processor interface + */ +export interface IOTelLogProcessor { + /** Called when a log record is emitted */ + onEmit(logRecord: ILogRecord): void; + + /** Force flush any pending log records */ + forceFlush(): Promise; + + /** Shutdown the processor */ + shutdown(): Promise; +} + +/** + * Log exporter interface + */ +export interface ILogExporter { + /** Export log records */ + export(logRecords: ILogRecord[]): Promise; + + /** Shutdown the exporter */ + shutdown(): Promise; +} + +/** + * Metric processor interface + */ +export interface IMetricProcessor { + /** Process metric data */ + process(metrics: IMetricData[]): IMetricData[]; + + /** Shutdown the processor */ + shutdown(): Promise; +} + +/** + * Metric exporter interface + */ +export interface IMetricExporter { + /** Export metrics */ + export(metrics: IMetricData[]): Promise; + + /** Shutdown the exporter */ + shutdown(): Promise; +} +``` + +### Data Model Interfaces + +```typescript +/** + * Log record interface + */ +export interface ILogRecord { + /** Log timestamp */ + timestamp?: number; + + /** Log severity level */ + severityLevel?: ISeverityLevel; + + /** Log severity text */ + severityText?: string; + + /** Log message */ + body?: string; + + /** Log attributes */ + attributes?: Record; + + /** Resource attributes */ + resource?: IOTelResource; + + /** Instrumentation 
scope */ + instrumentationScope?: IInstrumentationScope; +} + +/** + * Metric data interface + */ +export interface IMetricData { + /** Metric name */ + name: string; + + /** Metric description */ + description?: string; + + /** Metric unit */ + unit?: string; + + /** Metric type */ + type: IMetricType; + + /** Metric data points */ + dataPoints: IDataPoint[]; + + /** Resource attributes */ + resource?: IOTelResource; + + /** Instrumentation scope */ + instrumentationScope?: IInstrumentationScope; +} + +/** + * Span context interface + */ +export interface ISpanContext { + /** Trace ID */ + traceId: string; + + /** Span ID */ + spanId: string; + + /** Trace flags */ + traceFlags: number; + + /** Trace state */ + traceState?: string; + + /** Whether the context is remote */ + isRemote?: boolean; +} + +/** + * Resource interface + */ +export interface IOTelResource { + /** Resource attributes */ + attributes: Record; + + /** Merge with another resource */ + merge(other: IOTelResource): IOTelResource; +} + +/** + * Instrumentation scope interface + */ +export interface IInstrumentationScope { + /** Instrumentation name */ + name: string; + + /** Instrumentation version */ + version?: string; + + /** Schema URL */ + schemaUrl?: string; + + /** Additional attributes */ + attributes?: Record; +} +``` + +### Factory Function Interfaces + +```typescript +/** + * Factory function to create standalone SDK instance + * All dependencies must be explicitly provided through config + */ +export function createOTelWebSdk(config: IOTelWebSdkConfig): IOTelWebSdk; + +/** + * Factory function for trace provider with dependency injection + */ +export function createTraceProvider(config: ITraceProviderConfig): IOTelTraceProvider; + +/** + * Factory function for log provider with dependency injection + */ +export function createLogProvider(config: ILogProviderConfig): IOTelLogProvider; + +/** + * Factory function for meter provider with dependency injection + */ +export function 
createMeterProvider(config: IMeterProviderConfig): IOTelMeterProvider; + +/** + * Trace provider configuration with explicit dependencies + */ +export interface ITraceProviderConfig { + /** REQUIRED: Resource information */ + resource: IOTelResource; + + /** REQUIRED: Span processors */ + spanProcessors: IOTelSpanProcessor[]; + + /** REQUIRED: Sampler implementation */ + sampler: IOTelSampler; + + /** REQUIRED: ID generator for span and trace IDs */ + idGenerator: IOTelIdGenerator; + + /** REQUIRED: Context manager */ + contextManager: IOTelContextManager; + + /** REQUIRED: Logger for internal diagnostics */ + logger: IOTelLogger; + + /** REQUIRED: Performance timing function */ + performanceNow: () => number; +} + +/** + * Log provider configuration with explicit dependencies + */ +export interface ILogProviderConfig { + /** REQUIRED: Resource information */ + resource: IOTelResource; + + /** REQUIRED: Log processors */ + logProcessors: IOTelLogProcessor[]; + + /** REQUIRED: Context manager */ + contextManager: IOTelContextManager; + + /** REQUIRED: Logger for internal diagnostics */ + logger: IOTelLogger; + + /** REQUIRED: Performance timing function */ + performanceNow: () => number; +} + +/** + * Meter provider configuration with explicit dependencies + */ +export interface IMeterProviderConfig { + /** REQUIRED: Resource information */ + resource: IOTelResource; + + /** REQUIRED: Metric readers */ + metricReaders: IOTelMetricReader[]; + + /** REQUIRED: Logger for internal diagnostics */ + logger: IOTelLogger; + + /** REQUIRED: Performance timing function */ + performanceNow: () => number; +} +``` + +## Enums and Constants + +```typescript +/** + * Span kind enumeration + */ +export enum ISpanKind { + INTERNAL = 0, + SERVER = 1, + CLIENT = 2, + PRODUCER = 3, + CONSUMER = 4 +} + +/** + * Span status code enumeration + */ +export enum ISpanStatusCode { + UNSET = 0, + OK = 1, + ERROR = 2 +} + +/** + * Severity level enumeration + */ +export enum ISeverityLevel { + 
TRACE = 1, + DEBUG = 5, + INFO = 9, + WARN = 13, + ERROR = 17, + FATAL = 21 +} + +/** + * Metric type enumeration + */ +export enum IMetricType { + COUNTER = 'Counter', + HISTOGRAM = 'Histogram', + UP_DOWN_COUNTER = 'UpDownCounter', + OBSERVABLE_GAUGE = 'ObservableGauge', + OBSERVABLE_COUNTER = 'ObservableCounter', + OBSERVABLE_UP_DOWN_COUNTER = 'ObservableUpDownCounter' +} + +/** + * Export result status enumeration + */ +export enum IExportResultCode { + SUCCESS = 0, + FAILED = 1, + FAILED_RETRYABLE = 2 +} +``` + +This comprehensive interface definition enables: + +1. **Type Safety**: Full TypeScript support with strict typing +2. **Interface-First Design**: All APIs defined as contracts before implementation +3. **Multi-Tenant Support**: Multiple SDK instance management with resource sharing +4. **Extensibility**: Plugin architecture through processor and exporter interfaces +5. **Testability**: All interfaces can be mocked for testing +6. **Standards Compliance**: Follows OpenTelemetry API specifications +7. **Resource Management**: Complete lifecycle and cleanup control +8. **Dependency Injection**: All dependencies explicitly provided, no global state +9. **Performance Optimization**: Designed for closure OR DynamicProto implementation patterns + +## Implementation Pattern Guidelines + +All interfaces are designed to be implemented using EITHER the closure pattern OR the DynamicProto-JS pattern: + +```typescript +// Example implementation pattern for IOTelTraceProvider +export function createTraceProvider(config: ITraceProviderConfig): IOTelTraceProvider { + // Validate all required dependencies are provided + if (!config.resource) { + throw new Error("Resource must be provided to TraceProvider"); + } + // ... 
validate other required dependencies + + // Private closure variables - completely encapsulated + let _config = { ...config }; + let _tracers = new Map(); + let _processors = [...config.spanProcessors]; + let _isShutdown = false; + + // Create the interface instance + let _self = {} as IOTelTraceProvider; + + // Define methods directly on the interface instance + _self.getTracer = (name: string, version?: string, options?: IOTelTracerOptions): IOTelTracer => { + if (_isShutdown) { + throw new Error("TraceProvider is shutdown"); + } + + const key = `${name}@${version || 'unknown'}`; + let tracer = _tracers.get(key); + + if (!tracer) { + tracer = createTracer({ + name, + version, + resource: _config.resource, // Injected dependency + processors: _processors, // Injected dependency + contextManager: _config.contextManager, // Injected dependency + logger: _config.logger, // Injected dependency + // All dependencies come from injected config + ...options + }); + _tracers.set(key, tracer); + } + + return tracer; + }; + + // Other method implementations... + + return _self; +} +``` + +### Key Implementation Principles + +1. **All Dependencies Injected**: No global variable access or singleton patterns +2. **Closure Encapsulation**: Private state completely hidden from external access +3. **Interface Compliance**: Implementation must satisfy the interface contract exactly +4. **Error Handling**: Comprehensive validation of injected dependencies +5. **Performance**: DynamicProto pattern optimizes method calls and memory usage +6. 
**Testability**: All dependencies can be mocked through the config parameter + +## Related Documentation + +### Operational Guides +- **[Testing Strategy](./OTelWebSdk-Testing.md)** - Interface testing patterns and mock strategies +- **[Performance Strategy](./OTelWebSdk-Performance.md)** - Performance optimization with interface constraints +- **[Migration Guide](./OTelWebSdk-Migration.md)** - Interface compatibility and migration patterns diff --git a/docs/planning/otel/specs/OTelWebSdk-Migration.md b/docs/planning/otel/specs/OTelWebSdk-Migration.md new file mode 100644 index 000000000..35efc15af --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Migration.md @@ -0,0 +1,169 @@ +# OpenTelemetry Web SDK - Migration Planning + +## Overview + +This document outlines the migration planning strategy for transitioning from existing telemetry solutions to the OpenTelemetry Web SDK. This is currently a planning document that will be expanded with detailed migration guides, tools, and examples once the OpenTelemetry Web SDK implementation is complete. + +## Future Migration Support + +### Planned Migration Scenarios + +The OpenTelemetry Web SDK will support migration from the following telemetry solutions: + +1. **Application Insights JavaScript SDK** + - Configuration mapping utilities + - API compatibility wrappers + - Data format converters + - Gradual migration tools + +2. **Other OpenTelemetry Implementations** + - Community OpenTelemetry Web SDK + - Custom OpenTelemetry setups + - Configuration migration tools + +3. 
**Custom Telemetry Solutions** + - Generic telemetry event mappers + - Custom instrumentation converters + - Data validation frameworks + +### Migration Strategy Framework + +#### Phase 1: Assessment and Planning +- Inventory current telemetry implementation +- Map existing telemetry to OpenTelemetry concepts +- Define migration scope and timeline +- Establish rollback procedures + +#### Phase 2: Parallel Implementation +- Install OpenTelemetry SDK alongside existing solution +- Configure basic instrumentation +- Validate data collection and export +- Compare telemetry data between systems + +#### Phase 3: Feature Migration +- Migrate core telemetry functionality +- Implement custom instrumentation +- Configure advanced features +- Validate business-critical scenarios + +#### Phase 4: Cutover and Cleanup +- Gradually increase OpenTelemetry traffic +- Monitor system performance and data quality +- Remove legacy instrumentation +- Complete migration validation + +## Planned Migration Tools + +### Configuration Converters +- **Application Insights Config Converter**: Automated conversion of AI configuration to OpenTelemetry format +- **Generic Config Mapper**: Flexible mapping system for custom telemetry configurations +- **Validation Utilities**: Tools to verify configuration correctness and completeness + +### API Compatibility Wrappers +- **Application Insights Compatibility Layer**: Drop-in replacement APIs for gradual migration +- **OpenTelemetry Community Bridge**: Compatibility with existing OpenTelemetry community implementations +- **Custom API Adapters**: Flexible adapters for proprietary telemetry APIs + +### Data Validation Framework +- **Dual Tracking Validator**: Side-by-side data collection and comparison +- **Migration Test Suite**: Automated testing for migration scenarios +- **Data Consistency Checker**: Validation of data integrity during migration + +## Migration Challenges and Solutions + +### Data Format Differences +**Challenge**: Different 
attribute naming conventions and data structures between telemetry systems. + +**Planned Solution**: +- Automated attribute mapping utilities +- Configurable data transformation pipelines +- Semantic convention alignment tools + +### Missing Functionality +**Challenge**: Legacy SDK features not directly available in OpenTelemetry. + +**Planned Solution**: +- Feature gap analysis tools +- Custom implementation bridges +- Alternative approach documentation + +### Performance Concerns +**Challenge**: Different performance characteristics during migration. + +**Planned Solution**: +- Performance comparison tools +- Optimization configuration guides +- Gradual rollout strategies + +## Migration Timeline Template + +### Pre-Migration Phase +- Document current telemetry implementation +- Analyze migration requirements +- Plan migration strategy +- Set up testing environments + +### Migration Implementation Phase +- Install OpenTelemetry Web SDK +- Configure parallel data collection +- Implement gradual migration +- Validate data consistency + +### Post-Migration Phase +- Monitor system performance +- Complete legacy system removal +- Update documentation +- Conduct migration review + +## Future Documentation + +Once the OpenTelemetry Web SDK implementation is complete, this document will be expanded to include: + +### Detailed Migration Guides +- Step-by-step migration procedures for each supported telemetry solution +- Code examples and configuration samples +- Common migration patterns and best practices + +### Migration Tools and Utilities +- Downloadable migration tools and converters +- Automated testing frameworks +- Data validation utilities + +### Example Migration Projects +- Sample migration implementations +- Before/after code comparisons +- Performance impact analysis + +### Troubleshooting and Support +- Common migration issues and solutions +- Performance optimization guides +- Expert consultation resources + +## Next Steps + +1. 
**Complete OpenTelemetry Web SDK Core Implementation** + - Finalize SDK architecture and APIs + - Implement core telemetry functionality + - Establish configuration patterns + +2. **Develop Migration Framework** + - Build configuration conversion tools + - Create API compatibility layers + - Implement data validation utilities + +3. **Create Migration Documentation** + - Detailed migration guides for each scenario + - Code examples and best practices + - Performance optimization guides + +4. **Test Migration Scenarios** + - Validate migration tools with real applications + - Performance impact assessment + - Data consistency verification + +## References + +- [OpenTelemetry Web SDK Architecture](./OTelWebSdk-Architecture.md) +- [Testing Strategy](./OTelWebSdk-Testing.md) +- [Performance Strategy](./OTelWebSdk-Performance.md) diff --git a/docs/planning/otel/specs/OTelWebSdk-Performance.md b/docs/planning/otel/specs/OTelWebSdk-Performance.md new file mode 100644 index 000000000..03ac6b327 --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Performance.md @@ -0,0 +1,227 @@ +# OpenTelemetry Web SDK - Performance Strategy + +## Overview + +This document outlines the performance strategy for the OpenTelemetry Web SDK implementation, covering performance targets, optimization techniques, monitoring strategies, and performance testing approaches. + +## Performance Philosophy + +### Core Principles +1. **Minimal Impact**: The SDK should have negligible impact on application performance +2. **Lazy Initialization**: Components should be initialized only when needed +3. **Efficient Data Structures**: Use memory-efficient data structures and algorithms +4. **Batch Processing**: Group operations to reduce overhead +5. 
**Tree-Shaking Friendly**: Enable aggressive dead code elimination + +### Performance-First Design +- **Zero-cost abstractions** where possible +- **Avoid synchronous operations** that could block the main thread +- **Minimize memory allocations** in hot paths +- **Use object pooling** for frequently created objects +- **Implement efficient cleanup** to prevent memory leaks + +## Performance Optimization Techniques + +### 1. Lazy Initialization Pattern + +Implement lazy initialization to defer expensive operations until they are actually needed: + +- Store provider instances as local variables within the Dynamic Proto closure +- Only create providers when first accessed +- Use performance measurements to track initialization costs +- Ensure all private state remains in the constructor closure + +### 2. Object Pooling for Spans + +Implement object pooling to reduce memory allocations: + +- Maintain a pool of reusable span objects +- Reset objects instead of creating new ones +- Set maximum pool size to prevent memory growth +- Clear data structures efficiently without recreating objects + +### 3. Efficient Attribute Management + +Optimize attribute storage and access: + +- Use type-specific Maps for different attribute types (string, number, boolean) +- Implement lazy Map creation to save memory when attributes are empty +- Provide efficient iteration without object creation +- Minimize memory allocations during attribute operations + +### 4. 
Batch Processing Optimization + +Group operations to reduce overhead: + +- Batch telemetry items before processing +- Use configurable batch sizes and timeouts +- Implement efficient flush mechanisms +- Clean up timers and resources properly + +## Performance Monitoring and Metrics + +### Internal Performance Tracking + +Implement performance tracking capabilities: + +- Track operation durations and performance metrics +- Monitor memory usage and resource consumption +- Check operations against performance budgets +- Report budget violations and performance regressions + +### Performance Metrics Collection + +Collect comprehensive performance data: + +- **SDK operation metrics**: Initialization, runtime operations, memory usage +- **Browser performance metrics**: Navigation timing, resource timing, frame rates +- **Runtime metrics**: Operation rates, batch sizes, export frequency +- **Memory metrics**: Heap usage, span memory consumption, context overhead + +## Browser Performance Integration + +### Performance Observer Integration + +Leverage browser performance APIs: + +- Use PerformanceObserver to monitor SDK operations +- Track performance entries for SDK-specific operations +- Create performance marks for key SDK operations +- Measure operation durations using browser timing APIs +- Provide fallbacks for browsers without PerformanceObserver support + +### Resource Impact Monitoring + +Monitor SDK impact on application resources: + +- Establish baseline metrics before SDK initialization +- Track memory usage, CPU time, and network requests +- Monitor DOM node count and other resource indicators +- Calculate SDK impact relative to baseline +- Report resource consumption metrics + +## Performance Testing Strategy + +### Benchmark Suite + +Implement comprehensive performance benchmarking: + +- **Initialization benchmarks**: SDK startup, provider creation, first operation +- **Runtime benchmarks**: Span creation, attribute operations, context operations +- **Memory 
benchmarks**: Memory usage patterns, cleanup efficiency, leak detection +- **Export benchmarks**: Batch preparation, serialization, network operations +- **Regression detection**: Compare against historical performance data + +### Performance Regression Detection + +Monitor for performance degradations: + +- Maintain historical performance data +- Calculate statistical baselines and thresholds +- Detect significant performance regressions +- Report regression severity and impact +- Provide actionable insights for performance issues + +## Performance Optimization Guidelines + +### Code-Level Optimizations + +#### 1. Minimize Object Creation +Avoid creating objects in hot paths: +- **Problem**: Creating closures and objects repeatedly in frequently called functions +- **Solution**: Reuse functions and minimize allocations by creating shared utilities +- **Pattern**: Create helper functions once and reuse them across operations + +#### 2. Use Efficient Data Structures +Choose appropriate data structures for access patterns: +- **Problem**: Using arrays for frequent lookups (O(n) complexity) +- **Solution**: Use Maps for O(1) lookups when key-based access is needed +- **Pattern**: Match data structure to usage pattern for optimal performance + +#### 3. 
Implement Lazy Evaluation +Defer expensive computations until needed: +- **Pattern**: Cache results and only recompute when data changes +- **Implementation**: Use dirty flags to track when cached data needs refreshing +- **Benefit**: Avoid repeated expensive operations on unchanged data + +### Bundle Size Optimization + +#### Tree-Shaking Configuration +Ensure optimal dead code elimination: +- Export individual functions and classes rather than default exports +- Avoid side effects in module initialization +- Use proper TypeScript configuration for tree-shaking +- Structure code to enable granular imports + +#### Dynamic Imports for Optional Features +Load features on demand: +- Use dynamic imports for advanced or optional functionality +- Implement feature detection and progressive enhancement +- Load plugins and extensions only when needed +- Reduce initial bundle size through code splitting + +## Performance Best Practices + +### Development Guidelines +1. **Profile Before Optimizing**: Use browser dev tools to identify actual bottlenecks +2. **Measure Everything**: Add performance measurements to all critical paths +3. **Optimize for Common Cases**: Focus on the 80% use case first +4. **Use Performance Budgets**: Set and enforce performance limits +5. **Monitor Continuously**: Track performance metrics in CI/CD + +### Runtime Guidelines +1. **Minimize Main Thread Blocking**: Use `requestIdleCallback` for non-critical work +2. **Batch Operations**: Group similar operations together +3. **Use Passive Event Listeners**: Avoid blocking scroll and touch events +4. **Implement Progressive Loading**: Load features as needed +5. **Clean Up Resources**: Always dispose of resources properly + +### Memory Management +1. **Avoid Memory Leaks**: Remove event listeners and clear references +2. **Use Object Pooling**: Reuse objects in hot paths +3. **Implement Weak References**: Use WeakMap/WeakSet where appropriate +4. 
**Monitor Memory Usage**: Track heap size and garbage collection +5. **Set Memory Limits**: Implement bounds on data structures + +## Performance Targets + +**Note**: Performance targets will be validated through comprehensive benchmarking during implementation. Targets are based on web application requirements and Application Insights SDK performance analysis. + +### Initialization Performance +- SDK initialization: < 5ms (p95) +- Provider creation: < 2ms (p95) +- First span creation: < 1ms (p95) + +### Runtime Performance +- Span creation: < 0.1ms (p95) +- Attribute addition: < 0.05ms (p95) +- Context propagation: < 0.1ms (p95) +- Span completion: < 0.2ms (p95) + +### Memory Usage +- Maximum spans in memory: 1000 spans +- Memory per span: < 1KB +- Total SDK memory usage: < 10MB +- Memory cleanup: > 95% after flush + +### Bundle Size +- Core SDK: < 50KB gzipped +- Full SDK with all providers: < 100KB gzipped +- Individual providers: < 20KB gzipped +- Tree-shaking efficiency: > 80% + +## Future Performance Enhancements + +### Planned Optimizations +1. **WebAssembly Integration**: High-performance serialization and processing +2. **Worker Thread Support**: Offload heavy processing to workers +3. **Streaming Processing**: Process telemetry data in streams +4. **Advanced Caching**: Intelligent caching strategies for repeated operations +5. **Predictive Loading**: Preload resources based on usage patterns + +### Performance Research Areas +1. **Machine Learning Optimization**: Use ML to optimize sampling and batching +2. **Browser API Innovation**: Leverage new browser APIs for better performance +3. **Protocol Optimization**: Optimize export protocols for efficiency +4. **Adaptive Algorithms**: Adjust behavior based on device capabilities +5. 
**Edge Computing**: Process telemetry closer to users diff --git a/docs/planning/otel/specs/OTelWebSdk-TelemetryInitializers.md b/docs/planning/otel/specs/OTelWebSdk-TelemetryInitializers.md new file mode 100644 index 000000000..207408eb1 --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-TelemetryInitializers.md @@ -0,0 +1,936 @@ +# OTelWebSdk Telemetry Initializers Specification + +## Overview + +The OTelWebSdk provides comprehensive telemetry processing capabilities through both **lightweight telemetry initializers** (a core SDK feature) and full OpenTelemetry processors. This dual approach enables teams to choose the optimal processing mechanism for their specific performance requirements while maintaining full OpenTelemetry ecosystem standards compliance. + +## Design Philosophy + +### **Standards Compliance First** + +The OTelWebSdk is built on OpenTelemetry standards as the primary foundation: + +- **Full OpenTelemetry Processor Support**: Complete implementation of OpenTelemetry `SpanProcessor`, `LogRecordProcessor`, and `MetricProcessor` interfaces +- **OpenTelemetry API Compatibility**: All telemetry flows through standard OpenTelemetry APIs (TraceAPI, LogsAPI, MetricsAPI) +- **Ecosystem Interoperability**: Works seamlessly with existing OpenTelemetry instrumentations, exporters, and processors +- **Specification Compliance**: Adheres to OpenTelemetry semantic conventions and data models + +### **Performance-Optimized Lightweight Processors** + +Lightweight telemetry initializers are provided as a **first-class SDK feature** designed specifically for web application performance requirements: + +- **Performance Optimization**: ~1-2μs per telemetry item vs ~5-10μs for full OpenTelemetry processors +- **Bundle Size Efficiency**: ~0.5KB vs 2-3KB per processor for simple scenarios +- **Web Application Focus**: Purpose-built for common web telemetry processing patterns +- **Simplified Development**: Callback-based pattern optimized for data enrichment and 
filtering +- **Application Insights Compatibility**: Maintains familiar patterns while providing modern OpenTelemetry foundation + +### **Hybrid Processing Pipeline** + +The SDK supports both processing approaches simultaneously: + +```typescript +// Full OpenTelemetry processors (primary approach) +sdk.addSpanProcessor(new BatchSpanProcessor(exporter)); +sdk.addLogRecordProcessor(new SimpleLogRecordProcessor(logExporter)); + +// Lightweight initializers (supplementary approach) +sdk.addTelemetryInitializer(addUserContext); +sdk.addTelemetryInitializer(sanitizeUrls); +``` + +## OpenTelemetry Processor Support + +### **Standard OpenTelemetry Processors** + +The SDK provides complete support for all OpenTelemetry processor types: + +#### **Span Processors** + +```typescript +import type { Context } from '@opentelemetry/api'; +import { BatchSpanProcessor, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base'; +import type { ReadableSpan, Span, SpanProcessor } from '@opentelemetry/sdk-trace-base'; + +// Batch processing for production +const batchProcessor = new BatchSpanProcessor(spanExporter, { + maxExportBatchSize: 100, + scheduledDelayMillis: 5000, + exportTimeoutMillis: 30000, + maxQueueSize: 2048 +}); + +sdk.addSpanProcessor(batchProcessor); + +// Simple processing for development +const simpleProcessor = new SimpleSpanProcessor(spanExporter); +sdk.addSpanProcessor(simpleProcessor); + +// Custom span processor +class CustomSpanProcessor implements SpanProcessor { + onStart(span: Span, parentContext: Context): void { + // Custom span enrichment + span.setAttributes({ + 'custom.processor': 'active', + 'custom.startTime': Date.now() + }); + } + + onEnd(span: ReadableSpan): void { + // span.duration is an HrTime tuple [seconds, nanoseconds]; convert to milliseconds + const duration = span.duration[0] * 1000 + span.duration[1] / 1000000; + if (duration > 1000) { + console.warn('Slow span detected:', span.name, duration + 'ms'); + } + } + + shutdown(): Promise<void> { + return Promise.resolve(); + } + + forceFlush(): Promise<void> { + return Promise.resolve(); + } +} + +sdk.addSpanProcessor(new CustomSpanProcessor()); +``` + +#### **Log Record Processors** +
+```typescript +import { SimpleLogRecordProcessor, BatchLogRecordProcessor } from '@opentelemetry/sdk-logs'; + +// Batch log processing +const logBatchProcessor = new BatchLogRecordProcessor(logExporter, { + maxExportBatchSize: 50, + scheduledDelayMillis: 3000, + exportTimeoutMillis: 10000, + maxQueueSize: 1000 +}); + +sdk.addLogRecordProcessor(logBatchProcessor); + +// Custom log processor +class CustomLogProcessor implements LogRecordProcessor { + onEmit(logRecord: LogRecord): void { + // Skip enrichment for debug logs outside development. + // Note: with standard processor semantics an early return only skips this + // processor's work; it does not by itself drop the record from the pipeline. + if (logRecord.severityText === 'DEBUG' && !isDevelopment()) { + return; + } + + // Enrich log records + logRecord.setAttributes({ + 'log.processor': 'custom', + 'log.timestamp': Date.now() + }); + } + + shutdown(): Promise<void> { + return Promise.resolve(); + } + + forceFlush(): Promise<void> { + return Promise.resolve(); + } +} + +sdk.addLogRecordProcessor(new CustomLogProcessor()); +``` + +#### **Metric Processors** + +```typescript +// Custom metric processor for advanced scenarios +class CustomMetricProcessor implements MetricProcessor { + process(metrics: MetricData[]): MetricData[] { + return metrics.map(metric => { + // Add custom attributes to all metrics + return { + ...metric, + attributes: { + ...metric.attributes, + 'metric.processor': 'custom', + 'metric.environment': getEnvironment() + } + }; + }); + } +} + +// Note: Metric processors are less commonly customized +// Most metric processing happens at the exporter level +``` + +### **OpenTelemetry Processing Benefits** + +- **Complete Lifecycle Control**: Full control over telemetry from creation to export +- **Advanced Batching**: Sophisticated batching strategies for optimal performance +- **Error Handling**: Comprehensive error handling with retry logic +- **Resource Management**: Advanced resource management and cleanup +- **Extensibility**: Full processor lifecycle hooks for complex scenarios +- **Export Control**: Fine-grained control over when and how telemetry is 
exported + +## Application Insights-Style Telemetry Initializers + +### **Lightweight Callback Pattern** + +Telemetry initializers provide a simple callback-based approach for common telemetry processing scenarios: + +```typescript +interface ITelemetryInitializer { + (item: ITelemetryItem): boolean | void; +} + +interface ITelemetryItem { + name: string; + time: string; + baseType: string; + baseData: any; + data: { + customProperties?: { [key: string]: string }; + customMeasurements?: { [key: string]: number }; + }; + tags?: { [key: string]: string }; +} +``` + +### **Common Use Cases** + +#### **Data Enrichment** + +```typescript +// Add application context to all telemetry +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['app.version'] = getAppVersion(); + item.data.customProperties['app.environment'] = getEnvironment(); + item.data.customProperties['app.region'] = getRegion(); + + // Add build information + const buildInfo = getBuildInfo(); + item.data.customProperties['build.number'] = buildInfo.number; + item.data.customProperties['build.commit'] = buildInfo.commit; + + return true; // Continue processing +}); + +// Add user context +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const userContext = getCurrentUser(); + if (userContext) { + item.tags = item.tags || {}; + item.tags['ai.user.id'] = userContext.anonymousId; + + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['user.role'] = userContext.role; + item.data.customProperties['user.subscription'] = userContext.subscription; + item.data.customProperties['user.tenant'] = userContext.tenant; + } +}); + +// Add performance context +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const performance = getPerformanceMetrics(); + if (performance) { + item.data.customMeasurements = item.data.customMeasurements || {}; + 
item.data.customMeasurements['memory.used'] = performance.memory.usedJSHeapSize; + item.data.customMeasurements['memory.total'] = performance.memory.totalJSHeapSize; + item.data.customMeasurements['timing.domContentLoaded'] = performance.timing.domContentLoadedEventEnd; + } +}); +``` + +#### **Privacy and GDPR Compliance** + +```typescript +// URL sanitization for privacy compliance +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + if (item.baseType === 'PageviewData' && item.baseData?.uri) { + const uri = new URL(item.baseData.uri); + + // Remove sensitive query parameters + const sensitiveParams = [ + 'ssn', 'email', 'password', 'token', 'session', 'auth', + 'api_key', 'secret', 'private', 'confidential' + ]; + + // Remove every parameter whose name contains a sensitive term (case-insensitive). + // Snapshot the keys first: deleting while iterating searchParams can skip entries. + Array.from(uri.searchParams.keys()).forEach(key => { + const lowerKey = key.toLowerCase(); + if (sensitiveParams.some(param => lowerKey.includes(param))) { + uri.searchParams.delete(key); + } + }); + + item.baseData.uri = uri.toString(); + } + + return true; +}); + +// PII removal from exception messages +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + if (item.baseType === 'ExceptionData' && item.baseData?.exceptions) { + item.baseData.exceptions.forEach(exception => { + if (exception.message) { + // Remove email patterns + exception.message = exception.message.replace( + /[\w.-]+@[\w.-]+\.\w+/g, + '[email]' + ); + + // Remove phone patterns + exception.message = exception.message.replace( + /\b(\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})\b/g, + '[phone]' + ); + + // Remove SSN patterns + exception.message = exception.message.replace( + /\b\d{3}-?\d{2}-?\d{4}\b/g, + '[ssn]' + ); + + // Remove credit card patterns + exception.message = exception.message.replace( + /\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/g, + '[credit-card]' + ); + } + + // Sanitize stack traces + if (exception.stack) { + exception.stack = sanitizeStackTrace(exception.stack); + } +
}); + } + + return true; +}); + +// Remove sensitive data from custom properties +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + if (item.data.customProperties) { + const sensitiveKeys = [ + 'password', 'secret', 'token', 'key', 'auth', 'credential', + 'ssn', 'social', 'credit', 'card', 'account', 'routing' + ]; + + Object.keys(item.data.customProperties).forEach(key => { + const lowerKey = key.toLowerCase(); + if (sensitiveKeys.some(sensitive => lowerKey.includes(sensitive))) { + item.data.customProperties[key] = '[redacted]'; + } + }); + } + + return true; +}); +``` + +#### **Performance-Based Sampling and Filtering** + +```typescript +// Intelligent sampling based on telemetry characteristics +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + // High-frequency event sampling + if (item.name === 'mouse-move' || item.name === 'scroll') { + // Sample at 1% for high-frequency events + return Math.random() < 0.01; + } + + // Dependency duration-based sampling + if (item.baseType === 'RemoteDependencyData' && item.baseData?.duration) { + const duration = parseFloat(item.baseData.duration); + + // Always keep slow requests (>2 seconds) + if (duration > 2000) return true; + + // Sample fast requests (< 100ms) at 10% + if (duration < 100) return Math.random() < 0.1; + + // Keep medium requests (100ms-2s) at 50% + return Math.random() < 0.5; + } + + // Success vs error sampling + if (item.baseType === 'RequestData') { + const success = item.baseData?.success; + if (success === false) { + // Always keep errors + return true; + } else { + // Sample successful requests at 25% + return Math.random() < 0.25; + } + } + + return true; // Keep by default +}); + +// Rate limiting for specific events +const eventRateLimiter = createRateLimiter({ + 'user-click': { maxPerMinute: 60 }, + 'page-scroll': { maxPerMinute: 30 }, + 'api-call': { maxPerMinute: 100 } +}); + +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + if (item.baseType === 'EventData') { + 
const eventName = item.name; + if (!eventRateLimiter.allow(eventName)) { + // Event exceeds rate limit + return false; + } + } + + return true; +}); +``` + +#### **Business Context Enrichment** + +```typescript +// Add business-specific context +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const businessContext = getBusinessContext(); + if (businessContext) { + item.data.customProperties = item.data.customProperties || {}; + + // Organizational context + item.data.customProperties['business.unit'] = businessContext.unit; + item.data.customProperties['cost.center'] = businessContext.costCenter; + item.data.customProperties['department'] = businessContext.department; + item.data.customProperties['region'] = businessContext.region; + + // Customer context + item.data.customProperties['customer.tier'] = businessContext.customerTier; + item.data.customProperties['customer.segment'] = businessContext.segment; + + // Feature context + const activeFeatures = getActiveFeatureFlags(); + Object.entries(activeFeatures).forEach(([feature, enabled]) => { + item.data.customProperties[`feature.${feature}`] = enabled.toString(); + }); + } + + return true; +}); + +// Add session-specific business context +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const sessionContext = getSessionContext(); + if (sessionContext) { + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['session.duration'] = sessionContext.getDuration().toString(); + item.data.customProperties['session.pageViews'] = sessionContext.pageViewCount.toString(); + item.data.customProperties['session.userActions'] = sessionContext.userActionCount.toString(); + item.data.customProperties['session.errors'] = sessionContext.errorCount.toString(); + + // Shopping cart context (for e-commerce) + if (sessionContext.cart) { + item.data.customMeasurements = item.data.customMeasurements || {}; + item.data.customMeasurements['cart.items'] = 
sessionContext.cart.itemCount; + item.data.customMeasurements['cart.value'] = sessionContext.cart.totalValue; + } + } + + return true; +}); +``` + +### **Advanced Initializer Patterns** + +#### **Conditional Processing** + +```typescript +// Environment-specific processing +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const environment = getEnvironment(); + + if (environment === 'development') { + // Add debug information in development + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['debug.enabled'] = 'true'; + item.data.customProperties['debug.timestamp'] = new Date().toISOString(); + + // Add source location for debugging + if (item.baseType === 'ExceptionData') { + const stack = new Error().stack; + item.data.customProperties['debug.sourceLocation'] = extractSourceLocation(stack); + } + } else if (environment === 'production') { + // Remove debug properties in production + if (item.data.customProperties) { + Object.keys(item.data.customProperties).forEach(key => { + if (key.startsWith('debug.')) { + delete item.data.customProperties[key]; + } + }); + } + } + + return true; +}); + +// User role-based processing +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const user = getCurrentUser(); + + if (user?.role === 'admin') { + // Add additional context for admin users + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['admin.privileges'] = user.privileges.join(','); + item.data.customProperties['admin.lastLogin'] = user.lastLogin?.toISOString(); + } else if (user?.role === 'beta-tester') { + // Add beta testing context + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['beta.cohort'] = user.betaCohort; + item.data.customProperties['beta.features'] = user.enabledBetaFeatures.join(','); + } + + return true; +}); +``` + +#### **Error Handling and Resilience** + +```typescript +// Robust error handling in 
initializers +sdk.addTelemetryInitializer((item: ITelemetryItem) => { + const startTime = performance.now(); + + try { + // Potentially expensive operation + const enrichmentData = getExpensiveEnrichmentData(); + + if (enrichmentData) { + item.data.customProperties = item.data.customProperties || {}; + Object.assign(item.data.customProperties, enrichmentData); + } + + const processingTime = performance.now() - startTime; + + // Monitor initializer performance + if (processingTime > 5) { // 5ms threshold + console.warn(`Slow telemetry initializer: ${processingTime}ms`); + + // Track slow initializer performance + trackSlowInitializer({ + duration: processingTime, + itemType: item.baseType, + itemName: item.name + }); + } + + return true; + } catch (error) { + // Graceful error handling - don't block telemetry + console.error('Telemetry initializer error:', error); + + // Optionally add error context + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['initializer.error'] = error.message; + + return true; // Continue processing despite errors + } +}); +``` + +#### **Chained Processing** + +```typescript +// Initializers execute in registration order +// Design for proper chaining and dependencies + +// 1. First: Add basic context +sdk.addTelemetryInitializer(addBasicContext); + +// 2. Second: Add user context (may depend on basic context) +sdk.addTelemetryInitializer(addUserContext); + +// 3. Third: Add business context (may depend on user context) +sdk.addTelemetryInitializer(addBusinessContext); + +// 4. Fourth: Apply privacy filtering (after all context is added) +sdk.addTelemetryInitializer(applyPrivacyFiltering); + +// 5. 
Fifth: Apply sampling (final step before sending) +sdk.addTelemetryInitializer(applySampling); + +function addBasicContext(item: ITelemetryItem): boolean { + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['app.version'] = APP_VERSION; + item.data.customProperties['app.environment'] = ENVIRONMENT; + return true; +} + +function addUserContext(item: ITelemetryItem): boolean { + const user = getCurrentUser(); + if (user) { + // Normally runs after addBasicContext, but do not assume customProperties already exists + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['user.id'] = user.id; + item.data.customProperties['user.type'] = user.type; + } + return true; +} + +function addBusinessContext(item: ITelemetryItem): boolean { + // Business context may depend on user context + const user = getCurrentUser(); + if (user?.organization) { + item.data.customProperties = item.data.customProperties || {}; + item.data.customProperties['org.id'] = user.organization.id; + item.data.customProperties['org.tier'] = user.organization.tier; + } + return true; +} + +function applyPrivacyFiltering(item: ITelemetryItem): boolean { + // Apply privacy rules after all context is added + return sanitizeForPrivacy(item); +} + +function applySampling(item: ITelemetryItem): boolean { + // Final sampling decision based on complete context + return shouldSampleItem(item); +} +``` + +## Performance Comparison + +### **Telemetry Initializers vs OpenTelemetry Processors** + +| Aspect | Telemetry Initializers | OpenTelemetry Processors | +|--------|------------------------|---------------------------| +| **Processing Time** | ~1-2μs per item | ~5-10μs per item | +| **Bundle Size** | ~0.5KB per initializer | ~2-3KB per processor | +| **Memory Overhead** | Minimal (callback only) | Higher (processor instance + state) | +| **Initialization Cost** | Near-zero | Class instantiation + setup | +| **Error Handling** | Simple try/catch | Complex state management | +| **Lifecycle Management** | Automatic | Manual (onStart, onEnd, shutdown) | +| **Flexibility** | Simple callback pattern | 
Full processor lifecycle | +| **Use Cases** | Data enrichment, filtering | Complex transformations, batching | +| **Learning Curve** | Minimal | OpenTelemetry knowledge required | +| **Migration Effort** | Direct from App Insights | Requires refactoring | + +### **When to Use Each Approach** + +#### **Use Telemetry Initializers For:** + +- **Data Enrichment**: Adding context, properties, and measurements +- **Privacy Filtering**: Removing PII and sensitive data +- **Simple Sampling**: Basic sampling and rate limiting +- **Context Addition**: User, session, business context +- **Quick Filtering**: Simple true/false filtering decisions +- **Migration**: Moving from Application Insights +- **Performance Critical**: Low-latency scenarios +- **Simple Logic**: Straightforward transformations + +#### **Use OpenTelemetry Processors For:** + +- **Complex Processing**: Advanced telemetry transformations +- **Batching Logic**: Custom batching strategies +- **Export Control**: Fine-grained export management +- **Lifecycle Hooks**: Need onStart, onEnd callbacks +- **State Management**: Processing that requires state +- **Resource Management**: Advanced resource handling +- **Standard Compliance**: Pure OpenTelemetry compatibility +- **Advanced Scenarios**: Complex business logic + +## Migration from Application Insights + +### **Direct Migration Path** + +Existing Application Insights telemetry initializers can be migrated with minimal changes: + +```typescript +// Existing Application Insights pattern +appInsights.addTelemetryInitializer((envelope) => { + envelope.tags['ai.user.authUserId'] = getCurrentUserId(); + envelope.data.baseData.properties = envelope.data.baseData.properties || {}; + envelope.data.baseData.properties['customProperty'] = 'value'; + return true; +}); + +// Equivalent OTelWebSdk pattern +sdk.addTelemetryInitializer((item) => { + item.tags = item.tags || {}; + item.tags['ai.user.authUserId'] = getCurrentUserId(); + + item.data.customProperties = 
item.data.customProperties || {}; + item.data.customProperties['customProperty'] = 'value'; + + return true; // Must explicitly return true +}); +``` + +### **Migration Differences** + +| Application Insights | OTelWebSdk | Notes | +|----------------------|-------------|-------| +| `envelope.tags` | `item.tags` | Same structure | +| `envelope.data.baseData.properties` | `item.data.customProperties` | Simplified path | +| `envelope.data.baseData.measurements` | `item.data.customMeasurements` | Simplified path | +| `envelope.name` | `item.name` | Same | +| `envelope.time` | `item.time` | Same | +| Return value optional | Return value required | Must return boolean | + +### **Migration Utilities** + +```typescript +// Utility to wrap existing Application Insights initializers +function wrapAppInsightsInitializer( + aiInitializer: (envelope: any) => boolean | void +): ITelemetryInitializer { + return (item: ITelemetryItem) => { + // Convert OTelWebSdk item to Application Insights envelope format + const envelope = { + name: item.name, + time: item.time, + tags: item.tags || {}, + data: { + baseType: item.baseType, + baseData: { + ...item.baseData, + properties: item.data.customProperties || {}, + measurements: item.data.customMeasurements || {} + } + } + }; + + // Call original initializer + const result = aiInitializer(envelope); + + // Convert back to OTelWebSdk format + item.tags = envelope.tags; + item.data.customProperties = envelope.data.baseData.properties; + item.data.customMeasurements = envelope.data.baseData.measurements; + + return result !== false; // Convert to boolean + }; +} + +// Usage +const existingAIInitializer = (envelope) => { + envelope.tags['custom.tag'] = 'value'; + return true; +}; + +sdk.addTelemetryInitializer( + wrapAppInsightsInitializer(existingAIInitializer) +); +``` + +## Integration with OpenTelemetry Ecosystem + +### **Hybrid Processing Pipeline** + +The SDK supports seamless integration between telemetry initializers and 
OpenTelemetry processors: + +```typescript +// Processing order: +// 1. Telemetry Initializers (lightweight, fast) +// 2. OpenTelemetry Processors (full featured) +// 3. Export + +// Step 1: Lightweight processing +sdk.addTelemetryInitializer(addUserContext); +sdk.addTelemetryInitializer(sanitizeUrls); +sdk.addTelemetryInitializer(applySampling); + +// Step 2: Full OpenTelemetry processing +sdk.addSpanProcessor(new BatchSpanProcessor(spanExporter)); +sdk.addLogRecordProcessor(new BatchLogRecordProcessor(logExporter)); + +// Step 3: Export happens automatically through processors +``` + +### **Data Flow** + +``` +Telemetry Event + ↓ +Telemetry Initializers (Application Insights style) + ↓ +OpenTelemetry API Layer + ↓ +OpenTelemetry Processors + ↓ +OpenTelemetry Exporters + ↓ +Backend Systems +``` + +### **Best Practices** + +#### **Use Initializers for Fast Path Processing** + +```typescript +// Fast path: Use initializers for simple, common operations +sdk.addTelemetryInitializer(addUserContext); // ~1μs +sdk.addTelemetryInitializer(addAppVersion); // ~1μs +sdk.addTelemetryInitializer(sanitizeUrls); // ~2μs + +// Slow path: Use processors for complex operations +sdk.addSpanProcessor(new ComplexAnalyticsProcessor()); // ~10μs +``` + +#### **Maintain Clear Separation** + +```typescript +// Data enrichment → Initializers +sdk.addTelemetryInitializer(addBasicContext); + +// Complex processing → Processors +sdk.addSpanProcessor(new AdvancedProcessor()); + +// Export control → Processors +sdk.addSpanProcessor(new BatchSpanProcessor(exporter)); +``` + +## Configuration and Management + +### **Dynamic Initializer Management** + +```typescript +// Add initializers +const userContextInitializer = (item) => { + // Add user context + return addUserContext(item); +}; + +sdk.addTelemetryInitializer(userContextInitializer); + +// Remove initializers +sdk.removeTelemetryInitializer(userContextInitializer); + +// Conditional registration +if (featureFlags.enableUserTracking) { + 
sdk.addTelemetryInitializer(userContextInitializer); +} +``` + +### **Initializer Registry** + +```typescript +class InitializerRegistry { + private initializers = new Map<string, ITelemetryInitializer>(); + + register(name: string, initializer: ITelemetryInitializer): void { + // Replace any initializer already registered under this name so it is not left attached to the SDK + this.unregister(name); + this.initializers.set(name, initializer); + sdk.addTelemetryInitializer(initializer); + } + + unregister(name: string): void { + const initializer = this.initializers.get(name); + if (initializer) { + sdk.removeTelemetryInitializer(initializer); + this.initializers.delete(name); + } + } + + list(): string[] { + return Array.from(this.initializers.keys()); + } +} + +// Usage +const registry = new InitializerRegistry(); +registry.register('userContext', addUserContext); +registry.register('businessContext', addBusinessContext); +registry.unregister('userContext'); +``` + +## Testing and Debugging + +### **Testing Initializers** + +```typescript +// Unit testing telemetry initializers +describe('User Context Initializer', () => { + afterEach(() => { + // Restore spies so one test's mock does not leak into the next + jest.restoreAllMocks(); + }); + + it('should add user context when user is present', () => { + // Arrange + const mockUser = { id: '123', type: 'admin' }; + jest.spyOn(userService, 'getCurrentUser').mockReturnValue(mockUser); + + const item: ITelemetryItem = { + name: 'test-event', + time: new Date().toISOString(), + baseType: 'EventData', + baseData: {}, + data: {} + }; + + // Act + const result = addUserContext(item); + + // Assert: addUserContext writes 'user.id' and 'user.type' + expect(result).toBe(true); + expect(item.data.customProperties['user.id']).toBe('123'); + expect(item.data.customProperties['user.type']).toBe('admin'); + }); + + it('should handle missing user gracefully', () => { + // Arrange + jest.spyOn(userService, 'getCurrentUser').mockReturnValue(null); + + const item: ITelemetryItem = { + name: 'test-event', + time: new Date().toISOString(), + baseType: 'EventData', + baseData: {}, + data: {} + }; + + // Act + const result = addUserContext(item); + + // Assert + expect(result).toBe(true); + expect(item.data.customProperties).toBeUndefined(); + }); +}); +```
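The unit tests above call a single initializer directly. To test several initializers together, it can help to run them through a small chain runner that imitates the SDK's ordering. The following is a minimal sketch, not the SDK's implementation. `TelemetryItem`, `TelemetryInitializer`, `runInitializers`, and `makeItem` are hypothetical names introduced for illustration. It assumes that initializers run in registration order, that only an explicit `false` drops an item (a missing return value keeps it), and that a throwing initializer does not block telemetry.

```typescript
// Hypothetical, trimmed-down stand-ins for ITelemetryItem / ITelemetryInitializer.
interface TelemetryItem {
  name: string;
  baseType: string;
  data: { customProperties?: { [key: string]: string } };
}

type TelemetryInitializer = (item: TelemetryItem) => boolean | void;

// Runs initializers in registration order.
// Assumption: only an explicit `false` drops the item; `true` or no return keeps it.
function runInitializers(
  initializers: TelemetryInitializer[],
  item: TelemetryItem
): boolean {
  for (const initializer of initializers) {
    try {
      if (initializer(item) === false) {
        return false; // Dropped: later initializers are not called
      }
    } catch {
      // Assumption: a failing initializer is skipped rather than blocking telemetry
    }
  }
  return true;
}

// Test helper that builds a minimal item
function makeItem(name: string): TelemetryItem {
  return { name, baseType: 'EventData', data: {} };
}

// Enrichment initializer with no return value (treated as "keep")
const addVersion: TelemetryInitializer = (item) => {
  item.data.customProperties = item.data.customProperties || {};
  item.data.customProperties['app.version'] = '1.0.0';
};

// Filtering initializer that drops scroll events
const dropScroll: TelemetryInitializer = (item) => item.name !== 'scroll';
```

Placing a filter such as `dropScroll` before enrichment avoids doing enrichment work on items that are discarded anyway, and a runner like this lets a test assert that ordering without starting the full SDK.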
+ +### **Debugging Initializers** + +```typescript +// Debug wrapper for initializers +function debugInitializer( + name: string, + initializer: ITelemetryInitializer +): ITelemetryInitializer { + return (item: ITelemetryItem) => { + console.log(`[${name}] Processing:`, item.name); + + const startTime = performance.now(); + const result = initializer(item); + const duration = performance.now() - startTime; + + console.log(`[${name}] Result: ${result}, Duration: ${duration}ms`); + + if (duration > 5) { + console.warn(`[${name}] Slow initializer detected: ${duration}ms`); + } + + return result; + }; +} + +// Usage +sdk.addTelemetryInitializer( + debugInitializer('userContext', addUserContext) +); +``` + +## Summary + +The OTelWebSdk provides a comprehensive telemetry processing solution that combines the best of both worlds: + +- **OpenTelemetry Standards Compliance**: Full support for OpenTelemetry processors as the primary approach +- **Lightweight Alternatives**: Application Insights-style initializers for performance-critical scenarios +- **Migration Path**: Seamless migration from existing Application Insights implementations +- **Hybrid Architecture**: Use both approaches together for optimal performance and functionality + +This dual approach enables teams to choose the right tool for each use case while maintaining compatibility with both the OpenTelemetry ecosystem and existing Application Insights implementations. \ No newline at end of file diff --git a/docs/planning/otel/specs/OTelWebSdk-Testing.md b/docs/planning/otel/specs/OTelWebSdk-Testing.md new file mode 100644 index 000000000..a75e1151b --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-Testing.md @@ -0,0 +1,157 @@ +# OpenTelemetry Web SDK - Testing Strategy + +## Overview + +This document outlines the comprehensive testing strategy for the OpenTelemetry Web SDK implementation, covering unit testing, integration testing, performance testing, and browser compatibility validation. 
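As one illustration of the performance testing mentioned above, a regression check can compare a summary statistic of fresh measurements against a stored baseline. The sketch below uses a nearest-rank percentile and a 10% tolerance on the median. The function names and the tolerance value are hypothetical choices for this example, not part of the planned test framework.

```typescript
// Nearest-rank percentile over a copy of the samples (input is not mutated).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  const index = Math.min(sorted.length - 1, Math.max(0, rank));
  return sorted[index];
}

// Flags a regression when the current median exceeds the baseline median
// by more than the given fractional tolerance (assumed default: 10%).
function isRegression(
  baselineMs: number[],
  currentMs: number[],
  tolerance: number = 0.1
): boolean {
  const baselineMedian = percentile(baselineMs, 50);
  const currentMedian = percentile(currentMs, 50);
  return currentMedian > baselineMedian * (1 + tolerance);
}
```

Comparing medians rather than means keeps a single slow outlier run from failing the check; a stricter suite could apply the same comparison to a higher percentile.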
+ +## Testing Architecture + +## Unit Testing Strategy + +### Core Component Testing + +- **Constructor injection validation** +- **Multi-tenant isolation** +- **Resource sharing verification** +- **Disposal cleanup** +- **Initialization state management** +- **Configuration validation** +- **Error handling** +- **Resource cleanup** + +### Browser API Mocking + +- **Performance API mocking for consistent testing** +- **Storage API mocking for state management tests** +- **Network API mocking for export pipeline tests** +- **DOM API mocking for browser-specific functionality** + +## Integration Testing + +- **Provider interaction validation** +- **Data flow verification** +- **Event propagation testing** +- **Configuration synchronization** +- **Export pipeline testing** +- **Cross-component communication** + +## Performance Testing + +### Performance Test Framework + +- **Automated performance measurement** +- **Statistical analysis of performance metrics** +- **Baseline comparison and regression detection** +- **Memory usage monitoring** + +### Performance Benchmarks + +- **SDK initialization performance targets** +- **Telemetry creation efficiency metrics** +- **Export pipeline throughput** +- **Memory leak detection** + +### Memory Testing + +- **Memory usage validation during normal operations** +- **Garbage collection impact assessment** +- **Resource cleanup verification** +- **Long-running application scenarios** + +## Browser Compatibility Testing + +### Cross-Browser Test Matrix + +| Feature | Chrome | Firefox | Safari | Edge | IE11* | +|---------|--------|---------|--------|------|-------| +| Basic SDK | ✓ | ✓ | ✓ | ✓ | ✓ | +| Async/Await | ✓ | ✓ | ✓ | ✓ | Polyfill | +| Performance API | ✓ | ✓ | ✓ | ✓ | Fallback | +| Storage API | ✓ | ✓ | ✓ | ✓ | ✓ | +| Worker Support | ✓ | ✓ | ✓ | ✓ | ✗ | + +*IE11 support through polyfills and fallbacks + +### Browser-Specific Tests + +- **Performance API fallback testing** +- **Storage API fallback validation** +- 
**Polyfill compatibility verification** +- **Feature detection accuracy** + +## End-to-End Testing + +### Real-World Scenarios + +- **Complete user journey tracking** +- **Multi-provider interaction validation** +- **Export pipeline end-to-end verification** +- **Context propagation across telemetry types** +- **Performance impact assessment** + +## Test Configuration and Setup + +### Jest Configuration +- **TypeScript preset configuration** +- **JSDOM test environment** +- **Module name mapping** +- **Coverage collection settings** +- **Coverage thresholds** + +### Test Setup +- **Global test setup and teardown** +- **Browser API mocking** +- **Custom matchers** +- **Test state reset** + +## Continuous Integration + +### GitHub Actions Workflow +- **Multi-node version testing matrix** +- **Automated dependency installation** +- **Unit test execution** +- **Integration test execution** +- **Browser test execution** +- **Performance benchmark execution** +- **Coverage reporting** + +## Testing Best Practices + +### Test Organization +- **Collocate tests with source code** in dedicated `Tests/` directories +- **Use descriptive test names** that explain the scenario +- **Group related tests** in logical describe blocks +- **Follow AAA pattern**: Arrange, Act, Assert + +### Mock Strategy +- **Mock external dependencies** but test real interactions +- **Use factory functions** for creating test objects +- **Avoid over-mocking** - test real code paths when possible +- **Reset mocks** between tests to ensure isolation + +### Performance Testing Guidelines +- **Establish baseline metrics** before making changes +- **Test on representative hardware** and network conditions +- **Monitor memory usage** and garbage collection +- **Use statistical analysis** for performance metrics + +### Browser Testing Strategy +- **Test core functionality** on all supported browsers +- **Use progressive enhancement** for advanced features +- **Implement graceful degradation** for missing 
APIs +- **Automated cross-browser testing** in CI/CD pipeline + +## Future Enhancements + +### Planned Testing Improvements +1. **Visual regression testing** for SDK console output +2. **Load testing** with realistic telemetry volumes +3. **Chaos engineering** for reliability testing +4. **A/B testing framework** for performance optimizations +5. **Real user monitoring** integration for validation + +### Testing Tools Evaluation +- **Playwright** for cross-browser E2E testing +- **Lighthouse CI** for performance regression detection +- **Storybook** for component documentation and testing +- **Chromatic** for visual testing automation diff --git a/docs/planning/otel/specs/OTelWebSdk-UsageExamples.md b/docs/planning/otel/specs/OTelWebSdk-UsageExamples.md new file mode 100644 index 000000000..a738672c6 --- /dev/null +++ b/docs/planning/otel/specs/OTelWebSdk-UsageExamples.md @@ -0,0 +1,724 @@ +# OTelWebSdk Usage Examples + +This document provides comprehensive usage examples for the OpenTelemetry Web SDK, demonstrating various patterns and scenarios for implementing telemetry in web applications. + +## Basic Usage Examples + +**Note**: Examples demonstrate patterns and concepts. Exact APIs and configuration options will be finalized during implementation design. 
+ +```typescript +import { trace } from '@opentelemetry/api'; +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// Standard OTel - Uses global state (problematic for multi-team apps) +const globalTracer = trace.getTracer('my-app'); // Global tracer provider +const globalSpan = globalTracer.startSpan('user-action'); +globalSpan.end(); + +// Problem: All teams share the same global configuration +// Problem: Cannot isolate SDK instances or configurations +// Problem: Difficult to unload or clean up a specific team's telemetry + +// OTelWebSdk - Avoids globals, provides isolated instances +const factory = createSdkFactory(); +const sdk = factory.newInst('my-team', { + connectionString: 'InstrumentationKey=team-specific-key' +}); + +// Each SDK instance provides its own tracer provider accessor +const tracer = sdk.getTracer('my-service'); + +const span = tracer.startSpan('page-load'); +span.setAttributes({ + 'page.name': 'Product Page', + 'page.category': 'electronics', + 'web.vitals.lcp': 2100, + 'web.vitals.fid': 50, + 'web.vitals.cls': 0.05, + 'user.session.id': 'session-123', + 'browser.name': 'Chrome', + 'browser.version': '118.0' +}); +span.end(); + +// Basic metrics collection (simple counters, histograms, gauges only) +const meter = sdk.getMeter('my-app-metrics'); +const pageViewCounter = meter.createCounter('page_views_total', { + description: 'Total number of page views' +}); +const loadTimeHistogram = meter.createHistogram('page_load_time_ms', { + description: 'Page load time in milliseconds' +}); + +// Simple metric recording +pageViewCounter.add(1, { 'page.name': 'Product Page' }); +loadTimeHistogram.record(1250, { 'page.type': 'product' }); + +// Optional: Application Insights compatibility layer (secondary goal) +// These methods would be implemented as convenience methods over OpenTelemetry APIs +sdk.trackPageView({ + name: 'Product Page', + uri: '/products/123', + properties: { category: 'electronics' } +}); +``` + +### Instance Isolation Benefits + +```typescript +// Multiple teams can coexist without interference +const 
teamA = factory.newInst('team-a', { + connectionString: 'InstrumentationKey=team-a-key', + samplingRate: 100 +}); + +const teamB = factory.newInst('team-b', { + connectionString: 'InstrumentationKey=team-b-key', + samplingRate: 10 // Different sampling rate +}); + +// Each team gets isolated API accessors - no global conflicts +const tracerA = teamA.getTracer('service-a'); // Team A's tracer +const tracerB = teamB.getTracer('service-b'); // Team B's tracer + +// Teams can unload independently without affecting others +await teamA.unload(); // Only Team A's telemetry stops +// Team B continues working normally + +// Compare to global approach where unloading affects everyone: +// trace.disable(); // This would break ALL teams using global APIs +``` + +### Multi-Team Benefits + +```typescript +// Standard OpenTelemetry - Global providers cause team conflicts +import { trace } from '@opentelemetry/api'; +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// All teams are forced to share the same global configuration +const globalTracer = trace.getTracer('team-a'); // Uses the global trace provider and its processors + +// Problems: +// - Teams cannot have different connection strings +// - Teams cannot have different samplers +// - Teams cannot unload independently +// - Teams cannot have isolated configurations +// - Instrumentations are forced to use the globals + +// OTelWebSdk - Team isolation without globals +const factory = createSdkFactory(); + +// Team A - E-commerce (isolated instance) +const ecommerceSDK = factory.newInst('ecommerce-team', { + connectionString: 'InstrumentationKey=ecommerce-key', + tracerConfig: { serviceName: 'ecommerce-frontend' }, + contextOverrides: { 'team.name': 'ecommerce' }, + samplingRate: 100 // High sampling for critical team +}); + +// Team B - Analytics (completely separate instance) +const analyticsSDK = factory.newInst('analytics-team', { + connectionString: 'InstrumentationKey=analytics-key', + tracerConfig: { serviceName: 'analytics-widgets' }, + contextOverrides: { 'team.name': 
'analytics' }, + samplingRate: 10 // Lower sampling for high-volume widgets +}); + +// Teams get isolated providers / processors from their SDK instances +const ecomTracer = ecommerceSDK.getTracer('shopping'); // Team A's tracer +const analyticsTracer = analyticsSDK.getTracer('widgets'); // Team B's tracer + +// No global state means no conflicts between teams +// Each team manages their own telemetry lifecycle +// Factory optimizes shared resources (connections, timers) behind the scenes +// Provided OTelWebSdk Instrumentations are isolated to each instance +``` + +## Multi-Instance SDK Factory Usage Examples + +### Overview + +The OTelWebSdk supports various multi-instance patterns for different organizational and technical requirements: + +**Core Usage Patterns:** +- **Multi-Team Coordination**: Teams working independently with shared resource optimization +- **Legacy Compatibility**: Backward-compatible patterns for existing applications +- **OpenTelemetry + Application Insights**: Combined API usage with instance isolation +- **Advanced Multi-Instance**: Complex enterprise scenarios with multiple managers +- **Comprehensive Configuration**: Full configuration examples with all available options + +**Multi-Instance Benefits:** +- **Team Independence**: Each team gets isolated SDK instances with independent configuration +- **Resource Optimization**: Factory coordinates shared resources (connections, timers) efficiently +- **Instance Management**: Factory provides instance tracking, discovery, and lifecycle coordination +- **Dynamic Control**: Runtime instance management with proper cleanup and resource reclamation + +**Key Capabilities Demonstrated:** +- **Isolated Configuration**: Teams can have different connection strings, sampling rates, and contexts +- **Factory Coordination**: Shared resource optimization while maintaining team isolation +- **Instance Discovery**: Check for existing instances and avoid duplicates +- **Coordinated Cleanup**: Proper 
resource management and memory cleanup + +### Multi-Instance Multi-Team Usage (Recommended) + +```typescript +import { + createSdkFactory +} from '@microsoft/applicationinsights-otelwebsdk-js'; + +// ===== Primary Entry Point: Get or Create Factory ===== +const factory = createSdkFactory(); + +// ===== Team A: E-commerce Team ===== +const ecommerceSDK = factory.createSDKInstance('ecommerce-team', { + // Team-specific overrides while inheriting manager defaults + tracerConfig: { + serviceName: 'ecommerce-frontend', + serviceVersion: '2.1.0' + }, + contextOverrides: { + 'team.name': 'ecommerce', + 'team.component': 'shopping-cart' + }, + appInsightsConfig: { + enableAutoRouteTracking: true, + // Inherits other settings from manager + } +}); + +// ===== Team B: Analytics Team ===== +const analyticsSDK = factory.createSDKInstance('analytics-team', { + tracerConfig: { + serviceName: 'analytics-widgets', + serviceVersion: '1.5.2' + }, + contextOverrides: { + 'team.name': 'analytics', + 'team.component': 'dashboard' + }, + appInsightsConfig: { + enableAjaxPerfTracking: true, + excludeRequestFromAutoTrackingPatterns: [/\/internal-metrics$/] + } +}); + +// ===== Team C: Platform Team ===== +const platformSDK = factory.createSDKInstance('platform-team', { + tracerConfig: { + serviceName: 'platform-services', + serviceVersion: '3.0.0' + }, + contextOverrides: { + 'team.name': 'platform', + 'team.component': 'shared-services' + } +}); + +// Initialize instances individually (each team manages their own lifecycle) +await ecommerceSDK.initialize(); +await analyticsSDK.initialize(); +await platformSDK.initialize(); + +// ===== Teams Use Their Isolated SDK Instances ===== + +// Team A - E-commerce telemetry (OpenTelemetry primary) +// Option 1: Direct provider access +const ecomTracerProvider = ecommerceSDK.getTracerProvider(); +const ecomTracer = ecomTracerProvider.getTracer('shopping-cart', '2.1.0'); + +// Option 2: Convenience method (delegates to provider) +const 
ecomTracerConv = ecommerceSDK.getTracer('shopping-cart', '2.1.0'); + +const span = ecomTracer.startSpan('add-to-cart'); +span.setAttributes({ + 'product.id': '123', + 'product.category': 'electronics', + 'cart.total_items': 3, + 'team.name': 'ecommerce' +}); +span.end(); + +// Optional: Application Insights compatibility methods +ecommerceSDK.trackPageView?.({ + name: 'Product Page', + uri: '/products/laptop-123', + properties: { category: 'electronics', price: 999 } +}); + +// Team B - Analytics telemetry (shares same connection, different context) +const analyticsMeterProvider = analyticsSDK.getMeterProvider(); +const analyticsMeter = analyticsMeterProvider.getMeter('dashboard', '1.5.2'); +const widgetRenderTime = analyticsMeter.createHistogram('widget.render.duration'); + +widgetRenderTime.record(150, { + 'widget.type': 'sales-chart', + 'team.name': 'analytics' +}); + +// ===== Factory Benefits ===== +console.log(`Total teams sharing SDK: ${factory.getInstanceCount()}`); // 3 +console.log(`Active SDK instances: ${factory.getInstanceNames()}`); // ['ecommerce-team', 'analytics-team', 'platform-team'] + +// Check if a team already has an instance (avoid duplicates) +if (factory.hasInstance('ecommerce-team')) { + console.log('E-commerce team already has an SDK instance'); + // Get existing instance + const existingSDK = factory.getInst('ecommerce-team'); +} + +// ===== Cleanup (Coordinated Through Factory) ===== +// Teams can unload individually (factory tracks automatically) +await ecommerceSDK.unload(); // Factory count decreases automatically + +// Or factory can coordinate shutdown of all instances +await factory.unloadAllInstances(); +console.log(`Remaining instances: ${factory.getInstanceCount()}`); // 0 +``` + +### Legacy Compatibility Usage (Backward Compatible) + +```typescript +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// Consistent IoC pattern +const factory = createSdkFactory(); +const sdk = 
factory.newInst('legacy-app', { + connectionString: 'InstrumentationKey=your-key-here' +}); + +await sdk.initialize(); + +console.log(`Instances managed: ${factory.getInstanceCount()}`); // 1 +console.log(`Instance names: ${factory.getInstanceNames()}`); // ['legacy-app'] + +// Standard usage remains the same +sdk.trackPageView({ name: 'Home Page' }); +const tracer = sdk.getTracer('my-service'); + +// Cleanup works as before (automatically unregisters from factory) +await sdk.unload(); +``` + +### OpenTelemetry + Application Insights API Usage + +```typescript +import { + createSdkFactory +} from '@microsoft/applicationinsights-otelwebsdk-js'; +// Only the status enum is taken from the OpenTelemetry API package; no global providers are used +import { SpanStatusCode } from '@opentelemetry/api'; + +// Avoid: Standard OpenTelemetry globals (causes multi-team issues) +// import { trace, metrics } from '@opentelemetry/api'; +// const globalTracer = trace.getTracer('my-service'); // Don't use globals! + +// Correct: Create isolated SDK instance through factory +const factory = createSdkFactory(); +const sdk = factory.newInst('my-app', { + connectionString: 'InstrumentationKey=your-key-here' +}); + +await sdk.initialize(); + +// ===== Instance-Specific OpenTelemetry API Accessors ===== + +// Use standard OpenTelemetry API accessors from SDK instance +const tracer = sdk.trace.getTracer('my-service', '1.0.0'); +const logger = sdk.logs.getLogger('my-service', '1.0.0'); + +// Create spans using standard OpenTelemetry API but with instance isolation +const span = tracer.startSpan('user-action'); +span.setAttributes({ + 'user.id': '12345', + 'action.type': 'button-click' +}); +span.addEvent('validation-started'); +span.setStatus({ code: SpanStatusCode.OK }); +span.end(); + +// Use active span pattern (works with instance-specific context) +tracer.startActiveSpan('database-query', async (span) => { + try { + // Database operation + const result = await queryDatabase(); + span.setAttributes({ 'db.rows_affected': result.rowCount }); + span.setStatus({ code: SpanStatusCode.OK }); + return result; + } catch 
(error) { + span.recordException(error); + span.setStatus({ code: SpanStatusCode.ERROR }); + throw error; + } finally { + span.end(); + } +}); + +// Context management through instance +const activeContext = sdk.context.active(); +sdk.context.with(activeContext, () => { + // Work with instance-specific context +}); + +// Emit structured logs with instance-specific context +logger.emit({ + severityText: 'INFO', + body: 'User action completed', + attributes: { + 'user.id': '12345', + 'action.type': 'button-click', + 'session.id': 'session-abc123' + } +}); + +// Record a failure with a status message on an active span +tracer.startActiveSpan('database-query', async (span) => { + try { + const result = await fetchUserData(); + span.setAttributes({ + 'db.operation': 'SELECT', + 'db.table': 'users' + }); + return result; + } catch (error) { + span.recordException(error); + span.setStatus({ + code: SpanStatusCode.ERROR, + message: error.message + }); + throw error; + } finally { + span.end(); + } +}); + +// Alternative: get an OpenTelemetry logger through the convenience accessor +const loggerConv = sdk.getLogger('my-service', '1.0.0'); + +// Emit structured logs +loggerConv.emit({ + severityText: 'INFO', + body: 'User logged in successfully', + attributes: { + 'user.id': '12345', + 'session.id': 'abc-def-123' + } +}); + +// Context management +const ctx = sdk.setContextValue(userKey, { id: '12345' }, sdk.getActiveContext()); +sdk.setActiveContext(ctx, () => { + // Code executed with user context active + console.log('User context is active'); +}); + +// ===== Application Insights Extensions ===== + +// Track page views (Web-specific) +sdk.trackPageView({ + name: 'Home Page', + uri: 'https://myapp.com/home', + properties: { + 'page.section': 'dashboard', + 'user.tier': 'premium' + }, + measurements: { + 'load.time': 1250 + } +}); + +// Track custom events +sdk.trackEvent({ + name: 'Feature Used', + properties: { + 'feature.name': 'advanced-search', + 'user.id': '12345' + }, + measurements: { + 'search.results': 42 + } +}); + +// Track 
exceptions with Application Insights format +sdk.trackException({ + exception: new Error('Payment processing failed'), + severityLevel: SeverityLevel.Error, + properties: { + 'payment.method': 'credit-card', + 'amount': 99.99 + } +}); + +// Track dependencies (AJAX calls, external services) +sdk.trackDependency({ + name: 'GET /api/users', + data: 'https://api.example.com/users', + duration: 250, + success: true, + resultCode: 200, + type: 'HTTP', + target: 'api.example.com', + properties: { + 'request.id': 'req-123' + } +}); + +// Track traces/logs +sdk.trackTrace({ + message: 'User session started', + severityLevel: SeverityLevel.Information, + properties: { + 'user.id': '12345', + 'session.duration': 30 + } +}); + +// Set context information +sdk.setUserContext({ + id: '12345', + authenticatedUserId: 'user@example.com', + accountId: 'account-456' +}); + +sdk.setSessionContext({ + id: 'session-789', + isFirst: false +}); + +sdk.setDeviceContext({ + type: 'Browser', + model: 'Chrome', + os: 'Windows', + osVersion: '10', + resolution: '1920x1080' +}); + +// Add telemetry initializer for custom processing +sdk.addTelemetryInitializer((item) => { + // Add custom tags to all telemetry + item.tags['ai.user.authUserId'] = getCurrentUserId(); + item.tags['ai.cloud.role'] = 'frontend'; + + // Modify or filter telemetry + if (item.baseType === 'ExceptionData') { + // Add additional context to exceptions + item.baseData.properties = item.baseData.properties || {}; + item.baseData.properties['app.version'] = getAppVersion(); + } + + return true; // true = send telemetry, false = filter out +}); + +// Enable automatic tracking +sdk.enableAutoTracking({ + enableRequestHeaderTracking: true, + enableResponseHeaderTracking: false, + enableAjaxPerfTracking: true, + enableUnhandledPromiseRejectionTracking: true, + excludeRequestFromAutoTrackingPatterns: [ + /\/health$/, + /\/metrics$/ + ] +}); + +// Time user actions +const timer = sdk.startTrackingUserAction('checkout-process'); 
+// ... user performs checkout ... +timer.stop(); // Automatically tracks duration + +// Cleanup when done +await sdk.unload(); +``` + +## Advanced Multi-Instance Usage Patterns + +### Multiple Manager Usage for Complex Scenarios + +```typescript +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// ===== Scenario: Multi-Project Enterprise Application ===== + +// Project Alpha - Main Application +const factory = createSdkFactory(); +const alphaInstance = factory.newInst('main-app', { + connectionString: 'InstrumentationKey=alpha-key' +}); + +// Project Beta - Micro-frontend +const betaInstance = factory.newInst('micro-frontend', { + connectionString: 'InstrumentationKey=beta-key' +}); + +await alphaInstance.initialize(); +await betaInstance.initialize(); + +// ===== Factory-based Management ===== +// All instances are managed through the factory created above + +// Independent control +await factory.getSharedResources().exportQueue.flush(); // Flush the shared export queue +await alphaInstance.unload(); // Only the main application's telemetry stops +console.log(`Remaining instances: ${factory.getInstanceCount()}`); +``` + +### Factory-based Multi-Team Usage + +```typescript +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// Create SDK instances for different teams using factory +const factory = createSdkFactory(); +const teamCInstance = factory.createSDKInstance('team-c', { + connectionString: 'InstrumentationKey=team-c-key' +}); + +const teamDInstance = factory.createSDKInstance('team-d', { + connectionString: 'InstrumentationKey=team-d-key' +}); + +// Initialize the instances +await teamCInstance.initialize(); +await teamDInstance.initialize(); + +// Each team manages their own SDK instance independently +const tracerC = teamCInstance.getTracer('team-c-service'); +const tracerD = teamDInstance.getTracer('team-d-service'); +``` + +### Advanced Resource Control + +```typescript +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + 
+// Create SDK instance with resource control +const factory = createSdkFactory(); +const instance = factory.newInst('my-instance', { + resourceControls: { + maxQueueSize: 1000, + maxConnections: 2, + timerMinimumInterval: 1000 + } +}); +``` + +## Comprehensive Configuration Usage + +```typescript +import { createSdkFactory } from '@microsoft/applicationinsights-otelwebsdk-js'; + +// Create SDK with comprehensive configuration +const factory = createSdkFactory(); +const sdk = factory.newInst('comprehensive-example', { + // Connection configuration + connectionString: 'InstrumentationKey=your-key-here;IngestionEndpoint=https://your-endpoint.com/', + + // OpenTelemetry configuration + tracerConfig: { + serviceName: 'my-web-application', + serviceVersion: '1.2.3', + serviceInstanceId: 'instance-456' + }, + + // Context overrides for all telemetry + contextOverrides: { + 'service.namespace': 'production', + 'deployment.environment': 'prod-west-us', + 'team.name': 'frontend', + 'application.tier': 'web' + }, + + // Resource attributes (OpenTelemetry Resource) + resourceAttributes: { + 'host.name': 'web-server-01', + 'os.type': 'windows', + 'container.id': 'container-789' + }, + + // OpenTelemetry providers configuration + providers: { + // Tracer provider with custom configuration + tracerProvider: { + spanLimits: { + attributeValueLengthLimit: 1024, + attributeCountLimit: 128, + eventCountLimit: 128, + linkCountLimit: 128 + }, + + // Span processors + spanProcessors: [{ + type: 'batch', + options: { + maxQueueSize: 2048, + maxExportBatchSize: 512, + exportTimeoutMillis: 30000, + scheduledDelayMillis: 5000 + } + }], + + // Sampling configuration + sampler: { + type: 'traceid-ratio', + ratio: 0.1 // 10% sampling + } + }, + + // Metric provider configuration + meterProvider: { + readers: [{ + type: 'periodic-exporting', + exportIntervalMillis: 30000 + }] + } + }, + + // Application Insights specific configuration + appInsightsConfig: { + enableAutoRouteTracking: true, + 
enableRequestHeaderTracking: true, + enableAjaxPerfTracking: true, + enableUnhandledPromiseRejectionTracking: true, + samplingPercentage: 100, + maxBatchSizeInBytes: 1024 * 1024, // 1MB + maxBatchInterval: 15000, // 15 seconds + excludeRequestFromAutoTrackingPatterns: [ + /\/health$/, + /\/metrics$/, + /\/ping$/ + ] + }, + + // Context and session management + contextConfig: { + enableCookieUsage: true, + enableSessionStorageBuffer: true, + sessionRenewalMs: 30 * 60 * 1000, // 30 minutes + sessionExpirationMs: 24 * 60 * 60 * 1000 // 24 hours + }, + + // Performance monitoring + performanceConfig: { + enablePerformanceTimingBuffer: true, + enablePerfMgr: true, + perfEvtsSendAll: false + }, + + // Privacy and compliance + privacyConfig: { + enableWebAnalytics: true, + enableDebug: false, + enableCorsCorrelation: true, + correlationHeaderExcludedDomains: [ + 'third-party-analytics.com', + 'cdn.example.com' + ], + distributedTracingMode: DistributedTracingModes.AI_AND_W3C + } +}); + +await sdk.initialize(); +``` + +--- + +For more information about architecture and implementation details, see: +- [OTelWebSdk.md](../OTelWebSdk.md) - Main specification document +- [OTelWebSdk-Instrumentation.md](./OTelWebSdk-Instrumentation.md) - Dynamic instrumentation management \ No newline at end of file diff --git a/docs/planning/otel/specs/README.md b/docs/planning/otel/specs/README.md new file mode 100644 index 000000000..3c6bc64b3 --- /dev/null +++ b/docs/planning/otel/specs/README.md @@ -0,0 +1,35 @@ +# OpenTelemetry Web SDK - Detailed Specifications + +This directory contains the detailed technical specifications for the OpenTelemetry Web SDK components. 
+ +## Overview + +Start with the main specification document: [../OTelWebSdk.md](../OTelWebSdk.md) + +## Component Specifications + +### Core Implementation +- **[OTelWebSdk-Architecture.md](./OTelWebSdk-Architecture.md)** - Architecture patterns, IoC principles, and design details + +### Feature Specifications +- **[OTelWebSdk-TelemetryInitializers.md](./OTelWebSdk-TelemetryInitializers.md)** - Lightweight telemetry processors +- **[OTelWebSdk-Instrumentation.md](./OTelWebSdk-Instrumentation.md)** - Dynamic instrumentation loading and management + +### Usage and Examples +- **[OTelWebSdk-UsageExamples.md](./OTelWebSdk-UsageExamples.md)** - Comprehensive usage examples and patterns + +### Operational Guides +- **[OTelWebSdk-Testing.md](./OTelWebSdk-Testing.md)** - Testing strategies and approaches +- **[OTelWebSdk-Performance.md](./OTelWebSdk-Performance.md)** - Performance optimization and monitoring +- **[OTelWebSdk-Migration.md](./OTelWebSdk-Migration.md)** - Migration guides and tools +- **[OTelWebSdk-Implementation.md](./OTelWebSdk-Implementation.md)** - Implementation phases and roadmap + +### Technical References +- **[OTelWebSdk-Interfaces.md](./OTelWebSdk-Interfaces.md)** - Complete interface definitions and type contracts + +## Reading Order + +1. Start with [../OTelWebSdk.md](../OTelWebSdk.md) for the complete overview +2. Review [OTelWebSdk-Architecture.md](./OTelWebSdk-Architecture.md) for architectural principles +3. Explore component-specific specifications based on your area of interest +4. Consult operational guides for implementation and deployment guidance \ No newline at end of file