Describe the bug
When using OtlpGrpcSpanExporter with the gRPC Netty transport (grpc-netty or grpc-netty-shaded), the exporter creates an unbounded number of grpc-default-worker threads over time, leading to memory exhaustion and eventually an OOM.
Steps to reproduce
- Configure OtlpGrpcSpanExporter using the managed channel gRPC sender (i.e., exclude opentelemetry-exporter-sender-okhttp and add opentelemetry-exporter-sender-grpc-managed-channel)
- Run the application under normal load
- Monitor thread count over time
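A quick way to monitor this on a running JVM is to sample live thread names in-process. This is a minimal sketch using only the JDK; the thread-name prefix and the class name are illustrative, and in a real service you would sample periodically (e.g. every 10 s):

```java
public class WorkerThreadMonitor {

    // Count live threads whose name starts with the given prefix.
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // On an affected service this count climbs steadily instead of
        // stabilizing after warm-up.
        System.out.printf("grpc-default-worker threads: %d%n",
                countThreads("grpc-default-worker"));
    }
}
```

Equivalent data can be collected externally with `jstack` or a thread dump, if modifying the application is not an option.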
What did you expect to see?
A stable, bounded number of gRPC/Netty worker threads.
What did you see instead?
Thread count grows continuously. In our case, we observed 5-7 new grpc-default-worker threads created approximately every 10 seconds, with none of them terminating. Each thread uses ~1MB of stack space, leading to significant memory growth (~2.7GB/hour in native memory).
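These numbers are roughly self-consistent. A back-of-the-envelope check (plain Java, using the observed 5-7 new threads per 10 s and ~1 MB of stack per thread) puts stack growth alone in the 1.8-2.5 GB/hour range; per-thread native overhead beyond the stack plausibly accounts for the rest of the observed ~2.7 GB/hour:

```java
public class ThreadGrowthEstimate {
    public static void main(String[] args) {
        int intervalsPerHour = 3600 / 10;        // 360 ten-second intervals per hour
        int lowThreads = 5 * intervalsPerHour;   // 1800 new threads/hour
        int highThreads = 7 * intervalsPerHour;  // 2520 new threads/hour
        double stackMb = 1.0;                    // ~1 MB of stack per thread

        System.out.printf("new threads/hour: %d-%d%n", lowThreads, highThreads);
        System.out.printf("stack memory/hour: %.1f-%.1f GB%n",
                lowThreads * stackMb / 1024, highThreads * stackMb / 1024);
    }
}
```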
Root cause analysis
When using ManagedChannelBuilder.forTarget() without explicitly configuring an event loop group, Netty defaults to using ThreadPerTaskExecutor for its internal worker threads. Unlike a bounded thread pool, this executor creates a new thread for each task and does not reuse threads.
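The difference between the two executor styles can be sketched with plain JDK executors (a simplified illustration of thread-per-task vs. a bounded pool, not Netty's actual classes):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorContrast {

    // Run `tasks` trivial jobs on the executor and report how many distinct
    // threads ended up servicing them.
    static Set<String> runTasks(Executor executor, int tasks) throws InterruptedException {
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                names.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        done.await();
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        // Thread-per-task: each execute() may start a brand-new thread.
        Executor perTask = task -> new Thread(task).start();
        // Bounded pool: a fixed set of threads is reused for every task.
        ExecutorService fixed = Executors.newFixedThreadPool(2);

        System.out.println("per-task threads used: " + runTasks(perTask, 20).size());  // 20
        System.out.println("fixed-pool threads used: " + runTasks(fixed, 20).size());  // <= 2
        fixed.shutdown();
    }
}
```

The thread-per-task variant mirrors what the channel falls back to here: every burst of export work can spin up fresh threads that are never reclaimed into a pool.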
The GrpcExporterBuilder in opentelemetry-java creates a channel via ManagedChannelBuilder but does not configure:
- A bounded EventLoopGroup via NettyChannelBuilder.eventLoopGroup()
- Any other limit on the channel's internal threading behavior
Suggested fix
When building the managed channel for Netty transport, use NettyChannelBuilder directly with a bounded NioEventLoopGroup:
NioEventLoopGroup eventLoopGroup = new NioEventLoopGroup(2); // bounded; tune as needed
ManagedChannel channel = NettyChannelBuilder.forTarget(endpoint)
    .eventLoopGroup(eventLoopGroup)
    .channelType(NioSocketChannel.class)
    // ... other configuration
    .build();
This ensures Netty reuses a fixed pool of event loop threads rather than creating unbounded new threads.
Environment
- OS: Linux (containers)
- Java version: 21
- OpenTelemetry version: 1.38.x
- gRPC version: 1.78.0
- Transport: grpc-netty-shaded
Additional context
This issue is distinct from:
- #3521 (threads active after SdkTracerProvider#shutdown()) - that issue was about cleanup; this one is about unbounded thread creation during normal operation

The default OkHttp sender may not exhibit this behavior, but users who switch to the managed channel sender with Netty transport will encounter it.