Description
When a ManagedChannel built with directExecutor() attempts to connect to an unavailable server, ManagedChannelOrphanWrapper may incorrectly log a warning that the channel was not shut down properly, even though shutdown() or shutdownNow() was explicitly called.
gRPC-java version
1.77.0
Environment
- Java 21
- macOS / Linux
Steps to reproduce
import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

// 1. Create a channel with directExecutor() pointing to an unavailable server
ManagedChannel channel = NettyChannelBuilder
    .forAddress("10.3.1.210", 10008) // Server is DOWN
    .directExecutor()                // Critical: uses calling thread for callbacks
    .disableRetry()
    .usePlaintext()
    .build();
// 2. Force an immediate connection attempt
channel.getState(true); // Requests a connection; transport creation fails immediately
// 3. Shut the channel down properly
channel.shutdownNow();
// 4. Release the reference
channel = null;
// 5. Request GC (only a hint; the JVM may not collect immediately)
System.gc();
// Result: the warning may be logged despite the proper shutdown
Expected behavior
No warning should be logged since shutdownNow() was explicitly called and ManagedChannelOrphanWrapper.clearShutdown was set to true.
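For readers unfamiliar with this kind of orphan detection, the expectation above can be illustrated with a simplified, JDK-only model. This is a sketch under assumptions, not the gRPC source: `OrphanDetectionSketch`, `TrackedRef`, `shutdownRequested`, and `drainAndCountOrphans` are hypothetical names, and garbage collection is simulated by calling `Reference.enqueue()` by hand so the result does not depend on GC timing.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicBoolean;

public class OrphanDetectionSketch {
    /** Hypothetical tracked reference that remembers whether shutdown was requested. */
    static final class TrackedRef extends WeakReference<Object> {
        final AtomicBoolean shutdownRequested = new AtomicBoolean(false);
        final String name;

        TrackedRef(Object referent, ReferenceQueue<Object> queue, String name) {
            super(referent, queue);
            this.name = name;
        }
    }

    /** Drains the queue and returns how many entries would trigger a warning. */
    static int drainAndCountOrphans(ReferenceQueue<Object> queue) {
        int orphans = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            TrackedRef tracked = (TrackedRef) ref;
            // Only references whose owner never requested shutdown count as orphans.
            if (!tracked.shutdownRequested.get()) {
                orphans++;
                System.out.println("WARNING: " + tracked.name + " was collected without shutdown");
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        TrackedRef closed = new TrackedRef(new Object(), queue, "closed");
        TrackedRef leaked = new TrackedRef(new Object(), queue, "leaked");

        // The owner of "closed" shut it down; the owner of "leaked" did not.
        closed.shutdownRequested.set(true);

        // Stand-in for the garbage collector enqueuing both references.
        closed.enqueue();
        leaked.enqueue();

        System.out.println("orphans: " + drainAndCountOrphans(queue));
    }
}
```

In this model only the reference whose flag was never set produces a warning, which is the behavior this report expects from the real wrapper once shutdownNow() has been called.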
Actual behavior
The orphan wrapper cleanup mechanism may log:
WARNING: ManagedChannel was not shutdown properly!
Root cause analysis
This is a race condition caused by the interaction between directExecutor() and the transport failure callback:
Timeline:
T0: channel.getState(true) triggers connection
    └─> startNewTransport()
        └─> Netty attempts connection to unavailable server
            └─> Connection fails immediately
T1: Transport failure callback executes (ON THE SAME THREAD due to directExecutor)
    └─> TransportListener.transportShutdown(Status.UNAVAILABLE)
        └─> InternalSubchannel.handleTransportShutdown()
            └─> channelTracer.reportEvent(...)
                └─> Stack frame holds reference to InternalSubchannel
                    └─> Which holds reference to ManagedChannelImpl
    [Callback still executing...]
T2: shutdownNow() called (WHILE T1 callback is still on the stack)
    └─> ManagedChannelOrphanWrapper.shutdownNow()
        ├─ clearShutdown = true ✓ (flag set correctly)
        └─> ManagedChannelImpl.shutdown()
            └─> InternalSubchannel.shutdown() ✓
T3: User code releases channel reference
    └─> channel = null
    BUT: Stack frame from T1 still holds:
    [Stack] ──> InternalSubchannel ──> channelTracer ──> ManagedChannelImpl
T4: T1 callback finally returns
    └─> Stack frame pops
    └─> ManagedChannelImpl loses last reference NOW (not at T3)
T5: GC runs
    └─> PhantomReference processed
    └─> Potential timing issue in cleanup detection
The core problem:
With directExecutor(), the transport failure callback executes synchronously on the calling thread. This opens a window in which:
- The callback's stack frame holds an indirect reference to ManagedChannelImpl through InternalSubchannel → channelTracer
- Even after shutdownNow() is called and clearShutdown = true is set, the stack frame reference persists
- The object graph does not become fully eligible for GC until the callback returns
- The cleanup mechanism may therefore observe inconsistent state during this window
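The synchronous behavior this analysis relies on can be shown without gRPC. The sketch below is illustrative only: it models a direct executor as `Runnable::run`, an assumption standing in for the channel's directExecutor() setting, and `DirectExecutorSketch` and `threadUsedBy` are hypothetical names. It compares which thread runs a task under a direct executor versus a single-thread pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

public class DirectExecutorSketch {
    /** Submits one task to the executor and returns the thread that ran it. */
    static Thread threadUsedBy(Executor executor) {
        AtomicReference<Thread> seen = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            seen.set(Thread.currentThread());
            done.countDown();
        });
        try {
            // Returns at once for a direct executor: the task already ran inside execute().
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Executor direct = Runnable::run; // runs the task inline, on the caller's stack
        ExecutorService pooled = Executors.newSingleThreadExecutor();
        try {
            Thread caller = Thread.currentThread();
            System.out.println("direct runs on caller: " + (threadUsedBy(direct) == caller));
            System.out.println("pooled runs on caller: " + (threadUsedBy(pooled) == caller));
        } finally {
            pooled.shutdown();
        }
    }
}
```

This prints `true` for the direct executor and `false` for the pooled one. With the direct executor, the task's frames sit on top of the caller's frames, so anything the callback references stays reachable from that thread until the callback returns.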