ManagedChannelOrphanWrapper may incorrectly report "shutdown() not called" warning when using directExecutor() with unavailable server #12641

@EricY019

Description

When using directExecutor() with a ManagedChannel that attempts to connect to an unavailable server, the ManagedChannelOrphanWrapper may incorrectly log a warning about the channel not being properly shut down, even though shutdown() or shutdownNow() was explicitly called.

gRPC-java version

1.77.0

Environment

  • Java 21
  • macOS / Linux

Steps to reproduce

// 1. Create a channel with directExecutor() pointing to an unavailable server
ManagedChannel channel = NettyChannelBuilder
    .forAddress("10.3.1.210", 10008) // Server is DOWN
    .directExecutor() // Critical: uses calling thread for callbacks
    .disableRetry()
    .usePlaintext()
    .build();

// 2. Force immediate connection attempt
channel.getState(true); // Triggers transport creation and immediate failure

// 3. Properly shutdown the channel
channel.shutdownNow();

// 4. Release reference
channel = null;

// 5. Trigger GC
System.gc();

// Result: Warning may be logged despite proper shutdown
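
Because System.gc() is only a hint to the JVM, step 5 may not collect the channel on every run. Below is a minimal sketch of a polling helper that could make the reproduction less dependent on GC timing. GcProbe and awaitCollected are hypothetical names and are not part of gRPC.

```java
import java.lang.ref.WeakReference;

/** Hypothetical test helper (not part of gRPC): polls GC until a referent is cleared. */
final class GcProbe {
  private GcProbe() {}

  /**
   * Requests GC repeatedly until the weak reference is cleared or the timeout elapses.
   * Returns true if the referent was collected, false on timeout or interruption.
   */
  static boolean awaitCollected(WeakReference<?> ref, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (ref.get() != null) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // still strongly reachable (or GC never ran) within the timeout
      }
      System.gc(); // only a hint, hence the loop
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

In the steps above, one could create a `new WeakReference<>(channel)` before step 4 and call `GcProbe.awaitCollected(ref, 5_000)` in place of the bare System.gc() call. A false result would suggest that something still holds the channel strongly.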

Expected behavior

No warning should be logged since shutdownNow() was explicitly called and ManagedChannelOrphanWrapper.clearShutdown was set to true.

Actual behavior

The orphan wrapper cleanup mechanism may log:
WARNING: ManagedChannel was not shutdown properly!
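
To turn "may log" into something a test can assert on, the warning can be captured with a java.util.logging handler. The sketch below assumes the wrapper logs through java.util.logging under its class name, io.grpc.internal.ManagedChannelOrphanWrapper. That logger name is an assumption, and WarningCapture is a hypothetical helper.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: records WARNING-or-higher log records from one logger. */
final class WarningCapture extends Handler {
  // Assumption: the orphan wrapper logs under its fully qualified class name.
  static final String ORPHAN_LOGGER = "io.grpc.internal.ManagedChannelOrphanWrapper";

  private final List<LogRecord> records = new CopyOnWriteArrayList<>();
  // Strong reference so the LogManager cannot drop the logger (and this handler).
  private Logger logger;

  static WarningCapture attach(String loggerName) {
    WarningCapture capture = new WarningCapture();
    capture.logger = Logger.getLogger(loggerName);
    capture.logger.addHandler(capture);
    return capture;
  }

  @Override
  public void publish(LogRecord record) {
    if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
      records.add(record);
    }
  }

  @Override
  public void flush() {}

  @Override
  public void close() {}

  /** Records captured so far, in arrival order. */
  List<LogRecord> warnings() {
    return records;
  }
}
```

Calling `WarningCapture.attach(WarningCapture.ORPHAN_LOGGER)` before building the channel and checking `warnings()` after the GC step would show whether the message appeared on a given run.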

Root cause analysis

This is a race condition caused by the interaction between directExecutor() and the transport failure callback:

Timeline:

T0: channel.getState(true) triggers connection
└─> startNewTransport()
    └─> Netty attempts connection to unavailable server
        └─> Connection fails immediately

T1: Transport failure callback executes (ON THE SAME THREAD due to directExecutor)
└─> TransportListener.transportShutdown(Status.UNAVAILABLE)
    └─> InternalSubchannel.handleTransportShutdown()
        └─> channelTracer.reportEvent(...)
            └─> Stack frame holds reference to InternalSubchannel
                └─> Which holds reference to ManagedChannelImpl
                    [Callback still executing...]

T2: shutdownNow() called (WHILE T1 callback is still on the stack)
└─> ManagedChannelOrphanWrapper.shutdownNow()
    ├─ clearShutdown = true ✓ (flag set correctly)
    └─> ManagedChannelImpl.shutdown()
        └─> InternalSubchannel.shutdown() ✓

T3: User code releases channel reference
└─> channel = null

  BUT: Stack frame from T1 still holds:
  [Stack] ──> InternalSubchannel ──> channelTracer ──> ManagedChannelImpl

T4: T1 callback finally returns
└─> Stack frame pops
    └─> ManagedChannelImpl loses last reference NOW (not at T3)

T5: GC runs
└─> PhantomReference processed
    └─> Potential timing issue in cleanup detection
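
For readers unfamiliar with the mechanism in T5, here is a minimal sketch of phantom-reference orphan detection with a shutdown flag. It is a simplified stand-in and not gRPC's implementation. OrphanTracker, Tracked, and drain are hypothetical names, and the flag plays the role the report attributes to clearShutdown.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for an orphan detector; not gRPC's actual code. */
final class OrphanTracker {
  /** Phantom reference carrying a "was shut down" flag for its referent. */
  static final class Tracked extends PhantomReference<Object> {
    final String name;
    final AtomicBoolean shutdown = new AtomicBoolean();

    Tracked(Object referent, ReferenceQueue<Object> queue, String name) {
      super(referent, queue);
      this.name = name;
    }
  }

  private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
  // Keeps the reference objects themselves reachable until they are drained.
  private final Set<Tracked> live = ConcurrentHashMap.newKeySet();

  Tracked track(Object resource, String name) {
    Tracked ref = new Tracked(resource, queue, name);
    live.add(ref);
    return ref;
  }

  /** Drains the queue and returns how many referents died without the flag set. */
  int drain() {
    int orphans = 0;
    Reference<?> ref;
    while ((ref = queue.poll()) != null) {
      Tracked tracked = (Tracked) ref;
      live.remove(tracked);
      if (!tracked.shutdown.get()) {
        orphans++;
        System.err.println("resource " + tracked.name + " was not shut down");
      }
    }
    return orphans;
  }
}
```

In this sketch the flag is read only when the reference is drained from the queue, so a flag set before the referent becomes unreachable is always visible to the check. Whether the real wrapper gives the same guarantee is the question this report raises.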

The core problem:

With directExecutor(), the transport failure callback executes synchronously on the calling thread. This creates a window where:

  1. The callback's stack frame holds an indirect reference to ManagedChannelImpl through InternalSubchannel → channelTracer
  2. Even after shutdownNow() is called and clearShutdown = true is set, the stack frame reference persists
  3. The object graph isn't fully eligible for GC until the callback returns
  4. This creates a timing window where the cleanup mechanism may observe inconsistent state
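
The synchronous behaviour described above can be illustrated without gRPC. The sketch below uses a plain same-thread Executor (Runnable::run) as a stand-in for what directExecutor() selects. DirectExecutorDemo is a hypothetical name. The example shows only the executor semantics and does not establish which gRPC callbacks run on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

/** Hypothetical demo: a same-thread executor runs tasks inside execute(). */
final class DirectExecutorDemo {
  // Stand-in for a direct executor: runs the task on the calling thread.
  static final Executor DIRECT = Runnable::run;

  /** Returns the order of events around a single execute() call. */
  static List<String> trace() {
    List<String> events = new ArrayList<>();
    Thread caller = Thread.currentThread();
    events.add("before execute");
    DIRECT.execute(() ->
        // Runs before execute() returns, with the caller's frames still on the stack.
        events.add("callback on caller thread: " + (Thread.currentThread() == caller)));
    events.add("after execute");
    return events;
  }

  public static void main(String[] args) {
    trace().forEach(System.out::println);
  }
}
```

With a thread-pool executor the callback would instead run on a worker thread, and its stack frames would be independent of the caller's.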
