Skip to content

mcp: skip keepalive pings while session has in-flight requests#979

Closed
tdabasinskas wants to merge 2 commits into
modelcontextprotocol:mainfrom
cogvel:fix/keepalive-multiple-failures
Closed

mcp: skip keepalive pings while session has in-flight requests#979
tdabasinskas wants to merge 2 commits into
modelcontextprotocol:mainfrom
cogvel:fix/keepalive-multiple-failures

Conversation

@tdabasinskas
Copy link
Copy Markdown

mcp: skip keepalive pings while session has in-flight requests

The current keepalive loop pings unconditionally. When the server holds a long-running request from a Streamable HTTP client that reads the SSE stream but does not post a ping response back through the transport (e.g. Claude Desktop), the server's Ping call times out, session.Close() runs mid-handler, and the eventual tool result is dropped on the floor.

The keepalive mechanism exists to detect dead idle peers. An incoming request the session is still processing is itself a continuous liveness signal — and one already paid for. Skipping pings while hasPendingRequests() reports true sidesteps the race entirely without loosening the "MAY terminate on missed ping" behaviour for genuinely idle sessions.

The current keepalive loop pings unconditionally. When the server holds
a long-running request from a Streamable HTTP client that reads the SSE
stream but does not post a ping response back through the transport
(e.g. Claude Desktop), the server's Ping call times out, session.Close()
runs mid-handler, and the eventual tool result is dropped on the floor.

The keepalive mechanism exists to detect dead *idle* peers. An incoming
request the session is still processing is itself a continuous liveness
signal — and one already paid for. Skipping pings while
hasPendingRequests() reports true sidesteps the race entirely without
loosening the "MAY terminate on missed ping" behaviour for genuinely
idle sessions.
@tdabasinskas tdabasinskas force-pushed the fix/keepalive-multiple-failures branch from 0f5d9f5 to d073052 Compare May 27, 2026 14:34
@guglielmo-san
Copy link
Copy Markdown
Contributor

Hi @tdabasinskas, the proposed behavior is not mentioned anywhere in the spec, which is instead requiring

The receiver MUST respond promptly with an empty response:

https://modelcontextprotocol.io/specification/2025-11-25/basic/utilities/ping#behavior-requirements

I believe the implementation of Claude Desktop is not compliant to the spec, as the Go SDK Client, should correctly handle ping requests while handling long running requests from the server

@tdabasinskas
Copy link
Copy Markdown
Author

Hi @tdabasinskas, the proposed behavior is not mentioned anywhere in the spec, which is instead requiring

The receiver MUST respond promptly with an empty response:

https://modelcontextprotocol.io/specification/2025-11-25/basic/utilities/ping#behavior-requirements

I believe the implementation of Claude Desktop is not compliant to the spec, as the Go SDK Client, should correctly handle ping requests while handling long running requests from the server

Agreed - the client is the non-compliant side here; a conformant client answers pings even mid-tool-call, and Claude Desktop doesn't. No dispute on where the bug is.

The one thing I'd still raise is independent of that: the spec says "multiple" failed pings MAY trigger a reset, but the current loop closes on the first miss. That's stricter than the spec's wording regardless of client behavior. A small consecutive-failure threshold - or a configurable KeepaliveFailureThreshold on ServerOptions, defaulting to current behavior - would align with the "multiple" language.

The practical angle is that the client is the part we can't fix - it's a third-party app on someone else's release cycle - whereas a bit of server-side tolerance is something operators can actually address today.

@guglielmo-san
Copy link
Copy Markdown
Contributor

or a configurable KeepaliveFailureThreshold on ServerOptions, defaulting to current behavior - would align with the "multiple" language

Agree on this, if you want you can modify the PR including this option

@tdabasinskas
Copy link
Copy Markdown
Author

tdabasinskas commented May 29, 2026

or a configurable KeepaliveFailureThreshold on ServerOptions, defaulting to current behavior - would align with the "multiple" language

Agree on this, if you want you can modify the PR including this option

Sure! Should the current implementation with pending requests check in this PR be removed?

@guglielmo-san
Copy link
Copy Markdown
Contributor

Yes, I would remove the current implementation

@tdabasinskas
Copy link
Copy Markdown
Author

tdabasinskas commented May 29, 2026

Yes, I would remove the current implementation

As the implementation differs, I've opened separate #982 for this and closing this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants