ClientSession never sends notifications/cancelled when call_tool is cancelled — server-side coroutines leak #2507

@sherman94062

Description

ClientSession.send_request() (and therefore every call_tool, list_tools, etc.) never emits a notifications/cancelled message when its in-flight await is interrupted, whether the interruption comes from the SDK's own timeout or from the caller's asyncio.wait_for. The MCP spec (cancellation.mdx) says the sender SHOULD issue this notification on timeout, and a caller-initiated cancellation likewise leaves the server with an orphaned request.

Empirical impact: server-side tool coroutines remain suspended after a client cancellation. They hold whatever resources they had acquired (DB connections, cursors, locks, file handles) until the session itself ends. With long-lived sessions and cancellable workloads, every cancelled call is a silent leak.

This was previously raised in #1458 ("Missing Cancellation Notifications on Request Timeout") and closed as DUPLICATE, but the proposed fix never landed and the underlying behavior is still present in 1.29.0. The current report adds a second uncovered path (external CancelledError) and concrete evidence of the resource-leak impact, so I'm filing rather than commenting on the closed issue.

Two paths, neither sends the notification

Path A — SDK-internal timeout (anyio.fail_after): Already documented in #1458. The except TimeoutError branch raises McpError and falls into the finally block that closes the local response stream. No CancelledNotification is sent.

mcp/shared/session.py:290-303 (1.29.0):

try:
    with anyio.fail_after(timeout):
        response_or_error = await response_stream_reader.receive()
except TimeoutError:
    raise McpError(
        ErrorData(
            code=httpx.codes.REQUEST_TIMEOUT,
            message=(
                f"Timed out while waiting for response to "
                f"{request.__class__.__name__}. Waited "
                f"{timeout} seconds."
            ),
        )
    )

Path B — external cancellation: When the caller wraps session.call_tool(...) in asyncio.wait_for(...) (or any other cancellation source), asyncio.CancelledError is raised at the await response_stream_reader.receive() point. There is no except for CancelledError / anyio.get_cancelled_exc_class() anywhere in send_request. The exception flows up through the finally, which only cleans up the client-local response stream:

mcp/shared/session.py:310-313:

finally:
    self._response_streams.pop(request_id, None)
    self._progress_callbacks.pop(request_id, None)
    await response_stream.aclose()
    await response_stream_reader.aclose()

The server is never told the request is gone. Its in-flight tool task continues to completion, then sends back a response that gets dropped because no one is reading the response stream.
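To make the Path B control flow concrete, here is a minimal stdlib-only sketch. It does not use the SDK: `send_request_like` is a hypothetical stand-in for the shape of send_request (a long await plus a `finally`), and `asyncio.sleep(300)` stands in for `response_stream_reader.receive()`.

```python
import asyncio

events = []

async def send_request_like():
    # Hypothetical stand-in for the shape of send_request: one long await
    # followed by a finally block that only cleans up local state.
    try:
        await asyncio.sleep(300)  # stands in for response_stream_reader.receive()
    finally:
        events.append("local cleanup")
    # There is no `except CancelledError` here, so nothing is sent to the peer.

async def demo():
    try:
        await asyncio.wait_for(send_request_like(), timeout=0.05)
    except asyncio.TimeoutError:
        events.append("caller saw TimeoutError")
    return list(events)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The only code that runs between the cancellation and the caller's `except` is the `finally` block, which is where any notification to the peer would have to be triggered.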

Reproduction

Minimal repro script (full version: https://github.com/sherman94062/databricks-ai-steward/blob/main/stress/probe_a1_leak.py):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# stress.server is a tiny FastMCP server with two tools:
#   hangs_forever_async_guarded — `await asyncio.sleep(300)`
#   task_count                  — returns len(asyncio.all_tasks()) - 1
async def main():
    params = StdioServerParameters(
        command="python", args=["-m", "stress.server"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            baseline = await session.call_tool("task_count", {})
            print("baseline:", baseline)

            for _ in range(50):
                try:
                    await asyncio.wait_for(
                        session.call_tool("hangs_forever_async_guarded", {}),
                        timeout=0.1,
                    )
                except asyncio.TimeoutError:
                    pass

            await asyncio.sleep(0.5)  # let any cleanup settle
            after = await session.call_tool("task_count", {})
            print("after 50 cancels:", after)

asyncio.run(main())

Output (mcp 1.29.0, Python 3.14):

baseline:        4 tasks
after 50 cancels: 54 tasks

50 cancelled call_tool invocations → 50 leaked server-side coroutines, persistent until the session closes. Same numbers under stdio and streamable_http.

If the client sent notifications/cancelled, the server's existing handler at mcp/shared/session.py:402-406 would cancel each leaked coroutine immediately:

if isinstance(notification.root, CancelledNotification):
    cancelled_id = notification.root.params.requestId
    if cancelled_id in self._in_flight:
        await self._in_flight[cancelled_id].cancel()

The server side already does the right thing on receipt. Only the client-side emit is missing.
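The receive-side pattern can be illustrated in isolation with a toy registry. This is a sketch under assumptions, not the SDK's code: `ToyServer`, `start_request`, and `on_cancelled_notification` are hypothetical names, and plain asyncio tasks stand in for the SDK's in-flight request objects.

```python
import asyncio

class ToyServer:
    """Hypothetical in-flight registry keyed by request id."""

    def __init__(self):
        self.in_flight = {}

    def start_request(self, request_id):
        # A long sleep stands in for a tool handler holding a resource.
        task = asyncio.ensure_future(asyncio.sleep(300))
        self.in_flight[request_id] = task
        # Drop the entry once the task finishes or is cancelled.
        task.add_done_callback(
            lambda _t, rid=request_id: self.in_flight.pop(rid, None)
        )

    def on_cancelled_notification(self, request_id):
        task = self.in_flight.get(request_id)
        if task is not None:
            task.cancel()

async def demo():
    server = ToyServer()
    for rid in range(5):
        server.start_request(rid)
    await asyncio.sleep(0)  # let the handlers start
    before = len(server.in_flight)
    for rid in range(5):
        server.on_cancelled_notification(rid)
    await asyncio.sleep(0.01)  # let cancellations and done callbacks run
    return before, len(server.in_flight)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a notification per request id, the registry drains; without one, all five entries would stay pending until the loop shuts down.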

Spec citations

Implementations SHOULD establish timeouts for all sent requests… When the request has not received a success or error response within the timeout period, the sender SHOULD issue a cancellation notification for that request and stop waiting for a response.

Either side can cancel an in-progress request by sending a cancellation notification.

https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/docs/specification/draft/basic/utilities/cancellation.mdx

Proposed fix

Two small additions to BaseSession.send_request. The internal-timeout branch is exactly what #1458 proposed; the external-cancellation branch is new and uses anyio.get_cancelled_exc_class() so it works under both asyncio and trio backends:

try:
    with anyio.fail_after(timeout):
        response_or_error = await response_stream_reader.receive()
except TimeoutError:
    await self._send_cancelled_notification(request_id, "request timed out")
    raise McpError(...)
except anyio.get_cancelled_exc_class():
    await self._send_cancelled_notification(request_id, "request cancelled by caller")
    raise


async def _send_cancelled_notification(self, request_id, reason):
    try:
        await self.send_notification(
            ClientNotification(
                CancelledNotification(
                    method="notifications/cancelled",
                    params=CancelledNotificationParams(
                        requestId=request_id, reason=reason
                    ),
                )
            )
        )
    except Exception:
        # Best-effort: if the transport is already gone, nothing to do.
        logger.warning(
            "failed to send cancellation notification for request %s",
            request_id,
        )

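The notification must be sent before re-raising: once the cancellation propagates out of send_request, the caller may close the session and the write stream becomes unusable. On the cancellation path, the send_notification call may need to run inside a shielded cancel scope (for example anyio.CancelScope(shield=True)) to guarantee delivery, because under anyio an await inside an already-cancelled scope is cancelled again. Note also that `except Exception` in the helper does not catch cancellation exceptions. Happy to put that into a PR.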

Why this matters in practice

Most production MCP servers acquire external resources inside tool handlers: DB connections, HTTP clients, transactions, file locks. Without the cancellation notification, every aborted client call keeps one such resource held for the lifetime of the session. We discovered this while building a Databricks-facing MCP server: with a connection pool of 10, 10 cancelled tool calls exhaust the pool.

A server-side per-tool timeout (asyncio.wait_for inside the tool wrapper) bounds the leak window, but it shouldn't be load-bearing. The client should tell the server when a request is dead.
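For reference, the server-side mitigation mentioned above might look roughly like the following. This is a sketch only: `with_timeout` is a hypothetical decorator, not part of the SDK, and `slow_tool` stands in for a tool handler that holds a resource.

```python
import asyncio
import functools

def with_timeout(seconds):
    """Hypothetical decorator: bound a tool coroutine's runtime."""
    def decorator(tool):
        @functools.wraps(tool)
        async def wrapper(*args, **kwargs):
            return await asyncio.wait_for(tool(*args, **kwargs), timeout=seconds)
        return wrapper
    return decorator

released = []

@with_timeout(0.05)
async def slow_tool():
    try:
        await asyncio.sleep(300)  # stands in for a hung query
    finally:
        released.append("connection")  # resource is returned on cancellation

async def demo():
    try:
        await slow_tool()
    except asyncio.TimeoutError:
        return "timed out", list(released)
    return "completed", list(released)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

This caps how long a leaked handler can hold its resource, but the server still does useless work for up to the full timeout after the client has given up on the request.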

Environment

  • mcp 1.29.0 (latest at time of writing)
  • Python 3.14.3, macOS 14
  • Same behavior reproduced with streamable_http transport (different transport, identical client cancellation path)
