fix: allow CLI to flush session transcript before terminating subprocess#614

Open
garythung wants to merge 1 commit into anthropics:main from garythung:main

@garythung

Summary

SubprocessCLITransport.close() sends SIGTERM immediately after closing stdin, killing the CLI subprocess before it can flush the session transcript to disk. This breaks session resume (--resume <session_id>) because the .jsonl file only contains a single dequeue operation instead of the full conversation history.

close() calls self._process.terminate() whenever the process is still running. In --input-format stream-json mode, the CLI keeps running after sending ResultMessage, waiting for more input on stdin. So by the time close() runs, the process is always still alive and terminate() always fires. The failure is deterministic, not a matter of timing luck.

Problem

When ClaudeSDKClient exits, the transport's close() method:

  1. Closes stdin
  2. Immediately calls self._process.terminate() (SIGTERM)
  3. Waits for the (already killed) process

The CLI subprocess detects stdin EOF and begins its shutdown sequence — writing user messages, assistant messages, and tool use entries to ~/.claude/projects/<encoded-cwd>/<session-id>.jsonl — but SIGTERM arrives before the write completes.

The resulting session file contains only:

{"type":"queue-operation","operation":"dequeue"}

Any subsequent --resume call finds no conversation data and exits with code 1.
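
For manual checks, a transcript can be tested for this failure mode by counting the records that are not queue bookkeeping. This is a minimal sketch, not part of the change: the helper name non_queue_entries is hypothetical, and it assumes only what is shown above, namely that each line is a JSON object with a "type" field and that a broken file holds nothing but queue-operation records.

```python
import json
from pathlib import Path


def non_queue_entries(transcript: Path) -> int:
    """Count JSONL records whose "type" is not "queue-operation".

    Hypothetical helper: a result of 0 suggests the transcript was never flushed.
    """
    count = 0
    for line in transcript.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A truncated final line is plausible if the writer was killed mid-write.
            continue
        if isinstance(record, dict) and record.get("type") != "queue-operation":
            count += 1
    return count
```

A file that contains only the dequeue record shown above yields 0, which matches the state that makes --resume fail.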

A secondary effect is that stderr output is also lost: the stderr reader task gets cancelled before it can collect output, so all CLI errors surface as "Check stderr output for details" with no actual details.

Solution

Wait for the CLI to exit gracefully after stdin EOF before falling back to SIGTERM. A 10-second timeout prevents hanging if the process doesn't exit on its own.

Before

if self._process.returncode is None:
    with suppress(ProcessLookupError):
        self._process.terminate()
        with suppress(Exception):
            await self._process.wait()

After

if self._process.returncode is None:
    try:
        with anyio.fail_after(10):
            await self._process.wait()
    except TimeoutError:
        with suppress(ProcessLookupError):
            self._process.terminate()
            with suppress(Exception):
                await self._process.wait()
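
The same wait-then-terminate pattern can be tried outside the SDK. The sketch below is an illustration under stated assumptions, not the transport code: close_gracefully and demo are hypothetical names, the child script is a stand-in for the CLI, and asyncio.wait_for replaces anyio.fail_after so the example runs on the standard library alone.

```python
import asyncio
import sys

# Stand-in child: blocks until stdin reaches EOF, then "flushes" a file.
CHILD = r"""
import sys, time
sys.stdin.read()            # returns once the parent closes stdin
time.sleep(0.2)             # simulated flush work
with open(sys.argv[1], "w") as f:
    f.write("flushed")
"""


async def close_gracefully(
    process: asyncio.subprocess.Process, timeout: float = 10.0
) -> bool:
    """Close stdin, wait for a clean exit, and fall back to terminate().

    Returns True if the process exited on its own, False if it was terminated.
    """
    if process.stdin is not None:
        process.stdin.close()  # delivers EOF to the child
    if process.returncode is not None:
        return True
    try:
        await asyncio.wait_for(process.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        try:
            process.terminate()
        except ProcessLookupError:
            pass  # the process exited between the timeout and the signal
        await process.wait()
        return False


async def demo(output_path: str) -> bool:
    """Spawn the stand-in child and close it gracefully."""
    process = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, output_path,
        stdin=asyncio.subprocess.PIPE,
    )
    return await close_gracefully(process, timeout=5.0)
```

Running asyncio.run(demo("out.txt")) lets the child finish its write before it exits. Calling terminate() right after closing stdin would instead kill it during the simulated flush.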

Test Plan

  • Updated test_connect_close to simulate graceful exit and assert terminate() is not called
  • All 160 existing tests pass
  • Manual verification: run two back-to-back SDK queries using resume=<session_id> from the first query’s result — the second query should succeed and the session .jsonl file should contain the full conversation transcript
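
The graceful-exit assertion in the first bullet could look roughly like the following. This is a sketch only: it does not reproduce the actual test_connect_close, close_process is a hypothetical stand-in for the wait-then-terminate portion of close(), and it uses asyncio with unittest.mock rather than the project's test fixtures.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def close_process(process, timeout: float = 10.0) -> None:
    """Hypothetical stand-in for the wait-then-terminate logic in close()."""
    if process.returncode is None:
        try:
            await asyncio.wait_for(process.wait(), timeout)
        except asyncio.TimeoutError:
            process.terminate()
            await process.wait()


def make_process_that_exits_cleanly() -> MagicMock:
    """Mock a process that is still running but exits as soon as it is awaited."""
    process = MagicMock()
    process.returncode = None
    process.wait = AsyncMock(return_value=0)
    process.terminate = MagicMock()
    return process


def test_close_does_not_terminate_on_graceful_exit() -> None:
    process = make_process_that_exits_cleanly()
    asyncio.run(close_process(process))
    process.wait.assert_awaited_once()
    process.terminate.assert_not_called()
```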

SubprocessCLITransport.close() was immediately sending SIGTERM after
closing stdin, not giving the CLI subprocess time to flush the session
transcript to disk. This caused session resume to fail with "No
conversation found" because the .jsonl file was nearly empty.

Now waits up to 10 seconds for the process to exit on its own after
stdin EOF before falling back to SIGTERM.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>