Skip to content

Session hangs ~8 minutes after compaction when prompt cache misses — no timeout or feedback #1614

@loganrosen

Description

@loganrosen

Describe the bug

After compaction runs mid-session, the next model API call can hang for ~8 minutes with no user feedback beyond "Thinking…". The CLI appears completely stuck — there's no way to tell if it's still working or frozen.

Root cause from logs: Compaction replaces the conversation history with a summary, which can invalidate the prompt cache. The next API call sends ~80K tokens with cache_read_tokens: 0, and the server takes an extremely long time to respond. There is no client-side timeout.

Log evidence

19:04:12.493Z  CompactionProcessor: Compaction complete - replaced 182 messages
               with summary + 10 new messages, saved ~87835 tokens
19:04:12.602Z  --- Start of group: Sending request to the AI model ---
19:04:12.624Z  Flushed 3 events to session ...
                 *** 8 MINUTES OF COMPLETE SILENCE ***
19:12:08.176Z  [response finally arrives]

Telemetry for the hung call:

{
  "input_tokens": 80135,
  "output_tokens": 88,
  "cache_read_tokens": 0,
  "duration": 475571
}

It's intermittent — cache hit vs miss

Across all session logs I checked, there were 5 compactions total. 4 got cache hits and completed fast. 1 got a cache miss and hung for 8 minutes:

Session Compaction # cache_read_tokens Duration
7312 1 72,529 ✅ 3.7s
7312 2 72,642 ✅ 3.4s
26123 1 72,666 ✅ 3.4s
26123 2 72,693 ✅ 4.3s
26123 3 0 ❌ 475s

The cache miss case is a 100x+ outlier. It was the 3rd compaction in the same session, so it's not just a first-compaction issue — looks like a server-side cache eviction that the client has no protection against.

Why this is a problem

  1. No timeout: An 8-minute model call is never aborted or retried.
  2. No progress feedback: "Thinking…" with a spinner for 8 minutes. No elapsed time shown, no indication of whether the server is responding.
  3. Users will Escape out: Without feedback, most users will assume it's frozen and cancel — losing the agentic turn that was in progress.

Suggested improvements

  • Add a client-side timeout for model API calls (e.g., 2-3 minutes) with automatic retry
  • Show elapsed time on the "Thinking…" indicator so users can make informed decisions
  • Investigate whether compaction output can be structured to preserve the prompt cache prefix

Affected version

0.0.414

Steps to reproduce

  1. Run a long session until compaction triggers multiple times (150+ turns)
  2. After compaction runs, observe the next model call
  3. If the prompt cache is not hit (intermittent — happened on 1 of 5 compactions in my logs), the response takes 5-10 minutes

Environment

  • macOS darwin arm64
  • Node v24.11.1
  • Model: claude-sonnet-4.6

Additional context

Distinct from #1036 (OAuth agent memory retry loops) and #888 (compaction losing context). This is specifically about the model API call itself hanging when the prompt cache misses after compaction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions