-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Description
Describe the bug
After compaction runs mid-session, the next model API call can hang for ~8 minutes with no user feedback beyond "Thinking…". The CLI appears completely stuck — there's no way to tell if it's still working or frozen.
Root cause from logs: Compaction replaces the conversation history with a summary, which can invalidate the prompt cache. The next API call sends ~80K tokens with cache_read_tokens: 0, and the server takes an extremely long time to respond. There is no client-side timeout.
Log evidence
19:04:12.493Z CompactionProcessor: Compaction complete - replaced 182 messages
with summary + 10 new messages, saved ~87835 tokens
19:04:12.602Z --- Start of group: Sending request to the AI model ---
19:04:12.624Z Flushed 3 events to session ...
*** 8 MINUTES OF COMPLETE SILENCE ***
19:12:08.176Z [response finally arrives]
Telemetry for the hung call:
{
"input_tokens": 80135,
"output_tokens": 88,
"cache_read_tokens": 0,
"duration": 475571
}It's intermittent — cache hit vs miss
Across all session logs I checked, there were 5 compactions total. 4 got cache hits and completed fast. 1 got a cache miss and hung for 8 minutes:
| Session | Compaction # | cache_read_tokens | Duration |
|---|---|---|---|
| 7312 | 1 | 72,529 ✅ | 3.7s |
| 7312 | 2 | 72,642 ✅ | 3.4s |
| 26123 | 1 | 72,666 ✅ | 3.4s |
| 26123 | 2 | 72,693 ✅ | 4.3s |
| 26123 | 3 | 0 ❌ | 475s |
The cache miss case is a 100x+ outlier. It was the 3rd compaction in the same session, so it's not just a first-compaction issue — looks like a server-side cache eviction that the client has no protection against.
Why this is a problem
- No timeout: An 8-minute model call is never aborted or retried.
- No progress feedback: "Thinking…" with a spinner for 8 minutes. No elapsed time shown, no indication of whether the server is responding.
- Users will Escape out: Without feedback, most users will assume it's frozen and cancel — losing the agentic turn that was in progress.
Suggested improvements
- Add a client-side timeout for model API calls (e.g., 2-3 minutes) with automatic retry
- Show elapsed time on the "Thinking…" indicator so users can make informed decisions
- Investigate whether compaction output can be structured to preserve the prompt cache prefix
Affected version
0.0.414
Steps to reproduce
- Run a long session until compaction triggers multiple times (150+ turns)
- After compaction runs, observe the next model call
- If the prompt cache is not hit (intermittent — happened on 1 of 5 compactions in my logs), the response takes 5-10 minutes
Environment
- macOS darwin arm64
- Node v24.11.1
- Model: claude-sonnet-4.6
Additional context
Distinct from #1036 (OAuth agent memory retry loops) and #888 (compaction losing context). This is specifically about the model API call itself hanging when the prompt cache misses after compaction.