fix: ensure agent loop continues after Claude timeout or API errors #165

Open
LeeCampbell wants to merge 1 commit into HdrHistogram:main from LeeCampbell:fix/agent-loop-resilience

Conversation

@LeeCampbell
Collaborator

Summary

  • The agent loop halted when Claude exited with a non-zero, non-124 code. This prevented the state machine from advancing to create-pr even after a successful execute-tasks in which sync_state had already committed and pushed the work
  • run_claude now returns its exit code directly; callers use || true so sync_state always runs
  • entrypoint.sh logs non-zero exits as warnings and continues iterating instead of breaking
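
The call-site pattern described in this summary can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `fake_claude` is a hypothetical stand-in for the real Claude invocation, and the bodies of `run_claude`, `sync_state`, and `execute_tasks` are assumptions made only to show how `|| true` lets `sync_state` run under `set -e`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the Claude CLI call: exits with the code it is given.
fake_claude() {
    return "${1:-0}"
}

# Returns the exit code directly instead of storing it in a global variable.
run_claude() {
    local rc=0
    fake_claude "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Log the failure as a warning rather than swallowing it silently.
        echo "WARNING: Claude exited with code $rc" >&2
    fi
    return "$rc"
}

# Assumed placeholder for the real commit-and-push step.
sync_state() {
    echo "sync_state: committed and pushed"
}

execute_tasks() {
    # '|| true' stops 'set -e' from aborting here, so sync_state always runs.
    run_claude "$@" || true
    sync_state
}

# Simulate a Claude failure with exit code 1; sync_state still runs.
execute_tasks 1
```

Without `|| true`, the non-zero return from `run_claude` would terminate the script under `set -e` before `sync_state` was reached.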

Root cause

Issue #141's container run completed all implementation work and benchmarks, but a sub-agent API timeout cascaded. CLAUDE_RC was set to a non-zero value, and exit $CLAUDE_RC returned it to entrypoint.sh, which broke the loop before the next iteration could reach the create-pr state (which is pure shell and needs no Claude).
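
A rough sketch of the intended loop behaviour, under stated assumptions: the `run_state` handler, the `state` variable, and the iteration cap are all hypothetical and exist only to illustrate "warn and continue" versus "break on non-zero". The real entrypoint.sh may be structured quite differently.

```shell
#!/usr/bin/env bash
set -uo pipefail

state="execute-tasks"
visited=""
iterations=0

# Hypothetical per-state handler. execute-tasks simulates a Claude API error
# that happens after the work was already synced and the state advanced.
run_state() {
    case "$1" in
        execute-tasks)
            state="create-pr"
            return 1
            ;;
        create-pr)
            # Pure shell step: no Claude call needed.
            state="done"
            return 0
            ;;
    esac
}

# The iteration cap is an illustrative guard against a state that never advances.
while [ "$state" != "done" ] && [ "$iterations" -lt 10 ]; do
    current="$state"
    visited="$visited $current"
    iterations=$((iterations + 1))
    rc=0
    run_state "$current" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # Previously a 'break' here would have stopped before create-pr.
        echo "WARNING: state '$current' exited with code $rc; continuing" >&2
    fi
done

echo "visited:$visited"
```

With a `break` in place of the warning, the loop would stop after execute-tasks and create-pr would never run, which matches the failure described above.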

Test plan

  • Run ./scripts/run.sh with a short TIMEOUT_SECONDS to trigger a Claude timeout
  • Verify the loop continues to the next state after timeout
  • Verify create-pr state still executes after a timed-out execute-tasks
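
For reference when checking the timeout case: the mention of exit code 124 suggests the timeout is enforced with GNU coreutils `timeout`, though the PR does not confirm this. Assuming that is so, the following sketch shows how a timed-out command surfaces as 124; `sleep 5` is only a stand-in for a long-running Claude call.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Short timeout, mirroring the TIMEOUT_SECONDS variable named in the test plan.
TIMEOUT_SECONDS=1

rc=0
# GNU 'timeout' kills the command after the limit and exits with 124.
timeout "$TIMEOUT_SECONDS" sleep 5 || rc=$?

echo "exit code: $rc"
```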

🤖 Generated with Claude Code

The agent loop would stop iterating when Claude exited with a non-zero,
non-124 exit code. This prevented the state machine from advancing to
the create-pr state after a successful execute-tasks phase where Claude
timed out but sync_state had already committed and pushed the work.

- Remove global CLAUDE_RC; run_claude now returns its exit code directly
- Add || true to all run_claude call sites so sync_state always runs
- Log Claude exit codes as warnings rather than swallowing them
- entrypoint.sh no longer breaks the loop on non-zero exit codes,
  letting the state machine advance to the next phase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>