Skip to content

fix: [AI-6771] cancel() race condition and idle state on normal loop exit#845

Open
altimate-harness-bot[bot] wants to merge 3 commits into
mainfrom
fix/AI-6771-cancel-race-condition
Open

fix: [AI-6771] cancel() race condition and idle state on normal loop exit#845
altimate-harness-bot[bot] wants to merge 3 commits into
mainfrom
fix/AI-6771-cancel-race-condition

Conversation

@altimate-harness-bot
Copy link
Copy Markdown
Contributor

@altimate-harness-bot altimate-harness-bot Bot commented May 26, 2026

Summary

  • Fix B1: Remove premature SessionStatus.set(idle) from cancel() main path. On abort, the processor catch block publishes session.error then sets idle, preserving correct event ordering. The !match fallback still sets idle directly since no processor is running in that case.
  • Fix B2: Add SessionStatus.set(idle) after the while (true) loop exits normally in loop(), so callers waiting on session.status:idle receive the event on clean completion.

Root Cause

cancel() was calling SessionStatus.set(idle) directly after aborting, racing with the processor's catch block which also sets idle after publishing session.error. This meant session.error could arrive after session.status:idle, causing the VSCode extension to miss errors. Additionally, on normal loop completion there was no explicit idle transition.

Test plan

  • Cancel an in-flight session — verify session.error arrives before session.status:idle in the event stream
  • Complete a session successfully — verify the session transitions to idle and the chat panel becomes interactive again

Requested by @saravmajestic via harness


Summary by cubic

Fixes a cancel() race that could emit idle before error, and ensures sessions transition to idle after a normal loop exit. Adds telemetry for streaming errors to help diagnose failures (AI-6771).

  • Bug Fixes

    • Removed idle transition in cancel() main path; processor catch now emits session.error then awaits SessionStatus.set(idle).
    • Kept idle set in the !match fallback where no processor is running.
    • Set idle after the while loop exits normally.
  • New Features

    • Added telemetry for non-retry streaming errors (context: "streaming") with session_id, error_name, and error_message.

Written for commit fb4159b. Summary will update on new commits. Review in cubic

Telemetry

Folded in from #846:

  • error event with context: "streaming" — fired in SessionProcessor.process() catch block on the non-retry, non-overflow path; carries session_id, error_name, error_message
  • Covers: SSE chunk timeouts after retry exhaustion, provider/auth failures, and unhandled streaming errors

…exit

Fix B1: Remove premature SessionStatus.set(idle) from cancel() main path.
On abort, the processor catch block publishes session.error then sets idle,
preserving correct event ordering. The !match fallback still sets idle
directly since no processor is running in that case.

Fix B2: Add SessionStatus.set(idle) after the while loop exits normally
so callers waiting on session.status:idle receive the event on clean completion.
@github-actions
Copy link
Copy Markdown

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions
Copy link
Copy Markdown

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@saravmajestic saravmajestic marked this pull request as ready for review May 26, 2026 06:38
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

saravmajestic added 2 commits May 26, 2026 08:19
Add Telemetry.track({ type: "error", context: "streaming" }) in the
non-retry, non-overflow branch of the processor.ts streaming catch block.

Covers:
- MessageAbortedError (Stop button / dispose)
- UnknownError (SSE chunk timeout after retry exhaustion)
- APIError (provider failures after retry exhaustion)
- AuthError and any other unhandled streaming error

Event fields: session_id, error_name, error_message, context
Copy link
Copy Markdown

@dev-punia-altimate dev-punia-altimate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Persona Review — Verdict: skipped

Multi-persona review completed.

1/1 agents completed · 9s · 0 findings (0 critical, 0 high, 0 medium)


Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·

@dev-punia-altimate
Copy link
Copy Markdown

❌ Tests — Failures Detected

TypeScript — 15 failure(s)

  • baseline [0.51ms]
  • baseline [0.39ms]
  • baseline [0.09ms]
  • baseline [0.35ms]
  • connection_refused [0.38ms]
  • timeout [0.06ms]
  • permission_denied [0.04ms]
  • parse_error [0.04ms]
  • oom [0.04ms]
  • network_error [0.04ms]
  • auth_failure [0.03ms]
  • rate_limit [0.03ms]
  • internal_error [0.04ms]
  • empty_error [0.02ms]
  • connection_refused [0.04ms]

Next Step

Please address the failing cases above and re-run verification.

cc @app/altimate-harness-bot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants