
feat: [AI-6771] add telemetry for streaming error scenarios #846

Closed
altimate-harness-bot[bot] wants to merge 1 commit into main from feat/telemetry-streaming-errors

@altimate-harness-bot (Bot, Contributor) commented May 26, 2026

Summary

  • Add Telemetry.track({ type: "error", context: "streaming" }) in the non-retry, non-overflow branch of the processor.ts streaming catch block
  • Covers MessageAbortedError (Stop/dispose), UnknownError (SSE chunk timeout after retry exhaustion), APIError (provider failures), AuthError, and any other unhandled streaming error
  • Event fields: session_id, error_name, error_message, context: "streaming"
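For reference, here is a minimal TypeScript sketch of the event payload and of how its fields might be derived from a caught value. Only the field names are taken from this PR; StreamingErrorEvent, toStreamingErrorEvent, and the "unknown" fallback name are hypothetical, and the real code may read error names differently (for example, from a structured error object rather than a plain Error).

```typescript
// Sketch of the streaming error payload. Field names (session_id, error_name,
// error_message, context) come from the PR; the interface and helper names
// are hypothetical.
interface StreamingErrorEvent {
  type: "error"
  context: "streaming"
  session_id: string
  error_name: string
  error_message: string
}

// Assumption: the caught value is either an Error (or subclass) or some other
// thrown value. Non-Error values get a placeholder name and a stringified message.
function toStreamingErrorEvent(sessionID: string, err: unknown): StreamingErrorEvent {
  return {
    type: "error",
    context: "streaming",
    session_id: sessionID,
    error_name: err instanceof Error ? err.name : "unknown",
    error_message: err instanceof Error ? err.message : String(err),
  }
}

// Example: an auth-style failure whose name has been set explicitly.
const authErr = new Error("invalid API key")
authErr.name = "AuthError"
const evt = toStreamingErrorEvent("ses_example", authErr)
console.log(evt.error_name, evt.error_message) // prints: AuthError invalid API key
```

Guarding with instanceof keeps the helper safe for thrown strings or plain objects, which JavaScript permits even though most providers throw Error subclasses.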

Context

Previously, streaming errors caught in SessionProcessor.process() were published via Bus.publish(Session.Event.Error) but were not tracked as telemetry events. As a result, SSE chunk timeouts, provider failures after retry exhaustion, and other unexpected streaming errors were invisible in dashboards.

The change fires Telemetry.track({ type: "error", ... }) after Bus.publish in the non-retry, non-overflow branch only:

  • Context overflow: tracked separately by context_overflow_recovered
  • Retryable errors: tracked as error_recovered events on each retry
  • Stop/abort: intentional — covered by agent_outcome: aborted at session level
  • This change: the "give up" path — non-retryable errors or retry exhaustion
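To make the branch selection concrete, here is a minimal TypeScript sketch of how a catch block could route a streaming failure to one of the telemetry paths listed above. Only the event names come from the PR; the FailureInfo flags, the attempt/maxAttempts retry budget, and routeStreamingFailure are hypothetical, and the real processor.ts logic (including how aborts are classified) may differ.

```typescript
// Illustrative model of the catch-block branches. Flag names and the
// StreamingTelemetry union are assumptions, not the actual processor.ts types.
type StreamingTelemetry =
  | { event: "context_overflow_recovered" }
  | { event: "error_recovered" }
  | { event: "error"; context: "streaming" }

interface FailureInfo {
  isContextOverflow: boolean // assumed result of an overflow check
  isRetryable: boolean       // assumed result of a retry-eligibility check
  attempt: number            // assumed 1-based count of failures so far
  maxAttempts: number        // assumed retry budget
}

function routeStreamingFailure(f: FailureInfo): StreamingTelemetry {
  // Overflow has its own recovery path and its own event.
  if (f.isContextOverflow) return { event: "context_overflow_recovered" }
  // A retryable error with budget left is retried and tracked per attempt.
  if (f.isRetryable && f.attempt < f.maxAttempts) return { event: "error_recovered" }
  // Everything else is the "give up" path this PR instruments.
  return { event: "error", context: "streaming" }
}
```

With a budget of 3, attempts 1 and 2 of a retryable error would report error_recovered, and attempt 3 would fall through to the streaming error event, as would any non-retryable error on its first attempt.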

Test plan

  • TypeScript typecheck passes (bun run --cwd packages/opencode turbo typecheck)
  • Verify the event fires: trigger a streaming error (e.g., revoke the API key mid-session or drop the network connection) and confirm that an error event with context: "streaming" appears in App Insights

Requested by @saravmajestic via harness

Jira: AI-6771


Summary by cubic

Add telemetry for unhandled streaming errors in SessionProcessor.process() so that failures which are neither retried nor context overflows become visible. Emits Telemetry.track({ type: "error", context: "streaming" }) with session_id, error_name, and error_message after the bus publish, covering SSE timeouts, provider/auth failures, and aborts; addresses AI-6771.

Written for commit 36b2390.

Add Telemetry.track({ type: "error", context: "streaming" }) in the
non-retry, non-overflow branch of the processor.ts streaming catch block.

Covers:
- MessageAbortedError (Stop button / dispose)
- UnknownError (SSE chunk timeout after retry exhaustion)
- APIError (provider failures after retry exhaustion)
- AuthError and any other unhandled streaming error

Event fields: session_id, error_name, error_message, context
@github-actions

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@altimate-harness-bot (Contributor, Author)

Folded into #845
