
feat(worker): add taskworker.task.failed log for failed tasks#635

Open
s-starostin wants to merge 2 commits into getsentry:main from s-starostin:feat/taskworker-task-failed-log

Conversation

@s-starostin s-starostin commented May 13, 2026

Adds a structured taskworker.task.failed log event when task execution fails in
run_worker.

For non-retriable failure paths, this replaces direct
sentry_sdk.capture_exception(err) calls with:

logger.exception("taskworker.task.failed", extra={...})

inside the existing isolation_scope blocks where they exist. This gives
operators a structured stdout log with the task identity and original exception
details. In environments where the Sentry SDK LoggingIntegration captures
error logs, the same call also produces the corresponding Sentry event.

The log includes:

  • task_id
  • taskname
  • namespace
  • processing_pool
  • exception_type
  • exception_message

Failure paths covered

On the retry-exhausted path, the structured log uses the original task exception
(err) and fires alongside the existing synthetic NoRetriesRemainingError
capture. This keeps retry-exhausted failures visible in worker logs in the same
shape as other task failures, while preserving the existing retry-exhaustion
Sentry grouping/fingerprint.

Existing behavior preserved

  • task_func.report_timeout_errors opt-out continues to silence both the
    Sentry-reporting path and the new structured worker log on the timeout path
  • task_func.silenced_exceptions opt-out continues to silence both surfaces on the
    general-exception path (non-retry-exhausted). Retry-exhausted reporting is
    intentionally unconditional per #627 (fix(iswf): reverts the change to reporting
    logic that silenced NoRetriesRemaining exceptions)
  • NoRetriesRemainingError synthetic Sentry capture and fingerprint are unchanged
  • Ambient Sentry scope on the general-exception fallback is unchanged; the call
    is not wrapped in a new isolation_scope
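The silenced_exceptions gating on the general-exception path can be sketched roughly as follows; handle_general_exception and the getattr lookup are illustrative assumptions, and only the silenced_exceptions opt-out and the taskworker.task.failed log name come from this PR:

```python
import logging

logger = logging.getLogger("sentry.taskworker.worker")


def handle_general_exception(task_func, err):
    # silenced_exceptions is an optional tuple of exception types the task
    # author opted out of reporting. On this path the opt-out silences both
    # surfaces: the Sentry capture and the structured worker log.
    silenced = tuple(getattr(task_func, "silenced_exceptions", None) or ())
    should_capture_error = not isinstance(err, silenced)
    if should_capture_error:
        # The real call also passes the task-identity fields via extra=.
        logger.exception("taskworker.task.failed")
    return should_capture_error
```

On the retry-exhausted path, by contrast, the log fires unconditionally, matching the unconditional synthetic NoRetriesRemainingError capture restored in #627.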

Note on retry-exhausted reporting

When Sentry SDK LoggingIntegration is configured to capture error-level logs
(the default), retry-exhausted failures will now produce two Sentry events:

  1. the existing synthetic NoRetriesRemainingError event for retry-exhaustion
    alerting/grouping
  2. an event for the original exception from taskworker.task.failed

These represent different exception objects and have distinct grouping behavior.
The synthetic event preserves the existing retry-exhaustion signal, while the
original-exception event/log makes the underlying failure visible to operators.

Refs #610

@s-starostin s-starostin requested a review from a team as a code owner May 13, 2026 22:09
sentry_sdk.capture_exception(retry_error)
# In this branch, all exceptions should be either
# captured or silenced.
if should_capture_error:
Member


Why is this being introduced? This change was recently reverted in #627

Author


I missed #627. Restored the unconditional capture and dropped the gate.

Comment on lines +1235 to +1238
When err is silenced AND retries are exhausted, neither the synthetic
NoRetriesRemainingError capture nor the structured taskworker.task.failed
log fires. The silenced_exceptions opt-out covers both surfaces on the
retry-exhausted path.
Member


I don't think we need this test, the behavior for silenced errors shouldn't be changed.

Author


Agreed, removed.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 3782285.

# Also emit structured stdout log for retry-exhausted failures.
# Uses the original err, not the synthetic retry_error, so the log
# carries the actual exception that exhausted retries.
_log_task_failed(inflight.activation, err, processing_pool_name)


Retry-exhausted log not gated by silenced_exceptions

Medium Severity

On the retry-exhausted path, _log_task_failed is called unconditionally without checking should_capture_error. The PR description explicitly states this path is "gated by task_func.silenced_exceptions," but the code doesn't implement that gating. When a silenced exception exhausts retries, the original exception will now appear in structured logs (and potentially as a Sentry event via LoggingIntegration), defeating the purpose of silenced_exceptions on this path.



Author


Bugbot is right that the PR description is stale, but I think the suggested code change is wrong. Retry-exhausted reporting is intentionally unconditional per #627, so I’ll update the PR description instead of gating _log_task_failed.
