Skip to content

Suppress spurious "Failed to detach context" log in OTel interceptor#1596

Open
brianstrauch wants to merge 1 commit into
mainfrom
fix-otel-detach-context-log
Open

Suppress spurious "Failed to detach context" log in OTel interceptor#1596
brianstrauch wants to merge 1 commit into
mainfrom
fix-otel-detach-context-log

Conversation

@brianstrauch

@brianstrauch brianstrauch commented Jun 12, 2026

Copy link
Copy Markdown
Member

What

Flaky test observed here: https://github.com/temporalio/sdk-python/actions/runs/27432588766/job/81086412642?pr=1378

TracingWorkflowInboundInterceptor attaches an OTel context at the start of execute_workflow/handle_query and detaches it in the finally. The workflow event loop runs portions of the workflow inside contextvars.copy_context().run(...) (e.g. wait_condition), so that finally can execute in a copied contextvars.Context. A copy preserves the OTel context value — so the existing context is get_current() guard passes — but invalidates the attach token, so opentelemetry.context.detach fails internally and logs "Failed to detach context".

Because LogCapturer attaches a handler to the process-global opentelemetry.context logger, that stray ERROR log bled into test_opentelemetry_safe_detach whenever a tracing workflow tore down during its capture window (same xdist worker process), making the test flaky. In production it's recurring ERROR-level noise on affected workflow completions.

Fix

Route the best-effort detach through a small _safe_detach helper that drops only the "Failed to detach context" record via a scoped logging.Filter.

It still calls opentelemetry.context.detach, so attach/detach call counts stay balanced — the leak invariant enforced by test_opentelemetry_context_restored_after_activity. (Alternatives that avoid the public detach — restoring via attach(previous), or resetting the runtime contextvar directly — both break that balance test, so suppressing at the logger is the only approach that satisfies balanced calls + no spurious log + no test weakening.)

Caveat

During a detach the filter is attached to the process-global OTel context logger, so it would also drop an unrelated "Failed to detach context" emitted by other code in that brief window. Given the alternatives break a correctness invariant, this is the right trade-off.

Testing

  • tests/contrib/opentelemetry/ and tests/contrib/openai_agents/test_openai_tracing.py pass together (27/27), including the attach/detach balance test.
  • Verified directly that the cross-context detach logs "Failed to detach context" via raw detach but produces no record through _safe_detach.
  • Lint clean (ruff, format, pyright, mypy, basedpyright, pydocstyle).

The tracing workflow interceptor attaches an OTel context at the start of
execute_workflow/handle_query and detaches it in the finally. The workflow
event loop runs portions of the workflow inside
contextvars.copy_context().run(...), so the finally can execute in a copied
contextvars.Context. A copy preserves the OTel context value (so the
existing `context is get_current()` guard passes) but invalidates the attach
token, so opentelemetry.context.detach logs "Failed to detach context".

Because LogCapturer attaches to the process-global opentelemetry.context
logger, that stray log also bled into test_opentelemetry_safe_detach when a
tracing workflow tore down during its capture window, making it flaky.

Route the best-effort detach through _safe_detach, which drops only that one
log record via a scoped logging filter. It still calls
opentelemetry.context.detach so attach/detach calls stay balanced, satisfying
the leak invariant enforced by test_opentelemetry_context_restored_after_activity.
@brianstrauch brianstrauch requested a review from a team as a code owner June 12, 2026 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant