feat(tracing): add export-stage OTEL span masking and media detection#1646
feat(tracing): add export-stage OTEL span masking and media detection#1646
Conversation
9092ee3 to
98b2a5e
Compare
98b2a5e to
1415295
Compare
1415295 to
2ed8bcb
Compare
2ed8bcb to
7b67d48
Compare
|
@claude review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6c9aca2e48
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8131633c01
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
langfuse-python/pyproject.toml
Line 9 in 032a6e5
Bumping the project version backwards from 4.6.0b1 to 4.6.0a1 makes this release look older than the previous one under PEP 440 ordering, so environments already on 4.6.0b1 will not upgrade to this build and release automation can mis-handle publication/version checks. This should be a forward version increment to keep upgrade and release behavior correct.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _create_otel_span_data( | ||
| *, | ||
| span: ReadableSpan, | ||
| attributes: Dict[str, AttributeValue], | ||
| identifier: OtelSpanIdentifier, | ||
| ) -> OtelSpanData: | ||
| instrumentation_scope = span.instrumentation_scope | ||
|
|
||
| return OtelSpanData( | ||
| trace_id=identifier.trace_id, | ||
| span_id=identifier.span_id, | ||
| parent_span_id=_get_parent_span_id(span), | ||
| name=span.name, | ||
| instrumentation_scope_name=instrumentation_scope.name | ||
| if instrumentation_scope | ||
| else None, | ||
| instrumentation_scope_version=instrumentation_scope.version | ||
| if instrumentation_scope | ||
| else None, | ||
| attributes=MappingProxyType(dict(attributes)), | ||
| resource_attributes=MappingProxyType(dict(span.resource.attributes or {})), | ||
| ) |
There was a problem hiding this comment.
🟡 _create_otel_span_data only reads span.instrumentation_scope, so a ReadableSpan constructed with the legacy instrumentation_info= parameter (and no instrumentation_scope=) is exposed to mask_otel_spans with instrumentation_scope_name=None / instrumentation_scope_version=None, even though the InstrumentationInfo object on the span carries identical .name/.version. This is asymmetric with _clone_span (line 463), which goes out of its way to preserve the legacy field via getattr(span, '_instrumentation_info', None) — the PR's own test_clone_span_preserves_deprecated_instrumentation_info exercises this exact input shape. Fix is one line: fall back to span.instrumentation_info (or getattr(span, '_instrumentation_info', None)) when instrumentation_scope is None, since both types expose .name and .version.
Extended reasoning...
What the bug is
`_create_otel_span_data` at `langfuse/_client/span_exporter.py:491-512` reads only `span.instrumentation_scope` to populate the `instrumentation_scope_name` and `instrumentation_scope_version` fields on the `OtelSpanData` snapshot that is passed to a user's `mask_otel_spans` callback. The OpenTelemetry SDK's `ReadableSpan` stores `instrumentation_info` (deprecated since 1.11.1, October 2022) and `instrumentation_scope` as two separate fields with no automatic fallback between them. Verified against opentelemetry-sdk: `ReadableSpan.init` writes the constructor args verbatim into `self._instrumentation_info` and `self._instrumentation_scope`, and the `instrumentation_scope` property body is literally `return self._instrumentation_scope` — no fallback to `_instrumentation_info`.
Why existing code does not prevent it
For a `ReadableSpan` constructed with `instrumentation_info=InstrumentationInfo(...)` and no `instrumentation_scope=` argument, `span.instrumentation_scope` returns `None`. The conditional in `_create_otel_span_data` (`instrumentation_scope.name if instrumentation_scope else None`) then collapses both fields on the snapshot to `None`, even though the legacy `InstrumentationInfo` instance on the span has identical `.name` and `.version` attributes that could populate them.
The asymmetry with the same PR's clone path is concrete: `_clone_span` at line 463 explicitly preserves the legacy field via `instrumentation_info=getattr(span, "_instrumentation_info", None)`, and the PR adds `tests/unit/test_mask_otel_spans.py::test_clone_span_preserves_deprecated_instrumentation_info` that constructs a `ReadableSpan(instrumentation_info=InstrumentationInfo('legacy-instrumentation', '1.2.3'))` to exercise this shape. The author treats this input shape as supported on the clone side but inconsistently exposes `None` to the mask callback for the same span.
Step-by-step proof
- A consumer or test constructs `span = ReadableSpan(name='legacy-span', context=ctx, attributes={...}, instrumentation_info=InstrumentationInfo('legacy-instrumentation', '1.2.3'))` — exactly the shape used in the new test.
- `span._instrumentation_info` is the `InstrumentationInfo` object; `span._instrumentation_scope` is `None` (not provided).
- `span.instrumentation_scope` runs the property: `return self._instrumentation_scope` → `None`.
- `_create_otel_span_data` evaluates `instrumentation_scope` → `None`, so both `instrumentation_scope_name` and `instrumentation_scope_version` on the resulting `OtelSpanData` are `None`.
- A user callback that filters by `span_data.instrumentation_scope_name == "openai"` (the example given in the PR's own `mask_otel_spans` docstring) silently misses this span.
Addressing the refutation
The refutation is correct that the standard SDK `Tracer.start_span` (`opentelemetry/sdk/trace/init.py` lines 1253-1257) and `TracerProvider.get_tracer` (lines 1396-1407) always populate both fields, so production flow from any current instrumentation library is unaffected. This is why all three confirming verifiers and I rate this as nit severity. However, the bug is not purely theoretical:
- The PR author explicitly chose to support the legacy input shape in `_clone_span` and added a test that constructs a span with only `instrumentation_info=`, so this is a recognized supported input shape on one side of the export pipeline.
- The fix is mechanical and one line — falling back to `span.instrumentation_info` (or `getattr(span, "_instrumentation_info", None)`) when `instrumentation_scope` is `None` — and aligns the mask path with the clone path.
- Both `InstrumentationInfo` and `InstrumentationScope` expose identical `.name` / `.version` attributes, so the fallback is straightforwardly correct.
Suggested fix
In `_create_otel_span_data`, change:
instrumentation_scope = span.instrumentation_scopeto fall back to the legacy field when scope is missing, e.g.:
instrumentation_scope = (
span.instrumentation_scope
or getattr(span, "_instrumentation_info", None)
)This preserves the existing behavior on modern spans (where both fields are set) while restoring symmetry with `_clone_span` for legacy/hand-constructed spans.
| should_export_span: Optional[Callable[[ReadableSpan], bool]] = None, | ||
| additional_headers: Optional[Dict[str, str]] = None, | ||
| span_exporter: Optional[SpanExporter] = None, | ||
| media_manager: Optional[MediaManager] = None, | ||
| mask_otel_spans: Optional[MaskOtelSpansFunction] = None, | ||
| ): | ||
| self.public_key = public_key | ||
| self.blocked_instrumentation_scopes = ( |
There was a problem hiding this comment.
🟣 Pre-existing bug: User-passed flush_at and flush_interval are silently discarded when their env vars (LANGFUSE_FLUSH_AT/LANGFUSE_FLUSH_INTERVAL) are unset, due to operator precedence on lines 82 and 85-89. The expression flush_at or int(env_flush_at) if env_flush_at is not None else None parses as (flush_at or int(env_flush_at)) if env_flush_at is not None else None because or binds tighter than the ternary, so when the env var is unset the whole expression collapses to None regardless of the user-passed value. The fix is mechanical — wrap the ternary: flush_at = flush_at or (int(env_flush_at) if env_flush_at is not None else None) (same shape for flush_interval). Not introduced by this PR (originated in #1618), but worth folding in since #1646 modifies the same constructor and resource_manager.py now unconditionally wires media_manager into LangfuseSpanProcessor, so every Langfuse client routes through these lines.
Extended reasoning...
What the bug is
In LangfuseSpanProcessor.__init__ at langfuse/_client/span_processor.py:81-89:
env_flush_at = os.environ.get(LANGFUSE_FLUSH_AT, None)
flush_at = flush_at or int(env_flush_at) if env_flush_at is not None else None
env_flush_interval = os.environ.get(LANGFUSE_FLUSH_INTERVAL, None)
flush_interval = (
flush_interval or float(env_flush_interval)
if env_flush_interval is not None
else None
)In Python, the conditional expression (a if c else b) has lower precedence than or. So line 82 parses as:
flush_at = (flush_at or int(env_flush_at)) if env_flush_at is not None else NoneWhen LANGFUSE_FLUSH_AT is unset (the common case), env_flush_at is None, the ternary takes the else branch, and the whole expression evaluates to None — silently throwing away any user-passed flush_at. The same shape applies to flush_interval.
The specific code path that triggers it
User writes Langfuse(flush_at=10, flush_interval=2.0) without setting the env vars. The buggy expressions coerce both to None, then super().__init__ at lines 128-135 passes max_export_batch_size=None and schedule_delay_millis=None to OTel's BatchSpanProcessor, which substitutes its own defaults (512 and 5000ms).
Why existing code does not prevent it
There is no defensive check downstream — BatchSpanProcessor happily accepts None and falls back to defaults. The user gets no warning that their configuration was discarded.
Step-by-step proof
With LANGFUSE_FLUSH_AT unset and user passing flush_at=10:
env_flush_at = os.environ.get("LANGFUSE_FLUSH_AT", None)→env_flush_at = None.- Expression
flush_at or int(env_flush_at) if env_flush_at is not None else Noneparses as(flush_at or int(env_flush_at)) if env_flush_at is not None else None. env_flush_at is not None→False, so the ternary returns theelsebranch:None.flush_at = None. The user's10is silently dropped.super().__init__(..., max_export_batch_size=None, ...)→ OTelBatchSpanProcessorfalls back to its built-in default of512.
Verified empirically by multiple verifiers, including via Python AST: ast.parse(...) confirms the structure is IfExp(test='env_flush_at is not None', body=BoolOp(Or, [flush_at, int(env_flush_at)]), orelse=None).
With flush_at=10 and env_flush_at='5' (env var set): the ternary takes the then branch, flush_at or int('5') → 10 wins (the user value), so the bug only manifests when the env var is unset.
Impact
- Any user that configures
flush_at/flush_intervalprogrammatically without also setting the corresponding env vars gets OTel's defaults instead of their values. The configuration is silently lost — no warning, no error. - Common case: a user wanting more frequent flushes (
flush_interval=2.0) for low-latency observability gets the OTel default of 5 seconds. - Common case: a user wanting smaller batches (
flush_at=10) for memory-constrained environments gets the OTel default of 512.
Pre-existing status
The buggy lines are not in this PR's diff for span_processor.py — they predate this PR (originated in #1618 "feat(client): add custom span exporter support"). However, PR #1646:
- Modifies the same
__init__signature (addsmedia_managerandmask_otel_spansparams). - Adds new wrapping logic just below the buggy lines (the
LangfuseTransformingSpanExporterblock at lines 121-126). resource_manager.pynow unconditionally passesmedia_manager=self._media_managerintoLangfuseSpanProcessor, so every Langfuse client routes through this constructor with the new params.
While the bug is not strictly introduced here, the constructor is being modified in this PR and the fix is one line per expression — a small, mechanical change worth folding in.
How to fix
Wrap the ternary in parentheses so the user-passed value takes precedence:
env_flush_at = os.environ.get(LANGFUSE_FLUSH_AT, None)
flush_at = flush_at or (int(env_flush_at) if env_flush_at is not None else None)
env_flush_interval = os.environ.get(LANGFUSE_FLUSH_INTERVAL, None)
flush_interval = flush_interval or (
float(env_flush_interval) if env_flush_interval is not None else None
)This preserves the intended precedence: user value first, fallback to env, then to None.
Summary
Adds an export-stage transformation layer for spans before they are sent to the downstream OTLP exporter.
mask_otel_spanscallback onLangfuse(...)MaskOtelSpansParams,MaskOtelSpansResult,OtelSpanIdentifier,OtelSpanData, andOtelSpanPatchmask_otel_spansthrough the resource manager into the Langfuse span processormaskbehavior unchangedinline_data/inlineDatapayloadsWhy
Attribute masking and media detection currently happen when Langfuse SDK spans set attributes. That leaves third-party OTEL spans flowing through the Langfuse span processor without equivalent export-stage handling. This PR adds a batch-native export-path hook while preserving existing Langfuse SDK masking behavior.
Behavior
mask_otel_spansruns only after existing export filters accept a span.mask_otel_spans.data:,inline_data,inlineData,media_type,mime_type, ormimeTypeplusdata.mask_otel_spansreceives the whole OTEL export batch asparams.spans, keyed byOtelSpanIdentifier(trace_id, span_id).MaskOtelSpansResult.span_patchesis sparse: omitted spans are exported unchanged.OtelSpanPatchcan delete exact attribute keys and set attribute values; delete runs before set, so set wins.Validation
uv run --frozen ruff check .uv run --frozen mypy langfuse --no-error-summaryuv run --frozen pytest tests/unit/test_mask_otel_spans.py tests/unit/test_media_manager.py -quv run --frozen pytest tests/unit/test_otel.py tests/unit/test_additional_headers_simple.py -quv run --frozen pytest -n auto --dist worksteal tests/unitDisclaimer: Experimental PR review
Greptile Summary
This PR adds an export-stage transformation layer (
LangfuseTransformingSpanExporter) that runs between the OTel batch span processor and the downstream OTLP exporter, enabling media detection and attribute masking on third-party spans that bypass the existing SDK-levelmaskhook.mask_otel_spansas a new publicLangfuse(...)parameter, wired through the resource manager intoLangfuseSpanProcessor, and exposes a batch-native public contract (MaskOtelSpansParams,MaskOtelSpansResult,OtelSpanPatch, etc.) inlangfuse.types.MediaManager._find_and_process_mediawith afail_openflag and adds Geminiinline_data/inlineDatadetection; applies media replacement at export time with a substring prefilter to avoid JSON parsing on non-media strings.resource_manager.pywas moved after the media manager setup somedia_manageris available when the span processor is constructed.Confidence Score: 4/5
Safe to merge; all changes are additive and opt-in, existing mask and media paths are untouched.
The change is well-structured and defensively written, but cloned spans lose their
dropped_attributes_countmetadata — downstream OTLP consumers would see 0 even for spans that originally exceeded the attribute limit.langfuse/_client/span_exporter.py — specifically the _clone_span method and attribute wrapping strategy.
Sequence Diagram
sequenceDiagram participant App as Application participant BSP as BatchSpanProcessor participant LTS as LangfuseTransformingSpanExporter participant MM as MediaManager participant MF as mask_otel_spans callback participant OE as OTLPSpanExporter App->>BSP: span ends (on_end) BSP->>LTS: export(batch of ReadableSpans) loop for each span LTS->>MM: "_find_and_process_media(attributes, fail_open=True)" MM-->>LTS: post-media attribute dict end alt mask_otel_spans configured LTS->>MF: MaskOtelSpansParams(spans) MF-->>LTS: MaskOtelSpansResult or None alt exception or invalid result LTS-->>BSP: SUCCESS (batch dropped) else patch for unknown identifier LTS-->>BSP: SUCCESS (batch dropped) else valid patches loop for each span LTS->>LTS: apply OtelSpanPatch (delete then set) end end end LTS->>LTS: _clone_span(span, patched_attributes) LTS->>OE: export(cloned ReadableSpans) OE-->>LTS: SpanExportResult LTS-->>BSP: SpanExportResultPrompt To Fix All With AI
Reviews (1): Last reviewed commit: "test(tracing): cover export-stage span m..." | Re-trigger Greptile