feat(tracing): add export-stage OTEL span masking and media detection by hassiebp · Pull Request #1646 · langfuse/langfuse-python

hassiebp · 2026-05-07T12:44:55Z

Summary

Adds an export-stage transformation layer for spans before they are sent to the downstream OTLP exporter.

adds the public mask_otel_spans callback on Langfuse(...)
adds the batch-shaped public contract: MaskOtelSpansParams, MaskOtelSpansResult, OtelSpanIdentifier, OtelSpanData, and OtelSpanPatch
wires mask_otel_spans through the resource manager into the Langfuse span processor
runs export-stage media detection over exported span attributes before masking
prefilters serialized string attributes before JSON parsing so JSON-looking strings without supported media hints are left unchanged cheaply
applies sparse keyed span patches to post-media OTEL span attribute snapshots before export
keeps existing synchronous mask behavior unchanged
hardens media detection for Gemini inline_data / inlineData payloads

Why

Attribute masking and media detection currently happen when Langfuse SDK spans set attributes. That leaves third-party OTEL spans flowing through the Langfuse span processor without equivalent export-stage handling. This PR adds a batch-native export-path hook while preserving existing Langfuse SDK masking behavior.

Behavior

mask_otel_spans runs only after existing export filters accept a span.
Media detection runs before mask_otel_spans.
Direct base64 data URI string attributes are processed without JSON parsing.
JSON-looking string attributes are parsed only if they contain supported media-shape hints such as data:, inline_data, inlineData, media_type, mime_type, or mimeType plus data.
mask_otel_spans receives the whole OTEL export batch as params.spans, keyed by OtelSpanIdentifier(trace_id, span_id).
MaskOtelSpansResult.span_patches is sparse: omitted spans are exported unchanged.
OtelSpanPatch can delete exact attribute keys and set attribute values; delete runs before set, so set wins.
Callback exceptions or invalid outer batch results drop the whole export batch.
Patches for unknown span identifiers drop the whole export batch.
Invalid patch objects for known spans drop only those spans.
Invalid returned set values delete only that affected attribute and log the value type, not the value.
Span events and links are not transformed.

Validation

uv run --frozen ruff check .
uv run --frozen mypy langfuse --no-error-summary
uv run --frozen pytest tests/unit/test_mask_otel_spans.py tests/unit/test_media_manager.py -q
uv run --frozen pytest tests/unit/test_otel.py tests/unit/test_additional_headers_simple.py -q
uv run --frozen pytest -n auto --dist worksteal tests/unit

Disclaimer: Experimental PR review

Greptile Summary

This PR adds an export-stage transformation layer (LangfuseTransformingSpanExporter) that runs between the OTel batch span processor and the downstream OTLP exporter, enabling media detection and attribute masking on third-party spans that bypass the existing SDK-level mask hook.

Introduces mask_otel_spans as a new public Langfuse(...) parameter, wired through the resource manager into LangfuseSpanProcessor, and exposes a batch-native public contract (MaskOtelSpansParams, MaskOtelSpansResult, OtelSpanPatch, etc.) in langfuse.types.
Extends MediaManager._find_and_process_media with a fail_open flag and adds Gemini inline_data/inlineData detection; applies media replacement at export time with a substring prefilter to avoid JSON parsing on non-media strings.
The OTEL tracer initialization in resource_manager.py was moved after the media manager setup so media_manager is available when the span processor is constructed.

Confidence Score: 4/5

Safe to merge; all changes are additive and opt-in, existing mask and media paths are untouched.

The change is well-structured and defensively written, but cloned spans lose their dropped_attributes_count metadata — downstream OTLP consumers would see 0 even for spans that originally exceeded the attribute limit.

langfuse/_client/span_exporter.py — specifically the _clone_span method and attribute wrapping strategy.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant BSP as BatchSpanProcessor
    participant LTS as LangfuseTransformingSpanExporter
    participant MM as MediaManager
    participant MF as mask_otel_spans callback
    participant OE as OTLPSpanExporter

    App->>BSP: span ends (on_end)
    BSP->>LTS: export(batch of ReadableSpans)

    loop for each span
        LTS->>MM: "_find_and_process_media(attributes, fail_open=True)"
        MM-->>LTS: post-media attribute dict
    end

    alt mask_otel_spans configured
        LTS->>MF: MaskOtelSpansParams(spans)
        MF-->>LTS: MaskOtelSpansResult or None

        alt exception or invalid result
            LTS-->>BSP: SUCCESS (batch dropped)
        else patch for unknown identifier
            LTS-->>BSP: SUCCESS (batch dropped)
        else valid patches
            loop for each span
                LTS->>LTS: apply OtelSpanPatch (delete then set)
            end
        end
    end

    LTS->>LTS: _clone_span(span, patched_attributes)
    LTS->>OE: export(cloned ReadableSpans)
    OE-->>LTS: SpanExportResult
    LTS-->>BSP: SpanExportResult

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/_client/span_exporter.py:579-596
`_clone_span` converts `span.attributes` (a `BoundedAttributes`) to a plain `dict`. The OTel SDK's `ReadableSpan.dropped_attributes` property guards with `isinstance(self._attributes, BoundedAttributes)` and returns `0` for any other type, so the cloned span will always export `dropped_attributes_count=0` even when the original span had attributes silently dropped by the bounds limit. This loses span fidelity in the OTLP payload for any span that hit the attribute limit.

```suggestion
    @staticmethod
    def _clone_span(
        *, span: ReadableSpan, attributes: Dict[str, AttributeValue]
    ) -> ReadableSpan:
        return ReadableSpan(
            name=span.name,
            context=span.context,
            parent=span.parent,
            resource=span.resource,
            attributes=BoundedAttributes(
                maxlen=None,
                attributes=attributes,
                immutable=True,
                max_value_len=None,
            ),
            events=span.events,
            links=span.links,
            kind=span.kind,
            status=span.status,
            start_time=span.start_time,
            end_time=span.end_time,
            instrumentation_scope=span.instrumentation_scope,
        )
```

### Issue 2 of 2
langfuse/_client/span_exporter.py:661-669
**Prefilter false-positive on `data:` substrings in URLs** — `_may_contain_serialized_media` triggers on any string containing the literal `data:` (e.g., `"https://api.example.com/data:types"`), causing unnecessary `json.loads` parsing before the string is returned unchanged. No data is lost, but large attribute strings with `data:` in a URL incur a redundant parse on every export.

_{Reviews (1): Last reviewed commit: "test(tracing): cover export-stage span m..." | Re-trigger Greptile}

github-actions · 2026-05-07T15:47:18Z

@claude review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c9aca2e48

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8131633c01

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

langfuse-python/pyproject.toml

Line 9 in 032a6e5

requires-python = ">=3.10,<4.0"

Restore monotonic package versioning

Bumping the project version backwards from 4.6.0b1 to 4.6.0a1 makes this release look older than the previous one under PEP 440 ordering, so environments already on 4.6.0b1 will not upgrade to this build and release automation can mis-handle publication/version checks. This should be a forward version increment to keep upgrade and release behavior correct.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

claude · 2026-05-08T12:07:57Z

+def _create_otel_span_data(
+    *,
+    span: ReadableSpan,
+    attributes: Dict[str, AttributeValue],
+    identifier: OtelSpanIdentifier,
+) -> OtelSpanData:
+    instrumentation_scope = span.instrumentation_scope
+
+    return OtelSpanData(
+        trace_id=identifier.trace_id,
+        span_id=identifier.span_id,
+        parent_span_id=_get_parent_span_id(span),
+        name=span.name,
+        instrumentation_scope_name=instrumentation_scope.name
+        if instrumentation_scope
+        else None,
+        instrumentation_scope_version=instrumentation_scope.version
+        if instrumentation_scope
+        else None,
+        attributes=MappingProxyType(dict(attributes)),
+        resource_attributes=MappingProxyType(dict(span.resource.attributes or {})),
+    )


🟡 _create_otel_span_data only reads span.instrumentation_scope, so a ReadableSpan constructed with the legacy instrumentation_info= parameter (and no instrumentation_scope=) is exposed to mask_otel_spans with instrumentation_scope_name=None / instrumentation_scope_version=None, even though the InstrumentationInfo object on the span carries identical .name/.version. This is asymmetric with _clone_span (line 463), which goes out of its way to preserve the legacy field via getattr(span, '_instrumentation_info', None) — the PR's own test_clone_span_preserves_deprecated_instrumentation_info exercises this exact input shape. Fix is one line: fall back to span.instrumentation_info (or getattr(span, '_instrumentation_info', None)) when instrumentation_scope is None, since both types expose .name and .version.

Extended reasoning...

What the bug is

`_create_otel_span_data` at `langfuse/_client/span_exporter.py:491-512` reads only `span.instrumentation_scope` to populate the `instrumentation_scope_name` and `instrumentation_scope_version` fields on the `OtelSpanData` snapshot that is passed to a user's `mask_otel_spans` callback. The OpenTelemetry SDK's `ReadableSpan` stores `instrumentation_info` (deprecated since 1.11.1, October 2022) and `instrumentation_scope` as two separate fields with no automatic fallback between them. Verified against opentelemetry-sdk: `ReadableSpan.init` writes the constructor args verbatim into `self._instrumentation_info` and `self._instrumentation_scope`, and the `instrumentation_scope` property body is literally `return self._instrumentation_scope` — no fallback to `_instrumentation_info`.

Why existing code does not prevent it

For a `ReadableSpan` constructed with `instrumentation_info=InstrumentationInfo(...)` and no `instrumentation_scope=` argument, `span.instrumentation_scope` returns `None`. The conditional in `_create_otel_span_data` (`instrumentation_scope.name if instrumentation_scope else None`) then collapses both fields on the snapshot to `None`, even though the legacy `InstrumentationInfo` instance on the span has identical `.name` and `.version` attributes that could populate them.

The asymmetry with the same PR's clone path is concrete: `_clone_span` at line 463 explicitly preserves the legacy field via `instrumentation_info=getattr(span, "_instrumentation_info", None)`, and the PR adds `tests/unit/test_mask_otel_spans.py::test_clone_span_preserves_deprecated_instrumentation_info` that constructs a `ReadableSpan(instrumentation_info=InstrumentationInfo('legacy-instrumentation', '1.2.3'))` to exercise this shape. The author treats this input shape as supported on the clone side but inconsistently exposes `None` to the mask callback for the same span.

Step-by-step proof

A consumer or test constructs `span = ReadableSpan(name='legacy-span', context=ctx, attributes={...}, instrumentation_info=InstrumentationInfo('legacy-instrumentation', '1.2.3'))` — exactly the shape used in the new test.

`span._instrumentation_info` is the `InstrumentationInfo` object; `span._instrumentation_scope` is `None` (not provided).

`span.instrumentation_scope` runs the property: `return self._instrumentation_scope` → `None`.

`_create_otel_span_data` evaluates `instrumentation_scope` → `None`, so both `instrumentation_scope_name` and `instrumentation_scope_version` on the resulting `OtelSpanData` are `None`.

A user callback that filters by `span_data.instrumentation_scope_name == "openai"` (the example given in the PR's own `mask_otel_spans` docstring) silently misses this span.

Addressing the refutation

The refutation is correct that the standard SDK `Tracer.start_span` (`opentelemetry/sdk/trace/init.py` lines 1253-1257) and `TracerProvider.get_tracer` (lines 1396-1407) always populate both fields, so production flow from any current instrumentation library is unaffected. This is why all three confirming verifiers and I rate this as nit severity. However, the bug is not purely theoretical:

The PR author explicitly chose to support the legacy input shape in `_clone_span` and added a test that constructs a span with only `instrumentation_info=`, so this is a recognized supported input shape on one side of the export pipeline.

The fix is mechanical and one line — falling back to `span.instrumentation_info` (or `getattr(span, "_instrumentation_info", None)`) when `instrumentation_scope` is `None` — and aligns the mask path with the clone path.

Both `InstrumentationInfo` and `InstrumentationScope` expose identical `.name` / `.version` attributes, so the fallback is straightforwardly correct.

Suggested fix

In `_create_otel_span_data`, change:

instrumentation_scope = span.instrumentation_scope

to fall back to the legacy field when scope is missing, e.g.:

instrumentation_scope = ( span.instrumentation_scope or getattr(span, "_instrumentation_info", None) )

This preserves the existing behavior on modern spans (where both fields are set) while restoring symmetry with `_clone_span` for legacy/hand-constructed spans.

claude · 2026-05-08T12:07:57Z

        should_export_span: Optional[Callable[[ReadableSpan], bool]] = None,
        additional_headers: Optional[Dict[str, str]] = None,
        span_exporter: Optional[SpanExporter] = None,
+        media_manager: Optional[MediaManager] = None,
+        mask_otel_spans: Optional[MaskOtelSpansFunction] = None,
    ):
        self.public_key = public_key
        self.blocked_instrumentation_scopes = (


🟣 Pre-existing bug: User-passed flush_at and flush_interval are silently discarded when their env vars (LANGFUSE_FLUSH_AT/LANGFUSE_FLUSH_INTERVAL) are unset, due to operator precedence on lines 82 and 85-89. The expression flush_at or int(env_flush_at) if env_flush_at is not None else None parses as (flush_at or int(env_flush_at)) if env_flush_at is not None else None because or binds tighter than the ternary, so when the env var is unset the whole expression collapses to None regardless of the user-passed value. The fix is mechanical — wrap the ternary: flush_at = flush_at or (int(env_flush_at) if env_flush_at is not None else None) (same shape for flush_interval). Not introduced by this PR (originated in #1618), but worth folding in since #1646 modifies the same constructor and resource_manager.py now unconditionally wires media_manager into LangfuseSpanProcessor, so every Langfuse client routes through these lines.

Extended reasoning...

What the bug is

In LangfuseSpanProcessor.__init__ at langfuse/_client/span_processor.py:81-89:

env_flush_at = os.environ.get(LANGFUSE_FLUSH_AT, None) flush_at = flush_at or int(env_flush_at) if env_flush_at is not None else None env_flush_interval = os.environ.get(LANGFUSE_FLUSH_INTERVAL, None) flush_interval = ( flush_interval or float(env_flush_interval) if env_flush_interval is not None else None )

In Python, the conditional expression (a if c else b) has lower precedence than or. So line 82 parses as:

flush_at = (flush_at or int(env_flush_at)) if env_flush_at is not None else None

When LANGFUSE_FLUSH_AT is unset (the common case), env_flush_at is None, the ternary takes the else branch, and the whole expression evaluates to None — silently throwing away any user-passed flush_at. The same shape applies to flush_interval.

The specific code path that triggers it

User writes Langfuse(flush_at=10, flush_interval=2.0) without setting the env vars. The buggy expressions coerce both to None, then super().__init__ at lines 128-135 passes max_export_batch_size=None and schedule_delay_millis=None to OTel's BatchSpanProcessor, which substitutes its own defaults (512 and 5000ms).

Why existing code does not prevent it

There is no defensive check downstream — BatchSpanProcessor happily accepts None and falls back to defaults. The user gets no warning that their configuration was discarded.

Step-by-step proof

With LANGFUSE_FLUSH_AT unset and user passing flush_at=10:

env_flush_at = os.environ.get("LANGFUSE_FLUSH_AT", None) → env_flush_at = None.

Expression flush_at or int(env_flush_at) if env_flush_at is not None else None parses as (flush_at or int(env_flush_at)) if env_flush_at is not None else None.

env_flush_at is not None → False, so the ternary returns the else branch: None.

flush_at = None. The user's 10 is silently dropped.

super().__init__(..., max_export_batch_size=None, ...) → OTel BatchSpanProcessor falls back to its built-in default of 512.

Verified empirically by multiple verifiers, including via Python AST: ast.parse(...) confirms the structure is IfExp(test='env_flush_at is not None', body=BoolOp(Or, [flush_at, int(env_flush_at)]), orelse=None).

With flush_at=10 and env_flush_at='5' (env var set): the ternary takes the then branch, flush_at or int('5') → 10 wins (the user value), so the bug only manifests when the env var is unset.

Impact

Any user that configures flush_at / flush_interval programmatically without also setting the corresponding env vars gets OTel's defaults instead of their values. The configuration is silently lost — no warning, no error.

Common case: a user wanting more frequent flushes (flush_interval=2.0) for low-latency observability gets the OTel default of 5 seconds.

Common case: a user wanting smaller batches (flush_at=10) for memory-constrained environments gets the OTel default of 512.

Pre-existing status

The buggy lines are not in this PR's diff for span_processor.py — they predate this PR (originated in #1618 "feat(client): add custom span exporter support"). However, PR #1646:

Modifies the same __init__ signature (adds media_manager and mask_otel_spans params).

Adds new wrapping logic just below the buggy lines (the LangfuseTransformingSpanExporter block at lines 121-126).

resource_manager.py now unconditionally passes media_manager=self._media_manager into LangfuseSpanProcessor, so every Langfuse client routes through this constructor with the new params.

While the bug is not strictly introduced here, the constructor is being modified in this PR and the fix is one line per expression — a small, mechanical change worth folding in.

How to fix

Wrap the ternary in parentheses so the user-passed value takes precedence:

env_flush_at = os.environ.get(LANGFUSE_FLUSH_AT, None) flush_at = flush_at or (int(env_flush_at) if env_flush_at is not None else None) env_flush_interval = os.environ.get(LANGFUSE_FLUSH_INTERVAL, None) flush_interval = flush_interval or ( float(env_flush_interval) if env_flush_interval is not None else None )

This preserves the intended precedence: user value first, fallback to env, then to None.

hassiebp force-pushed the codex/export-stage-span-mask branch from 9092ee3 to 98b2a5e Compare May 7, 2026 12:45

hassiebp changed the title ~~[codex] feat(tracing): add export-stage span mask~~ feat(tracing): add export-stage span mask May 7, 2026