fix(span): clear dangling root_span aliases on free (fixes baggage SIGSEGV)#3943
fix(span): clear dangling root_span aliases on free (fixes baggage SIGSEGV)#3943jdani-airalo wants to merge 1 commit into
Conversation
|
Hey @jdani-airalo. the big question is - how does that become NULL. This is an invalid state and probably direct access to I was unable to pin down a possible path to get this to crash. Do have any reproducer? Any information your AI could extract from the core dump / source code to reconstruct a path to getting this crashing? This PR is a band-aid, not a proper fix :-( |
8991876 to
a155215
Compare
|
Dug into the lifetime and I think I have the actual root cause — it's not invalid baggage, it's a use-after-free of the parent root span.
Baggage is just the trigger, which is what makes it look like an "invalid state":
I've replaced the guard with a fix at the free site: when a root span is freed, scrub any stack still aliasing it so the weak pointer can't outlive the object. I haven't been able to compile/run it locally, so it'll need CI + your build. If useful I can also share (a) a candidate ASAN reproducer — fork a child stack so it captures |
Fixes a SIGSEGV in ddtrace_inherit_span_properties that surfaced in production (PHP 8.3 NTS, fpm-fcgi) as a GC_ADDREF on a NULL/freed pointer while copying a parent span's baggage. Root cause: a span stack keeps a NON-owning pointer to its root span (ddtrace_init_span_stack copies it without taking a reference). The parent-stack reference chain normally keeps the root span alive while any stack aliases it, but out-of-order root span drops break that invariant: ddtrace_span_alter_root_span_config() (runtime config change) and the rejection branch of ddtrace_drop_span() free the root span while only NULLing the *current* stack's pointer, leaving sibling/descendant stacks dangling. The next cross-stack inherit (ddtrace_set_root_span_properties -> ddtrace_inherit_span_properties) then dereferences the freed span. It crashes on the baggage copy specifically: with baggage populated, property_baggage points at a heap zend_array freed with the span, so the ZVAL_COPY_DEREF -> GC_ADDREF dereferences freed memory. With empty baggage it stays the immutable ZVAL_EMPTY_ARRAY default whose pointer survives the freed read -- which is why disabling baggage extraction masked the bug. Fix: when a root span object is actually freed, scrub every span stack that still aliases it so the weak pointer can never outlive the object, regardless of which drop path freed it. NULL is already the "no root span" sentinel handled throughout (e.g. ddtrace_set_root_span_properties guards on parent_root), so this introduces no new state. The scan runs once per root span free (~once per trace), mirroring ddtrace_mark_all_span_stacks_flushable. This supersedes the previous defensive guard in ddtrace_inherit_span_properties (reverted here): the malformed (refcounted-flagged, NULL/freed counted pointer) zval can no longer arise. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
a155215 to
49cc658
Compare
Root cause. A span stack holds a non-owning pointer to its root span (
ddtrace_init_span_stackcopies it without taking a reference,ext/span.c). The parent-stack reference chain normally keeps the root span alive while any stack aliases it, but out-of-order root span drops break that invariant:ddtrace_span_alter_root_span_config()(runtime config change) and the rejection branch ofddtrace_drop_span()free the root span while only NULLing the current stack's pointer, leaving sibling/descendant stacks dangling. The next cross-stack inherit (ddtrace_set_root_span_properties→ddtrace_inherit_span_properties) then dereferences the freed span.It faults on the baggage copy specifically: with baggage populated,
property_baggagepoints at a heapzend_arrayfreed with the span, soZVAL_COPY_DEREF→GC_ADDREFdereferences freed memory (addl $0x1,(%rdx)with%rdx=0). With empty baggage it stays the immutableZVAL_EMPTY_ARRAYdefault whose pointer survives the freed read — which is why disabling baggage extraction masked it, and why\DDTrace\active_span()->baggagein normal user code does not crash (the root span is alive there).Fix. When a root span object is freed, scrub every span stack still aliasing it so the weak pointer can never outlive the object, regardless of which drop path freed it.
NULLis already the "no root span" sentinel handled throughout (e.g.ddtrace_set_root_span_propertiesguards onparent_root), so this introduces no new state. The scan runs ~once per trace, mirroringddtrace_mark_all_span_stacks_flushable.This supersedes the previous defensive guard in
ddtrace_inherit_span_properties(reverted here): the malformed (refcounted-flagged, NULL/freed counted pointer) zval can no longer arise.