fix(tracer): prevent stale ctx.tracing crash on HTTPS keepalive connections #13232

Open
janiussyafiq wants to merge 2 commits into apache:master from janiussyafiq:fix/tracing-https


Description

When apisix.tracing is enabled, the core tracer instruments every request phase — including ssl_client_hello_phase — by allocating a tracing table from lua-tablepool and storing it in ngx.ctx.tracing. On HTTPS keepalive connections, OpenResty reuses the same ngx.ctx object across multiple HTTP requests on the same TLS session.

The bug occurs in the following sequence:

  1. ssl_client_hello_phase calls tracer.start(), which allocates ctx.tracing via tablepool and initialises tracing.spans.
  2. The first HTTP request completes and tracer.release() is called in the log phase, returning the tracing table to the pool. lua-tablepool zeroes all fields on release — tracing.spans becomes nil — but ctx.tracing is never cleared, leaving a stale non-nil pointer to the zeroed table.
  3. On the second HTTP request (same keepalive connection), tracer.start() finds ctx.tracing is non-nil (a zeroed table is still truthy in Lua) and skips re-initialisation.
  4. span.new() then crashes at table.insert(tracing.spans, self) because spans is nil.
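
The sequence above can be reproduced outside OpenResty with a small, self-contained sketch. Everything in it is a simplified stand-in rather than the actual APISIX code: `fetch` and `release` are hypothetical helpers that mimic the clear-on-release behaviour described for lua-tablepool, a plain table plays the role of the reused ngx.ctx, and the span is reduced to a table with a name.

```lua
-- Hypothetical stand-in for lua-tablepool: release clears every field and
-- keeps the table for reuse. These helpers are illustrative, not the real API.
local pool = {}

local function fetch()
    return table.remove(pool) or {}
end

local function release(t)
    for k in pairs(t) do
        t[k] = nil
    end
    pool[#pool + 1] = t
end

-- Simplified tracer with the pre-fix behaviour described in the steps above.
local tracer = {}

function tracer.start(ctx, name)
    local tracing = ctx.tracing
    if not tracing then                 -- an emptied table is still truthy
        tracing = fetch()
        tracing.spans = {}
        ctx.tracing = tracing
    end
    table.insert(tracing.spans, { name = name })
end

function tracer.release(ctx)
    release(ctx.tracing)                -- ctx.tracing itself is never cleared
end

local ctx = {}                          -- plays the role of the reused ngx.ctx
tracer.start(ctx, "ssl_client_hello")   -- step 1
tracer.release(ctx)                     -- step 2: log phase of request 1
local ok, err = pcall(tracer.start, ctx, "request_2")  -- steps 3 and 4
print(ok, err)
```

Here `ok` is false because `table.insert` receives nil in place of the spans table, while `ctx.tracing` still points at the emptied table.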

This fix addresses the root cause at two layers:

  • tracer.release(): ctx.tracing is now set to nil at the very beginning of the function, before any tablepool operations, so a stale pointer is never left in ngx.ctx for the next request to inherit. An if spans then guard is also added so that release is safe when called on an already partially cleared table.
  • tracer.start(): The initialisation guard is extended from if not tracing then to if not tracing or not tracing.spans then, so tracing is correctly re-initialised even if a stale state is somehow encountered (e.g. diverged HTTP/2 contexts).
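
A minimal sketch of the two layers, assuming the same simplified model as the bug description: `fetch` and `release` are hypothetical stand-ins for the tablepool calls, `ctx` is a plain table in place of ngx.ctx, and the per-span cleanup is reduced to emptying the list. It illustrates the guards named above and is not the code from this PR's diff.

```lua
-- Hypothetical stand-in for lua-tablepool (illustrative, not the real API).
local pool = {}

local function fetch()
    return table.remove(pool) or {}
end

local function release(t)
    for k in pairs(t) do
        t[k] = nil
    end
    pool[#pool + 1] = t
end

local tracer = {}

function tracer.start(ctx, name)
    local tracing = ctx.tracing
    -- Layer 2: also re-initialise when a stale, emptied table is found.
    if not tracing or not tracing.spans then
        tracing = fetch()
        tracing.spans = {}
        ctx.tracing = tracing
    end
    table.insert(tracing.spans, { name = name })
end

function tracer.release(ctx)
    local tracing = ctx.tracing
    if not tracing then
        return
    end
    -- Layer 1: drop the reference first so the next request starts clean.
    ctx.tracing = nil
    local spans = tracing.spans
    if spans then                       -- safe on a partially cleared table
        for i = #spans, 1, -1 do
            spans[i] = nil              -- stand-in for per-span cleanup
        end
    end
    release(tracing)
end

-- Keepalive scenario: two requests share one ctx.
local ctx = {}
tracer.start(ctx, "ssl_client_hello")
tracer.release(ctx)
local ok = pcall(tracer.start, ctx, "request_2")

-- Defensive scenario: a stale table with no spans is already in ctx.
local stale_ctx = { tracing = {} }
local ok_stale = pcall(tracer.start, stale_ctx, "request_3")
print(ok, ok_stale)
```

Either guard alone prevents the crash in this model. Clearing ctx.tracing in release handles the keepalive case, and the extended check in start covers a stale table that reaches it by some other route.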

Which issue(s) this PR fixes:

Fixes #13200

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change (t/node/tracer.t)
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible

dosubot (bot) added the labels size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) on Apr 16, 2026.

Development

Successfully merging this pull request may close these issues.

bug: apisix 3.16.0 comprehensive tracing breaks with HTTPS keepalive connections