Context
The Python SDK hit a deadlock in the logs telemetry buffer (Slack thread). The Ruby SDK has a very similar buffer implementation, so we should audit it for the same class of issue.
Problem (from Python)
- The logs buffer acquires a lock when adding a log and when flushing/clearing the buffer
- During a flush (lock held), GC ran and emitted a log → the logging integration tried to add it to the buffer → attempted to acquire the already-held lock → deadlock
- Re-entrant locks didn't help since the GC callback runs on a different thread
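For reference while auditing, here is a minimal sketch of how the same-thread variant of this would surface in Ruby, assuming the buffer guards its state with a plain Mutex (an assumption, not confirmed against the SDK code). Ruby's Mutex is not re-entrant, so a nested acquisition on the same thread raises ThreadError rather than hanging silently. A nested acquisition from another thread would simply block until the holder releases the lock.

```ruby
# Stand-in for the buffer's lock; the real SDK lock may differ.
mutex = Mutex.new
error = nil

begin
  mutex.synchronize do
    # Simulates a side effect during flush that calls back into add,
    # which tries to take the same lock on the same thread.
    mutex.synchronize { :never_reached }
  end
rescue ThreadError => e
  error = e
end

puts "nested acquisition raised: #{error.class}" if error
```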
What to check in Ruby
- Audit the telemetry buffer lock usage — ensure no code path can trigger a re-entrant lock acquisition (e.g. via callbacks, GC, instrumentation side-effects during flush)
- Minimize critical sections — the lock should only protect fetch/pop/clear/add operations on the buffer data structure. Envelope construction and other side-effects should happen outside the lock
- Consider the .NET approach — their implementation is mostly lock-free (atomic increments/decrements), only locking briefly during flush to extract a copy of the buffer array before releasing
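The "minimize critical sections" point above could be sketched roughly as follows. LogBuffer, take_pending, deliver and the sender block are hypothetical names for illustration, not the Ruby SDK's actual API. The lock only covers appending to the array and swapping it out; the batch is handed to the sender (where envelope construction and transport would happen) after the lock is released.

```ruby
# Hypothetical buffer for illustration only; not the Ruby SDK's real class.
class LogBuffer
  def initialize(max_size: 100, &sender)
    @max_size = max_size
    @sender = sender   # hypothetical callable that receives a batch of items
    @mutex = Mutex.new
    @pending = []
  end

  # Appends under the lock; if the buffer is full, detaches the batch
  # under the lock and delivers it only after the lock is released.
  def add(item)
    batch = nil
    @mutex.synchronize do
      @pending << item
      batch = take_pending if @pending.size >= @max_size
    end
    deliver(batch)
  end

  # Swaps the array out under the lock, then sends outside it.
  def flush
    batch = @mutex.synchronize { take_pending }
    deliver(batch)
  end

  def size
    @mutex.synchronize { @pending.size }
  end

  private

  # Must be called with the lock held.
  def take_pending
    batch = @pending
    @pending = []
    batch
  end

  # Runs without the lock, so side effects here may safely call add.
  def deliver(batch)
    return if batch.nil? || batch.empty?
    @sender.call(batch)
  end
end

# Usage: the sender emits a log while a flush is in progress.
sent = []
buffer = nil
buffer = LogBuffer.new(max_size: 3) do |batch|
  sent << batch.dup
  buffer.add("log emitted during flush") if sent.size == 1
end
buffer.add("a")
buffer.add("b")
buffer.flush
```

Because the sender runs after the lock is released, the log emitted during the flush lands in the fresh array instead of contending for the lock.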
Related
- Python fix by Ivana: moved side-effect-producing work outside the locked section
- Java was checked and looks fine (Alexander)
- .NET uses lock-free atomics; had a separate recursion issue with Debug=true