Review the thread safety of the SDK to make sure there's no potential concurrency issues, particularly around state maintained in the global API object, clients, and evaluation context objects.
See here for a similar discussion in the Java SDK and others.
Background
Python's GIL is sometimes mistaken for making threading a non-issue (like Node's single-threaded model), but they're not the same. The GIL only makes individual bytecode operations atomic; compound operations like check-then-act or read-modify-write are not safe. Providers commonly run background threads for connection management and emit events from them, so these races aren't just theoretical.
Audit
Here's what we found. The only locks in the SDK today are in _event_support.py (and even those have some gaps). Everything else is unprotected.
ProviderRegistry singleton (provider/_registry.py) - this is the big one. No locks at all. set_provider() and set_default_provider() have check-then-act patterns with side effects (initialize/shutdown), so concurrent calls can double-shutdown, lose providers, or initialize providers that never get registered. clear_providers() has a window where shut-down providers are still returned to callers.
- Global
_hooks list (hook/__init__.py) - add_hooks does _hooks = _hooks + hooks which is a read-concat-rebind; two concurrent calls can lose one set of hooks
api.clear_providers() (api.py) - calls provider_registry.clear_providers() then _event_support.clear() as two separate steps; event handlers still exist but providers are gone in between
_event_support.py - has RLock (good), but immediate handler execution in add_*_handler runs outside the lock, and clear() acquires two separate locks non-atomically
- Global
_evaluation_context (evaluation_context/__init__.py) - no lock; TOCTOU across the evaluation pipeline (context can change between read and use)
- Global transaction context propagator (
transaction_context/__init__.py) - no lock; read-then-call-method races with propagator swap
- Client instance state (
client.py) - self.hooks has the same lost-update pattern as global hooks; _assert_provider_status reads self.provider again instead of using the reference captured earlier in the evaluation flow, so the status check can be for a different provider than the one actually used
AbstractProvider._on_emit (provider/__init__.py) - emit() does if hasattr(self, "_on_emit"): self._on_emit(...) which is a TOCTOU; detach() during shutdown can delete _on_emit while a background thread is between the check and the call
Definition of done
Review the thread safety of the SDK to make sure there's no potential concurrency issues, particularly around state maintained in the global API object, clients, and evaluation context objects.
See here for a similar discussion in the Java SDK and others.
Background
Python's GIL is sometimes mistaken for making threading a non-issue (like Node's single-threaded model), but they're not the same. The GIL only makes individual bytecode operations atomic; compound operations like check-then-act or read-modify-write are not safe. Providers commonly run background threads for connection management and emit events from them, so these races aren't just theoretical.
Audit
Here's what we found. The only locks in the SDK today are in
_event_support.py(and even those have some gaps). Everything else is unprotected.ProviderRegistrysingleton (provider/_registry.py) - this is the big one. No locks at all.set_provider()andset_default_provider()have check-then-act patterns with side effects (initialize/shutdown), so concurrent calls can double-shutdown, lose providers, or initialize providers that never get registered.clear_providers()has a window where shut-down providers are still returned to callers._hookslist (hook/__init__.py) -add_hooksdoes_hooks = _hooks + hookswhich is a read-concat-rebind; two concurrent calls can lose one set of hooksapi.clear_providers()(api.py) - callsprovider_registry.clear_providers()then_event_support.clear()as two separate steps; event handlers still exist but providers are gone in between_event_support.py- has RLock (good), but immediate handler execution inadd_*_handlerruns outside the lock, andclear()acquires two separate locks non-atomically_evaluation_context(evaluation_context/__init__.py) - no lock; TOCTOU across the evaluation pipeline (context can change between read and use)transaction_context/__init__.py) - no lock; read-then-call-method races with propagator swapclient.py) -self.hookshas the same lost-update pattern as global hooks;_assert_provider_statusreadsself.provideragain instead of using the reference captured earlier in the evaluation flow, so the status check can be for a different provider than the one actually usedAbstractProvider._on_emit(provider/__init__.py) -emit()doesif hasattr(self, "_on_emit"): self._on_emit(...)which is a TOCTOU;detach()during shutdown can delete_on_emitwhile a background thread is between the check and the callDefinition of done