cuda.core: harden graph user-object destructor during Python shutdown#2074
cuda.core: harden graph user-object destructor during Python shutdown#2074aryanputta wants to merge 3 commits into
Conversation
Signed-off-by: Aryan <aryansputta@gmail.com>
| cdef void _py_host_destructor(void* data) noexcept with gil: | ||
| _py_decref(data) | ||
| cdef void _py_host_destructor(void* data) noexcept nogil: | ||
| if Py_IsInitialized(): |
There was a problem hiding this comment.
Mostly generated with Cursor GPT-5.4 Extra High Fast, after a few rounds of prompting and having it dig into the cpython sources (git clone).
__
I think this should also gate on finalization, not just initialization.
Something along these lines seems like the best fit here:
cdef extern from *:
"""
#include <Python.h>
static inline int _py_runtime_is_finalizing(void) {
#if PY_VERSION_HEX >= 0x030D0000
return Py_IsFinalizing();
#else
return _Py_IsFinalizing(); /* CPython-private, available in 3.10-3.12 */
#endif
}
"""
int _py_runtime_is_finalizing() nogil
cdef void _py_host_destructor(void* data) noexcept nogil:
if data == NULL:
return
# Leak once shutdown has started or completed.
if not Py_IsInitialized():
return
if _py_runtime_is_finalizing():
return
with gil:
_py_decref(data)I inspected the CPython sources before suggesting this. The key detail is that during shutdown CPython marks the runtime as finalizing before it clears the initialized flag:
Python/pylifecycle.c lines 2077-2081:
/* Remaining daemon threads will be trapped in PyThread_hang_thread
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
_PyInterpreterState_SetFinalizing(tstate->interp, tstate);
_PyRuntimeState_SetFinalizing(runtime, tstate);
runtime->initialized = 0;
In other words, there is a real source-level window where Py_IsInitialized() is still true even though attempting to enter Python is already unsafe. In the cpython source tree (git tag: v3.14.4) this ordering is not only visible in Python/pylifecycle.c, but the docs in Doc/c-api/interp-lifecycle.rst / Doc/c-api/threads.rst also explicitly describe GIL/thread-state attachment during finalization as unsafe.
That means the current proposed change:
cdef void _py_host_destructor(void* data) noexcept nogil:
if Py_IsInitialized():
with gil:
_py_decref(data)still leaves a shutdown race: the destructor can observe Py_IsInitialized() == 1, then finalization can begin, and then with gil can run into CPython's shutdown path.
Why I think the version above is the best we can do:
- It preserves the desired behavior during normal operation: we still decref instead of leaking.
- It matches the stated shutdown policy: once finalization has started (or Python is already gone), we intentionally leak rather than trying to force a decref through a fragile runtime state.
- It closes the specific
initialized == true but runtime is already finalizinghole that the current patch still has. - For
3.10+, it is implementable today:Py_IsFinalizing()is public in3.13+, and CPython exposes_Py_IsFinalizing()in3.10-3.12.
Remaining risk: this is still only best effort, not a proof of safety. There is still an unavoidable race between the finalization check and with gil. If finalization starts in that tiny window, CPython may still take the shutdown path when the thread tries to re-enter Python. From the CPython sources, that outcome is version-dependent: older versions can terminate the thread, while newer versions can hang it. I do not think we can eliminate that residual risk here without either (1) deferring cleanup onto a known Python-owned execution context, or (2) leaking more aggressively before finalization. Considering both of those as out of scope, this looks like the narrowest and most defensible compromise.
There was a problem hiding this comment.
Got it, that makes sense. I was only gating on Py_IsInitialized(), but I see the issue now: there is a shutdown window where Python is still initialized while the runtime is already finalizing, so entering the GIL can still be unsafe. I’ll update the destructor to also check finalization and intentionally leak once shutdown has started. Thanks for the quick responce!
Summary
Closes #2042.
_py_host_destructorunconditionally enteredwith gilbefore callingPy_DECREF. That is fine during normal runtime, but it is unsafe in thegraph user-object path because CUDA may invoke the destructor
asynchronously after interpreter finalization has begun.
This change makes
_py_host_destructornogil, checksPy_IsInitialized(), and only enters a GIL section when Python is stillinitialized.
Changes
cuda_core/cuda/core/graph/_utils.pyx: declarePy_IsInitialized()andharden
_py_host_destructorso it only acquires the GIL when Python isstill initialized.
cuda_core/cuda/core/graph/_utils.pxd: update the destructor signatureto match.
The normal runtime path is unchanged: if Python is still alive, the
destructor still decrefs the attached object. The change only affects the
shutdown edge where acquiring the GIL is no longer safe.
Validation
Locally verified:
git diff --checkpasses._utils.pyx/_utils.pxd.Not fully verified on this machine:
cuda.coretest suitecuda_coreRelated Work
user-object lifetime handling was extended to kernel arguments.