Skip to content

cuda.core: harden graph user-object destructor during Python shutdown#2074

Open
aryanputta wants to merge 3 commits into
NVIDIA:mainfrom
aryanputta:fix-graph-destructor-shutdown
Open

cuda.core: harden graph user-object destructor during Python shutdown#2074
aryanputta wants to merge 3 commits into
NVIDIA:mainfrom
aryanputta:fix-graph-destructor-shutdown

Conversation

@aryanputta
Copy link
Copy Markdown

@aryanputta aryanputta commented May 12, 2026

Summary

Closes #2042.

_py_host_destructor unconditionally entered with gil before calling
Py_DECREF. That is fine during normal runtime, but it is unsafe in the
graph user-object path because CUDA may invoke the destructor
asynchronously after interpreter finalization has begun.

This change makes _py_host_destructor nogil, checks
Py_IsInitialized(), and only enters a GIL section when Python is still
initialized.

Changes

  • cuda_core/cuda/core/graph/_utils.pyx: declare Py_IsInitialized() and
    harden _py_host_destructor so it only acquires the GIL when Python is
    still initialized.
  • cuda_core/cuda/core/graph/_utils.pxd: update the destructor signature
    to match.

The normal runtime path is unchanged: if Python is still alive, the
destructor still decrefs the attached object. The change only affects the
shutdown edge where acquiring the GIL is no longer safe.

Validation

Locally verified:

  • git diff --check passes.
  • The change is limited to the existing graph user-object destructor path in
    _utils.pyx / _utils.pxd.
  • The commit is DCO-signed and SSH-signed.

Not fully verified on this machine:

  • full cuda.core test suite
  • editable build of cuda_core

Related Work

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.core Everything related to the cuda.core module label May 12, 2026
Comment thread cuda_core/cuda/core/graph/_utils.pyx Outdated
cdef void _py_host_destructor(void* data) noexcept with gil:
_py_decref(data)
cdef void _py_host_destructor(void* data) noexcept nogil:
if Py_IsInitialized():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly generated with Cursor GPT-5.4 Extra High Fast, after a few rounds of prompting and having it dig into the cpython sources (git clone).

__

I think this should also gate on finalization, not just initialization.

Something along these lines seems like the best fit here:

cdef extern from *:
    """
    #include <Python.h>

    static inline int _py_runtime_is_finalizing(void) {
    #if PY_VERSION_HEX >= 0x030D0000
        return Py_IsFinalizing();
    #else
        return _Py_IsFinalizing();  /* CPython-private, available in 3.10-3.12 */
    #endif
    }
    """
    int _py_runtime_is_finalizing() nogil

cdef void _py_host_destructor(void* data) noexcept nogil:
    if data == NULL:
        return

    # Leak once shutdown has started or completed.
    if not Py_IsInitialized():
        return
    if _py_runtime_is_finalizing():
        return

    with gil:
        _py_decref(data)

I inspected the CPython sources before suggesting this. The key detail is that during shutdown CPython marks the runtime as finalizing before it clears the initialized flag:

Python/pylifecycle.c lines 2077-2081:

    /* Remaining daemon threads will be trapped in PyThread_hang_thread
       when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
    _PyInterpreterState_SetFinalizing(tstate->interp, tstate);
    _PyRuntimeState_SetFinalizing(runtime, tstate);
    runtime->initialized = 0;

In other words, there is a real source-level window where Py_IsInitialized() is still true even though attempting to enter Python is already unsafe. In the cpython source tree (git tag: v3.14.4) this ordering is not only visible in Python/pylifecycle.c, but the docs in Doc/c-api/interp-lifecycle.rst / Doc/c-api/threads.rst also explicitly describe GIL/thread-state attachment during finalization as unsafe.

That means the current proposed change:

cdef void _py_host_destructor(void* data) noexcept nogil:
    if Py_IsInitialized():
        with gil:
            _py_decref(data)

still leaves a shutdown race: the destructor can observe Py_IsInitialized() == 1, then finalization can begin, and then with gil can run into CPython's shutdown path.

Why I think the version above is the best we can do:

  • It preserves the desired behavior during normal operation: we still decref instead of leaking.
  • It matches the stated shutdown policy: once finalization has started (or Python is already gone), we intentionally leak rather than trying to force a decref through a fragile runtime state.
  • It closes the specific initialized == true but runtime is already finalizing hole that the current patch still has.
  • For 3.10+, it is implementable today: Py_IsFinalizing() is public in 3.13+, and CPython exposes _Py_IsFinalizing() in 3.10-3.12.

Remaining risk: this is still only best effort, not a proof of safety. There is still an unavoidable race between the finalization check and with gil. If finalization starts in that tiny window, CPython may still take the shutdown path when the thread tries to re-enter Python. From the CPython sources, that outcome is version-dependent: older versions can terminate the thread, while newer versions can hang it. I do not think we can eliminate that residual risk here without either (1) deferring cleanup onto a known Python-owned execution context, or (2) leaking more aggressively before finalization. Considering both of those as out of scope, this looks like the narrowest and most defensible compromise.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, that makes sense. I was only gating on Py_IsInitialized(), but I see the issue now: there is a shutdown window where Python is still initialized while the runtime is already finalizing, so entering the GIL can still be unsafe. I’ll update the destructor to also check finalization and intentionally leak once shutdown has started. Thanks for the quick responce!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Harden _py_host_destructor against invocation after Py_Finalize

2 participants