Add explicit CUDA graph construction API (GraphDef, GraphNode) #1772

Open
Andy-Jost wants to merge 1 commit into NVIDIA:main from Andy-Jost:explicit-graph-api

Conversation

@Andy-Jost
Contributor

@Andy-Jost Andy-Jost commented Mar 16, 2026

Summary

  • Introduces GraphDef and GraphNode types for explicit CUDA graph construction, complementing the existing stream-capture path via GraphBuilder.
  • Adds a full node hierarchy (kernel, memset, memcpy, alloc, free, child graph, event record/wait, host callback, and conditional nodes) with a fluent builder API.
  • Extracts graph instantiation logic into a shared helper so both GraphBuilder.complete() and GraphDef.instantiate() support GraphCompleteOptions, and removes the pre-CUDA 12.0 cuGraphInstantiateWithFlags fallback.
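For readers new to the distinction, explicit construction means the caller adds nodes and states their dependencies directly, rather than recording work issued on a stream. The sketch below is a toy model only: `ToyGraph`, `ToyNode`, `add`, `then`, and `join` are hypothetical names invented for illustration, not the GraphDef/GraphNode API from this PR, and nothing here calls CUDA.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node (not the PR's GraphNode)."""

    def __init__(self, graph, label, deps=()):
        self.graph = graph
        self.label = label
        self.deps = tuple(deps)

    def then(self, label):
        # Fluent step: the new node depends on this node.
        return self.graph.add(label, deps=(self,))


class ToyGraph:
    """Hypothetical stand-in for a graph definition (not the PR's GraphDef)."""

    def __init__(self):
        self.nodes = []

    def add(self, label, deps=()):
        node = ToyNode(self, label, deps)
        self.nodes.append(node)
        return node

    def join(self, label, *deps):
        # A node with several explicit predecessors.
        return self.add(label, deps=deps)

    def edges(self):
        # (predecessor, successor) pairs in insertion order.
        return [(d.label, n.label) for n in self.nodes for d in n.deps]


g = ToyGraph()
fill = g.add("alloc").then("memset")
k1 = fill.then("kernel_a")   # fork: two kernels depend on the memset
k2 = fill.then("kernel_b")
done = g.join("free", k1, k2)  # join: free waits on both kernels
```

The fork and join are spelled out as edges, which is the property a stream-capture path only expresses indirectly through stream and event ordering.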

Depends on #1762 (merged).

Closes #1317.

Changes

  • New files: _graphdef.pxd / _graphdef.pyx containing GraphDef, GraphNode, and all node subclasses
  • _graph/__init__.py: Extracted _instantiate_graph() helper; updated exports for new types
  • resource_handles.cpp/.hpp: Added graph and graph-node RAII handles, HandleRegistry for reverse handle lookup
  • _resource_handles.pxd/.pyx: Cython declarations for graph handle types
  • _memory/_buffer.pyx: Graph memory allocation support
  • _utils/cuda_utils.pxd/.pyx: Utility additions for graph construction
  • Tests: Comprehensive test suite across 4 new test files (unit, error, integration, lifetime) plus object protocol tests; instantiation and execution tests parametrized over GraphCompleteOptions variants
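As background on what "reverse handle lookup" can mean in general: given a raw native handle value, find the wrapper object that already represents it, so the same handle always maps to the same Python object. The sketch below is a Python illustration under that assumption only. The PR's HandleRegistry lives in C++ and its actual design is not shown in this conversation; `ToyHandleRegistry`, `NodeWrapper`, and `wrap` are hypothetical names.

```python
import weakref


class NodeWrapper:
    """Hypothetical wrapper around a raw native handle value."""

    def __init__(self, raw_handle):
        self.raw_handle = raw_handle


class ToyHandleRegistry:
    """Hypothetical registry mapping raw handle values back to wrappers."""

    def __init__(self):
        # Weak values: the registry does not keep wrappers alive by itself.
        self._by_handle = weakref.WeakValueDictionary()

    def wrap(self, raw_handle):
        wrapper = self._by_handle.get(raw_handle)
        if wrapper is None:
            wrapper = NodeWrapper(raw_handle)
            self._by_handle[raw_handle] = wrapper
        return wrapper


registry = ToyHandleRegistry()
first = registry.wrap(0x1000)
again = registry.wrap(0x1000)   # same raw handle, same wrapper object
other = registry.wrap(0x2000)   # different raw handle, different wrapper
```

With a lookup like this, identity comparisons (`is`) and equality on wrappers agree with identity of the underlying handle.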

Test plan

  • Existing graph capture tests pass (no regressions from instantiation refactor)
  • New explicit graph tests pass: test_explicit.py, test_explicit_errors.py, test_explicit_integration.py, test_explicit_lifetime.py
  • Object protocol tests pass: test_object_protocols.py

@Andy-Jost Andy-Jost added this to the cuda.core v0.7.0 milestone Mar 16, 2026
@Andy-Jost Andy-Jost added the feature and cuda.core labels Mar 16, 2026
@Andy-Jost Andy-Jost self-assigned this Mar 16, 2026
@Andy-Jost
Contributor Author

/ok to test

@Andy-Jost Andy-Jost force-pushed the explicit-graph-api branch from f07dba3 to 6cb61af Compare March 16, 2026 20:26
@Andy-Jost Andy-Jost changed the title Add explicit CUDA graph construction API (GraphDef, Node) Add explicit CUDA graph construction API (GraphDef, GraphNode) Mar 16, 2026
@Andy-Jost
Contributor Author

/ok to test

@Andy-Jost Andy-Jost force-pushed the explicit-graph-api branch from 6cb61af to 49efb6b Compare March 16, 2026 20:29
@Andy-Jost
Contributor Author

/ok to test

Contributor

@cpcloud cpcloud left a comment

The explicit graph model looks promising, but I found a few correctness/API issues that make me uncomfortable approving this as-is. The biggest ones are entry-node identity, GraphDef() failing before CUDA init in a fresh process, and the ctypes host-callback lifetime. I left inline comments with concrete fixes.
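For context on the ctypes lifetime concern in general (the PR's actual code and fix are not shown in this conversation): a `ctypes.CFUNCTYPE` object must stay referenced for as long as native code may call through its address, because ctypes does not keep it alive on its own. The sketch below shows one common pattern, assuming a `void callback(void *user_data)` signature; `ToyGraphOwner` and `add_host_callback` are hypothetical names, and the final call only simulates a native caller.

```python
import ctypes

# Signature assumed for illustration: void callback(void *user_data)
HOST_FN = ctypes.CFUNCTYPE(None, ctypes.c_void_p)


class ToyGraphOwner:
    """Hypothetical owner that keeps callback thunks alive (not the PR's code)."""

    def __init__(self):
        self._keepalive = []

    def add_host_callback(self, fn):
        thunk = HOST_FN(lambda _user_data: fn())
        # Hold a reference for the owner's lifetime. If the thunk were
        # garbage collected while native code still held its address,
        # a later call through that address would be invalid.
        self._keepalive.append(thunk)
        # Raw address that would be handed to a native API.
        return ctypes.cast(thunk, ctypes.c_void_p).value


calls = []
owner = ToyGraphOwner()
address = owner.add_host_callback(lambda: calls.append("ran"))

# Simulate native code invoking the callback through its raw address.
HOST_FN(address)(None)
```

Tying the keepalive list to the object that owns the graph means the callback lives exactly as long as anything that could invoke it.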

@Andy-Jost Andy-Jost force-pushed the explicit-graph-api branch 2 times, most recently from 86f7179 to 310b994 Compare March 16, 2026 20:56
Introduces GraphDef and GraphNode types for explicit CUDA graph
construction, with a full node hierarchy, shared instantiation
helper with GraphCompleteOptions support, and comprehensive tests.

Made-with: Cursor
@Andy-Jost Andy-Jost force-pushed the explicit-graph-api branch from 310b994 to b965385 Compare March 16, 2026 21:01
@Andy-Jost
Contributor Author

/ok to test

@Andy-Jost
Contributor Author

@cpcloud The latest upload fixes the issues you pointed out, except the CUDA init problem, which needs clarification.

@cpcloud
Contributor

cpcloud commented Mar 16, 2026

@Andy-Jost Since your PR is not a draft, you shouldn't need /ok to test anymore.

Contributor

@cpcloud cpcloud left a comment

Re-reviewed the latest head. The earlier GraphNode identity and callback lifetime issues are fixed, and I do not have any remaining blocking review items.

Labels

cuda.core: Everything related to the cuda.core module
feature: New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CUDA graph phase N - explicit graph construction

2 participants