Contributor

@Andy-Jost Andy-Jost commented Feb 3, 2026

Summary

Converts _program.py to Cython (_program.pyx) with full cythonization of NVRTC and NVVM paths, RAII-based resource management, and nogil error handling.

Changes

Cythonization

  • _program.py → _program.pyx with cdef class Program
  • New _program.pxd with typed attribute declarations
  • Program_init cythonized with direct cynvrtc/cynvvm calls in nogil blocks
  • Program_compile_nvrtc and Program_compile_nvvm use nogil for C API calls
  • Long methods factored into cdef inline helpers

Resource Handles

  • Add NvrtcProgramHandle and NvvmProgramHandle (std::shared_ptr-based RAII)
  • Add create_nvrtc_program_handle() and create_nvvm_program_handle()
  • Remove _MembersNeededForFinalize / weakref.finalize pattern
  • as_py(NvvmProgramHandle) returns Python int (matches cuda_bindings design)
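
For context, the finalizer pattern being removed, and the close() idempotency the tests now check, can be sketched in plain Python. This is an illustration only: `FakeProgram`, `_destroy`, and the `destroyed` list are hypothetical stand-ins rather than names from cuda.core, and the actual change moves cleanup into std::shared_ptr deleters rather than Python code.

```python
import weakref

destroyed = []  # records handles released by the hypothetical destroy call


def _destroy(handle):
    # Stand-in for a native destroy call (assumption for this sketch).
    destroyed.append(handle)


class FakeProgram:
    """Hypothetical object that owns one native handle."""

    def __init__(self, handle):
        self._handle = handle
        # weakref.finalize runs _destroy at most once: either when the
        # finalizer is called explicitly or when the object is collected.
        self._finalizer = weakref.finalize(self, _destroy, handle)

    def close(self):
        # Calling an already-dead finalizer is a no-op, so close() is idempotent.
        self._finalizer()
        self._handle = None

    @property
    def handle(self):
        # Mirrors the "return None when the handle is null" behavior.
        return self._handle


prog = FakeProgram(0x1234)
prog.close()
prog.close()  # second call must not release the handle again
```

With RAII handles, the same "release exactly once" guarantee comes from the shared pointer's deleter, so the Python-side bookkeeping class is no longer needed.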

Error Handling

  • Add HANDLE_RETURN_NVRTC for nogil NVRTC error handling with program log
  • Add HANDLE_RETURN_NVVM for nogil NVVM error handling with program log
  • Add NVVMError exception class
  • Simplify HANDLE_RETURN to directly take cydriver.CUresult
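
The idea of raising an error that carries the program log can be illustrated with a small pure-Python sketch. It is not the Cython implementation and says nothing about how the nogil macros are written: `SUCCESS`, `CompileError`, `check_status`, and `get_log` are hypothetical names, and the log text is made up for the example.

```python
SUCCESS = 0  # assumed success code for this sketch


class CompileError(Exception):
    """Hypothetical exception type used only in this sketch."""


def check_status(status, get_log=None):
    """Raise CompileError for a non-zero status, appending the log if any."""
    if status == SUCCESS:
        return
    message = f"compiler call failed with status {status}"
    if get_log is not None:
        log = get_log()
        if log:
            message += f"\nProgram log:\n{log}"
    raise CompileError(message)


check_status(0)  # success: returns without raising

try:
    # Illustrative failure with a made-up log line.
    check_status(6, get_log=lambda: "kernel.cu(3): error: identifier x is undefined")
except CompileError as exc:
    error_text = str(exc)
```

Fetching the log lazily through a callable keeps the success path free of any extra work.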

Validation & API

  • Add SUPPORTED_TARGETS map for unified backend/target validation
  • Add public Program.driver_can_load_nvrtc_ptx_output() API
  • Consistent error message format for unsupported targets
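
A map-based validation of this kind might look like the sketch below. The backend names, target lists, function name, and error wording are assumptions for illustration; the conversation does not show the actual contents of SUPPORTED_TARGETS or the exact message format.

```python
# Hypothetical contents: which compile targets each backend accepts.
SUPPORTED_TARGETS = {
    "NVRTC": ("ptx", "cubin", "ltoir"),
    "NVVM": ("ptx", "ltoir"),
    "PTX": ("cubin", "ptx"),
}


def validate_target(backend, target_type):
    """Raise ValueError with one uniform message for any unsupported pair."""
    supported = SUPPORTED_TARGETS.get(backend)
    if supported is None:
        raise ValueError(f"Unknown backend {backend!r}")
    if target_type not in supported:
        raise ValueError(
            f"Unsupported target_type={target_type!r} for {backend} "
            f"(supported: {', '.join(supported)})"
        )


validate_target("NVRTC", "ptx")  # accepted: returns without raising

try:
    validate_target("NVVM", "cubin")
except ValueError as exc:
    unsupported_message = str(exc)
```

Keeping every backend in one table means a single code path produces the error text, which is what makes a single regex in the tests sufficient.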

Code Quality

  • Module docstring, __all__, type alias section added
  • Docstrings updated to use :class: refs and public paths
  • Type annotations added to public methods
  • Union import removed in favor of | syntax
  • Compiler options use std::vector<const char*> instead of malloc/free

Tests

  • Extend test_object_protocols.py with Program and ObjectCode variations
  • Update test_program.py: test close() idempotency instead of handle state
  • Update error message regex for unified format

Test plan

  • All existing test_program.py tests pass
  • All existing test_module.py tests pass
  • test_object_protocols.py passes with new variations
  • pre-commit passes

Closes #1082

@Andy-Jost Andy-Jost added this to the cuda.core beta 12 milestone Feb 3, 2026
@Andy-Jost Andy-Jost added the P0 (High priority - Must do!) and cuda.core (Everything related to the cuda.core module) labels Feb 3, 2026
@Andy-Jost Andy-Jost self-assigned this Feb 3, 2026
Contributor

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost
Contributor Author

/ok to test 85dbbb5

@leofang leofang self-requested a review February 3, 2026 06:02
  mod_obj = ObjectCode.from_ptx(ptx, symbol_mapping=sym_map)
  assert mod.code == ptx
- if not Program._can_load_generated_ptx():
+ if not Program.driver_can_load_nvrtc_ptx_output():
Contributor Author

This is worthy of discussion. I changed this private API into a public one after refactoring because it seems somewhat useful to an end user. An alternative would be to continue using the private API if the value here is too low.

Collaborator

I'd definitely not make the API change in this PR, in part so that such a subtle change doesn't get drowned out by the many other changes here, but also to avoid entangled unintended side-effects from the disjoint changes (those tend to be hard to sort out, especially later, after this PR is merged already).

@Andy-Jost Andy-Jost requested review from cpcloud and rwgk February 3, 2026 15:27
- Rename _program.py to _program.pyx
- Convert Program to cdef class with _program.pxd declarations
- Extract _MembersNeededForFinalize to module-level _ProgramMNFF
  (nested classes not allowed in cdef class)
- Add __repr__ method to Program
- Keep ProgramOptions as @dataclass (unchanged)
- Keep weakref.finalize pattern for handle cleanup
- Move _translate_program_options to Program_translate_options (cdef)
- Move _can_load_generated_ptx to Program_can_load_generated_ptx (cdef)
- Remove unused TYPE_CHECKING import block
- Follow _memory/_buffer.pyx helper function patterns
- Reorganize file structure per developer guide (principal class first)
- Add module docstring, __all__, type alias section
- Factor long methods into cdef inline helpers
- Add proper exception specs to cdef functions
- Fix docstrings (use :class: refs, public paths)
- Add type annotations to public methods
- Inline _nvvm_exception_manager (single use)
- Remove Union import, use | syntax
- Add public Program.driver_can_load_nvrtc_ptx_output() API
- Update tests to use new public API

Closes NVIDIA#1082
Add fixtures for different Program backends (NVRTC, PTX, NVVM) and
ObjectCode code types (cubin, PTX, LTOIR). Split API_TYPES into more
precise HASH_TYPES, EQ_TYPES, and WEAKREF_TYPES lists. Derive
DICT_KEY_TYPES and WEAK_KEY_TYPES for collection tests.
- Add NvrtcProgramHandle and NvvmProgramHandle to resource handles module
- Add function pointer initialization for nvrtcDestroyProgram and nvvmDestroyProgram
- Forward-declare nvvmProgram to avoid nvvm.h dependency
- Refactor detail::make_py to accept module name parameter
- Remove _ProgramMNFF class from _program.pyx
- Program now uses typed handles directly with RAII cleanup
- Update handle property to return None when handle is null
- Add NVVMError exception class
- Add HANDLE_RETURN_NVRTC for nogil NVRTC error handling with program log
- Add HANDLE_RETURN_NVVM for nogil NVVM error handling with program log
- Remove vestigial supported_error_type fused type
- Simplify HANDLE_RETURN to directly take cydriver.CUresult
@Andy-Jost
Contributor Author

/ok to test 44fd98e

Comment on lines 13 to 17
// Forward declaration for NVVM - avoids nvvm.h dependency
// Use void* to match cuda.bindings.cynvvm's typedef for compatibility
#ifndef CYTHON_EXTERN_C
typedef void *nvvmProgram;
#endif
Contributor Author

Since nvvm is an optional dependency, I opted to define nvvmProgram this way rather than include the header.

- Change cdef function return types from ObjectCode to object (Cython limitation)
- Remove unused imports: intptr_t, NvrtcProgramHandle, NvvmProgramHandle, as_intptr
- Update as_py(NvvmProgramHandle) to return Python int via PyLong_FromSsize_t
- Update test assertions: remove handle checks after close(), test idempotency instead
- Update NVVM error message regex to match new unified format
@Andy-Jost
Contributor Author

/ok to test 0f06313

Labels

cuda.core: Everything related to the cuda.core module
P0: High priority - Must do!

Development

Successfully merging this pull request may close these issues.

Cythonize _program.py

2 participants