Skip to content

Refactor graphics interop around RAII-mapped buffers#1701

Open
rparolin wants to merge 8 commits intoNVIDIA:mainfrom
rparolin:rparolin/graphics_feedback_fixes_1
Open

Refactor graphics interop around RAII-mapped buffers#1701
rparolin wants to merge 8 commits intoNVIDIA:mainfrom
rparolin:rparolin/graphics_feedback_fixes_1

Conversation

@rparolin
Copy link
Collaborator

@rparolin rparolin commented Feb 27, 2026

Summary

  • Reworks graphics interop around the existing handle/RAII pattern instead of making GraphicsResource inherit from Buffer
  • Keeps GraphicsResource responsible for registration/unregistration only; map() now returns a mapped Buffer
  • Adds a mapped-graphics device pointer handle whose deleter captures both the GraphicsResourceHandle and StreamHandle and calls cuGraphicsUnmapResources(...) on release
  • Preserves with resource.map(stream=s) as buf: by giving Buffer context-manager support
  • Supports both with GraphicsResource.from_gl_buffer(vbo) as resource: for automatic resource cleanup and with GraphicsResource.from_gl_buffer(vbo, stream=s) as buf: for register-and-map-in-one-step usage
  • Restores default-stream compatibility for map() / unmap() and keeps GraphicsResource.handle as the raw CUgraphicsResource handle

Motivation

Review feedback pointed toward Andy Jost's resource_handles direction rather than turning GraphicsResource into a mutable Buffer.

This version follows the same ownership model used elsewhere in cuda_core: the mapped pointer owns unmap lifetime, and the resource registration owns unregister lifetime. That keeps stream-ordered cleanup in the handle layer, avoids changing GraphicsResource.handle semantics based on mapping state, and removes the need to treat an unmapped resource as a partially valid Buffer.

Key changes

  • Handle-based mapped buffers: map() creates a real Buffer from a new mapped-graphics device pointer handle. The handle deleter captures the graphics resource and mapping stream, and unmaps on release.
  • GraphicsResource stays a resource owner: GraphicsResource no longer inherits from Buffer; it tracks registration state, forwards unmap() / close() to the active mapped buffer, and keeps a strong reference to that buffer so is_mapped, unmap(), and double-map protection remain correct even if callers drop their own Python reference.
  • Context-manager behavior:
    • with resource.map(stream=s) as buf: still works and automatically unmaps on exit.
    • with GraphicsResource.from_gl_buffer(vbo, stream=s) as buf: registers, maps on entry, and unregisters on exit.
    • with GraphicsResource.from_gl_buffer(vbo) as resource: scopes the registration lifetime and closes automatically on exit, avoiding a manual resource.close().
  • Compatibility cleanup: map() / unmap() again accept stream=None and use the default stream, and GraphicsResource.handle once again exposes the raw graphics-resource handle.

@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Feb 27, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rparolin rparolin changed the title Rparolin/graphics feedback fixes 1 Make GraphicsResource inherit from Buffer Feb 27, 2026
@rparolin rparolin requested a review from leofang February 27, 2026 19:48
@rparolin rparolin self-assigned this Feb 27, 2026
@rparolin rparolin added the enhancement Any code-related improvements label Feb 27, 2026
@rparolin rparolin added this to the cuda.core v0.7.0 milestone Feb 27, 2026
@rparolin rparolin marked this pull request as ready for review February 27, 2026 19:49
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Feb 27, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rparolin
Copy link
Collaborator Author

/ok to test

@rparolin rparolin closed this Feb 27, 2026
@rparolin rparolin reopened this Feb 27, 2026
@github-actions
Copy link

@leofang leofang added P0 High priority - Must do! cuda.core Everything related to the cuda.core module labels Feb 28, 2026
@rparolin
Copy link
Collaborator Author

rparolin commented Mar 2, 2026

/ok to test

@rparolin rparolin enabled auto-merge (squash) March 3, 2026 16:11
Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor here — the GraphicsResource(Buffer) direction looks good overall.

Requesting changes for two behavior issues called out inline:

  1. close() currently unmaps on stream 0 even when mapping happened on a non-default stream, which can break stream-ordering guarantees.
  2. __enter__ returns self even when unmapped, allowing context-manager usage where handle/size are invalid.

Please address these and I’m happy to re-review.

@cpcloud
Copy link
Contributor

cpcloud commented Mar 4, 2026

ping @rparolin. happy to fix this up if you want.

@leofang
Copy link
Member

leofang commented Mar 4, 2026

Thanks for the refactor here — the GraphicsResource(Buffer) direction looks good overall.

Yeah and thinking about it more, I suspect this is a must:

  • If GL buffer registration is context-independent: This PR makes the UX nicer
  • If GL buffer registration is locked to the current CUDA context: We need to track the underlying device/context, so we either inherit from Buffer (this PR) or add Buffer.from_gl_buffer() constructor (and keep GraphicsResource internal).
    • From the viewpoint of staying consistent across cuda.core, the latter might be worth considering too, but one challenge is that Buffer today does not have the map/unmap concept.
    • I would be happy for you @rparolin @Andy-Jost @cpcloud to brainstorm and I can review later 🙂

I will ask internally if it's context-independent or not, but it's not a blocker.

@Andy-Jost
Copy link
Contributor

Andy-Jost commented Mar 4, 2026

Thanks for the discussion here. I'd like to propose an alternative approach that I think addresses both of Phillip's concerns and Leo's observation about Buffer not having map/unmap.

Proposal: Use the resource_handles module instead of modifying Buffer

We already have a C++ handle layer (cuda/core/_cpp/resource_handles.hpp) that provides RAII-based lifetime management with std::shared_ptr custom deleters. This pattern already solves the exact problems raised here:

  1. Stream capture: DevicePtrHandle already captures the allocation stream in its deleter and uses it for cuMemFreeAsync. We can do the same for graphics resources - capture the map stream and use it for unmap.

  2. Dependency tracking: Handles can hold references to other handles, preventing out-of-order destruction. A mapped buffer handle can hold a reference to the GraphicsResourceHandle, ensuring the graphics resource outlives the mapped pointer.

  3. Arbitrary release actions: The shared_ptr deleter can chain actions - e.g., unmap then unregister.

Concrete design:

  • GraphicsResource.map(stream) returns a Buffer (or just the existing DevicePtrHandle under the hood)
  • The returned handle's deleter captures:
    • The GraphicsResourceHandle (keeps it alive)
    • The map StreamHandle (for stream-ordered unmap)
  • When the mapped buffer handle is released, the deleter calls cuGraphicsUnmapResources on the captured stream
  • GraphicsResource itself doesn't need to track _mapped state - the existence of mapped buffer handles implies mapped state

Benefits:

  • Buffer stays unchanged (no map/unmap concept needed)
  • Stream ordering is correct by construction (cpcloud's concern 1)
  • No invalid __enter__ state possible (cpcloud's concern 2) - you either have a valid mapped handle or you don't
  • Dependency tracking prevents use-after-free
  • Consistent with how we handle DeviceMemoryResource allocations

This is the same pattern we use for memory pool allocations, where the pointer handle captures the pool handle and stream for proper cleanup ordering.

Thoughts?

@cpcloud
Copy link
Contributor

cpcloud commented Mar 6, 2026

+1 to @Andy-Jost's design. Looks solid.

@leofang
Copy link
Member

leofang commented Mar 7, 2026

FYI

I will ask internally if it's context-independent or not

it is context-dependent.

Proposal: Use the resource_handles module instead of modifying Buffer

Modifying Buffer in what way? Sorry it's not clear to me.

Can we sketch a bit how the user code would look like with this proposal? The changes in this PR were motivated by the sketch in here: #1608 (comment). Rob asked the right question there. Inheritance is probably not necessary, as long as the same UX can be delivered.

Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GraphicsResource(Buffer) direction still looks good overall, but I can’t approve this yet because two blocking behavior issues remain in the cleanup/context-manager paths called out inline: close() still ignores the mapping stream, and __enter__() still allows unmapped context-manager usage. I also reran pixi run pytest tests/test_graphics.py -q locally; it passes, but the current test coverage does not catch either case, so please add targeted regressions when you fix them.

@cpcloud
Copy link
Contributor

cpcloud commented Mar 16, 2026

@rparolin Do you want to fire off an agent to implement @Andy-Jost's design? Probably could make pretty short work of it.

@rparolin rparolin changed the title Make GraphicsResource inherit from Buffer Refactor graphics interop around RAII-mapped buffers Mar 16, 2026
@rparolin rparolin requested a review from cpcloud March 16, 2026 23:57
@rparolin
Copy link
Collaborator Author

rparolin commented Mar 16, 2026

@cpcloud made the requested changes and aligned with @Andy-Jost recommendations. Please re-review. Thanks!

Copy link
Contributor

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the latest graphics follow-up changes. The earlier context-manager and stream-ordering concerns look addressed, and passes in my environment.

@rparolin
Copy link
Collaborator Author

Re-reviewed the latest graphics follow-up changes. The earlier context-manager and stream-ordering concerns look addressed, and passes in my environment.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants