Reduces the amount of compiler warnings significantly.
|
This looks great, thanks for working on this! To be merged, it'd of course need docs and tests. For lack of GPUs, I don't have usable CI for PyCUDA on Github, but I do have that on a Gitlab instance I run. Mind if I create a user account for you there? cc @kaushikcfd |
|
Hi, thanks for the feedback! Yes, this PR was meant primarily to pitch the idea and get some early feedback :) And access to already usable CI would be great! |
|
Made an account for you, you should have that info in your email. The site is at https://gitlab.tiker.net/inducer/pycuda. |
|
I did some experiments and tests with this and it seems to work without any errors so far. What would be the next steps to bring this to a future release? |
|
It's clear that this should happen, ideally soon. As it happens, there are now two (draft) versions of this, one here: https://gitlab.tiker.net/kaushikcfd/pycuda/-/merge_requests/2/diffs and the other one in this PR. (They got started independently.) @mitkotak, could you comment on your plans with respect to upstreaming your work? |
|
Thanks for your interest in this PR. Right now my estimate is to merge this feature into |
|
Hi there, any updates on the cuda graph feature? |
Thank you very much for the interest! We are still testing the PR to make sure that we don't break any existing functionality, but if you are curious to learn more, you can try it out using
|
Hi @mitkotak, very much looking forward to this feature! Any idea when the PR could be ready? |
Hi there!
I wanted to experiment with CUDA Graphs a bit to get a feel for the performance differences between blocking, async and graph execution.
See:
However, while most of the required functionality is available (async execution, specifying a stream, etc.), PyCUDA does not have Graph support yet.
This PR adds some initial support to launch a kernel pipeline using a CUgraph.
I'd love your comments and feedback. Most likely I am not freeing memory correctly, among other things, so let me know!
All in all, everything seems to be working well enough to be useful already :)
A nice bonus is that the CUDA Graph API offers a function to output dot files; see the picture below and the demo in examples/demo_graph.py.
Note that the demo launches the kernel only once.
Due to overhead, benefits of the Graph API should only really start showing when launching kernels repeatedly.
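To make the overhead argument a bit more concrete, here is a rough back-of-the-envelope model in plain Python. It does not touch PyCUDA or the GPU at all, and every name and number in it (`launch_overhead_us`, `graph_build_us`, `graph_replay_us`, and the values passed to them) is a made-up assumption for illustration, not a measurement from this PR.

```python
import math


def stream_cost_us(n_iters, kernels_per_iter, launch_overhead_us):
    """Host-side cost of launching every kernel individually on a stream."""
    return n_iters * kernels_per_iter * launch_overhead_us


def graph_cost_us(n_iters, graph_build_us, graph_replay_us):
    """One-time graph build/instantiate cost plus one cheap replay per iteration."""
    return graph_build_us + n_iters * graph_replay_us


def break_even_iters(kernels_per_iter, launch_overhead_us,
                     graph_build_us, graph_replay_us):
    """Smallest iteration count at which the graph is strictly cheaper.

    Returns None if a replay is not cheaper than the individual launches,
    in which case the one-time build cost is never amortized.
    """
    saved_per_iter = kernels_per_iter * launch_overhead_us - graph_replay_us
    if saved_per_iter <= 0:
        return None
    return math.floor(graph_build_us / saved_per_iter) + 1


if __name__ == "__main__":
    # Hypothetical numbers: 3 kernels per pipeline run, 5 us per launch,
    # 200 us to build the graph once, 6 us per graph replay.
    n = break_even_iters(3, 5, 200, 6)
    print("graph pays off from iteration", n)
    print("single run: stream", stream_cost_us(1, 3, 5),
          "us vs graph", graph_cost_us(1, 200, 6), "us")
```

With these assumed numbers a single launch is far more expensive through the graph, which matches what the demo would show; the graph only wins once the pipeline is replayed often enough to amortize the build cost. Real timings will depend on the kernels, the driver, and the hardware.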