Add CUDA memory management APIs #1524
Conversation
Add the following torch.cuda APIs:

- empty_cache() - Release unoccupied cached memory (dotnet#1521)
- memory_allocated() - Current GPU memory occupied by tensors
- max_memory_allocated() - Peak GPU memory occupied by tensors
- reset_peak_memory_stats() - Reset peak memory tracking
- memory_reserved() - Current GPU memory managed by caching allocator
- max_memory_reserved() - Peak GPU memory managed by caching allocator
- mem_get_info() - Free and total memory on device
- set_device() - Set current CUDA device
- current_device() - Get current CUDA device index

These APIs are commonly used in PyTorch workflows for memory management and debugging, and are needed by TorchSharpExamples users.

Native implementations use c10::cuda::CUDACachingAllocator with #if defined(USE_CUDA) guards for CPU-only build compatibility.

Includes unit tests for all new APIs.
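For context, a minimal usage sketch of the surface this adds, assuming the C# members mirror the PyTorch names listed above and return byte counts and device indices as plain integers (the exact signatures are not spelled out in this description, so treat the details below as assumptions):

```csharp
using System;
using TorchSharp;
using static TorchSharp.torch;

// Illustrative only: assumes the new members live under torch.cuda and
// mirror PyTorch's semantics; exact return types are an assumption.
if (cuda.is_available())
{
    cuda.set_device(0);
    Console.WriteLine($"device: {cuda.current_device()}");

    var t = rand(1024, 1024, device: CUDA);                       // put something on the GPU
    Console.WriteLine($"allocated: {cuda.memory_allocated()}");   // bytes held by live tensors
    Console.WriteLine($"reserved:  {cuda.memory_reserved()}");    // bytes held by the caching allocator

    t.Dispose();
    cuda.empty_cache();              // hand unused cached blocks back to the driver
    cuda.reset_peak_memory_stats();  // restart peak tracking from the current state
}
```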
Pull request overview
This PR adds 9 new CUDA memory management APIs to TorchSharp, addressing issue #1521. These APIs provide essential functionality for monitoring and managing GPU memory usage, which is commonly needed in PyTorch workflows for debugging and optimization.
Changes:
- Adds CUDA memory management APIs: empty_cache, memory_allocated, max_memory_allocated, reset_peak_memory_stats, memory_reserved, max_memory_reserved, mem_get_info, set_device, and current_device
- Implements native C++ bindings with USE_CUDA guards for CPU-only build compatibility
- Includes comprehensive unit tests for all new APIs
- Bumps version from 0.106.0 to 0.106.1
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/TorchSharp/Torch.cs | Adds 9 new public CUDA memory management methods with XML documentation |
| src/TorchSharp/PInvoke/LibTorchSharp.THSTorchCuda.cs | Adds P/Invoke declarations for the new native methods |
| src/Native/LibTorchSharp/THSTorch.h | Declares native function signatures for CUDA memory APIs |
| src/Native/LibTorchSharp/THSTorch.cpp | Implements native functions using c10::cuda APIs with USE_CUDA guards and CPU-only stubs |
| test/TorchSharpTest/TestTorchSharp.cs | Adds comprehensive unit tests for all 9 new APIs |
| build/BranchInfo.props | Bumps patch version from 0.106.0 to 0.106.1 |
| RELEASENOTES.md | Documents the new APIs in release notes |
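For readers less familiar with the layering the table describes, a hypothetical sketch of the managed/native boundary is shown below. The entry-point names and signatures are illustrative assumptions, not the actual declarations in LibTorchSharp.THSTorchCuda.cs:

```csharp
using System.Runtime.InteropServices;

internal static class NativeMethodsSketch
{
    // Hypothetical entry points: the real names and signatures live in
    // src/TorchSharp/PInvoke/LibTorchSharp.THSTorchCuda.cs, backed by
    // src/Native/LibTorchSharp/THSTorch.cpp behind USE_CUDA guards.
    [DllImport("LibTorchSharp")]
    internal static extern void THSTorchCuda_empty_cache();

    [DllImport("LibTorchSharp")]
    internal static extern long THSTorchCuda_memory_allocated(long device);
}
```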
```csharp
/// so that those can be used in other GPU applications and visible in nvidia-smi.
/// </summary>
/// <remarks>
/// empty_cache() doesn't increase the amount of GPU memory available for PyTorch.
```
The documentation refers to "PyTorch" but this is TorchSharp. Update this to say "doesn't increase the amount of GPU memory available for TorchSharp" to accurately reflect the library being documented.
Suggested change:

```diff
- /// empty_cache() doesn't increase the amount of GPU memory available for PyTorch.
+ /// empty_cache() doesn't increase the amount of GPU memory available for TorchSharp.
```
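To make the distinction in that remark concrete, here is a short sketch of how the cached-versus-available difference typically shows up. This is illustrative usage of the new APIs, not code from this PR, and it assumes memory_reserved() reports a byte count:

```csharp
using System;
using TorchSharp;

// empty_cache() hands unused cached blocks back to the driver, which lowers
// memory_reserved() and the usage shown by nvidia-smi, but it does not raise
// the amount of memory TorchSharp can allocate for tensors afterwards.
var reservedBefore = torch.cuda.memory_reserved();

using (var t = torch.zeros(4096, 4096, device: torch.CUDA))
{
    // while the tensor is alive, memory_allocated() and memory_reserved() both grow
}

torch.cuda.empty_cache();
Console.WriteLine($"reserved before: {reservedBefore}, after: {torch.cuda.memory_reserved()}");
```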
```csharp
// Set to device 0 (always valid if CUDA is available)
cuda.set_device(0);
Assert.Equal(0, cuda.current_device());
```
Consider saving the original device at the beginning of the test and restoring it at the end to avoid potential side effects on other tests. While the tests are marked as Sequential and most explicitly set devices, it is good practice to restore the original state. You can use a try-finally block, or wrap the save-and-restore in a using-statement pattern to ensure cleanup (see the sketch after the suggestion below).
Suggested change:

```diff
- // Set to device 0 (always valid if CUDA is available)
- cuda.set_device(0);
- Assert.Equal(0, cuda.current_device());
+ // Set to device 0 (always valid if CUDA is available) and restore original device afterwards
+ try {
+     cuda.set_device(0);
+     Assert.Equal(0, cuda.current_device());
+ }
+ finally {
+     cuda.set_device(device);
+ }
```
Use c10::getDeviceAllocator (from c10/core/) instead of c10::cuda::CUDACachingAllocator (from c10/cuda/) in the non-CUDA build path. This allows memory_allocated, memory_reserved, max_memory_allocated, max_memory_reserved, empty_cache, reset_peak_memory_stats, and mem_get_info to return real values when CUDA is available at runtime, even when LibTorchSharp is compiled against the CPU-only libtorch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e context is set

Use non-throwing c10::cuda::GetDevice() instead of c10::cuda::current_device() to safely resolve the device index when -1 is passed (default/current device). Falls back to device 0 if no CUDA device context has been established yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>