
Add CUDA memory management APIs #1524

Open

alinpahontu2912 wants to merge 4 commits into dotnet:main from alinpahontu2912:add_cuda_memory_apis

Conversation

@alinpahontu2912
Member

Fixes #1521.

Add the following torch.cuda APIs:

  • empty_cache() - Release unoccupied cached memory (add empty_cache function #1521)
  • memory_allocated() - Current GPU memory occupied by tensors
  • max_memory_allocated() - Peak GPU memory occupied by tensors
  • reset_peak_memory_stats() - Reset peak memory tracking
  • memory_reserved() - Current GPU memory managed by caching allocator
  • max_memory_reserved() - Peak GPU memory managed by caching allocator
  • mem_get_info() - Free and total memory on device
  • set_device() - Set current CUDA device
  • current_device() - Get current CUDA device index

These APIs are commonly used in PyTorch workflows for memory management and debugging, and are needed by TorchSharpExamples users.

Native implementations use c10::cuda::CUDACachingAllocator with #if defined(USE_CUDA) guards for CPU-only build compatibility.

Includes unit tests for all new APIs.
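
For illustration, here is a minimal sketch of the guard pattern described above. The exported name THSCuda_memory_allocated_sketch is hypothetical (the real entry points are declared in THSTorch.h and typically use the project's export macro rather than a bare extern "C"); it only shows how c10::cuda::CUDACachingAllocator can sit behind #if defined(USE_CUDA) with a CPU-only stub.

#include <cstdint>
#if defined(USE_CUDA)
#include <c10/cuda/CUDACachingAllocator.h>
#endif

// Hypothetical exported name, for illustration only.
extern "C" int64_t THSCuda_memory_allocated_sketch(int device)
{
#if defined(USE_CUDA)
    // Aggregate "allocated_bytes" counter tracked by the CUDA caching allocator;
    // index 0 corresponds to StatType::AGGREGATE.
    const auto stats = c10::cuda::CUDACachingAllocator::getDeviceStats(device);
    return stats.allocated_bytes[0].current;
#else
    (void)device;
    return 0; // CPU-only build: no CUDA caching allocator, report zero.
#endif
}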

alinpahontu2912 and others added 2 commits February 13, 2026 11:17
Add the following torch.cuda APIs:
- empty_cache() - Release unoccupied cached memory (dotnet#1521)
- memory_allocated() - Current GPU memory occupied by tensors
- max_memory_allocated() - Peak GPU memory occupied by tensors
- reset_peak_memory_stats() - Reset peak memory tracking
- memory_reserved() - Current GPU memory managed by caching allocator
- max_memory_reserved() - Peak GPU memory managed by caching allocator
- mem_get_info() - Free and total memory on device
- set_device() - Set current CUDA device
- current_device() - Get current CUDA device index

These APIs are commonly used in PyTorch workflows for memory
management and debugging, and are needed by TorchSharpExamples users.

Native implementations use c10::cuda::CUDACachingAllocator with
#if defined(USE_CUDA) guards for CPU-only build compatibility.

Includes unit tests for all new APIs.

Copilot AI left a comment


Pull request overview

This PR adds 9 new CUDA memory management APIs to TorchSharp, addressing issue #1521. These APIs provide essential functionality for monitoring and managing GPU memory usage, which is commonly needed in PyTorch workflows for debugging and optimization.

Changes:

  • Adds CUDA memory management APIs: empty_cache, memory_allocated, max_memory_allocated, reset_peak_memory_stats, memory_reserved, max_memory_reserved, mem_get_info, set_device, and current_device
  • Implements native C++ bindings with USE_CUDA guards for CPU-only build compatibility
  • Includes comprehensive unit tests for all new APIs
  • Bumps version from 0.106.0 to 0.106.1

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/TorchSharp/Torch.cs: Adds 9 new public CUDA memory management methods with XML documentation
  • src/TorchSharp/PInvoke/LibTorchSharp.THSTorchCuda.cs: Adds P/Invoke declarations for the new native methods
  • src/Native/LibTorchSharp/THSTorch.h: Declares native function signatures for the CUDA memory APIs
  • src/Native/LibTorchSharp/THSTorch.cpp: Implements the native functions using c10::cuda APIs with USE_CUDA guards and CPU-only stubs
  • test/TorchSharpTest/TestTorchSharp.cs: Adds comprehensive unit tests for all 9 new APIs
  • build/BranchInfo.props: Bumps the patch version from 0.106.0 to 0.106.1
  • RELEASENOTES.md: Documents the new APIs in the release notes


/// so that those can be used in other GPU applications and visible in nvidia-smi.
/// </summary>
/// <remarks>
/// empty_cache() doesn't increase the amount of GPU memory available for PyTorch.

Copilot AI Feb 16, 2026


The documentation refers to "PyTorch" but this is TorchSharp. Update this to say "doesn't increase the amount of GPU memory available for TorchSharp" to accurately reflect the library being documented.

Suggested change:
- /// empty_cache() doesn't increase the amount of GPU memory available for PyTorch.
+ /// empty_cache() doesn't increase the amount of GPU memory available for TorchSharp.

Comment on lines +211 to +213
// Set to device 0 (always valid if CUDA is available)
cuda.set_device(0);
Assert.Equal(0, cuda.current_device());

Copilot AI Feb 16, 2026


Consider saving the original device at the beginning of the test and restoring it at the end to avoid potential side effects on other tests. While the tests are marked as Sequential and most explicitly set devices, it's good practice to restore the original state. You can use a try-finally block or wrap it in a using statement pattern to ensure cleanup.

Suggested change:
- // Set to device 0 (always valid if CUDA is available)
- cuda.set_device(0);
- Assert.Equal(0, cuda.current_device());
+ // Set to device 0 (always valid if CUDA is available) and restore original device afterwards
+ try {
+     cuda.set_device(0);
+     Assert.Equal(0, cuda.current_device());
+ }
+ finally {
+     cuda.set_device(device);
+ }

alinpahontu2912 and others added 2 commits February 16, 2026 16:00
Use c10::getDeviceAllocator (from c10/core/) instead of
c10::cuda::CUDACachingAllocator (from c10/cuda/) in the non-CUDA
build path. This allows memory_allocated, memory_reserved,
max_memory_allocated, max_memory_reserved, empty_cache,
reset_peak_memory_stats, and mem_get_info to return real values
when CUDA is available at runtime, even when LibTorchSharp is
compiled against the CPU-only libtorch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
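
The runtime-dispatch idea in this commit, as a rough sketch. Caveat: the c10::DeviceAllocator surface used below (getDeviceAllocator taking a DeviceType, initialized(), getDeviceStats(), and the allocated_bytes stat layout) is an assumption based on this commit message and recent libtorch headers, not something verified against the PR's sources.

#include <cstdint>
#include <c10/core/DeviceAllocator.h> // c10::getDeviceAllocator, per the commit message

// Hypothetical helper name, for illustration only.
int64_t allocated_bytes_without_cuda_headers(int device_index)
{
    // Lives in c10/core/, so it links even against the CPU-only libtorch.
    auto* allocator = c10::getDeviceAllocator(c10::DeviceType::CUDA); // assumed signature
    if (allocator == nullptr || !allocator->initialized())
        return 0; // no CUDA runtime present, or the allocator is not initialized yet

    // Assumed to mirror the CUDACachingAllocator stats: index 0 is the aggregate stat.
    auto stats = allocator->getDeviceStats(static_cast<c10::DeviceIndex>(device_index));
    return stats.allocated_bytes[0].current;
}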
…e context is set

Use non-throwing c10::cuda::GetDevice() instead of c10::cuda::current_device()
to safely resolve the device index when -1 is passed (default/current device).
Falls back to device 0 if no CUDA device context has been established yet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
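
A minimal sketch of the device-resolution logic this commit describes, assuming the c10::cuda::GetDevice overload that takes a c10::DeviceIndex pointer (the pointer's element type has changed between libtorch releases) and that cudaSuccess is visible via the included header:

#include <c10/cuda/CUDAFunctions.h> // c10::cuda::GetDevice (non-throwing)

// Hypothetical helper name; -1 means "use the current device".
c10::DeviceIndex resolve_device_index(int requested)
{
    if (requested >= 0)
        return static_cast<c10::DeviceIndex>(requested);

    // GetDevice reports a cudaError_t instead of throwing, so it is safe to call
    // before any CUDA device context has been established.
    c10::DeviceIndex current = 0;
    if (c10::cuda::GetDevice(&current) != cudaSuccess)
        return 0; // no context yet: fall back to device 0, as described above

    return current;
}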