This guide is for LLM / SLM developers who want to use CUDA directly through an AIKernel external Capability module.
CUDA is optional in AIKernel.Core. The default .NET and Python installation paths do not require CUDA, LibTorch, or a native bridge. GPU support is supplied by external Capability modules that are explicitly installed and registered by a trusted host.
User host / Tools / Provider package
-> ICapabilityModuleRegistry / ICapabilityModuleInvoker
-> CUDA Capability module
-> AIKernel.Core.Memory abstractions
-> AIKernel.Kernel OS memory mapper
-> Native C ABI bridge
-> CUDA / LibTorch / model runtime
Rules:
- Core owns OS-independent abstractions only.
- Kernel owns OS-specific memory mapping implementations.
- CUDA Capability modules consume Core abstractions and do not reference Kernel.
- Native ABI signatures are stable C ABI boundaries and should not expose CUDA, LibTorch, or C++ types.
- Missing native libraries, model files, or mapper failures must fail closed.
Core provides the memory mapping contract:
- IMemoryMapper.Open(path, accessMode) -> Result<IMemoryRegion>
- IMemoryRegion.Pointer
- IMemoryRegion.Length
- IMemoryRegion.Info
- IMemoryRegion.Unmap() -> Result<bool>
Kernel provides the default OS implementations:
- Win32MemoryMapper / Win32MemoryRegion
- PosixMemoryMapper / PosixMemoryRegion
Trusted hosts register the default mapper with AddAIKernelKernel(). A CUDA
Capability package may accept IMemoryMapper through dependency injection, use
it to validate and map the model payload, and pass only the resolved path or
stable ABI data to the native bridge. The current LibTorch reference module keeps
the native ABI unchanged and uses the mapper as a fail-closed validation layer.
Capability modules should not call Win32 or POSIX APIs directly. Put OS-specific mapping code in Kernel or in a host-owned mapper implementation.
The CUDA 13.0 reference implementation belongs in the external
AIKernel.Cuda13.0 repository and targets one Windows runtime combination only:
AIKernel.Cuda13.0/
src/
AIKernel.Cuda13.0.Libtorch2.12.win-x64/
native/
tests/
It targets:
- Windows / MSVC / win-x64
- LibTorch 2.12.0
- CUDA 13.0
- C ABI functions: load_model, unload_model, forward
Install it only on GPU hosts that explicitly need this runtime.
The reference CUDA package uses split distribution: NuGet.org carries a small
metadata package with managed dependencies, while the full runtime .nupkg is a
GitHub Release asset because it includes LibTorch, CUDA, cuDNN, and the native
bridge.
dotnet nuget add source <folder-containing-full-cuda-nupkg> --name AIKernel-CUDA
dotnet add package AIKernel.Cuda13.0.Libtorch2.12.win-x64 --version 0.1.0
Do not use a GitHub Release page URL directly as a NuGet source. Download the
full .nupkg first, then add its containing folder as a local source.
Then register the descriptor and invoker in the trusted host:
// Registers the Kernel services, including the default OS memory mapper.
services.AddAIKernelKernel();
// Explicit mapper registration for this win-x64 host.
services.AddSingleton<IMemoryMapper, Win32MemoryMapper>();
// Replaces the fail-closed default invoker; do this only in a trusted host.
services.AddSingleton<ICapabilityModuleInvoker, LibTorchCapabilityInvoker>();
var descriptor = LibTorchCapabilityDescriptor.Create();
await registry.RegisterAsync(descriptor, cancellationToken);
The default Core invoker is fail-closed. It registers metadata but does not grant execution permission. Replace it only in trusted hosts.
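The fail-closed default can be pictured with a small Python model. This is a sketch of the pattern only, not the AIKernel API: `Host`, `FailClosedInvoker`, `TrustedInvoker`, and `InvokeOutcome` are hypothetical names, and the `"libtorch.cuda"` identifier is used purely as an example key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class InvokeOutcome:
    ok: bool
    value: Optional[object] = None
    error: Optional[str] = None


class FailClosedInvoker:
    """Hypothetical default invoker: refuses every invocation."""

    def invoke(self, capability_id: str, payload: object) -> InvokeOutcome:
        return InvokeOutcome(ok=False, error=f"execution not permitted: {capability_id}")


class TrustedInvoker:
    """Hypothetical invoker installed by a trusted host for an explicit set of capabilities."""

    def __init__(self, handlers: Dict[str, Callable[[object], object]]):
        self._handlers = dict(handlers)

    def invoke(self, capability_id: str, payload: object) -> InvokeOutcome:
        handler = self._handlers.get(capability_id)
        if handler is None:
            # Unknown capabilities are rejected rather than guessed at.
            return InvokeOutcome(ok=False, error=f"unknown capability: {capability_id}")
        return InvokeOutcome(ok=True, value=handler(payload))


class Host:
    def __init__(self, invoker=None):
        self.descriptors: Dict[str, dict] = {}
        # Without an explicit choice, the host falls back to the fail-closed invoker.
        self.invoker = invoker if invoker is not None else FailClosedInvoker()

    def register(self, capability_id: str, metadata: dict) -> None:
        # Registration records metadata only; it never changes the invoker.
        self.descriptors[capability_id] = metadata

    def invoke(self, capability_id: str, payload: object) -> InvokeOutcome:
        return self.invoker.invoke(capability_id, payload)
```

In this model, registering a descriptor on a default `Host` leaves every `invoke` call failing; execution becomes possible only when the host is constructed with a `TrustedInvoker`.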
Keep the public C ABI stable:
#include <stdint.h>
int32_t load_model(const char* path);
int32_t unload_model(int32_t handle);
int32_t forward(
    int32_t handle,
    const int32_t* input_ids,
    int32_t length,
    ForwardResultNative* out_result);
Implementation guidance:
- Use integer handles for sessions.
- Keep sessions in native C++ internals.
- Keep model loading, tensors, and CUDA device selection behind the ABI.
- Use caller-allocated output structs and buffers.
- Return status codes; managed code converts them into fail-closed results.
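As a rough illustration of the last two bullets, the following Python sketch uses `ctypes` to allocate the output struct on the caller side and to turn a status code into a fail-closed outcome. Several things here are assumptions: the field layout of `ForwardResultNative`, the convention that `0` means success, the `-1` error code, and the helper names `Outcome`, `forward_checked`, and `fake_forward`. The stand-in function replaces the real native bridge so that the sketch runs without a GPU.

```python
import ctypes
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

STATUS_OK = 0  # assumption: 0 means success; the real status table is not given here


class ForwardResultNative(ctypes.Structure):
    # Hypothetical layout, for illustration only.
    _fields_ = [("token_id", ctypes.c_int32), ("score", ctypes.c_float)]


@dataclass(frozen=True)
class Outcome:
    ok: bool
    status: int
    token_id: Optional[int] = None
    score: Optional[float] = None


def forward_checked(native_forward: Callable[..., int], handle: int,
                    input_ids: Sequence[int]) -> Outcome:
    """Call a forward-shaped function and fail closed on any non-OK status."""
    ids = (ctypes.c_int32 * len(input_ids))(*input_ids)  # caller-owned input buffer
    out = ForwardResultNative()                           # caller-allocated output struct
    status = native_forward(handle, ids, len(input_ids), ctypes.pointer(out))
    if status != STATUS_OK:
        # Do not read `out` after a failure; only the status is reported.
        return Outcome(ok=False, status=status)
    return Outcome(ok=True, status=status, token_id=out.token_id, score=out.score)


def fake_forward(handle, ids, length, out_ptr) -> int:
    """Pure-Python stand-in for the native function (not the real bridge)."""
    if handle != 1 or length <= 0:
        return -1  # hypothetical "invalid handle or input" code
    out_ptr.contents.token_id = ids[length - 1] + 1
    out_ptr.contents.score = 0.5
    return STATUS_OK
```

With a real shared library, the same `forward_checked` helper could be handed a function loaded through `ctypes.CDLL`, provided its `argtypes` and `restype` are declared to match the C signature above.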
Host-side CUDA orchestration should be expressed as Result<T> /
ResultStep<TState,TValue> pipelines. This keeps load, route, forward, unload,
and fallback paths observable and deterministic.
var pipeline =
    from mapped in memoryMapper.Open(modelPath, MemoryAccessMode.Read)
    from loaded in InvokeLoadModel(mapped.Info.Path)
    from output in InvokeForward(loaded.ModelHandle, inputIds)
    select output;
For replayable user-land control flow, use ResultStep:
var run =
    from route in ResultStep<string, KernelProviderRoutingDecision>
        .Success("cuda-route", cudaDecision)
        .Where(static decision => decision.ProviderId == "libtorch.cuda")
    from forward in InvokeCudaForwardStep(route)
    select forward;
Guidelines:
- Use Bind/SelectMany for operations that may fail.
- Use Map/Select only for pure projections.
- Use Where for deterministic reject conditions.
- Record model path hashes, native status codes, device metadata, and replay log hashes in metadata.
- Do not throw across Capability boundaries; convert failures into Result.
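The guidelines above can be illustrated with a minimal Python sketch of a result type. It is not the AIKernel.Python API: the `Result` class, the `attempt` helper, the three step functions, and the `.pt` extension check are all hypothetical and exist only to show how bind, map, and where behave when a step fails.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Result(Generic[T]):
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def is_success(self) -> bool:
        return self.error is None

    def bind(self, step: Callable[[T], "Result[U]"]) -> "Result[U]":
        """Chain a step that may itself fail; skip it after a failure."""
        return step(self.value) if self.is_success else Result(error=self.error)

    def map(self, project: Callable[[T], U]) -> "Result[U]":
        """Apply a pure projection to a successful value."""
        return Result(value=project(self.value)) if self.is_success else Result(error=self.error)

    def where(self, predicate: Callable[[T], bool], reason: str) -> "Result[T]":
        """Reject deterministically when the predicate does not hold."""
        if self.is_success and not predicate(self.value):
            return Result(error=reason)
        return self


def attempt(action: Callable[[], T]) -> Result[T]:
    """Convert an exception into a failure instead of letting it cross a boundary."""
    try:
        return Result(value=action())
    except Exception as exc:
        return Result(error=f"{type(exc).__name__}: {exc}")


# Hypothetical stand-ins for the mapper and the native calls.
def open_model(path: str) -> Result[str]:
    return Result(value=path) if path.endswith(".pt") else Result(error="mapper: invalid model path")


def load_model(path: str) -> Result[int]:
    return Result(value=1)


def forward(handle: int, input_ids: List[int]) -> Result[int]:
    return Result(value=sum(input_ids))


def run(path: str, input_ids: List[int]) -> Result[dict]:
    return (open_model(path)
            .where(lambda _: len(input_ids) > 0, "empty input")
            .bind(load_model)
            .bind(lambda handle: forward(handle, input_ids))
            .map(lambda token: {"token": token}))
```

Once any step fails, the remaining steps are skipped and the first error is carried to the end, so the caller inspects a single value rather than catching exceptions.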
AIKernel.Python is part of AIKernel.Core and defaults to CUDA-free
installation:
pip install git+https://github.com/AIKernel-NET/AIKernel.Core.git#subdirectory=python
Install GPU-specific Python or native bindings from the matching external CUDA Capability repository when it provides a Python package. If the Capability ships only a full NuGet runtime package, follow that repository's GitHub Release installation instructions instead:
pip install git+https://github.com/AIKernel-NET/<matching-cuda-python-capability>.git
Python exposes the outer API and monad helpers. It does not reimplement OS memory mapping or Kernel internals.
The CUDA module should be maintained outside Core. If you need another CUDA
version, another LibTorch version, another OS/RID, a different model runtime,
or Linux CUDA, fork the CUDA Capability repository and create a new Capability
module. Do not mix multiple native targets into the Windows win-x64 package.
Recommended naming:
AIKernel.Cuda13.0.Libtorch2.12.win-x64
AIKernel.Cuda13.0.Libtorch2.12.win-arm64
AIKernel.Cuda13.0.Libtorch2.12.linux-x64
AIKernel.Cuda12.4.Libtorch2.3.win-x64
AIKernel.ROCm6.Libtorch2.12.linux-x64
AIKernel.DirectML.win-x64
AIKernel.Vulkan.linux-x64
For a new module:
- Keep the C ABI stable or version the ABI explicitly.
- Add a new CapabilityModuleDescriptor with a unique CapabilityId.
- Keep runtime files outside Core and outside default package payloads.
- Consume AIKernel.Core IMemoryMapper; do not reference Kernel from the Capability module.
- Add platform-specific native build files inside the module.
- Add fail-closed tests for missing runtime, invalid handle, invalid model path, and mapper failures.
- Document all required environment variables and runtime search paths.
Linux CUDA support should be implemented in the external CUDA repository or one of its forks after the native Linux server environment is prepared. Do not add Linux include/lib paths to AIKernel.Core.
- CUDA remains opt-in.
- Core and Python default installs work without CUDA.
- Native ABI uses C-compatible types only.
- Caller owns dynamic buffers.
- Capability module does not reference Kernel.
- Memory mapping failures return Result failures.
- Replay metadata includes hashes and native status.
- Tests cover fail-closed boundaries.