Expose NUMA topology in gVisor #12876

@Nitin-Maddi

Description

gVisor does not currently expose /sys/devices/system/node/ in its virtual sysfs implementation (pkg/sentry/fsimpl/sys/). On real Linux, this directory describes NUMA topology: how many nodes exist, which CPUs and memory belong to each, and inter-node distances.
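For illustration, here is a minimal Go sketch of how a userspace program might count NUMA nodes by listing that directory. The function names (`isNodeDir`, `countNodeDirs`, `countNUMANodes`) are hypothetical and are not taken from gVisor or from any driver. The sketch relies only on the Linux convention that each node appears as a `node<N>` subdirectory.

```go
package main

import (
	"os"
	"strings"
)

// isNodeDir reports whether name has the form "node<N>", where N is a
// non-empty run of decimal digits (for example "node0" or "node12").
func isNodeDir(name string) bool {
	if !strings.HasPrefix(name, "node") {
		return false
	}
	digits := strings.TrimPrefix(name, "node")
	if digits == "" {
		return false
	}
	for _, r := range digits {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// countNodeDirs counts the entries in names that look like NUMA node
// directories. Files such as "online" or "possible" are ignored.
func countNodeDirs(names []string) int {
	n := 0
	for _, name := range names {
		if isNodeDir(name) {
			n++
		}
	}
	return n
}

// countNUMANodes lists root (normally /sys/devices/system/node) and returns
// the number of node<N> subdirectories. If root does not exist, as in a
// sandbox that omits it, the error from os.ReadDir is returned and the
// caller must decide how to fall back.
func countNUMANodes(root string) (int, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return 0, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		if e.IsDir() {
			names = append(names, e.Name())
		}
	}
	return countNodeDirs(names), nil
}
```

Calling `countNUMANodes("/sys/devices/system/node")` inside a sandbox that lacks the directory returns an error rather than a count, which is the situation a caller has to handle explicitly.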

This is a problem for GPU workloads on NUMA-aware platforms. Specifically, NVIDIA's CUDA driver reads /sys/devices/system/node/ during cuCtxCreate() to discover the number of NUMA nodes and pre-allocate per-node bitmaps. On platforms where UVM reports numaEnabled=true (e.g., Grace Hopper / GH200, where the ARM CPU and GPU are connected via NVLink-C2C and form distinct NUMA nodes), the driver later indexes into these bitmaps during UVM_REGISTER_GPU. If the directory is missing, no bitmaps are allocated, and that lookup becomes a NULL pointer dereference (SIGSEGV at addr=0x0) inside libcuda.so.

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

Add a nodeDir() function in pkg/sentry/fsimpl/sys/sys.go that constructs a virtual /sys/devices/system/node/ directory tree modeling a single NUMA node (node0) containing all CPUs and all memory. More details here: master...luiscape:gvisor:add-missing-arm-ioctl#diff-1186c8000d0492e2995c2423bc81d1b845aa26a1caf94b742cab91ed2bc711a6R193
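As a rough sketch of what such a single-node tree could contain, the Go snippet below builds a map from relative path to file content. It deliberately uses no gVisor APIs, and `singleNodeLayout` and `cpuList` are hypothetical names. The chosen files (`online`, `possible`, `has_cpu`, `has_memory`, `node0/cpulist`, `node0/distance`) are a subset of what Linux exposes and are an assumption about what is needed; the linked branch may create a different set, and memory files such as `node0/meminfo` are left out here.

```go
package main

import "fmt"

// cpuList formats CPUs 0..numCPUs-1 in the sysfs list format, e.g. "0-3".
// A single CPU is written as "0".
func cpuList(numCPUs int) string {
	if numCPUs <= 1 {
		return "0"
	}
	return fmt.Sprintf("0-%d", numCPUs-1)
}

// singleNodeLayout returns a map from path (relative to
// /sys/devices/system/node/) to file content for a topology with exactly
// one NUMA node, node0, that owns every CPU. This is an illustrative
// subset of the real directory, not a complete model.
func singleNodeLayout(numCPUs int) map[string]string {
	return map[string]string{
		// Node-list files: only node 0 exists, is online, and has
		// CPUs and memory.
		"online":     "0\n",
		"possible":   "0\n",
		"has_cpu":    "0\n",
		"has_memory": "0\n",
		// Per-node files: all CPUs belong to node0.
		"node0/cpulist": cpuList(numCPUs) + "\n",
		// Linux reports 10 as the distance from a node to itself.
		"node0/distance": "10\n",
	}
}
```

For a sandbox with four CPUs, `singleNodeLayout(4)["node0/cpulist"]` is `"0-3\n"`. A real implementation would register equivalent files with the sentry's sysfs rather than return a map.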

Metadata

Assignees

No one assigned

    Labels

    area: gpu (Issue related to sandboxed GPU access)
    type: enhancement (New feature or request)
