
Add support for non-A100 GPUs using NVML #250

Open
fmoessbauer wants to merge 2 commits into GoogleCloudPlatform:master from fmoessbauer:fm/parse-gpu-instances-v3



@fmoessbauer

This series removes the A100-specific hard-coded settings and adds NVML queries to automatically detect all available partitions.
The NVML library is added and vendored. We use the latest version because it contains a critical bug fix for library initialization that is not yet in any release.

We internally tested this on an A30 GPU.
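The idea of replacing a hard-coded A100 profile table with a per-device query can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in this PR: `ProfileInfo`, `profileName` and `discoverProfiles` are hypothetical names, and `query` is a stand-in for the per-profile NVML lookup (`nvmlDeviceGetGpuInstanceProfileInfo`, whose result carries `sliceCount` and `memorySizeMB`). The naming rule used here (slice count plus memory rounded up to whole GiB) is also an assumption and may not reproduce NVIDIA's official profile names on every card.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ProfileInfo is a hypothetical subset of what NVML reports per GPU
// instance profile (cf. sliceCount and memorySizeMB in
// nvmlGpuInstanceProfileInfo_t).
type ProfileInfo struct {
	SliceCount   uint32
	MemorySizeMB uint64
}

// profileName builds a MIG-style name such as "1g.6gb".
// Assumption: memory is rounded up to whole GiB; NVIDIA's official
// names may use a different rounding rule on some cards.
func profileName(p ProfileInfo) string {
	gb := int(math.Ceil(float64(p.MemorySizeMB) / 1024.0))
	return fmt.Sprintf("%dg.%dgb", p.SliceCount, gb)
}

// discoverProfiles asks the device about every profile index in
// [0, numProfiles) and keeps only those it reports as supported,
// instead of relying on a hard-coded A100 table.
// query is a hypothetical stand-in for the per-profile NVML lookup.
func discoverProfiles(numProfiles int, query func(int) (ProfileInfo, bool)) map[string]ProfileInfo {
	profiles := make(map[string]ProfileInfo)
	for i := 0; i < numProfiles; i++ {
		if info, ok := query(i); ok {
			profiles[profileName(info)] = info
		}
	}
	return profiles
}

func main() {
	// Fake device with illustrative values only (not measured on real hardware).
	fake := map[int]ProfileInfo{
		0: {SliceCount: 1, MemorySizeMB: 5836},
		1: {SliceCount: 2, MemorySizeMB: 11800},
		3: {SliceCount: 4, MemorySizeMB: 24000},
	}
	query := func(i int) (ProfileInfo, bool) {
		info, ok := fake[i]
		return info, ok
	}

	profiles := discoverProfiles(8, query)
	names := make([]string, 0, len(profiles))
	for name := range profiles {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d slices, %d MB\n", name, profiles[name].SliceCount, profiles[name].MemorySizeMB)
	}
}
```

With the fake device, this prints three profiles (1g.6gb, 2g.12gb, 4g.24gb). Because the set of profiles comes from the device rather than from a table, a card with a different partition layout needs no code change; on real hardware, `query` would be backed by NVML.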

This patch replaces the static discovery and mapping of
GPU profiles (and sizes) with dynamic discovery.

With this change, the plugin supports any partitionable GPU.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
This patch removes some sanity checks from nvidia_gpu that rely on hard-coded partition sizes.
This makes the plugin compatible with other NVIDIA cards such as the
A30.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
@fmoessbauer
Author

@crystalzhaizhai Any news on this one? Are there still changes required?

@tobias-schuele

Any updates on this, @crystalzhaizhai? We'd highly appreciate a review of the contributions. Thanks!


2 participants