Require --instance-type when specifying accelerator resources (#317)#393
Merged
mollyheamazon merged 2 commits intoaws:mainfrom Mar 25, 2026
Merged
Conversation
Collaborator
|
Can you reenable these unit tests as part of your fix: They were commented out previously for a merge conflict resolution. |
21b5e32 to
d03913f
Compare
Member
Author
|
Re-enabled all 4 tests with corrected assertions and refactored to use the full pipeline. Also moved the validation before the early return per the inline comment. |
mollyheamazon
approved these changes
Mar 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What's changing and why?
Closes #317 (the
--node-countmutual exclusivity part was fixed in #383)When
--acceleratorsor--accelerators-limitis specified without--instance-type, the CLI silently producesnvidia.com/gpu: 0in the k8s resource spec instead of the user's requested value. This happens because_set_default_accelerators_valneeds the instance type to determine the accelerator key (nvidia.com/gpuvsaws.amazon.com/neuroncore), and when it can't, it silently returnsNonefor both values.This change adds an early validation that raises a clear error when accelerators are specified without
--instance-type, and removes the dead code path in_get_limitsthat hardcodednvidia.com/gpu: 0.Before/After UX
Before:
After:
How was this change tested?
Are unit tests added?
Updated 3 tests that asserted the old behavior (
nvidia.com/gpu: 0on CPU-only / invalid instances)Are integration tests added?
N/A
Reviewer Guidelines
One of the following must be true: