Add MIG profile support for ml.p6-b300.48xlarge (Blackwell Ultra) by KeitaW · Pull Request #398 · aws/sagemaker-hyperpod-cli

KeitaW · 2026-03-27T12:21:22Z

Summary

- Add ml.p6-b300.48xlarge to INSTANCE_TYPE_MIG_PROFILES in constants.py with the B300 MIG profiles: mig-1g.34gb, mig-1g.67gb, mig-2g.67gb, mig-3g.135gb, mig-4g.135gb, mig-7g.269gb
- Add 17 B300-specific MIG partition profiles (7 uniform + 10 mixed) to the Helm chart default-mig-config.yaml ConfigMap
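For illustration, the constants.py change amounts to one new dictionary entry. The sketch below assumes that INSTANCE_TYPE_MIG_PROFILES maps an instance-type string to a list of profile names, and that ALLOWED_ACCELERATOR_PARTITION_TYPES is the union of those lists. Both are assumptions; the real value types and the existing entries in the repository may differ.

```python
# Hypothetical excerpt of constants.py. Assumes INSTANCE_TYPE_MIG_PROFILES maps an
# instance type to a list of MIG profile names; the real module may differ.
INSTANCE_TYPE_MIG_PROFILES = {
    # ... existing entries (H100, H200, B200, ...) elided ...
    "ml.p6-b300.48xlarge": [
        "mig-1g.34gb",
        "mig-1g.67gb",
        "mig-2g.67gb",
        "mig-3g.135gb",
        "mig-4g.135gb",
        "mig-7g.269gb",
    ],
}

# Assumption: the allowed partition types are the union of every per-instance list.
ALLOWED_ACCELERATOR_PARTITION_TYPES = sorted(
    {profile for profiles in INSTANCE_TYPE_MIG_PROFILES.values() for profile in profiles}
)
```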

Relationship to #396

PR #396 ("Added profiles for B300") was merged on 2026-03-23 and added 2 ConfigMap profiles (all-1g.67gb and mixed-2-1g.34gb-1-2g.67gb-1-3g.135gb). However, it left two critical gaps:

1. constants.py was not updated — MIG requests on B300 are rejected before the ConfigMap is ever consulted.

_validate_accelerator_partition_parameters() in accelerator_partition_util.py checks INSTANCE_TYPE_MIG_PROFILES at line 26 as a gate. Because ml.p6-b300.48xlarge is absent from that dict, the CLI returns:

"Instance type 'ml.p6-b300.48xlarge' does not support accelerator partitions."

This blocks all MIG usage on B300 — HyperPodPyTorchJob submissions, inference endpoints with acceleratorPartitionType, and hyp list-accelerator-partition-type. The ConfigMap profiles from #396 are unreachable.
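The gating behavior described above can be sketched as follows. This is a simplified, hypothetical stand-in for _validate_accelerator_partition_parameters(), not its real code. Only the first error string is taken from the CLI output quoted above; the second message and the `check_partition_gate` name are invented for illustration. Passing the mapping in explicitly makes it easy to compare the state before and after this PR.

```python
from typing import Mapping, Optional, Sequence


def check_partition_gate(
    mig_profiles: Mapping[str, Sequence[str]],
    instance_type: str,
    partition_type: str,
) -> Optional[str]:
    """Return an error message, or None if the request passes the gate.

    Hypothetical simplification; the real validation function does more.
    """
    profiles = mig_profiles.get(instance_type)
    if profiles is None:
        # No dict entry: the request is rejected before any ConfigMap profile is consulted.
        return f"Instance type '{instance_type}' does not support accelerator partitions."
    if partition_type not in profiles:
        # Hypothetical wording; the real CLI message may differ.
        return f"Partition type '{partition_type}' is not valid for '{instance_type}'."
    return None


B300_PROFILES = [
    "mig-1g.34gb", "mig-1g.67gb", "mig-2g.67gb",
    "mig-3g.135gb", "mig-4g.135gb", "mig-7g.269gb",
]
before_fix = {}  # B300 key absent, as after #396
after_fix = {"ml.p6-b300.48xlarge": B300_PROFILES}
```

With `before_fix`, every B300 request fails at the first check regardless of which ConfigMap profiles exist. With `after_fix`, valid profiles pass, and unknown names are still rejected by the second check.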

2. 15 of 17 ConfigMap profiles are missing.

Cross-referencing against the NVIDIA GPU Operator v25.3.0 upstream ConfigMap (B300 section, device-filter 0x318210DE) and the NVIDIA MIG product page (Blackwell Ultra: 7x34GB, 4x69GB, 2x139GB, 1x279GB):

| Profile | Upstream | After #396 | This PR |
|---|---|---|---|
| `all-1g.34gb` (x7) | Yes | Missing | Added |
| `all-1g.67gb` (x4) | Yes | Added | Already present |
| `all-2g.67gb` (x3) | Yes | Missing | Added |
| `all-3g.135gb` (x2) | Yes | Missing | Added |
| `all-4g.135gb` (x1) | Yes | Missing | Added |
| `all-7g.269gb` (x1) | Yes | Missing | Added |
| 10 mixed profiles | Yes | 1 of 10 | +9 added |
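The uniform rows of the table map mechanically onto ConfigMap entries. The sketch below generates them as plain dicts modeled on the NVIDIA mig-parted config layout (`device-filter`, `devices`, `mig-enabled`, `mig-devices`). This layout is an assumption; the exact shape of default-mig-config.yaml in this chart may differ. `uniform_mig_configs` is a hypothetical helper, and the mixed profiles are not covered.

```python
from typing import Dict, List

# Uniform B300 layouts from the table: MIG profile -> instances per GPU.
B300_UNIFORM: Dict[str, int] = {
    "1g.34gb": 7,
    "1g.67gb": 4,
    "2g.67gb": 3,
    "3g.135gb": 2,
    "4g.135gb": 1,
    "7g.269gb": 1,
}
B300_DEVICE_FILTER = "0x318210DE"


def uniform_mig_configs() -> Dict[str, List[dict]]:
    """Build one "all-<profile>" entry per uniform layout (mig-parted style, assumed)."""
    configs: Dict[str, List[dict]] = {}
    for profile, count in B300_UNIFORM.items():
        configs[f"all-{profile}"] = [
            {
                "device-filter": [B300_DEVICE_FILTER],  # restrict to B300 GPUs
                "devices": "all",  # apply to every matching GPU on the node
                "mig-enabled": True,
                "mig-devices": {profile: count},
            }
        ]
    return configs
```

Serializing the result under a `mig-configs:` key (for example with PyYAML's `yaml.safe_dump`) produces entries that can be compared side by side with the ConfigMap diff.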

Profile Source

MIG profiles are derived from the NVIDIA GPU Operator upstream ConfigMap (v25.3.0), which defines B300 profiles under the # B300 comment section with all-balanced device-filter 0x318210DE. The NVIDIA MIG User Guide (r580) has not been updated for B300 yet.
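The device-filter value can be decoded by hand. This sketch assumes the mig-parted convention of packing the PCI device ID into the upper 16 bits and the PCI vendor ID into the lower 16 bits. Under that assumption, 0x318210DE splits into device 0x3182 and vendor 0x10DE, which is NVIDIA's PCI vendor ID. `split_device_filter` is a hypothetical helper.

```python
from typing import Tuple

NVIDIA_PCI_VENDOR_ID = 0x10DE


def split_device_filter(device_filter: str) -> Tuple[int, int]:
    """Split a 0xDDDDVVVV filter into (PCI device ID, PCI vendor ID), assuming that packing."""
    value = int(device_filter, 16)  # int() accepts the "0x" prefix when base is 16
    return value >> 16, value & 0xFFFF


device_id, vendor_id = split_device_filter("0x318210DE")
print(hex(device_id), vendor_id == NVIDIA_PCI_VENDOR_ID)  # 0x3182 True
```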

Additional Note

The existing p6-b200.48xlarge key in INSTANCE_TYPE_MIG_PROFILES is missing the ml. prefix (unlike all other entries). This PR does not address that issue to keep scope focused, but it may warrant a separate fix.
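A follow-up fix could start from a simple consistency check over the dictionary keys. The key list below is a hypothetical subset chosen for illustration and is not copied from constants.py. `keys_missing_ml_prefix` is likewise a hypothetical helper.

```python
from typing import Iterable, List


def keys_missing_ml_prefix(instance_types: Iterable[str]) -> List[str]:
    """Return instance-type keys that lack the SageMaker "ml." prefix."""
    return sorted(key for key in instance_types if not key.startswith("ml."))


# Hypothetical subset of INSTANCE_TYPE_MIG_PROFILES keys (values omitted).
example_keys = ["ml.p5.48xlarge", "p6-b200.48xlarge", "ml.p6-b300.48xlarge"]
print(keys_missing_ml_prefix(example_keys))  # ['p6-b200.48xlarge']
```

Run against the real keys, a check like this could serve as a unit test that fails until the B200 key is renamed.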

Test plan

- Verify INSTANCE_TYPE_MIG_PROFILES['ml.p6-b300.48xlarge'] returns the correct 6 profiles
- Verify ALLOWED_ACCELERATOR_PARTITION_TYPES includes all B300 MIG types (mig-1g.34gb, mig-1g.67gb, mig-2g.67gb, mig-3g.135gb, mig-4g.135gb, mig-7g.269gb)
- Verify default-mig-config.yaml parses as valid YAML
- Verify _validate_accelerator_partition("mig-1g.34gb", ..., "ml.p6-b300.48xlarge") passes validation
- Integration test: deploy a MIG-enabled instance group with ml.p6-b300.48xlarge and nvidia.com/mig.config: all-1g.34gb

Add ml.p6-b300.48xlarge to INSTANCE_TYPE_MIG_PROFILES in constants.py with the correct B300 MIG profiles derived from the NVIDIA GPU Operator v25.3.0 upstream ConfigMap (device-filter 0x318210DE):
- mig-1g.34gb, mig-1g.67gb, mig-2g.67gb
- mig-3g.135gb, mig-4g.135gb, mig-7g.269gb

Also add the corresponding uniform and mixed MIG partition profiles to the Helm chart default-mig-config.yaml ConfigMap, following the same pattern used for existing GPU types (H100, H200, B200).

The B300 GPU (288GB HBM3e, ~269GB usable) was already registered in INSTANCE_RESOURCES but had no MIG profile mapping, causing HyperPod MIG validation to reject accelerator partition requests on this instance type.

Covers ml.p6-b300.48xlarge MIG profile support added in PR aws#398:
- Profile presence in INSTANCE_TYPE_MIG_PROFILES
- Complete profile list verification (6 profiles)
- All profiles in ALLOWED_ACCELERATOR_PARTITION_TYPES
- GPU slice extraction for all B300 profiles (1g→1, 2g→2, ..., 7g→7)
- CPU/memory default calculation for each profile at max instances
- Validation acceptance for valid B300 profiles
- Validation rejection for invalid profiles on B300 instance type
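The GPU slice extraction covered by these tests (1g→1, ..., 7g→7) can be sketched with a regular expression. This sketch assumes profile names always follow the `mig-<N>g.<M>gb` pattern. `gpu_slices` is a hypothetical name, and the repository's own helper may parse names differently.

```python
import re

# Matches names such as "mig-3g.135gb": group 1 = compute slices, group 2 = memory in GB.
MIG_PROFILE_RE = re.compile(r"^mig-(\d+)g\.(\d+)gb$")


def gpu_slices(profile: str) -> int:
    """Return the number of GPU compute slices encoded in a MIG profile name."""
    match = MIG_PROFILE_RE.match(profile)
    if match is None:
        raise ValueError(f"Unrecognized MIG profile: {profile!r}")
    return int(match.group(1))


for name in ["mig-1g.34gb", "mig-2g.67gb", "mig-7g.269gb"]:
    print(name, gpu_slices(name))
```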

KeitaW requested a review from a team as a code owner on March 27, 2026 12:21.

KeitaW had a problem deploying to manual-approval on March 27, 2026 12:21 (GitHub Actions: Error).

KeitaW force-pushed the feat/add-p6-b300-mig-profiles branch from 045470a to c98fd6e on March 28, 2026 00:06.

KeitaW requested a deployment to manual-approval on March 28, 2026 00:07 (GitHub Actions: Waiting).

KeitaW mentioned this pull request on March 28, 2026 in #400, "Add MIG partition validation and defaults tests for all instance types" (open, 2 tasks).

KeitaW wants to merge 1 commit into aws:main from KeitaW:feat/add-p6-b300-mig-profiles.
