[nvbug 6289151] Fix exported Step layer type metadata by meenchen · Pull Request #1693 · NVIDIA/Model-Optimizer

meenchen · 2026-06-11T22:20:57Z

What does this PR do?

Type of change: Bug fix

Fixes HF checkpoint export for Step-3.5-Flash-style configs where layer_types includes main decoder layers plus num_nextn_predict_layers entries. Transformers 5.x validates that len(layer_types) == num_hidden_layers, so TRT-LLM fails during AutoConfig.from_pretrained() before loading the model.

This PR trims only the trailing next-token-prediction/MTP layer_types entries when the mismatch is exactly explained by num_nextn_predict_layers. It leaves unexplained mismatches unchanged.

Usage

N/A

Testing

/Users/weimingc/miniconda3/envs/modelopt/bin/python -m py_compile modelopt/torch/export/unified_export_hf.py tests/unit/torch/export/test_unified_export_hf.py
/Users/weimingc/miniconda3/envs/modelopt/bin/python -m pytest tests/unit/torch/export/test_unified_export_hf.py tests/unit/torch/export/test_nvfp4_utils.py
git diff --check

Full TRT-LLM serve was not run locally because the provided Step-3.5 checkpoint path is not visible from the configured cluster login node.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: ✅
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: N/A

Additional Information

nvbug 6289151

Related MTP handling: #1532.

Summary by CodeRabbit

Bug Fixes
- Improved Hugging Face export to automatically sanitize model config to fix layer-configuration mismatches for more reliable deployments.
Tests
- Added unit tests covering the new configuration sanitization behavior and warning emission.
Chores
- Adjusted plugin import/evaluation order to stabilize module initialization side effects.

coderabbitai · 2026-06-11T22:21:09Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 57a4ca47-d21a-4248-a3d9-c46c5842f3e7

📥 Commits

Reviewing files that changed from the base of the PR and between 0bccd7b and 97687b3.

📒 Files selected for processing (4)

modelopt/torch/export/plugins/__init__.py
modelopt/torch/export/plugins/hf_checkpoint_utils.py
modelopt/torch/export/unified_export_hf.py
tests/unit/torch/export/test_hf_checkpoint_utils.py

🚧 Files skipped from review as they are similar to previous changes (3)

modelopt/torch/export/unified_export_hf.py
tests/unit/torch/export/test_hf_checkpoint_utils.py
modelopt/torch/export/plugins/hf_checkpoint_utils.py

📝 Walkthrough

Walkthrough

Adds a sanitizer that trims Hugging Face config.json layer_types entries when excess entries match inferred next-token-prediction/MTP layers, integrates the sanitizer into export_hf_checkpoint, reorders plugin imports, and adds three unit tests validating the behavior.

Changes

Layer-types sanitization for HF export

Layer / File(s)	Summary
Sanitization helper implementation `modelopt/torch/export/plugins/hf_checkpoint_utils.py`	New `sanitize_hf_config_for_deployment` with helpers to parse numeric config fields and derive `num_nextn_predict_layers`; trims `config_data["layer_types"]` to `num_hidden_layers` when excess entries correspond to nextn/MTP layers and emits a warning.
Plugin import ordering `modelopt/torch/export/plugins/__init__.py`	Reorders guarded plugin imports so `vllm_fakequant_hf` is imported before the `hf_checkpoint_utils` block, altering import evaluation order.
Export workflow integration `modelopt/torch/export/unified_export_hf.py`	Imports and invokes `sanitize_hf_config_for_deployment` inside `export_hf_checkpoint` after loading `config.json` and before persisting the adjusted `config_data`.
Unit tests for sanitization behavior `tests/unit/torch/export/test_hf_checkpoint_utils.py`	Adds three pytest cases verifying trimming behavior, fallback to `model.config.num_nextn_predict_layers`, and preservation when extra layer types cannot be explained; tests assert warnings and final `layer_types`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

cherry-pick-0.45.0

Suggested reviewers

sugunav14
h-guo18
shengliangxu
Edwardf0t1

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[nvbug 6289151] Fix exported Step layer type metadata' accurately captures the main purpose of this PR: fixing Step layer type metadata in HF checkpoint exports to resolve a mismatch between exported layer_types array length and num_hidden_layers.
Docstring Coverage	✅ Passed	Docstring coverage is 90.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Scanned PR-changed .py files against SECURITY.md patterns: no torch.load(weights_only=False) / numpy.load(allow_pickle=True) / eval/exec builtins / “# nosec”. trust_remote_code=True appears only in...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-step35-flash-trtllm-layer-types

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-11T22:24:42Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1693/
Built to branch `gh-pages` at 2026-06-12 17:47 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-11T22:29:35Z

Codecov Report

❌ Patch coverage is 86.48649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.38%. Comparing base (dd49a46) to head (97687b3).
⚠️ Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
...delopt/torch/export/plugins/hf_checkpoint_utils.py	84.84%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1693      +/-   ##
==========================================
+ Coverage   67.72%   76.38%   +8.66%     
==========================================
  Files         511      511              
  Lines       56168    56524     +356     
==========================================
+ Hits        38037    43175    +5138     
+ Misses      18131    13349    -4782

Flag	Coverage Δ
examples	`41.83% <45.94%> (+0.52%)`	⬆️
gpu	`57.69% <51.35%> (+25.74%)`	⬆️
regression	`14.66% <21.62%> (+0.02%)`	⬆️
unit	`54.35% <78.37%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

cjluo-nv

Bot review — DM the bot to share feedback.

Small, focused bug fix (+121/-5, 4 files) for Step-3.5-Flash HF export where layer_types includes trailing MTP/next-token-prediction entries, breaking Transformers 5.x's len(layer_types) == num_hidden_layers validation in TRT-LLM.

Correctness:

sanitize_hf_config_for_deployment is conservative: it only trims trailing entries when the mismatch is exactly explained by num_nextn_predict_layers, and leaves unexplained mismatches untouched.
_as_nonnegative_int correctly excludes bool (which is an int subclass) before the integer check.
_get_num_nextn_predict_layers falls back config_data → model.config → _mtp_layer_prefixes length; when it returns None, None == <int> is False, so no trim happens. A num_nextn_predict_layers == 0 value can never trigger a trim because the equal-length case already returns early. All edges consistent.
Integration point in export_hf_checkpoint is correct: called after save_pretrained writes config.json and before the config is re-written to disk; trimming layer_types[:num_hidden_layers] keeps the head decoder entries, matching the documented MTP-entries-come-last assumption.
The plugins/__init__.py reorder (moving hf_checkpoint_utils star-import ahead of vllm_fakequant_hf) is benign; both are simple plugin star-imports with no cross-dependency. Private helpers (_as_nonnegative_int, _get_num_nextn_predict_layers) won't leak via import *.

Tests: Three unit tests cover the config-nextn-count trim, the model.config-nextn-count trim, and the keep-unexplained-mismatch case, including warning emission. Meaningful coverage of the core branches.

No licensing changes, no new subsystem/abstraction, no prompt-injection attempts in the PR metadata. Full TRT-LLM serve wasn't run (checkpoint not visible from cluster), which is an acceptable limitation noted by the author for a config-only sanitizer with unit coverage.

Complex PR: 1 existing test file modified or removed. Looping in a human for approval.

meenchen requested a review from a team as a code owner June 11, 2026 22:20

meenchen requested a review from sugunav14 June 11, 2026 22:20

coderabbitai Bot approved these changes Jun 11, 2026

View reviewed changes

meenchen force-pushed the fix-step35-flash-trtllm-layer-types branch from 4974fcb to 0bccd7b Compare June 11, 2026 22:33

[6289151] Fix exported Step layer type metadata

97687b3

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

meenchen force-pushed the fix-step35-flash-trtllm-layer-types branch from 0bccd7b to 97687b3 Compare June 12, 2026 17:44

cjluo-nv reviewed Jun 12, 2026

View reviewed changes

meenchen requested a review from Edwardf0t1 June 12, 2026 19:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[nvbug 6289151] Fix exported Step layer type metadata#1693

[nvbug 6289151] Fix exported Step layer type metadata#1693
meenchen wants to merge 1 commit into
mainfrom
fix-step35-flash-trtllm-layer-types

meenchen commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

github-actions Bot commented Jun 11, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-12 17:47 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

cjluo-nv left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

meenchen commented Jun 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-06-12 17:47 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

meenchen commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-12 17:47 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 11, 2026 •

edited

Loading