Fix Mamba conv1d compatibility [OMNIML-5199] by jenchen13 · Pull Request #1750 · NVIDIA/Model-Optimizer

jenchen13 · 2026-06-16T18:30:42Z

What does this PR do?

Type of change: Bug Fix

Megatron-LM changed MambaMixer conv1d fields, causing this bug
120B_PTQ_0/0 [rank2]: AttributeError: 'MambaMixer' object has no attribute 'conv1d'

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ / ❌ / N/A
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
Did you write any new necessary tests?: ✅ / ❌ / N/A
Did you update Changelog?: ✅ / ❌ / N/A
Did you get Claude approval on this PR?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

Bug Fixes
- Improved Mamba model export compatibility by correctly handling both legacy convolution-style parameters and newer weight/bias layouts during conversion.
Tests
- Added end-to-end GPU tests covering export to safetensors and re-import into fresh models for a small Mamba + attention + MoE hybrid.
- Added focused unit coverage to ensure convolution parameter handling matches expected checkpoint key patterns.

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

coderabbitai · 2026-06-16T18:31:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4a0ddb81-d180-4e83-881a-837d05a20200

📥 Commits

Reviewing files that changed from the base of the PR and between cdda314 and 47f5e65.

📒 Files selected for processing (1)

tests/_test_utils/torch/transformers_models.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/_test_utils/torch/transformers_models.py

📝 Walkthrough

Walkthrough

Adds _MambaConv1dCompat and _get_mamba_conv1d to megatron_importer.py to handle both legacy conv1d module and newer conv1d_weight/conv1d_bias parameter layouts in Mamba mixers. Both the importer and unified exporter are updated to use this helper. Nemotron-H test fixtures and a new GPU test module covering unit and end-to-end export/import flows are added.

Changes

Mamba conv1d compatibility wrapper

Layer / File(s)	Summary
Compat helper definition and import/export wiring `modelopt/torch/export/plugins/megatron_importer.py`, `modelopt/torch/export/unified_export_megatron.py`	Defines `_MambaConv1dCompat` and `_get_mamba_conv1d`, which dispatch to the existing `conv1d` module or wrap `conv1d_weight`/`conv1d_bias` into a compatible module. Updates `_import_mamba_layer` and the unified exporter's `conv1d` rule to use this helper.
Nemotron-H tiny model test fixtures `tests/_test_utils/torch/transformers_models.py`	Adds `NemotronHConfig` import and two helpers — `get_tiny_nemotron_h` and `create_tiny_nemotron_h_dir` — for building and persisting a tiny Mamba+MoE hybrid model for tests.
Unit and end-to-end importer tests `tests/gpu_megatron/torch/export/test_megatron_importer.py`	Adds `_Mixer` stub and two unit tests for `_get_mamba_conv1d` (legacy passthrough and wrapper parameter identity), plus an end-to-end distributed test that builds a NemotronH mcore hybrid, exports to safetensors, asserts Mamba/MoE key patterns, and reimports into a fresh model.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Suggested reviewers

ChenhanYu
hychiang-git

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically identifies the main change: fixing Mamba conv1d compatibility issues. It directly relates to the core problem (AttributeError on MambaMixer.conv1d) and the solution implemented across multiple files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	No security anti-patterns found. All trust_remote_code parameters are properly configurable with False defaults, no unsafe torch.load/numpy.load/eval/exec usage, no # nosec comments, and no new non...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch jennifchen/mamba-conv1d-export

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-16T18:34:44Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1750/
Built to branch `gh-pages` at 2026-06-16 20:48 UTC. Preview will be ready when the GitHub Pages deployment is complete.

coderabbitai

🧹 Nitpick comments (1)

tests/unit/torch/export/test_megatron_importer.py (1)
25-47: ⚡ Quick win

Add a negative-path regression test for unsupported mixer layouts.

Current coverage validates both success paths, but not the explicit failure contract. Add a test that asserts AttributeError when neither conv1d nor conv1d_weight/conv1d_bias exists.
♻️ Suggested test addition
+def test_get_mamba_conv1d_raises_for_unsupported_layout():
+    mixer = _Mixer()
+    with pytest.raises(AttributeError, match="does not have `conv1d`"):
+        _get_mamba_conv1d(mixer)
As per coding guidelines, checked-in tests should document expected behavior and protect against regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/torch/export/test_megatron_importer.py` around lines 25 - 47, Add
a new test function to document the expected failure behavior when the _Mixer
object lacks both the legacy conv1d module and the direct
conv1d_weight/conv1d_bias parameters. Create a test that instantiates a _Mixer
object without any of these attributes and asserts that calling
_get_mamba_conv1d on it raises an AttributeError. This negative-path test will
ensure the function's failure contract is protected against future regressions
and documents the unsupported mixer layout scenario.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/torch/export/test_megatron_importer.py`:
- Around line 25-47: Add a new test function to document the expected failure
behavior when the _Mixer object lacks both the legacy conv1d module and the
direct conv1d_weight/conv1d_bias parameters. Create a test that instantiates a
_Mixer object without any of these attributes and asserts that calling
_get_mamba_conv1d on it raises an AttributeError. This negative-path test will
ensure the function's failure contract is protected against future regressions
and documents the unsupported mixer layout scenario.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 545462bf-1a21-4b5c-ab87-333ef9a869d9

📥 Commits

Reviewing files that changed from the base of the PR and between 1cccf66 and fd8e9a9.

📒 Files selected for processing (3)

modelopt/torch/export/plugins/megatron_importer.py
modelopt/torch/export/unified_export_megatron.py
tests/unit/torch/export/test_megatron_importer.py

ChenhanYu · 2026-06-16T18:41:05Z

/claude review

codecov · 2026-06-16T18:41:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.31%. Comparing base (1e461dd) to head (47f5e65).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1750      +/-   ##
==========================================
+ Coverage   58.45%   68.31%   +9.85%     
==========================================
  Files         510      511       +1     
  Lines       56271    56367      +96     
==========================================
+ Hits        32896    38507    +5611     
+ Misses      23375    17860    -5515

Flag	Coverage Δ
examples	`41.79% <28.57%> (+19.35%)`	⬆️
gpu	`31.77% <100.00%> (+11.17%)`	⬆️
regression	`14.70% <28.57%> (+0.07%)`	⬆️
unit	`54.34% <28.57%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kevalmorabia97 · 2026-06-16T18:43:26Z

can you move this to tests/gpu_megatron/ which has all required megatron dependencies. It may fail with import error in unit test env

it's not really a GPU test though, is there a CPU megatron path?

No we dont have cpu-only megatron path. Given its unit test so likely <1s to run so fine to just run in gpu test path

i have moved them to gpu_megatron and added more robust NemotronH import + export tests

claude

Claude review passed — no blocking issues found.

Summary
Narrowly-scoped compatibility shim for the Megatron-LM API change (PR #4899) that flattened MambaMixer.conv1d (Conv1d submodule) into direct conv1d_weight/conv1d_bias parameters.

Verified

Algorithm/state correctness: _MambaConv1dCompat.__init__ assigns mixer.conv1d_weight/conv1d_bias as attributes — since they are nn.Parameter instances, PyTorch auto-registers them via __setattr__, so state_dict() yields {weight, bias} matching the legacy nn.Conv1d shape. The wrapper holds the same Parameter objects as the mixer, so load_state_dict() writes through to the underlying MambaMixer parameters (test confirms is identity).
Importer (_name_remapping): With REPLICATE parallel_config, the conv1d weight/bias are loaded from HF without sharding, then module.load_state_dict({weight, bias}) succeeds for both legacy Conv1d and the wrapper (matching keys, no quantizer state, no _extra_state).
Exporter (_get_weight_bias → _name_remapping): hasattr(module, 'weight') and module.weight.numel() > 0 are satisfied by the wrapper; the export writes backbone.layers.{i}.mixer.conv1d.weight/.bias matching HF format (Nemotron-H mapping).
Plugin laziness: Both new symbols are defined outside the import_plugin("megatron") block but don't import megatron, so the unit test loads cleanly.
Backward compat: Legacy code path is preserved (early return when conv1d attr exists); HF-side keys are unchanged in both directions; no modelopt_state schema impact.
Cross-file usage: unified_export_megatron.py correctly imports the helper from megatron_importer.py. Other call sites of mixer.conv1d (modelopt/torch/nas/plugins/megatron.py) are NAS-specific and unrelated to the export flow this PR fixes.

LGTM.

…round-trip Move test_megatron_importer.py from tests/unit to tests/gpu_megatron/torch/export and add a tiny NemotronH (Mamba + MoE + attention hybrid) export/import round-trip test that exercises the mamba conv1d export path. Adds get_tiny_nemotron_h / create_tiny_nemotron_h_dir helpers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 · 2026-06-16T20:38:49Z

/claude review

NemotronHConfig only exists in newer transformers; the top-level import broke every test importing transformers_models (e.g. test_hf_dflash) under the min-transformers CI job. Move it to a lazy import inside get_tiny_nemotron_h, matching the existing Qwen3VL pattern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

claude · 2026-06-16T20:49:20Z

+class _MambaConv1dCompat(torch.nn.Module):
+    """Expose direct Mamba conv params through the legacy Conv1d state_dict keys."""
+
+    def __init__(self, mixer):
+        super().__init__()
+        self.weight = mixer.conv1d_weight
+        self.bias = mixer.conv1d_bias
+
+
+def _get_mamba_conv1d(mixer):
+    conv1d = getattr(mixer, "conv1d", None)
+    if conv1d is not None:
+        return conv1d
+
+    if hasattr(mixer, "conv1d_weight") and hasattr(mixer, "conv1d_bias"):
+        # Megatron-LM PR #4899 / commit 35992ba changed MambaMixer fields from
+        # `conv1d` to direct `conv1d_weight` and `conv1d_bias` parameters.
+        return _MambaConv1dCompat(mixer)
+
+    raise AttributeError(
+        "MambaMixer does not have `conv1d` or `conv1d_weight`/`conv1d_bias` fields."
+    )


[SUGGESTION] The compat wrapper exposes conv1d_weight/conv1d_bias under the legacy weight/bias keys, which is correct as long as Megatron-LM's new conv1d_weight keeps the same shape as the old nn.Conv1d.weight — i.e. 3D (conv_dim, 1, conv_kernel) matching the HF NemotronH mixer.conv1d.weight layout.

Worth double-checking against commit 35992ba: if the refactor stores conv1d_weight as a squeezed 2D (conv_dim, conv_kernel) parameter, then on export the safetensor would be written 2D and would fail to load into HF transformers' 3D nn.Conv1d (and symmetrically on import). The export/import paths here pass the tensor through verbatim without reshaping, so a layout divergence would silently produce an HF-incompatible checkpoint.

If there's any chance of a 2D layout, consider normalizing to the HF 3D shape inside _MambaConv1dCompat (e.g. self.weight = mixer.conv1d_weight.view(conv_dim, 1, -1) on export and reverse on import). If the shape is guaranteed identical, no change needed — this is just a verification ask.

claude · 2026-06-16T20:49:31Z

+            with safe_open(str(sf), framework="pt", device="cpu") as f:
+                keys.extend(f.keys())
+        # Mamba mixer (conv1d is the layer fixed by _get_mamba_conv1d), plus MoE experts.
+        assert any("mixer.conv1d" in k for k in keys), "mamba conv1d weights missing from export"
+        assert any("mixer.in_proj" in k for k in keys), "mamba in_proj weights missing from export"
+        assert any("mixer.experts" in k for k in keys), "moe expert weights missing from export"
+
+    # Import the exported checkpoint back into a fresh mcore model
+    # (megatron is already initialized from the first build).
+    imported = _build_mcore_nemotron_h(config, size, initialize=False)
+    import_mcore_gpt_from_hf(imported, str(export_dir))
+
+
+def test_nemotron_h_export_import(dist_workers_size_1, tmp_path):
+    model_dir = create_tiny_nemotron_h_dir(tmp_path)


[SUGGESTION] Two coverage gaps worth noting (non-blocking):

The end-to-end test only exercises whichever conv1d layout the installed Megatron-LM has. With an older Megatron (still exposing mixer.conv1d), _get_mamba_conv1d returns the legacy module and the new _MambaConv1dCompat branch — the actual subject of this fix — is never hit here. Only the unit test test_get_mamba_conv1d_wraps_direct_params covers the new path, and it uses a hand-authored 3D tensor. So the new-layout path isn't validated against a real Megatron mixer end-to-end unless CI pins the post-35992ba version.

This round-trip validates mcore → safetensors → mcore self-consistency, not HF-loadability. Since import_mcore_gpt_from_hf reads the exported safetensors back into a same-shaped mcore param, a conv1d weight stored in a non-HF (e.g. 2D) shape would round-trip cleanly here yet still fail to load into transformers NemotronH. Consider adding an explicit assertion on the exported mixer.conv1d.weight shape/ndim (expecting the 3D HF layout) to catch a layout divergence — this ties directly to the shape concern flagged in megatron_importer.py.

claude

Claude review passed — no blocking issues found. LGTM

Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 2

This is a clean, minimal, well-targeted bug fix for the Megatron-LM MambaMixer field rename (conv1d → conv1d_weight/conv1d_bias, commit 35992ba).

Verification done:

_get_mamba_conv1d is wired symmetrically into both the importer (megatron_importer.py:588) and exporter (unified_export_megatron.py:667), the only two call sites that touch mixer.conv1d in the export path.
The compat wrapper round-trips correctly through both consumption mechanisms: import uses module.state_dict() + load_state_dict (in-place copy into the aliased params — covered by the unit test), and export reads module.weight/module.bias attributes directly via _get_weight_bias.
The legacy branch (getattr(mixer, "conv1d")) preserves existing behavior, so this is backward compatible with older Megatron.
The e2e test's mixer.conv1d/mixer.in_proj/mixer.experts key assertions all match the mcore_nemotron.py rule book.
The NAS path (nas/plugins/megatron.py) also references mixer.conv1d but is out of scope for this export-focused fix.

Suggestions (non-blocking):

Verify the new conv1d_weight retains the 3D (conv_dim, 1, kernel) shape of the old nn.Conv1d.weight; a squeezed 2D layout would silently produce HF-incompatible checkpoints since the tensor is passed through without reshaping.
The e2e round-trip validates mcore↔safetensors self-consistency on whichever conv1d layout the installed Megatron has — it doesn't independently assert HF-loadable shape, and won't exercise the new-field branch unless CI pins post-35992ba Megatron.

Risk: Low. Additive, backward-compatible, with both unit and e2e test coverage.

Fix Mamba conv1d compatibility

fd8e9a9

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>

jenchen13 requested review from a team as code owners June 16, 2026 18:30

jenchen13 requested a review from meenchen June 16, 2026 18:30

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

coderabbitai Bot approved these changes Jun 16, 2026

View reviewed changes

jenchen13 changed the title ~~Fix Mamba conv1d compatibility~~ Fix Mamba conv1d compatibility [OMNIML-5199] Jun 16, 2026

kevalmorabia97 added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 16, 2026

kevalmorabia97 reviewed Jun 16, 2026

View reviewed changes

claude Bot approved these changes Jun 16, 2026

View reviewed changes

ChenhanYu approved these changes Jun 16, 2026

View reviewed changes

jenchen13 requested a review from kevalmorabia97 June 16, 2026 20:38

claude Bot reviewed Jun 16, 2026

View reviewed changes

claude Bot approved these changes Jun 16, 2026

View reviewed changes

claude Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread modelopt/torch/export/plugins/megatron_importer.py

claude Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread tests/gpu_megatron/torch/export/test_megatron_importer.py

jenchen13 enabled auto-merge (squash) June 16, 2026 21:21

Conversation

jenchen13 commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-06-16 20:48 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ChenhanYu commented Jun 16, 2026

Uh oh!

codecov Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kevalmorabia97 Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

jenchen13 Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

kevalmorabia97 Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

jenchen13 Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

jenchen13 commented Jun 16, 2026

Uh oh!

claude Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jenchen13 commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

github-actions Bot commented Jun 16, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-16 20:48 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 16, 2026 •

edited

Loading