Fix Mamba conv1d compatibility [OMNIML-5199]#1750
Conversation
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds ChangesMamba conv1d compatibility wrapper
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/unit/torch/export/test_megatron_importer.py (1)
25-47: ⚡ Quick winAdd a negative-path regression test for unsupported mixer layouts.
Current coverage validates both success paths, but not the explicit failure contract. Add a test that asserts
AttributeErrorwhen neitherconv1dnorconv1d_weight/conv1d_biasexists.♻️ Suggested test addition
+def test_get_mamba_conv1d_raises_for_unsupported_layout(): + mixer = _Mixer() + with pytest.raises(AttributeError, match="does not have `conv1d`"): + _get_mamba_conv1d(mixer)As per coding guidelines, checked-in tests should document expected behavior and protect against regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/torch/export/test_megatron_importer.py` around lines 25 - 47, Add a new test function to document the expected failure behavior when the _Mixer object lacks both the legacy conv1d module and the direct conv1d_weight/conv1d_bias parameters. Create a test that instantiates a _Mixer object without any of these attributes and asserts that calling _get_mamba_conv1d on it raises an AttributeError. This negative-path test will ensure the function's failure contract is protected against future regressions and documents the unsupported mixer layout scenario.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/unit/torch/export/test_megatron_importer.py`:
- Around line 25-47: Add a new test function to document the expected failure
behavior when the _Mixer object lacks both the legacy conv1d module and the
direct conv1d_weight/conv1d_bias parameters. Create a test that instantiates a
_Mixer object without any of these attributes and asserts that calling
_get_mamba_conv1d on it raises an AttributeError. This negative-path test will
ensure the function's failure contract is protected against future regressions
and documents the unsupported mixer layout scenario.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 545462bf-1a21-4b5c-ab87-333ef9a869d9
📒 Files selected for processing (3)
modelopt/torch/export/plugins/megatron_importer.pymodelopt/torch/export/unified_export_megatron.pytests/unit/torch/export/test_megatron_importer.py
|
/claude review |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1750 +/- ##
==========================================
+ Coverage 58.45% 68.31% +9.85%
==========================================
Files 510 511 +1
Lines 56271 56367 +96
==========================================
+ Hits 32896 38507 +5611
+ Misses 23375 17860 -5515
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
can you move this to tests/gpu_megatron/ which has all required megatron dependencies. It may fail with import error in unit test env
There was a problem hiding this comment.
it's not really a GPU test though, is there a CPU megatron path?
There was a problem hiding this comment.
No we dont have cpu-only megatron path. Given its unit test so likely <1s to run so fine to just run in gpu test path
There was a problem hiding this comment.
i have moved them to gpu_megatron and added more robust NemotronH import + export tests
There was a problem hiding this comment.
Claude review passed — no blocking issues found.
Summary
Narrowly-scoped compatibility shim for the Megatron-LM API change (PR #4899) that flattened MambaMixer.conv1d (Conv1d submodule) into direct conv1d_weight/conv1d_bias parameters.
Verified
- Algorithm/state correctness:
_MambaConv1dCompat.__init__assignsmixer.conv1d_weight/conv1d_biasas attributes — since they arenn.Parameterinstances, PyTorch auto-registers them via__setattr__, sostate_dict()yields{weight, bias}matching the legacynn.Conv1dshape. The wrapper holds the sameParameterobjects as the mixer, soload_state_dict()writes through to the underlyingMambaMixerparameters (test confirmsisidentity). - Importer (
_name_remapping): WithREPLICATEparallel_config, the conv1dweight/biasare loaded from HF without sharding, thenmodule.load_state_dict({weight, bias})succeeds for both legacyConv1dand the wrapper (matching keys, no quantizer state, no_extra_state). - Exporter (
_get_weight_bias→_name_remapping):hasattr(module, 'weight')andmodule.weight.numel() > 0are satisfied by the wrapper; the export writesbackbone.layers.{i}.mixer.conv1d.weight/.biasmatching HF format (Nemotron-H mapping). - Plugin laziness: Both new symbols are defined outside the
import_plugin("megatron")block but don't import megatron, so the unit test loads cleanly. - Backward compat: Legacy code path is preserved (early return when
conv1dattr exists); HF-side keys are unchanged in both directions; nomodelopt_stateschema impact. - Cross-file usage:
unified_export_megatron.pycorrectly imports the helper frommegatron_importer.py. Other call sites ofmixer.conv1d(modelopt/torch/nas/plugins/megatron.py) are NAS-specific and unrelated to the export flow this PR fixes.
LGTM.
…round-trip Move test_megatron_importer.py from tests/unit to tests/gpu_megatron/torch/export and add a tiny NemotronH (Mamba + MoE + attention hybrid) export/import round-trip test that exercises the mamba conv1d export path. Adds get_tiny_nemotron_h / create_tiny_nemotron_h_dir helpers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
|
/claude review |
NemotronHConfig only exists in newer transformers; the top-level import broke every test importing transformers_models (e.g. test_hf_dflash) under the min-transformers CI job. Move it to a lazy import inside get_tiny_nemotron_h, matching the existing Qwen3VL pattern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
| class _MambaConv1dCompat(torch.nn.Module): | ||
| """Expose direct Mamba conv params through the legacy Conv1d state_dict keys.""" | ||
|
|
||
| def __init__(self, mixer): | ||
| super().__init__() | ||
| self.weight = mixer.conv1d_weight | ||
| self.bias = mixer.conv1d_bias | ||
|
|
||
|
|
||
| def _get_mamba_conv1d(mixer): | ||
| conv1d = getattr(mixer, "conv1d", None) | ||
| if conv1d is not None: | ||
| return conv1d | ||
|
|
||
| if hasattr(mixer, "conv1d_weight") and hasattr(mixer, "conv1d_bias"): | ||
| # Megatron-LM PR #4899 / commit 35992ba changed MambaMixer fields from | ||
| # `conv1d` to direct `conv1d_weight` and `conv1d_bias` parameters. | ||
| return _MambaConv1dCompat(mixer) | ||
|
|
||
| raise AttributeError( | ||
| "MambaMixer does not have `conv1d` or `conv1d_weight`/`conv1d_bias` fields." | ||
| ) |
There was a problem hiding this comment.
[SUGGESTION] The compat wrapper exposes conv1d_weight/conv1d_bias under the legacy weight/bias keys, which is correct as long as Megatron-LM's new conv1d_weight keeps the same shape as the old nn.Conv1d.weight — i.e. 3D (conv_dim, 1, conv_kernel) matching the HF NemotronH mixer.conv1d.weight layout.
Worth double-checking against commit 35992ba: if the refactor stores conv1d_weight as a squeezed 2D (conv_dim, conv_kernel) parameter, then on export the safetensor would be written 2D and would fail to load into HF transformers' 3D nn.Conv1d (and symmetrically on import). The export/import paths here pass the tensor through verbatim without reshaping, so a layout divergence would silently produce an HF-incompatible checkpoint.
If there's any chance of a 2D layout, consider normalizing to the HF 3D shape inside _MambaConv1dCompat (e.g. self.weight = mixer.conv1d_weight.view(conv_dim, 1, -1) on export and reverse on import). If the shape is guaranteed identical, no change needed — this is just a verification ask.
| with safe_open(str(sf), framework="pt", device="cpu") as f: | ||
| keys.extend(f.keys()) | ||
| # Mamba mixer (conv1d is the layer fixed by _get_mamba_conv1d), plus MoE experts. | ||
| assert any("mixer.conv1d" in k for k in keys), "mamba conv1d weights missing from export" | ||
| assert any("mixer.in_proj" in k for k in keys), "mamba in_proj weights missing from export" | ||
| assert any("mixer.experts" in k for k in keys), "moe expert weights missing from export" | ||
|
|
||
| # Import the exported checkpoint back into a fresh mcore model | ||
| # (megatron is already initialized from the first build). | ||
| imported = _build_mcore_nemotron_h(config, size, initialize=False) | ||
| import_mcore_gpt_from_hf(imported, str(export_dir)) | ||
|
|
||
|
|
||
| def test_nemotron_h_export_import(dist_workers_size_1, tmp_path): | ||
| model_dir = create_tiny_nemotron_h_dir(tmp_path) |
There was a problem hiding this comment.
[SUGGESTION] Two coverage gaps worth noting (non-blocking):
-
The end-to-end test only exercises whichever conv1d layout the installed Megatron-LM has. With an older Megatron (still exposing
mixer.conv1d),_get_mamba_conv1dreturns the legacy module and the new_MambaConv1dCompatbranch — the actual subject of this fix — is never hit here. Only the unit testtest_get_mamba_conv1d_wraps_direct_paramscovers the new path, and it uses a hand-authored 3D tensor. So the new-layout path isn't validated against a real Megatron mixer end-to-end unless CI pins the post-35992baversion. -
This round-trip validates mcore → safetensors → mcore self-consistency, not HF-loadability. Since
import_mcore_gpt_from_hfreads the exported safetensors back into a same-shaped mcore param, a conv1d weight stored in a non-HF (e.g. 2D) shape would round-trip cleanly here yet still fail to load intotransformersNemotronH. Consider adding an explicit assertion on the exportedmixer.conv1d.weightshape/ndim (expecting the 3D HF layout) to catch a layout divergence — this ties directly to the shape concern flagged inmegatron_importer.py.
There was a problem hiding this comment.
Claude review passed — no blocking issues found. LGTM
Findings: CRITICAL: 0, IMPORTANT: 0, SUGGESTION: 2
This is a clean, minimal, well-targeted bug fix for the Megatron-LM MambaMixer field rename (conv1d → conv1d_weight/conv1d_bias, commit 35992ba).
Verification done:
_get_mamba_conv1dis wired symmetrically into both the importer (megatron_importer.py:588) and exporter (unified_export_megatron.py:667), the only two call sites that touchmixer.conv1din the export path.- The compat wrapper round-trips correctly through both consumption mechanisms: import uses
module.state_dict()+load_state_dict(in-place copy into the aliased params — covered by the unit test), and export readsmodule.weight/module.biasattributes directly via_get_weight_bias. - The legacy branch (
getattr(mixer, "conv1d")) preserves existing behavior, so this is backward compatible with older Megatron. - The e2e test's
mixer.conv1d/mixer.in_proj/mixer.expertskey assertions all match themcore_nemotron.pyrule book. - The NAS path (
nas/plugins/megatron.py) also referencesmixer.conv1dbut is out of scope for this export-focused fix.
Suggestions (non-blocking):
- Verify the new
conv1d_weightretains the 3D(conv_dim, 1, kernel)shape of the oldnn.Conv1d.weight; a squeezed 2D layout would silently produce HF-incompatible checkpoints since the tensor is passed through without reshaping. - The e2e round-trip validates mcore↔safetensors self-consistency on whichever conv1d layout the installed Megatron has — it doesn't independently assert HF-loadable shape, and won't exercise the new-field branch unless CI pins post-35992ba Megatron.
Risk: Low. Additive, backward-compatible, with both unit and e2e test coverage.
What does this PR do?
Type of change: Bug Fix
Megatron-LM changed MambaMixer conv1d fields, causing this bug
120B_PTQ_0/0 [rank2]: AttributeError: 'MambaMixer' object has no attribute 'conv1d'Usage
# Add a code snippet demonstrating how to use thisTesting
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
Bug Fixes
Tests