[OMNIML-4922] Four over Six PTQ & Updating Nemotron Ultra Example#1684
[OMNIML-4922] Four over Six PTQ & Updating Nemotron Ultra Example#1684jenchen13 wants to merge 4 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Four-Over-Six (4/6) adaptive per-block NVFP4 weight scaling: parameterizes FP8 normalization, implements M=4 vs M=6 per-block candidate generation and MSE-based selection, threads flags through quantizers and backend, provides presets/recipes, and adds unit tests. ChangesNVFP4 Four-Over-Six Adaptive Per-Block Scaling
Sequence DiagramsequenceDiagram
participant Quantizer
participant NVFP4QTensor
participant _select_four_over_six_scale
participant triton_kernel
Quantizer->>NVFP4QTensor: quantize(four_over_six flag)
NVFP4QTensor->>_select_four_over_six_scale: per-block amax, generate M4/M6 candidates
_select_four_over_six_scale->>triton_kernel: fake-quant candidates (FP8→E2M1) round-trip
triton_kernel-->>_select_four_over_six_scale: quantized reconstructions
_select_four_over_six_scale-->>NVFP4QTensor: chosen per-block scales (lower MSE)
NVFP4QTensor->>triton_kernel: static_blockwise_fp4_fake_quant(fp8_max_for_normalization)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt_recipes/configs/ptq/units/w4a4_nvfp4_nvfp4_four_over_six.yaml`:
- Around line 16-18: The comment wrongly states that both weight and input
quantizers are dynamic NVFP4; update it to reflect that the imported weight
config uses the static four_over_six path (i.e. static weights with
Four-Over-Six scaling) while only the input/activation quantizers remain dynamic
NVFP4. Locate the QuantizerCfgList and the reference to the four_over_six weight
config (the w4a4_nvfp4_nvfp4_four_over_six unit) and change the wording to
“static weights (four_over_six) + dynamic NVFP4 inputs/activations” or
equivalent concise phrasing.
In `@tests/unit/torch/quantization/test_nvfp4_four_over_six.py`:
- Around line 168-169: The test currently imports choices inside the test body;
move the import statement "from modelopt.torch.quantization.config import
choices" to the module top-level so import errors surface at collection time and
follow project import guidelines, or if there is a true
circular/optional/heavy-import reason, keep it inline but add a concise comment
justifying the in-function import and mark the test appropriately (e.g., a skip
or note) so reviewers understand the exception.
- Around line 116-121: The selected-scale re-cast uses the default FP8
normalization max; change the second call so sel_scale is cast with the
four-over-six normalization max (256) to match the validation path. Update the
call to NVFP4QTensor._fake_quant_to_e2m1(...,
_cast_per_block_scale_to_fp8(sel_scale, 256).float(), ...) (or pass the project
constant for the 4/6 max) so sel_scale uses the 4/6 normalization like m6_scale
and BLOCK_SIZE remains unchanged.
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`:
- Line 32: Update the QUANT_CFG and export tag values to the new 4/6 preset
names: replace any occurrence of QUANT_CFG value
"huggingface/nvidia/Nemotron-3-Ultra-550B-A55B/ptq/ultra-nvfp4-max-calib" with
"huggingface/nvidia/Nemotron-3-Ultra-550B-A55B/ptq/ultra-nvfp4-46-max", and
replace export tag usages of "super-nvfp4-max-calib" with "super-nvfp4-46-max"
so task_2 artifact lookup path derived from QUANT_CFG matches; search for the
QUANT_CFG key and the literal export tag "super-nvfp4-max-calib" in the YAML
(lines around the existing QUANT_CFG, lines ~55, ~76-77, ~83) and update them
consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 922a3994-8d5b-4c0d-81b4-8d59a2509a7c
📒 Files selected for processing (14)
modelopt/torch/export/layer_utils.pymodelopt/torch/kernels/quantization/gemm/fp4_kernel.pymodelopt/torch/kernels/quantization/gemm/fp4_kernel_hopper.pymodelopt/torch/quantization/backends/nvfp4_gemm.pymodelopt/torch/quantization/config.pymodelopt/torch/quantization/nn/modules/tensor_quantizer.pymodelopt/torch/quantization/qtensor/nvfp4_tensor.pymodelopt/torch/quantization/tensor_quant.pymodelopt_recipes/configs/numerics/nvfp4_four_over_six.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_four_over_six.yamlmodelopt_recipes/configs/ptq/units/w4a4_nvfp4_nvfp4_four_over_six.yamlmodelopt_recipes/huggingface/nvidia/Nemotron-3-Ultra-550B-A55B/ptq/ultra-nvfp4-46-max.yamltests/unit/torch/quantization/test_nvfp4_four_over_six.pytools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1684 +/- ##
==========================================
+ Coverage 67.73% 76.04% +8.31%
==========================================
Files 511 511
Lines 56169 56636 +467
==========================================
+ Hits 38044 43068 +5024
+ Misses 18125 13568 -4557
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml (1)
21-21:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate the pipeline note to match the active quant config tag.
Line 21 still says
super-nvfp4-max-calibwhile tasks now usenvfp4-46-max. This creates doc/config drift for operators.As per coding guidelines, “Don't repeat yourself; keep a single source of truth.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml` at line 21, Update the pipeline note string to reflect the active quant config tag: replace "super-nvfp4-max-calib" with "nvfp4-46-max" in the note field (the quoted value on the line containing note: in the megatron_lm_ptq.yaml) so the documentation/config matches the actual tasks using nvfp4-46-max.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml`:
- Line 21: Update the pipeline note string to reflect the active quant config
tag: replace "super-nvfp4-max-calib" with "nvfp4-46-max" in the note field (the
quoted value on the line containing note: in the megatron_lm_ptq.yaml) so the
documentation/config matches the actual tasks using nvfp4-46-max.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8936915e-0ac5-43c0-a688-b25075b3ccf9
📒 Files selected for processing (6)
modelopt/torch/quantization/qtensor/nvfp4_tensor.pymodelopt_recipes/configs/numerics/nvfp4_four_over_six.yamlmodelopt_recipes/configs/ptq/presets/model/nvfp4_four_over_six.yamlmodelopt_recipes/configs/ptq/units/w4a4_nvfp4_nvfp4_four_over_six.yamlmodelopt_recipes/huggingface/models/nvidia/Nemotron-3-Ultra-550B-A55B/ptq/nvfp4-46-max.yamltools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yaml
💤 Files with no reviewable changes (1)
- modelopt_recipes/huggingface/models/nvidia/Nemotron-3-Ultra-550B-A55B/ptq/nvfp4-46-max.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
- modelopt_recipes/configs/numerics/nvfp4_four_over_six.yaml
- modelopt_recipes/configs/ptq/presets/model/nvfp4_four_over_six.yaml
- modelopt_recipes/configs/ptq/units/w4a4_nvfp4_nvfp4_four_over_six.yaml
- modelopt/torch/quantization/qtensor/nvfp4_tensor.py
|
/claude review |
Add the NVFP4_FOUR_OVER_SIX_CFG preset (scoped to max calibration) and implement 4/6 scale selection in NVFP4QTensor, normalizing the selected per-block scale with F8_E4M3_MAX_46. Wire the supporting changes through the fp4 kernels, nvfp4_gemm backend, tensor_quant, tensor_quantizer, config, and layer_utils. Add recipe/preset YAMLs and the Megatron PTQ launcher example, plus unit tests covering 4/6 quantization and config registration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
mypy defaulted to the running interpreter's version and failed to parse 3.10 match/case syntax (e.g. precisionconverter.py). Pin python_version to 3.10 so mypy parses modern syntax. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
c3f69af to
db5497e
Compare
|
/claude review |
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/unit/torch/quantization/test_mse_calibrator.py`:
- Line 805: Move the function-scoped import "import
modelopt.torch.quantization.nn.modules.tensor_quantizer as tqm" out of the test
helper and place it at the top of the module imports in
tests/unit/torch/quantization/test_mse_calibrator.py; update any references
inside the helper to use the top-level symbol `tqm`, and if you believe the
import must remain local due to a genuine circular or optional dependency, add a
short comment above it documenting that justification—otherwise keep it as a
standard module-level import to ensure import errors surface at test collection
time.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 930d2b15-6a15-40ad-8113-ec2383f013f5
📒 Files selected for processing (5)
tests/gpu/torch/quantization/test_nvfp4_fp8_sweep_kernel.pytests/unit/torch/quantization/test_config_validation.pytests/unit/torch/quantization/test_mse_calibrator.pytests/unit/torch/quantization/test_nvfp4_four_over_six.pytests/unit/torch/quantization/test_nvfp4_static_export_cpu.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/torch/quantization/test_nvfp4_four_over_six.py
| return q | ||
|
|
||
| def _captured_fp8_max(self, monkeypatch, four_over_six: bool) -> float: | ||
| import modelopt.torch.quantization.nn.modules.tensor_quantizer as tqm |
There was a problem hiding this comment.
Move the function-scoped import to module top.
Line 805 introduces an in-function import in a test helper without an explicit circular/optional dependency justification. This violates the test import convention and can hide import failures until runtime.
As per coding guidelines, imports in tests/**/*.py should be at module top unless there is a concrete, documented exception.
Suggested patch
@@
import torch
import modelopt.torch.quantization as mtq
+import modelopt.torch.quantization.nn.modules.tensor_quantizer as tqm
from modelopt.torch.quantization import calib
@@
def _captured_fp8_max(self, monkeypatch, four_over_six: bool) -> float:
- import modelopt.torch.quantization.nn.modules.tensor_quantizer as tqm
-
captured = {}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/torch/quantization/test_mse_calibrator.py` at line 805, Move the
function-scoped import "import
modelopt.torch.quantization.nn.modules.tensor_quantizer as tqm" out of the test
helper and place it at the top of the module imports in
tests/unit/torch/quantization/test_mse_calibrator.py; update any references
inside the helper to use the top-level symbol `tqm`, and if you believe the
import must remain local due to a genuine circular or optional dependency, add a
short comment above it documenting that justification—otherwise keep it as a
standard module-level import to ensure import errors surface at test collection
time.
Source: Coding guidelines
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
What does this PR do?
Four Over Six PTQ implementation for weight-only quantization. Four Over Six was used to produce the Nemotron 3 Ultra NVFP4 checkpoint. Also updates the Ultra PTQ example in the launcher to use this new 4/6 config
huggingface/nvidia/Nemotron-3-Ultra-550B-A55B/ptq/ultra-nvfp4-46-maxUsage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
New Features
Documentation
Tests
Chores