Conversation

Fridah-nv (Contributor) commented Nov 27, 2025

What does this PR do?

Type of change: new feature

Overview:
Support static block-wise MSE for NVFP4 weight quantization.
Add an FP4 Triton kernel that takes per-block scales as input; it also quantizes the scales to FP8.
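
A minimal sketch of the scale round trip described above, assuming NVFP4's E4M3 per-block scales with an FP32 global scale; the function name is illustrative, not the modelopt API, and `torch.float8_e4m3fn` requires PyTorch >= 2.1:

```python
import torch

def quantize_scales_to_fp8(block_scales: torch.Tensor, global_scale: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: round-trip per-block scales through FP8 (E4M3)."""
    # Normalize by the global scale, cast to E4M3, and restore, so the
    # per-block scale the kernel consumes is exactly FP8-representable.
    fp8 = (block_scales / global_scale).to(torch.float8_e4m3fn)
    return fp8.to(torch.float32) * global_scale
```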

This PR does the following (a rough sketch of the block-scale search follows this list):

1. Enable a static NVFP4 implementation, i.e., block scales for weights are calculated during calibration and fed into the fake-quant kernels
2. Extend mse_calibrate to support static NVFP4, searching block scales by MSE with the global scale set to MAX
3. Refinement: calibrate weight quantizers only once during MSE calibration
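
For intuition, here is a plain-PyTorch sketch of what the block-scale MSE search in item 2 amounts to, assuming 16-element blocks along the last dimension and the multiplier sweep from the configs below. All names here are illustrative, not the mse_calibrate implementation, and `weight.numel()` is assumed divisible by the block size:

```python
import torch

# Representable E2M1 (FP4) magnitudes.
_E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def _fake_quant_e2m1(blocks: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Scale into FP4 range, snap each element to the nearest E2M1 value
    # (ties resolve to the smaller magnitude), then scale back.
    vals = _E2M1.to(blocks.device)
    s = blocks / scale
    idx = (s.abs().unsqueeze(-1) - vals).abs().argmin(dim=-1)
    return torch.sign(s) * vals[idx] * scale

def mse_search_block_scales(weight, block_size=16, start=0.25, stop=2.0, step=0.25):
    blocks = weight.reshape(-1, block_size)
    amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8)
    best_scale = amax / 6.0  # amax-based (MAX) baseline scale
    best_err = torch.full_like(amax, float("inf"))
    # Sweep candidate multipliers of the amax-based scale and keep, per block,
    # the one minimizing quantization MSE.
    for m in torch.arange(start, stop + 1e-6, step):
        scale = (amax * m / 6.0).clamp(min=1e-8)
        err = (_fake_quant_e2m1(blocks, scale) - blocks).pow(2).sum(-1, keepdim=True)
        better = err < best_err
        best_err = torch.where(better, err, best_err)
        best_scale = torch.where(better, scale, best_scale)
    return best_scale  # one scale per block; per this PR, these are then stored in FP8
```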

Usage

Example config:

```python
NVFP4_WEIGHT_MSE_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "static", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        "*input_quantizer": {
            "enable": False,
        },
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": {
        "method": "mse",
        "step_size": 0.25,
        "start_multiplier": 0.25,
        "stop_multiplier": 2.0,
    },
}

NVFP4_WEIGHT_ACT_MSE_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "static", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        "*input_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": {
        "method": "mse",
        "step_size": 0.25,
        "start_multiplier": 0.25,
        "stop_multiplier": 2.0,
    },
}
```
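
One way to apply these configs end to end, sketched with modelopt's standard quantize entry point; `calib_dataloader` is a placeholder for your calibration data:

```python
import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Feed a few calibration batches; the MSE search for static weight
    # block scales runs during this calibration pass.
    for batch in calib_dataloader:
        model(batch)

model = mtq.quantize(model, NVFP4_WEIGHT_MSE_CFG, forward_loop)
```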

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Fridah-nv self-assigned this Nov 27, 2025
copy-pr-bot (bot) commented Nov 27, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


codecov (bot) commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 87.03704% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.64%. Comparing base (f06c3f9) to head (76bd9b9).
⚠️ Report is 101 commits behind head on main.

Files with missing lines Patch % Lines
.../torch/quantization/nn/modules/tensor_quantizer.py 50.00% 3 Missing ⚠️
modelopt/torch/quantization/tensor_quant.py 70.00% 3 Missing ⚠️
modelopt/torch/quantization/model_calib.py 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #613      +/-   ##
==========================================
+ Coverage   74.57%   74.64%   +0.07%     
==========================================
  Files         183      192       +9     
  Lines       18412    19027     +615     
==========================================
+ Hits        13730    14202     +472     
- Misses       4682     4825     +143     


…nce; quant scale to FP8; rename static kernel

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Fridah-nv marked this pull request as ready for review January 6, 2026 21:17
Fridah-nv requested a review from a team as a code owner January 6, 2026 21:17
Fridah-nv requested a review from kaix-nv January 6, 2026 21:17
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Fridah-nv changed the title from "draft: Add per block MSE for NVFP4 and INT4" to "Add per block MSE for NVFP4 and INT4" Jan 6, 2026
Fridah-nv changed the title from "Add per block MSE for NVFP4 and INT4" to "Add per block MSE for NVFP4" Jan 7, 2026
Fridah-nv changed the title from "Add per block MSE for NVFP4" to "Add static per block MSE for NVFP4 weight" Jan 7, 2026
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
realAsma (Contributor) left a comment:

Can we add step_size as an mse_calib argument as well?

…nel launch func

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
…calibrate

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
realAsma (Contributor) left a comment:

Looks great!!

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Fridah-nv merged commit 18d9b1e into main Jan 13, 2026
27 checks passed
Fridah-nv deleted the fridah/block-mse branch January 13, 2026 22:44
jingyu-ml pushed a commit that referenced this pull request Jan 14, 2026
