
support Qwen3.5 quantization #1230

Open
deepindeed2022 wants to merge 3 commits into NVIDIA:main from deepindeed2022:main

Conversation

deepindeed2022 commented Apr 10, 2026

What does this PR do?

Adds post-training quantization (PTQ) support for Qwen3.5 models.

Usage

```shell
# INT4-AWQ quantization
python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat int4_awq --export_path Qwen3.5-0.8B-AWQ

# FP8 quantization
python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat fp8 --export_path Qwen3.5-0.8B-FP8
```
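For reference, a minimal argparse sketch of the three flags used above. Only the flag names are taken from the commands themselves; the choices and everything else are assumptions for illustration, not the actual `hf_ptq.py` definitions.

```python
# Illustrative parser mirroring the flags in the usage commands above.
# The `choices` list is an assumption based on the formats this PR discusses.
import argparse

parser = argparse.ArgumentParser(description="PTQ export sketch (hypothetical)")
parser.add_argument("--pyt_ckpt_path", required=True, help="HF checkpoint directory")
parser.add_argument("--qformat", choices=["fp8", "int4_awq", "w4a8_awq", "int8_sq", "nvfp4"])
parser.add_argument("--export_path", required=True, help="output directory for the quantized model")

args = parser.parse_args(
    ["--pyt_ckpt_path", "./Qwen3.5-0.8B/", "--qformat", "fp8", "--export_path", "Qwen3.5-0.8B-FP8"]
)
print(args.qformat)  # fp8
```

The two commands differ only in `--qformat` and `--export_path`; the checkpoint path is reused for both runs.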

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added Qwen3.5 recognition and post‑training quantization support (fp8 ✅, int4_awq ✅, w4a8_awq ✖️, int8_sq ✖️, nvfp4 ✖️), including targeted handling for hybrid‑attention models.
  • Documentation

    • Updated LLM and VLM PTQ support matrices to include Qwen3.5 compatibility.
  • Tests

    • Added tests validating Qwen3.5 hybrid‑attention quantization and selective exclusions for specific projection layers.


copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Contributor

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Qwen3.5 support: docs updated, model-name mapping extended, SVDQuant fusion mapping extended, quantizer-exclusion rules added for narrow hybrid-attention projections, test utilities for a tiny Qwen3.5, and a new unit test validating PTQ behavior for Qwen3.5 hybrid-attention models.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`examples/llm_ptq/README.md`, `examples/vlm_ptq/README.md` | Added Qwen3.5 rows to PTQ support matrices indicating supported/unsupported quant modes. |
| **Quant config helpers**<br>`examples/llm_ptq/example_utils.py` | `build_quant_cfg()` now appends quantizer-exclusion entries for parameters matching `*in_proj_b*` and `*in_proj_a*` when `model_type == "qwen3_5"`. |
| **Model name → type mapping**<br>`modelopt/torch/export/model_utils.py` | Added `Qwen3_5Moe` → `qwen3_5moe` and `Qwen3_5` → `qwen3_5` entries in `MODEL_NAME_TO_TYPE`. |
| **Quantization fusion rules**<br>`modelopt/torch/export/quant_utils.py` | Extended the `_update_svdquant` fusion mapping so the `("up_proj", "down_proj")` linear-to-linear scale adjustment also applies to `Qwen3_5MLP`. |
| **Test utilities**<br>`tests/_test_utils/torch/transformers_models.py` | Added a guarded import for `Qwen3_5TextConfig` and `get_tiny_qwen3_5(**config_kwargs)` to build minimal Qwen3.5 models for tests (skips if transformers lacks the config). |
| **Unit tests**<br>`tests/unit/torch/quantization/plugins/test_huggingface.py` | Added `test_qwen3_5_hybrid_attention_quantize` (parameterized over FP8/INT4_AWQ) that quantizes a tiny Qwen3.5, runs calibration/inference, asserts logits, checks that hybrid-attention projections are quantized and that `in_proj_a`/`in_proj_b` modules are excluded. |
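To illustrate the exclusion mechanism described above, here is a minimal, self-contained sketch of how wildcard patterns like `*in_proj_a*` can disable quantizers by module name. This is not the actual modelopt API; the function name, dict layout, and `{"enable": False}` convention are assumptions for illustration.

```python
# Hypothetical sketch of wildcard-based quantizer exclusion.
from fnmatch import fnmatch

def add_qwen3_5_exclusions(quant_cfg: dict, model_type: str) -> dict:
    """Append exclusion entries for the narrow hybrid-attention projections."""
    if model_type == "qwen3_5":
        for pattern in ("*in_proj_b*", "*in_proj_a*"):
            quant_cfg.setdefault("quant_cfg", {})[pattern] = {"enable": False}
    return quant_cfg

def is_excluded(module_name: str, quant_cfg: dict) -> bool:
    """A module is excluded if any disabled pattern matches its name."""
    return any(
        fnmatch(module_name, pattern)
        for pattern, rule in quant_cfg.get("quant_cfg", {}).items()
        if rule.get("enable") is False
    )

cfg = add_qwen3_5_exclusions({}, "qwen3_5")
print(is_excluded("model.layers.0.linear_attn.in_proj_a", cfg))  # True
print(is_excluded("model.layers.0.self_attn.q_proj", cfg))       # False
```

Regular attention projections such as `self_attn.q_proj` match neither pattern and stay quantized, which matches the behavior the unit test asserts.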

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks (✅ 4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'support Qwen3.5 quantization' directly summarizes the main objective of the changeset: adding quantization support for the Qwen3.5 model across documentation, configuration, model utilities, and tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns introduced: no unsafe `torch.load()` or `numpy.load()`, `trust_remote_code` properly exposed and defaulting to `False`, no `eval()`/`exec()` on external input, no `nosec` comments, no new PIP dependencies. |


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 286-293: The test currently sets has_gdn_quantized /
has_attn_quantized based only on module name and presence of
module.weight_quantizer, which can give false positives when quantization is
disabled; update the loop over model.named_modules() to also check
module.weight_quantizer.is_enabled (or truthiness of that property) before
setting the flags and ensure the final assertions verify that the found modules
have weight_quantizer.is_enabled true (i.e., assert the quantizer is enabled for
"linear_attn.in_proj_qkv" and "self_attn.q_proj" modules).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1ad8b23-7859-407c-8231-f295b28c6ba4

📥 Commits

Reviewing files that changed from the base of the PR and between 3baa2da and 5acee3e.

📒 Files selected for processing (7)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/model_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/_test_utils/torch/transformers_models.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Contributor

coderabbitai bot left a comment
♻️ Duplicate comments (1)
tests/unit/torch/quantization/plugins/test_huggingface.py (1)

286-293: ⚠️ Potential issue | 🟡 Minor

Strengthen positive quantization assertions to avoid false positives.

has_gdn_quantized / has_attn_quantized are currently set by name match + quantizer presence, even if quantization is disabled. Gate these checks on module.weight_quantizer.is_enabled.

Proposed fix:

```diff
     for name, module in model.named_modules():
         if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
-            if "linear_attn.in_proj_qkv" in name:
+            if "linear_attn.in_proj_qkv" in name and module.weight_quantizer.is_enabled:
                 has_gdn_quantized = True
-            if "self_attn.q_proj" in name:
+            if "self_attn.q_proj" in name and module.weight_quantizer.is_enabled:
                 has_attn_quantized = True
```

```shell
#!/bin/bash
# Verify positive assertions currently don't require enabled quantizers.
rg -n -C2 'linear_attn\.in_proj_qkv|self_attn\.q_proj|is_enabled' tests/unit/torch/quantization/plugins/test_huggingface.py
```

Expected: the positive checks for linear_attn.in_proj_qkv and self_attn.q_proj should include is_enabled in the same condition.
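To make the review point concrete, a self-contained sketch with mock modules showing how gating on `is_enabled` prevents the false positive. `FakeQuantLinear` and the hand-built module dict are hypothetical stand-ins, not modelopt classes; real code would iterate a quantized model's `named_modules()`.

```python
# Mock of a quantized linear layer: a quantizer can be attached yet disabled.
from types import SimpleNamespace

class FakeQuantLinear:
    def __init__(self, enabled: bool):
        self.weight = object()
        self.weight_quantizer = SimpleNamespace(is_enabled=enabled)

modules = {
    "layers.0.linear_attn.in_proj_qkv": FakeQuantLinear(enabled=True),
    # Quantizer present but disabled: name-only matching would flag this.
    "layers.0.self_attn.q_proj": FakeQuantLinear(enabled=False),
}

has_gdn_quantized = False
has_attn_quantized = False
for name, module in modules.items():
    if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
        # Gate on is_enabled so a present-but-disabled quantizer is not counted.
        if "linear_attn.in_proj_qkv" in name and module.weight_quantizer.is_enabled:
            has_gdn_quantized = True
        if "self_attn.q_proj" in name and module.weight_quantizer.is_enabled:
            has_attn_quantized = True

print(has_gdn_quantized, has_attn_quantized)  # True False
```

Without the `is_enabled` check, both flags would be set and the disabled `q_proj` quantizer would slip past the assertion.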

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/quantization/plugins/test_huggingface.py` around lines 286 -
293, The current positive assertions set has_gdn_quantized/has_attn_quantized
based only on name and presence of module.weight_quantizer, which can produce
false positives when quantization is disabled; update the loop over
model.named_modules() so that when you detect the target names
("linear_attn.in_proj_qkv" and "self_attn.q_proj") you also check
module.weight_quantizer.is_enabled before setting has_gdn_quantized or
has_attn_quantized, i.e., require hasattr(module, "weight_quantizer") and
hasattr(module, "weight") and module.weight_quantizer.is_enabled when assigning
those flags, then keep the existing assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99e37517-c133-410c-86b9-b5afc49ed662

📥 Commits

Reviewing files that changed from the base of the PR and between 95b3487 and f660a93.

📒 Files selected for processing (7)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/model_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/_test_utils/torch/transformers_models.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py
✅ Files skipped from review due to trivial changes (1)
  • modelopt/torch/export/model_utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/quant_utils.py
  • examples/llm_ptq/example_utils.py
