
support Qwen3.5 quantization #1230

Open
deepindeed2022 wants to merge 3 commits into NVIDIA:main from deepindeed2022:main

Conversation

deepindeed2022 commented Apr 10, 2026

What does this PR do?

Adds post-training quantization (PTQ) support for Qwen3.5 models.

Usage

```shell
# INT4-AWQ quantization
python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat int4_awq --export_path Qwen3.5-0.8B-AWQ

# FP8 quantization
python3 examples/llm_ptq/hf_ptq.py --pyt_ckpt_path ./Qwen3.5-0.8B/ --qformat fp8 --export_path Qwen3.5-0.8B-FP8
```
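For reference, a minimal argparse sketch of the three flags used above. Only the flag names are taken from the commands themselves; the choices and everything else are assumptions for illustration, not the actual `hf_ptq.py` definitions.

```python
# Illustrative parser mirroring the flags in the usage commands above.
# The `choices` list is an assumption based on the formats this PR discusses.
import argparse

parser = argparse.ArgumentParser(description="PTQ export sketch (hypothetical)")
parser.add_argument("--pyt_ckpt_path", required=True, help="HF checkpoint directory")
parser.add_argument("--qformat", choices=["fp8", "int4_awq", "w4a8_awq", "int8_sq", "nvfp4"])
parser.add_argument("--export_path", required=True, help="output directory for the quantized model")

args = parser.parse_args(
    ["--pyt_ckpt_path", "./Qwen3.5-0.8B/", "--qformat", "fp8", "--export_path", "Qwen3.5-0.8B-FP8"]
)
print(args.qformat)  # fp8
```

The two commands differ only in `--qformat` and `--export_path`; the checkpoint path is reused for both runs.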

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added Qwen3.5 recognition and post‑training quantization support (fp8 ✅, int4_awq ✅, w4a8_awq ✖️, int8_sq ✖️, nvfp4 ✖️), including targeted handling for hybrid‑attention models.
  • Documentation

    • Updated LLM and VLM PTQ support matrices to include Qwen3.5 compatibility.
  • Tests

    • Added tests validating Qwen3.5 hybrid‑attention quantization and selective exclusions for specific projection layers.


copy-pr-bot bot commented Apr 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Contributor

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Qwen3.5 support: docs updated, model-name mapping extended, SVDQuant fusion mapping extended, quantizer-exclusion rules added for narrow hybrid-attention projections, test utilities for a tiny Qwen3.5, and a new unit test validating PTQ behavior for Qwen3.5 hybrid-attention models.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`examples/llm_ptq/README.md`, `examples/vlm_ptq/README.md` | Added Qwen3.5 rows to PTQ support matrices indicating supported/unsupported quant modes. |
| **Quant config helpers**<br>`examples/llm_ptq/example_utils.py` | `build_quant_cfg()` now appends quantizer-exclusion entries for parameters matching `*in_proj_b*` and `*in_proj_a*` when `model_type == "qwen3_5"`. |
| **Model name → type mapping**<br>`modelopt/torch/export/model_utils.py` | Added `Qwen3_5Moe` → `qwen3_5moe` and `Qwen3_5` → `qwen3_5` entries in `MODEL_NAME_TO_TYPE`. |
| **Quantization fusion rules**<br>`modelopt/torch/export/quant_utils.py` | Extended the `_update_svdquant` fusion mapping so the `("up_proj", "down_proj")` linear-to-linear scale adjustment also applies to `Qwen3_5MLP`. |
| **Test utilities**<br>`tests/_test_utils/torch/transformers_models.py` | Added a guarded import for `Qwen3_5TextConfig` and `get_tiny_qwen3_5(**config_kwargs)` to build minimal Qwen3.5 models for tests (skips if transformers lacks the config). |
| **Unit tests**<br>`tests/unit/torch/quantization/plugins/test_huggingface.py` | Added `test_qwen3_5_hybrid_attention_quantize` (parameterized over FP8/INT4_AWQ) that quantizes a tiny Qwen3.5, runs calibration/inference, asserts logits, checks that hybrid-attention projections are quantized and that `in_proj_a`/`in_proj_b` modules are excluded. |
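To illustrate the exclusion mechanism described above, here is a minimal, self-contained sketch of how wildcard patterns like `*in_proj_a*` can disable quantizers by module name. This is not the actual modelopt API; the function name, dict layout, and `{"enable": False}` convention are assumptions for illustration.

```python
# Hypothetical sketch of wildcard-based quantizer exclusion.
from fnmatch import fnmatch

def add_qwen3_5_exclusions(quant_cfg: dict, model_type: str) -> dict:
    """Append exclusion entries for the narrow hybrid-attention projections."""
    if model_type == "qwen3_5":
        for pattern in ("*in_proj_b*", "*in_proj_a*"):
            quant_cfg.setdefault("quant_cfg", {})[pattern] = {"enable": False}
    return quant_cfg

def is_excluded(module_name: str, quant_cfg: dict) -> bool:
    """A module is excluded if any disabled pattern matches its name."""
    return any(
        fnmatch(module_name, pattern)
        for pattern, rule in quant_cfg.get("quant_cfg", {}).items()
        if rule.get("enable") is False
    )

cfg = add_qwen3_5_exclusions({}, "qwen3_5")
print(is_excluded("model.layers.0.linear_attn.in_proj_a", cfg))  # True
print(is_excluded("model.layers.0.self_attn.q_proj", cfg))       # False
```

Regular attention projections such as `self_attn.q_proj` match neither pattern and stay quantized, which matches the behavior the unit test asserts.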

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks (✅ 4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title 'support Qwen3.5 quantization' directly summarizes the main objective of the changeset: adding quantization support for the Qwen3.5 model across documentation, configuration, model utilities, and tests. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns introduced: no unsafe `torch.load()` or `numpy.load()`, `trust_remote_code` properly exposed and defaulting to `False`, no `eval()`/`exec()` on external input, no `nosec` comments, no new PIP dependencies. |


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

coderabbitai bot left a comment
Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 286-293: The test currently sets has_gdn_quantized /
has_attn_quantized based only on module name and presence of
module.weight_quantizer, which can give false positives when quantization is
disabled; update the loop over model.named_modules() to also check
module.weight_quantizer.is_enabled (or truthiness of that property) before
setting the flags and ensure the final assertions verify that the found modules
have weight_quantizer.is_enabled true (i.e., assert the quantizer is enabled for
"linear_attn.in_proj_qkv" and "self_attn.q_proj" modules).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b1ad8b23-7859-407c-8231-f295b28c6ba4

📥 Commits

Reviewing files that changed from the base of the PR and between 3baa2da and 5acee3e.

📒 Files selected for processing (7)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/model_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/_test_utils/torch/transformers_models.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Contributor

coderabbitai bot left a comment
♻️ Duplicate comments (1)
tests/unit/torch/quantization/plugins/test_huggingface.py (1)

286-293: ⚠️ Potential issue | 🟡 Minor

Strengthen positive quantization assertions to avoid false positives.

has_gdn_quantized / has_attn_quantized are currently set by name match + quantizer presence, even if quantization is disabled. Gate these checks on module.weight_quantizer.is_enabled.

Proposed fix:

```diff
     for name, module in model.named_modules():
         if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
-            if "linear_attn.in_proj_qkv" in name:
+            if "linear_attn.in_proj_qkv" in name and module.weight_quantizer.is_enabled:
                 has_gdn_quantized = True
-            if "self_attn.q_proj" in name:
+            if "self_attn.q_proj" in name and module.weight_quantizer.is_enabled:
                 has_attn_quantized = True
```

```shell
#!/bin/bash
# Verify positive assertions currently don't require enabled quantizers.
rg -n -C2 'linear_attn\.in_proj_qkv|self_attn\.q_proj|is_enabled' tests/unit/torch/quantization/plugins/test_huggingface.py
```

Expected: the positive checks for linear_attn.in_proj_qkv and self_attn.q_proj should include is_enabled in the same condition.
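To make the review point concrete, a self-contained sketch with mock modules showing how gating on `is_enabled` prevents the false positive. `FakeQuantLinear` and the hand-built module dict are hypothetical stand-ins, not modelopt classes; real code would iterate a quantized model's `named_modules()`.

```python
# Mock of a quantized linear layer: a quantizer can be attached yet disabled.
from types import SimpleNamespace

class FakeQuantLinear:
    def __init__(self, enabled: bool):
        self.weight = object()
        self.weight_quantizer = SimpleNamespace(is_enabled=enabled)

modules = {
    "layers.0.linear_attn.in_proj_qkv": FakeQuantLinear(enabled=True),
    # Quantizer present but disabled: name-only matching would flag this.
    "layers.0.self_attn.q_proj": FakeQuantLinear(enabled=False),
}

has_gdn_quantized = False
has_attn_quantized = False
for name, module in modules.items():
    if hasattr(module, "weight_quantizer") and hasattr(module, "weight"):
        # Gate on is_enabled so a present-but-disabled quantizer is not counted.
        if "linear_attn.in_proj_qkv" in name and module.weight_quantizer.is_enabled:
            has_gdn_quantized = True
        if "self_attn.q_proj" in name and module.weight_quantizer.is_enabled:
            has_attn_quantized = True

print(has_gdn_quantized, has_attn_quantized)  # True False
```

Without the `is_enabled` check, both flags would be set and the disabled `q_proj` quantizer would slip past the assertion.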

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/quantization/plugins/test_huggingface.py` around lines 286 -
293, The current positive assertions set has_gdn_quantized/has_attn_quantized
based only on name and presence of module.weight_quantizer, which can produce
false positives when quantization is disabled; update the loop over
model.named_modules() so that when you detect the target names
("linear_attn.in_proj_qkv" and "self_attn.q_proj") you also check
module.weight_quantizer.is_enabled before setting has_gdn_quantized or
has_attn_quantized, i.e., require hasattr(module, "weight_quantizer") and
hasattr(module, "weight") and module.weight_quantizer.is_enabled when assigning
those flags, then keep the existing assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 99e37517-c133-410c-86b9-b5afc49ed662

📥 Commits

Reviewing files that changed from the base of the PR and between 95b3487 and f660a93.

📒 Files selected for processing (7)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/model_utils.py
  • modelopt/torch/export/quant_utils.py
  • tests/_test_utils/torch/transformers_models.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py
✅ Files skipped from review due to trivial changes (1)
  • modelopt/torch/export/model_utils.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/vlm_ptq/README.md
  • modelopt/torch/export/quant_utils.py
  • examples/llm_ptq/example_utils.py
