Skip to content

Fix gemma w4a8_awq recipe crashing export on multimodal checkpoints (NVBug 6294017)#1690

Open
Edwardf0t1 wants to merge 3 commits into
mainfrom
fix-gemma4-w4a8-awq-vision-exclude
Open

Fix gemma w4a8_awq recipe crashing export on multimodal checkpoints (NVBug 6294017)#1690
Edwardf0t1 wants to merge 3 commits into
mainfrom
fix-gemma4-w4a8-awq-vision-exclude

Conversation

@Edwardf0t1

@Edwardf0t1 Edwardf0t1 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Type of change: Bug fix

Fixes the huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast recipe crashing at export on multimodal Gemma checkpoints (e.g. gemma-4-31B-it).

The recipe enables quantization with bare *weight_quantizer / *input_quantizer wildcards. On VLM Gemma these also match the SigLIP vision tower (model.vision_tower.*) and the vision embedding projection (model.embed_vision.*). The vision tower's MLP in-features (4304) are not a multiple of the INT4 block size (128), so INT4 weight packing at export hits a device-side index out of bounds assert in pack_int4_in_uint8 (quant_utils.py). Quantizing the vision branch is also accuracy-harmful (post-PTQ generation in the bug log is degenerate).

The fix appends *vision_tower* / *embed_vision* disable rules after the enables (later entries win), keeping the vision branch in BF16. This mirrors vision exclusions already shipping in the qwen3_5, nemotron_vl, and phi4mm recipes.

Usage

python examples/llm_ptq/hf_ptq.py --model <gemma-4-31B-it> --dataset cnn_dailymail \
  --recipe modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml \
  --batch_size 8 --calib_size 512 --export_path <out> --trust_remote_code

Testing

Reproduced the exact NVBug 6294017 scenario on the real gemma-4-31B-it checkpoint, with the repo's modelopt (0.45.0.dev159, this branch) and the requested container nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc18. Ran on RTX 6000 Ada (sm_89); the crash is a structural tensor-indexing bounds error in pack_int4_in_uint8 at export, which is architecture-independent, so it reproduces/verifies identically (a Blackwell run can be added if desired). Verified at both calib_size 4 (smoke) and calib_size 512 (full); identical structural results.

python hf_ptq.py --dataset cnn_dailymail --model gemma-4-31B-it \
  --recipe modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml \
  --batch_size 4 --calib_size 512 --export_path <out> --trust_remote_code
Check Before (bug) After (this PR)
Export / pack_int4_in_uint8 index out of bounds device-side assert ✅ completes, exit 0, 0 crash signatures
hf_quant_config.exclude_modules n/a (crashed) lm_head, model.embed_vision*, model.vision_tower*
Vision-branch quantized weights (crashed on intermediate_size=4304) 0 / 353 scaled → kept BF16
Language-model quantized weights n/a 410 / 772 scaled → W4A8_AWQ, group 128, KV FP8
Exported checkpoint none ✅ ~18.3 GB (vs 62 GB BF16 source)

Root cause confirmed in-environment: model_type=gemma4, vision_config.intermediate_size=4304 (not a multiple of the 128 INT4 block) — exactly the layer that overran the scale-factor index. This verifies the crash fix and the layer-precision split; W4A8 accuracy is a separate matter (not assessed here).

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ (text-only Gemma checkpoints are unaffected; the new rules only match vision modules that should never have been quantized)
  • If you copied code from any other sources or added a new PIP dependency: N/A
  • Did you write any new necessary tests?: N/A (recipe data fix; verified by a real PTQ export run)
  • Did you update Changelog?: N/A
  • Did you get Claude approval on this PR?: ❌ (pending)

Additional Information

NVBug 6294017. Reported on modelopt 0.45.0rc0, GB200, gemma-4-31B-it.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Added README documenting Gemma 4 post-training quantization recipes and usage guidance.
  • New Features

    • New Gemma 4 PTQ recipe: W4A8 weight quantization with FP8 KV-cache casting and specialized handling to keep vision components in BF16.
  • Bug Fixes

    • Updated default quantizer rules to preserve the multimodal vision branch by default, preventing INT4 packing index-out-of-bounds issues.

@Edwardf0t1 Edwardf0t1 requested a review from a team as a code owner June 11, 2026 20:51
@Edwardf0t1 Edwardf0t1 requested a review from cjluo-nv June 11, 2026 20:51
@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.09%. Comparing base (60b1af5) to head (0cf494b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1690   +/-   ##
=======================================
  Coverage   77.09%   77.09%           
=======================================
  Files         511      511           
  Lines       56173    56173           
=======================================
  Hits        43307    43307           
  Misses      12866    12866           
Flag Coverage Δ
unit 54.34% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a Gemma 4 PTQ recipe (W4A8, AWQ-lite) with FP8 KV-cache casting and updates default disabled-quantizers to keep multimodal vision/embedding components unquantized; includes README documenting numerics and AWQ search differences.

Changes

Gemma 4 W4A8 PTQ Recipe and Documentation

Layer / File(s) Summary
Multimodal vision quantizer defaults
modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml
Disables output.*, documents multimodal vision preservation, and keeps *embed_vision*, *vision_tower*, and *visual* quantizers unquantized by default.
Gemma4 W4A8 PTQ recipe and docs
modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml, modelopt_recipes/huggingface/gemma4/ptq/README.md
Adds PTQ recipe using awq_lite W4A8 with INT4 block packing, FP8 weight/input numerics, a kv_fp8_cast unit using constant amax (no KV calibration), and documents the recipe and vision-component exclusions.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1687: Overlapping changes to default_disabled_quantizers.yaml to keep multimodal vision quantizers disabled.
  • NVIDIA/Model-Optimizer#1691: Also modifies disable patterns for *embed_vision*, *vision_tower*, and *visual* to preserve vision/embedding BF16 behavior.

Suggested reviewers

  • kevalmorabia97
  • juhi10071998
  • sychen52
  • shengliangxu
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main change: fixing a crash in the gemma w4a8_awq recipe when exporting multimodal checkpoints by excluding vision modules from quantization.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed Git diff shows only YAML/README files (no .py / dependency changes). Scanned changed contents for torch.load/allow_pickle/trust_remote_code/eval/exec/#nosec patterns—none found.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-gemma4-w4a8-awq-vision-exclude

Comment @coderabbitai help to get the list of available commands and usage tips.

@Edwardf0t1

Copy link
Copy Markdown
Contributor Author

✅ Verified on real hardware

Reproduced the exact NVBug 6294017 scenario and confirmed the fix, using the real gemma-4-31B-it checkpoint, the repo's modelopt (0.45.0.dev159, this branch), and the requested container nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc18.

Environment note: run on RTX 6000 Ada (sm_89), not Blackwell. The bug is a structural tensor-indexing bounds error in pack_int4_in_uint8 at export, which is architecture-independent, so it reproduces/verifies identically. Container ships transformers 5.5.4 (gemma4 loads without a pin).

Command (matches the bug report)

python hf_ptq.py --dataset cnn_dailymail --model gemma-4-31B-it \
  --recipe modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml \
  --batch_size 4 --calib_size 512 --export_path <out> --trust_remote_code

Results (full calib_size 512 run)

Check Before (bug) After (this PR)
Export / pack_int4_in_uint8 index out of bounds device-side assert ✅ completes, exit 0, 0 crash signatures
hf_quant_config.exclude_modules n/a (crashed) lm_head, model.embed_vision*, model.vision_tower*
Vision-branch quantized weights (would crash on intermediate_size=4304) 0 / 353 scaled → kept BF16
Language-model quantized weights n/a 410 / 772 scaled → W4A8_AWQ, group 128, KV FP8
Exported checkpoint none ✅ ~18.3 GB (vs 62 GB BF16 source)

Root cause confirmed in-environment: model_type=gemma4, vision_config.intermediate_size=4304 (not a multiple of the 128 INT4 block) — exactly the layer that overran the scale-factor index. A smoke run (calib_size 4) gave identical structural results.

Note: this verifies the crash fix and the layer-precision split. Accuracy of the W4A8 checkpoint is a separate matter (would need a real eval, e.g. MMLU/MMMU), not assessed here.

# Gemma-specific W4A8 AWQ PTQ recipe with FP8 KV-cache cast. Uses a coarser
# optimal-scale search (awq_lite with alpha_step=1) to avoid overflow observed
# in TRT-LLM kernels when using the default AWQ search on Gemma.
#

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gemma4 model type is not the same thing as gemma

please add new recipe under gemma4, you can check it at:

https://huggingface.co/google/gemma-4-31B-it/blob/main/config.json

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Small (+19/-0) additive recipe data fix for w4a8_awq-kv_fp8_cast.yaml: appends *vision_tower* / *embed_vision* enable: false rules after the bare *weight_quantizer/*input_quantizer enables and default_disabled_quantizers, keeping the SigLIP vision branch in BF16 to avoid the INT4 pack_int4_in_uint8 index-out-of-bounds crash on multimodal Gemma (NVBug 6294017).

What checks out:

  • Mirrors the established in-repo pattern exactly — qwen3_5 (w4a16_nvfp4-fp8_attn-kv_fp8_cast.quant_cfg.yaml), nemotron_vl, and phi4mm recipes all use trailing enable: false vision exclusions placed after default_disabled_quantizers, relying on the same "later entries win" override semantics.
  • The disable globs live in this single recipe (not the shared default_disabled_quantizers unit), so blast radius is limited to this recipe only — no risk of disabling vision quant in other presets.
  • License header unchanged (standard Apache-2.0/NVIDIA); no licensing concern. No prompt-injection in the untrusted content.

Why a human glance:

  • No automated test added. Sibling additive recipe variants are normally covered by a test_loader.py smoke test; consider adding one so the new rules' resolution is verified in CI.
  • Verified only by a manual export run on RTX 6000 Ada (sm_89), not the GB200 where the bug was originally reported. The author notes the crash is a structural tensor-indexing error and thus arch-independent, which is plausible, but a maintainer may want to confirm.

Note on the Slack instruction: only one PR is in scope here and no specific handle was provided, so I can't action a cross-PR tag from this review.

@Edwardf0t1 Edwardf0t1 added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 12, 2026

@cjluo-nv cjluo-nv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bot review — DM the bot to share feedback.

Re-review of the Gemma 4 vision-exclusion PTQ recipe fix (NVBug 6294017). Small additive change (+85/-0): a new huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml recipe + README. The recipe enables W4A8 AWQ with FP8 KV-cache cast and appends trailing *vision_tower* / *embed_vision* enable: false rules after the bare *weight_quantizer/*input_quantizer enables and default_disabled_quantizers, relying on the established "later wins" override semantics to keep the SigLIP vision branch in BF16 and avoid the INT4 pack_int4_in_uint8 index-out-of-bounds crash (vision MLP in-features 4304 not a multiple of the 128 block). Mirrors the qwen3_5 / nemotron_vl / phi4mm sibling recipes exactly; globs are recipe-local so blast radius is limited. License header matches the canonical LICENSE_HEADER; no licensing concern. No prompt injection in the untrusted content.

Previous-review status:

  • 💬 shengliangxu flagged that gemma4 is a distinct model type and asked for a new recipe under gemma4/ rather than editing gemma/. Author replied "done" — confirmed: the recipe now lives in the new gemma4/ptq/ directory and the original gemma/ptq/w4a8_awq-kv_fp8_cast.yaml is unchanged. This critical/correctness concern is fully resolved.

Why a human glance:

  • 💬 Author verified the exact NVBug scenario on the real gemma-4-31B-it checkpoint but only on RTX 6000 Ada (sm_89), not the GB200 where the bug was reported — argues the crash is a structural arch-independent tensor-indexing error, which is plausible but unconfirmed on the target arch.
  • No automated smoke test added. This matches the existing precedent (only general/ptq/* recipes are parametrized in tests/unit/recipe/test_loader.py; the sibling vision-exclusion recipes are likewise uncovered), so it's not a regression — but a maintainer with architectural context should sign off on the recipe data and the cross-arch claim.

Edwardf0t1 added a commit that referenced this pull request Jun 12, 2026
…6293731, 6293762) (#1691)

### What does this PR do?

Type of change: Bug fix

Fixes two sglang deployment failures on multimodal Gemma
(`gemma-4-31B-it`) caused by general PTQ presets leaking quantization
into the SigLIP vision branch via broad wildcards:

- **NVBug 6293731** — `general/ptq/fp8_default-kv_fp8`: the
`w8a8_fp8_fp8` unit enables bare `*weight_quantizer` /
`*input_quantizer`, which also match the vision tower
(`model.vision_tower.*`, `model.visual.*`) and the vision embedding
projection (`model.embed_vision.*`). The exported checkpoint deploys but
emits **garbled text** in sglang.
- **NVBug 6293762** — `general/ptq/nvfp4_mlp_only-kv_fp8`: the `*mlp*`
enables also match the vision tower's block MLPs
(`model.vision_tower.encoder.layers.*.mlp`), and an image request
**crashes** the FP4 kernel at decode: `ValueError: too many values to
unpack (expected 2)` in sglang's `modelopt_quant.py` `apply`.

### Fix

Add `*embed_vision*` / `*vision_tower*` / `*visual*` disable rules to
the shared `configs/ptq/units/default_disabled_quantizers` unit,
alongside the existing `*router*` / `*lm_head*` entries.

Both the composed `general/ptq/*` recipes **and** the
`configs/ptq/presets/model/*` presets import this unit, so:
- every general recipe (`fp8_default`, `nvfp4_default`,
`nvfp4_mlp_only`, `nvfp4_omlp_only`, …) keeps the vision branch in BF16
by default — fixing the whole vision-overreach class, not just the two
reported recipes;
- the `test_general_ptq_yaml_matches_config_dicts` YAML↔preset parity
test stays satisfied (both sides pick up the new entries from the one
shared unit).

The rules are **no-ops on text-only models** (nothing matches). A recipe
that intentionally wants to quantize the vision branch can re-enable
these after importing the unit.

Files changed:
- `modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml`
(+14)

### Testing

Re-export of `gemma-4-31B-it` with the affected recipes and re-deploy in
sglang (the env from the bug reports: `lmsysorg/sglang:v0.5.12.post1`,
GB200) to confirm fp8_default no longer garbles text and nvfp4_mlp_only
no longer crashes on image requests. _(Results to be appended.)_
Unit-level:
`tests/unit/recipe/test_loader.py::test_general_ptq_yaml_matches_config_dicts`
(parity) passes for all four general presets.

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅ (text-only checkpoints
unaffected; new rules only match vision modules that should never have
been quantized by a general recipe)
- If you copied code from any other sources or added a new PIP
dependency: N/A
- Did you write any new necessary tests?: N/A (recipe data fix; covered
by the existing parity test + verified by real PTQ export + sglang
deploy)
- Did you update Changelog?: N/A
- Did you get Claude approval on this PR?: ❌ (pending)

### Additional Information

NVBug 6293731 and 6293762. Reported on modelopt 0.45.0rc0, GB200,
gemma-4-31B-it, sglang 0.5.12.post1. Tracked under OMNIML-5034.
Companion to PR #1690 (same vision-overreach class on the gemma-specific
`w4a8_awq` recipe, NVBug 6294017).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated quantization configuration to preserve BF16 precision for
vision encoder components in multimodal models.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Edwardf0t1 and others added 3 commits June 12, 2026 10:26
…NVBug 6294017)

The gemma `w4a8_awq-kv_fp8_cast` recipe enables quantization with bare
`*weight_quantizer` / `*input_quantizer` wildcards. On multimodal Gemma
checkpoints (e.g. gemma-4-31B-it) these also match the SigLIP vision tower
(`model.vision_tower.*`) and the vision embedding projection
(`model.embed_vision.*`). The vision tower's MLP in-features (4304) are not a
multiple of the INT4 block size (128), so INT4 weight packing at export hits a
device-side `index out of bounds` assert in `pack_int4_in_uint8`. Quantizing
the vision branch is also accuracy-harmful.

Add trailing `*vision_tower*` / `*embed_vision*` disable rules (placed after the
enables so the disable wins), keeping the vision branch in BF16. Mirrors the
vision exclusions already shipped in the qwen3_5 / nemotron_vl / phi4mm recipes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Per review (shengliangxu): gemma-4-31B-it is model_type=gemma4, a distinct
multimodal architecture from the text-only `gemma` model type. The vision-branch
exclusion belongs in a gemma4-specific recipe, not the gemma one (which is
text-only and has no vision tower).

- Revert modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml to its
  original (text-only) form.
- Add modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml with the
  same awq_lite alpha_step=1 numerics plus `*vision_tower*` / `*embed_vision*`
  enable:false rules to keep the SigLIP vision branch in BF16 (fixes NVBug
  6294017: INT4 pack_int4_in_uint8 index-out-of-bounds on the vision MLP whose
  in-features (4304) are not a multiple of the 128 block size).
- Add modelopt_recipes/huggingface/gemma4/ptq/README.md.

Verified: load_recipe resolves the new recipe (<builtin>/huggingface/gemma4/...)
with the vision excludes present; the quantize block is identical to the
previously hardware-verified fix (full calib_size 512 export on gemma-4-31B-it:
no index-out-of-bounds, exclude_modules = lm_head, model.embed_vision*,
model.vision_tower*; vision 0/353 quantized, LM 410/772 quantized W4A8_AWQ).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
…rlap

Follow-up to #1691 (merged) and meenchen's qwen3.6 vision-exclusion addition,
both of which landed `*vision_tower*` / `*visual*` in default_disabled_quantizers.

- default_disabled_quantizers.yaml: remove the duplicate bare `*visual*` /
  `*vision_tower*` entries (qwen3.6) now that the documented block already
  disables `*vision_tower*` / `*visual*` / `*embed_vision*`. One source of truth.
- gemma4 w4a8_awq recipe: drop the now-redundant explicit `*vision_tower*` /
  `*embed_vision*` excludes — they are inherited from the shared
  default_disabled_quantizers unit (imported last so its disables win). The
  recipe is now just the gemma-specific awq_lite alpha_step=1 numerics.
- Update the gemma4 recipe comment / README to reflect the shared-unit source.

Verified: load_recipe on the gemma4 recipe resolves `*vision_tower*` /
`*visual*` / `*embed_vision*` as disabled (via the shared unit) with
`*weight_quantizer` still enabled (INT4). Fixes NVBug 6294017.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@Edwardf0t1 Edwardf0t1 force-pushed the fix-gemma4-w4a8-awq-vision-exclude branch from 11aa3ac to 0cf494b Compare June 12, 2026 17:29

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml (1)

41-42: Clarify intent of quantizer_name: 'output.*' disable rule

  • quantizer_name entries are matched with fnmatch.fnmatch(name, key), so output.* disables quantizers whose module name starts with output. (distinct from *output_layer*, which matches the substring output_layer).
  • The same output.* exclusion appears in other PTQ recipes (modelopt_recipes/huggingface/step3p5/Step3.5-Flash/ptq/nvfp4-mlp-only.yaml and modelopt_recipes/general/ptq/int4_blockwise_weight_only.yaml), so it’s likely intentional rather than a merge artifact.
  • Add a short comment near this entry (or in the PR description) stating which model component/module naming pattern it targets and why it must be disabled.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml` around
lines 41 - 42, The rule disabling quantizers with quantizer_name 'output.*' is
ambiguous; add a short inline comment next to the entry explaining that
fnmatch.fnmatch is used, that 'output.*' matches module names starting with
"output." (not substring matches like "*output_layer*"), and state which model
component(s) this targets (e.g., final output projection or output module name
used by the model) and why those quantizers must be disabled; reference the
exact key 'quantizer_name: \'output.*\'' so reviewers can find it and mirror the
same comment in the other PTQ recipes that use this exclusion if applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml`:
- Around line 41-42: The rule disabling quantizers with quantizer_name
'output.*' is ambiguous; add a short inline comment next to the entry explaining
that fnmatch.fnmatch is used, that 'output.*' matches module names starting with
"output." (not substring matches like "*output_layer*"), and state which model
component(s) this targets (e.g., final output projection or output module name
used by the model) and why those quantizers must be disabled; reference the
exact key 'quantizer_name: \'output.*\'' so reviewers can find it and mirror the
same comment in the other PTQ recipes that use this exclusion if applicable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5938de2d-431f-4fe2-8fc2-f0943732b72f

📥 Commits

Reviewing files that changed from the base of the PR and between 11aa3ac and 0cf494b.

📒 Files selected for processing (3)
  • modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml
  • modelopt_recipes/huggingface/gemma4/ptq/README.md
  • modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt_recipes/huggingface/gemma4/ptq/README.md
  • modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants