Fix gemma w4a8_awq recipe crashing export on multimodal checkpoints (NVBug 6294017) by Edwardf0t1 · Pull Request #1690 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-06-11T20:51:01Z

What does this PR do?

Type of change: Bug fix

Fixes the huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast recipe crashing at export on multimodal Gemma checkpoints (e.g. gemma-4-31B-it).

The recipe enables quantization with bare *weight_quantizer / *input_quantizer wildcards. On VLM Gemma these also match the SigLIP vision tower (model.vision_tower.*) and the vision embedding projection (model.embed_vision.*). The vision tower's MLP in-features (4304) are not a multiple of the INT4 block size (128), so INT4 weight packing at export hits a device-side index out of bounds assert in pack_int4_in_uint8 (quant_utils.py). Quantizing the vision branch is also accuracy-harmful (post-PTQ generation in the bug log is degenerate).

The fix appends *vision_tower* / *embed_vision* disable rules after the enables (later entries win), keeping the vision branch in BF16. This mirrors vision exclusions already shipping in the qwen3_5, nemotron_vl, and phi4mm recipes.

Usage

python examples/llm_ptq/hf_ptq.py --model <gemma-4-31B-it> --dataset cnn_dailymail \
  --recipe modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml \
  --batch_size 8 --calib_size 512 --export_path <out> --trust_remote_code

Testing

Reproduced the exact NVBug 6294017 scenario on the real gemma-4-31B-it checkpoint, with the repo's modelopt (0.45.0.dev159, this branch) and the requested container nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc18. Ran on RTX 6000 Ada (sm_89); the crash is a structural tensor-indexing bounds error in pack_int4_in_uint8 at export, which is architecture-independent, so it reproduces/verifies identically (a Blackwell run can be added if desired). Verified at both calib_size 4 (smoke) and calib_size 512 (full); identical structural results.

python hf_ptq.py --dataset cnn_dailymail --model gemma-4-31B-it \
  --recipe modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml \
  --batch_size 4 --calib_size 512 --export_path <out> --trust_remote_code

Check	Before (bug)	After (this PR)
Export / `pack_int4_in_uint8`	❌ `index out of bounds` device-side assert	✅ completes, exit 0, 0 crash signatures
`hf_quant_config.exclude_modules`	n/a (crashed)	✅ `lm_head, model.embed_vision, model.vision_tower`
Vision-branch quantized weights	(crashed on `intermediate_size=4304`)	✅ 0 / 353 scaled → kept BF16
Language-model quantized weights	n/a	✅ 410 / 772 scaled → `W4A8_AWQ`, group 128, KV FP8
Exported checkpoint	none	✅ ~18.3 GB (vs 62 GB BF16 source)

Root cause confirmed in-environment: model_type=gemma4, vision_config.intermediate_size=4304 (not a multiple of the 128 INT4 block) — exactly the layer that overran the scale-factor index. This verifies the crash fix and the layer-precision split; W4A8 accuracy is a separate matter (not assessed here).

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ (text-only Gemma checkpoints are unaffected; the new rules only match vision modules that should never have been quantized)
If you copied code from any other sources or added a new PIP dependency: N/A
Did you write any new necessary tests?: N/A (recipe data fix; verified by a real PTQ export run)
Did you update Changelog?: N/A
Did you get Claude approval on this PR?: ❌ (pending)

Additional Information

NVBug 6294017. Reported on modelopt 0.45.0rc0, GB200, gemma-4-31B-it.

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Added README documenting Gemma 4 post-training quantization recipes and usage guidance.
New Features
- New Gemma 4 PTQ recipe: W4A8 weight quantization with FP8 KV-cache casting and specialized handling to keep vision components in BF16.
Bug Fixes
- Updated default quantizer rules to preserve the multimodal vision branch by default, preventing INT4 packing index-out-of-bounds issues.

codecov · 2026-06-11T20:59:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.09%. Comparing base (60b1af5) to head (0cf494b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1690   +/-   ##
=======================================
  Coverage   77.09%   77.09%           
=======================================
  Files         511      511           
  Lines       56173    56173           
=======================================
  Hits        43307    43307           
  Misses      12866    12866

Flag	Coverage Δ
unit	`54.34% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai · 2026-06-11T21:39:45Z

📝 Walkthrough

Walkthrough

Adds a Gemma 4 PTQ recipe (W4A8, AWQ-lite) with FP8 KV-cache casting and updates default disabled-quantizers to keep multimodal vision/embedding components unquantized; includes README documenting numerics and AWQ search differences.

Changes

Gemma 4 W4A8 PTQ Recipe and Documentation

Layer / File(s)	Summary
Multimodal vision quantizer defaults `modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml`	Disables `output.`, documents multimodal vision preservation, and keeps `embed_vision`, `vision_tower`, and `visual*` quantizers unquantized by default.
Gemma4 W4A8 PTQ recipe and docs `modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml`, `modelopt_recipes/huggingface/gemma4/ptq/README.md`	Adds PTQ recipe using `awq_lite` W4A8 with INT4 block packing, FP8 weight/input numerics, a `kv_fp8_cast` unit using constant amax (no KV calibration), and documents the recipe and vision-component exclusions.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1687: Overlapping changes to default_disabled_quantizers.yaml to keep multimodal vision quantizers disabled.
NVIDIA/Model-Optimizer#1691: Also modifies disable patterns for *embed_vision*, *vision_tower*, and *visual* to preserve vision/embedding BF16 behavior.

Suggested reviewers

kevalmorabia97
juhi10071998
sychen52
shengliangxu

🚥 Pre-merge checks | ✅ 6

✅ Passed checks (6 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main change: fixing a crash in the gemma w4a8_awq recipe when exporting multimodal checkpoints by excluding vision modules from quantization.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	Git diff shows only YAML/README files (no .py / dependency changes). Scanned changed contents for torch.load/allow_pickle/trust_remote_code/eval/exec/#nosec patterns—none found.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-gemma4-w4a8-awq-vision-exclude

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Edwardf0t1 · 2026-06-11T21:42:50Z

✅ Verified on real hardware

Reproduced the exact NVBug 6294017 scenario and confirmed the fix, using the real gemma-4-31B-it checkpoint, the repo's modelopt (0.45.0.dev159, this branch), and the requested container nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc18.

Environment note: run on RTX 6000 Ada (sm_89), not Blackwell. The bug is a structural tensor-indexing bounds error in pack_int4_in_uint8 at export, which is architecture-independent, so it reproduces/verifies identically. Container ships transformers 5.5.4 (gemma4 loads without a pin).

Command (matches the bug report)

python hf_ptq.py --dataset cnn_dailymail --model gemma-4-31B-it \
  --recipe modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml \
  --batch_size 4 --calib_size 512 --export_path <out> --trust_remote_code

Results (full calib_size 512 run)

Check	Before (bug)	After (this PR)
Export / `pack_int4_in_uint8`	❌ `index out of bounds` device-side assert	✅ completes, exit 0, 0 crash signatures
`hf_quant_config.exclude_modules`	n/a (crashed)	✅ `lm_head, model.embed_vision, model.vision_tower`
Vision-branch quantized weights	(would crash on `intermediate_size=4304`)	✅ 0 / 353 scaled → kept BF16
Language-model quantized weights	n/a	✅ 410 / 772 scaled → `W4A8_AWQ`, group 128, KV FP8
Exported checkpoint	none	✅ ~18.3 GB (vs 62 GB BF16 source)

Root cause confirmed in-environment: model_type=gemma4, vision_config.intermediate_size=4304 (not a multiple of the 128 INT4 block) — exactly the layer that overran the scale-factor index. A smoke run (calib_size 4) gave identical structural results.

Note: this verifies the crash fix and the layer-precision split. Accuracy of the W4A8 checkpoint is a separate matter (would need a real eval, e.g. MMLU/MMMU), not assessed here.

shengliangxu · 2026-06-11T22:57:03Z

 # Gemma-specific W4A8 AWQ PTQ recipe with FP8 KV-cache cast. Uses a coarser
 # optimal-scale search (awq_lite with alpha_step=1) to avoid overflow observed
 # in TRT-LLM kernels when using the default AWQ search on Gemma.
+#


gemma4 model type is not the same thing as gemma

please add new recipe under gemma4, you can check it at:

https://huggingface.co/google/gemma-4-31B-it/blob/main/config.json

cjluo-nv

Bot review — DM the bot to share feedback.

Small (+19/-0) additive recipe data fix for w4a8_awq-kv_fp8_cast.yaml: appends *vision_tower* / *embed_vision* enable: false rules after the bare *weight_quantizer/*input_quantizer enables and default_disabled_quantizers, keeping the SigLIP vision branch in BF16 to avoid the INT4 pack_int4_in_uint8 index-out-of-bounds crash on multimodal Gemma (NVBug 6294017).

What checks out:

Mirrors the established in-repo pattern exactly — qwen3_5 (w4a16_nvfp4-fp8_attn-kv_fp8_cast.quant_cfg.yaml), nemotron_vl, and phi4mm recipes all use trailing enable: false vision exclusions placed after default_disabled_quantizers, relying on the same "later entries win" override semantics.
The disable globs live in this single recipe (not the shared default_disabled_quantizers unit), so blast radius is limited to this recipe only — no risk of disabling vision quant in other presets.
License header unchanged (standard Apache-2.0/NVIDIA); no licensing concern. No prompt-injection in the untrusted content.

Why a human glance:

No automated test added. Sibling additive recipe variants are normally covered by a test_loader.py smoke test; consider adding one so the new rules' resolution is verified in CI.
Verified only by a manual export run on RTX 6000 Ada (sm_89), not the GB200 where the bug was originally reported. The author notes the crash is a structural tensor-indexing error and thus arch-independent, which is plausible, but a maintainer may want to confirm.

Note on the Slack instruction: only one PR is in scope here and no specific handle was provided, so I can't action a cross-PR tag from this review.

cjluo-nv

Bot review — DM the bot to share feedback.

Re-review of the Gemma 4 vision-exclusion PTQ recipe fix (NVBug 6294017). Small additive change (+85/-0): a new huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml recipe + README. The recipe enables W4A8 AWQ with FP8 KV-cache cast and appends trailing *vision_tower* / *embed_vision* enable: false rules after the bare *weight_quantizer/*input_quantizer enables and default_disabled_quantizers, relying on the established "later wins" override semantics to keep the SigLIP vision branch in BF16 and avoid the INT4 pack_int4_in_uint8 index-out-of-bounds crash (vision MLP in-features 4304 not a multiple of the 128 block). Mirrors the qwen3_5 / nemotron_vl / phi4mm sibling recipes exactly; globs are recipe-local so blast radius is limited. License header matches the canonical LICENSE_HEADER; no licensing concern. No prompt injection in the untrusted content.

Previous-review status:

💬 shengliangxu flagged that gemma4 is a distinct model type and asked for a new recipe under gemma4/ rather than editing gemma/. Author replied "done" — confirmed: the recipe now lives in the new gemma4/ptq/ directory and the original gemma/ptq/w4a8_awq-kv_fp8_cast.yaml is unchanged. This critical/correctness concern is fully resolved.

Why a human glance:

💬 Author verified the exact NVBug scenario on the real gemma-4-31B-it checkpoint but only on RTX 6000 Ada (sm_89), not the GB200 where the bug was reported — argues the crash is a structural arch-independent tensor-indexing error, which is plausible but unconfirmed on the target arch.
No automated smoke test added. This matches the existing precedent (only general/ptq/* recipes are parametrized in tests/unit/recipe/test_loader.py; the sibling vision-exclusion recipes are likewise uncovered), so it's not a regression — but a maintainer with architectural context should sign off on the recipe data and the cross-arch claim.

…6293731, 6293762) (#1691) ### What does this PR do? Type of change: Bug fix Fixes two sglang deployment failures on multimodal Gemma (`gemma-4-31B-it`) caused by general PTQ presets leaking quantization into the SigLIP vision branch via broad wildcards: - **NVBug 6293731** — `general/ptq/fp8_default-kv_fp8`: the `w8a8_fp8_fp8` unit enables bare `*weight_quantizer` / `*input_quantizer`, which also match the vision tower (`model.vision_tower.*`, `model.visual.*`) and the vision embedding projection (`model.embed_vision.*`). The exported checkpoint deploys but emits **garbled text** in sglang. - **NVBug 6293762** — `general/ptq/nvfp4_mlp_only-kv_fp8`: the `*mlp*` enables also match the vision tower's block MLPs (`model.vision_tower.encoder.layers.*.mlp`), and an image request **crashes** the FP4 kernel at decode: `ValueError: too many values to unpack (expected 2)` in sglang's `modelopt_quant.py` `apply`. ### Fix Add `*embed_vision*` / `*vision_tower*` / `*visual*` disable rules to the shared `configs/ptq/units/default_disabled_quantizers` unit, alongside the existing `*router*` / `*lm_head*` entries. Both the composed `general/ptq/*` recipes **and** the `configs/ptq/presets/model/*` presets import this unit, so: - every general recipe (`fp8_default`, `nvfp4_default`, `nvfp4_mlp_only`, `nvfp4_omlp_only`, …) keeps the vision branch in BF16 by default — fixing the whole vision-overreach class, not just the two reported recipes; - the `test_general_ptq_yaml_matches_config_dicts` YAML↔preset parity test stays satisfied (both sides pick up the new entries from the one shared unit). The rules are **no-ops on text-only models** (nothing matches). A recipe that intentionally wants to quantize the vision branch can re-enable these after importing the unit. Files changed: - `modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml` (+14) ### Testing Re-export of `gemma-4-31B-it` with the affected recipes and re-deploy in sglang (the env from the bug reports: `lmsysorg/sglang:v0.5.12.post1`, GB200) to confirm fp8_default no longer garbles text and nvfp4_mlp_only no longer crashes on image requests. _(Results to be appended.)_ Unit-level: `tests/unit/recipe/test_loader.py::test_general_ptq_yaml_matches_config_dicts` (parity) passes for all four general presets. ### Before your PR is "*Ready for review*" - Is this change backward compatible?: ✅ (text-only checkpoints unaffected; new rules only match vision modules that should never have been quantized by a general recipe) - If you copied code from any other sources or added a new PIP dependency: N/A - Did you write any new necessary tests?: N/A (recipe data fix; covered by the existing parity test + verified by real PTQ export + sglang deploy) - Did you update Changelog?: N/A - Did you get Claude approval on this PR?: ❌ (pending) ### Additional Information NVBug 6293731 and 6293762. Reported on modelopt 0.45.0rc0, GB200, gemma-4-31B-it, sglang 0.5.12.post1. Tracked under OMNIML-5034. Companion to PR #1690 (same vision-overreach class on the gemma-specific `w4a8_awq` recipe, NVBug 6294017). 🤖 Generated with [Claude Code](https://claude.com/claude-code)  ## Summary by CodeRabbit * **Chores** * Updated quantization configuration to preserve BF16 precision for vision encoder components in multimodal models.  --------- Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…NVBug 6294017) The gemma `w4a8_awq-kv_fp8_cast` recipe enables quantization with bare `*weight_quantizer` / `*input_quantizer` wildcards. On multimodal Gemma checkpoints (e.g. gemma-4-31B-it) these also match the SigLIP vision tower (`model.vision_tower.*`) and the vision embedding projection (`model.embed_vision.*`). The vision tower's MLP in-features (4304) are not a multiple of the INT4 block size (128), so INT4 weight packing at export hits a device-side `index out of bounds` assert in `pack_int4_in_uint8`. Quantizing the vision branch is also accuracy-harmful. Add trailing `*vision_tower*` / `*embed_vision*` disable rules (placed after the enables so the disable wins), keeping the vision branch in BF16. Mirrors the vision exclusions already shipped in the qwen3_5 / nemotron_vl / phi4mm recipes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Per review (shengliangxu): gemma-4-31B-it is model_type=gemma4, a distinct multimodal architecture from the text-only `gemma` model type. The vision-branch exclusion belongs in a gemma4-specific recipe, not the gemma one (which is text-only and has no vision tower). - Revert modelopt_recipes/huggingface/gemma/ptq/w4a8_awq-kv_fp8_cast.yaml to its original (text-only) form. - Add modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml with the same awq_lite alpha_step=1 numerics plus `*vision_tower*` / `*embed_vision*` enable:false rules to keep the SigLIP vision branch in BF16 (fixes NVBug 6294017: INT4 pack_int4_in_uint8 index-out-of-bounds on the vision MLP whose in-features (4304) are not a multiple of the 128 block size). - Add modelopt_recipes/huggingface/gemma4/ptq/README.md. Verified: load_recipe resolves the new recipe (<builtin>/huggingface/gemma4/...) with the vision excludes present; the quantize block is identical to the previously hardware-verified fix (full calib_size 512 export on gemma-4-31B-it: no index-out-of-bounds, exclude_modules = lm_head, model.embed_vision*, model.vision_tower*; vision 0/353 quantized, LM 410/772 quantized W4A8_AWQ). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

…rlap Follow-up to #1691 (merged) and meenchen's qwen3.6 vision-exclusion addition, both of which landed `*vision_tower*` / `*visual*` in default_disabled_quantizers. - default_disabled_quantizers.yaml: remove the duplicate bare `*visual*` / `*vision_tower*` entries (qwen3.6) now that the documented block already disables `*vision_tower*` / `*visual*` / `*embed_vision*`. One source of truth. - gemma4 w4a8_awq recipe: drop the now-redundant explicit `*vision_tower*` / `*embed_vision*` excludes — they are inherited from the shared default_disabled_quantizers unit (imported last so its disables win). The recipe is now just the gemma-specific awq_lite alpha_step=1 numerics. - Update the gemma4 recipe comment / README to reflect the shared-unit source. Verified: load_recipe on the gemma4 recipe resolves `*vision_tower*` / `*visual*` / `*embed_vision*` as disabled (via the shared unit) with `*weight_quantizer` still enabled (INT4). Fixes NVBug 6294017. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai

🧹 Nitpick comments (1)

modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml (1)
41-42: Clarify intent of quantizer_name: 'output.*' disable rule

quantizer_name entries are matched with fnmatch.fnmatch(name, key), so output.* disables quantizers whose module name starts with output. (distinct from *output_layer*, which matches the substring output_layer).

The same output.* exclusion appears in other PTQ recipes (modelopt_recipes/huggingface/step3p5/Step3.5-Flash/ptq/nvfp4-mlp-only.yaml and modelopt_recipes/general/ptq/int4_blockwise_weight_only.yaml), so it’s likely intentional rather than a merge artifact.

Add a short comment near this entry (or in the PR description) stating which model component/module naming pattern it targets and why it must be disabled.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml` around
lines 41 - 42, The rule disabling quantizers with quantizer_name 'output.*' is
ambiguous; add a short inline comment next to the entry explaining that
fnmatch.fnmatch is used, that 'output.*' matches module names starting with
"output." (not substring matches like "*output_layer*"), and state which model
component(s) this targets (e.g., final output projection or output module name
used by the model) and why those quantizers must be disabled; reference the
exact key 'quantizer_name: \'output.*\'' so reviewers can find it and mirror the
same comment in the other PTQ recipes that use this exclusion if applicable.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml`:
- Around line 41-42: The rule disabling quantizers with quantizer_name
'output.*' is ambiguous; add a short inline comment next to the entry explaining
that fnmatch.fnmatch is used, that 'output.*' matches module names starting with
"output." (not substring matches like "*output_layer*"), and state which model
component(s) this targets (e.g., final output projection or output module name
used by the model) and why those quantizers must be disabled; reference the
exact key 'quantizer_name: \'output.*\'' so reviewers can find it and mirror the
same comment in the other PTQ recipes that use this exclusion if applicable.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5938de2d-431f-4fe2-8fc2-f0943732b72f

📥 Commits

Reviewing files that changed from the base of the PR and between 11aa3ac and 0cf494b.

📒 Files selected for processing (3)

modelopt_recipes/configs/ptq/units/default_disabled_quantizers.yaml
modelopt_recipes/huggingface/gemma4/ptq/README.md
modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml

🚧 Files skipped from review as they are similar to previous changes (2)

modelopt_recipes/huggingface/gemma4/ptq/README.md
modelopt_recipes/huggingface/gemma4/ptq/w4a8_awq-kv_fp8_cast.yaml

Edwardf0t1 requested a review from a team as a code owner June 11, 2026 20:51

Edwardf0t1 requested a review from cjluo-nv June 11, 2026 20:51

Edwardf0t1 mentioned this pull request Jun 11, 2026

Exclude multimodal vision branch from quantization by default (NVBug 6293731, 6293762) #1691

Merged

coderabbitai Bot approved these changes Jun 11, 2026

View reviewed changes

Edwardf0t1 requested review from juhi10071998, kevalmorabia97 and shengliangxu June 11, 2026 22:48

shengliangxu reviewed Jun 11, 2026

View reviewed changes

cjluo-nv reviewed Jun 11, 2026

View reviewed changes

Edwardf0t1 added the cherry-pick-0.45.0 After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc label Jun 12, 2026

cjluo-nv reviewed Jun 12, 2026

View reviewed changes

Edwardf0t1 and others added 3 commits June 12, 2026 10:26

Edwardf0t1 force-pushed the fix-gemma4-w4a8-awq-vision-exclude branch from 11aa3ac to 0cf494b Compare June 12, 2026 17:29

coderabbitai Bot reviewed Jun 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix gemma w4a8_awq recipe crashing export on multimodal checkpoints (NVBug 6294017)#1690

Fix gemma w4a8_awq recipe crashing export on multimodal checkpoints (NVBug 6294017)#1690
Edwardf0t1 wants to merge 3 commits into
mainfrom
fix-gemma4-w4a8-awq-vision-exclude

Edwardf0t1 commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

codecov Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Uh oh!

Edwardf0t1 commented Jun 11, 2026

Uh oh!

shengliangxu Jun 11, 2026

Uh oh!

Edwardf0t1 Jun 12, 2026

Uh oh!

cjluo-nv left a comment

Uh oh!

cjluo-nv left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Edwardf0t1 commented Jun 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Usage

Testing

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

codecov Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Uh oh!

Edwardf0t1 commented Jun 11, 2026

✅ Verified on real hardware

Command (matches the bug report)

Results (full calib_size 512 run)

Uh oh!

shengliangxu Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Edwardf0t1 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

cjluo-nv left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Edwardf0t1 commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

codecov Bot commented Jun 11, 2026 •

edited

Loading

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading