refactor(examples): consolidate vlm_ptq into llm_ptq #1705
VLM PTQ already ran entirely through examples/llm_ptq (the vlm_ptq shell script sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), so the vlm_ptq example was effectively a thin, partially-broken wrapper.

Make llm_ptq the single source of truth for both LLM and VLM PTQ:
- Add --vlm and --calib_with_images flags to scripts/parser.sh and scripts/huggingface_example.sh. --vlm bootstraps VILA deps and runs the TRT-LLM multimodal quickstart as the deploy smoke test.
- Add examples/llm_ptq/requirements-vila.txt (the vlm_ptq script referenced a requirements-vila.txt that never existed in the repo).
- Document the VLM support matrix and --vlm workflow in llm_ptq/README.md.

Deprecate examples/vlm_ptq:
- Replace its huggingface_example.sh with a shim that warns and forwards to the llm_ptq script with --vlm (backward compatible).
- Turn its README into a redirect/migration notice.
- Repoint root README VLM links and add a CHANGELOG deprecation entry.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
No actionable comments were generated in the recent review. 🎉
Walkthrough: VLM PTQ Consolidation into LLM PTQ
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around line 262-263: The shell invocation expands QUICK_START_MULTIMODAL and
SAVE_PATH unquoted which breaks when paths contain spaces; update the python3
command invocations (the lines that call python3 with QUICK_START_MULTIMODAL and
the --model_dir SAVE_PATH flag) to quote those expansions (e.g., wrap
QUICK_START_MULTIMODAL and SAVE_PATH in double quotes) and also quote any other
path-like variables used in the alternate branch at line 267 so arguments aren’t
split.
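The word-splitting problem described above can be illustrated outside the shell. The following is a minimal sketch that uses Python's `shlex` module to approximate POSIX tokenization; the two paths are hypothetical values chosen only because they contain spaces, and `shlex.quote` stands in for the double quotes the review asks for in the shell script.

```python
import shlex

# Hypothetical values, chosen only because they contain spaces.
quick_start_multimodal = "/opt/trt llm/quickstart_multimodal.py"
save_path = "/tmp/my models/qwen_fp8"

# Unquoted expansion: the values are pasted in bare, so whitespace inside
# them becomes an argument boundary.
unquoted = shlex.split(
    f"python3 {quick_start_multimodal} --model_dir {save_path}"
)

# Quoted expansion: each value stays a single argument.
quoted = shlex.split(
    f"python3 {shlex.quote(quick_start_multimodal)} "
    f"--model_dir {shlex.quote(save_path)}"
)

print(len(unquoted), unquoted)
print(len(quoted), quoted)
```

In the unquoted case the interpreter would receive `/opt/trt` as the script path and the remaining fragments as stray arguments, which is the failure mode the comment warns about.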
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6149b08c-a48a-4a84-8cbe-8d1efb8b1ea6
📒 Files selected for processing (8)
- CHANGELOG.rst
- README.md
- examples/llm_ptq/README.md
- examples/llm_ptq/requirements-vila.txt
- examples/llm_ptq/scripts/huggingface_example.sh
- examples/llm_ptq/scripts/parser.sh
- examples/vlm_ptq/README.md
- examples/vlm_ptq/scripts/huggingface_example.sh
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1705 +/- ##
===========================================
+ Coverage 58.45% 76.93% +18.47%
===========================================
Files 510 511 +1
Lines 56271 56460 +189
===========================================
+ Hits 32896 43438 +10542
+ Misses 23375 13022 -10353
cjluo-nv left a comment
Bot review — DM the bot to share feedback.
Consolidates examples/vlm_ptq into examples/llm_ptq: adds --vlm/--calib_with_images to parser.sh and huggingface_example.sh, moves the VILA bootstrap + a (previously missing) requirements-vila.txt into llm_ptq, and replaces the vlm_ptq script with an exec shim that forwards --vlm "$@". Small, net-negative diff (+142/-192).
Design review: the "design review required" gate fired only because the change spans ≥5 directories, but almost all of that is README/CHANGELOG edits. This is a deprecation/de-duplication of a thin wrapper (the old vlm_ptq script already sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), not a new subsystem/abstraction — it reuses the existing parser/script pattern rather than introducing a second one. The PR body justifies the consolidation well. No new framework concern.
Correctness: verified the consolidation is faithful — VILA version check/clone block, requirements-vila.txt reference (now points to llm_ptq and the file actually exists, fixing the prior broken reference), and the multimodal-quickstart deploy smoke test all match the old behavior, gated behind $VLM. Anchor links in the top-level README and the migrated vlm_ptq/README.md match the new headings. No licensing changes (the new requirements file is just a transformers<=4.50.0 pin). No prompt-injection in the untrusted blocks.
Why nudge rather than approve:
- No tests directly exercise the new `--vlm` path through `examples/llm_ptq`. Coverage is currently indirect via `tests/examples/vlm_ptq/test_qwen_vl.py` → the shim → `llm_ptq --vlm`. The PR's own follow-up plan removes `examples/vlm_ptq` and its CI matrix entry, at which point that indirect coverage disappears unless a direct `run_llm_ptq_command(..., vlm=True)` test is added. Worth a maintainer deciding whether to add direct coverage now.
- The VLM deploy smoke test (multimodal quickstart) is only reached for the `fp8` path; `int8_sq` / non-Blackwell `nvfp4` exit early before the deploy block. This matches old behavior (not a regression) but is worth confirming is intended.
| LLMs — the language model is quantized while the vision encoder is kept in high precision. Pass | ||
| `--vlm` to the shell script (see [VLM quantization](#vlm-quantization)). | ||
| | Model | fp8 | int8_sq<sup>1</sup> | int4_awq | w4a8_awq<sup>2</sup> | nvfp4<sup>3</sup> | |
Can we merge this list to https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/llm_ptq#hugging-face-supported-models?
Do we want to drop Vila model support? ModelOpt min transformers is 4.56 so we cannot continue guaranteeing it works with 4.50
Sounds good - dropped Vila support.
Do we also want to rename examples/llm_ptq to examples/hf_ptq?
this is a good idea. Though not sure if it will be a breaking change
Can we leave a symlink from examples/llm_ptq/ to the new examples/hf_ptq/ directory so the previous path remains valid, and then remove the symlink after a few releases?
I think that probably works, but I would defer it to a follow-up PR, as we focus on soft deprecation of vlm_ptq here and the renaming would require changes in many other places.
We also need to merge tests and CI jobs:
- `tests/examples/vlm_ptq` merged into `tests/examples/llm_ptq`
- Remove `vlm_ptq` test job from https://github.com/NVIDIA/Model-Optimizer/blob/main/.github/workflows/example_tests.yml
… feedback)

Address PR review feedback on the vlm_ptq -> llm_ptq consolidation:
- Drop VILA/NVILA support: its modeling code requires transformers<=4.50.0, which conflicts with ModelOpt's minimum transformers version. Remove the VILA bootstrap (repo clone, requirements-vila.txt) and the VILA loading paths in example_utils.py.
- Merge the VLM support matrix into the main "Hugging Face Supported Models" table (rows tagged (VLM)); replace the separate VLM subsection with a note.
- Move the VLM example test into tests/examples/llm_ptq via run_llm_ptq_command(..., vlm=True) for direct --vlm coverage; drop the vlm_ptq CI matrix entries and remove run_vlm_ptq_command.
- Quote smoke-test paths in huggingface_example.sh (CodeRabbit nit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
I think so, but we dropped VILA support. See discussions in #1705 (comment)
…nto-llm-ptq

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

# Conflicts:
# README.md
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/example_tests.yml:
- Line 58: The workflow matrix in the example_tests.yml file at lines 58 and 72
currently only includes llm_ptq, which removes CI coverage for the deprecated
vlm_ptq forwarding shim. Since vlm_ptq is still marked for backward
compatibility during a deprecation window and has no other CI coverage, you must
either add vlm_ptq back to the example matrix at both line 58 and line 72 (the
sibling location), or create a separate standalone smoke test for the deprecated
shim, or remove the shim entirely if the deprecation period has ended. Choose
the appropriate approach based on the deprecation timeline for this component.
In `@examples/llm_ptq/example_utils.py`:
- Around line 656-665: The `has_pack_quantized_config` function assumes
`quantization_config` is always a dict by calling `.get()` on it, but in Hugging
Face configs it can be either a dict or a config object. Follow the pattern used
in `get_original_hf_quant_method()` which handles both cases, applying the same
defensive approach to both the top-level `quantization_config` check and the
nested `text_config.quantization_config` check. Use appropriate type checking
and attribute access methods to safely retrieve the "format" value regardless of
whether it is stored as a dict key or an object attribute.
- Around line 706-708: The code accesses hf_config.architectures[0] without
verifying that the architectures list is not empty. Even though there may be a
check for None earlier in the code, an empty list would still cause an
IndexError at this index access. Add a length check to ensure
hf_config.architectures has at least one element before accessing the first
element at index 0, and handle the case where the list is empty appropriately
(either by raising a more informative error or providing a fallback value).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05235eca-5c7f-432d-a1f9-bc1ce79cc3fd
📒 Files selected for processing (10)
- .github/workflows/example_tests.yml
- CHANGELOG.rst
- README.md
- examples/llm_ptq/README.md
- examples/llm_ptq/example_utils.py
- examples/llm_ptq/scripts/huggingface_example.sh
- examples/vlm_ptq/README.md
- tests/_test_utils/examples/run_command.py
- tests/examples/llm_ptq/test_vlm_ptq.py
- tests/examples/vlm_ptq/_extensions/test_torch_extensions.py
💤 Files with no reviewable changes (1)
- tests/examples/vlm_ptq/_extensions/test_torch_extensions.py
✅ Files skipped from review due to trivial changes (2)
- CHANGELOG.rst
- examples/vlm_ptq/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- README.md
cjluo-nv left a comment
Bot review — DM the bot to share feedback.
Clean deprecation/consolidation refactor (+219/-346, net negative) folding examples/vlm_ptq into examples/llm_ptq via a --vlm flag and a forwarding shim. Design-review gate fired only on directory count; this reuses the existing parser.sh/huggingface_example.sh pattern rather than introducing a new subsystem (maintainer cjluo-nv concurred), so the design question is settled. README anchors (#hugging-face-supported-models, #vlm-quantization, #mxfp4--nvfp4-cast-for-gpt-oss) all resolve correctly; no licensing changes. No prompt injection in the untrusted blocks.
Status of prior review comments:
- 💬 Author replied (meenchen's "functionally equivalent?"): equivalent except VILA/NVILA support intentionally dropped (transformers<=4.50 conflicts with ModelOpt's min). Confirmed: VILA bootstrap, `requirements-vila.txt`, and `example_utils.py` VILA paths are all removed and the CHANGELOG documents the drop. Resolves kevalmorabia97's "drop VILA?" comment too.
- 💬 CodeRabbit "quote smoke-test paths": addressed in commit 1b66de7 (`"$QUICK_START_MULTIMODAL"`, `"$SAVE_PATH"`); verified in current file.
- 💬 llm_ptq→hf_ptq rename / symlink (kevalmorabia97): author deferred to a follow-up PR; reasonable.
Why nudge rather than approve:
- CI coverage for the deprecated `vlm_ptq` shim (CodeRabbit, Major): the shim itself is no longer exercised in CI, but the consolidated `--vlm` path now has direct coverage via `tests/examples/llm_ptq/test_vlm_ptq.py` → `run_llm_ptq_command(..., vlm=True)` in the `llm_ptq` matrix job. So the functionality is tested; the thin forwarding shim is not. Acceptable for a soft deprecation, but worth a maintainer confirming they're comfortable with the shim being untested for its lifetime.
- `has_pack_quantized_config` assumes `quantization_config` is a dict (`.get()`) and the final `else` branch accesses `hf_config.architectures[0]` without an empty-list guard (CodeRabbit). Both are pre-existing and were only de-indented (moved out of the old VILA branch) by this PR, not regressed here. Since this refactor touches the surrounding code, though, a maintainer may want to opportunistically harden them to match the dict/object handling already used in `get_original_hf_quant_method()`.
No GPU-validated run of the consolidated --vlm path is shown (CPU bash -n + pre-commit only), so a maintainer with the Qwen-VL test env should confirm the end-to-end path before merge.
kevalmorabia97 left a comment
LGTM. Please make sure to also add the llm_ptq to hf_ptq directory rename in the 0.46 release so both changes are part of the same release.
…ack)

Address CodeRabbit findings on code that this PR de-indented out of the old VILA branch:
- has_pack_quantized_config: handle quantization_config stored as either a dict or a config object (mirrors get_original_hf_quant_method), avoiding an AttributeError on object-style configs. Also dedupes the top-level / nested text_config checks.
- Guard against an empty hf_config.architectures list before indexing [0] to avoid an IndexError with a clearer message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
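As an illustration of the empty-list guard this commit describes, here is a minimal sketch. The function name `first_architecture` and the error wording are hypothetical and are not taken from `example_utils.py`; a `SimpleNamespace` stands in for a Hugging Face config object, and the architecture name is only an example value.

```python
from types import SimpleNamespace
from typing import Any


def first_architecture(hf_config: Any) -> str:
    """Return the first entry of hf_config.architectures.

    Hypothetical helper: it treats a missing attribute, None, and an empty
    list the same way, raising a descriptive ValueError instead of letting
    an IndexError or TypeError surface from the [0] lookup.
    """
    architectures = getattr(hf_config, "architectures", None)
    if not architectures:
        raise ValueError(
            "Model config does not list any architectures; "
            "cannot determine the model class to load."
        )
    return architectures[0]


# Example value for illustration only.
config = SimpleNamespace(architectures=["Qwen2VLForConditionalGeneration"])
print(first_architecture(config))

try:
    first_architecture(SimpleNamespace(architectures=[]))
except ValueError as err:
    print(f"guarded: {err}")
```

Whether to raise or fall back to a default is a design choice the review leaves open; raising is shown here because it matches the "clearer message" wording of the commit.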
What does this PR do?
Type of change: refactor / deprecation (examples)
`examples/vlm_ptq` was effectively a thin wrapper over `examples/llm_ptq`: its `scripts/huggingface_example.sh` already sourced `llm_ptq/scripts/parser.sh` and called `llm_ptq/hf_ptq.py`, and all the actual VLM logic (vision-tower exclusion, `--calib_with_images`, Nemotron VL calibration, VILA loading, multimodal export) already lives under `llm_ptq`. The wrapper also referenced a `requirements-vila.txt` that did not exist in the repo.

This PR makes `llm_ptq` the single source of truth for both LLM and VLM PTQ and deprecates `vlm_ptq`.

`llm_ptq` (canonical):
- Add `--vlm` and `--calib_with_images` flags to `scripts/parser.sh` and `scripts/huggingface_example.sh`. `--vlm` bootstraps VILA dependencies and runs the TensorRT-LLM multimodal quickstart as the deploy smoke test (instead of the text-only `run_tensorrt_llm.py`).
- Add `examples/llm_ptq/requirements-vila.txt` (fixes the previously broken reference).
- Document the VLM support matrix and `--vlm` workflow in `README.md`.

`vlm_ptq` (deprecated):
- Replace `scripts/huggingface_example.sh` with a shim that prints a deprecation warning and forwards to the `llm_ptq` script with `--vlm`.
- Turn `README.md` into a redirect/migration notice.
- Repoint root `README.md` VLM links and add a `CHANGELOG.rst` deprecation entry.

Usage

Testing
- `bash -n` syntax check on the modified `parser.sh`, `llm_ptq` script, and the `vlm_ptq` shim.
- `pre-commit run --files <changed files>` passes.
- The existing test (`tests/examples/vlm_ptq/test_qwen_vl.py` via `run_vlm_ptq_command`) still exercises the path end-to-end through the deprecation shim, which forwards to the consolidated `llm_ptq` script.

Before your PR is "Ready for review"
- (`vlm_ptq` entry point still works via a forwarding shim)
- `CONTRIBUTING.md`: N/A

Additional Information
Follow-up (later release): remove the `examples/vlm_ptq` directory and its CI matrix entry once external references have migrated.
Summary by CodeRabbit
Release Notes
New Features
- `examples/llm_ptq` via a `--vlm` flag.
- `--calib_with_images`, including VLM multimodal smoke-test coverage.
Deprecations
- `examples/vlm_ptq` is deprecated; it now forwards to the `examples/llm_ptq --vlm` flow with a warning.
- `examples/llm_ptq` due to a model dependency compatibility conflict.
Documentation
Tests / CI
- `vlm_ptq` example.