refactor(examples): consolidate vlm_ptq into llm_ptq #1705

Open

Edwardf0t1 wants to merge 4 commits into main from consolidate-vlm-ptq-into-llm-ptq


Edwardf0t1 (Contributor) commented Jun 12, 2026

What does this PR do?

Type of change: refactor / deprecation (examples)

examples/vlm_ptq was effectively a thin wrapper over examples/llm_ptq: its
scripts/huggingface_example.sh already sourced llm_ptq/scripts/parser.sh and
called llm_ptq/hf_ptq.py, and all the actual VLM logic (vision-tower exclusion,
--calib_with_images, Nemotron VL calibration, VILA loading, multimodal export)
already lives under llm_ptq. The wrapper also referenced a
requirements-vila.txt that did not exist in the repo.

This PR makes llm_ptq the single source of truth for both LLM and VLM PTQ and
deprecates vlm_ptq.

llm_ptq (canonical):

  • Add --vlm and --calib_with_images flags to scripts/parser.sh and
    scripts/huggingface_example.sh. --vlm bootstraps VILA dependencies and runs
    the TensorRT-LLM multimodal quickstart as the deploy smoke test (instead of the
    text-only run_tensorrt_llm.py).
  • Add examples/llm_ptq/requirements-vila.txt (fixes the previously broken
    reference).
  • Document the VLM support matrix and the --vlm workflow in README.md.
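To make the control flow of the two new flags concrete, here is a small Python analogue. It is a sketch, not the PR's bash code. `--vlm`, `--calib_with_images`, `TRT_LLM_CODE_PATH` with its `/app/tensorrt_llm` default, `quickstart_multimodal.py --model_dir ... --modality image`, and `run_tensorrt_llm.py` come from this PR. The function names, the quickstart's location inside the TensorRT-LLM tree, and the text-only runner's `--checkpoint_dir` flag are assumptions.

```python
import argparse
import os
import sys
from typing import List, Optional, Sequence


def parse_flags(argv: Sequence[str]) -> argparse.Namespace:
    """Parse the two new switches alongside a minimal subset of the existing options."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--model", required=True)
    parser.add_argument("--quant", default="fp8")
    parser.add_argument("--vlm", action="store_true")  # opt into the VLM flow
    parser.add_argument("--calib_with_images", action="store_true")  # image-text calibration
    return parser.parse_args(list(argv))


def ptq_args(opts: argparse.Namespace, base_args: Sequence[str]) -> List[str]:
    """Append --calib_with_images to the PTQ arguments only when it was requested."""
    args = list(base_args)
    if opts.calib_with_images:
        args.append("--calib_with_images")
    return args


def smoke_test_command(
    save_path: str, vlm: bool, trt_llm_code_path: Optional[str] = None
) -> List[str]:
    """Pick the deploy smoke test: multimodal quickstart for VLMs, text-only runner otherwise."""
    if vlm:
        root = trt_llm_code_path or os.environ.get("TRT_LLM_CODE_PATH", "/app/tensorrt_llm")
        # Assumed location of quickstart_multimodal.py inside the TensorRT-LLM tree.
        script = os.path.join(root, "examples", "llm-api", "quickstart_multimodal.py")
        if not os.path.isfile(script):
            print(f"WARNING: {script} not found", file=sys.stderr)
        return ["python3", script, "--model_dir", save_path, "--modality", "image"]
    # Hypothetical flag name for the text-only runner.
    return ["python3", "run_tensorrt_llm.py", "--checkpoint_dir", save_path]
```

Keeping both decisions behind plain booleans means the text-only path is unchanged unless `--vlm` is passed, which is what keeps the LLM workflow backward compatible.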

vlm_ptq (deprecated):

  • Replace scripts/huggingface_example.sh with a shim that prints a deprecation
    warning and forwards to the llm_ptq script with --vlm.
  • Convert README.md into a redirect/migration notice.
  • Repoint root README.md VLM links and add a CHANGELOG.rst deprecation entry.

Usage

```bash
cd examples/llm_ptq
# VLM PTQ (was: examples/vlm_ptq/scripts/huggingface_example.sh)
scripts/huggingface_example.sh --model <hf_model> --quant fp8 --vlm

# VLM image-text calibration
scripts/huggingface_example.sh --model <hf_model> --quant nvfp4 --vlm --calib_with_images --trust_remote_code
```

Testing

  • bash -n syntax check on the modified parser.sh, llm_ptq script, and the
    vlm_ptq shim.
  • pre-commit run --files <changed files> passes.
  • The existing VLM example test (tests/examples/vlm_ptq/test_qwen_vl.py via
    run_vlm_ptq_command) still exercises the path end-to-end through the
    deprecation shim, which forwards to the consolidated llm_ptq script.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅ (old vlm_ptq entry point still works via a forwarding shim)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A (existing VLM test still covers the consolidated path)
  • Did you update Changelog?: ✅
  • Did you get Claude approval on this PR?: ❌

Additional Information

Follow-up (later release): remove the examples/vlm_ptq directory and its CI
matrix entry once external references have migrated.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added VLM quantization support to examples/llm_ptq via a --vlm flag.
    • Enabled image-text pair calibration with --calib_with_images, including VLM multimodal smoke-test coverage.
  • Deprecations

    • examples/vlm_ptq is deprecated; it now forwards to the examples/llm_ptq --vlm flow with a warning.
    • VILA/NVILA VLM support was removed from examples/llm_ptq due to a model dependency compatibility conflict.
  • Documentation

    • Updated READMEs and the model support matrix with VLM quantization behavior and export limitations.
  • Tests / CI

    • Updated VLM PTQ tests and CI workflow matrices to stop running the deprecated vlm_ptq example.

VLM PTQ already ran entirely through examples/llm_ptq (the vlm_ptq shell
script sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), so the
vlm_ptq example was effectively a thin, partially-broken wrapper.

Make llm_ptq the single source of truth for both LLM and VLM PTQ:
- Add --vlm and --calib_with_images flags to scripts/parser.sh and
  scripts/huggingface_example.sh. --vlm bootstraps VILA deps and runs the
  TRT-LLM multimodal quickstart as the deploy smoke test.
- Add examples/llm_ptq/requirements-vila.txt (the vlm_ptq script referenced
  a requirements-vila.txt that never existed in the repo).
- Document the VLM support matrix and --vlm workflow in llm_ptq/README.md.

Deprecate examples/vlm_ptq:
- Replace its huggingface_example.sh with a shim that warns and forwards to
  the llm_ptq script with --vlm (backward compatible).
- Turn its README into a redirect/migration notice.
- Repoint root README VLM links and add a CHANGELOG deprecation entry.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Edwardf0t1 requested review from a team as code owners June 12, 2026 22:47
coderabbitai (bot) commented Jun 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 910c2010-66cc-4c0b-a648-0926e47e31c8

📥 Commits

Reviewing files that changed from the base of the PR and between 7397fe1 and bfa3282.

📒 Files selected for processing (1)
  • examples/llm_ptq/example_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/llm_ptq/example_utils.py

📝 Walkthrough

Consolidates examples/vlm_ptq into examples/llm_ptq by adding --vlm and --calib_with_images flags to the shared parser and launcher scripts, removing VILA-specific loading paths from example_utils.py, converting the old vlm_ptq entry point to a deprecated forwarding shim, and updating documentation, CI, and tests accordingly.

Changes

VLM PTQ Consolidation into LLM PTQ

Layer / File(s) Summary
New --vlm and --calib_with_images flags in parser and launcher
examples/llm_ptq/scripts/parser.sh, examples/llm_ptq/scripts/huggingface_example.sh
Parser adds --vlm and --calib_with_images to the getopt option list, initializes variables, parses them in the option case statement, and outputs their values in the configuration summary. Launcher conditionally appends --calib_with_images to PTQ arguments and branches the smoke-test stage: when VLM is set, runs quickstart_multimodal.py with --model_dir and --modality image (defaulting TRT_LLM_CODE_PATH to /app/tensorrt_llm and warning if the file is missing); otherwise invokes run_tensorrt_llm.py with the checkpoint directory.
VILA removal and get_model refactoring
examples/llm_ptq/example_utils.py
Removes sys import and all VILA-specific sys.path/LLaVA import blocks from get_tokenizer and get_model. Refactors get_model to unconditionally apply dtype=auto, reorganizes sequential device-map and memory handling with a has_pack_quantized_config() helper that detects pack-quantized format from quantization_config or nested text_config.quantization_config, and restructures model-loading branches for speculative architectures, pack-quantized checkpoints, native MXFP4, and general models with proper device-map and memory inference.
vlm_ptq entry point converted to deprecated forwarding shim
examples/vlm_ptq/scripts/huggingface_example.sh
Replaces 122 lines of full PTQ/deployment logic with a 13-line shim that prints deprecation messages to stderr and exec-forwards all arguments to examples/llm_ptq/scripts/huggingface_example.sh --vlm "$@".
LLM PTQ README extended with VLM support matrix and quantization sections
examples/llm_ptq/README.md
Adds VLM model rows to the Hugging Face support matrix with quantization compatibility markings; documents that VLM quantization targets only the language model while the vision encoder stays high precision via --vlm; documents int8_sq backend/export constraints and Nemotron VL image-text-pair calibration behavior; introduces "VLM quantization" subsection with hf_ptq.py and huggingface_example.sh command examples and supported quant values; extends "VLM calibration with image-text pairs" with flag examples for --calib_with_images and --trust_remote_code.
VLM PTQ README condensed to deprecation notice and migration guide
examples/vlm_ptq/README.md
Replaces detailed PTQ background and resources with a [Deprecated] notice, migration instructions showing the --vlm invocation, a "Where things moved" table linking former topics to their new locations under examples/llm_ptq and examples/megatron_bridge, and a reduced Resources section containing only Documentation and Release Notes links.
Root README, CHANGELOG, and CI workflow updates
README.md, CHANGELOG.rst, .github/workflows/example_tests.yml
Main README updates the VLM Quantization model support matrix link to point to consolidated examples/llm_ptq/README.md#hugging-face-supported-models; CHANGELOG adds 0.46 deprecation entry documenting consolidation with shared entry points, --vlm requirement, deprecation warning and forwarding behavior in the old script, and removal of VILA/NVILA vision-language model support and VILA-specific bootstrap/loading paths; CI example test matrix removes vlm_ptq from both PR and non-PR TensorRT-LLM runs, leaving only llm_ptq.
Test utilities and VLM PTQ test updated
tests/_test_utils/examples/run_command.py, tests/examples/llm_ptq/test_vlm_ptq.py
run_vlm_ptq_command helper removed; run_llm_ptq_command gains vlm: bool = False parameter and conditionally appends --vlm to the command arguments; VLM PTQ test updates imports to use the unified run_llm_ptq_command and calls it with vlm=True while preserving the quantization parameterization over ["fp8", "int8_sq", "nvfp4"].
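A sketch of how a test helper can expose the VLM path through a single keyword, in the spirit of `run_llm_ptq_command(..., vlm=True)`. The helper name `build_llm_ptq_command`, its signature, and the flag-rendering rules are hypothetical; the real helper in `tests/_test_utils/examples/run_command.py` may differ.

```python
from typing import List, Union


def build_llm_ptq_command(
    *, model: str, quant: str, vlm: bool = False, **extra: Union[str, int, bool, None]
) -> List[str]:
    """Assemble huggingface_example.sh arguments; --vlm is appended only when vlm=True."""
    cmd = ["scripts/huggingface_example.sh", "--model", model, "--quant", quant]
    for key, value in extra.items():
        if value is True:
            cmd.append(f"--{key}")  # boolean switch: present means enabled
        elif value is not None and value is not False:
            cmd.extend([f"--{key}", str(value)])  # option with a value
    if vlm:
        cmd.append("--vlm")  # opt into the consolidated VLM flow
    return cmd


# Example: the three formats the VLM test is parameterized over.
for quant in ["fp8", "int8_sq", "nvfp4"]:
    print(build_llm_ptq_command(model="<hf_model>", quant=quant, vlm=True))
```

Defaulting `vlm` to `False` keeps every existing LLM call site unchanged while letting the VLM test reuse the same helper.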

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/Model-Optimizer#1650: Both PRs extend examples/llm_ptq/scripts/parser.sh's parse_options() function with new command-line flags (--vlm/--calib_with_images in this PR, --mmlu_limit in the related PR), modifying the same code location.

Suggested reviewers

  • jenchen13
  • ChenhanYu
  • cjluo-nv
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'refactor(examples): consolidate vlm_ptq into llm_ptq' accurately summarizes the primary change: consolidating the VLM PTQ example into the LLM PTQ example as the canonical source.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns | ✅ Passed | All Python changes follow SECURITY.md practices: trust_remote_code defaults to False (not hardcoded True), no unsafe deserialization (torch.load, numpy.load, pickle), no eval/exec on untrusted inpu...


coderabbitai (bot) left a comment

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/llm_ptq/scripts/huggingface_example.sh`:
- Around lines 262-263: The shell invocation expands QUICK_START_MULTIMODAL and
SAVE_PATH unquoted which breaks when paths contain spaces; update the python3
command invocations (the lines that call python3 with QUICK_START_MULTIMODAL and
the --model_dir SAVE_PATH flag) to quote those expansions (e.g., wrap
QUICK_START_MULTIMODAL and SAVE_PATH in double quotes) and also quote any other
path-like variables used in the alternate branch at line 267 so arguments aren’t
split.
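Why the quoting matters: an unquoted `$SAVE_PATH` undergoes word splitting, so a path containing a space arrives as two arguments. Python's `shlex.split` tokenizes a string with POSIX-shell rules, which is close enough to illustrate the effect; the checkpoint path below is made up.

```python
import shlex

# Hypothetical checkpoint path containing a space.
save_path = "/tmp/my checkpoints/qwen-vl-fp8"

# Unquoted: the path is split at the space, like an unquoted $SAVE_PATH.
unquoted_argv = shlex.split(
    f"python3 quickstart_multimodal.py --model_dir {save_path} --modality image"
)
# Quoted: shlex.quote wraps the path so it stays a single argument, like "$SAVE_PATH".
quoted_argv = shlex.split(
    f"python3 quickstart_multimodal.py --model_dir {shlex.quote(save_path)} --modality image"
)

print(unquoted_argv[3:5])  # ['/tmp/my', 'checkpoints/qwen-vl-fp8']
print(quoted_argv[3])      # /tmp/my checkpoints/qwen-vl-fp8
```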

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6149b08c-a48a-4a84-8cbe-8d1efb8b1ea6

📥 Commits

Reviewing files that changed from the base of the PR and between d26c8af and 3d42843.

📒 Files selected for processing (8)
  • CHANGELOG.rst
  • README.md
  • examples/llm_ptq/README.md
  • examples/llm_ptq/requirements-vila.txt
  • examples/llm_ptq/scripts/huggingface_example.sh
  • examples/llm_ptq/scripts/parser.sh
  • examples/vlm_ptq/README.md
  • examples/vlm_ptq/scripts/huggingface_example.sh

Comment thread examples/llm_ptq/scripts/huggingface_example.sh Outdated
codecov (bot) commented Jun 12, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.93%. Comparing base (bcd8dd4) to head (bfa3282).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1705       +/-   ##
===========================================
+ Coverage   58.45%   76.93%   +18.47%     
===========================================
  Files         510      511        +1     
  Lines       56271    56460      +189     
===========================================
+ Hits        32896    43438    +10542     
+ Misses      23375    13022    -10353     
Flag Coverage Δ
examples 41.79% <ø> (+19.35%) ⬆️
unit 54.34% <ø> (-0.02%) ⬇️
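The project percentages follow from Hits / Lines, and Misses = Lines − Hits. A quick check, assuming Codecov's default of truncating (rounding down) to two decimal places, reproduces the reported values:

```python
import math
from fractions import Fraction


def pct_round_down(hits: int, lines: int) -> float:
    """Coverage percentage truncated to two decimals, computed exactly with Fraction."""
    return math.floor(Fraction(hits, lines) * 10000) / 100


base = (32896, 56271)  # main: hits, lines
head = (43438, 56460)  # PR head: hits, lines

print(pct_round_down(*base))  # 58.45
print(pct_round_down(*head))  # 76.93
delta = Fraction(*head) - Fraction(*base)
print(math.floor(delta * 10000) / 100)  # 18.47
```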

Flags with carried forward coverage won't be shown.


cjluo-nv (Collaborator) left a comment

Bot review (DM the bot to share feedback).

Consolidates examples/vlm_ptq into examples/llm_ptq: adds --vlm/--calib_with_images to parser.sh and huggingface_example.sh, moves the VILA bootstrap + a (previously missing) requirements-vila.txt into llm_ptq, and replaces the vlm_ptq script with an exec shim that forwards --vlm "$@". Small, net-negative diff (+142/-192).

Design review: the "design review required" gate fired only because the change spans ≥5 directories, but almost all of that is README/CHANGELOG edits. This is a deprecation/de-duplication of a thin wrapper (the old vlm_ptq script already sourced llm_ptq/parser.sh and called llm_ptq/hf_ptq.py), not a new subsystem/abstraction — it reuses the existing parser/script pattern rather than introducing a second one. The PR body justifies the consolidation well. No new framework concern.

Correctness: verified the consolidation is faithful — VILA version check/clone block, requirements-vila.txt reference (now points to llm_ptq and the file actually exists, fixing the prior broken reference), and the multimodal-quickstart deploy smoke test all match the old behavior, gated behind $VLM. Anchor links in the top-level README and the migrated vlm_ptq/README.md match the new headings. No licensing changes (the new requirements file is just a transformers<=4.50.0 pin). No prompt-injection in the untrusted blocks.

Why nudge rather than approve:

  • No tests directly exercise the new --vlm path through examples/llm_ptq. Coverage is currently indirect via tests/examples/vlm_ptq/test_qwen_vl.py → the shim → llm_ptq --vlm. The PR's own follow-up plan removes examples/vlm_ptq and its CI matrix entry, at which point that indirect coverage disappears unless a direct run_llm_ptq_command(..., vlm=True) test is added. Worth a maintainer deciding whether to add direct coverage now.
  • The VLM deploy smoke test (multimodal quickstart) is only reached for the fp8 path; int8_sq/non-Blackwell nvfp4 exit early before the deploy block. This matches old behavior (not a regression) but is worth confirming is intended.

meenchen (Contributor) left a comment

Is the refactor functionally equivalent?

Comment thread examples/llm_ptq/README.md Outdated
LLMs — the language model is quantized while the vision encoder is kept in high precision. Pass
`--vlm` to the shell script (see [VLM quantization](#vlm-quantization)).

| Model | fp8 | int8_sq<sup>1</sup> | int4_awq | w4a8_awq<sup>2</sup> | nvfp4<sup>3</sup> |


shengliangxu (Collaborator) left a comment

LGTM

Comment thread examples/llm_ptq/requirements-vila.txt Outdated

Collaborator:

Do we want to drop VILA model support? ModelOpt's minimum transformers version is 4.56, so we cannot keep guaranteeing that it works with 4.50.

Collaborator:

+1

Contributor Author:

Sounds good, dropped VILA support.

Collaborator:

Do we also want to rename examples/llm_ptq to examples/hf_ptq?

Collaborator:

This is a good idea, though I'm not sure whether it would be a breaking change.

Collaborator:

Can we leave a symlink from examples/llm_ptq/ to the new examples/hf_ptq/ directory so the previous path remains valid, and then remove the symlink after a few releases?
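A minimal sketch of the compatibility-symlink idea, assuming the directory has already been moved to `examples/hf_ptq`. The function name is hypothetical, and symlinks may need extra privileges (or `core.symlinks` support) on Windows checkouts.

```python
import os
import tempfile
from pathlib import Path


def add_compat_symlink(examples_dir: Path, old_name: str = "llm_ptq", new_name: str = "hf_ptq") -> Path:
    """Create examples/<old_name> as a symlink to examples/<new_name>."""
    link = examples_dir / old_name
    target = examples_dir / new_name
    if not target.is_dir():
        raise FileNotFoundError(f"{target} does not exist; move the directory first")
    # A relative target keeps the link valid wherever the repository is cloned.
    os.symlink(new_name, link, target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        examples = Path(tmp)
        (examples / "hf_ptq" / "scripts").mkdir(parents=True)
        (examples / "hf_ptq" / "scripts" / "huggingface_example.sh").write_text("#!/bin/bash\n")
        link = add_compat_symlink(examples)
        # The old path still resolves through the link.
        print((link / "scripts" / "huggingface_example.sh").exists())  # True
```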

Contributor Author:

I think that probably works, but I would defer it to a follow-up PR: this PR focuses on the soft deprecation of vlm_ptq, and the rename would require changes in many other places.

Collaborator:

We also need to merge tests and CI jobs:

… feedback)

Address PR review feedback on the vlm_ptq -> llm_ptq consolidation:

- Drop VILA/NVILA support: its modeling code requires transformers<=4.50.0,
  which conflicts with ModelOpt's minimum transformers version. Remove the
  VILA bootstrap (repo clone, requirements-vila.txt) and the VILA loading
  paths in example_utils.py.
- Merge the VLM support matrix into the main "Hugging Face Supported Models"
  table (rows tagged (VLM)); replace the separate VLM subsection with a note.
- Move the VLM example test into tests/examples/llm_ptq via
  run_llm_ptq_command(..., vlm=True) for direct --vlm coverage; drop the
  vlm_ptq CI matrix entries and remove run_vlm_ptq_command.
- Quote smoke-test paths in huggingface_example.sh (CodeRabbit nit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Edwardf0t1 requested a review from a team as a code owner June 16, 2026 06:31
Edwardf0t1 (Contributor Author) commented

Is the refactor functionally equivalent?

I think so, but we dropped VILA support. See the discussion in #1705 (comment).

…nto-llm-ptq

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

# Conflicts:
#	README.md

coderabbitai (bot) left a comment


Actionable comments posted: 3


Inline comments:
In @.github/workflows/example_tests.yml:
- Line 58: The workflow matrix in the example_tests.yml file at lines 58 and 72
currently only includes llm_ptq, which removes CI coverage for the deprecated
vlm_ptq forwarding shim. Since vlm_ptq is still marked for backward
compatibility during a deprecation window and has no other CI coverage, you must
either add vlm_ptq back to the example matrix at both line 58 and line 72 (the
sibling location), or create a separate standalone smoke test for the deprecated
shim, or remove the shim entirely if the deprecation period has ended. Choose
the appropriate approach based on the deprecation timeline for this component.

In `@examples/llm_ptq/example_utils.py`:
- Around line 656-665: The `has_pack_quantized_config` function assumes
`quantization_config` is always a dict by calling `.get()` on it, but in Hugging
Face configs it can be either a dict or a config object. Follow the pattern used
in `get_original_hf_quant_method()` which handles both cases, applying the same
defensive approach to both the top-level `quantization_config` check and the
nested `text_config.quantization_config` check. Use appropriate type checking
and attribute access methods to safely retrieve the "format" value regardless of
whether it is stored as a dict key or an object attribute.
- Around line 706-708: The code accesses hf_config.architectures[0] without
verifying that the architectures list is not empty. Even though there may be a
check for None earlier in the code, an empty list would still cause an
IndexError at this index access. Add a length check to ensure
hf_config.architectures has at least one element before accessing the first
element at index 0, and handle the case where the list is empty appropriately
(either by raising a more informative error or providing a fallback value).
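One way to tolerate both storage styles is to route every lookup through a small accessor. This is a sketch, assuming the pack-quantized marker is the string `"pack-quantized"` under a `format` key; `_get_field` is a hypothetical helper, not the code in `example_utils.py` or `get_original_hf_quant_method()`.

```python
from types import SimpleNamespace
from typing import Any

PACK_QUANTIZED = "pack-quantized"  # assumed format string for pack-quantized checkpoints


def _get_field(obj: Any, key: str, default: Any = None) -> Any:
    """Read `key` from a dict, or the attribute of the same name from a config object."""
    if obj is None:
        return default
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)


def has_pack_quantized_config(hf_config: Any) -> bool:
    """True if the top-level or nested text_config quantization_config is pack-quantized."""
    text_config = _get_field(hf_config, "text_config")
    for quant_config in (
        _get_field(hf_config, "quantization_config"),
        _get_field(text_config, "quantization_config"),
    ):
        if _get_field(quant_config, "format") == PACK_QUANTIZED:
            return True
    return False


if __name__ == "__main__":
    # Both storage styles are accepted; calling .get() directly would fail on the object form.
    as_dict = SimpleNamespace(quantization_config={"format": PACK_QUANTIZED})
    as_object = SimpleNamespace(
        text_config=SimpleNamespace(quantization_config=SimpleNamespace(format=PACK_QUANTIZED))
    )
    print(has_pack_quantized_config(as_dict), has_pack_quantized_config(as_object))  # True True
```

Looping over the two candidate locations also removes the duplicated top-level and nested checks.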

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 05235eca-5c7f-432d-a1f9-bc1ce79cc3fd

📥 Commits

Reviewing files that changed from the base of the PR and between 3d42843 and 1b66de7.

📒 Files selected for processing (10)
  • .github/workflows/example_tests.yml
  • CHANGELOG.rst
  • README.md
  • examples/llm_ptq/README.md
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/scripts/huggingface_example.sh
  • examples/vlm_ptq/README.md
  • tests/_test_utils/examples/run_command.py
  • tests/examples/llm_ptq/test_vlm_ptq.py
  • tests/examples/vlm_ptq/_extensions/test_torch_extensions.py
💤 Files with no reviewable changes (1)
  • tests/examples/vlm_ptq/_extensions/test_torch_extensions.py
✅ Files skipped from review due to trivial changes (2)
  • CHANGELOG.rst
  • examples/vlm_ptq/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md

Comment thread .github/workflows/example_tests.yml
Comment thread examples/llm_ptq/example_utils.py
Comment thread examples/llm_ptq/example_utils.py

cjluo-nv (Collaborator) left a comment

Bot review.

Clean deprecation/consolidation refactor (+219/-346, net negative) folding examples/vlm_ptq into examples/llm_ptq via a --vlm flag and a forwarding shim. Design-review gate fired only on directory count; this reuses the existing parser.sh/huggingface_example.sh pattern rather than introducing a new subsystem (maintainer cjluo-nv concurred), so the design question is settled. README anchors (#hugging-face-supported-models, #vlm-quantization, #mxfp4--nvfp4-cast-for-gpt-oss) all resolve correctly; no licensing changes. No prompt injection in the untrusted blocks.

Status of prior review comments:

  • 💬 Author replied (meenchen's "functionally equivalent?"): equivalent except VILA/NVILA support intentionally dropped (transformers<=4.50 conflicts with ModelOpt's min). Confirmed — VILA bootstrap, requirements-vila.txt, and example_utils.py VILA paths are all removed and the CHANGELOG documents the drop. Resolves kevalmorabia97's "drop VILA?" comment too.
  • 💬 CodeRabbit "quote smoke-test paths" — addressed in commit 1b66de7 ("$QUICK_START_MULTIMODAL", "$SAVE_PATH"); verified in current file.
  • 💬 llm_ptq→hf_ptq rename / symlink (kevalmorabia97) — author deferred to a follow-up PR; reasonable.

Why nudge rather than approve:

  • CI coverage for the deprecated vlm_ptq shim (CodeRabbit, Major): the shim itself is no longer exercised in CI, but the consolidated --vlm path now has direct coverage via tests/examples/llm_ptq/test_vlm_ptq.py → run_llm_ptq_command(..., vlm=True) in the llm_ptq matrix job. So the functionality is tested; the thin forwarding shim is not. Acceptable for a soft deprecation, but worth a maintainer confirming they're comfortable with the shim being untested for its lifetime.
  • has_pack_quantized_config assumes quantization_config is a dict (.get()) and the final else branch accesses hf_config.architectures[0] without an empty-list guard (CodeRabbit). Both are pre-existing and were only de-indented (moved out of the old VILA branch) by this PR, not regressed here — but since this refactor touches the surrounding code, a maintainer may want to opportunistically harden them to match the dict/object handling already used in get_original_hf_quant_method().

No GPU-validated run of the consolidated --vlm path is shown (CPU bash -n + pre-commit only), so a maintainer with the Qwen-VL test env should confirm the end-to-end path before merge.

kevalmorabia97 (Collaborator) left a comment

LGTM. Please make sure to also include the llm_ptq → hf_ptq directory rename in the 0.46 release so both changes ship in the same release.

…ack)

Address CodeRabbit findings on code that this PR de-indented out of the old
VILA branch:

- has_pack_quantized_config: handle quantization_config stored as either a dict
  or a config object (mirrors get_original_hf_quant_method), avoiding an
  AttributeError on object-style configs. Also dedupes the top-level / nested
  text_config checks.
- Guard against an empty hf_config.architectures list before indexing [0] to
  avoid an IndexError with a clearer message.
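A sketch of the empty-list guard; `first_architecture` is a hypothetical helper name, and the error wording is illustrative rather than the message used in `example_utils.py`.

```python
from types import SimpleNamespace
from typing import Any


def first_architecture(hf_config: Any) -> str:
    """Return hf_config.architectures[0], failing descriptively if the list is missing or empty."""
    architectures = getattr(hf_config, "architectures", None)
    if not architectures:  # one truthiness check covers both None and []
        raise ValueError(
            "hf_config.architectures is missing or empty; cannot determine the model "
            "architecture. Check the `architectures` field in the checkpoint's config.json."
        )
    return architectures[0]


if __name__ == "__main__":
    print(first_architecture(SimpleNamespace(architectures=["LlamaForCausalLM"])))  # LlamaForCausalLM
    try:
        first_architecture(SimpleNamespace(architectures=[]))
    except ValueError as err:
        print(f"ValueError: {err}")
```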

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Labels

None yet

Projects

None yet


5 participants