
vLLM fakequant: add recipe-based quantization support#1233

Open
kinjalpatel27 wants to merge 2 commits into main from kinjal/vllm_fq_recipe

Conversation

@kinjalpatel27
Contributor

@kinjalpatel27 kinjalpatel27 commented Apr 10, 2026

What does this PR do?

Type of change: example update

This PR adds recipe-based quantization support to the vLLM fakequant example.

Testing

docker run --gpus all -it --shm-size=160GB --network host --rm --entrypoint bash \
  -v <modelopt>:/home/modelopt vllm/vllm-openai:v0.15.0 \
  -c "cd /home/modelopt && pip install . && pip install datasets && \
    RECIPE_PATH=/home/modelopt/modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml \
    python3 /home/modelopt/examples/vllm_serve/vllm_serve_fakequant.py Qwen/Qwen3-0.6B \
    -tp 1 --served-model-name Qwen3-0.6B --host 0.0.0.0 --port 8001 \
    --trust-remote-code --disable-custom-all-reduce --gpu-memory-utilization 0.8"
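For context, here is a minimal sketch (not the actual implementation) of the recipe-driven flow this PR enables: the RECIPE_PATH environment variable points at a ModelOpt PTQ recipe whose quantize field supplies the quantization config. The names load_recipe and ModelOptPTQRecipe mirror the diff discussed in this thread, but the loader body and the config values below are illustrative stubs.

```python
# Hypothetical sketch of recipe-driven quantization config selection.
# `load_recipe` and `ModelOptPTQRecipe` follow the names in this PR's
# diff; the stub loader and config values are illustrative only.
import os
from dataclasses import dataclass, field


@dataclass
class ModelOptPTQRecipe:
    """Stand-in for the real recipe class; only `quantize` is modeled."""
    quantize: dict = field(default_factory=dict)


def load_recipe(path: str) -> ModelOptPTQRecipe:
    # The real loader parses the recipe YAML file at `path`; stubbed here
    # with values suggested by the recipe filename in the test command.
    return ModelOptPTQRecipe(quantize={"algorithm": "nvfp4", "kv_cache": "fp8"})


def resolve_quant_cfg(default_cfg: dict) -> dict:
    """Prefer a recipe's quantize config when RECIPE_PATH is set."""
    recipe_path = os.environ.get("RECIPE_PATH")
    if not recipe_path:
        return default_cfg
    recipe = load_recipe(recipe_path)
    # Explicit validation rather than assert, per the review below.
    if not isinstance(recipe, ModelOptPTQRecipe):
        raise ValueError(
            f"Expected PTQ recipe, but got {type(recipe).__name__} from {recipe_path}"
        )
    return recipe.quantize
```

When RECIPE_PATH is unset, the default config is used unchanged; when set, the recipe's quantize config takes over.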

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added RECIPE_PATH environment variable support enabling users to specify ModelOpt PTQ recipe YAML files for quantization configuration in vLLM serving.
  • Documentation

    • Updated examples and documentation for recipe-driven quantization configuration, aligning the export workflow with the recipe-based setup.

Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 10, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Apr 10, 2026


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

  • Security Anti-Patterns (❌ Error)
    Explanation: Critical security violation found at examples/vllm_serve/vllm_ptq_utils.py lines 147-149, where assert isinstance() is used for runtime validation; assertions can be disabled with Python optimization flags.
    Resolution: Replace assert isinstance(recipe, ModelOptPTQRecipe) with explicit validation (if not isinstance(...): raise ValueError(...)) so the check cannot be bypassed by optimization flags.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): The PR title accurately describes the main change, adding recipe-based quantization support to vLLM fakequant, which is reflected across all modified files (README, fakequant_worker.py, vllm_ptq_utils.py, vllm_serve_fakequant.py).
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate; docstring coverage check skipped.
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1233/

Built to branch gh-pages at 2026-04-10 19:47 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@kinjalpatel27 kinjalpatel27 marked this pull request as ready for review April 10, 2026 19:54
@kinjalpatel27 kinjalpatel27 requested a review from a team as a code owner April 10, 2026 19:54
@kinjalpatel27 kinjalpatel27 requested a review from sugunav14 April 10, 2026 19:54
@codecov

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.58%. Comparing base (3baa2da) to head (531fa3c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1233      +/-   ##
==========================================
+ Coverage   76.03%   77.58%   +1.54%     
==========================================
  Files         350      350              
  Lines       40469    40537      +68     
==========================================
+ Hits        30772    31449     +677     
+ Misses       9697     9088     -609     
Flag Coverage Δ
examples 44.12% <ø> (+2.76%) ⬆️
unit 55.51% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/vllm_serve/vllm_ptq_utils.py`:
- Around line 145-150: Replace the unsafe assert in the recipe validation with
an explicit runtime check: after calling
load_recipe(quant_config["recipe_path"]) and assigning to recipe, verify type
with if not isinstance(recipe, ModelOptPTQRecipe): raise a ValueError containing
the same descriptive message (including the actual type and recipe path) and
then set quant_cfg = recipe.quantize; this ensures load_recipe,
ModelOptPTQRecipe, quant_cfg and recipe behavior remains the same but prevents
the check from being skipped under Python optimizations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fe9bb658-ad3b-4a06-aa28-6e2bb5155f6f

📥 Commits

Reviewing files that changed from the base of the PR and between da0e8ff and 531fa3c.

📒 Files selected for processing (4)
  • examples/vllm_serve/README.md
  • examples/vllm_serve/fakequant_worker.py
  • examples/vllm_serve/vllm_ptq_utils.py
  • examples/vllm_serve/vllm_serve_fakequant.py

Comment on lines +145 to +150
if quant_config["recipe_path"]:
    recipe = load_recipe(quant_config["recipe_path"])
    assert isinstance(recipe, ModelOptPTQRecipe), (
        f"Expected PTQ recipe, but got {type(recipe).__name__} from {quant_config['recipe_path']}"
    )
    quant_cfg = recipe.quantize
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current assertion usage in this path.
rg -n -C2 'assert isinstance\(recipe, ModelOptPTQRecipe\)' --type=py

# Demonstrate Python optimization removing assert checks.
python - <<'PY'
src = """def f(x):
    assert isinstance(x, int), "bad type"
    return x
"""
ns = {}
exec(compile(src, "<inline>", "exec", optimize=1), ns)
import dis
dis.dis(ns["f"])
PY

Repository: NVIDIA/Model-Optimizer

Length of output: 1818


Use explicit exception instead of assert for runtime validation.

At line 147, using assert isinstance() is unsafe because assertions can be disabled when Python runs with optimization flags (e.g., python -O), allowing invalid recipe types to bypass this check silently. Use an explicit if/raise ValueError() pattern instead.

Proposed fix
     if quant_config["recipe_path"]:
         recipe = load_recipe(quant_config["recipe_path"])
-        assert isinstance(recipe, ModelOptPTQRecipe), (
-            f"Expected PTQ recipe, but got {type(recipe).__name__} from {quant_config['recipe_path']}"
-        )
+        if not isinstance(recipe, ModelOptPTQRecipe):
+            raise ValueError(
+                f"Expected PTQ recipe, but got {type(recipe).__name__} from {quant_config['recipe_path']}"
+            )
         quant_cfg = recipe.quantize
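As a self-contained illustration of the reviewer's point (independent of this repository's code), compiling with Python's optimize mode strips assert statements entirely, while an explicit if/raise check survives:

```python
# Demonstration that assert-based validation disappears under Python's
# optimize mode (the effect of running `python -O`), while an explicit
# if/raise check does not. Names here are illustrative.

def check_with_raise(x):
    if not isinstance(x, int):
        raise ValueError(f"Expected int, got {type(x).__name__}")
    return x

# Compile an assert-based validator with optimization enabled
# (optimize=1 mimics `python -O`): the assert is removed from the
# compiled bytecode entirely.
src = (
    "def check_with_assert(x):\n"
    "    assert isinstance(x, int), 'Expected int'\n"
    "    return x\n"
)
ns = {}
exec(compile(src, "<demo>", "exec", optimize=1), ns)
check_with_assert_optimized = ns["check_with_assert"]

# Under optimization, the invalid input slips through silently:
print(check_with_assert_optimized("not an int"))  # prints: not an int

# The explicit check still raises:
try:
    check_with_raise("not an int")
except ValueError as e:
    print(e)  # prints: Expected int, got str
```

This is why the fix proposed above cannot be bypassed by interpreter flags, whereas the original assert can.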

Collaborator

@shengliangxu shengliangxu left a comment


LGTM
