refactor(examples): rename llm_ptq → hf_ptq (symlink for back-compat) #1759
Edwardf0t1 wants to merge 1 commit
The example covers Hugging Face LLM and VLM PTQ, so "llm_ptq" is a misnomer since the vlm_ptq consolidation. Rename the directory to examples/hf_ptq and leave a relative symlink examples/llm_ptq -> hf_ptq so existing paths and commands keep working through a deprecation window.

- git mv examples/llm_ptq -> examples/hf_ptq and tests/examples/llm_ptq -> tests/examples/hf_ptq (CI maps the matrix name to both examples/<name> and tests/examples/<name>).
- Add back-compat symlink examples/llm_ptq -> hf_ptq (tracked as a symlink).
- Update CI matrices and all repo path references (docs, READMEs, skills, launcher/debugger tools, tests) from llm_ptq to hf_ptq. Python identifiers and test-util module names (run_llm_ptq_command, llm_ptq_utils) are kept.
- Preserve the CODEOWNERS team slug and historical CHANGELOG entries; add a CHANGELOG deprecation note for the rename.

Follow-up to the examples/vlm_ptq -> examples/llm_ptq consolidation (#1705), targeted for the same 0.46 release.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
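The rename-plus-relative-symlink approach described in the commit message can be sketched at the filesystem level as follows. This is only an illustration under stated assumptions: the PR itself uses `git mv`, the helper name `rename_with_symlink` and the temporary directory layout are hypothetical, and the sketch assumes a POSIX-like system where creating symlinks needs no special privileges.

```python
import os
import tempfile
from pathlib import Path


def rename_with_symlink(root: Path, old: str, new: str) -> None:
    """Rename root/old to root/new and leave a relative symlink old -> new.

    Hypothetical helper for illustration; the PR performs the rename with git mv.
    """
    (root / old).rename(root / new)
    # The link target is relative ("hf_ptq", not an absolute path), so the
    # link keeps working wherever the repository is checked out.
    os.symlink(new, root / old, target_is_directory=True)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the repository's examples/ directory.
    examples = Path(tmp) / "examples"
    (examples / "llm_ptq").mkdir(parents=True)
    (examples / "llm_ptq" / "hf_ptq.py").write_text("print('ptq')\n")

    rename_with_symlink(examples, "llm_ptq", "hf_ptq")

    legacy = examples / "llm_ptq" / "hf_ptq.py"
    current = examples / "hf_ptq" / "hf_ptq.py"
    link_target = os.readlink(examples / "llm_ptq")
    # Both paths should resolve to the same file on disk.
    same_file = legacy.resolve() == current.resolve()
    print(link_target, same_file)
```

Because the link target is stored as the relative name `hf_ptq`, legacy paths under `examples/llm_ptq/` resolve to the renamed directory without copying any files.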
📝 Walkthrough
Renames the PTQ example directory from llm_ptq to hf_ptq.
Changes
llm_ptq → hf_ptq Rename and vlm_ptq Consolidation
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.agents/skills/ptq/SKILL.md (1)
108-108: ⚠️ Potential issue | 🟡 Minor
Correct the launcher script path to match the actual template location.
Line 108 references `common/hf_ptq/hf_ptq.sh`, which does not exist. The correct path is `common/hf/ptq.sh`, as documented in the launcher guide itself. Replace `common/hf_ptq/hf_ptq.sh` with `common/hf/ptq.sh`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.agents/skills/ptq/SKILL.md at line 108, the launcher script path referenced in the SKILL.md file is incorrect. Locate the reference to `common/hf_ptq/hf_ptq.sh` on line 108 and replace it with the correct path `common/hf/ptq.sh` to match the actual template location documented in the launcher guide.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.agents/skills/ptq/references/unsupported-models.md:
- Line 150: In the unsupported-models.md file, replace the broken documentation
link `examples/hf_ptq/moe.md` with the correct path `examples/hf_ptq/README.md`.
The README file contains the actual MoE quantization documentation that should
be referenced instead of the non-existent moe.md file.
In `@README.md`:
- Line 33: The README.md file contains anchor links pointing to sections in
examples/hf_ptq/README.md that do not exist as markdown headings. To fix this,
either add the missing sections (`#llama-4`,
`#model-quantization-and-trt-llm-conversion`, and
`#deploy-fp8-quantized-model-using-vllm`) as properly formatted markdown headings
in examples/hf_ptq/README.md, or update the links in the main README to
reference only the existing anchors (`#support-matrix` and
`#hugging-face-supported-models`). Choose the approach that best maintains the
documentation structure and user experience.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 36b1e7eb-b9ba-4047-a71d-d2be1dedff6f
📒 Files selected for processing (65)
- .agents/skills/common/environment-setup.md
- .agents/skills/deployment/references/support-matrix.md
- .agents/skills/ptq/SKILL.md
- .agents/skills/ptq/references/slurm-setup-ptq.md
- .agents/skills/ptq/references/unsupported-models.md
- .github/CODEOWNERS
- .github/workflows/_example_tests_runner.yml
- .github/workflows/example_tests.yml
- CHANGELOG.rst
- README.md
- docs/source/deployment/3_unified_hf.rst
- docs/source/guides/10_recipes.rst
- docs/source/guides/_compress_quantized_models.rst
- docs/source/guides/_customized_model_quantization.rst
- docs/source/index.rst
- examples/deepseek/README.md
- examples/deepseek/deepseek_v4/quantize_to_nvfp4.py
- examples/gpt-oss/README.md
- examples/hf_ptq/.gitignore
- examples/hf_ptq/README.md
- examples/hf_ptq/cast_mxfp4_to_nvfp4.py
- examples/hf_ptq/example_utils.py
- examples/hf_ptq/fsdp2.yaml
- examples/hf_ptq/hf_ptq.py
- examples/hf_ptq/multinode_ptq.py
- examples/hf_ptq/nemotron_vl_calib.py
- examples/hf_ptq/notebooks/1_FP4-FP8_PTQ_Min-Max_Calibration.ipynb
- examples/hf_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb
- examples/hf_ptq/notebooks/3_PTQ_AutoQuantization.ipynb
- examples/hf_ptq/requirements.txt
- examples/hf_ptq/run_tensorrt_llm.py
- examples/hf_ptq/scripts/huggingface_example.sh
- examples/hf_ptq/scripts/parser.sh
- examples/hf_ptq/vlm_utils.py
- examples/llm_eval/README.md
- examples/llm_ptq
- examples/llm_qat/README.md
- examples/llm_qat/llama_factory/README.md
- examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
- examples/megatron_bridge/quantize.py
- examples/model_hub/README.md
- examples/pruning/minitron/NVIDIA-Nemotron-Nano-9B-v2/README.md
- examples/speculative_decoding/README.md
- examples/vllm_serve/README.md
- examples/vlm_ptq/README.md
- examples/vlm_ptq/scripts/huggingface_example.sh
- modelopt/recipe/presets.py
- modelopt/torch/quantization/utils/numeric_utils.py
- tests/_test_utils/examples/llm_ptq_example_utils.py
- tests/_test_utils/examples/run_command.py
- tests/examples/hf_ptq/_extensions/test_torch_extensions.py
- tests/examples/hf_ptq/test_cast_mxfp4_to_nvfp4.py
- tests/examples/hf_ptq/test_deploy.py
- tests/examples/hf_ptq/test_example_utils.py
- tests/examples/hf_ptq/test_hf_ptq_args.py
- tests/examples/hf_ptq/test_llm_ptq.py
- tests/examples/hf_ptq/test_vlm_ptq.py
- tests/examples/speculative_decoding/test_eagle_offline_ptq.py
- tests/gpu/torch/export/test_unified_hf_export_and_check_safetensors.py
- tests/gpu/torch/quantization/test_gpt_oss_mxfp4_nvfp4_cast_cuda.py
- tools/debugger/CLAUDE.md
- tools/debugger/README.md
- tools/launcher/common/eagle3/hf_ptq.sh
- tools/launcher/common/hf/ptq.sh
- tools/launcher/examples/Qwen/Qwen3-8B/hf_ptq.yaml
  | Fused weights + `forward(x, expert_id)` | Expand + reconstruct on export | `_QuantMoELinear` (Step3.5) |

- For the full guide, see `examples/llm_ptq/moe.md`.
+ For the full guide, see `examples/hf_ptq/moe.md`.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the moe.md file exists
ls -la examples/hf_ptq/moe.md 2>&1
Repository: NVIDIA/Model-Optimizer
Length of output: 136
🏁 Script executed:
#!/bin/bash
# Check if the examples/hf_ptq directory exists
echo "=== Checking if examples/hf_ptq directory exists ==="
ls -la examples/hf_ptq 2>&1 || echo "Directory does not exist"
# Search for any moe.md file in the repository
echo -e "\n=== Searching for moe.md files in repository ==="
find . -name "*moe.md" -type f 2>/dev/null
# Check if there are any similar documentation files in examples directory
echo -e "\n=== Files in examples directory (if it exists) ==="
find examples -type f -name "*.md" 2>/dev/null | head -20
Repository: NVIDIA/Model-Optimizer
Length of output: 1834
🏁 Script executed:
#!/bin/bash
# Check the context around line 150 in the unsupported-models.md file
echo "=== Content around line 150 in unsupported-models.md ==="
sed -n '145,155p' .agents/skills/ptq/references/unsupported-models.md
# Check if hf_ptq/README.md contains MoE information
echo -e "\n=== Checking hf_ptq/README.md for MoE content ==="
grep -i "moe\|mixture" examples/hf_ptq/README.md | head -10
# List all files in hf_ptq directory to see what's available
echo -e "\n=== All files in examples/hf_ptq ==="
ls -1 examples/hf_ptq/
Repository: NVIDIA/Model-Optimizer
Length of output: 3168
Fix broken documentation link: examples/hf_ptq/moe.md does not exist.
The file examples/hf_ptq/moe.md referenced on line 150 is not present in the repository. The MoE handling guide is in examples/hf_ptq/README.md, which contains comprehensive documentation on MoE quantization strategies, expert layer handling, and supported MoE models. Update the link accordingly.
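A check like the one the review ran by hand (does a referenced path exist?) can also be sketched in Python. The regular expression, the helper name `broken_path_refs`, and the temporary directory layout are assumptions made for illustration; they are not part of the repository or of CodeRabbit.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical pattern: backticked, repo-relative paths under examples/.
PATH_REF = re.compile(r"`(examples/[\w./-]+)`")


def broken_path_refs(doc_text: str, repo_root: Path) -> list[str]:
    """Return referenced examples/ paths that do not exist under repo_root."""
    refs = PATH_REF.findall(doc_text)
    return [ref for ref in refs if not (repo_root / ref).exists()]


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in repository containing only the README, not moe.md.
    root = Path(tmp)
    (root / "examples" / "hf_ptq").mkdir(parents=True)
    (root / "examples" / "hf_ptq" / "README.md").write_text("# PTQ\n")

    doc = (
        "For the full guide, see `examples/hf_ptq/moe.md`.\n"
        "See also `examples/hf_ptq/README.md`.\n"
    )
    missing = broken_path_refs(doc, root)
    print(missing)
```

Run against a real checkout, a helper of this shape would flag references that a mechanical `llm_ptq` to `hf_ptq` substitution carried over from an already-broken link.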
 - [2026/04/15] Customer story: [Domyn compresses Colosseum-355B → 260B using ModelOpt's Minitron pruning + distillation](https://www.domyn.com/blog/domyn-large-the-journey-of-a-european-sovereign-ai-model-for-regulated-industries)
 - [2026/03/17] Customer story: [Bielik.AI builds Bielik Minitron 7B (33% smaller, 50% faster, 90% quality retained) using ModelOpt's Minitron pruning + distillation](https://bielik.ai/en/nvidia-gtc-bielik-minitron-premiere/)
-- [2026/03/11] Model Optimizer quantized Nemotron-3-Super checkpoints are available on Hugging Face for download: [FP8](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8), [NVFP4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4). Learn more in the [Nemotron 3 Super release blog](https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/). Check out how to quantize Nemotron 3 models for deployment acceleration [here](./examples/llm_ptq/README.md)
+- [2026/03/11] Model Optimizer quantized Nemotron-3-Super checkpoints are available on Hugging Face for download: [FP8](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8), [NVFP4](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4). Learn more in the [Nemotron 3 Super release blog](https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/). Check out how to quantize Nemotron 3 models for deployment acceleration [here](./examples/hf_ptq/README.md)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the hf_ptq README exists and contains expected sections
if [ -f examples/hf_ptq/README.md ]; then
  echo "=== Checking for expected anchors in examples/hf_ptq/README.md ==="
  grep -E '(llama-4|model-quantization|deploy-fp8|support-matrix|hugging-face-supported)' examples/hf_ptq/README.md || echo "WARNING: Some expected section headings may be missing"
else
  echo "ERROR: examples/hf_ptq/README.md does not exist"
fi
Repository: NVIDIA/Model-Optimizer
Length of output: 728
🏁 Script executed:
#!/bin/bash
# Check file existence and examine markdown headings
if [ -f examples/hf_ptq/README.md ]; then
  echo "=== File exists. Getting size ==="
  wc -l examples/hf_ptq/README.md
  echo -e "\n=== Searching for markdown headings (##) ==="
  grep -n "^##" examples/hf_ptq/README.md | head -20
  echo -e "\n=== Searching for specific anchor text patterns ==="
  grep -in -E "(llama|quantization.*trt|deploy.*fp8|support.*matrix|hugging.*face)" examples/hf_ptq/README.md | head -30
else
  echo "File does not exist"
fi
Repository: NVIDIA/Model-Optimizer
Length of output: 5799
Fix broken anchor links: three referenced sections do not exist in examples/hf_ptq/README.md.
The following anchors are referenced in the main README links but do not exist as markdown headings in examples/hf_ptq/README.md:
- `#llama-4`: no heading found (LLAMA 4 appears only as a table row)
- `#model-quantization-and-trt-llm-conversion`: no heading found
- `#deploy-fp8-quantized-model-using-vllm`: no heading found
Only #support-matrix (line 96) and #hugging-face-supported-models (line 98) exist.
Either add the missing sections as markdown headings to examples/hf_ptq/README.md, or update the links in the main README to point to existing anchors.
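The anchor check above can be approximated with a short script. This is a minimal sketch, assuming a simplified version of GitHub's heading-to-anchor rule (lowercase, drop punctuation, turn whitespace into hyphens); it ignores duplicate-heading suffixes and headings inside code fences. The helper names and the sample README text are hypothetical and only mirror the situation the review describes.

```python
import re


def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule (simplified assumption)."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation, keep hyphens
    return re.sub(r"\s", "-", text)  # each whitespace character becomes a hyphen


def missing_anchors(markdown: str, anchors: list[str]) -> list[str]:
    """Return anchors with no matching ATX heading in the markdown text."""
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    slugs = {github_slug(h) for h in headings}
    return [a for a in anchors if a.lstrip("#") not in slugs]


# Illustrative stand-in for examples/hf_ptq/README.md, not its real content.
readme = """\
# Post-training quantization
## Support Matrix
### Hugging Face Supported Models
| Llama 4 | FP8 |
"""
wanted = ["#support-matrix", "#hugging-face-supported-models", "#llama-4"]
result = missing_anchors(readme, wanted)
print(result)
```

A table row mentioning a model does not produce an anchor, which is why `#llama-4` is reported as missing in this sample.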
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1759 +/- ##
==========================================
+ Coverage 74.29% 76.65% +2.36%
==========================================
Files 511 511
Lines 56356 56356
==========================================
+ Hits 41868 43200 +1332
+ Misses 14488 13156 -1332
Flags with carried forward coverage won't be shown.
What does this PR do?
Type of change: refactor / deprecation (examples)
Follow-up to #1705 (which consolidated `examples/vlm_ptq` into `examples/llm_ptq`). Since that example now covers Hugging Face LLM and VLM PTQ, the `llm_ptq` name is a misnomer. This renames the directory to `examples/hf_ptq` and leaves a relative symlink `examples/llm_ptq → hf_ptq` so existing paths/commands keep working during a deprecation window.
Requested by @kevalmorabia97 on #1705 (with the symlink-for-back-compat approach), targeted for the same 0.46 release as the consolidation.
Changes
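- `git mv examples/llm_ptq → examples/hf_ptq` and `tests/examples/llm_ptq → tests/examples/hf_ptq` (the CI runner maps the matrix name to both `examples/<name>` and `tests/examples/<name>`).
- Add back-compat symlink `examples/llm_ptq → hf_ptq`.
- Update CI matrices and all repo path references (docs, READMEs, skills, launcher/debugger tools, tests) from `llm_ptq` to `hf_ptq`.
- Keep Python identifiers and test-util module names (`run_llm_ptq_command`, `llm_ptq_utils`): they name the LLM-PTQ task, not the directory.
- Preserve the CODEOWNERS team slug (`modelopt-examples-llm_ptq-codeowners`) and historical CHANGELOG entries; add a CHANGELOG deprecation note.
Back-compat caveats (inherent to git directory symlinks)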
- cwd/pytest resolution work through the symlink.
- `examples/llm_ptq/...` won't navigate in. All internal references are repointed to `hf_ptq`, so the symlink is only for legacy external/CLI use.
Usage (unchanged via symlink)
Testing
- `bash -n` on moved/edited shell scripts (new path + via symlink).
- `py_compile` on moved/edited Python; test re-export shim repointed to `examples/hf_ptq/example_utils`.
- `examples/llm_ptq` as a single symlink (mode 120000), not a duplicated tree (no pre-commit / pytest double-processing).
- `pre-commit run` on all changed files passes.
Before your PR is "Ready for review"
- `examples/llm_ptq` paths valid; see caveats above)
Additional Information
Follow-up (later release): remove the `examples/llm_ptq` symlink once external references have migrated.
🤖 Generated with Claude Code
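The symlink and `py_compile` checks listed under Testing can be sketched at the filesystem level as below. This is an illustration rather than the PR's actual test procedure: the helper name `is_symlink_entry` and the temporary layout are hypothetical, and the sketch assumes a POSIX-like system. Git records a symlink entry with mode 120000, which is the same value as the `S_IFLNK` file-type bits inspected here.

```python
import os
import py_compile
import stat
import tempfile
from pathlib import Path


def is_symlink_entry(path: Path) -> bool:
    """True if path itself is a symlink (file-type bits 0o120000)."""
    mode = os.lstat(path).st_mode  # lstat does not follow the link
    return stat.S_IFMT(mode) == stat.S_IFLNK


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in layout: a real hf_ptq directory plus the legacy llm_ptq link.
    examples = Path(tmp) / "examples"
    (examples / "hf_ptq").mkdir(parents=True)
    (examples / "hf_ptq" / "example_utils.py").write_text("X = 1\n")
    os.symlink("hf_ptq", examples / "llm_ptq", target_is_directory=True)

    link_ok = is_symlink_entry(examples / "llm_ptq")
    real_dir_is_link = is_symlink_entry(examples / "hf_ptq")
    type_bits = oct(stat.S_IFLNK)

    # Byte-compile via the legacy path; doraise=True raises PyCompileError
    # on a syntax error instead of only printing a message.
    compiled = py_compile.compile(
        str(examples / "llm_ptq" / "example_utils.py"),
        cfile=str(Path(tmp) / "example_utils.pyc"),
        doraise=True,
    )
    print(link_ok, real_dir_is_link, type_bits, compiled)
```

In a real checkout, the tracked mode of the entry can be inspected with `git ls-files -s examples/llm_ptq`.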
Summary by CodeRabbit
Release Notes
Deprecations
- `--vlm` flag within the main PTQ workflow. Legacy paths remain accessible via symlink for backward compatibility but will be removed in future releases.
Documentation