[1/N] Polish deployment skills - Add a debug loop for unsupported models #1236
Edwardf0t1 wants to merge 1 commit into main
Conversation
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
📝 Walkthrough
Added documentation for handling unsupported model deployments. A new troubleshooting guide was created.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
.claude/skills/deployment/references/unsupported-models.md (1)
23-23: Avoid an absolute “never modify config/checkpoint” directive.
This is mostly good guidance, but written as an unconditional rule it can mislead when the checkpoint metadata/export is actually invalid. Consider softening to “prefer framework-side fixes first, unless checkpoint metadata is clearly malformed.”
✏️ Suggested wording
```diff
-Focus on **small, targeted patches** to the framework source. Do not modify `config.json` or the checkpoint — fix the framework's handling instead.
+Focus on **small, targeted patches** to the framework source. Prefer framework-side fixes first; only modify `config.json` or checkpoint artifacts when metadata/export defects are clearly identified.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.claude/skills/deployment/references/unsupported-models.md at line 23, Replace the absolute directive "Do not modify `config.json` or the checkpoint — fix the framework's handling instead." with a softened conditional phrasing that prefers framework-side fixes but allows editing checkpoint/export when its metadata is demonstrably malformed; e.g., change to "Prefer small, targeted framework fixes first; only edit `config.json` or checkpoint exports when the checkpoint metadata is clearly invalid or irreparably corrupt." Ensure the new phrasing appears in place of the original sentence in the same paragraph so guidance remains focused on small, targeted patches while permitting exceptions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: af792be5-6e2a-415c-9d31-cd69b946a8a3
📒 Files selected for processing (2)
- .claude/skills/deployment/SKILL.md
- .claude/skills/deployment/references/unsupported-models.md
| **Weight key mismatch** | `KeyError`, `Unexpected key`, `Missing key` during weight loading | Checkpoint uses `model.language_model.layers.*` but framework expects `model.layers.*`. See [vllm#39406](https://github.com/vllm-project/vllm/pull/39406) |
| **Quantized/unquantized layer confusion** | Wrong layer type loaded, dtype errors, shape mismatches | Framework tries to load unquantized layers with FP4 kernel due to overly broad `quantization_config.ignore` patterns or missing ignore entries. See [sglang#18937](https://github.com/sgl-project/sglang/pull/18937) |
| **Missing architecture support** | `NoneType is not iterable`, `KeyError` on model type, unknown architecture | Framework's model handler doesn't recognize the text backbone type (e.g., `ministral3` not handled in vLLM's `mistral3.py` init). Fix: extend the model type mapping |
| **Transformers version mismatch** | `ImportError`, `KeyError` on config fields | Framework ships with older transformers that doesn't know the model type. Fix: upgrade transformers after installing the framework |
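As an illustration of the weight-key-mismatch row above, a minimal remapping shim might look like the sketch below. The function name and the key prefixes are assumptions for illustration; real checkpoints and frameworks use their own naming schemes.

```python
# Sketch: strip an extra "language_model" scope from checkpoint weight
# keys so they match the flat names a framework expects.
# Prefixes are illustrative, not any framework's actual layout.
def remap_weight_keys(state_dict):
    remapped = {}
    for key, tensor in state_dict.items():
        # e.g. "model.language_model.layers.0.attn.q" -> "model.layers.0.attn.q"
        new_key = key.replace("model.language_model.layers.", "model.layers.", 1)
        remapped[new_key] = tensor
    return remapped

weights = {
    "model.language_model.layers.0.attn.q": "tensor-0",
    "lm_head.weight": "tensor-1",  # keys without the scope pass through unchanged
}
print(sorted(remap_weight_keys(weights)))
```

In practice such a shim would live in the framework's weight-loading path, which is why the guide frames it as a small, targeted framework patch.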
Fix grammar in the transformers mismatch guidance.
The sentence uses incorrect subject-verb agreement: “transformers that doesn’t know”. Update to “transformers that don’t know”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.claude/skills/deployment/references/unsupported-models.md at line 18,
Update the grammar in the table row titled "Transformers version mismatch":
change the phrase "transformers that doesn't know the model type" to
"transformers that don't know the model type" so the subject-verb agreement is
correct; edit the string in the markdown table entry under the error explanation
for Transformers version mismatch.
Pull request overview
This PR updates the deployment skill documentation to add guidance for debugging and deploying models that are not in the validated support matrix (e.g., new architectures or newly quantized checkpoints) when vLLM/SGLang/TRT-LLM fail during initialization or weight loading.
Changes:
- Add an “Unsupported Models” section to the deployment skill with a pointer to a dedicated debug-loop guide.
- Add a new reference doc describing a 5-step iterative workflow (run → read error → diagnose → patch → re-run) and common failure categories.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| .claude/skills/deployment/SKILL.md | Adds a concise “Unsupported Models” pointer to the new reference guide. |
| .claude/skills/deployment/references/unsupported-models.md | New guide documenting an iterative debug loop and common error categories with examples. |
```diff
@@ -0,0 +1,63 @@
+# Deploying Unsupported Models
+
+When deploying a model not in the validated support matrix (`references/support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.
```
In this reference doc, the path references/support-matrix.md is relative to the current file, which already lives under references/. On GitHub this will resolve to references/references/support-matrix.md (broken link). Use a relative link to the sibling file instead (e.g., support-matrix.md).
```diff
-When deploying a model not in the validated support matrix (`references/support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.
+When deploying a model not in the validated support matrix (`support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.
```
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main    #1236   +/-   ##
=======================================
  Coverage   72.14%   72.14%
=======================================
  Files         350      350
  Lines       40478    40478
=======================================
  Hits        29202    29202
  Misses      11276    11276
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
- Add common/end-to-end-workflow.md documenting the PTQ → Deploy → Eval pipeline, workspace continuity, unsupported model handling, NEL deployment.command pattern, and NEL CI vs SLURM executor decision table
- Add cross-skill workspace flow to workspace-management.md
- Add "Next steps" to ptq/SKILL.md pointing to deployment/evaluation
- Add pipeline integration note to evaluation/SKILL.md

Depends on PR #1236 (deployment/references/unsupported-models.md).

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
What does this PR do?
Type of change: Skills update
Add a debug loop guide for deploying unsupported models to the deployment skill. When deploying models not in the validated support matrix (e.g., newly quantized VLMs or models with new architectures like Devstral/ministral3), the inference framework (vLLM, SGLang, TRT-LLM) often fails during model init or weight loading.
This PR adds:
- `references/unsupported-models.md` — a 5-step iterative debug workflow: run → read error → diagnose → patch framework source → re-run
- A pointer in `SKILL.md` under "Unsupported Models" (keeps SKILL.md concise, matching the PTQ skill's pattern)

The guide covers five common error categories with real-world examples:
- Missing architecture support (e.g., `ministral3` not handled in vLLM's `mistral3.py`)

Motivated by deploying a Devstral-Small-2-24B NVFP4 checkpoint on vLLM, where vLLM's `mistral3.py` didn't handle `ministral3` as a text backbone model type.

Testing
Validated end-to-end: NVFP4 quantization of Devstral-Small-2-24B → vLLM deployment on B100 GPUs with the debug loop (3 iterations to get the server running).
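The `ministral3` fix mentioned above boils down to extending a model-type lookup in the framework's handler. A hypothetical sketch of that kind of targeted patch (the registry name, entries, and class names are made up for illustration and are not vLLM's actual code):

```python
# Hypothetical text-backbone registry inside a multimodal model handler.
TEXT_BACKBONES = {
    "mistral": "MistralModel",
    "mistral3": "MistralModel",
}

def resolve_text_backbone(model_type: str) -> str:
    """Look up the handler class name for a text backbone model type."""
    try:
        return TEXT_BACKBONES[model_type]
    except KeyError:
        # This is the failure mode the guide describes as
        # "missing architecture support".
        raise ValueError(f"unknown text backbone: {model_type}")

# The small, targeted patch: register the new type on the existing handler.
TEXT_BACKBONES["ministral3"] = "MistralModel"
print(resolve_text_backbone("ministral3"))
```

The point of the debug loop is that a one-line registration like this is often all that separates an "unsupported" checkpoint from a running server.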
Before your PR is "Ready for review"
- CONTRIBUTING.md: N/A

Summary by CodeRabbit