
[1/N] Polish deployment skills - Add a debug loop for unsupported models#1236

Open
Edwardf0t1 wants to merge 1 commit into `main` from `zhiyu/polish-deployment-skills`

Conversation

Contributor

@Edwardf0t1 Edwardf0t1 commented Apr 11, 2026

What does this PR do?

Type of change: Skills update

Add a debug loop guide for deploying unsupported models to the deployment skill. When deploying models not in the validated support matrix (e.g., newly quantized VLMs or models with new architectures like Devstral/ministral3), the inference framework (vLLM, SGLang, TRT-LLM) often fails during model init or weight loading.

This PR adds:

  • references/unsupported-models.md — a 5-step iterative debug workflow: run → read error → diagnose → patch framework source → re-run
  • A short pointer in SKILL.md under "Unsupported Models" (keeps SKILL.md concise, matching the PTQ skill's pattern)
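The 5-step loop can be pictured as a small driver around the serve command. The sketch below is illustrative only; the function names and the `run` / `diagnose_and_patch` callback split are my own, not from the guide:

```python
# Illustrative sketch of the 5-step debug loop from the guide:
# run -> read error -> diagnose -> patch framework source -> re-run.
# The callback split (run / diagnose_and_patch) is hypothetical.
def debug_loop(run, diagnose_and_patch, max_iterations=5):
    for attempt in range(1, max_iterations + 1):
        ok, error = run()                            # step 1: run the server
        if ok:
            return attempt                           # server came up
        print(f"attempt {attempt} failed: {error}")  # step 2: read the error
        diagnose_and_patch(error)                    # steps 3-4: diagnose, patch
        # step 5: the loop re-runs automatically
    return None                                      # give up / escalate


# Simulated usage: the "server" starts on the third try, mirroring the
# 3 iterations reported in the Testing section below.
state = {"tries": 0}

def fake_run():
    state["tries"] += 1
    return (state["tries"] >= 3, "KeyError: 'model.language_model.layers.0'")

print(debug_loop(fake_run, lambda err: None))
# prints 3 after two failed attempts
```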

The guide covers five common error categories with real-world examples:

  • Weight key mismatches (e.g., vllm#39406)
  • Quantized/unquantized layer confusion (e.g., sglang#18937)
  • Missing architecture support (e.g., ministral3 not handled in vLLM's mistral3.py)
  • Transformers version mismatches
  • Kernel-level issues (escalate to framework team)
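As a rough illustration of how these five categories map onto error signatures, here is a hypothetical classifier. The regex patterns are loose approximations of the symptoms listed above, not taken from the guide:

```python
import re

# Hypothetical mapping from a traceback tail to the five error categories
# above. Patterns are illustrative; real tracebacks vary by framework.
CATEGORIES = [
    (r"Unexpected key|Missing key|KeyError.*layers", "weight key mismatch"),
    (r"dtype|shape mismatch", "quantized/unquantized layer confusion"),
    (r"NoneType is not iterable|unknown architecture", "missing architecture support"),
    (r"ImportError|KeyError.*config", "transformers version mismatch"),
]

def diagnose(traceback_tail: str) -> str:
    for pattern, category in CATEGORIES:
        if re.search(pattern, traceback_tail):
            return category
    return "kernel-level issue: escalate to the framework team"

print(diagnose("KeyError: 'model.language_model.layers.0.self_attn'"))
# prints: weight key mismatch
```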

Motivated by deploying a Devstral-Small-2-24B NVFP4 checkpoint on vLLM, where vLLM's mistral3.py didn't handle ministral3 as a text backbone model type.

Testing

Validated end-to-end: NVFP4 quantization of Devstral-Small-2-24B → vLLM deployment on B100 GPUs with the debug loop (3 iterations to get the server running).

Before your PR is "Ready for review"

  • Is this change backward compatible?: N/A (documentation only)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A (skill documentation)
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Documentation
    • Added guidance for deploying models outside the supported matrix, including troubleshooting steps for weight key mismatches, layer quantization conflicts, and architecture mapping issues.
    • Provided iterative debugging workflow with clear escalation criteria to the framework team for kernel-level issues.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 11, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@coderabbitai
Contributor

coderabbitai bot commented Apr 11, 2026

📝 Walkthrough

Walkthrough

Added documentation for handling unsupported model deployments. A new troubleshooting guide was created at references/unsupported-models.md with iterative debugging workflows, root-cause categorization (weight key mismatches, quantization confusion, missing architecture support, version mismatches, kernel issues), and remediation instructions. The main deployment skill document was updated to reference this guide.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Deployment Documentation**<br>`.claude/skills/deployment/SKILL.md`, `.claude/skills/deployment/references/unsupported-models.md` | Added "Unsupported Models" section to main skill document and created new deployment guide with iterative debug workflow, root-cause categorization (weight key mismatches, quantization confusion, architecture gaps, version mismatches, kernel issues), and targeted remediation instructions for each failure mode. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: adding a debug workflow for unsupported models in the deployment skill. It is concise, specific, and clearly represents the primary purpose of this documentation-focused PR.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.
  • Security Anti-Patterns: ✅ Passed. This is a documentation-only PR with no Python code changes, no unsafe functions, and no new dependencies.



@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review April 11, 2026 05:53
@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1236/

Built to branch gh-pages at 2026-04-11 05:54 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
.claude/skills/deployment/references/unsupported-models.md (1)

23-23: Avoid an absolute “never modify config/checkpoint” directive.

This is mostly good guidance, but written as an unconditional rule it can mislead when the checkpoint metadata/export is actually invalid. Consider softening to “prefer framework-side fixes first, unless checkpoint metadata is clearly malformed.”

✏️ Suggested wording
-Focus on **small, targeted patches** to the framework source. Do not modify `config.json` or the checkpoint — fix the framework's handling instead.
+Focus on **small, targeted patches** to the framework source. Prefer framework-side fixes first; only modify `config.json` or checkpoint artifacts when metadata/export defects are clearly identified.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/deployment/references/unsupported-models.md at line 23,
Replace the absolute directive "Do not modify `config.json` or the checkpoint —
fix the framework's handling instead." with a softened conditional phrasing that
prefers framework-side fixes but allows editing checkpoint/export when its
metadata is demonstrably malformed; e.g., change to "Prefer small, targeted
framework fixes first; only edit `config.json` or checkpoint exports when the
checkpoint metadata is clearly invalid or irreparably corrupt." Ensure the new
phrasing appears in place of the original sentence in the same paragraph so
guidance remains focused on small, targeted patches while permitting exceptions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: af792be5-6e2a-415c-9d31-cd69b946a8a3

📥 Commits

Reviewing files that changed from the base of the PR and between 82cf851 and 2732ff4.

📒 Files selected for processing (2)
  • .claude/skills/deployment/SKILL.md
  • .claude/skills/deployment/references/unsupported-models.md

| Error category | Typical symptoms | Root cause / fix |
| --- | --- | --- |
| **Weight key mismatch** | `KeyError`, `Unexpected key`, `Missing key` during weight loading | Checkpoint uses `model.language_model.layers.*` but framework expects `model.layers.*`. See [vllm#39406](https://github.com/vllm-project/vllm/pull/39406) |
| **Quantized/unquantized layer confusion** | Wrong layer type loaded, dtype errors, shape mismatches | Framework tries to load unquantized layers with FP4 kernel due to overly broad `quantization_config.ignore` patterns or missing ignore entries. See [sglang#18937](https://github.com/sgl-project/sglang/pull/18937) |
| **Missing architecture support** | `NoneType is not iterable`, `KeyError` on model type, unknown architecture | Framework's model handler doesn't recognize the text backbone type (e.g., `ministral3` not handled in vLLM's `mistral3.py` init). Fix: extend the model type mapping |
| **Transformers version mismatch** | `ImportError`, `KeyError` on config fields | Framework ships with older transformers that doesn't know the model type. Fix: upgrade transformers after installing the framework |


⚠️ Potential issue | 🟡 Minor

Fix grammar in the transformers mismatch guidance.

The sentence uses incorrect subject-verb agreement: “transformers that doesn’t know”. Update to “transformers that don’t know”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/deployment/references/unsupported-models.md at line 18,
Update the grammar in the table row titled "Transformers version mismatch":
change the phrase "transformers that doesn't know the model type" to
"transformers that don't know the model type" so the subject-verb agreement is
correct; edit the string in the markdown table entry under the error explanation
for Transformers version mismatch.


Copilot AI left a comment


Pull request overview

This PR updates the deployment skill documentation to add guidance for debugging and deploying models that are not in the validated support matrix (e.g., new architectures or newly-quantized checkpoints) when vLLM/SGLang/TRT-LLM fail during initialization or weight loading.

Changes:

  • Add an “Unsupported Models” section to the deployment skill with a pointer to a dedicated debug-loop guide.
  • Add a new reference doc describing a 5-step iterative workflow (run → read error → diagnose → patch → re-run) and common failure categories.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `.claude/skills/deployment/SKILL.md` | Adds a concise "Unsupported Models" pointer to the new reference guide. |
| `.claude/skills/deployment/references/unsupported-models.md` | New guide documenting an iterative debug loop and common error categories with examples. |


@@ -0,0 +1,63 @@
# Deploying Unsupported Models

When deploying a model not in the validated support matrix (`references/support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.

Copilot AI Apr 11, 2026


In this reference doc, the path references/support-matrix.md is relative to the current file, which already lives under references/. On GitHub this will resolve to references/references/support-matrix.md (broken link). Use a relative link to the sibling file instead (e.g., support-matrix.md).

Suggested change
When deploying a model not in the validated support matrix (`references/support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.
When deploying a model not in the validated support matrix (`support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.

@codecov

codecov bot commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.14%. Comparing base (9050188) to head (2732ff4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1236   +/-   ##
=======================================
  Coverage   72.14%   72.14%           
=======================================
  Files         350      350           
  Lines       40478    40478           
=======================================
  Hits        29202    29202           
  Misses      11276    11276           
| Flag | Coverage Δ |
| --- | --- |
| unit | 55.53% <ø> (ø) |

Flags with carried forward coverage won't be shown.


Edwardf0t1 added a commit that referenced this pull request Apr 12, 2026
- Add common/end-to-end-workflow.md documenting the PTQ → Deploy → Eval
  pipeline, workspace continuity, unsupported model handling, NEL
  deployment.command pattern, and NEL CI vs SLURM executor decision table
- Add cross-skill workspace flow to workspace-management.md
- Add "Next steps" to ptq/SKILL.md pointing to deployment/evaluation
- Add pipeline integration note to evaluation/SKILL.md

Depends on PR #1236 (deployment/references/unsupported-models.md).

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>