
[1/N] Polish deployment skills - Add a debug loop for unsupported models#1236

Open
Edwardf0t1 wants to merge 1 commit into `main` from `zhiyu/polish-deployment-skills`

Conversation

Contributor

@Edwardf0t1 Edwardf0t1 commented Apr 11, 2026

What does this PR do?

Type of change: Skills update

Add a debug loop guide for deploying unsupported models to the deployment skill. When deploying models not in the validated support matrix (e.g., newly quantized VLMs or models with new architectures like Devstral/ministral3), the inference framework (vLLM, SGLang, TRT-LLM) often fails during model init or weight loading.

This PR adds:

  • references/unsupported-models.md — a 5-step iterative debug workflow: run → read error → diagnose → patch framework source → re-run
  • A short pointer in SKILL.md under "Unsupported Models" (keeps SKILL.md concise, matching the PTQ skill's pattern)
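The 5-step loop can be pictured as a small driver around the serve command. The sketch below is illustrative only; the function names and the `run` / `diagnose_and_patch` callback split are my own, not from the guide:

```python
# Illustrative sketch of the 5-step debug loop from the guide:
# run -> read error -> diagnose -> patch framework source -> re-run.
# The callback split (run / diagnose_and_patch) is hypothetical.
def debug_loop(run, diagnose_and_patch, max_iterations=5):
    for attempt in range(1, max_iterations + 1):
        ok, error = run()                            # step 1: run the server
        if ok:
            return attempt                           # server came up
        print(f"attempt {attempt} failed: {error}")  # step 2: read the error
        diagnose_and_patch(error)                    # steps 3-4: diagnose, patch
        # step 5: the loop re-runs automatically
    return None                                      # give up / escalate


# Simulated usage: the "server" starts on the third try, mirroring the
# 3 iterations reported in the Testing section below.
state = {"tries": 0}

def fake_run():
    state["tries"] += 1
    return (state["tries"] >= 3, "KeyError: 'model.language_model.layers.0'")

print(debug_loop(fake_run, lambda err: None))
# prints 3 after two failed attempts
```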

The guide covers five common error categories with real-world examples:

  • Weight key mismatches (e.g., vllm#39406)
  • Quantized/unquantized layer confusion (e.g., sglang#18937)
  • Missing architecture support (e.g., ministral3 not handled in vLLM's mistral3.py)
  • Transformers version mismatches
  • Kernel-level issues (escalate to framework team)
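As a rough illustration of how these five categories map onto error signatures, here is a hypothetical classifier. The regex patterns are loose approximations of the symptoms listed above, not taken from the guide:

```python
import re

# Hypothetical mapping from a traceback tail to the five error categories
# above. Patterns are illustrative; real tracebacks vary by framework.
CATEGORIES = [
    (r"Unexpected key|Missing key|KeyError.*layers", "weight key mismatch"),
    (r"dtype|shape mismatch", "quantized/unquantized layer confusion"),
    (r"NoneType is not iterable|unknown architecture", "missing architecture support"),
    (r"ImportError|KeyError.*config", "transformers version mismatch"),
]

def diagnose(traceback_tail: str) -> str:
    for pattern, category in CATEGORIES:
        if re.search(pattern, traceback_tail):
            return category
    return "kernel-level issue: escalate to the framework team"

print(diagnose("KeyError: 'model.language_model.layers.0.self_attn'"))
# prints: weight key mismatch
```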

Motivated by deploying a Devstral-Small-2-24B NVFP4 checkpoint on vLLM, where vLLM's mistral3.py didn't handle ministral3 as a text backbone model type.

Testing

Validated end-to-end: NVFP4 quantization of Devstral-Small-2-24B → vLLM deployment on B100 GPUs with the debug loop (3 iterations to get the server running).

Before your PR is "Ready for review"

  • Is this change backward compatible?: N/A (documentation only)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A (skill documentation)
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Documentation
    • Added guidance for deploying models outside the supported matrix, including troubleshooting steps for weight key mismatches, layer quantization conflicts, and architecture mapping issues.
    • Provided iterative debugging workflow with clear escalation criteria to the framework team for kernel-level issues.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 11, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@coderabbitai
Contributor

coderabbitai bot commented Apr 11, 2026

📝 Walkthrough

Walkthrough

Added documentation for handling unsupported model deployments. A new troubleshooting guide was created at references/unsupported-models.md with iterative debugging workflows, root-cause categorization (weight key mismatches, quantization confusion, missing architecture support, version mismatches, kernel issues), and remediation instructions. The main deployment skill document was updated to reference this guide.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Deployment Documentation**<br>`.claude/skills/deployment/SKILL.md`, `.claude/skills/deployment/references/unsupported-models.md` | Added "Unsupported Models" section to main skill document and created new deployment guide with iterative debug workflow, root-cause categorization (weight key mismatches, quantization confusion, architecture gaps, version mismatches, kernel issues), and targeted remediation instructions for each failure mode. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: adding a debug workflow for unsupported models in the deployment skill. It is concise, specific, and clearly represents the primary purpose of this documentation-focused PR.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.
  • Security Anti-Patterns: ✅ Passed. This is a documentation-only PR with no Python code changes, no unsafe functions, and no new dependencies.



@Edwardf0t1 Edwardf0t1 marked this pull request as ready for review April 11, 2026 05:53
@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1236/

Built to branch gh-pages at 2026-04-11 05:54 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
.claude/skills/deployment/references/unsupported-models.md (1)

23-23: Avoid an absolute “never modify config/checkpoint” directive.

This is mostly good guidance, but written as an unconditional rule it can mislead when the checkpoint metadata/export is actually invalid. Consider softening to “prefer framework-side fixes first, unless checkpoint metadata is clearly malformed.”

✏️ Suggested wording
-Focus on **small, targeted patches** to the framework source. Do not modify `config.json` or the checkpoint — fix the framework's handling instead.
+Focus on **small, targeted patches** to the framework source. Prefer framework-side fixes first; only modify `config.json` or checkpoint artifacts when metadata/export defects are clearly identified.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/deployment/references/unsupported-models.md at line 23,
Replace the absolute directive "Do not modify `config.json` or the checkpoint —
fix the framework's handling instead." with a softened conditional phrasing that
prefers framework-side fixes but allows editing checkpoint/export when its
metadata is demonstrably malformed; e.g., change to "Prefer small, targeted
framework fixes first; only edit `config.json` or checkpoint exports when the
checkpoint metadata is clearly invalid or irreparably corrupt." Ensure the new
phrasing appears in place of the original sentence in the same paragraph so
guidance remains focused on small, targeted patches while permitting exceptions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: af792be5-6e2a-415c-9d31-cd69b946a8a3

📥 Commits

Reviewing files that changed from the base of the PR and between 82cf851 and 2732ff4.

📒 Files selected for processing (2)
  • .claude/skills/deployment/SKILL.md
  • .claude/skills/deployment/references/unsupported-models.md

| Error category | Typical symptoms | Root cause / fix |
| --- | --- | --- |
| **Weight key mismatch** | `KeyError`, `Unexpected key`, `Missing key` during weight loading | Checkpoint uses `model.language_model.layers.*` but framework expects `model.layers.*`. See [vllm#39406](https://github.com/vllm-project/vllm/pull/39406) |
| **Quantized/unquantized layer confusion** | Wrong layer type loaded, dtype errors, shape mismatches | Framework tries to load unquantized layers with FP4 kernel due to overly broad `quantization_config.ignore` patterns or missing ignore entries. See [sglang#18937](https://github.com/sgl-project/sglang/pull/18937) |
| **Missing architecture support** | `NoneType is not iterable`, `KeyError` on model type, unknown architecture | Framework's model handler doesn't recognize the text backbone type (e.g., `ministral3` not handled in vLLM's `mistral3.py` init). Fix: extend the model type mapping |
| **Transformers version mismatch** | `ImportError`, `KeyError` on config fields | Framework ships with older transformers that doesn't know the model type. Fix: upgrade transformers after installing the framework |


⚠️ Potential issue | 🟡 Minor

Fix grammar in the transformers mismatch guidance.

The sentence uses incorrect subject-verb agreement: “transformers that doesn’t know”. Update to “transformers that don’t know”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/deployment/references/unsupported-models.md at line 18,
Update the grammar in the table row titled "Transformers version mismatch":
change the phrase "transformers that doesn't know the model type" to
"transformers that don't know the model type" so the subject-verb agreement is
correct; edit the string in the markdown table entry under the error explanation
for Transformers version mismatch.


Copilot AI left a comment


Pull request overview

This PR updates the deployment skill documentation to add guidance for debugging and deploying models that are not in the validated support matrix (e.g., new architectures or newly-quantized checkpoints) when vLLM/SGLang/TRT-LLM fail during initialization or weight loading.

Changes:

  • Add an “Unsupported Models” section to the deployment skill with a pointer to a dedicated debug-loop guide.
  • Add a new reference doc describing a 5-step iterative workflow (run → read error → diagnose → patch → re-run) and common failure categories.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `.claude/skills/deployment/SKILL.md` | Adds a concise "Unsupported Models" pointer to the new reference guide. |
| `.claude/skills/deployment/references/unsupported-models.md` | New guide documenting an iterative debug loop and common error categories with examples. |


@@ -0,0 +1,63 @@
# Deploying Unsupported Models

When deploying a model not in the validated support matrix (`references/support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.

Copilot AI Apr 11, 2026


In this reference doc, the path references/support-matrix.md is relative to the current file, which already lives under references/. On GitHub this will resolve to references/references/support-matrix.md (broken link). Use a relative link to the sibling file instead (e.g., support-matrix.md).

Suggested change
When deploying a model not in the validated support matrix (`references/support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.
When deploying a model not in the validated support matrix (`support-matrix.md`), expect failures. This guide covers the iterative debug loop for getting unsupported models running on vLLM, SGLang, or TRT-LLM.

@codecov

codecov bot commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.14%. Comparing base (9050188) to head (2732ff4).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1236   +/-   ##
=======================================
  Coverage   72.14%   72.14%           
=======================================
  Files         350      350           
  Lines       40478    40478           
=======================================
  Hits        29202    29202           
  Misses      11276    11276           
| Flag | Coverage Δ |
| --- | --- |
| unit | 55.53% <ø> (ø) |

Flags with carried forward coverage won't be shown.


Edwardf0t1 added a commit that referenced this pull request Apr 12, 2026
- Add common/end-to-end-workflow.md documenting the PTQ → Deploy → Eval
  pipeline, workspace continuity, unsupported model handling, NEL
  deployment.command pattern, and NEL CI vs SLURM executor decision table
- Add cross-skill workspace flow to workspace-management.md
- Add "Next steps" to ptq/SKILL.md pointing to deployment/evaluation
- Add pipeline integration note to evaluation/SKILL.md

Depends on PR #1236 (deployment/references/unsupported-models.md).

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>