# End-to-End Workflow: PTQ → Deploy → Eval

This document ties together the three domain skills (PTQ, Deployment, Evaluation) for the common workflow of quantizing a model, deploying it, and evaluating accuracy.

## Pipeline Overview

```text
PTQ (quantize)        →    Deployment (serve)        →    Evaluation (benchmark)
──────────────             ──────────────────             ──────────────────────
hf_ptq.py                  vLLM / SGLang / TRT-LLM        NEL (SLURM or JET)
    ↓                          ↓                              ↓
NVFP4/FP8 checkpoint       OpenAI-compatible API          MMLU, GSM8K, GPQA scores
(safetensors)              (http://host:8000)             (results.yml)
```
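
As a minimal sketch of the three stages chained together (the model name, workspace path, and most flags here are illustrative, not the canonical interface; check each tool's `--help` and the per-stage skill docs):

```bash
# Stage 1: quantize. hf_ptq.py is the ModelOpt PTQ entry point; flag names
# vary across ModelOpt versions, so treat these as placeholders.
python hf_ptq.py \
    --pyt_ckpt_path meta-llama/Llama-3.1-8B-Instruct \
    --qformat nvfp4 \
    --export_path workspaces/llama-3.1-8b-nvfp4/output

# Stage 2: serve. Any OpenAI-compatible server works; vLLM shown here.
vllm serve workspaces/llama-3.1-8b-nvfp4/output --port 8000

# Stage 3: evaluate. The NEL invocation is intentionally left as a stub;
# see evaluation/references/nel-ci-guide.md for the real launcher usage.
# nel run --config workspaces/llama-3.1-8b-nvfp4/eval_config.yaml  # hypothetical
```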

## Workspace Continuity

All three stages share the same workspace directory. The PTQ output becomes the deployment input, and eval results land alongside:

```text
workspaces/model-name-format/
  output/           ← PTQ checkpoint (safetensors + config.json)
  eval_results/     ← NEL evaluation artifacts (results.yml per task)
  eval_config.yaml  ← NEL config for evaluation
  scripts/          ← Custom run scripts (if needed)
  logs/             ← SLURM job logs
```

When starting a deployment or evaluation step, always check for an existing workspace from a prior PTQ run:

```bash
ls workspaces/
```
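
A quick follow-up sanity check that the PTQ stage actually left a usable checkpoint behind before you serve it (the workspace name is a placeholder):

```bash
# Pick up an existing workspace and confirm the PTQ output is complete:
# the tree above says output/ should hold safetensors plus config.json.
WS=workspaces/model-name-format
ls "$WS"/output/*.safetensors "$WS"/output/config.json
```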

## Unsupported Models

Models not in the verified support matrices require extra work at each stage:

| Stage | What can go wrong | Reference |
|-------|-------------------|-----------|
| **PTQ** | Unknown architecture, FP8 source checkpoint, VLM structure | `ptq/references/unsupported-models.md` |
| **Deployment** | Missing architecture mapping, weight key mismatches, quant/unquant layer confusion | `deployment/references/unsupported-models.md` |
| **Evaluation** | Framework patches needed in deployment container, gated datasets, cluster storage | `evaluation/references/nel-ci-guide.md` |

Each stage has its own debug loop (run → read error → diagnose → patch → re-run). Fixes from one stage often inform the next: if PTQ required a transformers upgrade, deployment and evaluation likely will too.
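
For instance, a transformers pin discovered during PTQ usually has to be replayed inside the serving and eval containers; a sketch, reusing the pin from the NEL override example below:

```bash
# Replay the PTQ-stage dependency fix in the deployment/eval containers.
# The exact pin comes from whatever the PTQ debug loop settled on.
pip install "transformers>=5.0.0.dev0" --pre -q
```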

## NEL Evaluation with Custom Deployments

When the serving framework needs runtime patches (e.g., transformers upgrade, model handler fix), override `deployment.command` in the NEL config to inject fixes before serving:

```yaml
deployment:
  command: >-
    pip install "transformers>=5.0.0.dev0" --pre -q &&
    sed -i 's/old_pattern/new_pattern/' /path/to/framework/file.py &&
    ${deployment.base_command}
```

This works with both the NEL SLURM executor and NEL CI (via `NEL_DEPLOYMENT_COMMAND`).
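
For the CI path, one hedged way this could look (how the variable is consumed is CI-specific; see the NEL CI guide):

```bash
# NEL CI: the same patch-then-serve command, supplied as a pipeline variable.
# Single quotes keep ${deployment.base_command} literal for NEL to expand later.
export NEL_DEPLOYMENT_COMMAND='pip install "transformers>=5.0.0.dev0" --pre -q && ${deployment.base_command}'
```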

## Decision: NEL SLURM Executor vs NEL CI (JET)

| Factor | NEL SLURM executor | NEL CI (JET) |
|--------|--------------------|--------------|
| **When to use** | Iterative debugging, checkpoint on a non-JET cluster, custom patches needed | Production evals, MLflow tracking, reproducible configs |
| **Checkpoint location** | Any cluster you have SSH access to | Must be on JET cluster `/lustre/` storage |
| **Secrets (HF_TOKEN, NGC)** | Provide your own via `host:` env vars | Managed centrally via JET secrets |
| **Container patches** | Override `deployment.command` | Use `NEL_DEPLOYMENT_COMMAND` |
| **MLflow export** | Manual setup | Automatic |
| **Gated datasets** | Your HF account needs access | Handled by `COMPEVAL_HF_TOKEN` |
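
If you take the SLURM-executor path, the secrets and gated-dataset rows are on you. A sketch of supplying your own credentials before launch (`HF_TOKEN` comes from the table above; the NGC variable name is an assumption, and NEL's `host:` section is where these ultimately get wired in):

```bash
# Export credentials before launching the NEL SLURM executor. HF_TOKEN must
# belong to an account that has accepted the gated datasets' terms.
export HF_TOKEN=hf_...          # Hugging Face token for gated datasets/models
export NGC_API_KEY=nvapi-...    # assumed variable name for the NGC credential
```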