[1/N] Polish evaluation skills and common skills based on an E2E workflow testing by Edwardf0t1 · Pull Request #1239 · NVIDIA/Model-Optimizer

Edwardf0t1 · 2026-04-12T09:37:59Z

What does this PR do?

Type of change: Documentation

Polish evaluation and common skills based on end-to-end experience quantizing + deploying + evaluating Devstral-Small-2-24B-Instruct (NVFP4 MLP-only) on dlcluster B100 GPUs.

Depends on PR #1236 (creates deployment/references/unsupported-models.md referenced here). Merge #1236 first.

Changes

New files:

common/end-to-end-workflow.md — Cross-skill pipeline doc covering PTQ → Deploy → Eval workflow, workspace continuity, unsupported model handling across stages, NEL deployment.command pattern for runtime patches, and NEL CI vs SLURM executor decision table
evaluation/references/nel-ci-guide.md — Comprehensive NEL CI evaluation guide covering:
- NEL SLURM executor vs NEL CI (JET) execution paths
- Cluster reference table (oci-hsg, cw, oci-nrt, dlcluster) with GPUs, storage, walltime
- Checkpoint storage availability per cluster type (not all use /lustre/)
- NEL CI GitLab trigger pattern with NEL_DEPLOYMENT_COMMAND
- Env var prefixes (host:, lit:) for SLURM executor vs JET secrets
- Gated dataset handling (JET manages secrets centrally; SLURM needs user's own HF_TOKEN + dataset access)
- SGLang --host 0.0.0.0 requirement
- Common issues table (NGC auth, permissions, walltime, TP mismatch, etc.)
- JET cluster directory setup with chmod 777 for svc-jet

Updated files:

common/remote-execution.md — Added "Checkpoint and storage availability" section (compute nodes may not share login node filesystems; dlcluster uses /home/omniml_data_*, not /lustre/)
common/slurm-setup.md — Added "Container registry credentials (pyxis)" section for NGC enroot auth setup
common/workspace-management.md — Added "Cross-Skill Workspace Flow" showing directory layout across PTQ/deploy/eval stages
evaluation/SKILL.md — Added "NEL CI and Cluster-Specific Notes" pointer section; renamed workspace section to include pipeline integration note
ptq/SKILL.md — Added "Next steps" after Step 5 pointing to deployment/evaluation skills + end-to-end workflow

Motivation

Learnings from:

Quantizing Devstral-Small-2-24B (FP8 VLM → NVFP4 MLP-only) on dlcluster
Deploying on vLLM with ministral3 architecture patch
Evaluating with NEL SLURM executor (MMLU, GSM8K, GPQA) — including NGC auth, gated dataset access, env var prefixes
Previous NEL CI experience on oci-hsg cluster (from internal doc)

Testing

Validated end-to-end: PTQ (6 min) → vLLM deployment (3 debug iterations) → NEL evaluation (MMLU 77.4%, GSM8K 80%, GPQA 40% on limit_samples=10).

Before your PR is "Ready for review"

Is this change backward compatible?: N/A (documentation only)
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
Did you write any new necessary tests?: N/A (skill documentation)
Did you update Changelog?: N/A

Summary by CodeRabbit

Documentation
- Added an end-to-end PTQ → Deploy → Eval workflow with shared workspace layout and artifact locations
- Added checkpoint/storage availability guidance with cluster-specific paths and staging commands
- Documented container registry authentication for SLURM and an observed unauthorized failure mode
- Added runtime deployment-fix guidance (overrideable command/wrapper pattern) and SLURM vs CI decision table
- Expanded CI/cluster notes, troubleshooting, gated dataset handling, and PTQ next-step pointers for deployment/evaluation

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

copy-pr-bot · 2026-04-12T09:38:07Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-04-12T09:38:08Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 965a0781-79e4-454d-b5cc-1d278c6edab6

📥 Commits

Reviewing files that changed from the base of the PR and between 7dcede4 and 8176fc7.

📒 Files selected for processing (1)

.claude/skills/evaluation/references/nel-ci-guide.md

✅ Files skipped from review due to trivial changes (1)

.claude/skills/evaluation/references/nel-ci-guide.md

📝 Walkthrough

Walkthrough

Adds documentation tying PTQ → Deployment → Evaluation into a single pipeline: workspace conventions and artifact locations, cluster/CI execution guidance (SLURM vs NEL CI), registry/auth instructions, workspace persistence across stages, runtime deployment override patterns, and operational troubleshooting for JET and SLURM.

Changes

Cohort / File(s)	Summary
End-to-end workflow `.claude/skills/common/end-to-end-workflow.md`	New doc defining the PTQ → Deployment → Evaluation pipeline, shared workspace layout (`workspaces/<model>/`), expected PTQ/deploy/eval artifact locations, unsupported-model failure modes, and deployment override mechanisms (`deployment.command`, `NEL_DEPLOYMENT_COMMAND`).
Cluster & remote execution `.claude/skills/common/remote-execution.md`, `.claude/skills/common/slurm-setup.md`, `.claude/skills/evaluation/references/nel-ci-guide.md`	Added checkpoint accessibility checks and cluster storage path mappings with copy examples; Pyxis/Enroot registry auth steps and noted `srun` 401 failure mode; comprehensive NEL CI guide covering SLURM vs GitLab flows, GPU/tensor-parallel sizing, env-var/secret formats, gated dataset behavior, checkpoint transfer commands, and JET-specific operational snippets.
Workspace & pipeline integration `.claude/skills/common/workspace-management.md`, `.claude/skills/ptq/SKILL.md`	Documented cross-skill workspace flow persisting PTQ outputs and adding `eval_results/` + `eval_config.yaml`; updated example flow to reuse workspaces; PTQ SKILL now points users to deployment/evaluation next steps and notes carrying PTQ patches forward.
Evaluation guidance updates `.claude/skills/evaluation/SKILL.md`	Inserted guidance to carry deployment-time runtime patches into NEL config via `deployment.command`; added “NEL CI and Cluster-Specific Notes” checklist referencing `nel-ci-guide.md` and cluster/CI operational considerations.

Sequence Diagram(s)

sequenceDiagram
    participant User as User / Agent
    participant PTQ as PTQ Job
    participant WS as Workspace\n(workspaces/<model>)
    participant Deploy as Deployment Job\n(NEL/SLURM/CI)
    participant Serve as Model Server
    participant Eval as Evaluation Job
    participant Storage as Cluster Storage

    User->>PTQ: submit PTQ (produce quantized checkpoint -> WS/output/)
    PTQ->>WS: write checkpoint & artifacts
    User->>Deploy: trigger deployment (consumes WS/output/, may set deployment.command / NEL_DEPLOYMENT_COMMAND)
    Deploy->>Serve: stage model and start serving
    Serve->>Eval: run evaluation tasks (reads from Serve, writes WS/eval_results/)
    Eval->>WS: write eval artifacts and eval_config.yaml
    Eval->>Storage: persist final results

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly describes the main change: polishing evaluation and common skills based on end-to-end workflow testing, which aligns with the primary objective of documenting learnings from PTQ → Deploy → Eval pipeline validation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Security Anti-Patterns	✅ Passed	PR contains only documentation changes to markdown files in .claude/skills/ directory; no Python code or configuration files modified.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch zhiyu/polish-eval-skills

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-12T09:42:54Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1239/
Built to branch `gh-pages` at 2026-04-13 05:37 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

codecov · 2026-04-12T09:59:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.91%. Comparing base (0357cb9) to head (8176fc7).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1239   +/-   ##
=======================================
  Coverage   76.91%   76.91%           
=======================================
  Files         350      350           
  Lines       40480    40480           
=======================================
  Hits        31137    31137           
  Misses       9343     9343

Flag	Coverage Δ
unit	`55.51% <ø> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add common/end-to-end-workflow.md documenting the PTQ → Deploy → Eval pipeline, workspace continuity, unsupported model handling, NEL deployment.command pattern, and NEL CI vs SLURM executor decision table - Add cross-skill workspace flow to workspace-management.md - Add "Next steps" to ptq/SKILL.md pointing to deployment/evaluation - Add pipeline integration note to evaluation/SKILL.md Depends on PR #1236 (deployment/references/unsupported-models.md). Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Copilot

Pull request overview

Adds end-to-end documentation that connects the PTQ, deployment, and evaluation “skills”, with cluster-specific guidance for running NeMo Evaluator Launcher (NEL) on SLURM vs JET/NEL CI.

Changes:

Added a new PTQ → Deploy → Eval workflow doc and cross-skill workspace flow guidance.
Added a comprehensive NEL CI / cluster guide (JET vs SLURM executor, storage paths, env var formats, common issues).
Updated existing skill docs to point users to the new workflow and evaluation references.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`.claude/skills/ptq/SKILL.md`	Adds a “Next steps” pointer into the end-to-end pipeline doc.
`.claude/skills/evaluation/SKILL.md`	Adds pipeline integration notes and a pointer to the new NEL CI guide.
`.claude/skills/evaluation/references/nel-ci-guide.md`	New detailed guide for NEL CI/JET vs SLURM executor usage, clusters, storage, and troubleshooting.
`.claude/skills/common/workspace-management.md`	Documents how a single workspace is reused across PTQ/deploy/eval stages.
`.claude/skills/common/slurm-setup.md`	Adds enroot/pyxis private registry credential setup for nvcr.io.
`.claude/skills/common/remote-execution.md`	Adds compute-node storage availability guidance to prevent “checkpoint not found” issues.
`.claude/skills/common/end-to-end-workflow.md`	New end-to-end workflow doc tying the three skills together.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

.claude/skills/common/end-to-end-workflow.md

+| Stage | What can go wrong | Reference |
+|-------|-------------------|-----------|
+| **PTQ** | Unknown architecture, FP8 source checkpoint, VLM structure | `ptq/references/unsupported-models.md` |
+| **Deployment** | Missing architecture mapping, weight key mismatches, quant/unquant layer confusion | `deployment/references/unsupported-models.md` |


.claude/skills/common/end-to-end-workflow.md

+When the serving framework needs runtime patches (e.g., transformers upgrade, model handler fix), override `deployment.command` in the NEL config to inject fixes before serving:
+
+```yaml
+deployment:
+  command: >-
+    pip install "transformers>=5.0.0.dev0" --pre -q &&


.claude/skills/evaluation/references/nel-ci-guide.md

+```bash
+export GITLAB_TOKEN=<your_gitlab_token>
+
+curl -k --request POST \


.claude/skills/evaluation/references/nel-ci-guide.md

+If the checkpoint is on a workstation, **copy it to cluster storage first**:
+
+```bash
+rsync -av /path/to/local/checkpoint \
+    <cluster-login>:/lustre/fsw/portfolios/coreai/users/$USER/checkpoints/
+```
+
+For dlcluster, the checkpoint paths are directly accessible since the NFS mounts are shared between login and compute nodes.
+


.claude/skills/evaluation/references/nel-ci-guide.md

+mkdir -p /lustre/fsw/portfolios/coreai/users/$USER/cache/huggingface
+mkdir -p /lustre/fsw/portfolios/coreai/users/$USER/cache/vllm
+mkdir -p /lustre/fsw/portfolios/coreai/users/$USER/nv-eval-rundirs
+chmod 777 /lustre/fsw/portfolios/coreai/users/$USER/cache/huggingface
+chmod 777 /lustre/fsw/portfolios/coreai/users/$USER/cache/vllm
+chmod 777 /lustre/fsw/portfolios/coreai/users/$USER/nv-eval-rundirs
+```
+
+`chmod 777` is required because `svc-jet` (JET service account) runs containers and needs write access.


.claude/skills/common/slurm-setup.md

+# To add NGC credentials:
+mkdir -p ~/.config/enroot
+echo 'machine nvcr.io login $oauthtoken password <NGC_API_KEY>' > ~/.config/enroot/.credentials


.claude/skills/evaluation/references/nel-ci-guide.md

+
+- Binds to `0.0.0.0` by default — health checks work out of the box
+- For NVFP4: `--quantization modelopt_fp4`
+- For unsupported models (e.g., ministral3): may need custom `deployment.command` to patch the framework before serving (see `deployment/references/unsupported-models.md`)


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.claude/skills/common/slurm-setup.md (1)

62-62: Clarify that $oauthtoken is a literal string.

Users might misinterpret $oauthtoken as a shell variable that needs to be set. Consider adding a brief note that $oauthtoken is NGC's required literal login string (not a variable).

📝 Suggested clarification

 # To add NGC credentials:
 mkdir -p ~/.config/enroot
-echo 'machine nvcr.io login $oauthtoken password <NGC_API_KEY>' > ~/.config/enroot/.credentials
+# Note: '$oauthtoken' is a literal string required by NGC (not a shell variable)
+echo 'machine nvcr.io login $oauthtoken password <NGC_API_KEY>' > ~/.config/enroot/.credentials
 chmod 600 ~/.config/enroot/.credentials

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.claude/skills/common/slurm-setup.md at line 62, The line that writes
credentials uses the token string "$oauthtoken" which readers may think is a
shell variable; update the documentation line that currently shows echo 'machine
nvcr.io login $oauthtoken password <NGC_API_KEY>' >
~/.config/enroot/.credentials to explicitly state that $oauthtoken is the
literal NGC login string (not a shell variable) and add a short parenthetical
like "(use the literal string $oauthtoken as NGC requires, do not replace with a
shell variable)" or escape it in the example so readers understand it's not to
be substituted; reference the exact example command text "machine nvcr.io login
$oauthtoken password <NGC_API_KEY>" when making the clarification.

.claude/skills/evaluation/references/nel-ci-guide.md (1)

178-193: Consider noting the security implications of chmod 777.

While chmod 777 is necessary for the JET service account to access these directories (as explained), it makes them world-writable. Consider adding a brief note about the security trade-off, or suggesting users scope these permissions to only the directories actually needed for evaluation runs.
📝 Suggested addition
 chmod 777 /lustre/fsw/portfolios/coreai/users/$USER/nv-eval-rundirs
+
+# Note: chmod 777 is required for svc-jet service account access, but makes
+# directories world-writable. Limit to only the directories needed for your runs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/evaluation/references/nel-ci-guide.md around lines 178 - 193,
Add a short note after the chmod 777 commands explaining the security trade-off
of making those paths world-writable and suggest safer alternatives: prefer
scoping permissions to the service account or group (e.g., chown svc-jet or set
group ownership and use chmod 770/770-like perms), or use filesystem ACLs to
grant only the svc-jet account write access to the listed directories
(/lustre/fsw/portfolios/coreai/users/$USER/cache/huggingface, /.../cache/vllm,
/.../nv-eval-rundirs) instead of blanket chmod 777.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/common/end-to-end-workflow.md:
- Around line 47-59: The doc currently claims the deployment.command override
works "with both NEL SLURM executor and NEL CI (via NEL_DEPLOYMENT_COMMAND)";
remove or amend this to avoid asserting unsupported CI support: either delete
the "NEL CI (via NEL_DEPLOYMENT_COMMAND)" clause or add a clear note that using
NEL_DEPLOYMENT_COMMAND is a custom/unsupported extension (not part of official
NEL executors) and list only the officially supported executors (e.g., NEL
SLURM, local, Lepton AI); update the text around deployment.command and
NEL_DEPLOYMENT_COMMAND to reflect this change.

---

Nitpick comments:
In @.claude/skills/common/slurm-setup.md:
- Line 62: The line that writes credentials uses the token string "$oauthtoken"
which readers may think is a shell variable; update the documentation line that
currently shows echo 'machine nvcr.io login $oauthtoken password <NGC_API_KEY>'
> ~/.config/enroot/.credentials to explicitly state that $oauthtoken is the
literal NGC login string (not a shell variable) and add a short parenthetical
like "(use the literal string $oauthtoken as NGC requires, do not replace with a
shell variable)" or escape it in the example so readers understand it's not to
be substituted; reference the exact example command text "machine nvcr.io login
$oauthtoken password <NGC_API_KEY>" when making the clarification.

In @.claude/skills/evaluation/references/nel-ci-guide.md:
- Around line 178-193: Add a short note after the chmod 777 commands explaining
the security trade-off of making those paths world-writable and suggest safer
alternatives: prefer scoping permissions to the service account or group (e.g.,
chown svc-jet or set group ownership and use chmod 770/770-like perms), or use
filesystem ACLs to grant only the svc-jet account write access to the listed
directories (/lustre/fsw/portfolios/coreai/users/$USER/cache/huggingface,
/.../cache/vllm, /.../nv-eval-rundirs) instead of blanket chmod 777.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c8b9ea9-5ab6-43f7-96a8-fb58f0a1804e

📥 Commits

Reviewing files that changed from the base of the PR and between 0357cb9 and 2cb3b39.

📒 Files selected for processing (7)

.claude/skills/common/end-to-end-workflow.md
.claude/skills/common/remote-execution.md
.claude/skills/common/slurm-setup.md
.claude/skills/common/workspace-management.md
.claude/skills/evaluation/SKILL.md
.claude/skills/evaluation/references/nel-ci-guide.md
.claude/skills/ptq/SKILL.md

coderabbitai · 2026-04-12T21:03:34Z

.claude/skills/common/end-to-end-workflow.md

+## NEL Evaluation with Custom Deployments
+
+When the serving framework needs runtime patches (e.g., transformers upgrade, model handler fix), override `deployment.command` in the NEL config to inject fixes before serving:
+
+```yaml
+deployment:
+  command: >-
+    pip install "transformers>=5.0.0.dev0" --pre -q &&
+    sed -i 's/old_pattern/new_pattern/' /path/to/framework/file.py &&
+    ${deployment.base_command}
+```
+
+This works with both NEL SLURM executor and NEL CI (via `NEL_DEPLOYMENT_COMMAND`).


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does NeMo Evaluator Launcher (NEL) support overriding deployment.command in both SLURM executor and GitLab CI modes?

💡 Result:

Yes, NeMo Evaluator Launcher (NEL) supports overriding deployment.command in SLURM executor mode. No evidence found for GitLab CI mode support; official docs only document local, Slurm, and Lepton AI executors.

Citations:

1: https://docs.nvidia.com/nemo/evaluator/latest/deployment/launcher-orchestrated/slurm.html

2: https://docs.nvidia.com/nemo/evaluator/latest/libraries/nemo-evaluator-launcher/configuration/executors/slurm.html

3: https://docs.nvidia.com/nemo/evaluator/latest/libraries/nemo-evaluator-launcher/index.html

Remove or verify unsupported CI mode claim for deployment.command override.

Line 59 claims the deployment.command override pattern works with "both NEL SLURM executor and NEL CI (via NEL_DEPLOYMENT_COMMAND)". However, official NEL documentation only documents support for SLURM executor, local, and Lepton AI executors—not GitLab CI. Either remove the CI claim or add clear documentation that this is an extension beyond officially supported NEL features.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.claude/skills/common/end-to-end-workflow.md around lines 47 - 59, The doc currently claims the deployment.command override works "with both NEL SLURM executor and NEL CI (via NEL_DEPLOYMENT_COMMAND)"; remove or amend this to avoid asserting unsupported CI support: either delete the "NEL CI (via NEL_DEPLOYMENT_COMMAND)" clause or add a clear note that using NEL_DEPLOYMENT_COMMAND is a custom/unsupported extension (not part of official NEL executors) and list only the officially supported executors (e.g., NEL SLURM, local, Lepton AI); update the text around deployment.command and NEL_DEPLOYMENT_COMMAND to reflect this change.

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

…ra escaping - Add wrapper script pattern for complex deployment commands that break Hydra's override parser (put serve.sh in checkpoint dir, reference as bash /checkpoint/serve.sh) - Add NEL_CONFIG_BASE64 + Python trigger pattern for custom configs - Add cross-cluster checkpoint copy via tar pipe - Add Hydra LexerNoViableAltException and Bad Request to common issues Learned from triggering full AA evaluation (MMLU-PRO, GPQA Diamond, LiveCodeBench, SCICODE, AIME 2025, Terminal-Bench Hard) for Devstral-Small-2-24B NVFP4 on oci-hsg via NEL CI. Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

.claude/skills/evaluation/references/nel-ci-guide.md (1)
70-70: Prefer least-privilege wording for chmod 777 guidance

Given this is shared infra guidance, consider documenting 777 as a fallback-only option and prefer group ACL/owner-group write where possible to reduce accidental overexposure.

Also applies to: 128-128, 270-275
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/evaluation/references/nel-ci-guide.md at line 70, Replace the
blanket "chmod -R 777 /lustre/.../checkpoints/model-name" guidance with
least-privilege instructions: set the directory owner/group to svc-jet (chown
svc-jet:svc-jet) and grant group write/execute (chmod g+rwX) or use POSIX ACLs
(setfacl) to grant only svc-jet the required access, and note that 777 should be
documented as a last-resort fallback; apply the same change to the other
occurrences referenced (lines 128 and 270-275) and ensure examples mention using
setfacl or group-owner changes rather than recommending 777.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/evaluation/references/nel-ci-guide.md:
- Line 87: Remove the insecure TLS bypass from the curl examples by deleting the
"-k" flag in the command that currently appears as "curl -k --request POST" (and
the other occurrence) and either leave the example using standard TLS
verification or replace it with an explicit documented option to set a CA bundle
(e.g., mention "--cacert /path/to/corporate-ca.pem" or a pinned CA path) so the
examples do not send PRIVATE-TOKEN over disabled verification.
- Around line 133-135: The fenced code block containing the JSON snippet with
the key "NEL_DEPLOYMENT_COMMAND" should include a language tag to satisfy
markdownlint MD040; update the block fence to use ```json so the snippet
({"key": "NEL_DEPLOYMENT_COMMAND", "value": "bash /checkpoint/serve.sh"}) is
marked as JSON, ensuring the markdown linter recognizes the language.

---

Nitpick comments:
In @.claude/skills/evaluation/references/nel-ci-guide.md:
- Line 70: Replace the blanket "chmod -R 777 /lustre/.../checkpoints/model-name"
guidance with least-privilege instructions: set the directory owner/group to
svc-jet (chown svc-jet:svc-jet) and grant group write/execute (chmod g+rwX) or
use POSIX ACLs (setfacl) to grant only svc-jet the required access, and note
that 777 should be documented as a last-resort fallback; apply the same change
to the other occurrences referenced (lines 128 and 270-275) and ensure examples
mention using setfacl or group-owner changes rather than recommending 777.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2e6188dc-54d7-48bd-8e55-d7d6b0fc9ed8

📥 Commits

Reviewing files that changed from the base of the PR and between 1b94fc9 and b1be817.

📒 Files selected for processing (1)

.claude/skills/evaluation/references/nel-ci-guide.md

coderabbitai · 2026-04-13T04:43:39Z

.claude/skills/evaluation/references/nel-ci-guide.md

+```bash
+export GITLAB_TOKEN=<your_gitlab_token>
+
+curl -k --request POST \


⚠️ Potential issue | 🟠 Major

Avoid curl -k in token-bearing API examples

Both trigger examples disable TLS verification while transmitting PRIVATE-TOKEN, which normalizes an insecure pattern. Please remove -k (or document a pinned CA/corporate CA path explicitly) in the primary examples.

Proposed doc fix

-curl -k --request POST \ +curl --request POST \

- ["curl", "-k", "--request", "POST", + ["curl", "--request", "POST",

Also applies to: 166-166

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.claude/skills/evaluation/references/nel-ci-guide.md at line 87, Remove the insecure TLS bypass from the curl examples by deleting the "-k" flag in the command that currently appears as "curl -k --request POST" (and the other occurrence) and either leave the example using standard TLS verification or replace it with an explicit documented option to set a CA bundle (e.g., mention "--cacert /path/to/corporate-ca.pem" or a pinned CA path) so the examples do not send PRIVATE-TOKEN over disabled verification.

.claude/skills/evaluation/references/nel-ci-guide.md

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

When using NEL_DEPLOYMENT_COMMAND with a custom --served-model-name, deployment.served_model_name must also be overridden via NEL_OTHER_OVERRIDES — NEL uses the config field (not the actual serve command) to set the eval client's model_id. Without this, the client sends the checkpoint path as model_id, causing 404 errors. Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Edwardf0t1 added 2 commits April 12, 2026 02:35

Polish eval skills

4bf8253

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Polish eval skills

2e84f3b

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

update

e952bcd

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Edwardf0t1 marked this pull request as ready for review April 12, 2026 20:58

Edwardf0t1 self-assigned this Apr 12, 2026

Edwardf0t1 requested review from Copilot, kaix-nv, mxinO and shengliangxu April 12, 2026 20:58

Copilot started reviewing on behalf of Edwardf0t1 April 12, 2026 20:58 View session

Edwardf0t1 changed the title ~~[1/N] Polish evaluation skills~~ [1/N] Polish evaluation skills and common skills based on an E2E workflow testing Apr 12, 2026

Copilot AI reviewed Apr 12, 2026

View reviewed changes

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

Edwardf0t1 added 2 commits April 12, 2026 17:55

fix format

1b94fc9

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai bot reviewed Apr 13, 2026

View reviewed changes

Edwardf0t1 added 2 commits April 12, 2026 21:44

fix format

7dcede4

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[1/N] Polish evaluation skills and common skills based on an E2E workflow testing#1239

[1/N] Polish evaluation skills and common skills based on an E2E workflow testing#1239
Edwardf0t1 wants to merge 8 commits intomainfrom
zhiyu/polish-eval-skills

Edwardf0t1 commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Apr 12, 2026

Uh oh!

coderabbitai bot commented Apr 12, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

github-actions bot commented Apr 12, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-04-13 05:37 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov bot commented Apr 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 12, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Edwardf0t1 commented Apr 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Changes

Motivation

Testing

Before your PR is "Ready for review"

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Apr 12, 2026

Uh oh!

coderabbitai bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

github-actions bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Built to branch gh-pages at 2026-04-13 05:37 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Edwardf0t1 commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 12, 2026 •

edited

Loading

github-actions bot commented Apr 12, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-04-13 05:37 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov bot commented Apr 12, 2026 •

edited

Loading