evals: add benchmark evals for cuopt-developer skill #1399

Open
rgsl888prabhu wants to merge 1 commit into main from add-benchmark-evals-cuopt-developer

rgsl888prabhu (Collaborator) commented Jun 5, 2026

Adds skills/cuopt-developer/evals/evals.json with 3 eval cases covering the benchmarks/ folder.

rgsl888prabhu requested a review from a team as a code owner June 5, 2026 20:09
rgsl888prabhu requested a review from tmckayus June 5, 2026 20:09
coderabbitai Bot commented Jun 5, 2026

📝 Walkthrough

A new evaluation registry file is added containing three benchmark evaluation entries for assessing the cuopt-developer skill: the Mittelmann LP benchmark with an LP-specific build configuration, the MIPLIB benchmark with per-instance logging and time limits, and a multi-GPU MIPLIB batching scenario with machine-level instance distribution.

Changes

Benchmark Evaluation Configurations

| Layer / File(s) | Summary |
|---|---|
| New benchmark evaluation entries: skills/cuopt-developer/evals/evals.json | Three structured evaluation items define benchmark questions, required build flags and environment setup in ground_truth, and expected response checklist items in expected_behavior for the Mittelmann LP, MIPLIB, and multi-GPU MIPLIB batching scenarios. |
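
As a rough illustration of the structure described in the summary, the sketch below checks that each eval entry carries the fields a reviewer would expect. Only the field names ground_truth and expected_behavior and the IDs dev-eval-001 and dev-eval-002 come from this PR; the "id" and "question" keys, the top-level list layout, and the helper names are assumptions made for the example, not the actual schema of evals.json.

```python
import json

# "ground_truth" and "expected_behavior" are named in the walkthrough;
# "id" and "question" are ASSUMED key names, used here for illustration only.
REQUIRED_KEYS = {"id", "question", "ground_truth", "expected_behavior"}


def missing_keys(entry):
    """Return the sorted list of required keys absent from one eval entry."""
    return sorted(REQUIRED_KEYS - entry.keys())


def validate_evals(entries):
    """Map each incomplete entry's id (or its index) to its missing keys."""
    problems = {}
    for index, entry in enumerate(entries):
        missing = missing_keys(entry)
        if missing:
            problems[entry.get("id", f"#{index}")] = missing
    return problems


# Hypothetical sample data; the real file's contents are not shown in this PR page.
SAMPLE_JSON = """
[
  {"id": "dev-eval-001", "question": "...",
   "ground_truth": "...", "expected_behavior": ["..."]},
  {"id": "dev-eval-002", "question": "..."}
]
"""

sample = json.loads(SAMPLE_JSON)
print(validate_evals(sample))
```

A check along these lines could run in CI before the evals are consumed, so a malformed entry fails early rather than silently skewing results.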

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested labels

non-breaking, improvement

Suggested reviewers

  • Iroy30
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title 'evals: add benchmark evals for cuopt-developer skill' clearly and concisely summarizes the main change: adding evaluation cases for the cuopt-developer skill focused on benchmarks. |
| Description check | ✅ Passed | The description 'Adds skills/cuopt-developer/evals/evals.json with 3 eval cases covering the benchmarks/ folder' directly relates to the changeset by specifying the file added and its purpose. |


Adds skills/cuopt-developer/evals/evals.json with 3 eval cases
covering the benchmarks/ folder (not covered by any existing eval):

- dev-eval-001: Mittelmann LP benchmark (solve_LP binary,
  BUILD_LP_BENCHMARKS flag, CUDA_MODULE_LOADING=EAGER, get_datasets.py)
- dev-eval-002: MIPLIB setup and run (download/gunzip, run_mps_files.sh,
  --write-log-file, --presolve t, --log-to-console false)
- dev-eval-003: multi-GPU + batch splitting (--gpus-per-instance 2,
  --batch-num / --n-batches, CUDA_VISIBLE_DEVICES, --cut-mode)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rgsl888prabhu force-pushed the add-benchmark-evals-cuopt-developer branch from de5bdb5 to e5f1ae0 June 5, 2026 20:21
rgsl888prabhu (Collaborator, Author) commented

/nvskills-ci

rgsl888prabhu self-assigned this Jun 5, 2026
rgsl888prabhu added the non-breaking (Introduces a non-breaking change) and improvement (Improves an existing functionality) labels Jun 5, 2026
rgsl888prabhu changed the title from "evals: add 3 benchmark evals for cuopt-developer skill" to "evals: add benchmark evals for cuopt-developer skill" Jun 5, 2026