Commit 5d8edf6
feat: rework mine-tasks skill for external users (quick eval mode)
Rewrites the mine-tasks skill, turning it from an internal
benchmark-development tool into a user-facing "point at your repo, get a
baseline-vs-MCP comparison" workflow. Key changes:
- Add Phase 0 (eval goals): quick eval vs full mining, MCP provider
  selection (Sourcegraph, GitHub Copilot, custom, or placeholder)
- Quick eval mode: mines 5-10 tasks, auto-generates ground truth from
  PR patches, produces standalone run_eval.sh with no Harbor/Daytona
  dependency
- Auto-generate ground_truth.json from PR patch data (files changed,
  patch stats, source PR metadata)
- Generate dual Dockerfiles per task: Dockerfile.baseline (full code,
  no MCP) and Dockerfile.mcp (truncated source + MCP tools)
- Private repo support via docker build --secret for git clone auth
- Language-specific test commands in generated test.sh
- Standalone run_eval.sh runner: builds images, runs agents, collects
  scores, prints baseline-vs-MCP comparison table
- MCP provider is configurable, not hardcoded to Sourcegraph
- Simplified task naming (no internal csb_sdlc_* taxonomy in quick mode)
- Full mining mode preserved for CodeScaleBench contributors
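
Illustration of the ground_truth.json step listed above: a minimal sketch,
assuming the PR patch is available as git-format unified diff text. The
function name ground_truth_from_patch and the JSON field names
(files_changed, patch_stats, source_pr) are assumptions made for this
example; the skill's actual schema is not shown in this commit message.

```python
import json
import re

# Matches the per-file header that git emits, e.g. "diff --git a/x.py b/x.py".
# Paths containing " b/" would confuse this pattern; acceptable for a sketch.
DIFF_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def ground_truth_from_patch(patch_text, pr_meta):
    """Summarize a git-format unified diff into a ground-truth record.

    Hypothetical helper: field names are assumptions, not the skill's schema.
    """
    files = {}
    current = None
    in_hunk = False
    for line in patch_text.splitlines():
        header = DIFF_HEADER.match(line)
        if header:
            # New file section; "---"/"+++" header lines follow until the first "@@".
            current = header.group(2)
            files[current] = {"additions": 0, "deletions": 0}
            in_hunk = False
            continue
        if line.startswith("@@"):
            in_hunk = True
            continue
        if current is None or not in_hunk:
            continue
        # Inside a hunk, the first character marks the kind of line.
        if line.startswith("+"):
            files[current]["additions"] += 1
        elif line.startswith("-"):
            files[current]["deletions"] += 1
    return {
        "source_pr": pr_meta,
        "files_changed": sorted(files),
        "patch_stats": {
            "files": len(files),
            "additions": sum(f["additions"] for f in files.values()),
            "deletions": sum(f["deletions"] for f in files.values()),
        },
    }


if __name__ == "__main__":
    sample_patch = "\n".join([
        "diff --git a/src/app.py b/src/app.py",
        "--- a/src/app.py",
        "+++ b/src/app.py",
        "@@ -1,2 +1,3 @@",
        " import os",
        '-print("old")',
        '+print("new")',
        '+print("extra")',
    ])
    # Placeholder PR metadata, not a real pull request.
    record = ground_truth_from_patch(sample_patch, {"number": 0, "title": "example"})
    print(json.dumps(record, indent=2))
```

Tracking hunk state keeps the "---" and "+++" file header lines out of the
addition and deletion counts, while a deleted line whose content itself
begins with "--" is still counted.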
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 412680e commit 5d8edf6
1 file changed: +520 -241 lines changed