Commit 5d8edf6

sjarmak and claude committed
feat: rework mine-tasks skill for external users (quick eval mode)
Rewrites the mine-tasks skill from an internal benchmark development tool to a user-facing "point at your repo, get a baseline-vs-MCP comparison" workflow.

Key changes:
- Add Phase 0 (eval goals): quick eval vs full mining, MCP provider selection (Sourcegraph, GitHub Copilot, custom, or placeholder)
- Quick eval mode: mines 5-10 tasks, auto-generates ground truth from PR patches, produces standalone run_eval.sh with no Harbor/Daytona dependency
- Auto-generate ground_truth.json from PR patch data (files changed, patch stats, source PR metadata)
- Generate dual Dockerfiles per task: Dockerfile.baseline (full code, no MCP) and Dockerfile.mcp (truncated source + MCP tools)
- Private repo support via docker build --secret for git clone auth
- Language-specific test commands in generated test.sh
- Standalone run_eval.sh runner: builds images, runs agents, collects scores, prints baseline-vs-MCP comparison table
- MCP provider is configurable, not hardcoded to Sourcegraph
- Simplified task naming (no internal csb_sdlc_* taxonomy in quick mode)
- Full mining mode preserved for CodeScaleBench contributors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
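To illustrate the "auto-generate ground_truth.json from PR patch data" step, here is a minimal sketch of how files changed and patch stats might be derived from a unified diff. The function name `ground_truth_from_patch`, the `pr_meta` argument, and every key in the output dictionary are hypothetical; the commit message does not show the actual ground_truth.json schema, so treat this as one possible shape rather than the skill's real output.

```python
import json


def ground_truth_from_patch(patch_text, pr_meta):
    """Summarize a unified diff into a ground-truth dict (hypothetical schema)."""
    files = {}
    current = None
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # Header looks like "diff --git a/path b/path"; keep the b/ side.
            # Paths that themselves contain " b/" are not handled in this sketch.
            current = line.split(" b/", 1)[1]
            files[current] = {"additions": 0, "deletions": 0}
            in_hunk = False
        elif line.startswith("@@"):
            # Hunk header: lines after this are real content changes,
            # so the "---" / "+++" file headers above are never counted.
            in_hunk = True
        elif in_hunk and current is not None:
            if line.startswith("+"):
                files[current]["additions"] += 1
            elif line.startswith("-"):
                files[current]["deletions"] += 1
    return {
        "source_pr": pr_meta,  # assumed: caller-supplied PR metadata
        "files_changed": sorted(files),
        "patch_stats": {
            "files": len(files),
            "additions": sum(f["additions"] for f in files.values()),
            "deletions": sum(f["deletions"] for f in files.values()),
            "per_file": files,
        },
    }


def write_ground_truth(path, patch_text, pr_meta):
    """Write the summary as JSON to the given path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(ground_truth_from_patch(patch_text, pr_meta), fh, indent=2)
```

Counting only lines that follow a hunk header keeps the per-file `---` and `+++` headers out of the statistics, which a naive prefix check would miscount.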
1 parent 412680e commit 5d8edf6

1 file changed: +520 -241 lines changed

0 commit comments

Comments
 (0)