123 | 123 | - SG repo mapping uses the primary repo where the interface is most commonly referenced (not necessarily defined) |
124 | 124 | - Both tasks use category=symbol_resolution (consistent with sym-001/002/003), though "interface_implementors" would be more specific |
125 | 125 | --- |
| 126 | +## 2026-02-16 - US-005 |
| 127 | +- Implemented crossrepo-chain-001 and crossrepo-chain-002: Dependency chain resolution tasks |
| 128 | +- Files created: |
| 129 | + - benchmarks/ccb_crossrepo/crossrepo-chain-001/task.toml |
| 130 | + - benchmarks/ccb_crossrepo/crossrepo-chain-001/instruction.md |
| 131 | + - benchmarks/ccb_crossrepo/crossrepo-chain-001/environment/Dockerfile |
| 132 | + - benchmarks/ccb_crossrepo/crossrepo-chain-001/tests/test.sh (partial credit scorer) |
| 133 | + - benchmarks/ccb_crossrepo/crossrepo-chain-001/tests/ground_truth.json |
| 134 | + - benchmarks/ccb_crossrepo/crossrepo-chain-002/task.toml |
| 135 | + - benchmarks/ccb_crossrepo/crossrepo-chain-002/instruction.md |
| 136 | + - benchmarks/ccb_crossrepo/crossrepo-chain-002/environment/Dockerfile |
| 137 | + - benchmarks/ccb_crossrepo/crossrepo-chain-002/tests/test.sh (partial credit scorer) |
| 138 | + - benchmarks/ccb_crossrepo/crossrepo-chain-002/tests/ground_truth.json |
| 139 | +- Files modified: |
| 140 | + - configs/selected_benchmark_tasks.json (added 2 task entries, updated counts: 195→197 total, 8→10 crossrepo) |
| 141 | + - configs/crossrepo_2config.sh (added SG repo mappings for both chain tasks) |
| 142 | +- crossrepo-chain-001: Kubernetes TypeMeta dependency chain |
| 143 | + - 3 repos: kubernetes/kubernetes → kubernetes/api → kubernetes/apimachinery |
| 144 | + - Traces TypeMeta struct from Pod usage through import chain to original definition |
| 145 | + - Ground truth: 3 steps (usage at line 5465, import at line 21, definition at line 42) |
| 146 | + - Partial credit scorer: each step worth 1/3 of total score, +/- 50 line tolerance |
| 147 | +- crossrepo-chain-002: Envoy RouteConfiguration dependency chain |
| 148 | + - 3 repos: istio/istio → envoyproxy/go-control-plane → envoyproxy/data-plane-api |
| 149 | + - Traces RouteConfiguration from Istio RDS generator through generated Go code to protobuf definition |
| 150 | + - Ground truth: 3 steps (usage in httproute.go:115, generated struct in route.pb.go:45, proto definition in route.proto:26) |
| 151 | + - Same partial credit scorer as chain-001 |
| 152 | +- **Learnings for future iterations:** |
| 153 | + - Dependency chain tasks differ from caller/implementor tasks: they trace one symbol through its import chain rather than finding all usages |
| 154 | + - Partial credit scoring is essential for chain tasks: agents may get some steps right but not all |
| 155 | + - Line number tolerance (+/- 50 lines) allows for minor code changes without invalidating ground truth |
| 156 | + - The Kubernetes staging directory pattern means the logical "3 repos" are physically subdirectories of the k/k monorepo; the task Dockerfile clones the separate repos to simulate cross-repo navigation |
| 157 | + - Envoy xDS ecosystem has clearer 3-repo chain: protobuf definitions (data-plane-api) → generated Go code (go-control-plane) → consumer (istio) |
| 158 | + - Both tasks use category=symbol_resolution (consistent with sym/impl tasks), difficulty=very_hard (cross-repo navigation is challenging) |
| 159 | + - Test scorer uses "steps" array in ground_truth.json (not "entries" like F1 tasks) and matches by step number or position |
| 160 | + - Scorer normalizes file paths (strips /workspace/ prefix) and allows missing line numbers without penalty |
| 161 | + - All repos must be indexed on Sourcegraph for the MCP config to gain an advantage from cross-repo go_to_definition jumps |
| 162 | +--- |
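The scoring rules described in this entry (equal weight per step, +/- 50 line tolerance, `/workspace/` prefix stripping, no penalty for a missing line number, matching by step number or position) can be sketched as below. This is a minimal illustration, not the contents of `tests/test.sh`: only the `steps` array name comes from the log, while the per-step keys `step`, `file`, and `line`, the function names, and the bare file names in the example are assumptions made for the sketch.

```python
LINE_TOLERANCE = 50  # +/- 50 lines, as stated in the log entry


def normalize_path(path: str) -> str:
    """Strip a leading /workspace/ prefix so both path forms compare equal."""
    prefix = "/workspace/"
    if path.startswith(prefix):
        path = path[len(prefix):]
    return path.lstrip("/")


def step_matches(expected: dict, actual: dict) -> bool:
    """A step matches when the file agrees and the line is within tolerance."""
    if normalize_path(expected["file"]) != normalize_path(actual.get("file", "")):
        return False
    line = actual.get("line")
    if line is None:
        return True  # a missing line number is not penalized
    return abs(int(line) - int(expected["line"])) <= LINE_TOLERANCE


def score_chain(ground_truth: dict, answer: dict) -> float:
    """Return the fraction of ground-truth steps the answer got right."""
    expected_steps = ground_truth["steps"]
    actual_steps = answer.get("steps", [])
    # Prefer matching by explicit step number, fall back to list position.
    by_number = {s["step"]: s for s in actual_steps if "step" in s}
    hits = 0
    for position, expected in enumerate(expected_steps):
        actual = by_number.get(expected.get("step"))
        if actual is None and position < len(actual_steps):
            actual = actual_steps[position]
        if actual is not None and step_matches(expected, actual):
            hits += 1
    return hits / len(expected_steps) if expected_steps else 0.0


# Hypothetical data: file names and lines follow the chain-002 summary,
# but the key names and the use of bare file names are assumptions.
ground_truth = {"steps": [
    {"step": 1, "file": "httproute.go", "line": 115},
    {"step": 2, "file": "route.pb.go", "line": 45},
    {"step": 3, "file": "route.proto", "line": 26},
]}
answer = {"steps": [
    {"step": 1, "file": "/workspace/httproute.go", "line": 130},  # within tolerance
    {"step": 2, "file": "route.pb.go"},                           # no line given
    {"step": 3, "file": "route.proto", "line": 400},              # outside tolerance
]}
score = score_chain(ground_truth, answer)
```

With this input the first two steps match and the third does not, so the answer earns two of the three equal shares. An answer that omits step numbers entirely is matched by list position instead.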