Commit 3e7cd50

LoCoBench Bot committed
chore: mark US-005 as passing, update progress log
1 parent 2fa8382 commit 3e7cd50

2 files changed: 39 additions, 2 deletions

ralph-gapfill-crossrepo/prd.json

Lines changed: 2 additions & 2 deletions
@@ -77,8 +77,8 @@
         "Both tasks registered in selected_benchmark_tasks.json and crossrepo_2config.sh"
       ],
       "priority": 5,
-      "passes": false,
-      "notes": "Good candidate: K8s → client-go → apimachinery type chain. Or Istio → envoy-api → protobuf definition chain. go_to_definition with cross-repo jumps is the killer MCP feature for this."
+      "passes": true,
+      "notes": "crossrepo-chain-001: Kubernetes TypeMeta chain (k/k → k/api → k/apimachinery). crossrepo-chain-002: Envoy RouteConfiguration chain (istio → go-control-plane → data-plane-api). Both use partial credit scorer for each step in the chain. Ground truth verified via Sourcegraph."
     }
   ]
 }

ralph-gapfill-crossrepo/progress.txt

Lines changed: 37 additions & 0 deletions
@@ -123,3 +123,40 @@
   - SG repo mapping uses the primary repo where the interface is most commonly referenced (not necessarily defined)
   - Both tasks use category=symbol_resolution (consistent with sym-001/002/003), though "interface_implementors" would be more specific
 ---
+## 2026-02-16 - US-005
+- Implemented crossrepo-chain-001 and crossrepo-chain-002: Dependency chain resolution tasks
+- Files created:
+  - benchmarks/ccb_crossrepo/crossrepo-chain-001/task.toml
+  - benchmarks/ccb_crossrepo/crossrepo-chain-001/instruction.md
+  - benchmarks/ccb_crossrepo/crossrepo-chain-001/environment/Dockerfile
+  - benchmarks/ccb_crossrepo/crossrepo-chain-001/tests/test.sh (partial credit scorer)
+  - benchmarks/ccb_crossrepo/crossrepo-chain-001/tests/ground_truth.json
+  - benchmarks/ccb_crossrepo/crossrepo-chain-002/task.toml
+  - benchmarks/ccb_crossrepo/crossrepo-chain-002/instruction.md
+  - benchmarks/ccb_crossrepo/crossrepo-chain-002/environment/Dockerfile
+  - benchmarks/ccb_crossrepo/crossrepo-chain-002/tests/test.sh (partial credit scorer)
+  - benchmarks/ccb_crossrepo/crossrepo-chain-002/tests/ground_truth.json
+- Files modified:
+  - configs/selected_benchmark_tasks.json (added 2 task entries, updated counts: 195→197 total, 8→10 crossrepo)
+  - configs/crossrepo_2config.sh (added SG repo mappings for both chain tasks)
+- crossrepo-chain-001: Kubernetes TypeMeta dependency chain
+  - 3 repos: kubernetes/kubernetes → kubernetes/api → kubernetes/apimachinery
+  - Traces TypeMeta struct from Pod usage through import chain to original definition
+  - Ground truth: 3 steps (usage at line 5465, import at line 21, definition at line 42)
+  - Partial credit scorer: each step worth 1/3 of total score, +/- 50 line tolerance
+- crossrepo-chain-002: Envoy RouteConfiguration dependency chain
+  - 3 repos: istio/istio → envoyproxy/go-control-plane → envoyproxy/data-plane-api
+  - Traces RouteConfiguration from Istio RDS generator through generated Go code to protobuf definition
+  - Ground truth: 3 steps (usage in httproute.go:115, generated struct in route.pb.go:45, proto definition in route.proto:26)
+  - Same partial credit scorer as chain-001
+- **Learnings for future iterations:**
+  - Dependency chain tasks differ from caller/implementor tasks: they trace a symbol through import chains, not find all usages
+  - Partial credit scoring is essential for chain tasks: agents may get some steps right but not all
+  - Line number tolerance (+/- 50 lines) allows for minor code changes without invalidating ground truth
+  - Kubernetes staging directory pattern means logical "3 repos" are physically subdirs in k/k monorepo; task Dockerfile clones separate repos to simulate cross-repo navigation
+  - Envoy xDS ecosystem has clearer 3-repo chain: protobuf definitions (data-plane-api) → generated Go code (go-control-plane) → consumer (istio)
+  - Both tasks use category=symbol_resolution (consistent with sym/impl tasks), difficulty=very_hard (cross-repo navigation is challenging)
+  - Test scorer uses "steps" array in ground_truth.json (not "entries" like F1 tasks) and matches by step number or position
+  - Scorer normalizes file paths (strips /workspace/ prefix) and allows missing line numbers without penalty
+  - All repos must be indexed on Sourcegraph for MCP advantage in go_to_definition cross-repo jumps
+---
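
The partial credit scoring described in the progress log can be sketched roughly as below. This is not the committed tests/test.sh, which is a shell script whose contents are not shown in this commit. It is a minimal Python illustration of the stated rules: a "steps" array, matching by step number or position, stripping the /workspace/ prefix, a +/- 50 line tolerance, no penalty for missing line numbers, and equal weight per step. The field names "step", "file", and "line", and every function name, are assumptions made for illustration. The example uses the bare file names from the chain-002 log entry; the real ground truth presumably stores fuller paths.

```python
from typing import Any, Dict, List, Optional

LINE_TOLERANCE = 50                 # +/- 50 lines, as stated in the progress log
WORKSPACE_PREFIX = "/workspace/"    # prefix the scorer is said to strip


def normalize_path(path: str) -> str:
    """Strip the /workspace/ prefix so container paths and repo paths compare equal."""
    path = path.strip()
    if path.startswith(WORKSPACE_PREFIX):
        path = path[len(WORKSPACE_PREFIX):]
    return path.lstrip("/")


def find_answer(answers: List[Dict[str, Any]], step_no: int,
                position: int) -> Optional[Dict[str, Any]]:
    """Match by explicit step number first, then fall back to list position."""
    for answer in answers:
        if answer.get("step") == step_no:
            return answer
    if position < len(answers) and "step" not in answers[position]:
        return answers[position]
    return None


def step_matches(expected: Dict[str, Any], answer: Dict[str, Any]) -> bool:
    """A step is correct if the file matches and the line is within tolerance."""
    if normalize_path(expected["file"]) != normalize_path(answer.get("file", "")):
        return False
    expected_line = expected.get("line")
    answer_line = answer.get("line")
    if expected_line is None or answer_line is None:
        return True  # missing line numbers are not penalized
    return abs(int(answer_line) - int(expected_line)) <= LINE_TOLERANCE


def score_chain(ground_truth: Dict[str, Any], answers: List[Dict[str, Any]]) -> float:
    """Return the fraction of chain steps answered correctly (1/N per step)."""
    steps = ground_truth["steps"]
    if not steps:
        return 0.0
    hits = 0
    for position, expected in enumerate(steps):
        step_no = expected.get("step", position + 1)
        answer = find_answer(answers, step_no, position)
        if answer is not None and step_matches(expected, answer):
            hits += 1
    return hits / len(steps)


if __name__ == "__main__":
    # Hypothetical ground truth shaped after the chain-002 log entry.
    ground_truth = {"steps": [
        {"step": 1, "file": "httproute.go", "line": 115},
        {"step": 2, "file": "route.pb.go", "line": 45},
        {"step": 3, "file": "route.proto", "line": 26},
    ]}
    answers = [
        {"step": 1, "file": "/workspace/httproute.go", "line": 120},  # within tolerance
        {"step": 2, "file": "route.pb.go"},                           # no line: accepted
        {"step": 3, "file": "route.proto", "line": 400},              # outside tolerance
    ]
    print(f"{score_chain(ground_truth, answers):.2f}")  # prints 0.67
```

In this sketch an agent that resolves two of the three hops still earns two thirds of the score, which is the behavior the log calls essential for chain tasks.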
