Commit 034ff9d

sjarmak and claude committed
chore: update progress.txt with US-015 learnings
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent ed087a4 commit 034ff9d

1 file changed, +38 -0 lines changed


ralph-mcp-unique/progress.txt

Lines changed: 38 additions & 0 deletions
@@ -352,3 +352,41 @@
[2026-02-20 21:03:25 UTC] Iteration 5 no story markers found
[2026-02-20 21:03:25 UTC] Iteration 5 complete
[2026-02-20 21:03:27 UTC] Iteration 6 started

## 2026-02-20 - US-015: Starter tasks — Category E: Onboarding comprehension (2 tasks)
- Authored 2 complete tasks in benchmarks/ccb_mcp_onboarding/:

**CCX-onboard-041** (E41 — API consumption, python-ml-stack fixture):
- Scenario: New engineer auditing which pandas source files use `from scipy.stats import` at runtime
- Oracle: 4 files in pandas-dev/pandas:
  - pandas/core/nanops.py (kendalltau, spearmanr for correlation methods)
  - pandas/plotting/_matplotlib/misc.py (gaussian_kde)
  - pandas/plotting/_matplotlib/hist.py (gaussian_kde)
  - pandas/tests/groupby/test_reductions.py (sem)
- eval.sh: file_set_match + provenance
- Validity gate: VALID (gold=1.0, empty=0.0)
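
A minimal sketch of what a `file_set_match` check and the validity gate could look like for this oracle. The real eval.sh is not shown in this log, so the F1 formula and the names `file_set_match`, `validity_gate`, and `GOLD` are assumptions for illustration only. The only behavior taken from the log is that the gold answer scores 1.0 and an empty answer scores 0.0.

```python
# Hypothetical scorer: the actual metric used by eval.sh is not shown in this log.
GOLD = [
    "pandas/core/nanops.py",
    "pandas/plotting/_matplotlib/misc.py",
    "pandas/plotting/_matplotlib/hist.py",
    "pandas/tests/groupby/test_reductions.py",
]


def file_set_match(predicted, gold):
    """Return the F1 overlap between predicted and gold file paths (assumed metric)."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    if hits == 0:
        # Covers the empty answer and answers with no correct file.
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def validity_gate(scorer, gold):
    """A task is VALID when the gold answer scores 1.0 and an empty answer scores 0.0."""
    return scorer(gold, gold) == 1.0 and scorer([], gold) == 0.0
```

Under these assumptions, `validity_gate(file_set_match, GOLD)` returns `True`, mirroring the `gold=1.0, empty=0.0` result recorded above.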

**CCX-onboard-050-ds** (E50 — end-to-end flow, Deep Search variant, kubernetes-ecosystem fixture):
- Scenario: Onboarding engineer traces Deployment creation flow end-to-end across 3 repos
- Open-ended instruction: "Explain how a client creates a Deployment — trace through all 3 layers"
- Oracle chain:
  - sg-benchmarks/kubernetes-client-go: kubernetes/typed/apps/v1/deployment.go (Create)
  - kubernetes/kubernetes: pkg/registry/apps/deployment/strategy.go (PrepareForCreate)
  - etcd-io/etcd: server/storage/mvcc/kvstore_txn.go (Put)
- eval.sh: dependency_chain + provenance
- tests/criteria.json: 4 AAA quality rubric criteria (flow_completeness, cross_repo_synthesis, technical_accuracy, onboarding_clarity)
- deepsearch_relevant=true in selection file
- rubric_judge weight=0.4 in task_spec.json (supplementary, not in eval.sh)
- Validity gate: VALID (gold=1.0, empty=0.0)
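
One way to picture a `dependency_chain` check for this oracle is an in-order match of (repo, file) steps. This is a sketch under assumptions: the actual scoring logic in eval.sh is not shown in this log, and `ORACLE_CHAIN` and `dependency_chain_score` are hypothetical names.

```python
# Hypothetical in-order chain scorer; not the actual eval.sh logic.
ORACLE_CHAIN = [
    ("sg-benchmarks/kubernetes-client-go", "kubernetes/typed/apps/v1/deployment.go"),
    ("kubernetes/kubernetes", "pkg/registry/apps/deployment/strategy.go"),
    ("etcd-io/etcd", "server/storage/mvcc/kvstore_txn.go"),
]


def dependency_chain_score(answer_steps, oracle=ORACLE_CHAIN):
    """Return the fraction of oracle steps found in the answer in the oracle's order."""
    matched = 0
    for step in answer_steps:
        # Only advance when the next expected step appears; extra steps are ignored.
        if matched < len(oracle) and step == oracle[matched]:
            matched += 1
    return matched / len(oracle)
```

With this design, an answer that lists the three layers in the wrong order earns only partial credit, which fits a task about tracing a flow end-to-end.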

- Both tasks registered in configs/selected_mcp_unique_tasks.json (8 total tasks now)
- Files changed: benchmarks/ccb_mcp_onboarding/ (2 new task dirs with 19 files total), configs/selected_mcp_unique_tasks.json, prd.json, progress.txt
- **Learnings for future iterations:**
  - For "API consumption" oracles, use the exact import pattern (`from scipy.stats import`) to get a bounded, verifiable oracle, not `import scipy.stats` (which is broader)
  - Production-only vs all-files oracle: including test files gives a more realistic "complete audit" scenario
  - Deep Search variant design: an open-ended cross-repo synthesis question plus criteria.json is the right pattern
  - kubernetes-client-go typed client: `kubernetes/typed/apps/v1/deployment.go` has a Create() that POSTs to the REST endpoint
  - etcd Put function: `server/storage/mvcc/kvstore_txn.go:Put()` is the canonical write path
  - criteria.json uses the AAA pattern: each metric description has "Accurate: ... Attributed: ... Actionable: ..."
  - deepsearch_relevant in task.toml is a new field; verify that task.toml doesn't reject unknown fields (it doesn't: task.toml is read by Harbor, but extra fields are ignored)
---
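
The first learning above, the difference between the exact `from scipy.stats import` pattern and the broader `import scipy.stats`, can be illustrated with a small matcher. This is a sketch only: the regexes and the helper name `uses_from_import` are assumptions and are not taken from the task's tooling.

```python
import re

# Exact pattern: matches `from scipy.stats import ...`, including indented
# (function-local) imports, but not submodules such as scipy.stats.distributions.
FROM_IMPORT = re.compile(r"^\s*from\s+scipy\.stats\s+import\s+", re.MULTILINE)

# Broader pattern, shown only for contrast.
PLAIN_IMPORT = re.compile(r"^\s*import\s+scipy\.stats\b", re.MULTILINE)


def uses_from_import(source):
    """Return True if the source text contains a `from scipy.stats import` statement."""
    return bool(FROM_IMPORT.search(source))
```

A file containing only `import scipy.stats as st` is rejected by `uses_from_import`, which keeps the oracle limited to files that use the exact pattern.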
