Commit c7d78c5
Fix GT coverage pipeline: 42% → 99.1% benchmark event coverage
- Move benchmarks/ccb_contextbench/ → calibration/curator_calibration/
  to stop calibration tasks from polluting benchmark GT scans
- Fix audit_gt_coverage.py: don't short-circuit on invalid-schema
  ground_truth.json when valid oracle_answer.json exists; recognize
  expected_files format from scaling-gap tasks
- Fix update_gt_registry.py: handle expected_files format; replace
  stale entries instead of accumulating them (248 → 402 entries)
- Fix normalize_retrieval_events.py: strip bl_/sgonly_/artifact_
  prefixes and Harbor suffixes in _normalize_task_name(); add
  case-insensitive GT registry fallback for uppercase CCX- task names
- Update script references for contextbench path change
- Clean up 958 stale prefixed event files from retrieval_events/
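
The name normalization and case-insensitive lookup described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from normalize_retrieval_events.py: the function names, the registry shape (a plain dict keyed by task name), the handling of stacked prefixes, and especially the Harbor suffix pattern are assumptions, since the diff content is not shown here.

```python
import re

# Prefixes named in the commit message.
RUN_PREFIXES = ("bl_", "sgonly_", "artifact_")

# ASSUMPTION: the real Harbor suffix format is not shown in this commit;
# a trailing "__" plus 7 alphanumerics is used purely for illustration.
HARBOR_SUFFIX = re.compile(r"__[A-Za-z0-9]{7}$")


def normalize_task_name(name: str) -> str:
    """Strip run-mode prefixes and a trailing Harbor-style suffix."""
    stripped = True
    while stripped:  # ASSUMPTION: prefixes may be stacked, e.g. "bl_artifact_..."
        stripped = False
        for prefix in RUN_PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
                stripped = True
    return HARBOR_SUFFIX.sub("", name)


def lookup_ground_truth(registry: dict, task_name: str):
    """Try an exact match first, then fall back to a case-insensitive match."""
    key = normalize_task_name(task_name)
    if key in registry:
        return registry[key]
    lowered = {k.lower(): v for k, v in registry.items()}
    return lowered.get(key.lower())
```

With a hypothetical registry `{"ccx-example-001": ...}`, an event named `sgonly_CCX-example-001` would miss the exact lookup and resolve through the lowercase fallback. Keeping the exact match first means the fallback only affects names that previously had no ground truth at all.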
Benchmark task GT: 100% (404/404 tasks)
Benchmark event GT: 99.1% (4516/4559 events)
Registry: 402 entries covering all benchmark tasks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 773096d commit c7d78c5
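
For the audit and registry fixes in the commit message, a rough sketch of the intended behavior might look like the following. It is written under stated assumptions rather than taken from audit_gt_coverage.py or update_gt_registry.py: the helper names, the JSON schemas (an object with a list under `files` or `expected_files`), and the location of the JSON files directly inside each task directory are all hypothetical.

```python
import json
from pathlib import Path


def _read_json(path: Path):
    """Return parsed JSON, or None if the file is missing or malformed."""
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None


def extract_gt_files(task_dir: Path):
    """Return ground-truth file paths for a task, or None if none are usable."""
    # ASSUMPTION: hypothetical schemas; each candidate is a JSON object
    # holding a list of file paths under "files" or "expected_files".
    for filename in ("ground_truth.json", "oracle_answer.json"):
        data = _read_json(task_dir / filename)
        if not isinstance(data, dict):
            continue  # invalid here: keep looking instead of returning early
        for key in ("files", "expected_files"):
            files = data.get(key)
            if isinstance(files, list) and files:
                return [str(f) for f in files]
    return None


def rebuild_registry(task_dirs):
    """Build the registry from scratch so entries for removed tasks vanish."""
    registry = {}
    for task_dir in task_dirs:
        files = extract_gt_files(Path(task_dir))
        if files is not None:
            registry[Path(task_dir).name] = files
    return registry
```

Rebuilding the mapping on every run, instead of merging into the previous one, is one simple way to avoid accumulating stale entries; whether the actual script rebuilds or selectively replaces entries is not visible from this page.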
File tree
507 files changed, +13082 -4277 lines changed
- calibration/curator_calibration
- cb-multi-swe-bench__c__maintenance__bugfix__634fe9b8
- environment
- tests
- cb-multi-swe-bench__c__maintenance__bugfix__a47dfbbf
- environment
- tests
- cb-multi-swe-bench__cpp__maintenance__bugfix__4a37a167
- environment
- tests
- cb-multi-swe-bench__go__maintenance__bugfix__3d85271b
- environment
- tests
- cb-multi-swe-bench__go__maintenance__bugfix__4d9664f3
- environment
- tests
- cb-multi-swe-bench__go__maintenance__bugfix__52c152ba
- environment
- tests
- cb-multi-swe-bench__go__maintenance__bugfix__85c030cf
- environment
- tests
- cb-multi-swe-bench__java__maintenance__bugfix__2bd87230
- environment
- tests
- cb-multi-swe-bench__javascript__maintenance__bugfix__5b47b0dd
- environment
- tests
- cb-multi-swe-bench__rust__maintenance__bugfix__1cadcb7d
- environment
- tests
- cb-multi-swe-bench__rust__maintenance__bugfix__3c69099b
- environment
- tests
- cb-multi-swe-bench__typescript__maintenance__bugfix__05c53458
- environment
- tests
- cb-multi-swe-bench__typescript__maintenance__bugfix__6a14056a
- environment
- tests
- cb-swe-bench-pro__go__maintenance__bugfix__6efcf999
- environment
- tests
- cb-swe-bench-pro__javascript__maintenance__bugfix__ac8400d9
- environment
- tests
- cb-swe-bench-pro__python__maintenance__bugfix__607fc4ff
- environment
- tests
- cb-swe-bench-pro__python__maintenance__bugfix__62badbbf
- environment
- tests
- cb-swe-bench-pro__python__maintenance__bugfix__942d0b14
- environment
- tests
- cb-swe-bench-pro__python__maintenance__bugfix__e2b70931
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__16c72e4c
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__186c0af4
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__1a760e52
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__1efc2b51
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__2c34be8a
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__3146e19b
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__31d4fe9d
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__34e61891
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__4b691a35
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__55a3ef80
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__5c82134f
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__60b000ec
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__84effcbc
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__9a05fe0c
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__a8414dbd
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__d2a786e2
- environment
- tests
- cb-swe-bench-verified__python__maintenance__bugfix__d3bf673c
- environment
- tests
- cb-swe-polybench__javascript__maintenance__bugfix__21086305
- environment
- tests
- cb-swe-polybench__javascript__maintenance__bugfix__27e1903f
- environment
- tests
- cb-swe-polybench__javascript__maintenance__bugfix__78039f77
- environment
- tests
- cb-swe-polybench__javascript__maintenance__bugfix__e647c8ce
- environment
- tests
- cb-swe-polybench__python__evolution__feature__8bb50331
- environment
- tests
- cb-swe-polybench__python__maintenance__bugfix__023915d6
- environment
- tests
- cb-swe-polybench__python__maintenance__bugfix__40f09c26
- environment
- tests
- cb-swe-polybench__python__maintenance__bugfix__9ea927ce
- environment
- tests
- cb-swe-polybench__python__maintenance__bugfix__e3c9c53c
- environment
- tests
- cb-swe-polybench__typescript__maintenance__bugfix__42165c4e
- environment
- tests
- cb-swe-polybench__typescript__maintenance__bugfix__52180d42
- environment
- tests
- cb-swe-polybench__typescript__maintenance__bugfix__678fa217
- environment
- tests
- cb-swe-polybench__typescript__maintenance__bugfix__708894b2
- environment
- tests
- cb-swe-polybench__typescript__maintenance__bugfix__7d106697
- environment
- tests
- configs
- scripts