
Commit 323c213

sjarmak and claude committed
feat: promote crossrepo+crossorg batches, generate MCP-unique variance gap config
- Promote ccb_mcp_crossrepo_haiku_20260302_034936 (10 tasks, 0 crits)
- Promote ccb_mcp_crossorg_haiku_20260302_034936 (24 tasks, 0 crits)
- Generate configs/variance_reruns/variance_gap_mcp_unique.json (187 tasks, 281 paired runs)
- Post-promotion: 33/220 MCP-unique at 3+ pairs, 136 at 2-pair, 8 at 1-pair, 43 at zero
- All 13 scaffolded tasks (272-284) verified complete and runnable
- All 14 onboarding-search tasks (201-214) verified active in benchmarks/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8d7a2da commit 323c213
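
The rerun totals in the commit message follow from the post-promotion pair counts if each MCP-unique task is meant to reach three paired runs. That target is an assumption inferred from the "3+ pairs" wording, not something the commit states. Under it, a minimal Python check reproduces the 187 tasks and 281 paired runs; the names `pair_counts` and `TARGET_PAIRS` are illustrative only:

```python
# Pair-count distribution quoted in the commit message:
# completed pairs -> number of MCP-unique tasks (the 3 bucket means "3 or more").
pair_counts = {3: 33, 2: 136, 1: 8, 0: 43}

# Assumption: the variance gap is measured against a target of 3 pairs per task.
TARGET_PAIRS = 3

total_tasks = sum(pair_counts.values())

# Tasks that still need at least one more paired run.
gap_tasks = sum(n for pairs, n in pair_counts.items() if pairs < TARGET_PAIRS)

# Paired runs needed to bring every such task up to the target.
missing_runs = sum(
    (TARGET_PAIRS - pairs) * n
    for pairs, n in pair_counts.items()
    if pairs < TARGET_PAIRS
)

print(total_tasks, gap_tasks, missing_runs)
```

With the numbers above, the three totals come out to 220, 187, and 281, matching the figures in the message.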


76 files changed: +69119 -713 lines changed


configs/variance_reruns/variance_gap_mcp_unique.json

Lines changed: 2769 additions & 0 deletions
Large diffs are not rendered by default.

docs/official_results/README.md

Lines changed: 13 additions & 9 deletions
@@ -2,7 +2,7 @@
 
 This bundle is generated from `runs/official/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).
 
-Generated: `2026-03-02T04:18:08.444959+00:00`
+Generated: `2026-03-02T17:45:37.306771+00:00`
 
 ## Local Browse
 

@@ -33,14 +33,14 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-direct` | 21 | 54 | 0.318 | 0.810 | FLAG: below minimum |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `mcp-remote-artifact` | 1 | 54 | 0.742 | 1.000 | FLAG: below minimum |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `mcp-remote-direct` | 54 | 54 | 0.394 | 0.889 | ok |
-| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `baseline-local-artifact` | 4 | 21 | 0.406 | 0.750 | FLAG: below minimum |
-| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `baseline-local-direct` | 17 | 21 | 0.170 | 0.647 | FLAG: below minimum |
-| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `mcp-remote-artifact` | 4 | 21 | 0.586 | 0.750 | FLAG: below minimum |
-| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `mcp-remote-direct` | 21 | 21 | 0.344 | 0.762 | ok |
-| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `baseline-local-artifact` | 5 | 81 | 0.565 | 0.600 | FLAG: below minimum |
-| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `baseline-local-direct` | 42 | 81 | 0.296 | 0.762 | FLAG: below minimum |
-| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `mcp-remote-artifact` | 5 | 81 | 0.654 | 1.000 | FLAG: below minimum |
-| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `mcp-remote-direct` | 81 | 81 | 0.369 | 0.815 | ok |
+| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `baseline-local-artifact` | 4 | 33 | 0.406 | 0.750 | FLAG: below minimum |
+| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `baseline-local-direct` | 17 | 33 | 0.167 | 0.588 | FLAG: below minimum |
+| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `mcp-remote-artifact` | 4 | 33 | 0.586 | 0.750 | FLAG: below minimum |
+| [ccb_mcp_crossorg](suites/ccb_mcp_crossorg.md) | `mcp-remote-direct` | 33 | 33 | 0.253 | 0.667 | ok |
+| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `baseline-local-artifact` | 5 | 86 | 0.565 | 0.600 | FLAG: below minimum |
+| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `baseline-local-direct` | 42 | 86 | 0.296 | 0.762 | FLAG: below minimum |
+| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `mcp-remote-artifact` | 5 | 86 | 0.654 | 1.000 | FLAG: below minimum |
+| [ccb_mcp_crossrepo](suites/ccb_mcp_crossrepo.md) | `mcp-remote-direct` | 86 | 86 | 0.362 | 0.826 | ok |
 | [ccb_mcp_domain](suites/ccb_mcp_domain.md) | `baseline-local-artifact` | 3 | 49 | 0.000 | 0.000 | FLAG: below minimum |
 | [ccb_mcp_domain](suites/ccb_mcp_domain.md) | `baseline-local-direct` | 20 | 49 | 0.351 | 0.900 | FLAG: below minimum |
 | [ccb_mcp_domain](suites/ccb_mcp_domain.md) | `mcp-remote-artifact` | 3 | 49 | 0.529 | 1.000 | FLAG: below minimum |
@@ -200,6 +200,8 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_crossorg_haiku_20260228_133005](runs/ccb_mcp_crossorg_haiku_20260228_133005.md) | `ccb_mcp_crossorg` | `baseline-local-direct` | 2 | 0.334 | 1.000 |
 | [ccb_mcp_crossorg_haiku_20260302_014939](runs/ccb_mcp_crossorg_haiku_20260302_014939.md) | `ccb_mcp_crossorg` | `baseline-local-direct` | 12 | 0.107 | 0.583 |
 | [ccb_mcp_crossorg_haiku_20260302_014939](runs/ccb_mcp_crossorg_haiku_20260302_014939.md) | `ccb_mcp_crossorg` | `mcp-remote-direct` | 12 | 0.181 | 0.667 |
+| [ccb_mcp_crossorg_haiku_20260302_034936](runs/ccb_mcp_crossorg_haiku_20260302_034936.md) | `ccb_mcp_crossorg` | `baseline-local-direct` | 12 | 0.103 | 0.500 |
+| [ccb_mcp_crossorg_haiku_20260302_034936](runs/ccb_mcp_crossorg_haiku_20260302_034936.md) | `ccb_mcp_crossorg` | `mcp-remote-direct` | 12 | 0.096 | 0.500 |
 | [ccb_mcp_crossrepo_haiku_20260226_035617](runs/ccb_mcp_crossrepo_haiku_20260226_035617.md) | `ccb_mcp_crossrepo` | `mcp-remote-direct` | 1 | 0.767 | 1.000 |
 | [ccb_mcp_crossrepo_haiku_20260226_035622_variance](runs/ccb_mcp_crossrepo_haiku_20260226_035622_variance.md) | `ccb_mcp_crossrepo` | `mcp-remote-direct` | 1 | 0.644 | 1.000 |
 | [ccb_mcp_crossrepo_haiku_20260226_035628_variance](runs/ccb_mcp_crossrepo_haiku_20260226_035628_variance.md) | `ccb_mcp_crossrepo` | `mcp-remote-direct` | 1 | 0.767 | 1.000 |
@@ -214,6 +216,8 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_crossrepo_haiku_20260301_201320](runs/ccb_mcp_crossrepo_haiku_20260301_201320.md) | `ccb_mcp_crossrepo` | `mcp-remote-direct` | 6 | 0.049 | 0.333 |
 | [ccb_mcp_crossrepo_haiku_20260302_014939](runs/ccb_mcp_crossrepo_haiku_20260302_014939.md) | `ccb_mcp_crossrepo` | `baseline-local-direct` | 11 | 0.293 | 1.000 |
 | [ccb_mcp_crossrepo_haiku_20260302_014939](runs/ccb_mcp_crossrepo_haiku_20260302_014939.md) | `ccb_mcp_crossrepo` | `mcp-remote-direct` | 11 | 0.291 | 1.000 |
+| [ccb_mcp_crossrepo_haiku_20260302_034936](runs/ccb_mcp_crossrepo_haiku_20260302_034936.md) | `ccb_mcp_crossrepo` | `baseline-local-direct` | 5 | 0.253 | 1.000 |
+| [ccb_mcp_crossrepo_haiku_20260302_034936](runs/ccb_mcp_crossrepo_haiku_20260302_034936.md) | `ccb_mcp_crossrepo` | `mcp-remote-direct` | 5 | 0.250 | 1.000 |
 | [ccb_mcp_crossrepo_tracing_haiku_022126](runs/ccb_mcp_crossrepo_tracing_haiku_022126.md) | `ccb_mcp_crossrepo` | `baseline-local-artifact` | 3 | 0.941 | 1.000 |
 | [ccb_mcp_crossrepo_tracing_haiku_022126](runs/ccb_mcp_crossrepo_tracing_haiku_022126.md) | `ccb_mcp_crossrepo` | `mcp-remote-artifact` | 3 | 0.899 | 1.000 |
 | [ccb_mcp_crossrepo_tracing_haiku_20260224_181919](runs/ccb_mcp_crossrepo_tracing_haiku_20260224_181919.md) | `ccb_mcp_crossrepo` | `mcp-remote-artifact` | 2 | 0.287 | 1.000 |
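
The updated `mcp-remote-direct` suite rows can be sanity-checked against the newly promoted run rows. The table headers are not visible in this diff, so the sketch below assumes that the first numeric column is the valid-task count and that the last two numeric columns are mean reward and pass rate. It also assumes the new batch's tasks are disjoint from those already counted and that the suite figures are task-weighted averages. The helper `pooled` and the tolerance are hypothetical, introduced only for this check:

```python
def pooled(n_old, mean_old, n_new, mean_new):
    """Task-weighted mean of two disjoint groups of tasks."""
    return (n_old * mean_old + n_new * mean_new) / (n_old + n_new)

# (count before, value before, batch count, batch value, value reported after)
# All figures are copied from the mcp-remote-direct rows in the diff above.
checks = {
    "crossorg mean reward": (21, 0.344, 12, 0.096, 0.253),
    "crossorg pass rate": (21, 0.762, 12, 0.500, 0.667),
    "crossrepo mean reward": (81, 0.369, 5, 0.250, 0.362),
    "crossrepo pass rate": (81, 0.815, 5, 1.000, 0.826),
}

# The inputs are already rounded to three decimals, so allow a small slack.
TOLERANCE = 0.002

results = {name: pooled(*vals[:4]) for name, vals in checks.items()}
for name, value in results.items():
    reported = checks[name][4]
    status = "ok" if abs(value - reported) <= TOLERANCE else "mismatch"
    print(f"{name}: pooled={value:.4f} reported={reported:.3f} {status}")
```

Under these assumptions all four pooled values land within the tolerance of the reported figures. The same check does not carry over to the `baseline-local-direct` rows, whose task counts (17 and 42) are unchanged by this commit.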
