@@ -2,7 +2,7 @@
 
 This bundle is generated from `runs/official/` and includes only valid scored tasks (`passed`/`failed` with numeric reward).
 
-Generated: `2026-02-27T02:23:03.814992+00:00`
+Generated: `2026-02-27T02:26:00.850511+00:00`
 
 ## Local Browse
 
@@ -17,25 +17,15 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 
 | Suite | Config | Valid Tasks | Mean Reward | Pass Rate |
 | ---| ---| ---:| ---:| ---:|
-| [ccb_build](suites/ccb_build.md) | `baseline` | 19 | 0.511 | 0.789 |
 | [ccb_build](suites/ccb_build.md) | `baseline-local-direct` | 20 | 0.527 | 0.800 |
-| [ccb_build](suites/ccb_build.md) | `mcp` | 25 | 0.372 | 0.640 |
 | [ccb_build](suites/ccb_build.md) | `mcp-remote-direct` | 25 | 0.372 | 0.640 |
-| [ccb_debug](suites/ccb_debug.md) | `baseline` | 20 | 0.670 | 1.000 |
 | [ccb_debug](suites/ccb_debug.md) | `baseline-local-direct` | 20 | 0.670 | 1.000 |
-| [ccb_debug](suites/ccb_debug.md) | `mcp` | 20 | 0.487 | 0.600 |
 | [ccb_debug](suites/ccb_debug.md) | `mcp-remote-direct` | 20 | 0.487 | 0.600 |
-| [ccb_design](suites/ccb_design.md) | `baseline` | 13 | 0.770 | 1.000 |
 | [ccb_design](suites/ccb_design.md) | `baseline-local-direct` | 20 | 0.753 | 0.950 |
-| [ccb_design](suites/ccb_design.md) | `mcp` | 20 | 0.718 | 1.000 |
 | [ccb_design](suites/ccb_design.md) | `mcp-remote-direct` | 20 | 0.718 | 1.000 |
-| [ccb_document](suites/ccb_document.md) | `baseline` | 14 | 0.904 | 1.000 |
 | [ccb_document](suites/ccb_document.md) | `baseline-local-direct` | 20 | 0.847 | 1.000 |
-| [ccb_document](suites/ccb_document.md) | `mcp` | 15 | 0.953 | 1.000 |
 | [ccb_document](suites/ccb_document.md) | `mcp-remote-direct` | 25 | 0.802 | 1.000 |
-| [ccb_fix](suites/ccb_fix.md) | `baseline` | 17 | 0.535 | 0.706 |
 | [ccb_fix](suites/ccb_fix.md) | `baseline-local-direct` | 28 | 0.428 | 0.571 |
-| [ccb_fix](suites/ccb_fix.md) | `mcp` | 17 | 0.538 | 0.647 |
 | [ccb_fix](suites/ccb_fix.md) | `mcp-remote-direct` | 28 | 0.467 | 0.571 |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-artifact` | 1 | 0.375 | 1.000 |
 | [ccb_mcp_compliance](suites/ccb_mcp_compliance.md) | `baseline-local-direct` | 6 | 0.668 | 1.000 |
@@ -85,17 +75,11 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp` | 2 | 0.821 | 1.000 |
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp-remote-artifact` | 4 | 0.777 | 1.000 |
 | [ccb_mcp_security](suites/ccb_mcp_security.md) | `mcp-remote-direct` | 16 | 0.705 | 1.000 |
-| [ccb_secure](suites/ccb_secure.md) | `baseline` | 18 | 0.688 | 0.944 |
 | [ccb_secure](suites/ccb_secure.md) | `baseline-local-direct` | 20 | 0.669 | 0.950 |
-| [ccb_secure](suites/ccb_secure.md) | `mcp` | 18 | 0.705 | 1.000 |
 | [ccb_secure](suites/ccb_secure.md) | `mcp-remote-direct` | 22 | 0.645 | 0.909 |
-| [ccb_test](suites/ccb_test.md) | `baseline` | 9 | 0.472 | 0.778 |
 | [ccb_test](suites/ccb_test.md) | `baseline-local-direct` | 20 | 0.480 | 0.750 |
-| [ccb_test](suites/ccb_test.md) | `mcp` | 8 | 0.555 | 0.625 |
 | [ccb_test](suites/ccb_test.md) | `mcp-remote-direct` | 31 | 0.403 | 0.613 |
-| [ccb_understand](suites/ccb_understand.md) | `baseline` | 13 | 0.592 | 0.692 |
 | [ccb_understand](suites/ccb_understand.md) | `baseline-local-direct` | 20 | 0.660 | 0.800 |
-| [ccb_understand](suites/ccb_understand.md) | `mcp` | 13 | 0.841 | 1.000 |
 | [ccb_understand](suites/ccb_understand.md) | `mcp-remote-direct` | 20 | 0.851 | 1.000 |
 
 <details>
@@ -106,23 +90,23 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | ---| ---| ---| ---:| ---:| ---:|
 | [build_haiku_20260223_124805](runs/build_haiku_20260223_124805.md) | `ccb_build` | `baseline-local-direct` | 19 | 0.511 | 0.789 |
 | [build_haiku_20260223_124805](runs/build_haiku_20260223_124805.md) | `ccb_build` | `mcp-remote-direct` | 25 | 0.372 | 0.640 |
-| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `baseline` | 19 | 0.511 | 0.789 |
-| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `mcp` | 25 | 0.372 | 0.640 |
+| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `baseline-local-direct` | 19 | 0.511 | 0.789 |
+| [ccb_build_haiku_022326](runs/ccb_build_haiku_022326.md) | `ccb_build` | `mcp-remote-direct` | 25 | 0.372 | 0.640 |
 | [ccb_build_haiku_20260225_234223](runs/ccb_build_haiku_20260225_234223.md) | `ccb_build` | `baseline-local-direct` | 1 | 0.820 | 1.000 |
 | [ccb_build_haiku_20260226_015500_backfill](runs/ccb_build_haiku_20260226_015500_backfill.md) | `ccb_build` | `baseline-local-direct` | 1 | 0.820 | 1.000 |
-| [ccb_debug_haiku_022326](runs/ccb_debug_haiku_022326.md) | `ccb_debug` | `baseline` | 20 | 0.670 | 1.000 |
-| [ccb_debug_haiku_022326](runs/ccb_debug_haiku_022326.md) | `ccb_debug` | `mcp` | 20 | 0.487 | 0.600 |
-| [ccb_design_haiku_022326](runs/ccb_design_haiku_022326.md) | `ccb_design` | `baseline` | 13 | 0.770 | 1.000 |
-| [ccb_design_haiku_022326](runs/ccb_design_haiku_022326.md) | `ccb_design` | `mcp` | 20 | 0.718 | 1.000 |
+| [ccb_debug_haiku_022326](runs/ccb_debug_haiku_022326.md) | `ccb_debug` | `baseline-local-direct` | 20 | 0.670 | 1.000 |
+| [ccb_debug_haiku_022326](runs/ccb_debug_haiku_022326.md) | `ccb_debug` | `mcp-remote-direct` | 20 | 0.487 | 0.600 |
+| [ccb_design_haiku_022326](runs/ccb_design_haiku_022326.md) | `ccb_design` | `baseline-local-direct` | 13 | 0.770 | 1.000 |
+| [ccb_design_haiku_022326](runs/ccb_design_haiku_022326.md) | `ccb_design` | `mcp-remote-direct` | 20 | 0.718 | 1.000 |
 | [ccb_design_haiku_20260225_234223](runs/ccb_design_haiku_20260225_234223.md) | `ccb_design` | `baseline-local-direct` | 7 | 0.723 | 0.857 |
 | [ccb_design_haiku_20260226_015500_backfill](runs/ccb_design_haiku_20260226_015500_backfill.md) | `ccb_design` | `baseline-local-direct` | 7 | 0.723 | 0.857 |
-| [ccb_document_haiku_022326](runs/ccb_document_haiku_022326.md) | `ccb_document` | `baseline` | 14 | 0.904 | 1.000 |
-| [ccb_document_haiku_022326](runs/ccb_document_haiku_022326.md) | `ccb_document` | `mcp` | 15 | 0.953 | 1.000 |
+| [ccb_document_haiku_022326](runs/ccb_document_haiku_022326.md) | `ccb_document` | `baseline-local-direct` | 14 | 0.904 | 1.000 |
+| [ccb_document_haiku_022326](runs/ccb_document_haiku_022326.md) | `ccb_document` | `mcp-remote-direct` | 15 | 0.953 | 1.000 |
 | [ccb_document_haiku_20260224_174311](runs/ccb_document_haiku_20260224_174311.md) | `ccb_document` | `baseline-local-direct` | 5 | 0.658 | 1.000 |
 | [ccb_document_haiku_20260224_174311](runs/ccb_document_haiku_20260224_174311.md) | `ccb_document` | `mcp-remote-direct` | 5 | 0.720 | 1.000 |
 | [ccb_document_haiku_20260226_015500_backfill](runs/ccb_document_haiku_20260226_015500_backfill.md) | `ccb_document` | `baseline-local-direct` | 1 | 1.000 | 1.000 |
-| [ccb_fix_haiku_022326](runs/ccb_fix_haiku_022326.md) | `ccb_fix` | `baseline` | 17 | 0.535 | 0.706 |
-| [ccb_fix_haiku_022326](runs/ccb_fix_haiku_022326.md) | `ccb_fix` | `mcp` | 17 | 0.538 | 0.647 |
+| [ccb_fix_haiku_022326](runs/ccb_fix_haiku_022326.md) | `ccb_fix` | `baseline-local-direct` | 17 | 0.535 | 0.706 |
+| [ccb_fix_haiku_022326](runs/ccb_fix_haiku_022326.md) | `ccb_fix` | `mcp-remote-direct` | 17 | 0.538 | 0.647 |
 | [ccb_fix_haiku_20260224_203138](runs/ccb_fix_haiku_20260224_203138.md) | `ccb_fix` | `baseline-local-direct` | 1 | 0.710 | 1.000 |
 | [ccb_fix_haiku_20260224_203138](runs/ccb_fix_haiku_20260224_203138.md) | `ccb_fix` | `mcp-remote-direct` | 1 | 0.740 | 1.000 |
 | [ccb_fix_haiku_20260226_015500_backfill](runs/ccb_fix_haiku_20260226_015500_backfill.md) | `ccb_fix` | `baseline-local-direct` | 2 | 0.235 | 0.500 |
@@ -254,18 +238,18 @@ Historical reruns/backfills remain available in `data/official_results.json` und
 | [ccb_mcp_security_haiku_20260226_035633_variance](runs/ccb_mcp_security_haiku_20260226_035633_variance.md) | `ccb_mcp_security` | `baseline-local-direct` | 1 | 0.586 | 1.000 |
 | [ccb_mcp_security_haiku_20260226_035633_variance](runs/ccb_mcp_security_haiku_20260226_035633_variance.md) | `ccb_mcp_security` | `mcp-remote-direct` | 4 | 0.731 | 1.000 |
 | [ccb_mcp_security_haiku_20260226_205845](runs/ccb_mcp_security_haiku_20260226_205845.md) | `ccb_mcp_security` | `baseline-local-direct` | 3 | 0.682 | 1.000 |
-| [ccb_secure_haiku_022326](runs/ccb_secure_haiku_022326.md) | `ccb_secure` | `baseline` | 18 | 0.688 | 0.944 |
-| [ccb_secure_haiku_022326](runs/ccb_secure_haiku_022326.md) | `ccb_secure` | `mcp` | 18 | 0.705 | 1.000 |
+| [ccb_secure_haiku_022326](runs/ccb_secure_haiku_022326.md) | `ccb_secure` | `baseline-local-direct` | 18 | 0.688 | 0.944 |
+| [ccb_secure_haiku_022326](runs/ccb_secure_haiku_022326.md) | `ccb_secure` | `mcp-remote-direct` | 18 | 0.705 | 1.000 |
 | [ccb_secure_haiku_20260224_213146](runs/ccb_secure_haiku_20260224_213146.md) | `ccb_secure` | `baseline-local-direct` | 2 | 0.500 | 1.000 |
 | [ccb_secure_haiku_20260224_213146](runs/ccb_secure_haiku_20260224_213146.md) | `ccb_secure` | `mcp-remote-direct` | 2 | 0.250 | 0.500 |
-| [ccb_test_haiku_022326](runs/ccb_test_haiku_022326.md) | `ccb_test` | `baseline` | 9 | 0.472 | 0.778 |
-| [ccb_test_haiku_022326](runs/ccb_test_haiku_022326.md) | `ccb_test` | `mcp` | 8 | 0.555 | 0.625 |
+| [ccb_test_haiku_022326](runs/ccb_test_haiku_022326.md) | `ccb_test` | `baseline-local-direct` | 9 | 0.472 | 0.778 |
+| [ccb_test_haiku_022326](runs/ccb_test_haiku_022326.md) | `ccb_test` | `mcp-remote-direct` | 8 | 0.555 | 0.625 |
 | [ccb_test_haiku_20260224_180149](runs/ccb_test_haiku_20260224_180149.md) | `ccb_test` | `baseline-local-direct` | 11 | 0.486 | 0.727 |
 | [ccb_test_haiku_20260224_180149](runs/ccb_test_haiku_20260224_180149.md) | `ccb_test` | `mcp-remote-direct` | 11 | 0.387 | 0.727 |
 | [ccb_test_haiku_20260226_015500_backfill](runs/ccb_test_haiku_20260226_015500_backfill.md) | `ccb_test` | `baseline-local-direct` | 1 | 0.370 | 1.000 |
 | [ccb_test_haiku_20260226_015500_backfill](runs/ccb_test_haiku_20260226_015500_backfill.md) | `ccb_test` | `mcp-remote-direct` | 1 | 0.900 | 1.000 |
-| [ccb_understand_haiku_022426](runs/ccb_understand_haiku_022426.md) | `ccb_understand` | `baseline` | 13 | 0.592 | 0.692 |
-| [ccb_understand_haiku_022426](runs/ccb_understand_haiku_022426.md) | `ccb_understand` | `mcp` | 13 | 0.841 | 1.000 |
+| [ccb_understand_haiku_022426](runs/ccb_understand_haiku_022426.md) | `ccb_understand` | `baseline-local-direct` | 13 | 0.592 | 0.692 |
+| [ccb_understand_haiku_022426](runs/ccb_understand_haiku_022426.md) | `ccb_understand` | `mcp-remote-direct` | 13 | 0.841 | 1.000 |
 | [debug_haiku_20260223_154724](runs/debug_haiku_20260223_154724.md) | `ccb_debug` | `baseline-local-direct` | 20 | 0.670 | 1.000 |
 | [debug_haiku_20260223_154724](runs/debug_haiku_20260223_154724.md) | `ccb_debug` | `mcp-remote-direct` | 20 | 0.487 | 0.600 |
 | [design_haiku_20260223_124652](runs/design_haiku_20260223_124652.md) | `ccb_design` | `baseline-local-direct` | 13 | 0.770 | 1.000 |
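
For readers checking the table columns, here is a minimal sketch of how a row could be derived from the rule stated in the README text (only `passed`/`failed` tasks with a numeric reward count). The `TaskResult` record, its field names, and the `summarize` helper are hypothetical, and treating Pass Rate as the share of valid tasks with status `passed` is an assumption; the actual generator and the schema under `runs/official/` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskResult:
    # Hypothetical record shape; the real schema under runs/official/ may differ.
    status: str
    reward: Optional[float]


def is_valid_scored(task: TaskResult) -> bool:
    """Keep only passed/failed tasks that carry a numeric reward."""
    numeric = isinstance(task.reward, (int, float)) and not isinstance(task.reward, bool)
    return task.status in {"passed", "failed"} and numeric


def summarize(tasks: list[TaskResult]) -> dict:
    """Compute Valid Tasks, Mean Reward, and Pass Rate over the valid scored tasks."""
    valid = [t for t in tasks if is_valid_scored(t)]
    if not valid:
        return {"valid_tasks": 0, "mean_reward": None, "pass_rate": None}
    passed = sum(1 for t in valid if t.status == "passed")
    return {
        "valid_tasks": len(valid),
        "mean_reward": round(sum(t.reward for t in valid) / len(valid), 3),
        # Assumption: pass rate is the fraction of valid tasks with status "passed".
        "pass_rate": round(passed / len(valid), 3),
    }
```

Tasks with any other status, or with a missing reward, are dropped before averaging, which is one way a config's Valid Tasks count could differ between runs of the same suite.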