
Commit af3e69a

sjarmak and claude committed
feat: oracle hydration, model fix, protonmail removal, MCP-unique task prep
- Fix critical model default in run_selected_tasks.sh: opus -> haiku
- Hydrate 47 MCP-unique task_spec.json oracles from oracle_answer.json
- Remove 3 protonmail tasks (unresolvable git apply --allow-empty verifier bug)
- Update task.toml timeouts for build/design tasks based on rerun analysis
- Add 4 new MCP-unique tasks (057, 042, 050, 091) with eval.sh verifiers
- Add direct_verifier.sh for 5 ccb_mcp_org tasks (081-083, 122, 127)
- Add batch selection configs: mcp_unique_batch, rerun_haiku_timeout, etc.
- Update MCP task instructions and metadata across all 10 ccb_mcp_* suites
- Add Makefile, analysis scripts, doc updates
- Update generate_manifest.py with MCP-unique suite detection
- Update selected_benchmark_tasks.json and config_utils.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8921894 commit af3e69a


295 files changed, +11177 -11446 lines changed


CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ full operations manual.
 
 ## Maintenance
 - Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
+- `docs/START_HERE_BY_TASK.md` is generated from `docs/ops/task_routes.json`.
 - Regenerate after edits (single command):
 ```bash
 python3 scripts/refresh_agent_navigation.py

Makefile

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+.PHONY: agent-nav agent-nav-check docs-consistency repo-health-quick
+
+agent-nav:
+	python3 scripts/refresh_agent_navigation.py
+
+agent-nav-check:
+	python3 scripts/refresh_agent_navigation.py --check
+
+docs-consistency:
+	python3 scripts/docs_consistency_check.py
+
+repo-health-quick:
+	python3 scripts/repo_health.py --quick
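
Reviewer note: the `agent-nav` / `agent-nav-check` pair suggests a generate-or-verify workflow for the generated docs. The actual contents of `scripts/refresh_agent_navigation.py` are not shown in this diff, so the following is only a minimal sketch of how such a `--check` mode is commonly structured. The function names, the JSON shape (a flat task-to-document mapping), the rendered Markdown layout, and the `docs/ops/run.md` path are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def render_routes(routes: dict[str, str]) -> str:
    """Render a hypothetical task -> doc mapping as a Markdown bullet list."""
    lines = ["# Start here by task", ""]
    for task, doc in sorted(routes.items()):
        lines.append(f"- {task}: `{doc}`")
    return "\n".join(lines) + "\n"


def refresh(source: Path, target: Path, check: bool = False) -> bool:
    """Return True if target was already up to date.

    When the target is stale it is rewritten, unless check is set,
    in which case the file is left untouched.
    """
    expected = render_routes(json.loads(source.read_text(encoding="utf-8")))
    current = target.read_text(encoding="utf-8") if target.exists() else None
    if current == expected:
        return True
    if not check:
        target.write_text(expected, encoding="utf-8")
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "task_routes.json"
        dst = Path(tmp) / "START_HERE_BY_TASK.md"
        # Hypothetical route entry, for illustration only.
        src.write_text(json.dumps({"run a benchmark": "docs/ops/run.md"}), encoding="utf-8")
        print(refresh(src, dst, check=True))  # check only: target is missing, nothing written
        print(refresh(src, dst))              # regenerate: target is written
        print(refresh(src, dst, check=True))  # check only: target now matches
```

A CI job would typically map a `False` result in check mode to a non-zero exit code, so that stale generated files fail the build instead of being silently rewritten.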

benchmarks/ccb_build/camel-fix-protocol-feat-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ reward_type = "ir_checklist"
 description = "Compilation check + IR metrics (file coverage) + keyword scoring for feature implementation"
 
 [environment]
-build_timeout_sec = 1800.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_build/flink-pricing-window-feat-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ reward_type = "ir_checklist"
 description = "Compilation check + IR metrics (file coverage) + keyword scoring for feature implementation"
 
 [environment]
-build_timeout_sec = 1800.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_build/k8s-noschedule-taint-feat-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ command = "bash /tests/test.sh"
 reward_type = "checklist"
 description = "Weighted checklist of required code structure and patterns"
 [environment]
-build_timeout_sec = 1800.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_build/k8s-score-normalizer-refac-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ reward_type = "ir_checklist"
 description = "IR metrics (file coverage/completeness) + compilation check for cross-file refactoring"
 
 [environment]
-build_timeout_sec = 1800.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_build/rust-subtype-relation-refac-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ reward_type = "ir_checklist"
 description = "IR metrics (file coverage/completeness) + compilation check for cross-file refactoring"
 
 [environment]
-build_timeout_sec = 1800.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_build/servo-scrollend-event-feat-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ command = "bash /tests/test.sh"
 reward_type = "checklist"
 description = "Weighted checklist of required code structure and patterns"
 [environment]
-build_timeout_sec = 2400.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_design/camel-routing-arch-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ reward_type = "ir_checklist"
 description = "IR metrics (file recall/precision) + keyword overlap for architectural analysis"
 
 [environment]
-build_timeout_sec = 1800.0
+build_timeout_sec = 3600.0
 
 [environment.setup_scripts]
 mcp_config = """#!/bin/bash

benchmarks/ccb_design/etcd-grpc-api-upgrade-001/task.toml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ reward_type = "semantic_similarity"
 description = "Semantic similarity of content, file references, and patterns"
 
 [environment]
-build_timeout_sec = 300.0
+build_timeout_sec = 3600.0
 cpus = 2
 memory_mb = 4096
 storage_mb = 10240
