Commit 92b4eea

sjarmak and claude committed
fix: extract script handles old-format dirs, add V2 report audit
- Fix extract_v2_report_data.py to scan both old-format (ccb_*/csb_* batch dirs) and new-format (timestamp dirs), recovering 1366 evals from 98 old-format config dirs. Fixes bustub-hyperloglog-impl-001 appearing unpaired (369→370 paired tasks).
- Add audit_v2_report_data.py for comprehensive V2 report data validation: coverage checks, reward integrity, suspicious pattern detection, normalization verification, and old-format impact analysis.
- Archive V1 technical report to docs/technical_reports/archive/.
- Regenerate script registry and index.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 6dd3f87 commit 92b4eea
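
The directory-scanning fix described in the commit message can be pictured with a minimal sketch. This is not the code from extract_v2_report_data.py, which is not shown on this page. The `ccb_`/`csb_` prefixes come from the commit message; the timestamp pattern, the function names, and the `baseline`/`mcp` config labels are assumptions made for illustration only.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Set, Tuple

# From the commit message: old-format batch dirs are named ccb_* or csb_*.
OLD_FORMAT = re.compile(r"^(ccb|csb)_")
# Assumption: new-format dirs start with a date-like timestamp
# (for example "2026-02-14__09-30-00"); the real scheme is not shown.
NEW_FORMAT = re.compile(r"^\d{4}-\d{2}-\d{2}")


def classify_run_dir(name: str) -> str:
    """Label a directory name as 'old', 'new', or 'unknown' format."""
    if OLD_FORMAT.match(name):
        return "old"
    if NEW_FORMAT.match(name):
        return "new"
    return "unknown"


def scan_run_dirs(root: Path) -> Dict[str, List[Path]]:
    """Group the immediate subdirectories of root by detected format."""
    found: Dict[str, List[Path]] = {"old": [], "new": [], "unknown": []}
    for child in sorted(root.iterdir()):
        if child.is_dir():
            found[classify_run_dir(child.name)].append(child)
    return found


def paired_tasks(
    evals: Iterable[Tuple[str, str]],
    required: Tuple[str, ...] = ("baseline", "mcp"),  # hypothetical labels
) -> Set[str]:
    """Return task ids that have an eval under every required config."""
    configs_by_task: Dict[str, Set[str]] = {}
    for task_id, config in evals:
        configs_by_task.setdefault(task_id, set()).add(config)
    return {t for t, seen in configs_by_task.items() if set(required) <= seen}
```

Under these assumptions, a task whose only eval for one config lives in an old-format directory drops out of `paired_tasks` whenever the scan skips that directory, which is the kind of symptom the commit reports for bustub-hyperloglog-impl-001.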

5 files changed, +1208 -1 lines changed

docs/ops/SCRIPT_INDEX.md

Lines changed: 3 additions & 0 deletions
@@ -164,6 +164,7 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/add_verification_metadata.py` - Utility script for add verification metadata.
 - `scripts/audit_official_scores.py` - Utility script for audit official scores.
 - `scripts/audit_unpinned_repos.py` - Utility script for audit unpinned repos.
+- `scripts/audit_v2_report_data.py` - Utility script for audit v2 report data.
 - `scripts/backfill_instruction_artifacts.py` [one_off] - Historical one-off script: backfill instruction artifacts.
 - `scripts/backfill_size_metadata.py` [one_off] - Historical one-off script: backfill size metadata.
 - `scripts/backfill_triage_from_manifest.py` [one_off] - Historical one-off script: backfill triage from manifest.
@@ -182,11 +183,13 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/docgen_quality_sweep.py` - Utility script for docgen quality sweep.
 - `scripts/doe_power_curves.py` - Utility script for doe power curves.
 - `scripts/doe_select_tasks.py` - Utility script for doe select tasks.
+- `scripts/ds_hybrid_retrieval.py` - Utility script for ds hybrid retrieval.
 - `scripts/ds_wrapper.sh` - Utility script for ds wrapper.
 - `scripts/export_official_results.py` - Utility script for export official results.
 - `scripts/extract_analysis_metrics.py` - Utility script for extract analysis metrics.
 - `scripts/extract_build_diary.py` - Utility script for extract build diary.
 - `scripts/extract_build_narrative.py` - Utility script for extract build narrative.
+- `scripts/extract_v2_report_data.py` - Utility script for extract v2 report data.
 - `scripts/find_mcp_distracted.py` - Utility script for find mcp distracted.
 - `scripts/fix_h3_tokens.py` [one_off] - Historical one-off script: fix h3 tokens.
 - `scripts/fix_workspace_perms.py` [one_off] - Historical one-off script: fix workspace perms.
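
The index entries in this diff follow a regular pattern, so the three added lines look like what a generator would emit for newly registered scripts. The sketch below reproduces that line format only. It is not `scripts/generate_script_index.py`, and the `index_line` helper and its `one_off` flag are hypothetical, since the layout of `scripts/registry.json` is not shown here.

```python
from pathlib import PurePosixPath


def index_line(path: str, one_off: bool = False) -> str:
    """Format one index entry in the style visible in SCRIPT_INDEX.md.

    Assumption: the description is derived from the file stem by
    replacing underscores with spaces, as the visible entries suggest.
    """
    words = PurePosixPath(path).stem.replace("_", " ")
    if one_off:
        return f"- `{path}` [one_off] - Historical one-off script: {words}."
    return f"- `{path}` - Utility script for {words}."


# Hypothetical usage: emit the entries alphabetically by path,
# matching the ordering seen in the diff.
new_scripts = [
    "scripts/extract_v2_report_data.py",
    "scripts/audit_v2_report_data.py",
    "scripts/ds_hybrid_retrieval.py",
]
for script in sorted(new_scripts):
    print(index_line(script))
```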
File renamed without changes.
