Commit b152728

removed extra docs
1 parent 8caa3ef commit b152728

9 files changed: +27 -10536 lines changed

docs/codescalebench_blog_v1.md

Lines changed: 8 additions & 6 deletions
@@ -109,15 +109,17 @@ Curated GT deltas (`MCP - baseline`, combined):

 ### 2) Gains persist across size bins, with strongest lift in 1M-5M proxy bucket

-Curated GT deltas (`MCP - baseline`):
-- `<1M`: F1@10 +0.1007, Total +0.1318
-- `1M-5M`: F1@10 +0.2680, Total +0.2392
-- `5M-20M`: F1@10 +0.0648, Total +0.0565
-- `>20M`: F1@10 +0.1247, Total +0.1075
+Curated GT deltas (`MCP - baseline`) by revised LOC size bands:
+- `<400K` (n=15): F1@10 +0.2503, Total +0.2780
+- `400K-2M` (n=31): F1@10 +0.2618, Total +0.2424
+- `2M-8M` (n=143): F1@10 +0.1796, Total +0.1622
+- `8M-40M` (n=74): F1@10 +0.0719, Total +0.0590
+- `>40M` (n=3): F1@10 +0.0242, Total +0.0667
+- `unknown` (n=63): F1@10 +0.0992, Total +0.1601

 Interpretation: retrieval lift is not uniform, but MCP shows clear upside where task context is more distributed and retrieval-heavy.

-Method note: I corrected an Org path-normalization bug in an earlier draft where some baseline paths were mismatched due to path shape differences (for example `repo/repo/path` vs `repo/path`).
+Method note: I corrected an Org path-normalization bug in an earlier draft where some baseline paths were mismatched due to path shape differences (for example `repo/repo/path` vs `repo/path`). I also replaced SDLC size proxies with non-proxy repository size mapping for the size-bin slice in this version.

 ## Cost and Speed
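
The path-shape mismatch named in the method note (`repo/repo/path` vs `repo/path`) can be illustrated with a small normalizer. This is a minimal sketch, not the fix that was actually committed: `normalize_repo_path` is a hypothetical helper name, and collapsing only a duplicated leading segment of a relative, `/`-separated path is an assumption.

```python
def normalize_repo_path(path: str) -> str:
    """Collapse a duplicated leading segment: 'repo/repo/path' -> 'repo/path'.

    Hypothetical helper; assumes relative, '/'-separated paths.
    """
    # Split into non-empty segments, ignoring leading/trailing slashes.
    parts = [p for p in path.strip("/").split("/") if p]
    # Drop the first segment only when it is immediately repeated.
    if len(parts) >= 2 and parts[0] == parts[1]:
        parts = parts[1:]
    return "/".join(parts)


# Normalize both sides before comparing baseline and MCP file sets.
baseline = {normalize_repo_path(p) for p in ["repo/repo/src/a.py", "repo/src/b.py"]}
mcp = {normalize_repo_path(p) for p in ["repo/src/a.py", "repo/src/b.py"]}
matched = baseline & mcp
```

One caveat with this approach: a repository that legitimately contains a top-level directory named after itself would also be collapsed, so a real fix would likely need the known repository names rather than a purely structural rule.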

docs/ops/SCRIPT_INDEX.md

Lines changed: 1 addition & 0 deletions
@@ -209,6 +209,7 @@ Generated from `scripts/registry.json` by `scripts/generate_script_index.py`.
 - `scripts/plot_build_diary.py` - Utility script for plot build diary.
 - `scripts/plot_build_diary_supplementary.py` - Utility script for plot build diary supplementary.
 - `scripts/plot_build_narrative.py` - Utility script for plot build narrative.
+- `scripts/plot_csb_mcp_blog_figures.py` - Utility script for plot csb mcp blog figures.
 - `scripts/prepare_analysis_runs.py` - Utility script for prepare analysis runs.
 - `scripts/promote_agent_oracles.py` - Utility script for promote agent oracles.
 - `scripts/push_base_images_ghcr.sh` - Utility script for push base images ghcr.

docs/technical_reports/TECHNICAL_REPORT_V2.md

Lines changed: 9 additions & 7 deletions
@@ -992,18 +992,20 @@ On curated ground truth (`MCP - Baseline`, combined):
 | single_repo | 158 | +0.0853 | +0.1119 |
 | multi_repo | 171 | +0.2089 | +0.1862 |

-Curated size-bin deltas (`MCP - Baseline`):
+Curated size-bin deltas (`MCP - Baseline`), using the revised LOC bands shared with prior repo-size analysis:

-| Size Bin (proxy) | n | Δ F1@10 | Δ Total File Recall |
+| Size Bin | n | Δ F1@10 | Δ Total File Recall |
 |------------------|---:|--------:|--------------------:|
-| <1M | 139 | +0.1007 | +0.1318 |
-| 1M-5M | 104 | +0.2680 | +0.2392 |
-| 5M-20M | 57 | +0.0648 | +0.0565 |
-| >20M | 29 | +0.1247 | +0.1075 |
+| <400K | 15 | +0.2503 | +0.2780 |
+| 400K-2M | 31 | +0.2618 | +0.2424 |
+| 2M-8M | 143 | +0.1796 | +0.1622 |
+| 8M-40M | 74 | +0.0719 | +0.0590 |
+| >40M | 3 | +0.0242 | +0.0667 |
+| unknown | 63 | +0.0992 | +0.1601 |

 These slices indicate MCP retrieval gains are larger on multi-repo tasks than single-repo tasks in this snapshot.

-Methodology note: size bins here are metadata-driven proxies (`repo_set` fixture LOC totals for Org; `context_length` proxy for SDLC with fallback), so they should be interpreted as directional rather than exact physical repository size measurements.
+Methodology note: size bins here are no longer `context_length` proxies. Org tasks use fixture `loc_estimate` totals; SDLC tasks use repository size from GitHub metadata mapped into the same LOC bands (`docs/analysis/repo_size_bins_revised_20260303.json` conventions). `unknown` indicates tasks without resolved size metadata in this pass.

 ### 11.6 Correlation Analysis
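
The revised banding in the methodology note can be sketched as a lookup from a LOC estimate to a band label. This is an illustrative sketch under stated assumptions, not the repository's implementation: the `size_bin` name and the `LOC_BANDS` structure are hypothetical, and the boundary handling (lower bound inclusive, so exactly 400,000 LOC lands in `400K-2M`) is a guess, since the diff does not specify it. The band labels and per-band counts are taken from the table in this hunk.

```python
from typing import Optional

# (exclusive upper bound in LOC, label); boundary inclusivity is an assumption.
LOC_BANDS = [
    (400_000, "<400K"),
    (2_000_000, "400K-2M"),
    (8_000_000, "2M-8M"),
    (40_000_000, "8M-40M"),
]


def size_bin(loc: Optional[int]) -> str:
    """Map a LOC estimate to a band label; unresolved sizes become 'unknown'."""
    if loc is None or loc < 0:
        return "unknown"
    for upper, label in LOC_BANDS:
        if loc < upper:
            return label
    return ">40M"


# Per-band task counts copied from the revised table in this hunk.
revised_counts = {
    "<400K": 15, "400K-2M": 31, "2M-8M": 143,
    "8M-40M": 74, ">40M": 3, "unknown": 63,
}
# Sanity check: the bands should cover the same tasks as the
# single_repo (158) + multi_repo (171) slice above.
total_tasks = sum(revised_counts.values())
```

The counts in the revised table sum to 329, which matches both the single/multi-repo slice (158 + 171) and the removed proxy bins (139 + 104 + 57 + 29), so the re-binning redistributes the same task set rather than dropping tasks.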
