
Commit d4279d4

committed
Recompute Section 11.8 costs on paired 251-task dataset
1 parent 61d0a53 commit d4279d4


2 files changed (+34, -42 lines)


docs/WHITE_PAPER_REPORT_V2.md

Lines changed: 17 additions & 21 deletions
@@ -1032,30 +1032,26 @@ The **fix** suite has the lowest MCP ratio (0.350) and highest local call count
1032 1032
1033 1033
### 11.8 Cost Analysis
1034 1034
1035-
At Haiku pricing ($1/Mtok input, $5/Mtok output):
1036-
1037-
**Per-suite cost (baseline):**
1038-
1039-
| Suite | n | Mean Input Tokens | Mean Output Tokens | Est. Cost/Task |
1040-
|-------|---|------------------|-------------------|---------------|
1041-
| build | 19 | 5,940,659 | 722 | $5.94 |
1042-
| debug | 20 | 3,866,034 | 186 | $3.87 |
1043-
| design | 13 | 2,045,816 | 213 | $2.05 |
1044-
| document | 14 | 1,533,600 | 81 | $1.53 |
1045-
| fix | 20 | 8,321,921 | 400 | $8.32 |
1046-
| secure | 37 | 3,200,342 | 367 | $3.20 |
1047-
| test | 17 | 3,928,643 | 543 | $3.93 |
1048-
| understand | 37 | 1,916,541 | 262 | $1.92 |
1049-
| mcp_unique | 12 | 1,402,706 | 104 | $1.40 |
1050-
1051-
**Aggregate cost comparison:**
1035+
Costs below are recomputed on the same **251 paired tasks** used in Section 11.2, using a single method: `task_metrics.json` `cost_usd` (model-aware pricing including cache read/write tokens).
1036+
1037+
| Suite | n | Baseline Mean Cost/Task | MCP Mean Cost/Task |
1038+
|-------|---|-------------------------|--------------------|
1039+
| build | 25 | $0.457 | $0.552 |
1040+
| debug | 20 | $0.381 | $0.480 |
1041+
| design | 20 | $0.342 | $0.312 |
1042+
| document | 20 | $0.302 | $0.279 |
1043+
| fix | 25 | $0.619 | $0.686 |
1044+
| secure | 20 | $0.409 | $0.430 |
1045+
| test | 20 | $0.280 | $0.293 |
1046+
| understand | 20 | $0.436 | $0.365 |
1047+
| mcp_unique | 81 | $0.188 | $0.175 |
1052 1048
1053 1049
| Config | n | Mean Cost/Task | Total Cost |
1054-
|--------|---|---------------|-----------|
1055-
| Baseline | 234 | $0.75 | $175.68 |
1056-
| MCP | 206 | $0.47 | $97.01 |
1050+
|--------|---|----------------|------------|
1051+
| Baseline | 251 | $0.339 | $85.12 |
1052+
| MCP | 251 | $0.352 | $88.35 |
1057 1053
1058-
MCP runs cost **37% less** on average ($0.47 vs $0.75 per task). This is driven by the truncated-source environment: with less local code to read, the agent processes fewer input tokens. The **fix** suite is the most expensive ($8.32/task baseline) due to large codebases and extensive multi-file editing. The **mcp_unique** suite is cheapest ($1.40/task) because artifact-mode tasks produce a short JSON answer rather than extensive code changes.
1054+
On this paired slice, MCP is **~3.8% higher cost** on average (+$0.013/task), not lower. Cost impact is suite-dependent: MCP is cheaper in `design`, `document`, `understand`, and `mcp_unique`, and more expensive in `build`, `debug`, `fix`, `secure`, and `test`.
1059 1055
1060 1056
### 11.9 Correlation Analysis
1061 1057
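The recomputation described in the new Section 11.8 text (one `cost_usd` value per task read from `task_metrics.json`, restricted to tasks that ran under both configurations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's script: the directory layout `<root>/<config>/<suite>/<task_id>/task_metrics.json`, the config names `baseline` and `mcp`, and the helpers `load_costs` and `paired_summary` are all hypothetical. Only the file name and the `cost_usd` field come from the text, and `cost_usd` is taken as already including model-aware pricing and cache read/write tokens, so the sketch only averages and sums it.

```python
import json
from collections import defaultdict
from pathlib import Path
from statistics import mean

CONFIGS = ("baseline", "mcp")  # hypothetical config names


def load_costs(root: Path) -> dict:
    """Map (suite, task_id) -> {config: cost_usd} from per-task metrics files.

    Hypothetical layout: <root>/<config>/<suite>/<task_id>/task_metrics.json
    """
    costs = defaultdict(dict)
    for path in root.glob("*/*/*/task_metrics.json"):
        config, suite, task_id = path.parts[-4:-1]
        metrics = json.loads(path.read_text())
        costs[(suite, task_id)][config] = float(metrics["cost_usd"])
    return dict(costs)


def paired_summary(costs: dict, configs=CONFIGS):
    """Return (n_paired, per_suite, totals), using only tasks run under every config.

    per_suite maps suite -> (n, mean cost for configs[0], mean cost for configs[1], ...).
    totals maps config -> summed cost over the paired tasks.
    """
    # Drop tasks missing a run under any config so every column covers the same tasks.
    paired = {key: runs for key, runs in costs.items() if all(c in runs for c in configs)}

    by_suite = defaultdict(list)
    for (suite, _task_id), runs in paired.items():
        by_suite[suite].append(runs)

    per_suite = {
        suite: (len(rows), *(mean(r[c] for r in rows) for c in configs))
        for suite, rows in sorted(by_suite.items())
    }
    totals = {c: sum(runs[c] for runs in paired.values()) for c in configs}
    return len(paired), per_suite, totals
```

On data laid out as assumed, `paired_summary(load_costs(Path("runs")))` (with `"runs"` as a placeholder path) returns the paired task count, the per-suite rows, and per-config totals; dividing each total by the count gives the aggregate mean per task. Pairing before averaging is what makes the two columns comparable: the removed aggregate table compared 234 baseline runs against 206 MCP runs, so its means were not taken over the same tasks.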

docs/technical_reports/TECHNICAL_REPORT_V1.md

Lines changed: 17 additions & 21 deletions
@@ -1032,30 +1032,26 @@ The **fix** suite has the lowest MCP ratio (0.350) and highest local call count
1032 1032
1033 1033
### 11.8 Cost Analysis
1034 1034
1035-
At Haiku pricing ($1/Mtok input, $5/Mtok output):
1036-
1037-
**Per-suite cost (baseline):**
1038-
1039-
| Suite | n | Mean Input Tokens | Mean Output Tokens | Est. Cost/Task |
1040-
|-------|---|------------------|-------------------|---------------|
1041-
| build | 19 | 5,940,659 | 722 | $5.94 |
1042-
| debug | 20 | 3,866,034 | 186 | $3.87 |
1043-
| design | 13 | 2,045,816 | 213 | $2.05 |
1044-
| document | 14 | 1,533,600 | 81 | $1.53 |
1045-
| fix | 20 | 8,321,921 | 400 | $8.32 |
1046-
| secure | 37 | 3,200,342 | 367 | $3.20 |
1047-
| test | 17 | 3,928,643 | 543 | $3.93 |
1048-
| understand | 37 | 1,916,541 | 262 | $1.92 |
1049-
| mcp_unique | 12 | 1,402,706 | 104 | $1.40 |
1050-
1051-
**Aggregate cost comparison:**
1035+
Costs below are recomputed on the same **251 paired tasks** used in Section 11.2, using a single method: `task_metrics.json` `cost_usd` (model-aware pricing including cache read/write tokens).
1036+
1037+
| Suite | n | Baseline Mean Cost/Task | MCP Mean Cost/Task |
1038+
|-------|---|-------------------------|--------------------|
1039+
| build | 25 | $0.457 | $0.552 |
1040+
| debug | 20 | $0.381 | $0.480 |
1041+
| design | 20 | $0.342 | $0.312 |
1042+
| document | 20 | $0.302 | $0.279 |
1043+
| fix | 25 | $0.619 | $0.686 |
1044+
| secure | 20 | $0.409 | $0.430 |
1045+
| test | 20 | $0.280 | $0.293 |
1046+
| understand | 20 | $0.436 | $0.365 |
1047+
| mcp_unique | 81 | $0.188 | $0.175 |
1052 1048
1053 1049
| Config | n | Mean Cost/Task | Total Cost |
1054-
|--------|---|---------------|-----------|
1055-
| Baseline | 234 | $0.75 | $175.68 |
1056-
| MCP | 206 | $0.47 | $97.01 |
1050+
|--------|---|----------------|------------|
1051+
| Baseline | 251 | $0.339 | $85.12 |
1052+
| MCP | 251 | $0.352 | $88.35 |
1057 1053
1058-
MCP runs cost **37% less** on average ($0.47 vs $0.75 per task). This is driven by the truncated-source environment: with less local code to read, the agent processes fewer input tokens. The **fix** suite is the most expensive ($8.32/task baseline) due to large codebases and extensive multi-file editing. The **mcp_unique** suite is cheapest ($1.40/task) because artifact-mode tasks produce a short JSON answer rather than extensive code changes.
1054+
On this paired slice, MCP is **~3.8% higher cost** on average (+$0.013/task), not lower. Cost impact is suite-dependent: MCP is cheaper in `design`, `document`, `understand`, and `mcp_unique`, and more expensive in `build`, `debug`, `fix`, `secure`, and `test`.
1059 1055
1060 1056
### 11.9 Correlation Analysis
1061 1057
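As a consistency check on the paired-slice figures in Section 11.8, the per-suite means can be weighted by `n` and compared with the reported totals, and the headline delta recomputed from those totals. The sketch below re-uses only numbers from the tables. Because each per-suite mean is rounded to three decimals, a weighted sum can legitimately drift from the reported total by up to about `0.0005 * n` dollars.

```python
# Per-suite (n, baseline mean $/task, MCP mean $/task), copied from the Section 11.8 table.
SUITES = {
    "build": (25, 0.457, 0.552),
    "debug": (20, 0.381, 0.480),
    "design": (20, 0.342, 0.312),
    "document": (20, 0.302, 0.279),
    "fix": (25, 0.619, 0.686),
    "secure": (20, 0.409, 0.430),
    "test": (20, 0.280, 0.293),
    "understand": (20, 0.436, 0.365),
    "mcp_unique": (81, 0.188, 0.175),
}
TOTAL_BASELINE = 85.12  # reported total cost, baseline
TOTAL_MCP = 88.35       # reported total cost, MCP

n = sum(k for k, _, _ in SUITES.values())
weighted_baseline = sum(k * b for k, b, _ in SUITES.values())
weighted_mcp = sum(k * m for k, _, m in SUITES.values())

# Rounded means can shift each task's contribution by at most 0.0005.
tolerance = 0.0005 * n
assert abs(weighted_baseline - TOTAL_BASELINE) <= tolerance
assert abs(weighted_mcp - TOTAL_MCP) <= tolerance

delta_per_task = (TOTAL_MCP - TOTAL_BASELINE) / n
pct_change = 100 * (TOTAL_MCP - TOTAL_BASELINE) / TOTAL_BASELINE
mcp_cheaper = sorted(s for s, (_, b, m) in SUITES.items() if m < b)

print(f"n={n}  baseline ${TOTAL_BASELINE / n:.3f}/task  MCP ${TOTAL_MCP / n:.3f}/task")
print(f"delta +${delta_per_task:.3f}/task ({pct_change:+.1f}%)")
print("MCP cheaper in:", ", ".join(mcp_cheaper))
```

Running it prints `n=251`, $0.339 vs $0.352 per task, a +$0.013/task (+3.8%) delta, and `design, document, mcp_unique, understand` as the suites where MCP is cheaper, matching the prose in the new Section 11.8.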
