Commit dfe1bde
fix: re-run security oracle with working SG search (F1 0.0 → 0.667)
User-Agent fix enables SG API calls. Agent now discovers express repo
via Sourcegraph, achieving recall=1.0 on ccx-vuln-remed-011.
Overall pilot: mean F1=0.727, kappa=0.380.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent c9ebe17 commit dfe1bde
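For context on the figures in the commit message: F1 is the harmonic mean of precision and recall, so recall=1.0 together with F1≈0.667 implies a precision of about 0.5 on ccx-vuln-remed-011. The sketch below only checks that arithmetic. The function names are illustrative and are not taken from the benchmark harness, and the precision value is inferred here, not reported in the commit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) for P, given F1 and recall.

    Rearranging gives P = F1 * R / (2R - F1), which is only
    defined when 2R > F1.
    """
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denominator


if __name__ == "__main__":
    # Inferred (not reported) precision for recall=1.0, F1=2/3.
    inferred_precision = precision_from_f1(2 / 3, 1.0)
    print(f"inferred precision: {inferred_precision:.3f}")
    print(f"F1 check: {f1_score(inferred_precision, 1.0):.3f}")
```

Under that reading, the agent found every expected item after the fix, and roughly half of what it returned was not in the oracle set.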
File tree
1 file changed (+17, -8) under benchmarks/ccb_mcp_security/ccx-vuln-remed-011/tests
| Original file lines | New file lines | Change (line content not captured) |
|---|---|---|
| 2-3 | 2-12 | 2 lines removed, 11 lines added |
| 8-13 | 17-22 | 6 lines removed, 6 lines added |