Commit b8d91c5
Fix evalbuff signal quality: commit docs in test repos, isolate Claude calls, filter lockfiles
- Commit pre-copied docs in test repos so they don't appear in the agent's
diff — fixes corrupted diff attribution where judges penalized agents for
docs they didn't create
- Run prompt generator and doc writer Claude calls with cwd=tmpDir to prevent
them from reading the repo's CLAUDE.md/AGENTS.md
- Filter lockfiles (bun.lock, package-lock.json, etc.) from diffs and file lists
- Add 0.3-point minimum threshold for score comparisons to reduce noise
- Cap improvement loop at 5 iterations
- Pass edit history (accepted/rejected docs with scores) to the doc writer
so it can avoid repeating rejected approaches and build on what worked
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 6d8bf39 commit b8d91c5
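The lockfile filtering described in the commit message could look roughly like the sketch below. This is not the code from the commit, whose diff body is not shown here. The helper names (`isLockfile`, `filterLockfilesFromFileList`, `filterLockfilesFromDiff`) are hypothetical, and only `bun.lock` and `package-lock.json` are named in the message; the other lockfile names are assumptions about what "etc." might cover.

```typescript
// Hypothetical sketch, not the implementation from commit b8d91c5.
// bun.lock and package-lock.json come from the commit message;
// the remaining names are assumed examples.
const LOCKFILE_NAMES = new Set<string>([
  "bun.lock",
  "package-lock.json",
  "yarn.lock", // assumption
  "pnpm-lock.yaml", // assumption
]);

// True when the final path segment is a known lockfile name.
function isLockfile(filePath: string): boolean {
  const base = filePath.split("/").pop() ?? filePath;
  return LOCKFILE_NAMES.has(base);
}

// Drop lockfiles from a list of changed file paths.
function filterLockfilesFromFileList(files: string[]): string[] {
  return files.filter((f) => !isLockfile(f));
}

// Drop per-file sections for lockfiles from a unified git diff.
// Assumes each file section starts with a "diff --git a/... b/..." header.
function filterLockfilesFromDiff(diff: string): string {
  const sections = diff.split(/^(?=diff --git )/m);
  return sections
    .filter((section) => {
      const header = /^diff --git a\/(.+?) b\/(.+)$/m.exec(section);
      // Keep anything that does not look like a file section.
      if (!header) return true;
      return !isLockfile(header[2]);
    })
    .join("");
}
```

Matching on the base name rather than the full path means nested lockfiles (for example in a monorepo package directory) are filtered as well; whether the actual change behaves that way is not stated in the commit message.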
3 files changed in evalbuff/src (+132, -14 lines changed)
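The score threshold, iteration cap, and edit history from the commit message can be sketched together as follows. This is an illustration under stated assumptions, not the commit's code. The values 0.3 and 5 come from the message, but all names (`MIN_SCORE_DELTA`, `MAX_ITERATIONS`, `EditRecord`, `runImprovementLoop`) are hypothetical. The sketch also assumes that the threshold is inclusive and that the loop always runs all five iterations, and it uses synchronous callbacks in place of the real agent and judge calls.

```typescript
// Hypothetical sketch, not the implementation from commit b8d91c5.
const MIN_SCORE_DELTA = 0.3; // threshold from the commit message
const MAX_ITERATIONS = 5; // loop cap from the commit message

// One entry of the edit history passed to the doc writer.
interface EditRecord {
  doc: string;
  score: number;
  accepted: boolean;
}

// A candidate only counts as better if it beats the baseline by the
// minimum delta. Inclusive comparison is an assumption; note that exact
// boundary cases are sensitive to floating-point rounding.
function isMeaningfulImprovement(baseline: number, candidate: number): boolean {
  return candidate - baseline >= MIN_SCORE_DELTA;
}

// Run at most MAX_ITERATIONS attempts. Each attempt sees the history of
// earlier accepted and rejected docs so it can avoid repeating them.
function runImprovementLoop(
  baselineScore: number,
  proposeDoc: (history: readonly EditRecord[]) => string,
  scoreDoc: (doc: string) => number,
): { bestScore: number; history: EditRecord[] } {
  let bestScore = baselineScore;
  const history: EditRecord[] = [];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const doc = proposeDoc(history);
    const score = scoreDoc(doc);
    const accepted = isMeaningfulImprovement(bestScore, score);
    history.push({ doc, score, accepted });
    if (accepted) bestScore = score;
  }
  return { bestScore, history };
}
```

Comparing against the best accepted score, rather than the original baseline, keeps small successive gains below the threshold from accumulating into an accepted change; the commit message does not say which comparison the real loop uses.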