
Commit d5b2a92

jahooma and claude committed
Improve evalbuff prompt generation: full file context, buffbench-style prompts
- Read full file contents at the parent commit (up to 500K chars total) to give the prompt generator rich context about the codebase, matching buffbench's approach
- Include the complete diff (up to 200K chars) instead of truncating at 8K
- Rewrite the system prompt to produce human-like prompts: high-level functional requirements, natural language, no file paths unless a human would mention them
- Skip commits with diffs larger than 200K chars (previously 50K)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d68ffba commit d5b2a92
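The first bullet above describes reading every changed file at the parent commit under a 500K-character budget. Below is a minimal sketch of that step, with a few assumptions: `FileReader`, `gitShowReader`, and `collectWithinBudget` are hypothetical names, not part of evalbuff. The sketch differs from the committed `readFilesAtParent` in two deliberate ways. First, it calls `git show` through `execFileSync` with an argument array, so no shell ever parses the path. In the commit, `JSON.stringify` wraps the path in double quotes, and inside double quotes a shell still expands `$` and backticks. Second, the committed loop only checks the running total before reading the next file, so the last file it accepts can push the total past 500K. This sketch skips any file that would overshoot.

```typescript
import { execFileSync } from 'node:child_process'

// Returns a file's contents at some fixed commit, or null if unavailable.
type FileReader = (filePath: string) => string | null

// Hypothetical git-backed reader. `sha:path` is passed as a single argv entry
// to execFileSync, so no shell is involved and no quoting is needed.
export function gitShowReader(repoPath: string, sha: string): FileReader {
  return (filePath) => {
    try {
      return execFileSync('git', ['show', `${sha}:${filePath}`], {
        cwd: repoPath,
        encoding: 'utf-8',
        maxBuffer: 10 * 1024 * 1024,
      })
    } catch {
      // Not present at this commit (e.g. added by the change) or unreadable.
      return null
    }
  }
}

// Hypothetical budgeted collector: gathers non-empty file contents until
// maxTotalChars is used up. A file that would push the total over the
// budget is skipped, so the result never exceeds the budget.
export function collectWithinBudget(
  filePaths: string[],
  read: FileReader,
  maxTotalChars = 500_000,
): Record<string, string> {
  const files: Record<string, string> = {}
  let total = 0
  for (const filePath of filePaths) {
    if (total >= maxTotalChars) break // budget exhausted, stop reading
    const content = read(filePath)
    if (content == null || content.length === 0) continue // missing or empty
    if (total + content.length > maxTotalChars) continue // would overshoot
    files[filePath] = content
    total += content.length
  }
  return files
}

// Usage, assuming repoPath, parentSha, and filesChanged as in the diff:
// const files = collectWithinBudget(filesChanged, gitShowReader(repoPath, parentSha))
```

Because the reader is injected, the budget logic can be exercised without a repository. One consequence of skipping rather than truncating is that a single file larger than the whole budget is left out entirely. A truncating variant could reverse that choice.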


1 file changed (+112, -24 lines)


evalbuff/src/commit-task-generator.ts

Lines changed: 112 additions & 24 deletions
@@ -12,6 +12,8 @@ export interface CommitTask {
   filesChanged: string[]
 }
 
+const MAX_DIFF_CHARS = 200_000
+
 /**
  * Get a list of commits from the repo, oldest first.
  * Starts from `startAfterSha` (exclusive) or HEAD~commitCount if no state.
@@ -87,41 +89,125 @@ export function getCommitInfo(
 }
 
 /**
- * Generate a human-like task prompt from a commit's message and diff.
- * Uses Claude CLI to rephrase the commit into a natural coding task.
+ * Read a file's content at a specific commit SHA.
+ * Returns null if the file doesn't exist at that commit.
  */
-export async function generatePromptFromCommit(
-  message: string,
-  diff: string,
+function readFileAtCommit(
+  repoPath: string,
+  sha: string,
+  filePath: string,
+): string | null {
+  try {
+    return execSync(`git show ${sha}:${JSON.stringify(filePath)}`, {
+      cwd: repoPath,
+      encoding: 'utf-8',
+      maxBuffer: 10 * 1024 * 1024,
+    })
+  } catch {
+    return null
+  }
+}
+
+/**
+ * Read the full contents of all files being modified at the parent commit.
+ * This gives the prompt generator context about what the code looks like
+ * before the change, so it can write a realistic human prompt.
+ */
+function readFilesAtParent(
+  repoPath: string,
+  parentSha: string,
   filesChanged: string[],
-): Promise<string> {
-  const systemPrompt = `You are generating a task prompt that a developer might write to ask a coding agent to make changes to a codebase. You'll be given a git commit message and diff. Your job is to write a natural, human-sounding prompt that would lead an agent to make similar changes.
+): Record<string, string> {
+  const files: Record<string, string> = {}
+  let totalSize = 0
+  const maxTotalSize = 500_000 // 500K total for all files
+
+  for (const filePath of filesChanged) {
+    if (totalSize >= maxTotalSize) break
+
+    const content = readFileAtCommit(repoPath, parentSha, filePath)
+    if (content != null && content.length > 0) {
+      files[filePath] = content
+      totalSize += content.length
+    }
+  }
+
+  return files
+}
+
+const PROMPT_GEN_SYSTEM = `You are generating a task prompt that a human developer would realistically write to ask an AI coding agent to make changes to their codebase.
+
+You will receive:
+- A git diff showing exactly what was changed
+- The full contents of all files being modified (as they looked BEFORE the change)
+- The commit message (as a hint, but don't just copy it)
+
+Your job is to write a natural, human-sounding prompt — the kind of thing a developer would type into a chat with an AI assistant.
+
+## Key Principles
+
+1. Focus on high-level functional requirements, not implementation details
+- GOOD: "add user authentication to the API"
+- BAD: "implement an authenticateUser function in src/auth/middleware.ts"
+
+2. Use natural language — like a Slack message or ticket description
+- GOOD: "the nightly CI is pointing at the wrong directory, it should be agents not .agents"
+- BAD: "Update the directory reference in .github/workflows/nightly-e2e.yml from .agents to agents"
+
+3. Describe what you WANT or what's WRONG, not how to fix it
+- GOOD: "the hover state on buttons looks broken"
+- BAD: "change the CSS hover opacity from 0.5 to 0.8 in Button.tsx"
+
+4. Don't reference specific file paths unless a human naturally would. Humans describe the feature area, not the file tree.
+- GOOD: "our login page needs to redirect to freebuff.com instead of codebuff.com"
+- BAD: "update src/auth/login.ts, src/config/urls.ts, and tests/auth.test.ts to change codebuff.com to freebuff.com"
 
-## Rules
+5. Don't over-specify. Leave room for the agent to figure out the implementation.
 
-1. Write as if you're a developer describing what you want done — NOT as if you've seen the solution
-2. Be vague enough that the agent has to figure out the implementation details, but specific enough about the desired outcome
-3. Do NOT mention specific line numbers, exact variable names from the diff, or implementation details
-4. DO mention the general area of the codebase, the feature/bug, and the desired behavior
-5. Keep it to 1-4 sentences
-6. Sound natural — like a Slack message or a ticket description, not a formal spec
+6. Keep it to 1-4 sentences.
+
+7. Read the FULL file contents to understand context. The diff alone can be misleading — understanding the surrounding code helps you write a prompt that makes sense for this codebase.
 
 ## Output
 
-Respond with ONLY the prompt text, nothing else.`
+Respond with ONLY the prompt text. No quotes, no preamble, no explanation.`
 
-  const userPrompt = `Commit message: ${message}
+/**
+ * Generate a human-like task prompt from a commit.
+ * Reads the full files at the parent commit for context, similar to how
+ * buffbench uses file-explorer agents to understand the codebase.
+ */
+export async function generatePromptFromCommit(
+  repoPath: string,
+  parentSha: string,
+  message: string,
+  diff: string,
+  filesChanged: string[],
+): Promise<string> {
+  // Read full file contents at the parent commit for context
+  const fileContents = readFilesAtParent(repoPath, parentSha, filesChanged)
+
+  let filesSection = ''
+  if (Object.keys(fileContents).length > 0) {
+    filesSection = `## File Contents (before the change)\n\n`
+    for (const [filePath, content] of Object.entries(fileContents)) {
+      filesSection += `### ${filePath}\n\`\`\`\n${content}\n\`\`\`\n\n`
+    }
+  }
 
-Files changed: ${filesChanged.join(', ')}
+  const userPrompt = `## Commit Message
+${message}
 
-Diff (first 3000 chars):
-${diff.slice(0, 3000)}`
+${filesSection}## Diff
+\`\`\`diff
+${diff}
+\`\`\``
 
   const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'evalbuff-promptgen-'))
   const promptFile = path.join(tmpDir, 'PROMPT_GEN.md')
 
   try {
-    fs.writeFileSync(promptFile, `${systemPrompt}\n\n---\n\n${userPrompt}`)
+    fs.writeFileSync(promptFile, `${PROMPT_GEN_SYSTEM}\n\n---\n\n${userPrompt}`)
 
     const output = execSync(
       `claude --dangerously-skip-permissions -p "Read ${promptFile} and follow all instructions. Respond with ONLY the task prompt text."`,
@@ -133,7 +219,7 @@ ${diff.slice(0, 3000)}`
       },
     ).trim()
 
-    return output || `${message}`
+    return output || message
   } catch {
     // Fallback to the commit message itself
     return message
@@ -144,7 +230,7 @@ ${diff.slice(0, 3000)}`
 
 /**
  * Build a full CommitTask from a SHA.
- * Returns null if the commit can't be used (merge, initial, etc).
+ * Returns null if the commit can't be used (merge, initial, too large diff, etc).
  */
 export async function buildCommitTask(
   repoPath: string,
@@ -153,8 +239,8 @@ export async function buildCommitTask(
   const info = getCommitInfo(repoPath, sha)
   if (!info) return null
 
-  // Skip commits with very large diffs (likely auto-generated)
-  if (info.diff.length > 50_000) {
+  // Skip commits with diffs that exceed our limit
+  if (info.diff.length > MAX_DIFF_CHARS) {
     console.log(`Skipping ${sha.slice(0, 8)}: diff too large (${info.diff.length} chars)`)
     return null
   }
@@ -165,6 +251,8 @@ export async function buildCommitTask(
   }
 
   const prompt = await generatePromptFromCommit(
+    repoPath,
+    info.parentSha,
     info.message,
     info.diff,
     info.filesChanged,
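There is one edge case in the new prompt layout. File contents and the diff are wrapped in fixed three-backtick fences. If a changed file (a README, say) contains its own line of three backticks, CommonMark rules treat that line as closing the fence early, and the rest of the file reads as loose markdown in PROMPT_GEN.md. The sketch below uses hypothetical helpers `fenceFor` and `buildUserPrompt`, which are not part of the commit. It keeps the committed layout (commit message, optional "before" file contents, then the diff). The difference is that each fence is sized to one backtick longer than the longest backtick run in the text it wraps, with a minimum of three.

```typescript
// Hypothetical helper: a backtick fence one longer than the longest backtick
// run in `content` (minimum three), so the content cannot close it early.
export function fenceFor(content: string): string {
  const runs = content.match(/`+/g) ?? []
  const longest = runs.reduce((max, run) => Math.max(max, run.length), 0)
  return '`'.repeat(Math.max(3, longest + 1))
}

// Hypothetical variant of the userPrompt assembly in generatePromptFromCommit:
// same section order and headings, but every fence is sized to its content.
export function buildUserPrompt(
  message: string,
  files: Record<string, string>,
  diff: string,
): string {
  let filesSection = ''
  const entries = Object.entries(files)
  if (entries.length > 0) {
    filesSection = '## File Contents (before the change)\n\n'
    for (const [filePath, content] of entries) {
      const fence = fenceFor(content)
      filesSection += `### ${filePath}\n${fence}\n${content}\n${fence}\n\n`
    }
  }
  const diffFence = fenceFor(diff)
  return `## Commit Message\n${message}\n\n${filesSection}## Diff\n${diffFence}diff\n${diff}\n${diffFence}`
}
```

Inside `generatePromptFromCommit`, this would stand in for the `filesSection` loop and the `userPrompt` template as `const userPrompt = buildUserPrompt(message, fileContents, diff)`. For ordinary files with no triple backticks, the output is identical to the committed layout.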
