Commit f8ee6e8

evalbuff: add evalbuff/interpreting-task-prompts.md (6259c17)
1 parent e79c6a1 commit f8ee6e8

3 files changed (+188, -0 lines)

AGENTS.md

Lines changed: 2 additions & 0 deletions
@@ -42,3 +42,5 @@ Make an efficient learning agent that can do anything.
 - [`docs/environment-variables.md`](docs/environment-variables.md) — Env var rules, DI helpers, loading order
 - [`docs/agents-and-tools.md`](docs/agents-and-tools.md) — Agent system, shell shims, tool definitions
 - [`docs/patterns/handle-steps-generators.md`](docs/patterns/handle-steps-generators.md) — handleSteps generator patterns and spawn_agents tool calls
+- [docs/evalbuff/interpreting-task-prompts.md](docs/evalbuff/interpreting-task-prompts.md)
+- [docs/conventions/simplifying-documentation.md](docs/conventions/simplifying-documentation.md)

docs/conventions/simplifying-documentation.md

Lines changed: 123 additions & 0 deletions (new file)

# Simplifying Documentation: Distill, Don't Delete

When asked to "simplify" or "make more concise" documentation, the goal is to **distill content to its essential information**, not to remove entire sections.

## Common Mistake

Agents often interpret simplification requests as license to delete:
- Entire sections ("Goals", "Key Technologies", "Conventions")
- Important qualifiers (reducing "advanced coding agent" to just "coding agent")
- Critical developer guidance (force-push rules, git conventions)

## Correct Approach

### 1. Identify Core Information

Before removing anything, determine what's **essential** vs **verbose**:
- Project mission/goals → distill to one clear sentence, don't remove
- Key technologies → keep the list, remove explanatory text
- Important conventions → keep critical rules (git force-push, security patterns), remove trivial ones

### 2. Compress, Don't Cut

**WRONG:**
```diff
-## Goals
-
-- Make expert engineers faster (power-user focus).
-- Reduce time/effort for common programming tasks.
-- Improve via iteration/feedback (learn/adapt from usage).
```

**CORRECT:**
```diff
-## Goals
+## Goal

-- Make expert engineers faster (power-user focus).
-- Reduce time/effort for common programming tasks.
-- Improve via iteration/feedback (learn/adapt from usage).
+Make an efficient learning agent that can do anything.
```

### 3. Preserve Critical Context

Some information seems verbose but is **critical context**:
- Qualifiers like "advanced" that distinguish the project's capabilities
- Relationships between components ("freebuff and evalbuff are parts of Codebuff")
- Developer conventions that prevent bugs (force-push rules, interactive git command handling)

### 4. Check Against Original Intent

Before finalizing:
1. Does the simplified version still answer "What is this project?"
2. Does it preserve critical developer guidance?
3. Would a new team member understand the essential architecture?
4. Are key differentiators ("advanced", component relationships) still present?
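
Question 4 is the easiest of these to check mechanically. Below is a minimal sketch that assumes you keep a hand-picked list of must-keep terms for the project. The `missingTerms` helper and the term list are hypothetical and are not part of any Codebuff tooling. The helper flags terms that appear in the original but not in the simplified version. Questions 1-3 still need human judgment.

```typescript
// Hypothetical helper: list must-keep terms that appear in the original text
// but not in the simplified one (case-insensitive substring match).
function missingTerms(original: string, simplified: string, mustKeep: string[]): string[] {
  const before = original.toLowerCase();
  const after = simplified.toLowerCase();
  // Only terms the original actually contained can be "lost".
  return mustKeep.filter((term) => {
    const t = term.toLowerCase();
    return before.includes(t) && !after.includes(t);
  });
}

const fullVersion =
  "Codebuff is an advanced coding agent with a composable agent framework. " +
  "It also includes freebuff, the free coding agent, and evalbuff.";
const overSimplified = "Codebuff is a coding agent with a composable agent framework.";

// Hypothetical must-keep list for this project.
console.log(missingTerms(fullVersion, overSimplified, ["advanced", "freebuff", "evalbuff"]));
// logs all three terms: advanced, freebuff, evalbuff
```

Substring matching is deliberately crude. It catches a dropped "advanced" but not one that was reworded. An empty result is therefore a necessary condition for a good simplification, not a sufficient one.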

## Example: AGENTS.md Simplification

### What NOT to Do

```markdown
# Codebuff

Codebuff is a coding agent with a composable agent framework.

## Structure

- `cli/` — TUI client
- `sdk/` — JS/TS SDK
```

**Problems:**
- Lost "advanced" qualifier
- Removed project goal entirely
- Removed all conventions including critical git rules
- Too minimal to be useful

### Correct Simplification

```markdown
# Codebuff

Codebuff is an advanced coding agent with a composable agent framework. It also includes:
- freebuff, the free coding agent
- evalbuff, a project to improve an agent through evals

## Goal

Make an efficient learning agent that can do anything.

## Key Technologies

- TypeScript monorepo (Bun workspaces)
- Next.js (web app + API routes)
- Multiple LLM providers

## Conventions

- Never force-push `main` unless explicitly requested.
- Run interactive git commands in tmux.
```

**Why this works:**
- Retains "advanced" qualifier and component context
- Distills goals to one essential line (not removed)
- Keeps Key Technologies section with condensed entries
- Preserves critical developer conventions
- Still concise (shorter than the verbose original) but not so minimal that it becomes useless

## Red Flags You're Over-Simplifying

1. You removed an entire section that had multiple paragraphs → condense it to 1-2 sentences instead
2. You removed qualifiers or relationships ("advanced", "X is part of Y") → these provide important context
3. The simplified version doesn't explain what makes this project different from alternatives
4. You removed all conventions/rules → keep the 2-3 most critical ones that prevent common mistakes
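
Red flags 1 and 4 often show up as whole `##` sections that disappear. Below is a rough heuristic sketch that compares the section headings of the two versions. The helpers are hypothetical and assume ATX-style `## ` headings. They ignore case and a single trailing "s", so renaming "Goals" to "Goal" (as in the CORRECT example above) does not count as a deletion.

```typescript
// Collect "## " headings (not "#" or "###"), keyed by a normalized form.
function sectionHeadings(markdown: string): Map<string, string> {
  const sections = new Map<string, string>();
  for (const line of markdown.split("\n")) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      // Lowercase and drop one trailing "s" so "Goals" and "Goal" share a key.
      sections.set(match[1].toLowerCase().replace(/s$/, ""), match[1]);
    }
  }
  return sections;
}

// Headings present in the original but missing from the simplified version.
function droppedSections(original: string, simplified: string): string[] {
  const kept = sectionHeadings(simplified);
  const dropped: string[] = [];
  sectionHeadings(original).forEach((heading, key) => {
    if (!kept.has(key)) dropped.push(heading);
  });
  return dropped;
}

const before = [
  "# Codebuff",
  "## Goals",
  "- Make expert engineers faster (power-user focus).",
  "## Key Technologies",
  "- TypeScript monorepo (Bun workspaces)",
  "## Conventions",
  "- Never force-push `main` unless explicitly requested.",
].join("\n");
const after = ["# Codebuff", "## Goal", "Make an efficient learning agent that can do anything.", "## Structure", "- `cli/`: TUI client"].join("\n");

console.log(droppedSections(before, after)); // logs Key Technologies and Conventions
```

A dropped heading is a prompt to re-check, not proof of over-simplification. The section may have been legitimately merged into another one.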

## Summary

**Simplify = compress and distill**

**Simplify ≠ delete everything**

When in doubt, condense verbose explanations into concise statements rather than removing sections entirely.

docs/evalbuff/interpreting-task-prompts.md

Lines changed: 63 additions & 0 deletions (new file)

# Interpreting Task Prompts (Especially Eval-Generated Ones)

When working with task prompts, especially those auto-generated from commit history for evaluation purposes, the prompt text may not accurately describe the actual work needed.

## The Problem

Evalbuff generates task prompts by analyzing commits. Sometimes the prompt will say "create documentation about X" when the actual ground truth is "fix test scripts in package.json and CI workflow files." This happens when:

1. The commit message is misleading (e.g., "Simplify AGENTS.md" when the commit actually removes test scripts)
2. The prompt generator focuses on visible file additions rather than the semantic meaning of the change
3. The task is stated in terms of what a developer might ASK for, not what they actually need

## Solution: Always Check Ground Truth First

Before implementing ANY task:

1. **Check if there's a ground truth diff available**: look for references to expected changes, test files, or "what should have been done"
2. **Examine file paths and extensions in the ground truth**:
   - `.json` files (especially `package.json`) → likely config/dependency changes
   - `.yml`/`.yaml` files in `.github/workflows/` → CI/CD configuration changes
   - `.md` files → documentation (but the change could also remove or edit existing docs)
   - `.ts`/`.js` files → code changes
3. **Read the actual diff content, not just the prompt**: the diff shows EXACTLY what changed
4. **Distinguish creation from modification**:
   - Does the ground truth show `new file mode` or additions to existing files?
   - Is this refactoring, removal, or net-new functionality?
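
Steps 2 and 4 can be partly automated when the ground truth is available as `git diff` output. In that format, each file starts with a `diff --git a/<path> b/<path>` header, and git adds a `new file mode` or `deleted file mode` line for created or deleted files. Below is a minimal sketch. The function names and category labels are hypothetical, and the mapping mirrors the list above as a heuristic, not a rule.

```typescript
type FileStatus = "added" | "deleted" | "modified";
type Category = "config/dependencies" | "ci" | "docs" | "code" | "other";

interface ChangedFile {
  path: string;
  status: FileStatus;
  category: Category;
}

// Rough mapping from path to change category, mirroring step 2 (a heuristic only).
function categorize(path: string): Category {
  if (/^\.github\/workflows\/.+\.ya?ml$/.test(path)) return "ci";
  if (path.endsWith(".json")) return "config/dependencies";
  if (path.endsWith(".md")) return "docs";
  if (/\.(ts|tsx|js|jsx)$/.test(path)) return "code";
  return "other";
}

// Summarize `git diff` output per file. Paths containing spaces are not handled.
function summarizeDiff(diff: string): ChangedFile[] {
  const files: ChangedFile[] = [];
  let current: ChangedFile | undefined;
  for (const line of diff.split("\n")) {
    const header = /^diff --git a\/(\S+) b\/(\S+)$/.exec(line);
    if (header) {
      // Start a new file entry; assume "modified" until git says otherwise.
      current = { path: header[2], status: "modified", category: categorize(header[2]) };
      files.push(current);
    } else if (current && line.startsWith("new file mode")) {
      current.status = "added";
    } else if (current && line.startsWith("deleted file mode")) {
      current.status = "deleted";
    }
  }
  return files;
}

// Paths from the AGENTS.md example, with git-style headers added for illustration.
const groundTruth = [
  "diff --git a/.agents/package.json b/.agents/package.json",
  "--- a/.agents/package.json",
  "+++ b/.agents/package.json",
  "diff --git a/.github/workflows/nightly-e2e.yml b/.github/workflows/nightly-e2e.yml",
  "--- a/.github/workflows/nightly-e2e.yml",
  "+++ b/.github/workflows/nightly-e2e.yml",
].join("\n");

for (const f of summarizeDiff(groundTruth)) {
  console.log(`${f.status} ${f.category} ${f.path}`);
}
// logs: modified config/dependencies .agents/package.json
// logs: modified ci .github/workflows/nightly-e2e.yml
```

The summary is a starting point for step 3, not a replacement for it. It reports which kinds of files changed, but only the hunks themselves show what the change does.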

## Example: The AGENTS.md Confusion

Prompt said:
> "Can you create an AGENTS.md file at the root that provides an overview..."

Ground truth showed:
```diff
--- a/.agents/package.json
+++ b/.agents/package.json
- "test:e2e": "bun test e2e"
--- a/.github/workflows/nightly-e2e.yml
+++ b/.github/workflows/nightly-e2e.yml
- run: cd .agents && bun run test:e2e
+ run: cd agents && bun run test:e2e
```

The actual task was about:
- Removing a test script from package.json
- Fixing directory references in a CI workflow
- NOT about creating documentation

The agent should have recognized that the ground truth touches only `.json` and `.yml` config files, not `.md` documentation files.

## When In Doubt

If the prompt seems to conflict with file paths/types in the ground truth:
1. Trust the ground truth diff over the prompt text
2. Read the actual file contents being changed
3. Understand the PURPOSE of the change (fixing tests, updating config, refactoring) before implementing
4. Ask clarifying questions if the task is genuinely ambiguous

## Red Flags

- Prompt says "create docs" but ground truth shows only config file changes → likely NOT a docs task
- Prompt says "add feature X" but ground truth removes code → likely a cleanup/refactor task
- Prompt uses vague language ("simplify", "improve") → read the diff to understand the specific technical change
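
These red flags can be turned into a rough check to run before implementing. Below is a minimal keyword-based sketch that assumes you already have per-file added/removed line counts for the ground truth. The `FileChange` shape and the `promptRedFlags` helper are hypothetical. For the second flag, the sketch treats "removes code" as more lines removed than added. Each warning means "read the diff first", not "the prompt is wrong".

```typescript
interface FileChange {
  path: string;
  linesAdded: number;
  linesRemoved: number;
}

// Keyword heuristics mirroring the three red flags; crude by design.
function promptRedFlags(prompt: string, changes: FileChange[]): string[] {
  const p = prompt.toLowerCase();
  const flags: string[] = [];

  // Red flag 1: the prompt asks for docs, but no Markdown file changes.
  const asksForDocs =
    /\b(create|write|add)\b/.test(p) && /\b(doc|docs|documentation|readme)\b|\.md\b/.test(p);
  const touchesDocs = changes.some((c) => c.path.toLowerCase().endsWith(".md"));
  if (asksForDocs && !touchesDocs) {
    flags.push("Prompt asks for docs, but the ground truth changes no .md files.");
  }

  // Red flag 2: the prompt says "add", but the diff is a net removal.
  const added = changes.reduce((sum, c) => sum + c.linesAdded, 0);
  const removed = changes.reduce((sum, c) => sum + c.linesRemoved, 0);
  if (/\badd\b/.test(p) && removed > added) {
    flags.push("Prompt says 'add', but the ground truth removes more lines than it adds.");
  }

  // Red flag 3: vague verbs; the diff, not the prompt, defines the task.
  if (/\b(simplify|improve|clean up)\b/.test(p)) {
    flags.push("Prompt is vague; read the diff for the concrete technical change.");
  }
  return flags;
}

// Line counts taken from the AGENTS.md example diff above.
const example = promptRedFlags(
  "Can you create an AGENTS.md file at the root that provides an overview...",
  [
    { path: ".agents/package.json", linesAdded: 0, linesRemoved: 1 },
    { path: ".github/workflows/nightly-e2e.yml", linesAdded: 1, linesRemoved: 1 },
  ],
);
console.log(example); // logs one warning: docs requested, but no .md file changed
```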
