Commit ef01d52

jahooma and claude authored
Add evalbuff: iterative agent improvement via docs optimization (#479)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 224d6e1 commit ef01d52

50 files changed, +5773 -21 lines changed

AGENTS.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@ Codebuff is a tool for editing codebases via natural-language instructions to Bu
 - `common/` — shared types, tools, schemas, utilities
 - `agents/` — main agents shipped with codebuff
 - `.agents/` — local agent templates (prompt + programmatic agents)
+- `evalbuff/` — automated docs optimization loop (run agent → judge → analyze → improve docs)
 
 ## Request Flow
 
@@ -48,3 +49,4 @@ Codebuff is a tool for editing codebases via natural-language instructions to Bu
 - [`docs/testing.md`](docs/testing.md) — DI over mocking, tmux CLI testing
 - [`docs/environment-variables.md`](docs/environment-variables.md) — Env var rules, DI helpers, loading order
 - [`docs/agents-and-tools.md`](docs/agents-and-tools.md) — Agent system, shell shims, tool definitions
+- [`docs/patterns/handle-steps-generators.md`](docs/patterns/handle-steps-generators.md) — handleSteps generator patterns and spawn_agents tool calls

bun.lock

Lines changed: 11 additions & 0 deletions
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# handleSteps Generator Pattern for Programmatic Agents

When creating agents that use `handleSteps` generators to programmatically execute tool calls, follow these exact patterns to avoid TypeScript compilation errors.

## Correct handleSteps Signature

```typescript
import type { AgentDefinition } from '../types/agent-definition'

const definition: AgentDefinition = {
  // ... other fields

  handleSteps: function* ({ agentState, prompt, params }) {
    // Generator body
  },
}
```

## Yielding Tool Calls

Yield objects with `toolName` and `input` properties. The input schema must match the tool's expected parameters exactly.

### spawn_agents Tool

```typescript
handleSteps: function* ({ agentState, prompt, params }) {
  const promptWithDefault = prompt ?? 'Default prompt'

  yield {
    toolName: 'spawn_agents',
    input: {
      agents: [
        {
          agent_type: 'agent-id-1',
          prompt: promptWithDefault,
        },
        {
          agent_type: 'agent-id-2',
          prompt: promptWithDefault,
        },
      ],
    },
  }

  // After tool execution, yield 'STEP' to let the agent process results
  yield 'STEP'
},
```

### Common Mistakes

**WRONG:** Using incorrect property names or nested structures
```typescript
// ❌ Incorrect - wrong tool call structure
yield {
  type: 'tool_call',
  name: 'spawn_agents',
  arguments: { ... }
}
```

**WRONG:** Using `think_deeply` or custom tool names that don't exist
```typescript
// ❌ Incorrect - this tool doesn't exist
yield {
  toolName: 'think_deeply',
  input: { ... }
}
```

**CORRECT:** Use `toolName` and `input` at the top level
```typescript
// ✅ Correct
yield {
  toolName: 'spawn_agents',
  input: {
    agents: [{ agent_type: 'my-agent', prompt: 'Do something' }]
  }
}
```
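
The difference between the wrong and correct shapes above can also be illustrated with a small runtime check. This is only a sketch: `ToolCallLike` and `isToolCallShape` are hypothetical names introduced here, not part of the codebuff API, and the check covers only the top-level `toolName` and `input` properties, not whether the tool exists or whether `input` matches its schema.

```typescript
// Hypothetical shape, introduced for illustration only.
type ToolCallLike = { toolName: string; input: object }

// Returns true when the value has `toolName` (string) and `input` (object) at the top level.
function isToolCallShape(value: unknown): value is ToolCallLike {
  if (typeof value !== 'object' || value === null) return false
  const candidate = value as Record<string, unknown>
  return (
    typeof candidate['toolName'] === 'string' &&
    typeof candidate['input'] === 'object' &&
    candidate['input'] !== null
  )
}

// The first "WRONG" shape from above: type/name/arguments instead of toolName/input.
const wrongShape = { type: 'tool_call', name: 'spawn_agents', arguments: {} }

// The "CORRECT" shape from above.
const rightShape = {
  toolName: 'spawn_agents',
  input: { agents: [{ agent_type: 'my-agent', prompt: 'Do something' }] },
}

console.log(isToolCallShape(wrongShape)) // false
console.log(isToolCallShape(rightShape)) // true
```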

## Yielding STEP

After yielding tool calls, yield the string `'STEP'` to let the main agent process the results:

```typescript
handleSteps: function* ({ prompt }) {
  yield {
    toolName: 'spawn_agents',
    input: { agents: [...] },
  }

  // This tells the runtime to run an LLM step to process spawn results
  yield 'STEP'
},
```
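
To make the order of yields concrete, here is a minimal sketch of how a consumer might walk such a generator. It assumes a heavily simplified runtime: `StepYield`, `exampleSteps`, and `drive` are hypothetical names, the driver only records what it would do, and no results are passed back into the generator. The real runtime is not shown in this document and may behave differently.

```typescript
// Hypothetical local types; real definitions would use the AgentDefinition typings.
type ToolCall = { toolName: string; input: Record<string, unknown> }
type StepYield = ToolCall | 'STEP'

// A generator following the pattern above: one tool call, then 'STEP'.
function* exampleSteps(prompt?: string): Generator<StepYield, void, unknown> {
  const promptWithDefault = prompt ?? 'Default prompt'

  yield {
    toolName: 'spawn_agents',
    input: { agents: [{ agent_type: 'agent-id-1', prompt: promptWithDefault }] },
  }

  yield 'STEP'
}

// Simplified driver: records the action a runtime would take for each yielded value.
function drive(gen: Generator<StepYield, void, unknown>): string[] {
  const actions: string[] = []
  let result = gen.next()
  while (!result.done) {
    const value = result.value
    actions.push(value === 'STEP' ? 'run LLM step' : `execute tool: ${value.toolName}`)
    result = gen.next()
  }
  return actions
}

console.log(drive(exampleSteps()))
// [ 'execute tool: spawn_agents', 'run LLM step' ]
```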

## Agent Definition Requirements for Spawning

Agents that spawn sub-agents must include:

1. `toolNames: ['spawn_agents']` - Enable the spawn tool
2. `spawnableAgents: ['agent-id-1', 'agent-id-2']` - List allowed sub-agents

```typescript
const definition: AgentDefinition = {
  id: 'coordinator',
  model: 'openai/gpt-5',
  toolNames: ['spawn_agents'],
  spawnableAgents: ['sub-agent-1', 'sub-agent-2', 'sub-agent-3'],
  // ...
}
```
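
The two requirements above can be expressed as a small consistency check. This is a sketch under stated assumptions: `SpawnConfig` is a hypothetical subset of `AgentDefinition` containing only the two fields discussed here, and `findSpawnProblems` is an illustrative helper, not an existing codebuff function.

```typescript
// Hypothetical subset of AgentDefinition, for illustration only.
type SpawnConfig = { toolNames: string[]; spawnableAgents: string[] }

// Returns a list of problems; an empty list means the requested spawns satisfy both requirements.
function findSpawnProblems(config: SpawnConfig, requested: string[]): string[] {
  const problems: string[] = []
  if (config.toolNames.indexOf('spawn_agents') === -1) {
    problems.push("toolNames is missing 'spawn_agents'")
  }
  for (const agentType of requested) {
    if (config.spawnableAgents.indexOf(agentType) === -1) {
      problems.push(`'${agentType}' is not listed in spawnableAgents`)
    }
  }
  return problems
}

// Mirrors the coordinator definition above.
const coordinator: SpawnConfig = {
  toolNames: ['spawn_agents'],
  spawnableAgents: ['sub-agent-1', 'sub-agent-2', 'sub-agent-3'],
}

console.log(findSpawnProblems(coordinator, ['sub-agent-1', 'unlisted-agent']))
// [ "'unlisted-agent' is not listed in spawnableAgents" ]
```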

## Complete Example: Multi-Model Coordinator

See `.agents/deep-thinking/deep-thinker.ts` for a working example:

```typescript
import type { AgentDefinition } from '../types/agent-definition'

const definition: AgentDefinition = {
  id: 'deep-thinker',
  displayName: 'Deep Thinker Agent',
  model: 'openai/gpt-5',

  toolNames: ['spawn_agents'],
  spawnableAgents: ['gpt5-thinker', 'sonnet-thinker', 'gemini-thinker'],

  inputSchema: {
    prompt: {
      type: 'string',
      description: 'The topic to analyze',
    },
  },

  outputMode: 'last_message',

  handleSteps: function* ({ prompt }) {
    const promptWithDefault = prompt ?? 'Think about this topic'

    yield {
      toolName: 'spawn_agents',
      input: {
        agents: [
          { agent_type: 'gpt5-thinker', prompt: promptWithDefault },
          { agent_type: 'sonnet-thinker', prompt: promptWithDefault },
          { agent_type: 'gemini-thinker', prompt: promptWithDefault },
        ],
      },
    }

    yield 'STEP'
  },
}

export default definition
```

## Directory Structure

Place related agents in subdirectories under `.agents/`:

```
.agents/
└── deep-thinking/
    ├── deep-thinker.ts       # Coordinator
    ├── deepest-thinker.ts    # Meta-coordinator
    ├── gpt5-thinker.ts       # Sub-agent
    ├── sonnet-thinker.ts     # Sub-agent
    └── gemini-thinker.ts     # Sub-agent
```

## Avoid Over-Engineering

When implementing agents:
- Only create files that are directly requested
- Don't add documentation files unless explicitly asked
- Keep agent definitions simple - use `AgentDefinition` type, not custom wrappers
- Don't create factory patterns unless there's clear reuse need
