Objective
Add a mutator subagent to agentv-bench that autonomously generates improved versions of the artifact under test (skill, prompt, config) based on failure analysis from the analyzer subagent.
Currently, agentv-bench Step 5 (Improve) is human-directed — the analyzer identifies failures and suggests improvements, but the human decides what to change. The mutator subagent closes this gap by generating the rewrite itself, enabling unattended optimization loops.
Design Latitude
Location: plugins/agentv-dev/skills/agentv-bench/agents/mutator.md
Inputs (provided by agentv-bench orchestrator):
- Current best artifact content
- Per-assertion pass rates (e.g., IDENTIFIES_CLARITY_ISSUES: 3/5)
- Top failure descriptions from the analyzer
- Original artifact (for reference, not as mutation base)
Output: A rewritten artifact that addresses failing criteria.
Mutation strategy (adapted from karpathy/autoresearch and pi-autoresearch):
- For any assertion below 80% pass rate: add explicit, concrete instructions
- Preserve instructions that already pass consistently
- Prefer simplification when score is maintained (Karpathy's "simplicity criterion" — cleaner code at equal performance is an improvement)
- Never add speculative features — only address observed failures
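The thresholding rule above could be sketched as follows (a minimal sketch with hypothetical names; the issue does not specify how pass rates are parsed, so here they are assumed to arrive as `(passed, total)` pairs):

```python
def failing_assertions(pass_rates, threshold=0.8):
    """Select assertions whose pass rate is strictly below the threshold.

    pass_rates maps assertion name -> (passed, total), e.g.
    {"IDENTIFIES_CLARITY_ISSUES": (3, 5)}.
    Returns a sorted list of assertion names the mutator should target.
    """
    return sorted(
        name
        for name, (passed, total) in pass_rates.items()
        if total > 0 and passed / total < threshold
    )

rates = {"IDENTIFIES_CLARITY_ISSUES": (3, 5), "CITES_EVIDENCE": (5, 5)}
print(failing_assertions(rates))  # ['IDENTIFIES_CLARITY_ISSUES']
```

Only the selected assertions get new explicit instructions; everything that passes consistently is left alone, per the strategy above.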
Integration with agentv-bench Step 5:
- Interactive mode (existing): human still directs improvements; mutator available as optional assist ("generate a suggestion based on failures")
- Autoresearch mode (see feat(bench): autoresearch mode — unattended eval-improve loop with hill-climbing ratchet #748): mutator dispatched automatically, no human input needed
Acceptance Signals
- agents/mutator.md exists in agentv-bench with clear instructions for generating artifact rewrites
- agentv-bench Step 5 can dispatch the mutator subagent as an alternative to human-directed improvement
- Mutator output is a complete rewritten artifact (not a diff or suggestion list)
- Mutator reads from "best" version, not from the failed candidate (hill-climbing ratchet)
- Works for skill files (SKILL.md), prompt templates, and agent configs
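The hill-climbing ratchet can be sketched as the following loop (hypothetical `score` and `mutate` callables; the real orchestration lives in agentv-bench, this only illustrates the "mutate from best, never regress" invariant):

```python
def ratchet_loop(best_artifact, score, mutate, iterations=5):
    """Hill-climbing ratchet: always mutate from the best-so-far artifact
    and keep a candidate only if it scores at least as well.

    Accepting ties (>=) leaves room for Karpathy's simplicity criterion:
    a simpler artifact at equal score replaces the current best.
    """
    best_score = score(best_artifact)
    for _ in range(iterations):
        # The mutator reads from the best version, never the failed candidate.
        candidate = mutate(best_artifact)
        candidate_score = score(candidate)
        if candidate_score >= best_score:  # ratchet: never regress
            best_artifact, best_score = candidate, candidate_score
    return best_artifact, best_score
```

Note the asymmetry: a failed candidate is simply dropped, so the next mutation starts again from the retained best rather than compounding a regression.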
Non-Goals
- Not a general-purpose code rewriter — only rewrites the specific artifact being evaluated
- Not a replacement for human-directed improvement — both modes coexist
- Does not modify the eval definition (EVAL.yaml) — only the artifact under test
- Does not run evals itself — agentv-bench orchestrates the full loop
Context
The autoresearch pattern — proven by karpathy/autoresearch (ML training optimization) and pi-autoresearch (generic optimization loops) — automates the improvement step: score → keep/drop → mutate → repeat. The mutator subagent is the core building block that enables this in agentv-bench.
Key design insight from Karpathy: constrain mutation to a single file (the artifact) and keep the evaluation harness immutable. This prevents the agent from gaming its own scoring.
Related
- feat(eval): Ralph Loop — iterative improvement with feedback injection #699 — Ralph Loop (complementary: Ralph re-prompts the target with feedback during a run; mutator rewrites the artifact between runs)