
feat(bench): mutator subagent — autonomous artifact rewriting from failure analysis #746

@christso

Description


Objective

Add a mutator subagent to agentv-bench that autonomously generates improved versions of the artifact under test (skill, prompt, config) based on failure analysis from the analyzer subagent.

Currently, agentv-bench Step 5 (Improve) is human-directed — the analyzer identifies failures and suggests improvements, but the human decides what to change. The mutator subagent closes this gap by generating the rewrite itself, enabling unattended optimization loops.

Design Latitude

Location: plugins/agentv-dev/skills/agentv-bench/agents/mutator.md

Inputs (provided by agentv-bench orchestrator):

  • Current best artifact content
  • Per-assertion pass rates (e.g., IDENTIFIES_CLARITY_ISSUES: 3/5)
  • Top failure descriptions from analyzer
  • Original artifact (for reference, not as mutation base)

Output: A rewritten artifact that addresses failing criteria.
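The inputs above could be passed to the mutator as a structured payload. A minimal sketch follows; every field name here is illustrative, since the issue does not fix a schema:

```python
# Hypothetical payload the orchestrator could hand to the mutator.
# Field names are illustrative only; the issue does not specify a schema.
mutator_input = {
    "best_artifact": "# SKILL.md\nIdentify clarity issues...",  # current best (mutation base)
    "pass_rates": {
        "IDENTIFIES_CLARITY_ISSUES": (3, 5),  # 3 of 5 runs passed
        "PRESERVES_CODE_BLOCKS": (5, 5),
    },
    "top_failures": ["Missed vague headings in 2 of 5 runs."],
    "original_artifact": "# SKILL.md (original)\n...",  # reference only, not the base
}
```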

Mutation strategy (adapted from karpathy/autoresearch and pi-autoresearch):

  • For any assertion below 80% pass rate: add explicit, concrete instructions
  • Preserve instructions that already pass consistently
  • Prefer simplification when score is maintained (Karpathy's "simplicity criterion" — cleaner code at equal performance is an improvement)
  • Never add speculative features — only address observed failures
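The 80% threshold rule above can be sketched as a small selection step (a hypothetical helper, assuming pass rates arrive as `(passed, total)` pairs):

```python
PASS_THRESHOLD = 0.8  # assertions below this rate get explicit, concrete instructions added

def assertions_to_fix(pass_rates):
    """Return the assertion names whose pass rate falls below the threshold.

    `pass_rates` maps assertion name -> (passed, total), e.g. {"X": (3, 5)}.
    Only these assertions drive mutation; consistently passing ones are preserved.
    """
    return [name for name, (passed, total) in pass_rates.items()
            if passed / total < PASS_THRESHOLD]

rates = {"IDENTIFIES_CLARITY_ISSUES": (3, 5), "PRESERVES_CODE_BLOCKS": (5, 5)}
assertions_to_fix(rates)  # ["IDENTIFIES_CLARITY_ISSUES"], since 3/5 = 0.6 < 0.8
```

This keeps mutation targeted: nothing is rewritten for assertions that already pass consistently, and no speculative changes are introduced.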

Integration with agentv-bench Step 5:
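One way the Step 5 dispatch could branch between the two modes is sketched below. The helper names (`dispatch_subagent`, `present_suggestions_to_human`) are hypothetical stubs, not part of agentv-bench:

```python
def dispatch_subagent(agent_path, analysis):
    # Stub: a real orchestrator would launch the subagent defined at agent_path.
    return f"rewritten artifact addressing: {analysis}"

def present_suggestions_to_human(analysis):
    # Stub: the existing human-directed path surfaces the analyzer's findings.
    return f"suggestions for review: {analysis}"

def improve(analysis, mode="human"):
    """Step 5 (Improve): human-directed by default, autonomous via the mutator."""
    if mode == "mutator":
        return dispatch_subagent("agents/mutator.md", analysis)  # unattended rewrite
    return present_suggestions_to_human(analysis)                # existing behavior
```

Both modes coexist: the mutator is an alternative dispatch target, not a replacement for the human-directed path.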

Acceptance Signals

  • agents/mutator.md exists in agentv-bench with clear instructions for generating artifact rewrites
  • agentv-bench Step 5 can dispatch the mutator subagent as an alternative to human-directed improvement
  • Mutator output is a complete rewritten artifact (not a diff or suggestion list)
  • Mutator reads from "best" version, not from the failed candidate (hill-climbing ratchet)
  • Works for skill files (SKILL.md), prompt templates, and agent configs

Non-Goals

  • Not a general-purpose code rewriter — only rewrites the specific artifact being evaluated
  • Not a replacement for human-directed improvement — both modes coexist
  • Does not modify the eval definition (EVAL.yaml) — only the artifact under test
  • Does not run evals itself — agentv-bench orchestrates the full loop

Context

The autoresearch pattern — proven by karpathy/autoresearch (ML training optimization) and pi-autoresearch (generic optimization loops) — automates the improvement step: score → keep/drop → mutate → repeat. The mutator subagent is the core building block that enables this in agentv-bench.

Key design insight from Karpathy: constrain mutation to a single file (the artifact) and keep the evaluation harness immutable. This prevents the agent from gaming its own scoring.
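The score → keep/drop → mutate → repeat loop, with the hill-climbing ratchet from the acceptance signals, can be sketched as follows. This is a toy illustration, not agentv-bench's orchestration code; `evaluate` and `mutate` stand in for the eval harness and the mutator subagent:

```python
def optimize(artifact, evaluate, mutate, iterations=5):
    """Hill-climbing ratchet: always mutate from the best version so far,
    and keep a candidate only if it scores strictly higher.

    The evaluation harness (`evaluate`) stays immutable; only the artifact
    changes, so the loop cannot game its own scoring.
    """
    best, best_score = artifact, evaluate(artifact)
    for _ in range(iterations):
        candidate = mutate(best)      # mutator rewrites from the current best
        score = evaluate(candidate)
        if score > best_score:        # ratchet: never regress to a worse version
            best, best_score = candidate, score
    return best, best_score
```

With toy stand-ins, e.g. `optimize("a", len, lambda s: s + "x", iterations=3)`, the loop returns `("axxx", 4)`: each mutation builds on the previous best, and a failed candidate would simply be dropped.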

Related
