feat(core): support structured llm-grader context by christso · Pull Request #1342 · EntityProcess/agentv

christso · 2026-06-10T02:58:55Z

Summary

AgentV can now express Dexter-style semantic grading directly with llm-grader instead of routing through a code-grader shim. Eval suites can share source metadata once, structured task data stays in the existing input message model, and reusable llm-grader prompt files can receive metadata and rubric entries as template variables.

This keeps the feature AgentV-native and non-breaking: all new YAML fields are optional, wire-format fields stay snake_case, and existing governance inheritance remains supported.

Design notes

Suite-level metadata: is inherited into each test's metadata; per-test metadata merges over suite metadata.
Structured task input reuses the existing input field. Object-valued input now expands to a single user message with JSON object content, matching the existing code-grader pattern of receiving canonical input messages.
llm-grader custom prompts now receive structured metadata and rubrics variables in freeform, rubric, and score-range modes.
Rubric objects accept criteria: as an alias for canonical outcome:, so native Dexter rows like { operator, criteria } do not need dataset-specific transformation.
No separate input_object / inputObject field or template variable is introduced.
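The input and rubric notes above can be sketched as follows. This is a minimal illustration, not the parser in `@agentv/core`: the type names, `expandInput`, `renderContent`, and `normalizeRubric` are hypothetical, and the `rubric-N` id default is an assumption inferred from the UAT output in this description.

```typescript
// Hypothetical shapes for illustration; AgentV's real types may differ.
type JsonObject = { [key: string]: unknown };
type InputMessage = { role: "user"; content: string | JsonObject };

interface RawRubric {
  id?: string;
  outcome?: string;
  criteria?: string; // accepted as an alias for `outcome`
  operator?: string;
}

interface Rubric {
  id: string;
  outcome: string;
  operator?: string;
}

// Wrap a string or object-valued `input` in a single user message.
// Object content is kept as an object rather than stringified.
function expandInput(input: string | JsonObject): InputMessage[] {
  return [{ role: "user", content: input }];
}

// Render message content as text: strings pass through unchanged,
// objects become formatted (2-space indented) JSON.
function renderContent(content: string | JsonObject): string {
  return typeof content === "string" ? content : JSON.stringify(content, null, 2);
}

// Resolve the `criteria` alias. Returns undefined when neither field is
// present, which corresponds to the "skipped rubric" case.
// The `rubric-N` fallback id is an assumption.
function normalizeRubric(raw: RawRubric, index: number): Rubric | undefined {
  const outcome = raw.outcome ?? raw.criteria;
  if (outcome === undefined) return undefined;
  return { id: raw.id ?? `rubric-${index + 1}`, outcome, operator: raw.operator };
}
```

In this sketch `outcome` wins when both fields are set, since the description calls it the canonical name. Whether the real parser resolves a conflict the same way is not stated here.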

av-zk0.3 handoff

Use suite-level metadata once in the financial-research-agent eval:

metadata:
  source_repo: https://github.com/virattt/dexter
  source_commit: 8d9419829f443f84b804d033bb2c3b1fbd788629
  source_file: src/evals/dataset/finance_agent.csv

Each test inherits this into metadata. Per-test metadata merges over it: arrays concatenate suite-first with de-duping, nested objects merge recursively, and per-test scalar values override suite scalars. Top-level governance: still overrides metadata.governance for the governance-specific path.
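The merge rules above can be pictured with a small sketch. The `mergeMetadata` name and the JSON-serialization equality used for de-duping are assumptions for illustration; the governance-specific override is not modelled here.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge per-test metadata over suite metadata (hypothetical helper).
function mergeMetadata(suite: JsonObject, test: JsonObject): JsonObject {
  const merged: JsonObject = { ...suite };
  for (const [key, testValue] of Object.entries(test)) {
    const suiteValue = merged[key];
    if (Array.isArray(suiteValue) && Array.isArray(testValue)) {
      // Arrays: suite entries first, then test entries not already present.
      // Equality by JSON serialization is an assumption of this sketch.
      const seen = new Set<string>();
      const combined: Json[] = [];
      for (const item of [...suiteValue, ...testValue]) {
        const fingerprint = JSON.stringify(item);
        if (!seen.has(fingerprint)) {
          seen.add(fingerprint);
          combined.push(item);
        }
      }
      merged[key] = combined;
    } else if (isPlainObject(suiteValue) && isPlainObject(testValue)) {
      // Nested objects: merge recursively.
      merged[key] = mergeMetadata(suiteValue, testValue);
    } else {
      // Scalars and mismatched types: the per-test value wins.
      merged[key] = testValue;
    }
  }
  return merged;
}
```

For example, merging suite `{ tags: ["x", "y"], nested: { a: 1, b: 2 } }` with test `{ tags: ["y", "z"], nested: { b: 3 } }` yields `{ tags: ["x", "y", "z"], nested: { a: 1, b: 3 } }` under these rules.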

Use a reusable grader prompt file by referencing it explicitly:

assertions:
  - type: llm-grader
    prompt: file://prompts/dexter-grader.md

If the full grader config is shared, keep it under suite-level assertions or an assertion include. If rubrics vary per test, place the llm-grader assertion on each test while pointing all of them at the same file://... prompt.

Put structured task input in input:

tests:
  - id: apple-finance
    input:
      company: Apple
      ticker: AAPL
    assertions:
      - type: llm-grader
        prompt: file://prompts/dexter-grader.md
        rubrics:
          - operator: correctness
            criteria: Uses the provided ticker.
          - operator: contradiction
            criteria: Does not contradict the source data.

Available llm-grader template variables for the prompt file:

Existing text variables: {{input}}, {{output}}, {{expected_output}}, {{criteria}}
Structured variables: {{metadata}}, {{metadata_json}}, {{rubrics}}, {{rubrics_json}}
Non-_json structured values are formatted JSON; _json values are compact JSON. Missing values render as an empty string.
When input contains a JSON object, {{input}} renders that object as formatted JSON.
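As a rough sketch of how these variables could be produced and substituted, assuming a plain `{{name}}` replacement: `buildTemplateValues` and `renderPrompt` are hypothetical names, and leaving unknown placeholders untouched is a choice made for this sketch, not documented AgentV behavior.

```typescript
interface PromptVariables {
  input?: string;
  output?: string;
  expected_output?: string;
  criteria?: string;
  metadata?: unknown;
  rubrics?: unknown;
}

// Build the string value for each template variable.
// Missing values become an empty string.
function buildTemplateValues(vars: PromptVariables): Record<string, string> {
  const pretty = (v: unknown): string => (v === undefined ? "" : JSON.stringify(v, null, 2));
  const compact = (v: unknown): string => (v === undefined ? "" : JSON.stringify(v));
  return {
    input: vars.input ?? "",
    output: vars.output ?? "",
    expected_output: vars.expected_output ?? "",
    criteria: vars.criteria ?? "",
    metadata: pretty(vars.metadata), // formatted JSON
    metadata_json: compact(vars.metadata), // compact JSON
    rubrics: pretty(vars.rubrics),
    rubrics_json: compact(vars.rubrics),
  };
}

// Replace {{name}} placeholders. Names this sketch does not know about
// are left as-is (an assumption; the real renderer may warn instead).
function renderPrompt(template: string, vars: PromptVariables): string {
  const values = buildTemplateValues(vars);
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match,
  );
}
```

A prompt file can then use `{{metadata}}` where a readable block helps the grader and `{{rubrics_json}}` where a compact single-line payload is preferable.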

Suggested av-zk0.3 validation:

Pull this AgentV change and run bun run build before CLI checks because the CLI imports @agentv/core from dist.
Load the financial-research-agent eval YAML and confirm the parsed test has inherited metadata, object-valued input rendered into question, and rubric outcome populated from criteria.
Run a live eval using llm-grader and inspect exported JSONL: scores[].type should be llm-grader, and the grader prompt should include the expected source metadata, structured input text, and Dexter rubric entries.
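The JSONL check in the last step could be scripted along these lines. Only `scores[].type` comes from this description; the rest of the record shape is unknown, and `linesWithoutGraderType` is a hypothetical helper.

```typescript
// Minimal view of an exported result record; all other fields are ignored.
interface ResultRecord {
  scores?: { type?: string }[];
}

// Return the 1-based line numbers of records that have no score entry
// of the given grader type. Blank lines are skipped.
function linesWithoutGraderType(jsonl: string, graderType = "llm-grader"): number[] {
  const failing: number[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") return;
    const record = JSON.parse(line) as ResultRecord;
    const hasType = (record.scores ?? []).some((score) => score.type === graderType);
    if (!hasType) failing.push(index + 1);
  });
  return failing;
}
```

The sketch requires at least one matching score per record so that tests with additional graders still pass. If every score is expected to come from llm-grader, swap `some` for `every` and reject empty `scores` arrays.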

Red/green UAT

Red on origin/main: a Dexter-like suite with top-level metadata, object-valued input, and rubric criteria did not surface the suite metadata and skipped the rubric because it was missing canonical outcome.

Green on this branch: the same shape parses with inherited source metadata, object-valued input preserved as canonical user message content and rendered into question, and a rubric outcome populated from criteria:

{
  "metadata": {
    "source_repo": "https://github.com/virattt/dexter",
    "source_commit": "8d9419829f443f84b804d033bb2c3b1fbd788629",
    "source_file": "src/evals/dataset/finance_agent.csv"
  },
  "input": {
    "role": "user",
    "content": { "company": "Apple", "ticker": "AAPL" }
  },
  "question": "{\n  \"company\": \"Apple\",\n  \"ticker\": \"AAPL\"\n}",
  "rubric": {
    "id": "rubric-1",
    "outcome": "Uses the provided ticker.",
    "operator": "correctness",
    "weight": 1,
    "required": true
  }
}

Verification

bun test packages/core/test/evaluation/yaml-parser-metadata.test.ts packages/core/test/evaluation/evaluators_variables.test.ts packages/core/test/evaluation/graders/prompt-resolution.test.ts
bun run lint
bun run typecheck
bun run test
bun run validate:examples
git diff --check
Manual parser UAT for suite metadata + object-valued input + rubric criteria alias

Post-Deploy Monitoring & Validation

No additional production monitoring is required; this changes local eval parsing and grader prompt assembly only. Validation window is the first CI run plus the av-zk0.3 Dexter follow-up. Healthy signals: the Dexter eval YAML parses without rubric warnings, prompt material includes metadata/rubrics and structured task input via {{input}}, and result JSONL uses llm-grader scores. Failure signals: missing-outcome warnings, unresolved-template-variable warnings, or skipped rubric entries; mitigation is to revert this PR or temporarily keep the code-grader shim while av-zk0.3 is adjusted.

cloudflare-workers-and-pages · 2026-06-10T03:25:49Z

Deploying agentv with Cloudflare Pages

Latest commit:	`02bd698`
Status:	✅ Deploy successful!
Preview URL:	https://7c4d1c28.agentv.pages.dev
Branch Preview URL:	https://feat-llm-grader-structured-i.agentv.pages.dev


christso added 2 commits June 10, 2026 04:54

feat(core): add structured llm grader inputs

e9888b3

docs: clarify optional llm grader input object

dcdde97

refactor(core): reuse input for structured llm grader data

02bd698

christso changed the title ~~feat(core): add structured llm-grader inputs~~ feat(core): support structured llm-grader context Jun 10, 2026

christso merged commit e43a4d4 into main Jun 10, 2026
8 checks passed

christso deleted the feat/llm-grader-structured-input branch June 10, 2026 04:45

christso merged 3 commits into main from feat/llm-grader-structured-input
