Improve LLM rubric authoring with structured input objects #1341

@christso

Problem

LLM rubrics are semantic graders, but the current authoring surface makes richer rubric shapes awkward. Code graders can receive structured pass-through config objects, while llm-grader usage is mostly centered on prompt strings, criteria, and AgentV's built-in rubric object shape.

This creates friction for datasets like Dexter that already carry rubric metadata as structured objects, for example { operator, criteria }, where correctness and contradiction need different prompt semantics without adding dataset-specific core fields.
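To make the shape concrete, here is a minimal sketch of Dexter-style rubric items kept as plain data. It assumes the operator values are spelled `correctness` and `contradiction` (taken from the wording above). The instruction strings, the sample criteria, and the `instruction_for` helper are all hypothetical illustrations, not AgentV or Dexter APIs.

```python
from typing import TypedDict


class RubricItem(TypedDict):
    """Hypothetical shape of a Dexter-style rubric entry."""
    operator: str   # assumed values: "correctness" or "contradiction"
    criteria: str   # free-text statement the grader checks


# Assumed wording; an eval author would own these strings, not the core tool.
INSTRUCTIONS = {
    "correctness": "Pass if the answer states or clearly implies the criteria.",
    "contradiction": "Pass if the answer does NOT contradict the criteria.",
}


def instruction_for(item: RubricItem) -> str:
    """Pick the prompt semantics for one rubric item based on its operator."""
    try:
        return INSTRUCTIONS[item["operator"]]
    except KeyError:
        raise ValueError(f"unsupported operator: {item['operator']!r}") from None


# Made-up sample data for illustration only.
rubric: list[RubricItem] = [
    {"operator": "correctness", "criteria": "Revenue grew 12% year over year."},
    {"operator": "contradiction", "criteria": "The company reported a net loss."},
]
```

The point of the sketch is that the operator-to-semantics mapping lives with the dataset or prompt author, so no operator vocabulary has to be added to the grader itself.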

Desired DX

Make llm-grader feel as composable as code graders for semantic grading:

  • allow a custom prompt/template to receive a structured custom input object
  • preserve arbitrary rubric metadata in a documented config field, rather than forcing it into built-in rubric fields
  • support template variables for both common text fields and serialized structured data
  • keep the core primitive generic, with no Dexter-specific schema or operator semantics baked into AgentV
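One way the points above could fit together is sketched below: a generic renderer that substitutes plain text variables as-is and serializes structured values to JSON. The `{{ name }}` placeholder syntax, the `render_prompt` function, and the variable names `answer` and `input` are assumptions made for illustration; the issue does not specify them.

```python
import json
import re

# Matches placeholders such as {{ answer }} or {{input}} (assumed syntax).
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, object]) -> str:
    """Substitute placeholders; non-string values are serialized as JSON."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown template variable: {name}")
        value = variables[name]
        # Strings pass through untouched; structured data becomes stable JSON.
        return value if isinstance(value, str) else json.dumps(value, sort_keys=True)

    return _VAR.sub(replace, template)


template = (
    "Grade the answer below.\n"
    "Answer: {{ answer }}\n"
    "Rubric (JSON): {{ input }}\n"
)

prompt = render_prompt(
    template,
    {
        "answer": "Revenue rose 12%.",
        # Arbitrary rubric metadata; the renderer never inspects its keys.
        "input": {"operator": "correctness", "criteria": "Revenue grew 12%."},
    },
)
```

Because the renderer treats the object as opaque data, any dataset-specific schema stays outside the core primitive.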

Acceptance Criteria

  • An eval author can write an llm-grader assertion with a custom prompt and a structured input object.
  • The prompt can reference that object through a documented template variable, likely as JSON.
  • Existing llm-grader behavior remains backwards compatible.
  • Built-in rubrics remain available for standard checklist/score-range grading.
  • Docs/examples show when to use built-in rubrics versus a custom prompt + structured input object.
  • A regression test covers a custom llm-grader prompt receiving structured rubric metadata.
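The regression test in the last criterion might look roughly like the sketch below. `build_grader_prompt` is a hypothetical stand-in for whatever step assembles the grader prompt, and the `{{answer}}` and `{{input_json}}` variable names are assumed; a real test would call the actual llm-grader code path instead.

```python
import json
import unittest


def build_grader_prompt(prompt_template: str, answer: str, grader_input: dict) -> str:
    """Hypothetical stand-in for the grader's prompt assembly step."""
    return (
        prompt_template
        .replace("{{answer}}", answer)
        .replace("{{input_json}}", json.dumps(grader_input, sort_keys=True))
    )


class StructuredInputPromptTest(unittest.TestCase):
    def test_prompt_receives_structured_rubric_metadata(self):
        grader_input = {
            "operator": "contradiction",
            "criteria": "The company reported a net loss.",
        }
        prompt = build_grader_prompt(
            "Rubric: {{input_json}}\nAnswer: {{answer}}",
            answer="Net income was positive.",
            grader_input=grader_input,
        )
        # The metadata must survive the trip through the prompt unchanged.
        rubric_line = prompt.splitlines()[0].removeprefix("Rubric: ")
        self.assertEqual(json.loads(rubric_line), grader_input)
        self.assertIn("Answer: Net income was positive.", prompt)

    def test_prompt_without_placeholder_is_unchanged(self):
        # Backwards compatibility: templates that ignore the object render as before.
        self.assertEqual(
            build_grader_prompt("Plain prompt", "x", {"a": 1}), "Plain prompt"
        )
```

Saved as a module, this runs with `python -m unittest`. The second test case is a small check on the backwards-compatibility criterion: a prompt that never references the structured object should be unaffected by its presence.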

Notes

This is motivated by the Dexter financial eval conversion. Dexter-style rubrics should be representable as data passed to an LLM grader prompt, rather than using unsupported fields on AgentV built-in rubric objects or routing semantic grading through a code-grader shim.
