feat(eval): code grader multimodal — structured Content in CodeGraderInput by christso · Pull Request #841 · EntityProcess/agentv

christso · 2026-03-29T03:09:39Z

Closes #821

Changes

Schema: typed Content blocks in `@agentv/eval`

- Added ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema (Zod discriminated union on type)
- Updated MessageSchema.content from loose string | Record | Record[] to typed string | Content[]
- ContentImage uses path (a file path), never inline base64, matching the wire format contract
- Exported Content schemas and inferred types from @agentv/eval
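
For readers who have not opened the diff, the shape described above can be approximated without Zod as a plain TypeScript discriminated union plus a hand-written type guard. This is only a sketch: the PR confirms the `type` discriminator and the `path` field on ContentImage, while the literal values `"text"`, `"image"`, `"file"`, the `text` field, the `path` field on ContentFile, and the rejection of `data:` URIs in `path` are assumptions made for illustration. The real schemas live in packages/eval/src/schemas.ts.

```typescript
// Assumed field names: only `type` and ContentImage.path are stated in the PR.
type ContentText = { type: "text"; text: string };
type ContentImage = { type: "image"; path: string };
type ContentFile = { type: "file"; path: string };
type Content = ContentText | ContentImage | ContentFile;

// MessageSchema.content after this PR: a plain string or typed blocks.
type MessageContent = string | Content[];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hand-written stand-in for a discriminated-union check on `type`.
function isContent(value: unknown): value is Content {
  if (!isRecord(value)) return false;
  switch (value["type"]) {
    case "text":
      return typeof value["text"] === "string";
    case "image": {
      // Assumption: enforce "path, never inline base64" by rejecting data URIs.
      const imagePath = value["path"];
      return typeof imagePath === "string" && !imagePath.startsWith("data:");
    }
    case "file":
      return typeof value["path"] === "string";
    default:
      return false;
  }
}

function isMessageContent(value: unknown): value is MessageContent {
  return typeof value === "string" || (Array.isArray(value) && value.every(isContent));
}
```

Under this sketch, a bare record such as `{ type: "text", text: "hi" }` is no longer valid content on its own; it has to be wrapped in an array, which mirrors the move away from the loose `Record` variants.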

Payload builder: image materialization in `code-evaluator.ts`

- Added materializeContentForGrader(), which converts ContentImage blocks for code grader consumption:
  - Data URI images (data:image/png;base64,...) → decoded to a temp file, replaced with the file path
  - Path/URL images → source carried through as the path field
  - Text/file blocks → passed through unchanged
  - String content → passed through unchanged (zero-copy fast path)
- Lazy temp dir creation (agentv-img-*): only allocated when images exist
- Temp dir cleaned up in the finally block alongside file-backed output cleanup
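
The behaviour listed above can be sketched roughly as follows. This is not the code from code-evaluator.ts: `materializeSketch` and `withGraderContent` are hypothetical names, the block shape (`source` on input, `path` on output) is inferred from the bullets, and the file naming and the jpeg to `.jpg` mapping are assumptions. It uses only Node.js built-ins.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type Block = { type: string; [key: string]: unknown };
type MessageContent = string | Block[] | null | undefined;

// Matches base64 image data URIs such as data:image/png;base64,...
const DATA_URI = /^data:image\/([a-zA-Z0-9.+-]+);base64,(.+)$/;

// Hypothetical stand-in for materializeContentForGrader().
function materializeSketch(content: MessageContent, getTempDir: () => string): MessageContent {
  // Null, undefined, and string content are returned untouched (fast path).
  if (content == null || typeof content === "string") return content;
  let counter = 0;
  return content.map((block): Block => {
    const { source, ...rest } = block;
    // Text and file blocks pass through unchanged.
    if (rest.type !== "image" || typeof source !== "string") return block;
    const match = DATA_URI.exec(source);
    // Path or URL: carry the source through as `path`.
    if (!match) return { ...rest, path: source };
    const [, subtype = "png", payload = ""] = match;
    const ext = subtype === "jpeg" ? "jpg" : subtype; // assumed mapping
    const file = join(getTempDir(), `image-${counter++}.${ext}`);
    writeFileSync(file, Buffer.from(payload, "base64"));
    return { ...rest, path: file };
  });
}

// The temp dir is created on first use and removed in `finally`.
function withGraderContent<T>(content: MessageContent, run: (c: MessageContent) => T): T {
  const temp: { dir?: string } = {};
  const getTempDir = (): string => (temp.dir ??= mkdtempSync(join(tmpdir(), "agentv-img-")));
  try {
    return run(materializeSketch(content, getTempDir));
  } finally {
    if (temp.dir !== undefined) rmSync(temp.dir, { recursive: true, force: true });
  }
}

// Usage: the grader callback sees a file path; the file is gone afterwards.
const uri = `data:image/png;base64,${Buffer.from("not-a-real-png").toString("base64")}`;
let seen = "";
const text = withGraderContent([{ type: "image", source: uri }], (c) => {
  const first = Array.isArray(c) ? c[0] : undefined;
  const p = first?.["path"];
  seen = typeof p === "string" ? p : "";
  return readFileSync(seen, "utf8");
});
console.log(text, existsSync(seen));
```

Passing the temp dir as a callback keeps the allocation lazy: text-only and path-only content never touches the filesystem, so the cleanup branch is a no-op in the common case.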

Tests

- 11 unit tests for materializeContentForGrader (null/undefined, text-only, data URIs, paths, JPEG extension, multiple images, ContentFile preservation, field preservation)
- 3 integration tests for CodeEvaluator multimodal flow (text-only, image materialization, temp cleanup)
- 7 schema validation tests (ContentSchema, MessageSchema content variants, CodeGraderInput with Content[])

Depends on

feat(eval): simplify template variables — drop _text suffix, align with industry patterns #839 (feat/825-template-vars)
feat(core): preserve multimodal content blocks in provider responses #833 (feat/818-provider-preserve)

Update Claude and Pi providers to preserve non-text content blocks (images) in Message.content instead of discarding them via extractTextContent(). This enables multimodal content to flow from provider response through to evaluators.

Changes:
- Create shared claude-content.ts with toContentArray() and extractTextContent() used by all 3 Claude providers
- Update claude-cli, claude-sdk, claude providers to use structuredContent ?? textContent pattern
- Add toPiContentArray() to pi-utils.ts for Pi provider
- Update pi-coding-agent convertAgentMessage() to preserve structured content
- Add 23 unit tests covering content preservation, backward compat, and end-to-end multimodal flow

Text-only responses still produce plain strings (no unnecessary wrapping). extractTextContent() remains available for backward compatibility.

Closes #818

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
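
The `structuredContent ?? textContent` pattern named in that commit message could look roughly like the sketch below. The function names toContentArray() and extractTextContent() come from the message, but their bodies, the block shape, the `toMessageContent` wrapper, and the choice to join text blocks with an empty string are all assumptions for illustration, not the contents of claude-content.ts.

```typescript
// Assumed provider block shape; real provider SDK types will differ.
type RawBlock = { type: string; text?: string; [key: string]: unknown };
type ContentBlock = { type: string; [key: string]: unknown };

// Concatenate the text of all text blocks (join separator is an assumption).
function extractTextContent(blocks: RawBlock[]): string {
  return blocks
    .map((b) => (b.type === "text" && typeof b.text === "string" ? b.text : ""))
    .join("");
}

// Return structured blocks only when something other than text is present,
// so text-only responses are not wrapped unnecessarily.
function toContentArray(blocks: RawBlock[]): ContentBlock[] | undefined {
  const hasNonText = blocks.some((b) => b.type !== "text");
  return hasNonText ? blocks.map((b) => ({ ...b })) : undefined;
}

// Hypothetical helper showing the fallback: structured if available, else text.
function toMessageContent(blocks: RawBlock[]): string | ContentBlock[] {
  const structuredContent = toContentArray(blocks);
  const textContent = extractTextContent(blocks);
  return structuredContent ?? textContent;
}
```

With this shape, existing consumers that expect a string keep working for text-only responses, and only responses that actually contain images take the array form.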

… feat/821-code-grader-mm

…Input

- Add ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema as Zod discriminated union in packages/eval/src/schemas.ts
- Update MessageSchema.content to accept string | Content[] (typed blocks)
- Add materializeContentForGrader() in code-evaluator.ts:
  - Data URI images decoded and written to temp files (path, not base64)
  - Non-URI images pass source through as path field
  - Text/file blocks unchanged; string content unchanged
- Lazy temp dir creation for image files, cleaned up in finally block
- Export Content schemas and types from @agentv/eval
- Add comprehensive unit tests for schema validation and materialization
- Add integration tests for CodeEvaluator with multimodal output

Closes #821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

christso and others added 3 commits March 29, 2026 02:34

Merge remote-tracking branch 'origin/feat/818-provider-preserve' into feat/821-code-grader-mm

620c5a8

christso closed this Mar 29, 2026

christso changed the title ~~feat(eval): LLM grader multimodal — auto-append images to judge message~~ feat(eval): code grader multimodal — structured Content in CodeGraderInput Mar 29, 2026

christso wants to merge 3 commits into feat/825-template-vars from feat/821-code-grader-mm
