feat(eval): code grader multimodal — structured Content in CodeGraderInput #841
Closed
christso wants to merge 3 commits into feat/825-template-vars
Update Claude and Pi providers to preserve non-text content blocks (images) in Message.content instead of discarding them via extractTextContent(). This enables multimodal content to flow from provider response through to evaluators.

Changes:
- Create shared claude-content.ts with toContentArray() and extractTextContent() used by all 3 Claude providers
- Update claude-cli, claude-sdk, claude providers to use structuredContent ?? textContent pattern
- Add toPiContentArray() to pi-utils.ts for Pi provider
- Update pi-coding-agent convertAgentMessage() to preserve structured content
- Add 23 unit tests covering content preservation, backward compat, and end-to-end multimodal flow

Text-only responses still produce plain strings (no unnecessary wrapping). extractTextContent() remains available for backward compatibility.

Closes #818

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
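The structuredContent ?? textContent pattern described in this commit could look roughly like the sketch below. The block shapes (`ProviderBlock`) and the wrapper `toMessageContent` are hypothetical, and the assumption that toContentArray() returns undefined for text-only responses is inferred from the "no unnecessary wrapping" note rather than taken from the diff.

```typescript
// Hypothetical block shapes; the real provider types are not shown in this PR.
type ProviderBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: string };

type MessageContent = string | ProviderBlock[];

// Join the text blocks and drop everything else (the previous, lossy behaviour).
function extractTextContent(blocks: ProviderBlock[]): string {
  return blocks
    .filter((b): b is Extract<ProviderBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
}

// Assumption: only responses containing a non-text block keep the array form.
function toContentArray(blocks: ProviderBlock[]): ProviderBlock[] | undefined {
  return blocks.some((b) => b.type !== "text") ? blocks : undefined;
}

// Hypothetical helper showing the fallback: structured content wins when it
// exists, otherwise the message keeps a plain string.
function toMessageContent(blocks: ProviderBlock[]): MessageContent {
  const structuredContent = toContentArray(blocks);
  const textContent = extractTextContent(blocks);
  return structuredContent ?? textContent;
}
```

With this shape, existing consumers that expect a string keep working for text-only responses, and only multimodal responses need array-aware handling.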
… feat/821-code-grader-mm
…Input

- Add ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema as Zod discriminated union in packages/eval/src/schemas.ts
- Update MessageSchema.content to accept string | Content[] (typed blocks)
- Add materializeContentForGrader() in code-evaluator.ts:
  - Data URI images decoded and written to temp files (path, not base64)
  - Non-URI images pass source through as path field
  - Text/file blocks unchanged; string content unchanged
  - Lazy temp dir creation for image files, cleaned up in finally block
- Export Content schemas and types from @agentv/eval
- Add comprehensive unit tests for schema validation and materialization
- Add integration tests for CodeEvaluator with multimodal output

Closes #821

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
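To illustrate what a union discriminated on `type` buys for `string | Content[]`, here is a dependency-free sketch. The PR itself defines these as Zod schemas; the hand-written parser below is a stand-in, and every field other than `type` and the image `path` (for example `text` on text blocks and `path` on file blocks) is an assumption.

```typescript
// Assumed shapes; only `type` and the image `path` field are stated in the PR.
type ContentText = { type: "text"; text: string };
type ContentImage = { type: "image"; path: string };
type ContentFile = { type: "file"; path: string };
type Content = ContentText | ContentImage | ContentFile;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Validate one block by switching on its discriminator.
function parseContent(value: unknown): Content {
  if (!isRecord(value)) throw new Error("content block must be an object");
  const text = value["text"];
  const path = value["path"];
  switch (value["type"]) {
    case "text":
      if (typeof text !== "string") throw new Error("text block needs a string text field");
      return { type: "text", text };
    case "image":
      // Images are referenced by file path, never by inline base64.
      if (typeof path !== "string") throw new Error("image block needs a string path field");
      return { type: "image", path };
    case "file":
      if (typeof path !== "string") throw new Error("file block needs a string path field");
      return { type: "file", path };
    default:
      throw new Error(`unknown content type: ${String(value["type"])}`);
  }
}

// Message content is either a plain string or an array of typed blocks.
function parseMessageContent(value: unknown): string | Content[] {
  if (typeof value === "string") return value;
  if (Array.isArray(value)) return value.map((block) => parseContent(block));
  throw new Error("content must be a string or an array of content blocks");
}
```

A grader that receives content validated this way can narrow on `block.type` and rely on `path` being present for images.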
Closes #821
Changes
Schema: typed Content blocks in @agentv/eval
- Add ContentTextSchema, ContentImageSchema, ContentFileSchema, ContentSchema (Zod discriminated union on type)
- Change MessageSchema.content from loose string | Record | Record[] to typed string | Content[]
- ContentImage carries path (file path), never inline base64; this matches the wire format contract
- Export Content schemas and types from @agentv/eval

Payload builder: image materialization in code-evaluator.ts
- materializeContentForGrader() converts ContentImage blocks for code grader consumption:
  - Data URI images (data:image/png;base64,...) → decoded to a temp file, replaced with the file path
  - Non-URI images → source carried through as path field
  - Text/file blocks and string content → unchanged
- Lazy temp dir (agentv-img-*): only allocated when images exist
- Cleanup in finally block alongside file-backed output cleanup

Tests
- Unit tests for materializeContentForGrader (null/undefined, text-only, data URIs, paths, JPEG extension, multiple images, ContentFile preservation, field preservation)
- Integration tests for CodeEvaluator multimodal flow (text-only, image materialization, temp cleanup)

Depends on
- feat/825-template-vars
- feat/818-provider-preserve
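The materialization steps listed under "Payload builder" can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in code-evaluator.ts: the class `ImageMaterializer`, the helper `runWithMaterializedContent`, the block shapes, the `image-N` file naming, and using the MIME subtype as the file extension are all hypothetical. Only the agentv-img- prefix, the lazy temp dir, and the finally-based cleanup come from the description.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed shapes: provider images carry `source`, grader images carry `path`.
type InputBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: string }
  | { type: "file"; path: string };
type GraderBlock =
  | { type: "text"; text: string }
  | { type: "image"; path: string }
  | { type: "file"; path: string };
type GraderContent = string | GraderBlock[] | null | undefined;

const DATA_URI = /^data:image\/([a-z0-9]+);base64,(.*)$/i;

class ImageMaterializer {
  private dir: string | undefined; // created lazily, only if a data URI image appears
  private count = 0;

  async materialize(content: string | InputBlock[] | null | undefined): Promise<GraderContent> {
    // Strings and missing content pass through untouched.
    if (content == null || typeof content === "string") return content;
    const out: GraderBlock[] = [];
    for (const block of content) {
      if (block.type !== "image") {
        out.push(block); // text and file blocks are unchanged
        continue;
      }
      const match = DATA_URI.exec(block.source);
      if (!match) {
        out.push({ type: "image", path: block.source }); // non-URI source becomes path
        continue;
      }
      const [, subtype = "png", payload = ""] = match;
      const dir = this.dir ?? (this.dir = await mkdtemp(join(tmpdir(), "agentv-img-")));
      // Assumption: the MIME subtype doubles as the file extension.
      const path = join(dir, `image-${this.count++}.${subtype.toLowerCase()}`);
      await writeFile(path, Buffer.from(payload, "base64"));
      out.push({ type: "image", path });
    }
    return out;
  }

  async cleanup(): Promise<void> {
    if (this.dir) await rm(this.dir, { recursive: true, force: true });
    this.dir = undefined;
  }
}

// Hypothetical wrapper: temp files live exactly as long as the grader call.
async function runWithMaterializedContent<T>(
  content: string | InputBlock[] | null | undefined,
  grade: (materialized: GraderContent) => Promise<T>,
): Promise<T> {
  const materializer = new ImageMaterializer();
  try {
    return await grade(await materializer.materialize(content));
  } finally {
    await materializer.cleanup();
  }
}
```

Creating the directory on first use means text-only evaluations never touch the filesystem, and the finally block removes the directory even when the grader throws.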