Deep-CodeAI · Skobeltsyn · May 30, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,17 @@ All notable changes to Agents.KT are documented here. The format follows [Keep a
 
 ## [Unreleased]
 
+### Added — Typed agent attachments (#2470 slice b)
+
+- **`agent.invokeWithAttachments(input, attachments)`** + suspending sibling `invokeSuspendWithAttachments` — user-facing API for vision input via typed `Content.Image`. The runtime dereferences each ref against the agent's injected `BlobStore`, base64-encodes once, and attaches `ImagePart` to the first user `LlmMessage`. Per-provider wire translation is the slice-a work — this commit routes the typed surface into it.
+- **`Agent.blobStore: BlobStore?` + `blobStore(store)` DSL** — optional injection; null when the agent doesn't take attachments. Passing attachments to an agent with no `blobStore` errors fast at invoke time with a clear message, so caller misconfiguration surfaces before any provider HTTP request is made.
+- **Closed mime mapping** — `ImageMime → ImagePart.WireMime` for all four variants (`Png`, `Jpeg`, `Gif`, `Webp`). No `String` conversion at any boundary.
+- **Forensic-friendly errors** — when a ref's blob is missing from the store, the error names the ref's hash prefix. Helps debug snapshot resumes against partially-purged stores.
+- **Non-image variants skipped in v1** — `Content.Text` / `Document` / `Audio` / `Video` flow through the attachment path as no-ops. Slice c will wire Document via provider doc-input adapters; Audio/Video land in Stage 2.
+- **Empty / all-skipped attachments → null images** — no provider sees an empty array; legacy wire shape preserved.
+- **Resume composition** — `attachments` argument is ignored on resume because the restored conversation already carries the original `LlmMessage.images` on the saved user turn.
+- **Tests:** 8 unit cases (`AgentAttachmentsTest`) + 6 live cases (`AgentVisionLiveTest`) running the same `VisionFixtures` from slice a through the agent surface on Ollama qwen3-vl:8b, Claude Haiku 4.5, OpenAI gpt-4o-mini. See [docs/multimodal.md](docs/multimodal.md#agent-attachments--typed-contentimage-at-the-invoke-surface-2470-slice-b).
+
 ### Added — Vision input across all providers (#2470 slice a)
 
 - **`LlmMessage.images: List<ImagePart>? = null`** — new optional field; back-compat default leaves the wire shape byte-identical to pre-#2470 for callers that don't pass images. Closed `ImagePart(base64, wireMime)` with `WireMime` sealed type (`Png`, `Jpeg`, `Gif`, `Webp`) — `String` mime is intentionally not accepted in the public ctor.

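Review note on the "Closed mime mapping" entry above: a minimal sketch of how an exhaustive `when` can keep `ImageMime → ImagePart.WireMime` total without passing through `String`. `SketchImageMime`, `SketchWireMime`, and `toWire` are hypothetical stand-ins, not the library's declarations. The changelog describes `WireMime` as a sealed type; enums are used here only for brevity.

```kotlin
// Hypothetical stand-ins for ImageMime and ImagePart.WireMime; not the library's code.
enum class SketchImageMime { Png, Jpeg, Gif, Webp }
enum class SketchWireMime { Png, Jpeg, Gif, Webp }

// A `when` used as an expression over an enum must be exhaustive, so adding a
// fifth SketchImageMime variant fails to compile until it is mapped here.
fun SketchImageMime.toWire(): SketchWireMime = when (this) {
    SketchImageMime.Png -> SketchWireMime.Png
    SketchImageMime.Jpeg -> SketchWireMime.Jpeg
    SketchImageMime.Gif -> SketchWireMime.Gif
    SketchImageMime.Webp -> SketchWireMime.Webp
}
```

Because there is no `else` branch, the compiler rather than a runtime lookup enforces that every variant has a wire counterpart.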
diff --git a/README.md b/README.md
@@ -157,6 +157,7 @@ These APIs work in `main`, are unit-tested, and are exercised by integration tes
 - **Eval harness** — `DeterministicModelClient(LlmResponse.Text("..."), LlmResponse.ToolCalls(...))` (#2492) scripts model responses for reproducible eval without a live provider; the streaming flow folds into the same Started → ArgsDelta → Finished → End chunk sequence a native streaming provider would emit. Typed assertion DSL `eval<IN, OUT>("name") { input(...); expect { ... }; expectSnapshot(...) }` (#2493) runs against the parsed `OUT` — not regex on the wire. Snapshot mode pins `toLlmInput(output)` JSON for structural diffs; `evalSuite { + case; + case }` bundles cases. Optional `judge("tone", rubric)` (#2494) runs an advisory LLM-as-judge scorer with a typed `@Generable` `JudgeVerdict` — explicitly separate from the deterministic pass/fail contract (judges never gate). See [docs/eval.md](docs/eval.md).
 - **Multimodal foundation** — `sealed Content { Text, Image(ref, ImageMime), Audio, Video, Document(ref, DocMime) }` (#2466) with closed mime types per modality (no `String` mime). Content-addressed `ContentRef(hash, sizeBytes, wireMime)` + `BlobStore` interface, `InMemoryBlobStore` / `FileBlobStore` impls — SHA-256 keys match the manifest-hash family, atomic tmp+rename, process-restart-safe, idempotent put (#2467). Tools can return `ToolResult(parts: List<Content>)` for mixed text + image + document outputs; JSONL audit exporter records `outputParts` per-part summary (`<modality>:<hash-prefix>:<size>:<mime>`) with no blob bytes in the audit row (#2469). Stage 1 wires Image + Document end-to-end. See [docs/multimodal.md](docs/multimodal.md).
 - **Vision input to models** — `LlmMessage(role = "user", content = "...", images = listOf(ImagePart(base64, ImagePart.WireMime.Png)))` (#2470 slice a) reaches all four built-in adapters: Ollama emits `images: [<b64>...]`, Claude emits `{type:"image", source:{type:"base64",...}}` content blocks, OpenAI emits `{type:"image_url", image_url:{url:"data:..."}}` content blocks, DeepSeek inherits OpenAI (silently ignored on non-vision models). Closed `ImagePart.WireMime { Png, Jpeg, Gif, Webp }` — no `String` mime. Programmatic `VisionFixtures.threeSquaresPng()` / `housePng()` (256×256, `BufferedImage`-rendered, ~5KB) + per-provider live tests (qwen3-vl:8b / Haiku 4.5 / gpt-4o-mini) with cost discipline. See [docs/multimodal.md](docs/multimodal.md#vision-input--talking-to-the-model-2470-slice-a).
+- **Typed `Content.Image` at the agent surface** — `agent.invokeWithAttachments("describe", attachments = listOf(Content.Image(ref, ImageMime.Png)))` (#2470 slice b). Inject a `BlobStore` via `blobStore(store)` in the agent DSL; the runtime dereferences each `Content.Image` against the store, base64-encodes once, and attaches `ImagePart` to the first user message. Closed `ImageMime → ImagePart.WireMime` mapping covers all four variants. Misconfiguration errors fast (no `blobStore` configured, missing blob for a ref). Composes with snapshot/resume: the restored user turn already carries the original `LlmMessage.images`, so the `attachments` argument is ignored on resume. Suspending sibling `invokeSuspendWithAttachments`. Live tests across all three vision providers via the agent surface. See [docs/multimodal.md](docs/multimodal.md#agent-attachments--typed-contentimage-at-the-invoke-surface-2470-slice-b).
 - **Prompt caching across providers** — `agent { caching { enabled = true; cacheSystemPrompt = true; cacheToolDefs = true; cacheConversation = Rolling; ttl = 1.hours; cacheable("doc-id") { ... } } }`. Vendor-neutral DSL drives Anthropic's explicit `cache_control` breakpoints (#2658), OpenAI / DeepSeek automatic prefix caching with a stable `prompt_cache_key` routing hint (#2659 / #2661), Ollama / vLLM / SGLang engine-level KV-cache reuse (no-op hints, #2662), and surfaces cache reads + writes + hit-rate on `TokenUsage` (#2663). A prefix-stability guard (#2657) detects silent cache-busters — timestamps, UUIDs, non-deterministic ordering inside cacheable segments — and warns before you pay for a single non-cached run. Off by default; non-breaking. See [docs/caching.md](docs/caching.md).
 - **JSONL audit exporter** — `:agents-kt-observability` writes append-only, one-line-per-event audit rows with `requestId`, `sessionId`, `manifestHash`, agent/skill/tool ids, event type, provider, and model; raw arguments/results are omitted by default (#1914). See [docs/observability.md](docs/observability.md).
 - **ObservabilityBridge adapters** — `.observe(OtelBridge(tracer))` maps runtime events to OTel spans (#1908), `.observe(LangSmithBridge(apiKey, project))` maps the same events to LangSmith run trees (#1909), and `.observe(LangfuseBridge(publicKey, secretKey))` maps them to Langfuse traces, generations, spans, and events (#1910), while keeping core vendor-free. See [docs/observability.md](docs/observability.md).

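Review note on the "Multimodal foundation" bullet in the README hunk above: a minimal sketch of what a content-addressed store with SHA-256 keys and an idempotent `put` could look like. `SketchRef` and `SketchInMemoryStore` are hypothetical stand-ins; the real `ContentRef`, `BlobStore`, and `InMemoryBlobStore` are not shown in this diff, and the mime field is left out here to keep the example short.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for ContentRef; the mime field is omitted for brevity.
data class SketchRef(val hash: String, val sizeBytes: Long)

// Hypothetical stand-in for an in-memory BlobStore; not the library's code.
class SketchInMemoryStore {
    private val blobs = HashMap<String, ByteArray>()

    // The key is the SHA-256 of the bytes, so putting the same bytes twice
    // returns an equal ref and keeps a single stored copy (idempotent put).
    fun put(bytes: ByteArray): SketchRef {
        val digest = MessageDigest.getInstance("SHA-256").digest(bytes)
        val hash = digest.joinToString("") { "%02x".format(it) }
        blobs.putIfAbsent(hash, bytes.copyOf())
        return SketchRef(hash, bytes.size.toLong())
    }

    // Returns null for an unknown hash; the caller decides how to report it.
    fun get(ref: SketchRef): ByteArray? = blobs[ref.hash]?.copyOf()

    fun size(): Int = blobs.size
}
```

Keying by content hash is what lets a ref stored in a snapshot be resolved later by any store holding the same bytes.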
diff --git a/docs/multimodal.md b/docs/multimodal.md
@@ -204,10 +204,58 @@ Models overridable via env (`AGENTSKT_TEST_OLLAMA_VISION_MODEL`, `AGENTSKT_TEST_
 
 Assertion shape is loose: the test passes if the model's text response mentions one of a small acceptable keyword set (`3` / `three` for counting; `house` / `home` / `cottage` / `building` / `cabin` / `barn` for the house). Goal is "did the image reach the model and elicit a sensible reply" — not exact phrasing.
 
+## Agent attachments — typed `Content.Image` at the invoke surface (#2470 slice b)
+
+Slice a wires the per-provider wire format on `LlmMessage.images`. Slice b puts a clean user-facing API on top: the caller passes typed `Content.Image` (carrying a `ContentRef`) at the agent's invoke surface; the runtime dereferences against an injected `BlobStore`, base64-encodes once, and attaches `ImagePart` to the first user message.
+
+```kotlin
+import agents_engine.content.Content
+import agents_engine.content.ImageMime
+import agents_engine.content.FileBlobStore
+
+val store = FileBlobStore(java.nio.file.Path.of("snapshots/blobs"))
+
+val agent = agent<String, String>("vision") {
+    model { ollama("qwen3-vl:8b") }
+    blobStore(store)
+    skills { skill<String, String>("describe", "") { tools() } }
+}
+
+val ref = store.put(pngBytes, ImageMime.Png.wireMime) // pngBytes: raw PNG bytes supplied by the caller
+val reply = agent.invokeWithAttachments(
+    "What is in this image?",
+    attachments = listOf(Content.Image(ref, ImageMime.Png)),
+)
+```
+
+### What the runtime guarantees
+
+- **First user message only.** Attachments ride on the initial user turn. Multi-turn vision is composable but each invocation owns its own first-turn attachments.
+- **Closed-mime mapping.** `ImageMime → ImagePart.WireMime` for all four variants — `Png`, `Jpeg`, `Gif`, `Webp`. No String conversions anywhere.
+- **Fail-fast on misconfiguration.** Passing attachments to an agent with no `blobStore` configured errors fast at invoke time with `Agent '<name>' has attachments but no blobStore`. A ref pointing at a missing blob errors fast with the ref's hash prefix in the message for forensics.
+- **Skip non-image variants.** `Content.Text` / `Content.Document` / `Content.Audio` / `Content.Video` are silently skipped in v1. Slice c will wire Document via provider doc-input adapters; Audio/Video land in Stage 2.
+- **Back-compat.** `agent.invokeSuspend(input)` (without attachments) stays byte-identical on the wire. The attachments path is purely additive — opt in via the new `invokeWithAttachments` / `invokeSuspendWithAttachments` entry points.
+- **Snapshot/resume composition.** On resume the saved user turn already carries the original `LlmMessage.images`; the runtime ignores the `attachments` argument because the conversation was restored intact.
+
+### Suspending vs blocking
+
+| Entry point | Use when |
+|---|---|
+| `agent.invokeSuspendWithAttachments(input, attachments)` | Inside coroutine scopes — composition operators, structured concurrency. |
+| `agent.invokeWithAttachments(input, attachments)` | Outside coroutine scopes — quick scripts, REPL, blocking glue. Thin `runBlocking` shim. |
+
+Mirrors the existing `invokeSuspend` / `invoke` split.
+
+### Live tests
+
+`AgentVisionLiveTest.kt` runs the two `VisionFixtures` (`threeSquaresPng()` + `housePng()`) through the agent surface on all three vision-capable providers. It applies the same cost discipline as slice a: 256×256 PNG, `temperature = 0`, `maxTokens = 80`, single-turn. Tagged `live-llm` (Ollama) / `live-cloud-api` (Claude + OpenAI); `assumeTrue` skips a provider's cases when its key is not configured.
+
+The slice-b live tests complement the slice-a tests: slice a's `VisionLiveTest` exercises the raw `ModelClient`, while slice b's `AgentVisionLiveTest` exercises the full agent loop, including the `BlobStore` dereference.
+
 ## What's still coming (rest of #2465)
 
 - **#2468** Compile-time modality routing — `Agent<Image, X>` becomes a real type; cross-modality miswiring is a compile error. Multi-part `@Generable` inputs via KSP.
-- **#2470 (slice b)** `Content` → `LlmMessage.images` translation at the agentic loop — currently the caller dereferences `ContentRef` → bytes → `ImagePart` manually. Sliced this way to land the wire format first; the loop hook is a small follow-up.
+- **#2470 slice c** Document/Audio/Video provider-input adapters — currently only images flow through the wire; Document/Audio/Video Content variants are skipped on the attachment path.
 - **#2471** Manifest-anchored modality capability — declared per-agent modalities recorded in the permission manifest, validated against provider capabilities at build time.
 - **#2472** Multimodal memory — `MemoryBank` entries carry `ContentRef` for image/audio/video state.
 - **#2473** Testing fixtures + snapshot + mutation coverage.

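Review note on the "What the runtime guarantees" list in the docs hunk above: the attachment rules (null for empty or all-skipped input, fail-fast on a missing store or blob, non-image variants skipped) can be sketched as one pure function. Everything here is a hypothetical stand-in: `SketchContent`, `SketchImagePart`, `resolveImages`, the `Map`-backed store, the 8-character hash prefix, and the missing-blob message wording are assumptions. Only the no-`blobStore` message text is taken from the docs. Whether the real runtime checks the store before or after filtering variants is not visible in this diff.

```kotlin
import java.util.Base64

// Hypothetical stand-ins for Content and ImagePart; not the library's code.
sealed interface SketchContent {
    data class Text(val text: String) : SketchContent
    data class Image(val hash: String) : SketchContent
    data class Document(val hash: String) : SketchContent
}
data class SketchImagePart(val base64: String)

// Returns the images for the first user message, or null when nothing is attached.
fun resolveImages(
    agentName: String,
    store: Map<String, ByteArray>?, // stand-in for the injected BlobStore
    attachments: List<SketchContent>?,
): List<SketchImagePart>? {
    if (attachments.isNullOrEmpty()) return null
    // Assumption: the store check runs before variant filtering.
    val blobs = store ?: error("Agent '$agentName' has attachments but no blobStore")
    val parts = attachments
        .filterIsInstance<SketchContent.Image>() // non-image variants are skipped
        .map { image ->
            val bytes = blobs[image.hash]
                ?: error("Missing blob for ref ${image.hash.take(8)}")
            SketchImagePart(Base64.getEncoder().encodeToString(bytes))
        }
    // An all-skipped list yields null so no provider sees an empty array.
    return parts.ifEmpty { null }
}
```

Returning `null` rather than an empty list mirrors the stated goal that callers who pass no usable images keep the legacy wire shape.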
diff --git a/src/main/kotlin/agents_engine/core/Agent.kt b/src/main/kotlin/agents_engine/core/Agent.kt
@@ -191,6 +191,18 @@ class Agent<IN, OUT>(
         private set
     var skillChosenListener: ((name: String) -> Unit)? = null
         private set
+    /**
+     * #2470 slice b — optional [agents_engine.content.BlobStore] for
+     * dereferencing `Content.Image` attachments at the agent invoke
+     * surface. When the caller passes `attachments = listOf(Content.Image(
+     * ref, mime))`, the runtime reads the bytes from this store and builds
+     * the corresponding [agents_engine.model.ImagePart] for the first user
+     * LlmMessage. Null when the agent doesn't accept image attachments —
+     * passing attachments to such an agent errors fast at invoke time
+     * with a clear message.
+     */
+    var blobStore: agents_engine.content.BlobStore? = null
+        private set
     var memoryBank: MemoryBank? = null
         private set
     var routerRationaleListener: ((rationale: String) -> Unit)? = null
@@ -547,6 +559,36 @@ class Agent<IN, OUT>(
         }
     }
 
+    /**
+     * #2470 slice b — inject a [agents_engine.content.BlobStore] so the
+     * agent can dereference `Content.Image` attachments at invoke time.
+     *
+     * ```kotlin
+     * val store = FileBlobStore(Path.of("blobs"))
+     * val agent = agent<String, String>("vision") {
+     *     model { ollama("qwen3-vl:8b") }
+     *     blobStore(store)
+     *     skills { skill<String, String>("describe", "") { tools() } }
+     * }
+     *
+     * val ref = store.put(pngBytes, ImageMime.Png.wireMime)
+     * val out = agent.invokeWithAttachments(
+     *     "What is in this image?",
+     *     attachments = listOf(Content.Image(ref, ImageMime.Png)),
+     * )
+     * ```
+     *
+     * The runtime reads the blob from this store, base64-encodes once,
+     * and attaches it to the first user LlmMessage as
+     * `images: List<ImagePart>`. Per-provider wire translation is the
+     * #2470 slice-a work in `OllamaClient` / `ClaudeClient` /
+     * `OpenAiClient`.
+     */
+    fun blobStore(store: agents_engine.content.BlobStore) {
+        checkNotFrozen()
+        blobStore = store
+    }
+
     fun tools(block: ToolsBuilder.() -> Unit) {
         checkNotFrozen()
         val builder = ToolsBuilder()
@@ -602,6 +644,46 @@ class Agent<IN, OUT>(
             invokeSuspendForSession(input, emitter = null) { /* no-op */ }
         }
 
+    /**
+     * #2470 slice b — suspending entry point with image attachments. The
+     * caller passes `attachments = listOf(Content.Image(ref, mime), ...)`;
+     * the runtime dereferences each ref against [blobStore], base64-encodes
+     * once, and attaches them to the first user LlmMessage. Per-provider
+     * wire translation is the slice-a work (Ollama / Claude / OpenAI all
+     * already implement the wire format for `LlmMessage.images`).
+     *
+     * Errors fast with a clear message when:
+     * - [blobStore] is null but [attachments] are passed
+     * - A ref's blob is missing from the store (purged / rewired)
+     *
+     * Non-image variants in [attachments] (Text / Document / Audio /
+     * Video) are silently skipped in v1. Document is planned for
+     * slice c; Audio / Video land in Stage 2.
+     */
+    suspend fun invokeSuspendWithAttachments(
+        input: IN,
+        attachments: List<agents_engine.content.Content>,
+    ): OUT =
+        withAgentRuntimeContext(newRuntimeContext()) {
+            invokeSuspendForSession(
+                input = input,
+                emitter = null,
+                attachments = attachments,
+            ) { /* no-op */ }
+        }
+
+    /**
+     * #2470 slice b — blocking shim over [invokeSuspendWithAttachments]
+     * for callers outside coroutine scopes. Mirrors the [invoke] /
+     * [invokeSuspend] split.
+     */
+    fun invokeWithAttachments(
+        input: IN,
+        attachments: List<agents_engine.content.Content>,
+    ): OUT = kotlinx.coroutines.runBlocking {
+        invokeSuspendWithAttachments(input, attachments)
+    }
+
     /**
      * #2749 — public snapshot/resume seam.
      *
@@ -716,6 +798,15 @@ class Agent<IN, OUT>(
          * #2754 — opt out of the snapshot manifest-hash restore guard.
          */
         allowManifestMismatch: Boolean = false,
+        /**
+         * #2470 slice b — image attachments to ride on the FIRST user
+         * LlmMessage. Runtime dereferences each `Content.Image` against
+         * [Agent.blobStore] (errors fast when null) and renders into
+         * [agents_engine.model.ImagePart]. Non-image variants are skipped
+         * in v1 (Document: slice c; Audio / Video: Stage 2).
+         * Null = no attachments; wire shape unchanged.
+         */
+        attachments: List<agents_engine.content.Content>? = null,
         onSkillCompleted: (agents_engine.model.TokenUsage?) -> Unit = { /* no-op */ },
         onSkillStarted: (String) -> Unit,
     ): OUT {
@@ -744,6 +835,7 @@ class Agent<IN, OUT>(
                     onTurnCheckpoint = onTurnCheckpoint,
                     resumeWith = resumeWith,
                     allowManifestMismatch = allowManifestMismatch,
+                    attachments = attachments,
                 )
                 // #1740: surface cumulative usage on the way out. Non-agentic
                 // skills don't go through executeAgentic, so onSkillCompleted