43 changes: 43 additions & 0 deletions docs/issues/remote-tool-result-images/plan.md
@@ -0,0 +1,43 @@
# Plan

## Current behavior

Tool execution can attach visual previews to `tool_call.imagePreviews`. The desktop renderer shows those previews only inside the expanded tool-call details, not as normal assistant image messages. `prepareToolImagePreviewPresentation()` currently promotes only successful built-in `image_generate` previews into assistant `image` blocks. Other tool result images, including screenshots, remain embedded in the tool-call metadata.

Remote snapshots historically persisted only assistant `image` blocks. The first fix added a fallback that also persists `tool_call.imagePreviews`, but the broader issue is conversation-level visibility: the assistant transcript itself should contain the image result.

## Implementation approach

1. Generalize `prepareToolImagePreviewPresentation()` so successful, non-error tool result previews with usable `data` are promoted into assistant `image` blocks for any tool source.
2. Keep the existing special-case behavior for built-in `image_generate`: its previews are promoted and removed from the tool-call detail panel.
3. For other tools, promote usable previews while preserving metadata-only/unusable previews on `tool_call.imagePreviews` so the detail panel can still show what is available.
4. Add stable image block metadata linking promoted images back to the tool call and preview source/title.
5. Keep the remote snapshot fallback for legacy conversations where previews are already stored only in `tool_call.imagePreviews`.
6. Update tests to cover screenshot/tool-output promotion in the normal runtime path and the remote fallback path.
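Steps 1 to 3 amount to partitioning each tool result's previews into promoted image blocks and retained metadata-only previews. The sketch below is a minimal illustration under stated assumptions: `PreviewLike`, `ImageBlockLike`, and `partitionToolImagePreviews` are hypothetical simplified names, not the project's real `ToolCallImagePreview` or `AssistantMessageBlock` types, and timestamps and block metadata are omitted.

```typescript
// Hypothetical stand-in for ToolCallImagePreview; field names follow the
// diff (`id`, `data`, `mimeType`, `source`, `title`), the types are assumed.
interface PreviewLike {
  id?: string
  data?: string
  mimeType?: string
  source: string
  title?: string
}

// Hypothetical stand-in for the `image` variant of AssistantMessageBlock.
interface ImageBlockLike {
  type: 'image'
  image_data: { data: string; mimeType: string }
}

interface PartitionResult {
  promoted: ImageBlockLike[]
  // Previews that stay on tool_call.imagePreviews for the detail panel.
  retained: PreviewLike[]
}

// Error results promote nothing. Otherwise every preview that has both
// base64 `data` and a `mimeType` becomes an image block, and the rest are
// retained as metadata-only previews, regardless of which tool produced them.
function partitionToolImagePreviews(
  previews: PreviewLike[],
  isError: boolean
): PartitionResult {
  if (isError) {
    return { promoted: [], retained: previews }
  }
  const promoted: ImageBlockLike[] = []
  const retained: PreviewLike[] = []
  for (const preview of previews) {
    const { data, mimeType } = preview
    if (!data || !mimeType) {
      retained.push(preview)
      continue
    }
    promoted.push({ type: 'image', image_data: { data, mimeType } })
  }
  return { promoted, retained }
}

// Usage: one preview with inline data, one URL-only preview without data.
const result = partitionToolImagePreviews(
  [
    { id: 'p1', data: 'iVBORw0KGgo=', mimeType: 'image/png', source: 'tool' },
    { id: 'p2', source: 'tool', title: 'remote URL only' }
  ],
  false
)
console.log(result.promoted.length, result.retained.length) // 1 1
```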

## Affected interfaces

- `AssistantMessageBlock` remains unchanged; promoted images use existing `type: 'image'` and `image_data` fields.
- `AssistantMessageExtra` gains optional metadata keys through its existing index signature: `toolCallId`, `toolName`, `toolImagePreviewId`, `toolImagePreviewSource`, and `toolImagePreviewTitle`.
- `RemoteConversationSnapshot.generatedImages` remains unchanged.
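As a rough illustration of the `extra` metadata, the following sketch builds the key set with conditional spreads, mirroring the `prepareToolImagePreviewPresentation()` change in this PR, so absent optional values are omitted rather than stored as `undefined`. `ToolImageExtra`, `PreviewMeta`, and `buildToolImageExtra` are hypothetical names; per the plan, the real `AssistantMessageExtra` accepts these keys through its index signature instead of a dedicated interface.

```typescript
// Hypothetical preview shape; only the fields read below are declared.
interface PreviewMeta {
  id?: string
  source: string
  title?: string
}

// Hypothetical interface listing the keys named in the plan. All are
// optional so that blocks saved before this change remain valid.
interface ToolImageExtra {
  toolCallId?: string
  toolName?: string
  toolImagePreviewId?: string
  toolImagePreviewSource?: string
  toolImagePreviewTitle?: string
}

// Builds the `extra` object for one promoted image block. Each conditional
// spread adds its key only when the value is present.
function buildToolImageExtra(
  toolName: string,
  preview: PreviewMeta,
  toolCallId?: string
): ToolImageExtra {
  return {
    ...(toolCallId ? { toolCallId } : {}),
    toolName,
    ...(preview.id ? { toolImagePreviewId: preview.id } : {}),
    toolImagePreviewSource: preview.source,
    ...(preview.title ? { toolImagePreviewTitle: preview.title } : {})
  }
}

// Usage: no title on this preview, so no toolImagePreviewTitle key is written.
const extra = buildToolImageExtra('Page.captureScreenshot', { id: 'p1', source: 'tool' }, 'call_1')
console.log(Object.keys(extra).join(','))
// toolCallId,toolName,toolImagePreviewId,toolImagePreviewSource
```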

## Data flow

1. Tool execution returns `imagePreviews`.
2. Runtime normalizes the tool result and calls `prepareToolImagePreviewPresentation()`.
3. Usable previews become assistant `image` blocks inserted after the tool-call block.
4. Desktop conversation renders those images as normal assistant images.
5. Remote snapshots persist those image blocks into `generatedImages`; legacy unpromoted previews are also persisted as a fallback.
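Step 3 places promoted images directly after their tool-call block. A minimal sketch, assuming a simplified three-variant block union (`BlockLike`, including a hypothetical `'content'` text variant) and a hypothetical helper `insertAfterToolCall`; appending the images when no matching tool call is found is an assumption, not documented behavior.

```typescript
// Hypothetical, simplified block union; the real AssistantMessageBlock has
// more variants and fields.
type BlockLike =
  | { type: 'content'; content: string }
  | { type: 'tool_call'; tool_call: { id: string } }
  | { type: 'image'; image_data: { data: string; mimeType: string } }

// Returns a new array with `promoted` placed directly after the tool-call
// block whose id matches; the input array is not mutated. If no such block
// exists, the images are appended at the end (assumed fallback).
function insertAfterToolCall(
  blocks: BlockLike[],
  toolCallId: string,
  promoted: BlockLike[]
): BlockLike[] {
  const index = blocks.findIndex(
    (block) => block.type === 'tool_call' && block.tool_call.id === toolCallId
  )
  if (index === -1) {
    return [...blocks, ...promoted]
  }
  return [...blocks.slice(0, index + 1), ...promoted, ...blocks.slice(index + 1)]
}

// Usage: the screenshot lands between the tool call and the follow-up text.
const blocks: BlockLike[] = [
  { type: 'content', content: 'Taking a screenshot' },
  { type: 'tool_call', tool_call: { id: 'call_1' } },
  { type: 'content', content: 'Done.' }
]
const next = insertAfterToolCall(blocks, 'call_1', [
  { type: 'image', image_data: { data: 'iVBORw0KGgo=', mimeType: 'image/png' } }
])
console.log(next.map((block) => block.type).join(' > '))
// content > tool_call > image > content
```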

## Compatibility

- Existing generated-image behavior remains compatible: built-in `image_generate` still hides promoted previews from the tool detail panel.
- Saved conversations that store previews only in `tool_call.imagePreviews` are still delivered to remote channels through the fallback persistence path.
- Error tool results are not promoted into normal image blocks.
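The fallback described above can be sketched as a collector that reads image blocks first and then any unpromoted `tool_call.imagePreviews`. This is a sketch under assumptions: the shapes, the `collectRemoteImages` name, the output order, and the de-duplication by `toolImagePreviewId` are illustrative, not the actual `RemoteConversationRunner` code, and real `generatedImages` entries may carry more fields.

```typescript
// Hypothetical shapes; the real snapshot and block types are not shown here.
interface RemoteImage {
  data: string
  mimeType: string
}

interface PreviewLike {
  id?: string
  data?: string
  mimeType?: string
}

type BlockLike =
  | {
      type: 'image'
      image_data: { data: string; mimeType: string }
      extra?: { toolImagePreviewId?: string }
    }
  | { type: 'tool_call'; tool_call: { imagePreviews?: PreviewLike[] } }
  | { type: 'content'; content: string }

// Pass 1 collects promoted image blocks and remembers which previews they
// came from. Pass 2 adds tool-call previews as a legacy fallback, skipping
// ones already promoted or lacking usable data.
function collectRemoteImages(blocks: BlockLike[]): RemoteImage[] {
  const images: RemoteImage[] = []
  const promotedPreviewIds = new Set<string>()
  for (const block of blocks) {
    if (block.type !== 'image') continue
    images.push({ ...block.image_data })
    const previewId = block.extra?.toolImagePreviewId
    if (previewId) promotedPreviewIds.add(previewId)
  }
  for (const block of blocks) {
    if (block.type !== 'tool_call') continue
    for (const preview of block.tool_call.imagePreviews ?? []) {
      if (preview.id && promotedPreviewIds.has(preview.id)) continue
      if (!preview.data || !preview.mimeType) continue
      images.push({ data: preview.data, mimeType: preview.mimeType })
    }
  }
  return images
}

// Usage: p1 is also promoted, p2 is a legacy preview, p3 has no data.
const images = collectRemoteImages([
  {
    type: 'tool_call',
    tool_call: {
      imagePreviews: [
        { id: 'p1', data: 'AAAA', mimeType: 'image/png' },
        { id: 'p2', data: 'BBBB', mimeType: 'image/jpeg' },
        { id: 'p3' }
      ]
    }
  },
  {
    type: 'image',
    image_data: { data: 'AAAA', mimeType: 'image/png' },
    extra: { toolImagePreviewId: 'p1' }
  }
])
console.log(images.map((image) => image.mimeType).join(',')) // image/png,image/jpeg
```

With the `prepareToolImagePreviewPresentation()` change in this PR, usable previews are no longer left on the tool call once promoted, so in this sketch the de-duplication mainly guards against double delivery if both copies are ever present.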

## Test strategy

- Update `agentRuntimePresenter/dispatch` tests to assert that generic successful tool image previews are promoted into assistant image blocks.
- Keep tests for built-in `image_generate`, MCP same-name tool, and error results aligned with the new promotion rules.
- Keep `RemoteConversationRunner` tests covering fallback persistence from `tool_call.imagePreviews`.
- Run focused tests, typecheck, format, i18n, and lint.
36 changes: 36 additions & 0 deletions docs/issues/remote-tool-result-images/spec.md
@@ -0,0 +1,36 @@
# Tool result images in conversation and remote delivery

## User need

Users expect visual tool results to appear as first-class images in the normal chat transcript and in remote-control channels. Today a tool such as `Page.captureScreenshot` can complete successfully and store the screenshot in `tool_call.imagePreviews`, but the assistant may then reply with text only, or with no final content at all. The image stays hidden behind the tool-call details, and remote channels may not receive it unless the result is converted separately.

## Goal

Promote suitable function/tool-call image results into assistant `image` blocks so they are visible in the desktop conversation without depending on the model to restate them. Remote delivery should then reuse the same image blocks and, as a compatibility fallback, still handle unpromoted `tool_call.imagePreviews`.

## Acceptance criteria

- Successful `tool_call` results with resolvable `imagePreviews` create assistant `image` blocks adjacent to the tool call.
- `Page.captureScreenshot`, MCP image outputs, file-read image previews, and other non-error tool result images can become visible conversation images.
- The tool-call detail panel may still show preview metadata only when an image cannot be promoted or when the tool result is an error.
- The model context can continue safely without requiring the assistant to output the image itself.
- Remote snapshots deliver promoted image blocks through the existing `generatedImages` path and can still deliver legacy/unpromoted tool result previews.
- Raw base64 is not leaked into normal text messages.
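The last criterion can be checked mechanically in tests. A hedged sketch: `leaksRawBase64` is a hypothetical helper, and the 200-character run threshold is an arbitrary assumption chosen to flag bare base64 payloads while ignoring ordinary prose; it may need tuning against real transcripts.

```typescript
// Matches an inline data URI that carries base64-encoded image bytes.
const DATA_URI_IMAGE = /data:image\/[a-z0-9.+-]+;base64,/i

// Heuristic: a long unbroken run of base64 alphabet characters.
// The 200-character threshold is an assumption, not a project rule.
const LONG_BASE64_RUN = /[A-Za-z0-9+\/]{200,}={0,2}/

// Returns true when a normal text message appears to contain raw image data.
function leaksRawBase64(text: string): boolean {
  return DATA_URI_IMAGE.test(text) || LONG_BASE64_RUN.test(text)
}

console.log(leaksRawBase64('Here is the screenshot of the page.')) // false
console.log(leaksRawBase64('![shot](data:image/png;base64,iVBORw0KGgo=)')) // true
```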

## Constraints

- Preserve existing image-generation promotion behavior and compatibility for saved conversations.
- Keep channel-specific remote code unchanged where possible.
- Avoid promoting error tool results as normal assistant images.
- Skip previews without usable image data.

## Non-goals

- Changing remote channel APIs or settings.
- Adding live streaming of images before tool completion.
- Sending images from tools that only expose remote HTTP URLs without cached/data payloads.
- Reworking renderer image components.

## Open questions

None.
9 changes: 9 additions & 0 deletions docs/issues/remote-tool-result-images/tasks.md
@@ -0,0 +1,9 @@
# Tasks

- [x] Inspect remote snapshot and channel image delivery flow.
- [x] Document the initial remote fallback issue and implementation plan.
- [x] Persist completed tool-call image previews as remote image assets.
- [x] Re-scope the SDD artifacts to include conversation-level image visibility.
- [x] Promote successful generic tool result image previews into assistant image blocks.
- [x] Update focused tests for generic promotion and remote fallback delivery.
- [x] Run formatter, i18n check/generation, lint, typecheck, and relevant tests.
1 change: 1 addition & 0 deletions src/main/presenter/agentRuntimePresenter/dispatch.ts
@@ -608,6 +608,7 @@ function applyFinalizedToolResults(params: {
 }

 const imagePresentation = prepareToolImagePreviewPresentation({
+  toolCallId: stagedResult.toolCallId,
   toolName: stagedResult.toolName,
   toolSource: stagedResult.toolSource,
   serverName: stagedResult.serverName,
37 changes: 27 additions & 10 deletions src/main/presenter/agentRuntimePresenter/imageGenerationBlocks.ts
@@ -6,6 +6,7 @@ import {
 } from '@shared/agentImageGenerationTool'

 export function prepareToolImagePreviewPresentation(params: {
+  toolCallId?: string
   toolName: string
   toolSource?: 'mcp' | 'agent'
   serverName?: string
@@ -15,18 +16,12 @@ export function prepareToolImagePreviewPresentation(params: {
   toolBlockImagePreviews?: ToolCallImagePreview[]
   promotedBlocks: AssistantMessageBlock[]
 } {
-  const { toolName, toolSource, serverName, isError, imagePreviews } = params
+  const { toolCallId, toolName, toolSource, serverName, isError, imagePreviews } = params
   if (!imagePreviews) {
     return { promotedBlocks: [] }
   }

-  if (
-    toolName !== IMAGE_GENERATE_TOOL_NAME ||
-    toolSource !== 'agent' ||
-    serverName !== IMAGE_GENERATION_TOOL_SERVER_NAME ||
-    isError ||
-    imagePreviews.length === 0
-  ) {
+  if (isError || imagePreviews.length === 0) {
     return {
       toolBlockImagePreviews: imagePreviews,
       promotedBlocks: []
@@ -35,9 +30,12 @@ export function prepareToolImagePreviewPresentation(params: {

   const timestamp = Date.now()
   const promotedBlocks: AssistantMessageBlock[] = []
+  const remainingToolBlockImagePreviews: ToolCallImagePreview[] = []

   for (const preview of imagePreviews) {
     const { data, mimeType } = preview
     if (!data || !mimeType) {
+      remainingToolBlockImagePreviews.push(preview)
       continue
     }

@@ -49,7 +47,14 @@ export function prepareToolImagePreviewPresentation(params: {
       image_data: {
         data,
         mimeType
-      }
+      },
+      extra: {
+        ...(toolCallId ? { toolCallId } : {}),
+        toolName,
+        ...(preview.id ? { toolImagePreviewId: preview.id } : {}),
+        toolImagePreviewSource: preview.source,
+        ...(preview.title ? { toolImagePreviewTitle: preview.title } : {})
+      } as AssistantMessageBlock['extra']
     })
   }

@@ -60,8 +65,20 @@ export function prepareToolImagePreviewPresentation(params: {
     }
   }

+  if (
+    toolName === IMAGE_GENERATE_TOOL_NAME &&
+    toolSource === 'agent' &&
+    serverName === IMAGE_GENERATION_TOOL_SERVER_NAME
+  ) {
+    return {
+      toolBlockImagePreviews: remainingToolBlockImagePreviews,
+      promotedBlocks
+    }
+  }
+
   return {
-    toolBlockImagePreviews: [],
+    toolBlockImagePreviews:
+      remainingToolBlockImagePreviews.length > 0 ? remainingToolBlockImagePreviews : [],
     promotedBlocks
   }
 }
1 change: 1 addition & 0 deletions src/main/presenter/agentRuntimePresenter/index.ts
@@ -1265,6 +1265,7 @@ export class AgentRuntimePresenter implements IAgentImplementation {
   return { resumed: false }
 }
 const imagePresentation = prepareToolImagePreviewPresentation({
+  toolCallId: toolCall.id,
   toolName: toolCall.name || '',
   toolSource: execution.toolSource,
   serverName: execution.serverName,