Python: Allow @tool functions to return rich content (images, audio)#4331
Open
giles17 wants to merge 3 commits intomicrosoft:mainfrom
Open
Python: Allow @tool functions to return rich content (images, audio)#4331giles17 wants to merge 3 commits intomicrosoft:mainfrom
giles17 wants to merge 3 commits intomicrosoft:mainfrom
Conversation
…udio) Add support for tool functions to return Content objects that the model can perceive natively. Closes microsoft#4272 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
…o giles/tool-rich-content-results
Contributor
There was a problem hiding this comment.
Pull request overview
This PR enables @tool-decorated functions to return rich content (images, audio, files) that models can perceive natively, rather than having them serialized to JSON strings. This addresses issue #4272 by allowing vision-in-the-loop workflows where tools like capture_screenshot() or render_chart() can feed image content back into the model for analysis.
Changes:
- Core framework now preserves Content objects with rich media instead of JSON-serializing them
- Added
itemsfield to function_result Content to carry rich media alongside text results - Updated all 6 provider implementations to handle rich content (OpenAI Responses, OpenAI Chat, Anthropic support it natively; Bedrock, Ollama, Azure-AI log warnings)
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Added items parameter to Content.init and from_function_result() to store rich media items; updated to_dict() to serialize items |
| python/packages/core/agent_framework/_tools.py | Updated parse_result() to return str or list[Content] instead of always serializing; added _build_function_result() helper to separate text and rich items; updated invoke() return type |
| python/packages/core/agent_framework/_mcp.py | Updated _parse_tool_result_from_mcp() to return list[Content] for results containing images/audio instead of JSON strings |
| python/packages/core/agent_framework/openai/_responses_client.py | Injects rich items as separate user message with input_image content after function_call_output |
| python/packages/core/agent_framework/openai/_chat_client.py | Formats tool message content as multi-part array with text and image_url/input_audio/file parts when items present |
| python/packages/anthropic/agent_framework_anthropic/_chat_client.py | Formats rich items as native image blocks in tool_result content array; handles both data and uri image types |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Logs warning when rich items present (Bedrock doesn't support them); omits items from tool result |
| python/packages/ollama/agent_framework_ollama/_chat_client.py | Logs warning when rich items present (Ollama doesn't support them); omits items from tool result |
| python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py | Logs warning when rich items present (Azure AI Agents doesn't support them); omits items from tool output |
| python/packages/core/tests/core/test_types.py | Added 8 new tests for parse_result(), _build_function_result(), and Content.from_function_result() with items; updated 2 existing tests to expect list[Content] instead of JSON |
| python/packages/core/tests/core/test_mcp.py | Updated test_parse_tool_result_from_mcp to expect list[Content] for results with images; added test_parse_tool_result_from_mcp_audio_content |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Closes #4272
When a
@toolfunction returns aContentobject (e.g.Content.from_data(image_bytes, "image/png")), the framework now preserves it as rich content that the model can perceive natively — instead of serializing it to a JSON string.Problem
Previously,
FunctionTool.parse_result()serialized anyContentreturn to JSON text via_make_dumpable(). The model received{"type": "function_call_output", "output": "{...}"}— a text blob, not the actual image. The same issue existed in MCP tool results whereImageContentwas JSON-serialized.Solution
Added an
itemsfield tofunction_resultContent that carries richContentobjects (images, audio, files) alongside the text result. Providers format these items using their existing multi-modal content handling.User API — no decorator changes needed:
Changes
Core framework:
_types.py: Addeditemsfield toContentandfrom_function_result()_tools.py: Updatedparse_result()to preserveContentreturns instead of JSON-serializing. Added_build_function_result()helper. Updatedinvoke()return type._mcp.py: Updated_parse_tool_result_from_mcp()to returnlist[Content]for image/audio instead of JSON stringsAll 6 providers updated:
input_imageafterfunction_call_outputimage_urltool_resultcontent arrayTests: 8 new tests + 2 updated existing tests, all passing.