Python: Add Azure AI Agent V2 computer use tool sample#2210
TaoChenOSU wants to merge 3 commits into main
Pull Request Overview
This pull request adds a comprehensive Python sample demonstrating how to create and use an Azure AI agent with the Computer Use Preview Tool, enabling computer automation tasks through simulated interactions.
Key Changes
- New sample file demonstrating computer use tool integration with Azure AI agents
- Core framework enhancement to support raw message representations for specialized tool responses
- Updated documentation including the new sample in the README
Reviewed Changes
Copilot reviewed 4 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/samples/getting_started/agents/azure_ai/azure_ai_with_computer_use.py | New 290-line sample demonstrating computer use tool with state machine for simulated web search workflow, including screenshot handling and action processing |
| python/samples/getting_started/agents/azure_ai/assets/cua_browser_search.png | Binary asset file - initial screenshot showing browser search page |
| python/samples/getting_started/agents/azure_ai/assets/cua_search_typed.png | Binary asset file - screenshot showing typed search query |
| python/samples/getting_started/agents/azure_ai/assets/cua_search_results.png | Binary asset file - screenshot showing search results |
| python/samples/getting_started/agents/azure_ai/README.md | Added documentation entry for the new computer use sample |
| python/packages/core/agent_framework/openai/_responses_client.py | Enhanced message parser to support raw representations for messages without standard content |
| python/packages/core/tests/openai/test_openai_responses_client.py | Added test coverage for raw representation message handling |
    if item.type == "message":
        contents = item.content
        for part in contents:
            final_output += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"
Incorrect operator precedence in string concatenation. Because `+` binds more tightly than `or`, the expression `getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"` parses as `... or ("" + "\n")`. The newline is appended only to the empty-string fallback, so no newline is added when `text` or `refusal` is present.
This should be: `(getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"`
Add parentheses so that the newline is concatenated to the result of the `or` chain.
Suggested change:
    - final_output += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"
    + final_output += (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"
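The precedence issue described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `join_buggy`, `join_fixed`, and the `SimpleNamespace` stand-ins for response content parts are hypothetical names introduced only for illustration.

```python
from types import SimpleNamespace


def join_buggy(parts):
    """Mirror the original line: '+' binds tighter than 'or'."""
    out = ""
    for part in parts:
        # Parsed as: text or refusal or ("" + "\n")
        out += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"
    return out


def join_fixed(parts):
    """Mirror the suggested line: parenthesize the 'or' chain first."""
    out = ""
    for part in parts:
        out += (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"
    return out


# Hypothetical stand-ins for content parts: one with text, one with a refusal, one empty.
parts = [
    SimpleNamespace(text="hello"),
    SimpleNamespace(refusal="cannot comply"),
    SimpleNamespace(),
]

print(repr(join_buggy(parts)))  # 'hellocannot comply\n'
print(repr(join_fixed(parts)))  # 'hello\ncannot comply\n\n'
```

With the unparenthesized version, only the empty part contributes a newline, so the text and the refusal run together.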
        screenshots = load_screenshot_assets()
        print("Successfully loaded screenshot assets")
    except FileNotFoundError:
        print("Failed to load required screenshot assets. Please ensure the asset files exist in ../assets/")
The error message references an incorrect path. The assets directory is located at ./assets/ relative to the script file (as shown in line 74), not at ../assets/.
The error message should be: "Failed to load required screenshot assets. Please ensure the asset files exist in ./assets/"
Suggested change:
    - print("Failed to load required screenshot assets. Please ensure the asset files exist in ../assets/")
    + print("Failed to load required screenshot assets. Please ensure the asset files exist in ./assets/")
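One way to keep such a message from drifting away from the real location is to derive both from the same path. This is only a sketch under assumptions: `assets_dir_for` and `missing_assets_message` are hypothetical helpers, and the only fact taken from the review is that the assets directory sits next to the script file.

```python
from pathlib import Path


def assets_dir_for(script_path):
    """Return the 'assets' directory located next to the given script file."""
    return Path(script_path).resolve().parent / "assets"


def missing_assets_message(script_path):
    """Build the error message from the same path that is used for loading."""
    return (
        "Failed to load required screenshot assets. "
        f"Please ensure the asset files exist in {assets_dir_for(script_path)}"
    )
```

Inside a script, this would typically be called with `__file__`, for example `print(missing_assets_message(__file__))`.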
    filename_map = {
        "browser_search": "cua_browser_search.png",
        "search_typed": "cua_search_typed.png",
        "search_results": "cua_search_results.png",
    }

    for key, path in screenshot_paths.items():
        try:
            image_base64 = image_to_base64(path)
            screenshots[key] = {"filename": filename_map[key], "url": f"data:image/png;base64,{image_base64}"}
The `filename_map` dictionary is redundant as it duplicates information already present in the `screenshot_paths` dictionary. The filenames can be extracted directly from the paths using `os.path.basename(path)`.
Consider refactoring to:

    for key, path in screenshot_paths.items():
        try:
            image_base64 = image_to_base64(path)
            screenshots[key] = {"filename": os.path.basename(path), "url": f"data:image/png;base64,{image_base64}"}
        except FileNotFoundError as e:
            print(f"Error: Missing required screenshot asset: {e}")
            raise

This eliminates the need to maintain two separate dictionaries with the same information.
Suggested change:
    - filename_map = {
    -     "browser_search": "cua_browser_search.png",
    -     "search_typed": "cua_search_typed.png",
    -     "search_results": "cua_search_results.png",
    - }
    - for key, path in screenshot_paths.items():
    -     try:
    -         image_base64 = image_to_base64(path)
    -         screenshots[key] = {"filename": filename_map[key], "url": f"data:image/png;base64,{image_base64}"}
    + for key, path in screenshot_paths.items():
    +     try:
    +         image_base64 = image_to_base64(path)
    +         screenshots[key] = {"filename": os.path.basename(path), "url": f"data:image/png;base64,{image_base64}"}
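The refactor suggested above can be tried as a self-contained function. This is a sketch, not the sample's actual code: `load_screenshots` is a hypothetical wrapper, and `image_to_base64` is a stand-in written here because the sample's own helper is not shown in the excerpt.

```python
import base64
import os


def image_to_base64(path):
    """Stand-in helper (assumed behavior): read a file and return its base64 text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def load_screenshots(screenshot_paths):
    """Build the screenshots dict, deriving each filename from its path."""
    screenshots = {}
    for key, path in screenshot_paths.items():
        try:
            image_base64 = image_to_base64(path)
        except FileNotFoundError as e:
            print(f"Error: Missing required screenshot asset: {e}")
            raise
        screenshots[key] = {
            # os.path.basename replaces the separate filename_map lookup.
            "filename": os.path.basename(path),
            "url": f"data:image/png;base64,{image_base64}",
        }
    return screenshots
```

Because the filename comes from the path itself, renaming an asset only requires changing one dictionary.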
    if "content" in args or "tool_calls" in args:
        all_messages.append(args)
    elif message.raw_representation:
        all_messages.append(message.raw_representation)
I would avoid using `raw_representation` as input. As far as I know, we currently use this property as output only, unless I'm missing something. Instead of using `raw_representation` as input, we can:
- Allow passing `dict` as part of `ChatMessage.contents`, which would enable a break-glass scenario for message content types as input.
- Add a new content type for the `computer use` tool. I think this would be the preferred approach, since this tool type exists in both the OpenAI Responses API and Azure AI.
That was my comment as well (I didn't see this before). We had an ADR PR discussing the potential of a set of computer use types and decided against it: #796 (comment)
            args["content"].append(self._openai_content_parser(message.role, content, call_id_to_id))  # type: ignore
    if "content" in args or "tool_calls" in args:
        all_messages.append(args)
    elif message.raw_representation:
I'm not sure this is a good idea. There is a reason we have not created abstractions for computer use: the code needed to handle its inputs and outputs across platforms is too varied and complex for our purposes. Adding `raw_representation` as an input goes against everything we do. If a dev needs this kind of special behavior, they are probably better off building directly against an SDK anyway: since it is not abstracted, they won't be able to swap between models, so the added value is low. Putting this method in might also break other things, and putting this sample in implies we support this scenario, while we really don't...
Based on offline discussions, we are not ready for this tool. Closing.
Motivation and Context
Add a sample to show how to create and use an Azure AI agent with a computer use tool.
Description
Add a sample to show how to create and use an Azure AI agent with a computer use tool.
Contribution Checklist