Python: Add Azure AI Agent V2 computer use tool sample#2210

Closed
TaoChenOSU wants to merge 3 commits intomainfrom
taochen/python-add-computer-use-tool-sample

Conversation

@TaoChenOSU
Contributor

@TaoChenOSU TaoChenOSU commented Nov 14, 2025

Motivation and Context

Add a sample to show how to create and use an Azure AI agent with a computer use tool.

Description

Add a sample to show how to create and use an Azure AI agent with a computer use tool.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

@TaoChenOSU TaoChenOSU self-assigned this Nov 14, 2025
Copilot AI review requested due to automatic review settings November 14, 2025 01:15
@TaoChenOSU TaoChenOSU requested review from dmytrostruk and removed request for Copilot November 14, 2025 01:15
@github-actions github-actions bot changed the title Add Azure AI Agent V2 computer use tool sample Python: Add Azure AI Agent V2 computer use tool sample Nov 14, 2025
@markwallace-microsoft
Member

markwallace-microsoft commented Nov 14, 2025

Python Test Coverage

Python Test Coverage Report •
File Stmts Miss Cover Missing
packages/core/agent_framework/openai
   _responses_client.py 418 71 83% 144–145, 148–149, 155–156, 159, 166, 201, 231, 259–260, 287, 291, 308, 313, 355, 416, 421, 498, 503, 507–509, 530, 545–546, 550–552, 600, 620–621, 634–635, 651–652, 690, 692, 730, 732, 741–742, 755, 757, 830–836, 853–858, 877, 895, 905, 907, 925–926, 928–930, 941–942, 945, 947
TOTAL 14699 2121 85%

Python Unit Test Overview

Tests Skipped Failures Errors Time
2037 129 💤 0 ❌ 0 🔥 38.969s ⏱️

Copilot AI review requested due to automatic review settings November 14, 2025 01:26
@markwallace-microsoft markwallace-microsoft added the documentation Improvements or additions to documentation label Nov 14, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request adds a comprehensive Python sample demonstrating how to create and use an Azure AI agent with the Computer Use Preview Tool, enabling computer automation tasks through simulated interactions.

Key Changes

  • New sample file demonstrating computer use tool integration with Azure AI agents
  • Core framework enhancement to support raw message representations for specialized tool responses
  • Updated documentation including the new sample in the README

Reviewed Changes

Copilot reviewed 4 out of 7 changed files in this pull request and generated 3 comments.

File Description
python/samples/getting_started/agents/azure_ai/azure_ai_with_computer_use.py New 290-line sample demonstrating computer use tool with state machine for simulated web search workflow, including screenshot handling and action processing
python/samples/getting_started/agents/azure_ai/assets/cua_browser_search.png Binary asset file - initial screenshot showing browser search page
python/samples/getting_started/agents/azure_ai/assets/cua_search_typed.png Binary asset file - screenshot showing typed search query
python/samples/getting_started/agents/azure_ai/assets/cua_search_results.png Binary asset file - screenshot showing search results
python/samples/getting_started/agents/azure_ai/README.md Added documentation entry for the new computer use sample
python/packages/core/agent_framework/openai/_responses_client.py Enhanced message parser to support raw representations for messages without standard content
python/packages/core/tests/openai/test_openai_responses_client.py Added test coverage for raw representation message handling

if item.type == "message":
    contents = item.content
    for part in contents:
        final_output += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"

Copilot AI Nov 14, 2025

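Incorrect operator precedence in string concatenation. Because + binds more tightly than or, the expression getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n" is evaluated as getattr(part, "text", None) or getattr(part, "refusal", None) or ("" + "\n"). The "\n" is therefore joined to the empty-string fallback only, so a newline is added only when both text and refusal are missing or empty, and never after actual content.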

This should be: (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"

Add parentheses to ensure the newline is concatenated to the result of the or chain.

Suggested change
- final_output += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"
+ final_output += (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"
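The difference between the two lines above can be reproduced in isolation. The following is a minimal sketch, not the sample's actual code: the functions join_buggy and join_fixed and the SimpleNamespace stand-ins for response parts are hypothetical names introduced only for illustration.

```python
from types import SimpleNamespace


def join_buggy(parts):
    """Original expression: "\n" binds to the "" fallback, not to the whole or-chain."""
    out = ""
    for part in parts:
        out += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"
    return out


def join_fixed(parts):
    """Suggested expression: parentheses make "\n" follow every part."""
    out = ""
    for part in parts:
        out += (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"
    return out


# Hypothetical stand-ins for content parts: one with text, one with a refusal, one with neither.
parts = [SimpleNamespace(text="hello"), SimpleNamespace(refusal="no"), SimpleNamespace()]

buggy = join_buggy(parts)  # only the empty part contributes a newline
fixed = join_fixed(parts)  # every part is followed by a newline
```

With the unparenthesized form, consecutive text parts run together because the newline is only produced by the fallback branch.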

    screenshots = load_screenshot_assets()
    print("Successfully loaded screenshot assets")
except FileNotFoundError:
    print("Failed to load required screenshot assets. Please ensure the asset files exist in ../assets/")

Copilot AI Nov 14, 2025

The error message references an incorrect path. The assets directory is located at ./assets/ relative to the script file (as shown in line 74), not at ../assets/.

The error message should be: "Failed to load required screenshot assets. Please ensure the asset files exist in ./assets/"

Suggested change
- print("Failed to load required screenshot assets. Please ensure the asset files exist in ../assets/")
+ print("Failed to load required screenshot assets. Please ensure the asset files exist in ./assets/")
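One way to keep such a message from drifting away from the real location is to build it from the same path the loader uses. This is only a sketch under assumptions: the helpers assets_dir_for and missing_assets_message are hypothetical and do not appear in the sample, which is not shown in full here.

```python
from pathlib import Path


def assets_dir_for(script_path):
    """Return the assets/ directory that sits next to the given script file."""
    return Path(script_path).resolve().parent / "assets"


def missing_assets_message(script_path):
    """Build the error text from the resolved directory instead of a hard-coded relative path."""
    return (
        "Failed to load required screenshot assets. "
        f"Please ensure the asset files exist in {assets_dir_for(script_path)}"
    )


# In a real script the argument would typically be __file__; a literal path is used here for illustration.
example_script = "/tmp/demo/azure_ai_with_computer_use.py"
example_message = missing_assets_message(example_script)
```

Because the message is derived from the directory actually searched, moving the assets folder only requires changing assets_dir_for.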

Comment on lines +81 to +90
filename_map = {
    "browser_search": "cua_browser_search.png",
    "search_typed": "cua_search_typed.png",
    "search_results": "cua_search_results.png",
}

for key, path in screenshot_paths.items():
    try:
        image_base64 = image_to_base64(path)
        screenshots[key] = {"filename": filename_map[key], "url": f"data:image/png;base64,{image_base64}"}

Copilot AI Nov 14, 2025

The filename_map dictionary is redundant as it duplicates information already present in the screenshot_paths dictionary. The filenames can be extracted directly from the paths using os.path.basename(path).

Consider refactoring to:

for key, path in screenshot_paths.items():
    try:
        image_base64 = image_to_base64(path)
        screenshots[key] = {"filename": os.path.basename(path), "url": f"data:image/png;base64,{image_base64}"}
    except FileNotFoundError as e:
        print(f"Error: Missing required screenshot asset: {e}")
        raise

This eliminates the need to maintain two separate dictionaries with the same information.
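A self-contained version of this refactor might look like the sketch below. It assumes image_to_base64 simply reads a file and base64-encodes it; the sample's real helper is not shown in this conversation, so that body is a stand-in, and load_screenshots plus the temporary demo file are hypothetical.

```python
import base64
import os
import tempfile


def image_to_base64(path):
    """Stand-in for the sample's helper: read a file and return its base64 text."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def load_screenshots(screenshot_paths):
    """Build the screenshots dict, deriving each filename from its path."""
    screenshots = {}
    for key, path in screenshot_paths.items():
        try:
            image_base64 = image_to_base64(path)
        except FileNotFoundError as e:
            print(f"Error: Missing required screenshot asset: {e}")
            raise
        screenshots[key] = {
            "filename": os.path.basename(path),  # replaces the separate filename_map lookup
            "url": f"data:image/png;base64,{image_base64}",
        }
    return screenshots


# Demo with a throwaway file so the sketch runs without the real PNG assets.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cua_browser_search.png")
    with open(demo_path, "wb") as f:
        f.write(b"fake-png-bytes")
    demo = load_screenshots({"browser_search": demo_path})
```

Note that os.path.basename needs import os at the top of the module if it is not already imported.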

Suggested change
- filename_map = {
-     "browser_search": "cua_browser_search.png",
-     "search_typed": "cua_search_typed.png",
-     "search_results": "cua_search_results.png",
- }
- for key, path in screenshot_paths.items():
-     try:
-         image_base64 = image_to_base64(path)
-         screenshots[key] = {"filename": filename_map[key], "url": f"data:image/png;base64,{image_base64}"}
+ for key, path in screenshot_paths.items():
+     try:
+         image_base64 = image_to_base64(path)
+         screenshots[key] = {"filename": os.path.basename(path), "url": f"data:image/png;base64,{image_base64}"}

if "content" in args or "tool_calls" in args:
    all_messages.append(args)
elif message.raw_representation:
    all_messages.append(message.raw_representation)
Member

I would avoid using raw_representation as input. As far as I know, we currently use this property as output only, unless I'm missing something. Instead of using raw_representation as input, we can:

  • Allow passing a dict as part of ChatMessage.contents, which would enable a break-glass scenario for message content types as input.
  • Add a new content type for the computer use tool. I think this would be the preferred approach, since this tool type exists in both the OpenAI Responses API and Azure AI.
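For context, the fallback under discussion can be sketched with stand-in types. Everything here is hypothetical: FakeMessage is not the framework's ChatMessage, prepare_messages is not the real parser, and the raw payload is only an illustrative dict. The sketch shows just the control flow from the diff, where a message with no parsed content is forwarded as whatever its raw_representation holds.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat message; not the framework's ChatMessage."""

    role: str
    text: Optional[str] = None
    raw_representation: Any = None


def prepare_messages(messages):
    """Mimic the branch in the diff: parsed content first, raw pass-through as fallback."""
    all_messages = []
    for message in messages:
        args = {"role": message.role}
        if message.text:
            args["content"] = message.text
        if "content" in args or "tool_calls" in args:
            all_messages.append(args)
        elif message.raw_representation:
            # The questioned behavior: an output-oriented property is sent as input, unvalidated.
            all_messages.append(message.raw_representation)
    return all_messages


# Illustrative payload only; the exact shape expected by a service is an assumption.
raw_item = {"type": "computer_call_output", "call_id": "call_1"}
prepared = prepare_messages(
    [FakeMessage(role="user", text="hi"), FakeMessage(role="tool", raw_representation=raw_item)]
)
```

The second message bypasses the parser entirely, which is why the pass-through is provider-specific and hard to abstract.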

Member

@eavanvalkenburg eavanvalkenburg Nov 14, 2025

That was my comment as well (I didn't see this before). We had an ADR PR discussing a potential set of computer use types and decided against it: #796 (comment)

        args["content"].append(self._openai_content_parser(message.role, content, call_id_to_id))  # type: ignore
if "content" in args or "tool_calls" in args:
    all_messages.append(args)
elif message.raw_representation:
Member

I'm not sure this is a good idea. There is a reason we have not created abstractions for computer use: the code needed to handle its inputs and outputs across platforms is too varied and complex for our purposes. Accepting raw_representation as an input goes against everything we do. If a dev needs this kind of special behavior, they are probably better off building directly against an SDK anyway, since it is not abstracted and they will not be able to swap between models, so the added value is low. Adding this method might also break other things, and including this sample implies we support this scenario, while we really don't.

@TaoChenOSU
Contributor Author

Based on offline discussions, we are not ready for this tool. Closing.

@TaoChenOSU TaoChenOSU closed this Nov 14, 2025
@crickman crickman deleted the taochen/python-add-computer-use-tool-sample branch December 18, 2025 17:26

Labels

documentation Improvements or additions to documentation python

Projects

Status: Done

5 participants