Python: Add Azure AI Agent V2 computer use tool sample #2210
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -434,6 +434,8 @@ def _openai_chat_message_parser( | |
| args["content"].append(self._openai_content_parser(message.role, content, call_id_to_id)) # type: ignore | ||
| if "content" in args or "tool_calls" in args: | ||
| all_messages.append(args) | ||
| elif message.raw_representation: | ||
| all_messages.append(message.raw_representation) | ||
|
Member
I would avoid using
Member
That was my comment as well (I didn't see this before). We had an ADR PR discussing a potential set of computer use types and decided against it: #796 (comment)
||
| return all_messages | ||
|
|
||
| def _openai_content_parser( | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,289 @@ | ||||||||||||||||||||||||||||||||
| # Copyright (c) Microsoft. All rights reserved. | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| import asyncio | ||||||||||||||||||||||||||||||||
| import base64 | ||||||||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||||||||
| from enum import Enum | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| from agent_framework import ChatMessage, ChatResponse, DataContent, Role, TextContent | ||||||||||||||||||||||||||||||||
| from agent_framework.azure import AzureAIClient | ||||||||||||||||||||||||||||||||
| from azure.ai.projects.models import ComputerUsePreviewTool | ||||||||||||||||||||||||||||||||
| from azure.identity.aio import AzureCliCredential | ||||||||||||||||||||||||||||||||
| from openai.types.responses import ResponseComputerToolCall | ||||||||||||||||||||||||||||||||
| from openai.types.responses.response import Response | ||||||||||||||||||||||||||||||||
| from openai.types.responses.response_computer_tool_call import Action | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||
| Azure AI Agent With Computer Use Tool | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| This sample demonstrates basic usage of AzureAIClient to create an agent | ||||||||||||||||||||||||||||||||
| that can perform computer automation tasks using the ComputerUsePreviewTool. | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Pre-requisites: | ||||||||||||||||||||||||||||||||
| - Make sure to set up the AZURE_AI_PROJECT_ENDPOINT. | ||||||||||||||||||||||||||||||||
| - Make sure to deploy a model that supports the computer use tool, currently "computer-use-preview". | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Note that the computer operations in this sample are simulated for demonstration purposes. | ||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| class SearchState(Enum): | ||||||||||||||||||||||||||||||||
| """Enum for tracking the state of the simulated web search workflow.""" | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| INITIAL = "initial" # Browser search page | ||||||||||||||||||||||||||||||||
| TYPED = "typed" # Text entered in search box | ||||||||||||||||||||||||||||||||
| PRESSED_ENTER = "pressed_enter" # Enter key pressed, transitioning to results | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| def image_to_base64(image_path: str) -> str: | ||||||||||||||||||||||||||||||||
| """Convert an image file to a Base64-encoded string. | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Args: | ||||||||||||||||||||||||||||||||
| image_path: The path to the image file (e.g. 'image_file.png') | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Returns: | ||||||||||||||||||||||||||||||||
| A Base64-encoded string representing the image. | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Raises: | ||||||||||||||||||||||||||||||||
| FileNotFoundError: If the provided file path does not exist. | ||||||||||||||||||||||||||||||||
| OSError: If there's an error reading the file. | ||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||
| if not os.path.isfile(image_path): | ||||||||||||||||||||||||||||||||
| raise FileNotFoundError(f"File not found at: {image_path}") | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||
| with open(image_path, "rb") as image_file: | ||||||||||||||||||||||||||||||||
| file_data = image_file.read() | ||||||||||||||||||||||||||||||||
| return base64.b64encode(file_data).decode("utf-8") | ||||||||||||||||||||||||||||||||
| except Exception as exc: | ||||||||||||||||||||||||||||||||
| raise OSError(f"Error reading file '{image_path}'") from exc | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| def load_screenshot_assets() -> dict[str, dict[str, str]]: | ||||||||||||||||||||||||||||||||
| """Load and convert screenshot images to base64 data URLs. | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Returns: | ||||||||||||||||||||||||||||||||
| dict: Dictionary mapping state names to screenshot info with filename and data URL | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| Raises: | ||||||||||||||||||||||||||||||||
| FileNotFoundError: If any required screenshot asset files are missing | ||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||
| # Load demo screenshot images from assets directory | ||||||||||||||||||||||||||||||||
| # Flow: search page -> typed search -> search results | ||||||||||||||||||||||||||||||||
| screenshot_paths = { | ||||||||||||||||||||||||||||||||
| "browser_search": os.path.abspath(os.path.join(os.path.dirname(__file__), "./assets/cua_browser_search.png")), | ||||||||||||||||||||||||||||||||
| "search_typed": os.path.abspath(os.path.join(os.path.dirname(__file__), "./assets/cua_search_typed.png")), | ||||||||||||||||||||||||||||||||
| "search_results": os.path.abspath(os.path.join(os.path.dirname(__file__), "./assets/cua_search_results.png")), | ||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| # Convert images to base64 data URLs with filenames | ||||||||||||||||||||||||||||||||
| screenshots: dict[str, dict[str, str]] = {} | ||||||||||||||||||||||||||||||||
| filename_map = { | ||||||||||||||||||||||||||||||||
| "browser_search": "cua_browser_search.png", | ||||||||||||||||||||||||||||||||
| "search_typed": "cua_search_typed.png", | ||||||||||||||||||||||||||||||||
| "search_results": "cua_search_results.png", | ||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| for key, path in screenshot_paths.items(): | ||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||
| image_base64 = image_to_base64(path) | ||||||||||||||||||||||||||||||||
| screenshots[key] = {"filename": filename_map[key], "url": f"data:image/png;base64,{image_base64}"} | ||||||||||||||||||||||||||||||||
|
Comment on lines +81 to +90
|
||||||||||||||||||||||||||||||||
| filename_map = { | |
| "browser_search": "cua_browser_search.png", | |
| "search_typed": "cua_search_typed.png", | |
| "search_results": "cua_search_results.png", | |
| } | |
| for key, path in screenshot_paths.items(): | |
| try: | |
| image_base64 = image_to_base64(path) | |
| screenshots[key] = {"filename": filename_map[key], "url": f"data:image/png;base64,{image_base64}"} | |
| for key, path in screenshot_paths.items(): | |
| try: | |
| image_base64 = image_to_base64(path) | |
| screenshots[key] = {"filename": os.path.basename(path), "url": f"data:image/png;base64,{image_base64}"} |
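The suggestion above drops the separate filename_map and derives each filename from its path with os.path.basename. A minimal sketch of that idea follows; the helper name filenames_from_paths and the relative paths are assumptions made for illustration and are not part of the PR.

```python
import os

# Hypothetical asset paths that mirror the keys and file names used in the sample.
screenshot_paths = {
    "browser_search": os.path.join("assets", "cua_browser_search.png"),
    "search_typed": os.path.join("assets", "cua_search_typed.png"),
    "search_results": os.path.join("assets", "cua_search_results.png"),
}


def filenames_from_paths(paths: dict[str, str]) -> dict[str, str]:
    """Derive each filename from its path instead of keeping a second mapping."""
    return {key: os.path.basename(path) for key, path in paths.items()}


filenames = filenames_from_paths(screenshot_paths)
```

With a single source of truth, renaming an asset only requires touching screenshot_paths, so the filename and the path cannot drift apart.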
Copilot
AI
Nov 14, 2025
Incorrect operator precedence in string concatenation. Because + binds more tightly than or, the current expression getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n" evaluates as (... or ("" + "\n")). The "\n" is joined to the empty string before the or chain is evaluated, so the newline is appended only when both text and refusal are missing, not to the final result.
This should be: (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"
Add parentheses to ensure the newline is concatenated to the result of the or chain.
| final_output += getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n" | |
| final_output += (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n" |
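The precedence issue can be reproduced in isolation. The sketch below uses types.SimpleNamespace as a stand-in for the response part objects; the function names append_buggy and append_fixed are hypothetical and exist only for this comparison.

```python
from types import SimpleNamespace


def append_buggy(part) -> str:
    # `+` binds tighter than `or`, so this parses as:
    #   text or refusal or ("" + "\n")
    return getattr(part, "text", None) or getattr(part, "refusal", None) or "" + "\n"


def append_fixed(part) -> str:
    # Parentheses make the newline apply to whichever operand the `or` chain selects.
    return (getattr(part, "text", None) or getattr(part, "refusal", None) or "") + "\n"


with_text = SimpleNamespace(text="hello")
with_refusal = SimpleNamespace(text=None, refusal="no")
empty = SimpleNamespace()
```

In the buggy form, a part that has text or a refusal is returned without a trailing newline, so consecutive parts run together in final_output. Both forms return only a newline when neither attribute is set.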
Copilot
AI
Nov 14, 2025
The error message references an incorrect path. The assets directory is located at ./assets/ relative to the script file (as shown in line 74), not at ../assets/.
The error message should be: "Failed to load required screenshot assets. Please ensure the asset files exist in ./assets/"
| print("Failed to load required screenshot assets. Please ensure the asset files exist in ../assets/") | |
| print("Failed to load required screenshot assets. Please ensure the asset files exist in ./assets/") |
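One way to keep the message and the real location from drifting apart is to build both from the same value. This is a sketch under assumptions, not what the PR does: the helpers assets_dir and missing_assets_message are hypothetical, and the script path is passed in explicitly so the example does not depend on __file__.

```python
import os


def assets_dir(script_file: str) -> str:
    """Return the absolute assets directory that sits next to the given script."""
    return os.path.abspath(os.path.join(os.path.dirname(script_file), "assets"))


def missing_assets_message(script_file: str) -> str:
    """Build the error message from the same directory used to load the assets."""
    return (
        "Failed to load required screenshot assets. "
        f"Please ensure the asset files exist in {assets_dir(script_file)}"
    )


# Hypothetical script location used only to exercise the helpers.
example_script = os.path.join("samples", "computer_use.py")
message = missing_assets_message(example_script)
```

In the sample itself, the call would presumably be missing_assets_message(__file__), which reports the absolute directory rather than a hand-written relative path.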
I'm not sure this is a good idea. There is a reason we have not created abstractions for computer use: the code needed to handle its inputs and outputs across platforms is too varied and complex for our purposes. Adding raw_representation as an input goes against everything we do. If a dev needs this kind of special behavior, they are probably better off building directly against an SDK, since computer use is not abstracted and they will not be able to swap between models anyway, so the added value is low. Adding this method might also break other things, and including this sample implies that we support this scenario when we really don't.