Python: Add CuaAgentMiddleware for Computer-Use tool by f-trycua · Pull Request #1338 · microsoft/agent-framework

f-trycua · 2025-10-09T07:07:34Z

Motivation and Context

This PR implements the integration between Microsoft Agent Framework and Cua as discussed in issue #1095.

Why is this needed?

Provides Agent Framework with 100+ model configurations (OpenAI, Anthropic, OpenCUA, InternVL, UI-Tars, GLM, etc.) without duplicating model-specific parsers
Enables desktop automation capabilities across Windows, macOS, and Linux through Cua's virtualization infrastructure
Supports composite agents (e.g., "UI-Tars+GPT-4o") combining grounding and planning models
Leverages Cua's existing computer-use infrastructure instead of reimplementing it

Implementation approach:
Following @eavanvalkenburg's guidance in #1095, this uses the ChatMiddleware pattern rather than implementing Cua as a Tool. This delegates the entire agent loop to Cua while maintaining Agent Framework's orchestration and human-in-the-loop capabilities.

Why wrap ComputerAgent instead of just Computer?

ComputerAgent provides the complete agent loop (model inference → parsing → computer actions → multi-step execution) with support for 100+ model configurations
Computer is just the low-level tool for executing actions (click, type, screenshot, etc.)
By wrapping ComputerAgent, we get all of Cua's model support for free without reimplementing provider-agnostic parsers for OpenCUA, InternVL, UI-Tars, GLM, etc.
This architectural choice means Agent Framework benefits from Cua's ongoing model additions automatically

Related issue: #1095

Description

This PR adds agent-framework-cua, a new integration package that provides CuaAgentMiddleware.

Key components:

CuaAgentMiddleware - Middleware that intercepts chat requests and delegates to Cua's ComputerAgent
- Completely bypasses the Agent Framework chat client by setting context.terminate = True
- All model inference is handled by Cua's ComputerAgent (supports 100+ models)
- Handles message format conversion between Agent Framework and Cua
- Supports human-in-the-loop approval workflows (require_approval, approval_interval)
- Transforms Cua results back to Agent Framework ChatResponse format
Type definitions - CuaModelId, CuaProviderType, CuaOSType, etc. for type safety
Examples:
- basic_example.py - Claude Sonnet 4.5 with Linux Docker
- composite_agent_example.py - UI-Tars + GPT-4o composite agent
Package structure - Follows existing integration patterns (agent-framework-redis, agent-framework-mem0)
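The description names the human-in-the-loop parameters `require_approval` and `approval_interval` but does not define them. The sketch below assumes one possible reading: with approval enabled, the loop pauses for a human decision every `approval_interval` steps and stops at the first rejection. `needs_approval`, `run_steps` and the `approve` callback are hypothetical, and appending to a list stands in for actually performing a computer action.

```python
from collections.abc import Callable


def needs_approval(step: int, require_approval: bool, approval_interval: int) -> bool:
    """Return True if the 1-based `step` is an approval checkpoint (assumed semantics)."""
    if not require_approval:
        return False
    if approval_interval < 1:
        raise ValueError("approval_interval must be >= 1")
    return step % approval_interval == 0


def run_steps(
    actions: list[str],
    require_approval: bool,
    approval_interval: int,
    approve: Callable[[int, str], bool],
) -> list[str]:
    """Execute actions in order, stopping at the first checkpoint a human rejects."""
    executed: list[str] = []
    for step, action in enumerate(actions, start=1):
        if needs_approval(step, require_approval, approval_interval) and not approve(step, action):
            break  # human declined: end the run before this action
        executed.append(action)  # stand-in for performing the click/type/screenshot
    return executed
```

In a real middleware, `approve` would prompt the user, and `executed` would hold the actions Cua actually carried out.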

Architecture:

Agent Framework → CuaAgentMiddleware → Cua ComputerAgent
                      ↓                      ↓
                 terminate=True    Model + Computer Loop
                                           ↓
                                       Results
                                           ↓
Agent Framework ← CuaAgentMiddleware ← Cua ComputerAgent

The chat client becomes a no-op since CuaAgentMiddleware terminates middleware execution and returns the response directly from Cua.
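Below is a minimal sketch of the short-circuit described above. The middleware fills in the result itself, sets `terminate = True`, and never awaits `next`, so the chat client at the end of the pipeline is never called. `ChatContext`, `run_pipeline`, `cua_like_middleware` and `fake_chat_client` are hypothetical stand-ins and are not Agent Framework's actual middleware classes or signatures.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass
from typing import Any


@dataclass
class ChatContext:
    """Hypothetical stand-in for a chat middleware context (not the Agent Framework class)."""
    messages: list[str]
    result: Any = None
    terminate: bool = False


Next = Callable[[ChatContext], Awaitable[None]]
Middleware = Callable[[ChatContext, Next], Awaitable[None]]
ChatClient = Callable[[list[str]], Awaitable[str]]


async def run_pipeline(ctx: ChatContext, middlewares: list[Middleware], client: ChatClient) -> ChatContext:
    """Run middlewares in order; the chat client is reached only if none of them terminates."""
    async def call(index: int, c: ChatContext) -> None:
        if c.terminate:
            return  # a middleware already produced the final result
        if index < len(middlewares):
            async def next_(nc: ChatContext) -> None:
                await call(index + 1, nc)
            await middlewares[index](c, next_)
        else:
            c.result = await client(c.messages)

    await call(0, ctx)
    return ctx


async def cua_like_middleware(ctx: ChatContext, next_: Next) -> None:
    # Stand-in for handing the whole run to Cua's ComputerAgent.
    ctx.result = f"handled outside the chat client: {ctx.messages[-1]}"
    ctx.terminate = True  # next_ is deliberately never awaited


client_calls: list[list[str]] = []


async def fake_chat_client(messages: list[str]) -> str:
    client_calls.append(list(messages))  # record calls so the bypass is observable
    return "from chat client"
```

Running `run_pipeline(ChatContext(["open firefox"]), [cua_like_middleware], fake_chat_client)` leaves `client_calls` empty. This is the "no-op chat client" behavior, and it is also why the reviewers later objected that other chat-level features are lost.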

Technical notes:

Requires Python ≥3.12 (due to cua-agent dependency)
Uses dummy chat_client since middleware terminates execution before reaching it
Fixed ChatMessage.content → ChatMessage.text/contents attribute usage in middleware

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? No

f-trycua · 2025-10-09T21:14:11Z

@microsoft-github-policy-service agree company="Cua AI, Inc."

f-trycua · 2025-10-10T18:29:12Z

I've also been thinking about how to support .NET with this integration. Since Agent Framework already has built-in MCP support (see samples), we could create a Python MCP server that wraps Cua's ComputerAgent.

The flow would be:

.NET Agent → MCP Client → stdio → Python MCP Server → Cua ComputerAgent (100+ models)

Usage from C#:

// Connect to Cua MCP server
await using var mcpClient = await McpClient.CreateAsync(new StdioClientTransport(new()
{
    Command = "python",
    Arguments = ["-m", "cua.mcp.server"],
}));

var agent = chatClient.CreateAIAgent(
    instructions: "You are a desktop automation assistant.",
    tools: [.. (await mcpClient.ListToolsAsync()).Cast<AITool>()]
);

await agent.RunAsync("Open Firefox and search for 'Python tutorials'");

This approach would:

✅ Reuse existing MCP infrastructure (no new .NET bindings needed)
✅ Give .NET agents access to all 100+ Cua models
✅ Work cross-language via the MCP protocol

We have a pending PR for MCP server support on the Cua side (trycua/cua#427). Once that's merged, I can add C# samples and documentation in a follow-up PR or update this one. Thoughts?

python/samples/getting_started/cua/composite_agent/main.py

python/packages/cua/examples/README.md

python/packages/cua/agent_framework_cua/__init__.py

f-trycua · 2025-10-17T07:15:44Z

Hey @ekzhu - I've addressed your feedback:

API Key Configuration - Added section explaining setup via environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, etc.)
Simplified Exports - Good catch! Removed unused types from __all__, now only exports CuaAgentMiddleware
Restructured Samples - Moved to samples/getting_started/cua/ and added workflow_orchestration showing Agent Framework orchestration + Cua execution synergy
Other - Added experimental disclaimer (DevUI pattern), Cua docs links, updated to cua-xfce image, clarified unused parameters
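Because keys come from environment variables such as ANTHROPIC_API_KEY and OPENAI_API_KEY, a sample can fail early with a clear message instead of failing mid-run. The sketch below assumes that a model id's "provider/" prefix identifies which variable is needed. The prefix-to-variable table and `missing_api_keys` are hypothetical and are not part of the package.

```python
import os
from collections.abc import Mapping

# Assumed mapping from model-id prefix to the environment variable that provider needs.
PROVIDER_KEY_VARS = {
    "anthropic/": "ANTHROPIC_API_KEY",
    "openai/": "OPENAI_API_KEY",
}


def missing_api_keys(model: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """List the API-key variables a (possibly composite) model id needs but are unset."""
    missing: list[str] = []
    for part in model.split("+"):  # composite ids like "grounding+planning"
        for prefix, var in PROVIDER_KEY_VARS.items():
            if part.strip().startswith(prefix) and not environ.get(var) and var not in missing:
                missing.append(var)
    return missing
```

A sample could call this before creating the agent, for example `if missing := missing_api_keys(model): raise SystemExit(f"set {', '.join(missing)}")`.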

Happy to chat this week if helpful!

f-trycua · 2025-10-17T18:46:16Z

Hey @ekzhu - I've made some improvements to the API design:

Eliminated Dummy Variables - Created CuaChatClient that properly stores model and instructions configuration. No more need for dummy OpenAIChatClient(model_id="gpt-4o-mini", api_key="dummy-not-used") workarounds.

Before:

# Had to use dummy client
dummy_client = OpenAIChatClient(model_id="gpt-4o-mini", api_key="dummy-not-used")
middleware = CuaAgentMiddleware(
    computer=computer,
    model="anthropic/claude-sonnet-4-5-20250929",
    instructions="You are an assistant.",
)
agent = ChatAgent(chat_client=dummy_client, middleware=[middleware])

After:

# Clean API with CuaChatClient
chat_client = CuaChatClient(
    model="anthropic/claude-sonnet-4-5-20250929",
    instructions="You are an assistant.",
)
middleware = CuaAgentMiddleware(computer=computer)
agent = ChatAgent(chat_client=chat_client, middleware=[middleware])

Standardized Examples - All samples now default to Linux on Docker (cross-platform), with macOS and Windows options shown as alternatives in comments.

Let me know if there's anything else you'd like me to address!

ekzhu

I really like the new interface! It looks very good and polished. I had some issues running it locally with Docker though -- see my comments.

I am a bit concerned about the package name "cua" being overly broad here. I think it may make sense to rename it to "trycua" or something more specific. Right now, it feels like this is the official computer-use feature of the framework.

Another alternative is to move this package to a module inside agent-framework-lab, where we can expose it through a cua extra.

ekzhu · 2025-10-24T22:34:10Z

python/samples/getting_started/cua/basic_example/main.py

+        # Create Cua chat client with model and instructions
+        chat_client = CuaChatClient(
+            model="anthropic/claude-sonnet-4-5-20250929",
+            instructions="You are a desktop automation assistant. Be precise and careful.",


For the examples, let's set the instructions through ChatAgent instead. This is to keep it consistent with the rest of the samples in the repo.


Thanks for flagging this, @ekzhu! CuaAgentMiddleware intercepts the call and drives the run loop, so the chat client never gets a chance to apply its own system message—anything we put there gets ignored. If you need custom guidance, the simplest path is to include it in the prompt you send to agent.run(...); that text is preserved and reaches CUA exactly as written.

ekzhu · 2025-10-24T22:36:29Z

python/samples/getting_started/cua/basic_example/main.py

+async def main():
+    """Run a basic computer use example with Claude."""
+    # Initialize Cua computer (Linux Docker container)
+    async with Computer(os_type="linux", provider_type="docker") as computer:


I had trouble running this example and it failed right here. I am on WSL Ubuntu and running Docker Desktop.

Traceback (most recent call last):
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/computer.py", line 493, in run
    await self._interface.wait_for_ready(timeout=30)
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 817, in wait_for_ready
    raise e
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 813, in wait_for_ready
    await self._wait_for_ready_ws(timeout, interval)
  File "***/agent-framework/python/.venv/lib/python3.13/site-packages/computer/interface/generic.py", line 938, in _wait_for_ready_ws
    raise TimeoutError(error_msg)
TimeoutError: Could not connect to localhost after 30 seconds
...
TimeoutError: Could not connect to WebSocket interface at localhost:8000/ws: Could not connect to localhost after 30 seconds

I have already pulled the image, and I tried this even after I manually started the container from Docker Desktop.
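The traceback shows Computer giving up after 30 seconds while waiting for the WebSocket at localhost:8000/ws. A quick way to tell "nothing is listening" (for example, the container never started or the port is not published) apart from a WebSocket-level problem is to poll the TCP port first. Below is a minimal standard-library sketch. `wait_for_port` is a hypothetical helper, and an open port alone does not prove the WebSocket handshake will succeed.

```python
import asyncio
import contextlib


async def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds (True) or `timeout` elapses (False)."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            _, writer = await asyncio.wait_for(asyncio.open_connection(host, port), timeout=interval)
        except (OSError, asyncio.TimeoutError):
            if loop.time() >= deadline:
                return False  # still unreachable: likely no container listening on this port
            await asyncio.sleep(interval)
        else:
            writer.close()  # we only wanted to know the port accepts connections
            with contextlib.suppress(OSError):
                await writer.wait_closed()
            return True
```

For example, `asyncio.run(wait_for_port("localhost", 8000))` returning False after the container has been started suggests that the container exited or that its port mapping is missing. That is consistent with the image-architecture problem diagnosed in the reply below.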

Hi @ekzhu, I'm Adam from the Cua team. I reproduced the sample failure. The culprit is the Docker image name: the code uses trycua/cua-ubuntu:latest, but there’s no Linux/AMD64 manifest for that tag, so the container never starts and the WebSocket wait times out. Pulling trycua/cua-xfce:latest (which is published for AMD64) and tagging it locally as trycua/cua-ubuntu:latest fixes the run.

Pull the AMD64 image Cua documents for Docker:
docker pull --platform=linux/amd64 trycua/cua-xfce:latest

Create a local tag so the provider can find it:
docker tag trycua/cua-xfce:latest trycua/cua-ubuntu:latest

We’ll update the sample to point at the XFCE image so others don’t hit this.

On the Agent Framework side we’ll also land a tiny fix so CuaChatClient imports and applies @use_chat_middleware; that keeps the middleware hook active even when Cua handles the run loop.

ekzhu · 2025-10-24T22:45:30Z

Also, there are some merge conflicts. It looks like uv.lock needs to be regenerated, and the pyproject.toml file needs to be updated -- just accept both changes.

f-trycua · 2025-11-10T22:57:17Z

Replying to @ekzhu: "I am a bit concerned about the package name 'cua' being overly broad here. I think it may make sense to rename it to 'trycua' or something more specific. Right now, it feels like this is the official computer-use feature of the framework."

Thanks so much for the feedback and the notes @ekzhu - super helpful.

On the naming:

We own both cua.ai and trycua.com, and Cua is the name of the open-source framework as well as our company, so using the cua package name is intentional and consistent with our ecosystem (CLI, SDKs, cloud API, etc.).

That said, we definitely don’t want it to appear like an “official” Microsoft Agent SDK package. If avoiding confusion is the main concern, a clean alternative for us could be cua-ai (or cua_ai), which still preserves the project identity while making the separation explicit. Happy to make that change if it aligns better with the project’s conventions.

Let me know which direction you’d prefer - we’re flexible as long as the identity remains clear.

YeIIcw · 2025-11-15T05:28:20Z

Hi @markwallace-microsoft, those .NET, workflows, and lab labels were added while I briefly pulled in the wrong files. The PR is back to Python-only now, so could you remove those tags when you get a chance? Thanks!

TaoChenOSU · 2025-11-18T21:30:31Z

python/packages/cua/agent_framework_cua/_chat_client.py

+                # Create middleware
+                middleware = CuaAgentMiddleware(computer=computer)
+
+                # Create agent - no dummy variables needed!


Question: what does it mean by "no dummy variables needed"?

Hi @TaoChenOSU, thanks for spotting that. The “no dummy variables needed” comment is leftover from an earlier draft and will be removed.


f-trycua · 2025-11-25T22:26:51Z

Hi @TaoChenOSU @ekzhu - what items are still pending on this PR?

eavanvalkenburg

Sorry it took a while to have another good look at this, but I have some structural issues with it. The most important is that this setup with the specific CuaChatClient doesn't work for me. The problem I have with it is that it breaks the expectation that users can interchange chat clients without effort, and that includes features like other chat middlewares, local models, etc. None of that is possible with this design. So I think what we should do is: 1) design a computer use content type (usable by all major computer-use-capable APIs, like Anthropic, OpenAI and Google) and 2) use that with the CuaMiddleware here, where the middleware does nothing more than look at the response: if it is computer use content, execute it and ask for another completion; if it isn't, pass the response back up the stack and be done. That way the chat clients in AF are still used, and we can inject the CUA middleware without losing other functionality. Let me know if that makes sense @f-trycua

YeIIcw · 2025-12-18T23:00:44Z

Hi @eavanvalkenburg,

Thank you for the detailed feedback! I'd like to propose a refactor that addresses all your concerns. Here's how the new design would address each point:

Proposed Solution:

Computer Use Content Type: Create ComputerUseContent in core framework that works with Anthropic, OpenAI, Google, and any provider emitting computer use function calls.
Middleware Pattern: The middleware would:
- Call await next(context) first (lets chat client execute)
- Detect computer use content in the response
- If found: execute actions, take screenshot, add results, then request another completion
- If not found: return early (pass response through)
Chat Clients Still Used: Never set context.terminate = True, so chat clients always execute. Call next() for initial and subsequent completions.
Other Functionality Preserved: Other middlewares can be combined, local models work, no breaking changes.
Chat Client Interchangeability: Remove CuaChatClient dependency entirely. Work with any chat client (Anthropic, OpenAI, Google, local models, custom implementations).

Proposed Change:

Current: Middleware never calls next(), sets terminate=True, bypasses chat client entirely.

Proposed: Middleware calls next() first, inspects response, executes computer use if found, passes through otherwise.

The middleware would follow the standard pattern you described: look at response → execute if computer use content → ask for another completion → pass through if not.

I've created a design document with detailed before/after comparison and implementation proposal.

Does this approach address your concerns? Happy to discuss any aspects before implementing!

eavanvalkenburg · 2025-12-19T08:17:55Z

@YeIIcw that is what I envisioned initially as well, so good to see that! I am working on those content types already, as we want to ensure parity with the dotnet version and be provider agnostic, so I'll let you know when we have the design ready.

eavanvalkenburg · 2025-12-19T15:15:11Z

I laid out the design here: #1108. We will need to finalize it, but it should give you an idea of how we are thinking about this. Please have a look and share feedback on what else we should have in those tools and types.

eavanvalkenburg · 2026-02-24T11:40:56Z

Closing, because this hasn't moved since December.

markwallace-microsoft added the documentation and python labels Oct 9, 2025

f-trycua mentioned this pull request Oct 9, 2025

Single Agent: Computer Use Integration #1095

Open

markwallace-microsoft requested review from ekzhu and victordibia October 9, 2025 08:45

f-trycua marked this pull request as ready for review October 9, 2025 21:14

ekzhu reviewed Oct 16, 2025


ekzhu reviewed Oct 24, 2025


YeIIcw force-pushed the feature/cua-integration branch from 9c14b49 to 80bd9cd November 15, 2025 05:10

YeIIcw requested a review from a team as a code owner November 15, 2025 05:10

markwallace-microsoft added the .NET, workflows, and lab labels Nov 15, 2025

github-actions bot changed the title from "Python: Add CuaAgentMiddleware for Computer-Use tool" to ".NET: Python: Add CuaAgentMiddleware for Computer-Use tool" Nov 15, 2025

YeIIcw force-pushed the feature/cua-integration branch from 80bd9cd to 7c2bcee November 15, 2025 05:13

crickman requested review from TaoChenOSU, ekzhu and peibekwe and removed request for ekzhu November 17, 2025 21:15

crickman removed the .NET label Nov 17, 2025

crickman added this to Agent Framework Nov 17, 2025

TaoChenOSU reviewed Nov 18, 2025


TaoChenOSU changed the title from ".NET: Python: Add CuaAgentMiddleware for Computer-Use tool" to "Python: Add CuaAgentMiddleware for Computer-Use tool" Nov 18, 2025

markwallace-microsoft added the .NET label Nov 18, 2025

github-actions bot changed the title from "Python: Add CuaAgentMiddleware for Computer-Use tool" to ".NET: Python: Add CuaAgentMiddleware for Computer-Use tool" Nov 18, 2025

f-trycua and others added 9 commits November 18, 2025 18:15

Add CuaAgentMiddleware for Cua integration

f94e8c4

Add clarification that instructions are not used in examples

68f2a98

Address PR feedback and restructure Cua integration samples

6593aae

Add instructions parameter support to CuaAgentMiddleware

ca6e723

Update to Claude Sonnet 4.5 and prioritize Docker as default platform

bc7c2a7

Add CuaChatClient to eliminate dummy variables

2f9f7af

Update _chat_client.py

c37d9ee

Update CUA samples for compatible Docker image

9e30e90

Pin all CUA Docker samples to trycua/cua-xfce:latest for Windows/x64 support. Drop Anthropic instructions field so Claude requests keep working.

Add platform setup guides for CUA samples

5c43237

Add Windows, macOS, and Linux quickstarts under samples/getting_started/cua/setup/. Refresh the CUA README to link to the new guides and modernize prerequisites.

YeIIcw force-pushed the feature/cua-integration branch from 2c5566a to 5c43237 November 18, 2025 23:17

YeIIcw and others added 3 commits November 18, 2025 18:30

Clean up CUA integration comments

d194bda

Merge branch 'main' into feature/cua-integration

a7e7be1

Merge branch 'main' into feature/cua-integration

d12a612

markwallace-microsoft removed this from Agent Framework Dec 2, 2025

YeIIcw and others added 4 commits December 4, 2025 12:48

Merge branch 'main' into feature/cua-integration

aad1a4a

Merge branch 'main' into feature/cua-integration

28d7a04

Merge branch 'main' into feature/cua-integration

6daa8ff

Merge branch 'main' into feature/cua-integration

4b366c0

markwallace-microsoft removed the .NET label Dec 10, 2025

markwallace-microsoft changed the title from ".NET: Python: Add CuaAgentMiddleware for Computer-Use tool" to "Python: Add CuaAgentMiddleware for Computer-Use tool" Dec 10, 2025

eavanvalkenburg reviewed Dec 12, 2025


eavanvalkenburg closed this Feb 24, 2026


7 participants