Feat: Add Zoom tool to AnthropicCuaClient#1913
Feat: Add Zoom tool to AnthropicCuaClient#1913chromiebot wants to merge 4 commits intobrowserbase:mainfrom
Conversation
Add comprehensive test suite for the zoom tool functionality: - Test enable_zoom is included in tool definition for computer_20251124 models - Test enable_zoom is NOT included for older computer_20250124 models - Test convertToolUseToAction handles zoom action with region - Test takeAction captures cropped screenshot for zoom regions - Test fallback to regular screenshot when zoomedScreenshotProvider not set - Test setZoomedScreenshotProvider method exists and works Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement the zoom tool for Anthropic's Computer Use API based on the official documentation. The zoom tool allows Claude to view a specific region of the screen at full resolution. Changes: - Add enable_zoom: true to tool definition for computer_20251124 models (Claude Opus 4.6, Sonnet 4.6, Opus 4.5-20251101) - Add setZoomedScreenshotProvider method to allow custom screenshot crop - Add captureZoomedScreenshot method to capture region screenshots - Handle zoom action in convertToolUseToAction with region coordinates - Update takeAction to use zoomed screenshot provider for zoom actions - Fallback to regular screenshot if zoomedScreenshotProvider not set The zoom tool takes a region parameter with [x1, y1, x2, y2] coordinates defining the top-left and bottom-right corners of the area to inspect. Reference: https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests that verify the CUA handler properly handles zoom actions as no-ops (since actual capture happens in AnthropicCUAClient.takeAction), and validates the clip coordinate conversion logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire up the Anthropic CUA zoom tool in the handler layer: - Add setZoomedScreenshotProvider that uses CDP's clip parameter to capture a specific region [x1, y1, x2, y2] at full resolution - Add zoom case in executeAction as a no-op (the actual zoomed screenshot is captured by AnthropicCUAClient.takeAction) - Import AnthropicCUAClient for instanceof check The zoom tool allows Claude to inspect specific screen regions in detail by requesting a cropped screenshot at native resolution, which is part of the computer_20251124 tool spec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run. |
There was a problem hiding this comment.
1 issue found across 4 files
Confidence score: 5/5
- Low-severity issue (3/10) with moderate confidence suggests minimal merge risk; this looks like a test-quality gap rather than a functional regression in production code.
- In
packages/core/tests/unit/cua-handler-zoom.test.ts, the current assertion is tautological and does not verify thatsetupAgentClientactually invokessetZoomedScreenshotProvider, so wiring behavior could go untested. - Pay close attention to
packages/core/tests/unit/cua-handler-zoom.test.ts- strengthen the assertion to validate the handler-to-client call path instead of only method presence.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/core/tests/unit/cua-handler-zoom.test.ts">
<violation number="1" location="packages/core/tests/unit/cua-handler-zoom.test.ts:169">
P3: The test is tautological: it only checks that FakeCuaClient has a setZoomedScreenshotProvider method, which is always true, and never verifies that setupAgentClient actually calls it. Because the handler only wires the zoom provider for AnthropicCUAClient instances, this test can pass even if that wiring regresses.</violation>
</file>
Architecture diagram
sequenceDiagram
participant LLM as Anthropic API (Claude)
participant Client as AnthropicCUAClient
participant Handler as V3CuaAgentHandler
participant Page as Browser (Playwright/CDP)
Note over Client,Handler: Initialization Phase
Handler->>Client: NEW: setZoomedScreenshotProvider(callback)
Note over LLM,Page: Runtime Tool Execution
LLM->>Client: tool_use: computer (action: "zoom", region: [x1, y1, x2, y2])
Client->>Client: CHANGED: convertToolUseToAction()
Note right of Client: Maps zoom request to internal action
Client->>Client: NEW: captureZoomedScreenshot(region)
alt Zoomed Provider Set
Client->>Handler: Invoke callback(region)
Handler->>Page: NEW: screenshot({ clip: {x, y, width, height} })
Page-->>Handler: Buffer (High-res crop)
Handler-->>Client: Base64 string
else Fallback (Provider Not Set)
Client->>Client: getFullScreenshot()
Client-->>Client: Base64 string
end
Client->>Handler: executeAction(type: "zoom")
Handler-->>Client: NEW: return success (No-op in handler)
Client->>LLM: tool_result: [ { type: "image", source: ... } ]
Note over LLM,Client: Claude receives high-res crop of specific region
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| createHandler(); | ||
|
|
||
| // Since our mock won't match instanceof, let's verify the method exists | ||
| expect(typeof fakeCuaClient.setZoomedScreenshotProvider).toBe("function"); |
There was a problem hiding this comment.
P3: The test is tautological: it only checks that FakeCuaClient has a setZoomedScreenshotProvider method, which is always true, and never verifies that setupAgentClient actually calls it. Because the handler only wires the zoom provider for AnthropicCUAClient instances, this test can pass even if that wiring regresses.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/tests/unit/cua-handler-zoom.test.ts, line 169:
<comment>The test is tautological: it only checks that FakeCuaClient has a setZoomedScreenshotProvider method, which is always true, and never verifies that setupAgentClient actually calls it. Because the handler only wires the zoom provider for AnthropicCUAClient instances, this test can pass even if that wiring regresses.</comment>
<file context>
@@ -0,0 +1,257 @@
+ createHandler();
+
+ // Since our mock won't match instanceof, let's verify the method exists
+ expect(typeof fakeCuaClient.setZoomedScreenshotProvider).toBe("function");
+ });
+ });
</file context>
|
This PR was approved by @miguelg719 and mirrored to #1955. All further discussion should happen on that PR. |
why
what changed
test plan
Summary by cubic
Adds Zoom tool support to
AnthropicCUAClientand wires it intoV3CuaAgentHandlerso Claude can request high‑res crops of specific screen regions. Improves inspection accuracy without extra navigation.enable_zoomforcomputer_20251124models; convertszoomtool_use into an action with[x1,y1,x2,y2].setZoomedScreenshotProviderandcaptureZoomedScreenshot; falls back to full screenshot when unset.clipand treatszoomas a no‑op action.Written for commit fac3ca1. Summary will update on new commits. Review in cubic