Commit a3de2ca

rgarcia and claude authored
Replace Playwright with Kernel native API in OpenAI CUA templates (#124)
## Summary

- Replace Playwright (over CDP) with Kernel's native computer control API in both TypeScript and Python OpenAI CUA templates
- Add `batch_computer_actions` function tool that executes multiple browser actions in a single API call, reducing latency
- Add local test scripts (`test.local.ts` / `test_local.py`) that create remote Kernel browsers for testing without deploying a Kernel app

## Details

**New `KernelComputer` class** (TS + Python) wraps the Kernel SDK for all computer actions:

- `captureScreenshot`, `clickMouse`, `typeText`, `pressKey`, `scroll`, `moveMouse`, `dragMouse`
- `batch` endpoint for batched actions
- `playwright.execute` for navigation (`goto`, `back`, `forward`, `getCurrentUrl`)
- CUA key name to X11 keysym translation map (ported from Go reference implementation)
- Button normalization (CUA model sends numeric button values `1`/`2`/`3` in batch calls)

**Batch tool**: System instructions guide the model to prefer `batch_computer_actions` for predictable sequences (e.g., click + type + enter).

**Removed dependencies**: `playwright-core`, `sharp` (TS), `playwright` (Python). Bumped `@onkernel/sdk` to `^0.38.0` and `kernel` to `>=0.38.0`.

## Test plan

- [x] TypeScript `test.local.ts` E2E: created remote Kernel browser, ran CUA agent (eBay search task), batch tool used successfully, browser cleaned up
- [x] Python `test_local.py` E2E: same test, batch tool used on first action (type + enter), agent completed successfully
- [x] TypeScript compiles cleanly (`tsc --noEmit`)

Made with [Cursor](https://cursor.com)

<!-- CURSOR_SUMMARY -->

---

> [!NOTE]
> **Medium Risk**
> Moderate risk because it replaces the core browser automation layer (Playwright/CDP) with new Kernel API wrappers and batching semantics, which could change runtime behavior and failure modes. No auth/data model changes beyond new optional replay recording and improved OpenAI request retry handling.
>
> **Overview**
> These templates now use **Kernel’s native computer control API** (screenshot/click/type/scroll/batch) instead of Playwright-over-CDP, via new `KernelComputer` wrappers in both Python and TypeScript.
>
> The agent loop is updated to support a new `batch_computer_actions` function tool (plus `computer_use_extra` for `goto`/`back`/`url`) with explicit model instructions, post-action screenshot settling, and structured `on_event` streaming for prompts, reasoning, actions, screenshots, and errors.
>
> Adds local runners (`run_local.py`, `run_local.ts`) to test against a remote Kernel browser without deployment, optional browser replay recording (`replay.py` / `lib/replay.ts`) surfaced as `replay_url`, and updates dependencies/config/docs (Kernel SDK bump, Playwright/sharp/Pillow removals, `.env.example` includes `KERNEL_API_KEY`, `.gitignore` adds caches).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 9f0f9b5. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
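The message above says `KernelComputer` translates CUA key names to X11 keysyms and normalizes the numeric button values (`1`/`2`/`3`) the model sends in batch calls, but that code is not part of the diff shown on this page. The following is a minimal, hypothetical Python sketch of that normalization step. It assumes a small subset of CUA key names (the commit's real map is ported from a Go reference implementation and is not reproduced here), the X11 button convention 1 = left, 2 = middle, 3 = right, and an illustrative action-dict shape with `keys` and `button` fields.

```python
"""Hypothetical sketch of KernelComputer-style input normalization.

Assumptions (not taken from the commit): the CUA key names below, the X11
keysym names they map to, the 1/2/3 -> left/middle/right button convention,
and the shape of the action dicts.
"""
from typing import Any

# Assumed subset of CUA key names -> X11 keysym names.
CUA_TO_X11_KEYSYM: dict[str, str] = {
    "ENTER": "Return",
    "RETURN": "Return",
    "ESC": "Escape",
    "ESCAPE": "Escape",
    "BACKSPACE": "BackSpace",
    "TAB": "Tab",
    "SPACE": "space",
    "DELETE": "Delete",
    "HOME": "Home",
    "END": "End",
    "PAGEUP": "Page_Up",
    "PAGEDOWN": "Page_Down",
    "ARROWUP": "Up",
    "ARROWDOWN": "Down",
    "ARROWLEFT": "Left",
    "ARROWRIGHT": "Right",
    "CTRL": "Control_L",
    "SHIFT": "Shift_L",
    "ALT": "Alt_L",
    "CMD": "Super_L",
}

# X11 numbering: Button1 = left, Button2 = middle, Button3 = right.
NUMERIC_BUTTONS: dict[int, str] = {1: "left", 2: "middle", 3: "right"}


def to_keysym(key: str) -> str:
    """Translate one CUA key name; unmapped keys (e.g. letters) pass through."""
    return CUA_TO_X11_KEYSYM.get(key.upper(), key)


def normalize_button(button: Any) -> str:
    """Accept named buttons ('left', 'RIGHT') or numeric 1/2/3 (int or digit string)."""
    if isinstance(button, str) and button.isdigit():
        button = int(button)
    if isinstance(button, int):
        if button not in NUMERIC_BUTTONS:
            raise ValueError(f"unsupported mouse button: {button}")
        return NUMERIC_BUTTONS[button]
    return str(button).lower()


def normalize_batch(actions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return normalized copies of batched actions, leaving the input untouched."""
    normalized = []
    for action in actions:
        action = dict(action)  # shallow copy so the model's tool arguments are not mutated
        if "keys" in action:
            action["keys"] = [to_keysym(k) for k in action["keys"]]
        if "button" in action:
            action["button"] = normalize_button(action["button"])
        normalized.append(action)
    return normalized
```

For example, a batched click with `"button": 2` comes out as `"button": "middle"`, and a keypress with `["CTRL", "a"]` becomes `["Control_L", "a"]`. Doing this once, before the actions are sent, keeps the batch path and the single-action path consistent.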
1 parent 0bb2afc commit a3de2ca

38 files changed (+3736, -1447 lines changed)

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -44,3 +44,7 @@ kernel
 
 # QA testing directories
 qa-*
+
+
+__pycache__
+.dmux/
Lines changed: 2 additions & 1 deletion
@@ -1,2 +1,3 @@
-# Copy this file to .env and fill in your API key
+# Copy this file to .env and fill in your API keys
 OPENAI_API_KEY=your_openai_api_key_here
+KERNEL_API_KEY=your_kernel_api_key_here
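Because the example environment file now lists both `OPENAI_API_KEY` and `KERNEL_API_KEY`, a runner can check for both before starting a browser session. The sketch below is hypothetical: it uses a tiny stdlib parser rather than whatever loader the template actually uses, handles only simple `KEY=value` lines, and treats values beginning with `your_` as unfilled placeholders (a heuristic based on the example values above).

```python
"""Hypothetical sketch: parse a simple .env file and fail fast on missing keys.

Assumptions: only plain KEY=value lines are supported (no multiline values or
variable expansion), and values starting with "your_" are placeholders.
"""

REQUIRED_KEYS = ("OPENAI_API_KEY", "KERNEL_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and simple single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_keys(env: dict[str, str]) -> dict[str, str]:
    """Return the required keys, raising if any is missing or still a placeholder."""
    missing = [
        key for key in REQUIRED_KEYS
        if not env.get(key) or env[key].startswith("your_")
    ]
    if missing:
        raise RuntimeError(f"missing API keys in .env: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS}
```

Checking up front turns a forgotten `KERNEL_API_KEY` into one clear error instead of a failure partway through the agent loop.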
Lines changed: 23 additions & 3 deletions
@@ -1,7 +1,27 @@
 # Kernel Python Sample App - OpenAI Computer Use
 
-This is a Kernel application that demonstrates using the Computer Use Agent (CUA) from OpenAI.
+This is a Kernel application that demonstrates using the Computer Use Agent (CUA) from OpenAI with Kernel's native browser control API.
 
-It generally follows the [OpenAI CUA Sample App Reference](https://github.com/openai/openai-cua-sample-app) and uses Playwright via Kernel for browser automation.
+It uses Kernel's computer control endpoints (screenshot, click, type, scroll, batch, etc.) and includes a `batch_computer_actions` tool that executes multiple actions in a single API call for lower latency.
 
-See the [docs](https://www.kernel.sh/docs/quickstart) for more information.
+## Local testing
+
+You can test against a remote Kernel browser without deploying:
+
+```bash
+cp .env.example .env
+# Fill in OPENAI_API_KEY and KERNEL_API_KEY in .env
+uv run run_local.py
+uv run run_local.py --task "go to https://news.ycombinator.com and get the top 5 articles"
+```
+
+The local runner defaults to a built-in sample task. Pass `--task "..."` to run a custom prompt locally, and add `--debug` to include verbose in-flight events.
+
+## Deploy to Kernel
+
+```bash
+kernel deploy main.py --env-file .env
+kernel invoke python-openai-cua cua-task -p '{"task":"go to https://news.ycombinator.com and list top 5 articles"}'
+```
+
+See the [docs](https://www.kernel.sh/docs/quickstart) for more information.
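The updated README describes `run_local.py` as accepting `--task` and `--debug`, and `kernel invoke` as taking a JSON payload via `-p`. A hypothetical sketch of that command-line surface follows; `DEFAULT_TASK` is a placeholder and the helper names are illustrative, not the template's code. `json.dumps` handles JSON escaping, while `shlex.quote` handles shell quoting, which matters as soon as a task contains a single quote that would break the README's `'...'` literal.

```python
"""Hypothetical sketch of the local runner's CLI and the invoke payload.

Assumptions: DEFAULT_TASK is a placeholder, and build_parser / invoke_payload /
invoke_command are illustrative names, not functions from the template.
"""
import argparse
import json
import shlex

DEFAULT_TASK = "placeholder sample task"  # stands in for the runner's built-in task


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run the OpenAI CUA agent against a remote Kernel browser"
    )
    parser.add_argument(
        "--task", default=DEFAULT_TASK,
        help="prompt for the agent (defaults to a built-in sample task)",
    )
    parser.add_argument(
        "--debug", action="store_true",
        help="include verbose in-flight events",
    )
    return parser


def invoke_payload(task: str) -> str:
    """JSON for `kernel invoke ... -p`; json.dumps escapes quotes and backslashes."""
    return json.dumps({"task": task})


def invoke_command(task: str) -> str:
    """Shell command mirroring the README's invoke example, safely quoted."""
    payload = shlex.quote(invoke_payload(task))
    return f"kernel invoke python-openai-cua cua-task -p {payload}"
```

For example, `build_parser().parse_args(["--task", "list top 5 articles", "--debug"])` yields `task="list top 5 articles"` and `debug=True`, and `invoke_command("it's done")` produces a command whose last shell word is exactly the JSON payload.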
