fix(core): parse screenshot MIME in GoogleCUAClient function responses#2159

Open
yawbtng wants to merge 2 commits into
browserbase:mainfrom
yawbtng:fix/google-cua-mime-type

@yawbtng yawbtng commented May 22, 2026

why

Closes #2046. GoogleCUAClient builds function-response image parts with inlineData.mimeType = "image/png" and strips the data URL prefix using a PNG-only regex. If a screenshot is JPEG/WebP (or anything non-PNG), the MIME metadata is wrong and the regex is a no-op, so the whole data URL ends up as the payload.

what changed

  • Added parseImageDataUrl(input) helper that extracts { mimeType, data } from an image/<type> data URL, falling back to image/png + the raw input for unrecognized formats (preserves prior behavior).
  • Replaced the hardcoded PNG strip + hardcoded mimeType at the function-response call site with the helper's parsed values.
  • No change to captureScreenshot() — it still normalizes raw provider input to PNG data URLs, matching the issue's stated current contract.
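
To make the first two bullets concrete, here is a minimal sketch of what such a helper and its call site could look like. The pattern is taken from the diff in this PR (with the newline-tolerant payload group from the follow-up commit); the `ParsedImage` interface, the `toInlineDataPart` wrapper, and the exact part shape are assumptions for illustration and may not match the real code.

```typescript
// Hypothetical sketch; the helper actually merged in the PR may differ.
interface ParsedImage {
  mimeType: string;
  data: string;
}

// [\s\S]* (rather than .*) lets the payload group span line breaks.
const IMAGE_DATA_URL_PATTERN =
  /^data:(image\/[a-zA-Z0-9.+-]+);base64,([\s\S]*)$/;

function parseImageDataUrl(input: string): ParsedImage {
  const match = IMAGE_DATA_URL_PATTERN.exec(input);
  if (match === null) {
    // Raw base64 or a non-image data URL: keep the previous PNG assumption
    // and pass the input through unchanged.
    return { mimeType: "image/png", data: input };
  }
  const [, mimeType = "image/png", data = ""] = match;
  return { mimeType, data };
}

// Assumed call-site shape: the parsed values replace the hardcoded
// "image/png" and the PNG-only prefix strip.
function toInlineDataPart(screenshot: string): { inlineData: ParsedImage } {
  return { inlineData: parseImageDataUrl(screenshot) };
}
```

For example, `toInlineDataPart("data:image/jpeg;base64,AAAA")` would yield an `inlineData` of `{ mimeType: "image/jpeg", data: "AAAA" }` under these assumptions.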

test plan

New packages/core/tests/unit/google-cua-client.test.ts covering:

  • PNG data URL → image/png + base64 payload
  • JPEG data URL → image/jpeg + base64 payload
  • WebP data URL → image/webp + base64 payload
  • Raw base64 → fallback to image/png with the input as data
  • Non-image data URL (e.g. text/plain) → fallback to image/png
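
The cases above can be written as a small table-driven check. This is only a sketch: the `parseImageDataUrl` below is a compact stand-in written for this example (not the PR's source), the `Case` type and `runCases` function are hypothetical, and the real test file presumably uses the repository's own test runner rather than a hand-rolled loop.

```typescript
// Compact stand-in for the helper under test (assumed behavior, per the PR text).
function parseImageDataUrl(input: string): { mimeType: string; data: string } {
  const match = /^data:(image\/[a-zA-Z0-9.+-]+);base64,([\s\S]*)$/.exec(input);
  if (match === null) return { mimeType: "image/png", data: input };
  const [, mimeType = "image/png", data = ""] = match;
  return { mimeType, data };
}

interface Case {
  name: string;
  input: string;
  mimeType: string;
  data: string;
}

const cases: Case[] = [
  { name: "PNG data URL", input: "data:image/png;base64,AAAA", mimeType: "image/png", data: "AAAA" },
  { name: "JPEG data URL", input: "data:image/jpeg;base64,AAAA", mimeType: "image/jpeg", data: "AAAA" },
  { name: "WebP data URL", input: "data:image/webp;base64,AAAA", mimeType: "image/webp", data: "AAAA" },
  // Fallbacks: the input is passed through untouched with a PNG MIME type.
  { name: "raw base64", input: "AAAA", mimeType: "image/png", data: "AAAA" },
  {
    name: "non-image data URL",
    input: "data:text/plain;base64,AAAA",
    mimeType: "image/png",
    data: "data:text/plain;base64,AAAA",
  },
];

// Returns the names of failing cases; an empty array means all cases pass.
function runCases(): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const got = parseImageDataUrl(c.input);
    if (got.mimeType !== c.mimeType || got.data !== c.data) {
      failures.push(c.name);
    }
  }
  return failures;
}
```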

Also ran pnpm run typecheck and pnpm run eslint in packages/core — both clean.


Summary by cubic

Fixes screenshot MIME parsing in GoogleCUAClient function responses by deriving the type and base64 payload from image data URLs, including newline-wrapped payloads. Addresses issue #2046 and ensures JPEG/WebP screenshots set the correct inlineData.mimeType, with a PNG fallback for raw inputs.

  • Bug Fixes
    • Added parseImageDataUrl(input) to extract { mimeType, data } from data:image/*;base64,..., handling newline-wrapped base64.
    • Replaced PNG-only strip and hardcoded mimeType with parsed values.
    • Added unit tests for PNG, JPEG, WebP, newline-wrapped data, raw base64, and non-image data URLs.

Written for commit 246ef50. Summary will update on new commits.

changeset-bot Bot commented May 22, 2026

🦋 Changeset detected

Latest commit: 246ef50

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/browse-cli | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |

@github-actions
Contributor

This PR is from an external contributor and must be approved by a stagehand team member with write access before CI can run.
Approving the latest commit mirrors it into an internal PR owned by the approver.
If new commits are pushed later, the internal PR stays open but is marked stale until someone approves the latest external commit and refreshes it.

github-actions Bot added labels on May 22, 2026:

  • external-contributor: Tracks PRs mirrored from external contributor forks.
  • external-contributor:awaiting-approval: Waiting for a stagehand team member to approve the latest external commit.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b7e1b1861

} from "../flowlogger/FlowLogger.js";
import { v7 as uuidv7 } from "uuid";

const IMAGE_DATA_URL_PATTERN = /^data:(image\/[a-zA-Z0-9.+-]+);base64,(.*)$/;

P2: Handle newline-wrapped base64 data URLs

The new parser regex uses (.*) for the payload, but . does not match line terminators in JavaScript regexes unless the s (dotAll) flag is set, so valid data:image/...;base64,... strings that contain line breaks will fail to match and fall back to returning the entire data URL as inlineData.data. In that case Google receives data:image/...;base64,... instead of raw base64 bytes, which can break function-response image decoding. For PNG inputs this is a regression from the previous replace(/^data:image\/png;base64,/, "") behavior, which stripped the prefix regardless of payload newlines.
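
The difference is easy to reproduce in isolation. The snippet below is a small illustration with a made-up, truncated payload (the variable names are hypothetical): it compares the (.*) pattern from the reviewed commit, a [\s\S]* variant, and the legacy PNG-only prefix strip on a newline-wrapped input.

```typescript
// Made-up, truncated payload with a line break in the middle.
const wrapped = "data:image/png;base64,iVBORw0KGgo\nAAAANSUhEUg";

// Payload group as in the reviewed commit: "." stops at the line break,
// and "$" (no "m" flag) only matches at the very end of the input.
const dotPattern = /^data:(image\/[a-zA-Z0-9.+-]+);base64,(.*)$/;

// Payload group that accepts any character, including line breaks.
const anyCharPattern = /^data:(image\/[a-zA-Z0-9.+-]+);base64,([\s\S]*)$/;

const dotMatches: boolean = dotPattern.test(wrapped); // false
const anyMatches: boolean = anyCharPattern.test(wrapped); // true

// Legacy behavior: only the prefix is anchored, so newlines in the
// payload never mattered.
const legacy: string = wrapped.replace(/^data:image\/png;base64,/, "");

console.log({ dotMatches, anyMatches, legacy });
```

An equivalent alternative to [\s\S]* would be keeping (.*) and adding the s (dotAll) flag, which requires an ES2018 or later target.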

Author

Good catch — fixed in 246ef50. Switched (.*) to ([\s\S]*) so the payload group captures newline-wrapped (MIME-style) base64, and added a regression test (parses a PNG data URL with newline-wrapped base64 (MIME-style)) so this can't drift back. The new behavior is now a strict superset of the previous replace(/^data:image\/png;base64,/, "") — wrapped payloads, JPEG/WebP, raw base64, and unrecognized URLs all handled. 6/6 tests pass locally.

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 3 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Labels

  • external-contributor:awaiting-approval: Waiting for a stagehand team member to approve the latest external commit.
  • external-contributor: Tracks PRs mirrored from external contributor forks.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

core(cua): Google function-response image handling hardcodes PNG mimeType

1 participant