feat(screenshot): add CLI options to cap screenshot size at the source #1823

Open
antoinekm wants to merge 1 commit into ChromeDevTools:main from antoinekm:feat/screenshot-size-cli-options

@antoinekm commented Apr 7, 2026

Summary

Adds opt-in CLI flags so operators can cap the size of screenshots returned by take_screenshot before they are embedded in the MCP response. Refs #879.

The flags address two related symptoms reported when MCP clients display screenshots inline:

  1. Per-image dimension limit: hosted LLM APIs commonly reject images that exceed a per-image dimension cap (typical caps are in the 2000-8000 px range, and some APIs lower the cap when many images are in the same request). This is the exact error reported in issue #879, "Claude errors if the image dimensions exceed 8000 pixels".
  2. Cumulative request size: after many captures, the cumulative base64 payload eventually pushes a request over the per-call body size limit imposed by the LLM API.

Both can be mitigated at the source by reducing format/quality and downscaling the capture.
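
To make the cumulative-size symptom concrete, here is a minimal sketch of the payload arithmetic. It relies only on the standard property that base64 encodes every 3 bytes as 4 characters; the function names are hypothetical and do not come from this PR.

```typescript
// Hypothetical helpers for illustration; not part of the PR's diff.

// Length of the padded base64 encoding of `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Total base64 characters contributed by a series of inline screenshots.
function cumulativeBase64Length(imageByteLengths: number[]): number {
  return imageByteLengths.reduce((sum, n) => sum + base64Length(n), 0);
}

// Ten 1.5 MB captures contribute 20,000,000 base64 characters in total.
const total = cumulativeBase64Length(new Array<number>(10).fill(1_500_000));
```

Because the encoding inflates every image by about a third, shrinking each capture at the source reduces the accumulated request body proportionally.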

New flags (all opt-in)

  • --screenshot-format <jpeg|png|webp>: override the default format used by take_screenshot when the caller does not specify one
  • --screenshot-quality <0-100>: override the default JPEG/WebP quality. Ignored for PNG
  • --screenshot-max-width <px>: downscale screenshots wider than this before they are returned
  • --screenshot-max-height <px>: downscale screenshots taller than this. Combines with --screenshot-max-width; the smaller scale wins so both bounds are respected while preserving aspect ratio
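
The "smaller scale wins" rule can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `computeDownscale`, `Size`, and `MaxBounds` are hypothetical names, and the sketch assumes a scale of 1 means "do not resize".

```typescript
// Hypothetical helper; names are illustrative and not taken from the PR's diff.
interface Size {
  width: number;
  height: number;
}

interface MaxBounds {
  maxWidth?: number;
  maxHeight?: number;
}

// Returns a factor in (0, 1]. Each exceeded bound proposes its own factor and
// the smaller one is kept, so both bounds hold and the aspect ratio is preserved.
function computeDownscale(source: Size, bounds: MaxBounds): number {
  let scale = 1;
  if (bounds.maxWidth !== undefined && source.width > bounds.maxWidth) {
    scale = Math.min(scale, bounds.maxWidth / source.width);
  }
  if (bounds.maxHeight !== undefined && source.height > bounds.maxHeight) {
    scale = Math.min(scale, bounds.maxHeight / source.height);
  }
  return scale;
}

// A 4000x10000 capture with both bounds at 2000 is limited by its height.
const scale = computeDownscale(
  { width: 4000, height: 10000 },
  { maxWidth: 2000, maxHeight: 2000 },
);
```

A source that already fits within the bounds yields a factor of 1, which matches the "does not resize when source is smaller than the max bounds" test case.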

For the exact error in #879, the recipe is --screenshot-max-width=8000 --screenshot-max-height=8000 (or a smaller value such as 2000 if many images may end up in the same request, depending on the operator's chosen API).

Implementation

  • Resizing leverages Puppeteer's clip.scale (CDP Page.captureScreenshot), so no new dependencies.
  • Source dimensions per capture mode:
    • viewport: page.viewport()
    • full page: document.documentElement.scrollWidth/scrollHeight via page.evaluate()
    • element (uid): elementHandle.boundingBox()
  • For element and full-page captures with a downscale clip, the call routes through page.screenshot({clip}) so the scale parameter applies. captureBeyondViewport is left to Puppeteer's default (true when a clip is set), preserving correct behavior for elements below the fold and full-page captures.
  • ~150 lines of source code, ~200 lines of new tests.
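
As a rough sketch of how the three capture modes could feed a single clip, assuming the downscale factor has already been computed: the types and function names below are hypothetical, and the only Puppeteer behavior relied upon is that the `clip` option of `page.screenshot()` accepts an optional `scale` field. The viewport branch additionally assumes the page is not scrolled; a real implementation would have to account for the scroll offset.

```typescript
// Hypothetical types and helpers; not taken from the PR's diff.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Same shape as the clip accepted by page.screenshot({clip}).
interface ScaledClip extends Box {
  scale: number;
}

type CaptureSource =
  | { mode: "viewport"; viewport: { width: number; height: number } } // from page.viewport()
  | { mode: "fullPage"; scrollWidth: number; scrollHeight: number } // from page.evaluate()
  | { mode: "element"; boundingBox: Box | null }; // from elementHandle.boundingBox()

// Normalizes each capture mode to the rectangle it covers, in CSS pixels.
function sourceBox(source: CaptureSource): Box | null {
  switch (source.mode) {
    case "viewport":
      // Assumes an unscrolled page (see lead-in).
      return { x: 0, y: 0, width: source.viewport.width, height: source.viewport.height };
    case "fullPage":
      return { x: 0, y: 0, width: source.scrollWidth, height: source.scrollHeight };
    case "element":
      return source.boundingBox;
  }
}

// Returns a clip only when a downscale is needed; undefined means
// "use the existing capture path unchanged".
function buildClip(source: CaptureSource, scale: number): ScaledClip | undefined {
  if (scale >= 1) return undefined;
  const box = sourceBox(source);
  if (box === null) return undefined; // e.g. the element is not rendered
  return { ...box, scale };
}
```

Returning `undefined` when no downscale applies keeps the unmodified screenshot call in place, which is one way to preserve the opt-in guarantee described in this PR.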

Backwards compatibility

Fully opt-in: when no flags are set, take_screenshot returns the exact same bytes as before. No behavioral change for existing users.
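
One way to read the precedence described for --screenshot-format and --screenshot-quality is sketched below. The names are hypothetical, and the sketch assumes that a caller-supplied value always beats the CLI default and that "png" is the fallback, as the test list in this PR suggests.

```typescript
// Hypothetical names; a sketch of the described precedence, not the PR's code.
type ScreenshotFormat = "jpeg" | "png" | "webp";

interface CliDefaults {
  screenshotFormat?: ScreenshotFormat;
  screenshotQuality?: number;
}

interface CallerParams {
  format?: ScreenshotFormat;
  quality?: number;
}

interface Encoding {
  format: ScreenshotFormat;
  quality?: number;
}

function resolveEncoding(caller: CallerParams, cli: CliDefaults): Encoding {
  // Caller value first, then the CLI default, then the historical default.
  const format = caller.format ?? cli.screenshotFormat ?? "png";
  if (format === "png") {
    return { format }; // quality is ignored for PNG
  }
  const quality = caller.quality ?? cli.screenshotQuality;
  return quality === undefined ? { format } : { format, quality };
}

// With no flags and no caller overrides, the result is plain PNG as before.
const unchanged = resolveEncoding({}, {});
```

With empty CLI defaults the function reduces to the caller's own parameters, which is the opt-in property this section describes.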

Design alignment

  • Aligned with the "Reference over Value" principle in docs/design-principles.md: the existing 2 MB threshold still routes oversized screenshots to a temporary file. This change only reduces the size of the inline base64 fallback path, which the principles document calls out as an acceptable exception when MCP clients display images natively.
  • The MCP server hardcodes no LLM-specific size limits. Operators pick the values that match their client/model combination. This keeps the maintenance surface here minimal as model limits evolve, and is intended as a complement to, not a replacement for, fixes in the MCP client itself.

Addressing concerns raised in #879

"It's not feasible for us to maintain this. Limits will change when models change." (@natorion)

The flags are pure parameters; nothing about the upstream LLM is encoded in the server. When a vendor raises (or lowers) a limit, no code change is needed here, only the operator's CLI args change.

"filePath / page_resize already work as a workaround." (@OrKoN)

filePath is great when the call site knows it's about to take a huge screenshot, but as you noted earlier in the thread, an oversized image already in the request history keeps causing failures even on subsequent calls. page_resize works but mutates the page being debugged. The resize in this PR happens between Puppeteer and the MCP response, so the inspected page is untouched and the failure mode is prevented at the source.

"Should be fixed client side."

Agreed, this PR is intended as a complement, not a substitute. A client-side fix (e.g. compaction evicts/downsamples old images) handles the cumulative case for any MCP. A server-side cap handles the per-call dimension limit for users who hit it before compaction can kick in. The two address overlapping but distinct failure modes.

Happy to drop or rework any of this if the maintainers prefer a different shape, for example making the threshold automatic from a single --max-image-bytes knob, or rejecting the PR entirely in favor of waiting for a client-side fix. Just wanted to put a concrete option on the table.

Tests

Added 6 new tests:

  • honors screenshotFormat default from CLI args
  • keeps "png" as default format when no CLI override is set
  • downscales viewport screenshot when screenshotMaxWidth is set
  • downscales using the smaller scale when both max-width and max-height are set
  • does not resize when source is smaller than the max bounds
  • downscales full page screenshot when screenshotMaxWidth is set

All 627 tests in the suite pass. npm run typecheck and npm run check-format are clean.

Notes for reviewers

  • The dimensions compared against --screenshot-max-width/height are CSS pixels (page.viewport()), not raw bitmap pixels. With deviceScaleFactor > 1 (HiDPI emulation) the actual bitmap may still be larger. Happy to clarify this in the option description if preferred.
  • For element captures with a downscale clip, the call routes through page.screenshot({clip}) instead of element.screenshot(). Same-frame elements are correct (boundingBox returns main-frame coords). I have not exercised this path against cross-origin iframe elements; let me know if you'd like a fallback there.
  • The PR is currently in Draft state pending CLA verification and any feedback on the framing above.
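
The CSS-pixel caveat in the first note can be illustrated with a small estimate. This sketch assumes the emitted bitmap is roughly the CSS size multiplied by both the clip scale and deviceScaleFactor; the exact rounding Chrome applies is not verified here, and the function name is hypothetical.

```typescript
// Hypothetical helper; an estimate under the stated assumption, not the PR's code.
interface CssSize {
  width: number;
  height: number;
}

// Approximate bitmap dimensions for a capture of `css` CSS pixels taken with
// the given clip scale on a page emulating `deviceScaleFactor`.
function estimateBitmapSize(css: CssSize, scale: number, deviceScaleFactor: number): CssSize {
  return {
    width: Math.round(css.width * scale * deviceScaleFactor),
    height: Math.round(css.height * scale * deviceScaleFactor),
  };
}

// A 1600x1200 CSS-pixel viewport with --screenshot-max-width=800 gets scale 0.5,
// but with deviceScaleFactor 2 the estimated bitmap is still 1600x1200.
const hiDpi = estimateBitmapSize({ width: 1600, height: 1200 }, 0.5, 2);
```

Under this assumption, an operator who emulates HiDPI and needs a hard bitmap limit would pass a max value divided by the deviceScaleFactor in use.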

Refs #879

@antoinekm marked this pull request as draft April 7, 2026 09:13

Adds opt-in CLI flags so operators can cap the size of screenshots
returned by `take_screenshot` before they are embedded in the MCP
response. Addresses two related symptoms reported when MCP clients
display screenshots inline:

1. The hosted LLM API rejects images exceeding its per-image dimension
   limits (e.g. Anthropic's 8000x8000 px / 2000x2000 px when >20
   images are in the same request).
2. After many captures the cumulative base64 payload pushes the
   request over the per-call body size limit.

Both can be mitigated at the source by reducing format/quality and
downscaling the capture.

New CLI flags (all opt-in, no behavior change when unset):

- --screenshot-format <jpeg|png|webp>: override the default format
  used by take_screenshot when the caller does not specify one.
- --screenshot-quality <0-100>: override the default JPEG/WebP
  quality when the caller does not specify one. Ignored for PNG.
- --screenshot-max-width <px>: downscale screenshots wider than this
  before they are returned.
- --screenshot-max-height <px>: downscale screenshots taller than
  this before they are returned. Combines with --screenshot-max-width;
  the smaller scale wins so both bounds are respected while preserving
  aspect ratio.

Resizing leverages Puppeteer's clip.scale (CDP Page.captureScreenshot)
so no new dependencies are introduced. Source dimensions are computed
per capture mode:

- viewport: page.viewport()
- full page: document.documentElement.scrollWidth/scrollHeight via
  page.evaluate()
- element (uid): elementHandle.boundingBox()

For element and full-page captures with a downscale clip, the call is
routed through page.screenshot({clip}) so the scale parameter applies.
captureBeyondViewport is left to Puppeteer's default (true when a clip
is set), which preserves correct behavior for elements below the fold
and for full-page captures.

Design notes:

- Aligned with the "Reference over Value" principle in
  docs/design-principles.md: the existing 2 MB threshold still routes
  oversized screenshots to a temporary file. This change only reduces
  the size of the inline base64 fallback path, which the principles
  document calls out as an acceptable exception when MCP clients
  display images natively.
- Fully opt-in: when no flags are set, take_screenshot returns the
  exact same bytes as before. No breaking change.
- The MCP server hardcodes no LLM-specific size limits — operators
  pick the values that match their client/model combination. This
  keeps the maintenance surface minimal as model limits evolve and
  is intended as a complement to, not a replacement for, fixes in
  the MCP client itself.
- Compares against CSS pixels (page.viewport()), not raw bitmap
  pixels, so HiDPI emulation behaves predictably from the user's
  perspective.

Tests added (6 new):

- honors screenshotFormat default from CLI args
- keeps "png" as default format when no CLI override is set
- downscales viewport screenshot when screenshotMaxWidth is set
- downscales using the smaller scale when both max-width and
  max-height are set
- does not resize when source is smaller than the max bounds
- downscales full page screenshot when screenshotMaxWidth is set

Refs ChromeDevTools#879
@antoinekm force-pushed the feat/screenshot-size-cli-options branch from 90b3282 to f09e1ea April 7, 2026 09:15
@antoinekm changed the title from "feat(screenshot): add CLI options to reduce screenshot size in MCP responses" to "feat(screenshot): add CLI options to cap screenshot size at the source" Apr 7, 2026
@antoinekm marked this pull request as ready for review April 7, 2026 09:32