fix: sanitize malformed Unicode in MCP responses #39625
Open
furkankoykiran wants to merge 3 commits into microsoft:main
Member
Also looks like agent-generated blob that fails lint. Better place to handle it though. Assume Node 20 and hand-craft the tests please.
Add sanitizeUnicode() function to replace lone surrogates with U+FFFD before JSON serialization. This prevents "invalid high surrogate in string" errors when page content contains malformed Unicode. Uses String.prototype.toWellFormed() on Node 20+, with fallback for Node 18 compatibility. Integrated into response serialization pipeline alongside existing redactText() function.
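A minimal sketch of what this commit message describes, assuming a native path via String.prototype.toWellFormed() and a hand-written fallback for older runtimes. The body below is illustrative only; the actual sanitizeUnicode() in response.ts is not shown on this page and may differ.

```typescript
// Illustrative only: not the PR's actual implementation.
// Lets us probe for toWellFormed() without requiring the ES2024 lib typings.
type MaybeWellFormed = string & { toWellFormed?: () => string };

function sanitizeUnicode(text: string): string {
  // Native path (available on Node 20+): replaces lone surrogates with U+FFFD.
  const native = (text as MaybeWellFormed).toWellFormed;
  if (typeof native === 'function')
    return native.call(text);

  // Fallback: walk UTF-16 code units, keep valid pairs, replace lone surrogates.
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    const isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Valid surrogate pair: copy both code units unchanged.
        out += text.slice(i, i + 2);
        i++;
        continue;
      }
    }
    out += isHigh || isLow ? '\uFFFD' : text.charAt(i);
  }
  return out;
}
```

Both paths should agree: a valid pair such as an emoji passes through untouched, while a surrogate without its partner becomes U+FFFD.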
Add comprehensive tests for malformed Unicode handling in MCP responses:
- Lone high surrogates
- Lone low surrogates
- Valid surrogate pairs (emoji)
- Mixed CJK content with malformed Unicode
- Multiple consecutive lone surrogates
- Console messages with lone surrogates

All tests verify that MCP responses don't fail with JSON serialization errors when encountering malformed Unicode from page content.
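To make the listed categories concrete, here is a small table of hypothetical inputs with the output one would expect after sanitization. The strings and the replaceLoneSurrogates() helper are stand-ins chosen for illustration; they are not taken from unicode-serialization.spec.ts, and the console-message case is omitted because it depends on the MCP test harness.

```typescript
// Hypothetical sample inputs, one per category in the commit message.
const cases: Array<{ name: string; input: string; expected: string }> = [
  { name: 'lone high surrogate', input: 'a\uD83Db', expected: 'a\uFFFDb' },
  { name: 'lone low surrogate', input: 'a\uDE00b', expected: 'a\uFFFDb' },
  { name: 'valid surrogate pair (emoji)', input: '\uD83D\uDE00', expected: '\uD83D\uDE00' },
  { name: 'mixed CJK with malformed Unicode', input: '\u4F60\u597D\uD800', expected: '\u4F60\u597D\uFFFD' },
  { name: 'multiple consecutive lone surrogates', input: '\uD800\uD800\uD800', expected: '\uFFFD\uFFFD\uFFFD' },
];

// Stand-in sanitizer (an assumption, not the PR's code): the first alternative
// consumes valid pairs so they are kept; any other surrogate is replaced.
function replaceLoneSurrogates(text: string): string {
  return text.replace(
      /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
      match => (match.length === 2 ? match : '\uFFFD'));
}

// Compare each sanitized input against its expected value.
const results = cases.map(c => ({
  name: c.name,
  ok: replaceLoneSurrogates(c.input) === c.expected,
}));
```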
- Remove Node 18 fallback, use toWellFormed() only
- Rewrite tests to match MCP test patterns (3 focused tests)
- Use server.setContent() for simpler test setup
- Reduce test complexity while maintaining coverage
Contributor, Author
Thanks for the feedback. I've simplified the changes based on your suggestions: Changes made:
The branch has also been rebased onto latest main and the PR is now up to date.
Summary
Fixes "invalid high surrogate in string" JSON serialization errors in MCP responses when page content contains malformed Unicode (lone surrogates).
Changes
- Add sanitizeUnicode() function (response.ts)
  - Uses String.prototype.toWellFormed() on Node 20+ for native Unicode sanitization
- Integrate into response serialization
  - Applied alongside the existing redactText() function in the serialize() method
- Add comprehensive tests (unicode-serialization.spec.ts)
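A rough sketch of how the sanitizer might sit next to redactText() inside a serialize() step. Everything here is an assumption made for illustration: the signatures, the redaction logic, the order of the two calls, and the result shape are hypothetical, since the real code in response.ts is not shown on this page.

```typescript
// Hypothetical stand-in for the existing redactText(): masks known secret values.
function redactText(text: string, secrets: string[]): string {
  let result = text;
  for (const secret of secrets) {
    if (secret)
      result = result.split(secret).join('<redacted>');
  }
  return result;
}

// Hypothetical stand-in for sanitizeUnicode(): keeps valid pairs, replaces lone surrogates.
function sanitizeUnicode(text: string): string {
  return text.replace(
      /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
      match => (match.length === 2 ? match : '\uFFFD'));
}

// Hypothetical serialize(): every text section passes through both filters
// before it is turned into JSON. The call order is an assumption.
function serialize(sections: string[], secrets: string[]): string {
  const text = sections
      .map(section => sanitizeUnicode(redactText(section, secrets)))
      .join('\n');
  return JSON.stringify({ content: [{ type: 'text', text }] });
}
```

Handling this where text is already post-processed means every response path gets the same treatment, which is the reason given under Context for preferring response.ts over the transport layer.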
Context
Closes microsoft/playwright-mcp#1447
Previous attempt (PR #1448 in playwright-mcp) was closed because it fixed the issue at the transport layer in cli.js. The proper fix location is in response.ts, where text processing already happens via redactText().
Verification
npm run ctest-mcp unicode-serialization # 6 passed (35.3s)
All tests pass, confirming that: