Fix JUnit XML escaping for supplementary Unicode #14491

Closed
Sean-Kenneth-Doherty wants to merge 1 commit into pytest-dev:main from Sean-Kenneth-Doherty:fix-junitxml-supplementary-unicode

@Sean-Kenneth-Doherty Sean-Kenneth-Doherty commented May 17, 2026

Fixes #14483.

Summary

  • Use an 8-digit Unicode escape for the XML supplementary-plane range in bin_xml_escape.
  • Restore regression coverage for valid XML boundary code points, including U+10000, emoji, and U+10FFFF.
  • Add the changelog entry and AUTHORS entry requested by the PR template.

The previous static regex used \u10000-\u10ffff, but Python \u escapes consume only four hex digits. The range was therefore parsed as U+1000, then the range 0 to U+10FF, then two literal f characters. That left supplementary-plane characters outside the set the regex treats as valid XML, so valid emoji and CJK extension characters were escaped as #x....
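The escape-length problem described above can be reproduced in a few lines. This is a minimal sketch, not the code in src/_pytest/junitxml.py: the pattern names, the escape helper, and the fixed-width #x replacement format are assumptions made for illustration, and the character class is only modelled on the XML 1.0 valid-character ranges named in this PR.

```python
import re

# "\u" takes exactly four hex digits, so "\u10000" is TWO characters:
# U+1000 followed by the literal digit "0". "\U" takes eight digits.
four_digit = "\u10000"
eight_digit = "\U00010000"

# Hypothetical patterns for illustration only. Each matches characters that
# are NOT in the allowed set. In BROKEN_ILLEGAL the tail parses as
# U+1000, the range "0"-U+10FF, and two literal "f" characters.
BROKEN_ILLEGAL = re.compile(
    "[^\u0009\u000a\u000d\u0020-\u007e\u0080-\ud7ff\ue000-\ufffd\u10000-\u10ffff]"
)
FIXED_ILLEGAL = re.compile(
    "[^\u0009\u000a\u000d\u0020-\u007e\u0080-\ud7ff\ue000-\ufffd"
    "\U00010000-\U0010ffff]"
)


def escape(pattern: re.Pattern, text: str) -> str:
    """Replace every character matched by `pattern` with a #x hex marker.

    The marker format here is an assumption chosen for the sketch.
    """
    return pattern.sub(lambda m: f"#x{ord(m.group()):04X}", text)


emoji = "\U0001F600"  # a supplementary-plane character that is valid in XML
broken_result = escape(BROKEN_ILLEGAL, emoji)
fixed_result = escape(FIXED_ILLEGAL, emoji)
```

Under these assumptions, the broken pattern rewrites the emoji to #x1F600, while the pattern with 8-digit escapes leaves it unchanged and still escapes characters such as NUL.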

Validation

  • .venv/bin/python -m pytest testing/test_junitxml.py::test_invalid_xml_escape -q -> 1 passed
  • .venv/bin/python -m pytest testing/test_junitxml.py -q -> 135 passed, 2 skipped
  • uvx ruff@0.15.12 check src/_pytest/junitxml.py testing/test_junitxml.py
  • uvx ruff@0.15.12 format --check src/_pytest/junitxml.py testing/test_junitxml.py
  • uvx --from mypy==2.0.0 --with iniconfig>=1.1.0 --with attrs>=19.2.0 --with pluggy>=1.5.0 --with packaging --with tomli --with types-setuptools --with types-tabulate --with exceptiongroup>=1.0.0rc8 mypy --pretty --show-error-codes --no-warn-unused-ignores src/_pytest/junitxml.py testing/test_junitxml.py -> no issues found
  • uvx --from codespell==2.4.2 codespell --toml=pyproject.toml AUTHORS changelog/14483.bugfix.rst src/_pytest/junitxml.py testing/test_junitxml.py
  • git diff --check origin/main...HEAD

Checklist

  • Add text like closes #XYZW to the PR description and/or commits.
  • If AI agents were used, they are credited in Co-authored-by commit trailers.
  • Create a new changelog file in the changelog directory.
  • Add yourself to AUTHORS in alphabetical order.

Co-authored-by: OpenAI Codex <codex@openai.com>
@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided (automation) changelog entry is part of PR label May 17, 2026
@Sean-Kenneth-Doherty
Author

Local/CI validation refresh from my side:

  • GitHub Actions matrix is green, including package, check, Read the Docs, pre-commit.ci, codecov/patch, and Python 3.10-3.14/PyPy lanes.
  • .venv/bin/python -m pytest testing/test_junitxml.py::test_invalid_xml_escape -q -> 1 passed
  • .venv/bin/python -m pytest testing/test_junitxml.py -q -> 135 passed, 2 skipped
  • git diff --check origin/main...HEAD -> clean

@RonnyPfannschmidt
Member

closing as agentic abuse after ignoring the prior close

Labels

bot:chronographer:provided (automation) changelog entry is part of PR

Development

Successfully merging this pull request may close these issues.

bin_xml_escape: supplementary plane characters (U+10000+) incorrectly escaped due to wrong unicode escape in regex

2 participants