Fix: escaping underscore \_ #1404

Merged
jbeder merged 2 commits into jbeder:master from SGSSGene:fix/underscore_escape
Feb 17, 2026

SGSSGene commented Feb 17, 2026

Fixes PR #1347.

Parsing a string scalar that contains the escape sequence \_ should yield a string holding the Unicode code point U+00A0, the non-breaking space (NBSP). In UTF-8 this code point is encoded as the two bytes 0xC2 0xA0. See Example 5.13 in the YAML specification (https://yaml.org/spec/1.2.2/#56-miscellaneous-characters).

The tests are implemented incorrectly: they check only for a single 0xA0 byte. The parser has the matching bug and converts the escape to a single 0xA0 byte, which is not valid UTF-8 on its own.
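To illustrate why a lone 0xA0 byte is wrong, here is a minimal sketch of UTF-8 encoding for a single code point. `encode_utf8` and `check_nbsp_encoding` are hypothetical helpers written for this comment, not yaml-cpp functions, and the sketch does not reject surrogates or values above U+10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of yaml-cpp): encode one Unicode code point
// as UTF-8. No validation of surrogates or out-of-range values is done.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// U+00A0 falls in the 2-byte range, so the result is 0xC2 0xA0,
// not the single byte 0xA0 that the old test expected.
void check_nbsp_encoding() {
    const std::string nbsp = encode_utf8(0x00A0);
    assert(nbsp.size() == 2);
    assert(nbsp == std::string("\xC2\xA0"));
}
```

Since 0xA0 is 10100000 in binary, it has the form of a UTF-8 continuation byte, so emitting it alone yields a string that is not valid UTF-8.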

There are two commits:

  1. fixes the unit test (the test then fails against the unfixed parser)
  2. fixes the parser

This fixes the test for "Example 5.13 Escaped Characters" from the YAML
specification.

The example requires that `\_` is translated to the Unicode code point
`\u00A0`, which is encoded in UTF-8 as the bytes `\xC2\xA0`.

With the test case corrected, the unit test no longer passes, because the
parser does not yet handle the code point `\u00A0` correctly.
(The failing unit test is expected at this commit.)
SGSSGene force-pushed the fix/underscore_escape branch from b0aa6c9 to ab33fb5 February 17, 2026 21:31
jbeder merged commit 44d5454 into jbeder:master Feb 17, 2026
31 checks passed
SGSSGene deleted the fix/underscore_escape branch February 17, 2026 22:59