Skip to content

Strip Unicode Cf characters in PrintableString#4593

Open
tnull wants to merge 2 commits intolightningdevkit:mainfrom
tnull:2026-05-printable-string-bidi
Open

Strip Unicode Cf characters in PrintableString#4593
tnull wants to merge 2 commits intolightningdevkit:mainfrom
tnull:2026-05-printable-string-bidi

Conversation

@tnull
Copy link
Copy Markdown
Contributor

@tnull tnull commented May 5, 2026

PrintableString is the sanitiser LDK uses to render untrusted strings (node aliases, BOLT-12 invoice / offer text, UntrustedString, LSPS messages, lightning-invoice descriptions) to logs and UI. It only replaced char::is_control matches (Unicode general category Cc) with U+FFFD, leaving the entire Cf (Format) category untouched.

That is the exact category covering the bidirectional override / isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack family (CVE-2021-42574): a peer can set its alias / invoice description / offer fields to e.g. safe\u{202E}cipsxe.exe, which previously passed through verbatim while a human reader sees safeexe.cips — defeating the threat model PrintableString exists to defend against.

Replace Cf codepoints alongside Cc ones. The Cf ranges are inlined as a matches! table sourced from Unicode 16.0 to keep the change no_std-friendly with no new dependencies.

Co-Authored-By: HAL 9000

`PrintableString` is the sanitiser LDK uses to render untrusted strings
(node aliases, BOLT-12 invoice / offer text, `UntrustedString`, LSPS
messages, `lightning-invoice` descriptions) to logs and UI. It only
replaced `char::is_control` matches (Unicode general category `Cc`)
with U+FFFD, leaving the entire `Cf` (Format) category untouched.

That is the exact category covering the bidirectional override /
isolate codepoints (U+202A..U+202E, U+2066..U+2069) and zero-width
characters (U+200B..U+200D, U+FEFF) behind the "Trojan Source" attack
family (CVE-2021-42574): a peer can set its alias / invoice description
/ offer fields to e.g. `safe\u{202E}cipsxe.exe`, which previously
passed through verbatim while a human reader sees `safeexe.cips` —
defeating the threat model `PrintableString` exists to defend against.

Replace `Cf` codepoints alongside `Cc` ones. The `Cf` ranges are
inlined as a `matches!` table sourced from Unicode 16.0 to keep the
change `no_std`-friendly with no new dependencies.

Co-Authored-By: HAL 9000
@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented May 5, 2026

👋 Thanks for assigning @valentinewallace as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Comment thread lightning-types/src/string.rs
@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

ldk-claude-review-bot commented May 5, 2026

No issues found.

The previously flagged off-by-one in the Egyptian Hieroglyph Cf range (0x13430..=0x13440 -> 0x13430..=0x1343F) has been correctly fixed in the fixup commit. The test at line 107-110 validates this boundary. The Unicode 16.0 Cf table is complete, and the security-critical bidi override / zero-width characters are all covered.

Comment thread lightning-types/src/string.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 96.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 86.18%. Comparing base (1a26867) to head (907dd41).
⚠️ Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
lightning-types/src/string.rs 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4593      +/-   ##
==========================================
- Coverage   86.84%   86.18%   -0.66%     
==========================================
  Files         161      156       -5     
  Lines      109260   108660     -600     
  Branches   109260   108660     -600     
==========================================
- Hits        94882    93646    -1236     
- Misses      11797    12405     +608     
- Partials     2581     2609      +28     
Flag Coverage Δ
fuzzing-fake-hashes ?
fuzzing-real-hashes ?
tests 86.18% <96.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tnull tnull requested a review from valentinewallace May 6, 2026 09:36
Copy link
Copy Markdown
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after squash and CI fix

}
}

// Codepoints in Unicode general category `Cf` (Format), per Unicode 16.0. These are not matched
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per some claude exploring of this patch, this should also be valid for Unicode 17.0. Any reason to not specify the latest here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants