Fix accented character rendering (á, é, ñ, €, etc.) with Standard 14 fonts #9
…ndard 14 fonts
- Add getEncodingForStandard14() to select the correct encoding per font
- Add isWinAnsiStandard14() to distinguish Symbol/ZapfDingbats
- Extend CHAR_TO_GLYPH map with all WinAnsi non-ASCII characters (0x80-0x9F and 0xA0-0xFF ranges), fixing width measurement for accented text like é, ñ, ü, €, etc.

…d 14 fonts
Three compounding bugs caused accented characters (á, é, ñ, ö, €, etc.) to render as mojibake with Standard 14 fonts like Helvetica:
1. Wrong text encoding: used PDFDocEncoding instead of WinAnsiEncoding
2. UTF-8 round-trip corruption: Operator.toString() → TextDecoder (UTF-8) destroyed non-ASCII bytes when re-encoded via TextEncoder
3. Missing /Encoding in font dict: viewers fell back to StandardEncoding

Fix:
- encodeTextForFont() now uses WinAnsiEncoding (or SymbolEncoding/ZapfDingbatsEncoding) with hex-format PdfString output
- Unencodable characters substitute with .notdef (byte 0x00)
- appendOperators() uses Operator.toBytes() directly, bypassing the string intermediate that caused UTF-8 corruption
- createContentStream/appendContent/prependContent accept string | Uint8Array for the broad bytes-first pipeline refactor
- addFontResource() adds /Encoding WinAnsiEncoding for Helvetica/Times/Courier families (omitted for Symbol/ZapfDingbats per spec)
- ContentAppender type updated to string | Uint8Array

…14 fonts
29 tests covering:
- Font encoding selection (WinAnsi vs Symbol vs ZapfDingbats)
- Glyph name mapping for accented/non-ASCII characters
- Width measurement correctness for accented text
- Font dict /Encoding verification
- Hex string encoding in content streams
- Unencodable character .notdef substitution
- Round-trip PDF generation with all font families
- Bytes pipeline backward compatibility (shapes, paths, images)
Pull request overview
This PR fixes corrupted rendering of accented and Latin-1 characters (á, é, ñ, €, curly quotes, em dashes, etc.) when using Standard 14 fonts like Helvetica, Times-Roman, and Courier. The fix addresses four compounding bugs: wrong text encoding (PDFDocEncoding instead of WinAnsiEncoding), UTF-8 round-trip corruption through the content stream pipeline, missing /Encoding entries in font dictionaries, and incorrect width measurements for accented characters.
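The UTF-8 round-trip corruption can be reproduced in isolation. The following is a minimal sketch, not the project's code: `roundTripThroughUtf8` and `toHex` are hypothetical helpers that only imitate a bytes → string → bytes detour using the standard `TextDecoder` and `TextEncoder` APIs.

```typescript
// Hypothetical helper: imitates a content-stream pipeline that decodes bytes
// to a string as UTF-8 and later re-encodes that string as UTF-8.
function roundTripThroughUtf8(bytes: Uint8Array): Uint8Array {
  // Byte sequences that are not valid UTF-8 decode to U+FFFD.
  const text = new TextDecoder("utf-8").decode(bytes);
  return new TextEncoder().encode(text);
}

// Hypothetical helper: formats bytes as uppercase hex for inspection.
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += ("0" + bytes[i].toString(16)).slice(-2).toUpperCase();
  }
  return out;
}

// "café" encoded as WinAnsi bytes: 0x63 0x61 0x66 0xE9.
const winAnsiCafe = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);
const corrupted = roundTripThroughUtf8(winAnsiCafe);

console.log(toHex(winAnsiCafe)); // 636166E9
console.log(toHex(corrupted)); // 636166EFBFBD (0xE9 replaced by U+FFFD)
```

The lone 0xE9 byte is not a valid UTF-8 sequence, so it decodes to U+FFFD and re-encodes as the three bytes EF BF BD. Keeping the pipeline in `Uint8Array` avoids this detour entirely.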
Changes:
- Implements proper font encoding selection (WinAnsi for Helvetica/Times/Courier, Symbol/ZapfDingbats for those respective fonts)
- Refactors content stream pipeline to work with `Uint8Array` throughout, eliminating UTF-8 corruption
- Adds `/Encoding /WinAnsiEncoding` to Standard 14 font dictionaries (except Symbol/ZapfDingbats)
- Extends `CHAR_TO_GLYPH` map to cover all WinAnsi non-ASCII characters for correct width measurement
- Uses hex-format PdfStrings for Standard 14 text as defense-in-depth against encoding transformations
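To illustrate the encoding and hex-string steps in the list above, here is a rough sketch under stated assumptions. `encodeWinAnsiHex` and `WIN_ANSI_HIGH` are hypothetical names, the 0x80–0x9F table is deliberately partial, and the PR's actual `encodeTextForFont()` may be structured quite differently.

```typescript
// Partial, illustrative table of Unicode code points that WinAnsiEncoding
// places in the 0x80-0x9F range (assumption: only a handful are listed here).
const WIN_ANSI_HIGH: Record<number, number> = {
  0x20ac: 0x80, // euro sign
  0x2018: 0x91, // left single quotation mark
  0x2019: 0x92, // right single quotation mark
  0x201c: 0x93, // left double quotation mark
  0x201d: 0x94, // right double quotation mark
  0x2013: 0x96, // en dash
  0x2014: 0x97, // em dash
};

// Hypothetical helper: encodes text to WinAnsi bytes and returns a PDF hex
// string. Characters with no mapping become byte 0x00 (.notdef).
function encodeWinAnsiHex(text: string): string {
  let hex = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = text.charCodeAt(i + 1); // NaN at end of string
    let byte = 0x00;
    if (unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff) {
      i++; // surrogate pair (e.g. emoji): consume both units, emit one .notdef
    } else if ((unit >= 0x20 && unit <= 0x7e) || (unit >= 0xa0 && unit <= 0xff)) {
      byte = unit; // printable ASCII and Latin-1 share their byte values
    } else {
      const mapped = WIN_ANSI_HIGH[unit];
      if (mapped !== undefined) byte = mapped;
    }
    hex += ("0" + byte.toString(16)).slice(-2).toUpperCase();
  }
  return "<" + hex + ">";
}

console.log(encodeWinAnsiHex("caf\u00e9")); // <636166E9>
console.log(encodeWinAnsiHex("\u20ac5")); // <8035>
console.log(encodeWinAnsiHex("\u65e5")); // <00>
```

Because the result contains only ASCII hex digits and angle brackets, a later string conversion cannot alter the encoded bytes.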
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/fonts/standard-14.ts` | Adds encoding helper functions (`getEncodingForStandard14`, `isWinAnsiStandard14`) and extends `CHAR_TO_GLYPH` map with ~95 WinAnsi non-ASCII entries |
| `src/api/pdf-page.ts` | Refactors `encodeTextForFont()` to use proper font encodings, updates `appendContent`/`prependContent`/`createContentStream` to accept bytes, implements bytes-first `appendOperators()`, adds `/Encoding` to font dicts |
| `src/api/drawing/path-builder.ts` | Updates `ContentAppender` type to accept `string \| Uint8Array` for backward-compatible bytes support |
| `src/api/drawing/latin1-encoding.test.ts` | Adds comprehensive test coverage with 29 tests covering encoding selection, glyph mapping, font dictionary structure, content stream encoding, and round-trip rendering |
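
As a rough illustration of the font dictionary change listed for `src/api/pdf-page.ts`, the sketch below builds the dictionary text for a Standard 14 font. `buildStandard14FontDict` and `NO_ENCODING_FONTS` are hypothetical names; the project presumably builds dictionary objects rather than strings, so treat this only as a picture of the resulting PDF syntax.

```typescript
// Fonts that keep their built-in encoding and therefore get no /Encoding entry.
const NO_ENCODING_FONTS = ["Symbol", "ZapfDingbats"];

// Hypothetical helper: returns the PDF dictionary text for a Standard 14 font.
function buildStandard14FontDict(baseFont: string): string {
  const entries = ["/Type /Font", "/Subtype /Type1", "/BaseFont /" + baseFont];
  if (NO_ENCODING_FONTS.indexOf(baseFont) === -1) {
    // Without this entry, viewers use the font's built-in encoding.
    entries.push("/Encoding /WinAnsiEncoding");
  }
  return "<< " + entries.join(" ") + " >>";
}

console.log(buildStandard14FontDict("Helvetica"));
// << /Type /Font /Subtype /Type1 /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>
console.log(buildStandard14FontDict("Symbol"));
// << /Type /Font /Subtype /Type1 /BaseFont /Symbol >>
```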
Summary
Fixes corrupted rendering of accented/Latin-1 characters (á, é, ñ, ö, €, curly quotes, em dashes, etc.) when using Standard 14 fonts like Helvetica, Times-Roman, and Courier.
Problem
Four compounding bugs caused non-ASCII characters to render as mojibake or be measured incorrectly:
1. Wrong text encoding: `encodeTextForFont()` used PDFDocEncoding (a metadata encoding) instead of WinAnsiEncoding, producing incorrect bytes in the 0x80–0x9F range (€, curly quotes, em dash, etc.)
2. UTF-8 round-trip corruption: content went through `Operator.toString()` → `TextDecoder` (UTF-8) → `TextEncoder` (UTF-8), destroying any non-ASCII byte (e.g., 0xE9 for `é` became `U+FFFD`)
3. Missing `/Encoding` in font dict: without an explicit `/Encoding WinAnsiEncoding` entry, PDF viewers fell back to the font's built-in StandardEncoding, mapping bytes to the wrong glyphs
4. Incomplete glyph map: `getGlyphName()` only mapped ASCII, returning `"space"` for all accented characters, breaking text layout and measurement

Solution
1. `encodeTextForFont()` now uses `WinAnsiEncoding` for Helvetica/Times/Courier, `SymbolEncoding` for Symbol, and `ZapfDingbatsEncoding` for ZapfDingbats. Unencodable characters (CJK, emoji) are replaced with `.notdef` (byte 0x00).
2. Standard 14 text is written as hex-format PdfStrings (e.g., `<636166E9>` for "café"), which are pure ASCII and immune to any encoding transformation. This matches pdf-lib's approach.
3. `appendOperators()` uses `Operator.toBytes()` directly. `createContentStream`/`appendContent`/`prependContent` accept `string | Uint8Array`, eliminating the UTF-8 round-trip.
4. `/Encoding`: Standard 14 font dicts now include `/Encoding /WinAnsiEncoding` (omitted for Symbol/ZapfDingbats per PDF spec Table 5.15).
5. `CHAR_TO_GLYPH` now covers all WinAnsi non-ASCII characters (~95 entries), fixing width measurement for accented text.

Changes
- `src/fonts/standard-14.ts`
- `src/api/pdf-page.ts`: … `/Encoding`
- `src/api/drawing/path-builder.ts`: `ContentAppender` type accepts `string | Uint8Array`
- `src/api/drawing/latin1-encoding.test.ts`

Test coverage
- `/Encoding` presence/absence verification
- Hex string encoding in content streams (`<E9>`, not UTF-8 `C3A9`)
- Unencodable character `.notdef` substitution
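
The width-measurement bug is easy to see with a small worked example. This is a sketch under stated assumptions: `measureUnits`, `ASCII_ONLY` and `EXTENDED` are hypothetical names, the glyph tables are tiny illustrative subsets rather than the real `CHAR_TO_GLYPH` map, and the widths are assumed Helvetica values in 1/1000 em units.

```typescript
// Assumed Helvetica glyph widths in 1/1000 em (illustrative subset).
const WIDTHS: Record<string, number> = {
  c: 500, a: 556, f: 278, space: 278, eacute: 556,
};

// ASCII-only map, mirroring the behaviour before the fix.
const ASCII_ONLY: Record<string, string> = { c: "c", a: "a", f: "f", " ": "space" };

// Extended map with one accented entry, mirroring the behaviour after the fix.
const EXTENDED: Record<string, string> = {
  c: "c", a: "a", f: "f", " ": "space", "\u00e9": "eacute",
};

// Hypothetical helper: sums glyph widths, falling back to "space" for
// characters missing from the map (the fallback described in the Problem list).
function measureUnits(text: string, glyphNames: Record<string, string>): number {
  let units = 0;
  for (let i = 0; i < text.length; i++) {
    const glyph = glyphNames[text.charAt(i)] || "space";
    units += WIDTHS[glyph] || 0;
  }
  return units;
}

console.log(measureUnits("caf\u00e9", ASCII_ONLY)); // 1612 (é measured as a space)
console.log(measureUnits("caf\u00e9", EXTENDED)); // 1890
```

Multiplying the result by `fontSize / 1000` gives the width in points, so at 12 pt the ASCII-only map would under-measure "café" by about 3.3 pt under these assumed widths.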