Fix accented character rendering (á, é, ñ, €, etc.) with Standard 14 fonts #9

Mythie · 2026-01-28T04:40:49Z

Summary

Fixes corrupted rendering of accented/Latin-1 characters (á, é, ñ, ö, €, curly quotes, em dashes, etc.) when using Standard 14 fonts like Helvetica, Times-Roman, and Courier.

Problem

Four compounding bugs caused non-ASCII characters to render as mojibake:

Wrong text encoding — encodeTextForFont() used PDFDocEncoding (a metadata encoding) instead of WinAnsiEncoding, producing incorrect bytes in the 0x80–0x9F range (€, curly quotes, em dash, etc.)
UTF-8 round-trip corruption — The pipeline shuttled content stream bytes through Operator.toString() → TextDecoder (UTF-8) → TextEncoder (UTF-8), destroying any non-ASCII byte (e.g., 0xE9 for é became U+FFFD)
Missing /Encoding in font dict — Without an explicit /Encoding WinAnsiEncoding entry, PDF viewers fell back to the font's built-in StandardEncoding, mapping bytes to wrong glyphs
Wrong width measurement — getGlyphName() only mapped ASCII, returning "space" for all accented characters, breaking text layout and measurement
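
The UTF-8 round-trip corruption can be reproduced in isolation. This is a minimal sketch rather than the library's code: it assumes only the standard `TextDecoder` and `TextEncoder` globals, and `winAnsiBytes` and `toHex` are names introduced here for illustration.

```typescript
// "café" as single-byte WinAnsi text: é is the lone byte 0xE9.
const winAnsiBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Simulate the old pipeline: bytes -> string (decoded as UTF-8) -> bytes (encoded as UTF-8).
// 0xE9 on its own is not valid UTF-8, so the decoder substitutes U+FFFD.
const decoded = new TextDecoder("utf-8").decode(winAnsiBytes);
const roundTripped = new TextEncoder().encode(decoded);

// Hypothetical helper: format bytes as space-separated uppercase hex.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0").toUpperCase()).join(" ");
}

console.log(toHex(winAnsiBytes)); // 63 61 66 E9
console.log(toHex(roundTripped)); // 63 61 66 EF BF BD
```

The original byte is unrecoverable after the round trip, which is why keeping content as `Uint8Array` end to end matters more than any later repair step.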

Solution

Use proper font encoding: encodeTextForFont() now uses WinAnsiEncoding for Helvetica/Times/Courier, SymbolEncoding for Symbol, and ZapfDingbatsEncoding for ZapfDingbats. Unencodable characters (CJK, emoji) substitute with .notdef (byte 0x00).
Hex-format PdfString: Standard 14 text is encoded as hex strings (<636166E9>) — pure ASCII that's immune to any encoding transformation. Matches pdf-lib's approach.
Bytes-first pipeline: appendOperators() uses Operator.toBytes() directly. createContentStream/appendContent/prependContent accept string | Uint8Array, eliminating the UTF-8 round-trip.
Font dict /Encoding: Standard 14 font dicts now include /Encoding /WinAnsiEncoding (omitted for Symbol/ZapfDingbats per PDF spec Table 5.15).
Extended glyph map: CHAR_TO_GLYPH now covers all WinAnsi non-ASCII characters (~95 entries), fixing width measurement for accented text.
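
The encoding and hex-string steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `WIN_ANSI_HIGH`, `encodeWinAnsi`, `toPdfHexString`, and `standard14FontDict` are hypothetical names, the 0x80–0x9F table is deliberately partial, and the font dictionary is built as a plain string only to show its shape.

```typescript
// Partial, illustrative table: Unicode code point -> WinAnsi byte in the 0x80-0x9F range.
const WIN_ANSI_HIGH = new Map<number, number>([
  [0x20ac, 0x80], // euro sign
  [0x2026, 0x85], // horizontal ellipsis
  [0x2018, 0x91], // left single quote
  [0x2019, 0x92], // right single quote
  [0x201c, 0x93], // left double quote
  [0x201d, 0x94], // right double quote
  [0x2022, 0x95], // bullet
  [0x2013, 0x96], // en dash
  [0x2014, 0x97], // em dash
  [0x2122, 0x99], // trademark sign
]);

// Encode text to single bytes; anything unmapped becomes 0x00 (.notdef).
function encodeWinAnsi(text: string): Uint8Array {
  const out: number[] = [];
  for (const ch of text) {
    // Iterating by code point keeps an emoji as one unit, so it yields a single 0x00.
    const cp = ch.codePointAt(0)!;
    if ((cp >= 0x20 && cp <= 0x7e) || (cp >= 0xa0 && cp <= 0xff)) {
      out.push(cp); // printable ASCII and Latin-1 share their code with WinAnsi
    } else {
      out.push(WIN_ANSI_HIGH.get(cp) ?? 0x00);
    }
  }
  return Uint8Array.from(out);
}

// Serialize bytes as a PDF hex string, which is pure ASCII.
function toPdfHexString(bytes: Uint8Array): string {
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0").toUpperCase()).join("");
  return `<${hex}>`;
}

// Build a Type1 font dictionary; Symbol and ZapfDingbats keep their built-in encoding.
function standard14FontDict(baseFont: string): string {
  const symbolic = baseFont === "Symbol" || baseFont === "ZapfDingbats";
  const encoding = symbolic ? "" : " /Encoding /WinAnsiEncoding";
  return `<< /Type /Font /Subtype /Type1 /BaseFont /${baseFont}${encoding} >>`;
}

console.log(toPdfHexString(encodeWinAnsi("café"))); // <636166E9>
console.log(standard14FontDict("Helvetica"));
```

Because the hex form contains only ASCII characters, it survives even if some later stage still passes the content stream through a string.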

Changes

| File | Change |
| --- | --- |
| `src/fonts/standard-14.ts` | Encoding helpers, extended CHAR_TO_GLYPH map |
| `src/api/pdf-page.ts` | Encoding fix, bytes pipeline, font dict `/Encoding` |
| `src/api/drawing/path-builder.ts` | `ContentAppender` type accepts `string \| Uint8Array` |
| `src/api/drawing/latin1-encoding.test.ts` | 29 new tests |

Test coverage

Font encoding selection (WinAnsi vs Symbol vs ZapfDingbats)
Glyph name mapping and width measurement for accented characters
Font dict /Encoding presence/absence verification
Hex string encoding in content streams (é → <E9>, not UTF-8 C3A9)
Unencodable character .notdef substitution
Round-trip PDF generation with all Standard 14 font families
Backward compatibility (shapes, paths, images still work)

…ndard 14 fonts

- Add getEncodingForStandard14() to select the correct encoding per font
- Add isWinAnsiStandard14() to distinguish Symbol/ZapfDingbats
- Extend CHAR_TO_GLYPH map with all WinAnsi non-ASCII characters (0x80-0x9F and 0xA0-0xFF ranges), fixing width measurement for accented text like é, ñ, ü, €, etc.

…d 14 fonts

Three compounding bugs caused accented characters (á, é, ñ, ö, €, etc.) to render as mojibake with Standard 14 fonts like Helvetica:

1. Wrong text encoding: used PDFDocEncoding instead of WinAnsiEncoding
2. UTF-8 round-trip corruption: Operator.toString() → TextDecoder (UTF-8) destroyed non-ASCII bytes when re-encoded via TextEncoder
3. Missing /Encoding in font dict: viewers fell back to StandardEncoding

Fix:

- encodeTextForFont() now uses WinAnsiEncoding (or SymbolEncoding/ZapfDingbatsEncoding) with hex-format PdfString output
- Unencodable characters substitute with .notdef (byte 0x00)
- appendOperators() uses Operator.toBytes() directly, bypassing the string intermediate that caused UTF-8 corruption
- createContentStream/appendContent/prependContent accept string | Uint8Array for the broad bytes-first pipeline refactor
- addFontResource() adds /Encoding WinAnsiEncoding for Helvetica/Times/Courier families (omitted for Symbol/ZapfDingbats per spec)
- ContentAppender type updated to string | Uint8Array

…14 fonts

29 tests covering:

- Font encoding selection (WinAnsi vs Symbol vs ZapfDingbats)
- Glyph name mapping for accented/non-ASCII characters
- Width measurement correctness for accented text
- Font dict /Encoding verification
- Hex string encoding in content streams
- Unencodable character .notdef substitution
- Round-trip PDF generation with all font families
- Bytes pipeline backward compatibility (shapes, paths, images)

vercel · 2026-01-28T04:40:54Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
core	Ready	Preview, Comment	Jan 28, 2026 4:54am

Copilot

Pull request overview

This PR fixes corrupted rendering of accented and Latin-1 characters (á, é, ñ, €, curly quotes, em dashes, etc.) when using Standard 14 fonts like Helvetica, Times-Roman, and Courier. The fix addresses four compounding bugs: wrong text encoding (PDFDocEncoding instead of WinAnsiEncoding), UTF-8 round-trip corruption through the content stream pipeline, missing /Encoding entries in font dictionaries, and incorrect width measurements for accented characters.

Changes:

Implements proper font encoding selection (WinAnsi for Helvetica/Times/Courier, Symbol/ZapfDingbats for those respective fonts)
Refactors content stream pipeline to work with Uint8Array throughout, eliminating UTF-8 corruption
Adds /Encoding /WinAnsiEncoding to Standard 14 font dictionaries (except Symbol/ZapfDingbats)
Extends CHAR_TO_GLYPH map to cover all WinAnsi non-ASCII characters for correct width measurement
Uses hex-format PdfStrings for Standard 14 text as defense-in-depth against encoding transformations

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/fonts/standard-14.ts` | Adds encoding helper functions (`getEncodingForStandard14`, `isWinAnsiStandard14`) and extends `CHAR_TO_GLYPH` map with ~95 WinAnsi non-ASCII entries |
| `src/api/pdf-page.ts` | Refactors `encodeTextForFont()` to use proper font encodings, updates `appendContent`/`prependContent`/`createContentStream` to accept bytes, implements bytes-first `appendOperators()`, adds `/Encoding` to font dicts |
| `src/api/drawing/path-builder.ts` | Updates `ContentAppender` type to accept `string \| Uint8Array` for backward-compatible bytes support |
| `src/api/drawing/latin1-encoding.test.ts` | Adds comprehensive test coverage with 29 tests covering encoding selection, glyph mapping, font dictionary structure, content stream encoding, and round-trip rendering |

Mythie added 3 commits January 28, 2026 15:29

Mythie requested a review from Copilot January 28, 2026 04:41

Copilot started reviewing on behalf of Mythie January 28, 2026 04:41

dguyen approved these changes Jan 28, 2026

vercel bot deployed to Preview January 28, 2026 04:42

Copilot AI reviewed Jan 28, 2026

src/api/drawing/latin1-encoding.test.ts Outdated

fix: update test

e703209

vercel bot deployed to Preview January 28, 2026 04:54

Mythie mentioned this pull request Jan 28, 2026

Standard 14 fonts corrupt accented characters (á, é, ñ) due to UTF-8 encoding #7

Closed

Mythie merged commit 13bd3f4 into main Jan 28, 2026
5 checks passed
