Releases: mini-software/MiniPdf

v0.9.0 — Multi-Font Unicode & Horizontal Scaling

05 Mar 10:48

Highlights

This release introduces multi-font embedding for full Unicode coverage and horizontal text scaling to prevent column overflow, significantly expanding support for multilingual, emoji, and symbol-heavy spreadsheets.

New Features

Multi-Font Embedding Engine

  • Replaced the single hardcoded Arial CID font with a dynamic multi-font system that discovers and embeds multiple system fonts at runtime
  • Cross-platform font discovery: Windows (YaHei, JhengHei, Malgun Gothic, Segoe UI, Segoe UI Emoji, Segoe UI Symbol), macOS (PingFang, Apple SD Gothic Neo, Apple Color Emoji), Linux (Noto Sans CJK, Noto Color Emoji, WenQuanYi)
  • Characters are automatically split into runs by font slot — e.g. CJK in F2, Korean in F3, emoji in F4 — with proper Td advances within the same BT/ET block
  • Full TrueType/TTC font parsing: cmap format 4 & 12, hmtx glyph widths, head/OS2/hhea metrics, glyf table subsetting
  • CIDToGIDMap streams for correct glyph mapping with ZLib compression
  • ToUnicode CMap with UTF-16 surrogate pair support for non-BMP code points (emoji, CJK Ext-B)
  • Font subsetting: zeros out unused glyph outlines to reduce embedded font size
  • Glyph outline validation (HasGlyphOutline) to detect placeholder/empty glyphs, enabling proper font fallback
  • Emoji range detection (IsEmojiRange) to prefer dedicated emoji fonts over CJK fonts with placeholder glyphs
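
The run splitting described above can be sketched roughly as follows. This is a minimal illustration, not MiniPdf's actual code: `FontRunSketch`, `SlotFor`, and `SplitIntoRuns` are hypothetical names, the slot labels follow the F2/F3/F4 example in the notes, and the code point ranges are deliberately coarse.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FontRunSketch
{
    // Hypothetical slot chooser; real font selection would consult each font's cmap.
    public static string SlotFor(int codePoint)
    {
        if (codePoint >= 0x1F300 && codePoint <= 0x1FAFF) return "F4"; // common emoji blocks
        if (codePoint >= 0xAC00 && codePoint <= 0xD7A3) return "F3";   // Hangul syllables
        if (codePoint >= 0x4E00 && codePoint <= 0x9FFF) return "F2";   // CJK Unified Ideographs
        return "F1";                                                    // everything else
    }

    // Groups consecutive code points that share a slot into one run.
    public static List<(string Slot, string Text)> SplitIntoRuns(string text)
    {
        var runs = new List<(string Slot, string Text)>();
        var current = new StringBuilder();
        string currentSlot = "";
        int i = 0;
        while (i < text.Length)
        {
            // Read a full code point so surrogate pairs (emoji) stay together.
            // Note: ConvertToUtf32 throws on a lone surrogate; input is assumed well formed.
            int codePoint = char.ConvertToUtf32(text, i);
            int length = char.IsSurrogatePair(text, i) ? 2 : 1;
            string slot = SlotFor(codePoint);
            if (slot != currentSlot && current.Length > 0)
            {
                runs.Add((currentSlot, current.ToString()));
                current.Clear();
            }
            currentSlot = slot;
            current.Append(text, i, length);
            i += length;
        }
        if (current.Length > 0) runs.Add((currentSlot, current.ToString()));
        return runs;
    }

    public static void Main()
    {
        foreach (var run in SplitIntoRuns("Hi\u4E2D\u6587\uD55C\uD83D\uDE00"))
            Console.WriteLine(run.Slot + ": " + run.Text.Length + " UTF-16 unit(s)");
    }
}
```

Each run would then be emitted with its own font selection inside a single BT/ET block, with a Td advance equal to the measured width of the preceding run.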

Arabic Text Shaping

  • Built-in Arabic Presentation Forms-B shaping engine with contextual form selection (isolated, initial, medial, final)
  • Arabic joining type analysis (Non-Joining, Right-Joining, Dual-Joining, Join-Causing, Transparent)
  • Lam-Alef ligature handling (ﻻ ﻵ ﻷ ﻹ)
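
For readers unfamiliar with contextual shaping, the sketch below shows the basic idea of picking isolated, initial, medial, or final forms from the joining behavior of neighboring letters. It is an assumption-laden illustration: `ArabicShapingSketch` is a hypothetical name, the table covers only three letters, and transparent marks and Lam-Alef ligatures are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ArabicShapingSketch
{
    // Presentation Forms-B entries as { isolated, final, initial, medial }.
    // '\0' marks a form that does not exist (right-joining letters).
    static readonly Dictionary<char, char[]> Forms = new Dictionary<char, char[]>
    {
        ['\u0627'] = new[] { '\uFE8D', '\uFE8E', '\0', '\0' },         // Alef (right-joining)
        ['\u0628'] = new[] { '\uFE8F', '\uFE90', '\uFE91', '\uFE92' }, // Beh (dual-joining)
        ['\u0644'] = new[] { '\uFEDD', '\uFEDE', '\uFEDF', '\uFEE0' }, // Lam (dual-joining)
    };

    // Only dual-joining letters can connect to the letter that follows them.
    static bool JoinsToNext(char c) => Forms.TryGetValue(c, out var f) && f[2] != '\0';

    // Both right-joining and dual-joining letters connect to the preceding letter.
    static bool JoinsToPrevious(char c) => Forms.ContainsKey(c);

    // Input is in logical order; output keeps the same order with shaped code units.
    public static string Shape(string logical)
    {
        var result = new StringBuilder();
        for (int i = 0; i < logical.Length; i++)
        {
            char c = logical[i];
            if (!Forms.TryGetValue(c, out var forms)) { result.Append(c); continue; }
            bool joinPrev = i > 0 && JoinsToNext(logical[i - 1]);
            bool joinNext = forms[2] != '\0'
                            && i + 1 < logical.Length
                            && JoinsToPrevious(logical[i + 1]);
            char shaped = joinPrev && joinNext ? forms[3]   // medial
                        : joinPrev ? forms[1]               // final
                        : joinNext ? forms[2]               // initial
                        : forms[0];                         // isolated
            result.Append(shaped);
        }
        return result.ToString();
    }

    public static void Main()
    {
        // Beh + Alef + Beh: initial Beh, final Alef, isolated Beh.
        foreach (char c in Shape("\u0628\u0627\u0628"))
            Console.Write(((int)c).ToString("X4") + " ");
        Console.WriteLine(); // FE91 FE8E FE8F
    }
}
```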

Horizontal Text Scaling (Tz Operator)

  • Added MaxWidth property to PdfTextBlock for per-cell width constraints
  • When text exceeds the column width, the PDF Tz (horizontal scaling) operator compresses text to fit — keeping all characters intact for text extraction while preventing visual overflow
  • Helvetica width table (MeasureTextWidth) with standard character widths in 1/1000 em units
  • Applied to both WinAnsi (F1) and Unicode (Fn) text rendering paths
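
As a rough illustration of the scaling step, the sketch below computes a Tz percentage from a measured text width and a maximum width, then wraps the show-text operator with it. `TzSketch`, `ScalePercent`, and `TextOps` are hypothetical names, and resetting to `100 Tz` afterwards is an assumption about how a writer might avoid leaking the scale into later text, not a description of MiniPdf's output.

```csharp
using System;
using System.Globalization;

public static class TzSketch
{
    // Percentage for the Tz operator; 100 means unscaled.
    public static double ScalePercent(double textWidth, double maxWidth)
    {
        if (textWidth <= 0 || maxWidth <= 0 || textWidth <= maxWidth) return 100.0;
        return maxWidth / textWidth * 100.0;
    }

    // Builds the text-showing operators for an already escaped PDF string.
    public static string TextOps(string escapedText, double textWidth, double maxWidth)
    {
        double tz = ScalePercent(textWidth, maxWidth);
        string show = "(" + escapedText + ") Tj";
        if (tz >= 100.0) return show;
        // Tz is part of the text state, so it is reset after the string is shown.
        return string.Format(CultureInfo.InvariantCulture,
            "{0:0.##} Tz {1} 100 Tz", tz, show);
    }

    public static void Main()
    {
        // 120pt of text in a 60pt column is compressed to half width.
        Console.WriteLine(TextOps("Hello", 120, 60)); // 50 Tz (Hello) Tj 100 Tz
    }
}
```

Because only the horizontal scale changes, the string passed to Tj is untouched, which is why text extraction still sees every character.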

Improvements

  • Adjusted default margins: left/right 50pt → 54pt, column padding 2pt → 3pt for better visual balance
  • Fill rectangles no longer include extra columnPadding width, matching LibreOffice cell boundary rendering more closely
  • Proper Unicode code point enumeration with surrogate pair handling
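
The surrogate-pair handling mentioned above can be illustrated with a small sketch. The names `CodePointSketch`, `CodePoints`, and `ToUtf16BeHex` are hypothetical; the hex form shown is the UTF-16BE encoding that a ToUnicode CMap entry would need for a non-BMP character.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointSketch
{
    // Enumerates Unicode code points, combining valid surrogate pairs.
    // A lone surrogate is passed through as its own value instead of throwing.
    public static List<int> CodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                result.Add(char.ConvertToUtf32(c, text[i + 1]));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                result.Add(c);
            }
        }
        return result;
    }

    // UTF-16BE hex: four digits for BMP code points, eight (a surrogate pair) otherwise.
    public static string ToUtf16BeHex(int codePoint)
    {
        if (codePoint < 0x10000) return codePoint.ToString("X4");
        int offset = codePoint - 0x10000;
        int high = 0xD800 + (offset >> 10);
        int low = 0xDC00 + (offset & 0x3FF);
        return high.ToString("X4") + low.ToString("X4");
    }

    public static void Main()
    {
        foreach (int cp in CodePoints("A\uD83D\uDE00"))
            Console.WriteLine("U+" + cp.ToString("X4") + " -> " + ToUtf16BeHex(cp));
    }
}
```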

Benchmark

  • 16 new multilingual & emoji test cases (classic151–classic166):
    • Multilingual greetings, emoji sampler, currency symbols, math symbols, diacritical marks, RTL/BiDi text, CJK extended, emoji skin tones, ZWJ emoji sequences, punctuation marks, box drawing, CJK+emoji styled, Cyrillic alphabets, Indic scripts, Southeast Asian scripts, emoji progress
  • 180 total test cases, average overall score: 0.9652

Files Changed

  • PdfWriter.cs — +676 lines: multi-font engine, Arabic shaping, TrueType parsing, font subsetting
  • ExcelToPdfConverter.cs — horizontal scaling integration, margin adjustments
  • PdfTextBlock.cs — MaxWidth property
  • PdfPage.cs — maxWidth parameter on AddText

Full Changelog: v0.8.0...v0.9.0

v0.8.0

05 Mar 01:56

Highlights

This release improves Excel-to-PDF fidelity with better text measurement, vertical alignment support, merged-cell rendering, and clipping support — raising the benchmark average score to 0.9712 across 150 test cases.

What's Changed

Rendering Improvements

  • Vertical alignment support — cells with top, center, or bottom vertical alignment in Excel are now rendered with the correct Y-position in PDF, matching LibreOffice behavior
  • Merged cell rendering — fill rectangles, borders, and text alignment (right/center) now correctly span the full merged column range instead of being limited to the first column
  • Calibri-scaled text measurement — introduced a dedicated CalibriFittingScale constant (0.86) and MeasureScaledWidth() method so that column-fit checks and text truncation use a consistent Calibri-to-Helvetica scale factor
  • Cell-level font size for fitting — text truncation and numeric reformatting now use each cell's actual font size instead of the global default, fixing clipping for cells with non-default sizes
  • Auto-expand row height for large fonts — rows containing cells with font sizes larger than the default now auto-grow (≈1.3× font size) to prevent text overlap
  • Column padding tightened — default ColumnPadding reduced from 4pt to 2pt for more compact table rendering that better matches LibreOffice output
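
A minimal sketch of two of the rules above, under stated assumptions. The 0.86 constant and the 1.3 multiplier come from these notes; that the scale factor multiplies a Helvetica-based width, and the method signatures themselves, are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class FittingSketch
{
    // Value quoted in the release notes.
    public const double CalibriFittingScale = 0.86;

    // Assumption: the factor multiplies a width measured with Helvetica metrics.
    public static double MeasureScaledWidth(double helveticaWidth)
    {
        return helveticaWidth * CalibriFittingScale;
    }

    public static bool FitsColumn(double helveticaWidth, double columnWidth)
    {
        return MeasureScaledWidth(helveticaWidth) <= columnWidth;
    }

    // Rows grow to about 1.3x the largest font size when a cell exceeds the default size.
    public static double RowHeight(double defaultRowHeight, double defaultFontSize,
                                   IEnumerable<double> cellFontSizes)
    {
        double largest = defaultFontSize;
        foreach (double size in cellFontSizes) largest = Math.Max(largest, size);
        if (largest <= defaultFontSize) return defaultRowHeight;
        return Math.Max(defaultRowHeight, largest * 1.3);
    }

    public static void Main()
    {
        Console.WriteLine(FitsColumn(100, 90)); // 100pt measured, about 86pt after scaling
        Console.WriteLine(RowHeight(15, 11, new[] { 11.0, 24.0 }));
    }
}
```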

Text & Number Formatting

  • Boolean cell alignment — cells with type "b" (boolean) now default to center alignment under the General format, matching Excel/LibreOffice behavior
  • Negative number parenthesis handling — negative number formatting no longer incorrectly prepends a minus sign when the format uses parentheses for negative display
  • Numeric reformat for all cells — FitNumericText() is now applied to all numeric cells (not just clipped ones), matching LibreOffice's General format auto-shrink behavior

PDF Engine

  • Clipping rectangle support — PdfTextBlock and PdfPage.AddText() now accept an optional clipRect parameter; PdfWriter emits PDF q/Q graphics state save/restore with a clipping path (re W n), so text is visually clipped but remains fully extractable
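
The operator sequence named above can be sketched as a small content-stream builder. `ClipSketch.ClippedText` is a hypothetical helper, not part of the MiniPdf API; it only shows how `q`, `re W n`, and `Q` bracket the text operators.

```csharp
using System;
using System.Globalization;

public static class ClipSketch
{
    // Wraps text operators in a save/restore pair with a rectangular clipping path.
    // q      : save graphics state
    // re W n : define rectangle, intersect it with the clip path, end path without painting
    // Q      : restore graphics state (removes the clip)
    public static string ClippedText(double x, double y, double width, double height, string textOps)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "q\n{0:0.##} {1:0.##} {2:0.##} {3:0.##} re W n\n{4}\nQ",
            x, y, width, height, textOps);
    }

    public static void Main()
    {
        Console.WriteLine(ClippedText(54, 700, 80, 14, "BT /F1 11 Tf 56 703 Td (Hello) Tj ET"));
    }
}
```

Since clipping only affects painting, the full string still sits in the content stream and remains available to text extraction.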

Chart Rendering

  • Legend text removed — legend labels in bar/column and pie charts are no longer emitted as extractable PDF text, matching LibreOffice's behavior of rendering legends as vector graphics only

Benchmark & Tooling

  • Added comprehensive benchmark analysis scripts for PDF comparison (compare_pdfs.py, analysis helpers)
  • 150 test cases with an average overall score of 0.9712

Stats

  • 5 commits since v0.7.0
  • 5 source files changed: ExcelReader.cs, ExcelToPdfConverter.cs, PdfPage.cs, PdfTextBlock.cs, PdfWriter.cs
  • +171 / −51 lines in source code

Full Changelog: v0.7.0...v0.8.0

v0.7.1 — Rich Cell Styling, Combo Charts & Number Formatting Overhaul

04 Mar 15:05

v0.7.0 — Rich Cell Styling, Combo Charts & Number Formatting Overhaul

Highlights

This release brings substantial fidelity improvements to the Excel-to-PDF conversion engine, adding support for cell borders, font sizes, bold/italic text, horizontal alignment, explicit row heights, combo (overlay) charts, and a comprehensive rewrite of date/time and number formatting to match LibreOffice output.

New Features

  • Cell border rendering — Read and draw left/right/top/bottom borders from .xlsx styles, with support for thin, medium, and thick stroke widths and per-side colors.
  • Font style support — Extract font size, bold, and italic properties from the Excel font table; render text at the per-cell font size instead of a fixed global size.
  • Horizontal text alignment — Parse alignment from cellXf styles; support left, center, and right alignment with automatic "general" resolution (numbers right-align, text left-aligns).
  • Explicit row heights — Read per-row ht attributes and defaultRowHeight from sheetFormatPr; use them in page layout instead of computing row height solely from font size.
  • Combo chart (overlay series) — Detect secondary chart types in the plot area (e.g., a line series overlaid on a bar chart); render overlay lines with markers and merge all series into the legend.
  • Non-solid fill patterns — Approximate darkGray, mediumGray, lightGray, gray125, gray0625, and various hatching patterns as tint-blended solid fills.
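
Approximating a pattern fill as a tint-blended solid color could look roughly like the sketch below. The coverage ratios are assumptions based on the pattern names (for example, gray125 read as 12.5% foreground), and `PatternFillSketch` is a hypothetical name; the ratios MiniPdf actually uses are not stated in these notes.

```csharp
using System;
using System.Collections.Generic;

public static class PatternFillSketch
{
    // Assumed foreground coverage per pattern name.
    static readonly Dictionary<string, double> Coverage = new Dictionary<string, double>
    {
        ["darkGray"] = 0.75,
        ["mediumGray"] = 0.5,
        ["lightGray"] = 0.25,
        ["gray125"] = 0.125,
        ["gray0625"] = 0.0625,
    };

    // Linear blend of the pattern (foreground) color over the background color.
    public static (int R, int G, int B) Blend((int R, int G, int B) fg,
                                               (int R, int G, int B) bg,
                                               string pattern)
    {
        // Unknown patterns fall back to a fully solid foreground.
        double t = Coverage.TryGetValue(pattern, out var ratio) ? ratio : 1.0;
        int Mix(int f, int b) => (int)Math.Round(b + (f - b) * t);
        return (Mix(fg.R, bg.R), Mix(fg.G, bg.G), Mix(fg.B, bg.B));
    }

    public static void Main()
    {
        var tint = Blend((0, 0, 0), (255, 255, 255), "gray125");
        Console.WriteLine(tint.R + " " + tint.G + " " + tint.B); // 223 223 223
    }
}
```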

Improvements

  • Date/time format engine — Full ConvertExcelDateFormat() converter that maps Excel format codes (d, dd, m, mm, yy, yyyy, h, hh, s, ss, AM/PM, A/P) to .NET DateTime format strings with correct month-vs-minute disambiguation. Built-in format IDs 14–22 now each produce the correct date/time style.
  • Number format improvements
    • Zero-padding for integer formats (e.g., "0000" → 0042).
    • Correct negative sign placement for currency formats (e.g., -$180,000.00 instead of $-180,000.00).
    • Multi-section format handling (positive;negative;zero) with proper sign tracking.
  • General numeric display — FormatGeneral() now uses scientific notation for integers ≥ 1e10 (matching LibreOffice) and rounds near-integer values to shorter representations.
  • Chart title centering — Chart titles are now horizontally centered over the plot area.
  • Column width logic — Moved numeric text fitting (FitNumericText) inside the clipping branch so it only applies when content actually needs clipping; single-column sheets still expand to page width.
  • Default font size changed from 10pt to 11pt to match Excel's default.
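
The month-vs-minute disambiguation mentioned in the date/time item can be illustrated with a simplified converter. This is a sketch rather than the ConvertExcelDateFormat() implementation: `DateFormatSketch.ToDotNetFormat` is a hypothetical name, it assumes the common rule that `m`/`mm` means minutes when it follows an hour token or precedes a seconds token, and it ignores AM/PM markers, quoted literals, and bracketed sections.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DateFormatSketch
{
    public static string ToDotNetFormat(string excelFormat)
    {
        string format = excelFormat.ToLowerInvariant();

        // Split into runs of identical characters, e.g. "yyyy", "-", "mm".
        var tokens = new List<string>();
        for (int i = 0; i < format.Length; )
        {
            int j = i;
            while (j < format.Length && format[j] == format[i]) j++;
            tokens.Add(format.Substring(i, j - i));
            i = j;
        }

        var result = new StringBuilder();
        for (int t = 0; t < tokens.Count; t++)
        {
            string token = tokens[t];
            if (token[0] == 'm')
            {
                bool isMinute = NeighborLetter(tokens, t, -1) == 'h'
                             || NeighborLetter(tokens, t, +1) == 's';
                // .NET uses lowercase m for minutes and uppercase M for months.
                result.Append(isMinute ? token : token.ToUpperInvariant());
            }
            else if (token[0] == 'h')
            {
                result.Append(token.ToUpperInvariant()); // 24-hour clock; AM/PM not handled
            }
            else
            {
                result.Append(token);
            }
        }
        return result.ToString();
    }

    // Finds the nearest letter token before (step -1) or after (step +1) the given index.
    static char NeighborLetter(List<string> tokens, int index, int step)
    {
        for (int k = index + step; k >= 0 && k < tokens.Count; k += step)
            if (char.IsLetter(tokens[k][0])) return tokens[k][0];
        return '\0';
    }

    public static void Main()
    {
        string net = ToDotNetFormat("yyyy-mm-dd hh:mm");
        var when = new DateTime(2026, 3, 5, 10, 48, 0);
        Console.WriteLine(net + " => " + when.ToString(net, CultureInfo.InvariantCulture));
    }
}
```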

Changed Files

| File | Change |
| --- | --- |
| ExcelReader.cs | +679 lines — font styles, borders, row heights, overlay charts, date/number formatting |
| ExcelToPdfConverter.cs | +168 lines — border drawing, alignment rendering, row height layout, combo charts |
| README.md | Updated benchmark score images |

New Internal Types

  • FontStyleInfo(Color, Size, Bold, Italic) — font metadata record
  • BorderSide(Style, Color) — single border edge
  • CellBorderInfo(Left, Right, Top, Bottom) — four-side cell border
  • ExcelCell extended with Alignment, FontSize, Bold, Italic, Border
  • ExcelSheet extended with RowHeights, DefaultRowHeight
  • ExcelChartInfo extended with OverlaySeries, OverlayChartType

Full Changelog: v0.6.0...v0.7.0

v0.6.0

04 Mar 05:38

Highlights

Benchmark quality score improved from 94.7% → 96.5% across 120 test cases. Excellent-tier results rose from 97 → 108, while Needs-Improvement cases dropped from 2 → 1.

New Features

Excel Reader Enhancements

  • Number format support — reads numFmt entries from styles.xml and applies formatting (currency, percentage, date, scientific, custom format codes) to numeric cell values
  • Cell fill / background color — parses solid patternFill colors from styles.xml and renders them as colored rectangles behind cell text
  • Merged cells — reads <mergeCells> regions; text width and clipping now respect the merged column span

Chart Rendering

  • Scatter / Bubble chart — new dedicated RenderScatterChart with numeric X and Y axes
  • Radar chart — rendered via RenderLineChart with spoke labels around the center
  • Horizontal bar chart — new RenderHorizontalBarChart (categories on Y-axis)
  • Stacked & percent-stacked support for bar and area charts, including reversed legend order to match Excel's bottom-to-top stacking
  • Pie / Doughnut legends — vertical category legend with color swatches below the chart
  • Data label percentages on pie/doughnut charts (showPercent)
  • Axis value formatting using the chart's numFmt formatCode (e.g. #,##0)
  • Chart title clipping — long titles are now truncated to fit the chart width

PDF Writer

  • Unified CJK / Unicode rendering — when a text block contains any non-WinAnsi character, the entire block is now rendered in the CID font (F2), eliminating the ~3 pt Y-offset between Type1 and CIDFontType2 that caused PyMuPDF to split spans across lines
  • Improved font metrics — updated FontBBox, Ascent, Descent, CapHeight to more accurate values
  • Full-width character detection (IsFullWidthCharPdf) for correct CJK/Hangul width calculation

Text Layout & Clipping

  • FittingChars / FitNumericText — pixel-accurate character fitting replaces the old maxChars integer estimate, improving truncation fidelity across varying font sizes
  • Single-column overflow — clips text at the page right edge and calculates virtual row height from wrapping at the default column width (matches LibreOffice behavior)
  • Multi-page row splitting — rows taller than the usable page height are now split across pages line-by-line
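
Width-based character fitting, as opposed to a fixed character count, can be sketched as follows. `FittingCharsSketch` is a hypothetical stand-in: it uses a handful of standard Helvetica advance widths (in 1/1000 em) and falls back to 556 for everything else, which is an assumption made to keep the example short.

```csharp
using System;

public static class FittingCharsSketch
{
    // Tiny subset of Helvetica advance widths in 1/1000 em; 556 is an assumed fallback.
    static int GlyphWidth(char c)
    {
        switch (c)
        {
            case ' ': return 278;
            case 'i': return 222;
            case 'l': return 222;
            case 'M': return 833;
            case 'W': return 944;
            default: return 556; // digits and many lowercase letters
        }
    }

    // Counts how many leading characters fit inside maxWidth at the given font size.
    public static int FittingChars(string text, double fontSize, double maxWidth)
    {
        double used = 0;
        int count = 0;
        foreach (char c in text)
        {
            double width = GlyphWidth(c) * fontSize / 1000.0;
            if (used + width > maxWidth) break;
            used += width;
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // Ten digits at 10pt are 5.56pt each, so five fit in 30pt.
        Console.WriteLine(FittingChars("1234567890", 10, 30)); // 5
    }
}
```

Because the count depends on the font size and the actual glyphs, narrow text such as "iiii" fits more characters than wide text such as "WWWW" in the same column.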

Defaults Changed

| Setting | Old | New |
| --- | --- | --- |
| MarginTop / MarginBottom | 50 pt | 72 pt (1 inch) |
| LineSpacing | 1.6 | 1.5 |

Project Reorganization

  • Moved translated README files to documents folder
  • Moved utility scripts (Run-Benchmark.ps1, analyze_overflow.py, etc.) to scripts
  • Removed obsolete .github_issue_body.md and root-level PdfPage.cs
  • Added 40+ benchmark analysis & debugging Python scripts

Full Changelog

v0.5.0...v0.6.0

v0.5.0

03 Mar 08:39

Highlights

Excel Chart Rendering — MiniPdf now parses and renders charts embedded in .xlsx files directly to PDF, covering bar, line, area, pie, doughnut, scatter, radar, stacked, combo, stock (OHLC), bubble, and 3D chart types. Charts are rendered as native PDF vector primitives (rectangles + lines), producing lightweight, searchable output without rasterization.

New Features

  • Chart-to-PDF conversion — Automatically detects and renders c:chartSpace elements from .xlsx DrawingML, including:
    • Bar / Column (grouped, stacked, percent-stacked, 3D)
    • Line (with markers, multi-series)
    • Area (stacked, percent-stacked)
    • Pie / Doughnut (with slice labels)
    • Scatter (with trendlines) / Bubble / Radar
    • Combo (bar + line) / Stock OHLC
    • Chart sheet support
  • PdfPage.AddRectangle() — Draw filled rectangles with custom PdfColor
  • PdfPage.AddLine() — Draw line segments with custom color and stroke width
  • New PDF primitives — PdfRectBlock and PdfLineBlock records for rectangle and line rendering in the PDF content stream
  • Chart data resolution — ExcelReader resolves numRef, strRef, and numLit references to extract series data, categories, and axis titles from worksheet cells
  • Nice axis scaling — Auto-calculated round-number axis labels with gridlines
  • Chart legend rendering — Multi-series charts display color-coded legends
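
"Nice" axis scaling is commonly done by snapping a rough tick interval to 1, 2, or 5 times a power of ten. The sketch below shows that general technique; it is not taken from MiniPdf's source, and `NiceAxisSketch`, `NiceStep`, and `AxisMax` are hypothetical names.

```csharp
using System;

public static class NiceAxisSketch
{
    // Snaps maxValue / targetTicks up to 1, 2, 5, or 10 times a power of ten.
    public static double NiceStep(double maxValue, int targetTicks)
    {
        if (maxValue <= 0 || targetTicks <= 0) return 1;
        double rough = maxValue / targetTicks;
        double magnitude = Math.Pow(10, Math.Floor(Math.Log10(rough)));
        double fraction = rough / magnitude;
        double nice = fraction <= 1 ? 1 : fraction <= 2 ? 2 : fraction <= 5 ? 5 : 10;
        return nice * magnitude;
    }

    // Rounds the axis maximum up to the next multiple of the step.
    public static double AxisMax(double maxValue, double step)
    {
        return Math.Ceiling(maxValue / step) * step;
    }

    public static void Main()
    {
        double step = NiceStep(87, 5);    // rough interval 17.4 snaps to 20
        double top = AxisMax(87, step);   // 100
        for (double v = 0; v <= top; v += step)
            Console.Write(v + " ");       // one label and gridline per step
        Console.WriteLine();
    }
}
```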

Improvements

  • Text clipping at column boundaries — Cell text is now truncated when the adjacent cell contains content, matching LibreOffice behavior (previously text always overflowed)
  • Explicit newline handling — Cells with Alt+Enter line breaks (\n) are rendered as multi-line text
  • Chart overflow pages — Right-anchored charts produce overflow pages to match LibreOffice page count
  • README updated — Benchmark expanded from 90 → 120 test cases (30 new chart cases: classic91–classic120)

Benchmark

| Category | Count | Threshold |
| --- | --- | --- |
| Excellent | 97 | ≥ 90% |
| Acceptable | 21 | 70%–90% |
| Needs Improvement | 2 | < 70% |

Average overall score: 94.7% (text 40% + visual 40% + page count 20%)
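
The weighting and tiers above amount to the following arithmetic. This is a sketch with hypothetical names (`BenchmarkScoreSketch`, `Overall`, `Tier`); scores are taken as fractions between 0 and 1, and treating exactly 90% as Excellent and exactly 70% as Acceptable is an assumption, since the table does not say which side the boundaries fall on.

```csharp
using System;

public static class BenchmarkScoreSketch
{
    // Weighted average: text 40%, visual 40%, page count 20%.
    public static double Overall(double text, double visual, double pageCount)
    {
        return 0.4 * text + 0.4 * visual + 0.2 * pageCount;
    }

    // Tier thresholds from the benchmark table.
    public static string Tier(double overall)
    {
        if (overall >= 0.90) return "Excellent";
        if (overall >= 0.70) return "Acceptable";
        return "Needs Improvement";
    }

    public static void Main()
    {
        double score = Overall(1.0, 0.9, 1.0);
        Console.WriteLine(score.ToString("0.00") + " " + Tier(score));
    }
}
```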

Stats

  • 2 commits, 259 files changed, +6,798 / −1,266 lines
  • 30 new chart test cases with reference PDFs and visual comparison images

Full Changelog: v0.4.1...v0.5.0

v0.4.1

03 Mar 01:36

v0.4.0 — March 3, 2026

Breaking Changes

  • Namespace renamed from MiniPdf to MiniSoftware.
    Update all using MiniPdf; statements to using MiniSoftware;.

New Features

  • Unicode / CJK text rendering — PdfWriter now detects non-Latin-1 characters and automatically embeds a composite Identity-H CIDFont (Type0/CIDFontType2) with a ToUnicode CMap, enabling correct rendering of CJK and other Unicode text without manual font setup.

  • Multi-framework targeting — The library now targets net6.0, net8.0, and net9.0 (previously only net9.0), ensuring broad compatibility across .NET LTS releases.

Improvements

  • Benchmark AI visual-comparison threshold raised from 0.90 → 0.97 for more accurate pass/fail decisions.
  • compare_pdfs.py gains --ai-compare, --ai-max-pages, and --ai-threshold CLI flags for fine-grained control over AI visual comparison.
  • Added check_clip.cs test to validate character-width calculations used for text clipping.

Internal / Tooling

  • Added helper scripts: analyze_overflow.py, analyze_xlsx.py, check_reference.py, fix_code.py.
  • Added update_readme_images.py to automate the Visual Comparison table in all README variants.
  • Updated benchmark report images and comparison JSON/Markdown reports.

What's Changed

| Commit | Summary |
| --- | --- |
| 61b5de2 | refactor: update namespace from MiniPdf to MiniSoftware across all files |
| 7e94618 | fix: update TargetFrameworks to support net6.0, net8.0, net9.0 |
| 586f265 | Enhance PDF comparison step with AI options |
| 234f273 | Update AI threshold in benchmark; add check_clip test for character width calculations |
| 6635d0b | Add script to update Visual Comparison table in README.md |

Full Changelog: v0.3.0...v0.4.0

v0.3.1

02 Mar 07:29

⚠️ Breaking Change — Namespace Renamed

The public namespace has been renamed from MiniPdf to MiniSoftware to align with the broader MiniSoftware package family.

Before:

using MiniPdf;

After:

using MiniSoftware;

Update all using directives in your project after upgrading.


📖 Documentation

  • Added Simplified Chinese README (README.zh-CN.md)
  • Added Traditional Chinese README (README.zh-TW.md)
  • Added language selector links across all README localizations (EN / 简体中文 / 繁體中文 / 日本語 / 한국어 / Italiano / Français)

Commits

| Hash | Message |
| --- | --- |
| 61b5de2 | refactor: update namespace from MiniPdf to MiniSoftware across all files |
| 7037163 | docs: add Simplified Chinese README |
| 900388c | feat: add Traditional Chinese README and language selector (#47) |

Full Changelog: v0.3.0...v0.3.1

v0.3.0

02 Mar 05:59

What's New

Image Support in Excel-to-PDF Conversion

  • Added embedded JPEG and PNG image rendering in PDF pages, including correct dimension handling and placement via PDF XObjects.
  • ExcelReader now extracts images from .xlsx files even when they are positioned outside the data area.
  • ExcelToPdfConverter ensures images are rendered correctly on the final PDF page.
  • PdfPage gained new image-embedding API surface; PdfWriter updated to emit image XObject streams.

Benchmark Improvements

  • Text extraction and comparison logic in compare_pdfs.py improved to account for structural differences and produce more accurate similarity scores.
  • Introduced content-coverage penalties in pixel comparison to better reflect visual differences between rendered PDFs.
  • Added 10 new image-heavy reference test cases (classic61–71) covering product cards, company logos, employee directories, inventory photos, invoices, real estate listings, restaurant menus, and multi-sheet layouts.

License Change

  • Switched from MIT to Apache-2.0 license.
  • LICENSE file is now bundled in the NuGet package.

Internationalization

  • Added READMEs in Traditional Chinese (README.zh-TW.md), Simplified Chinese (README.zh-CN.md), Japanese (README.ja.md), Korean (README.ko.md), French (README.fr.md), and Italian (README.it.md).
  • Updated English README.md with language selector and refreshed content.

Commits

  • 4ad587b — Enhance PDF rendering capabilities (image support + benchmark improvements)
  • 7037163 — Add Simplified Chinese README
  • 900388c — Add Traditional Chinese README and language selector (#47)

Full Changelog: v0.2.0...v0.3.0

v0.2.0

02 Mar 01:22

What's New

New MiniPdf Simplified API (#10)

A new static entry-point class, MiniPdf, provides three overloads for Excel → PDF conversion:

// File → File
MiniPdf.ConvertToPdf("input.xlsx", "output.pdf");

// File → byte[]
byte[] pdf = MiniPdf.ConvertToPdf("input.xlsx");

// Stream → byte[]
byte[] pdf = MiniPdf.ConvertToPdf(stream);

ExcelReader — Dedicated Excel parsing layer

Extracted Excel reading logic into a standalone ExcelReader class, improving separation of concerns.

ExcelToPdfConverter Refactored

Core conversion pipeline refactored for clarity and maintainability, in preparation for upcoming feature work.

Tests

  • Added ClassicExcelToPdfTests with 30 test scenarios covering:
    • Basic tables with headers
    • Multiple worksheets
    • Empty workbooks / single cells
    • Wide/tall tables
    • Numbers, decimals, negative numbers, percentages, currencies
    • Long text, Unicode text, special XML characters
    • Date strings, formula results, duplicate values
    • Sparse rows/columns, header-only sheets
    • Mixed empty and filled sheets

Benchmark Suite

  • New Python benchmark scripts (compare_pdfs.py, generate_reference_pdfs.py, run_benchmark.py)
  • 30 reference PDFs for regression testing
  • HTML/JSON/Markdown comparison reports with visual page diff images
  • Run-Benchmark.ps1 PowerShell helper script

Bug Fixes

  • Fixed PackageVersion missing property causing MSB4044 error on .NET 10 (e4aa97b)

Full Changelog

v0.1.0...v0.2.0

v0.1.0

22 Feb 17:14
f5d78f6

Merge pull request #41 from openaitx-system/feature/nuget-package

feat: add NuGet package support