

Wei Lin edited this page Mar 5, 2026 · 3 revisions

Benchmark: Self-Evolution Pipeline

MiniPdf uses a self-evolution benchmark pipeline to measure output quality against LibreOffice as the reference renderer across 150 classic Excel scenarios.


Pipeline Architecture

generate_classic_xlsx.py  →  150 .xlsx files
        │
        ├──► dotnet (MiniPdf)    →  ./pdf_output/classic*.pdf
        └──► LibreOffice         →  ./reference_pdfs/classic*.pdf
                            │
                        compare_pdfs.py
                            │
                        reports/comparison_report.{md,json,html}
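The reference PDFs come from LibreOffice's headless converter. A minimal sketch of that step, assuming one `soffice` invocation per workbook (the `soffice_args`/`render_reference` helper names are illustrative, not the pipeline's actual code):

```python
import subprocess
from pathlib import Path

def soffice_args(xlsx: Path, outdir: Path) -> list[str]:
    # LibreOffice batch conversion: writes <outdir>/<stem>.pdf for the input
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(xlsx)]

def render_reference(xlsx: Path, outdir: Path) -> None:
    # Fail loudly (check=True) so a broken reference render stops the run
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(soffice_args(xlsx, outdir), check=True)
```

On Windows, `soffice` must be on `PATH` (see Prerequisites below); the MiniPdf side of the fork is a plain `dotnet` invocation.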

Scoring Formula

| Component | Weight | Method |
|---|---|---|
| Text similarity | 40% | `difflib.SequenceMatcher` on per-page extracted text |
| Visual similarity | 40% | pixel-level diff via PyMuPDF rendering (150 DPI) |
| Page count match | 20% | 1.0 if equal, min/max ratio otherwise |

Overall score = text × 0.4 + visual × 0.4 + pages × 0.2
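The formula above can be sketched in a few lines of plain Python. These helper names are illustrative (the real logic lives in compare_pdfs.py, and the visual component comes from PyMuPDF pixel diffs, which this sketch takes as a precomputed input):

```python
from difflib import SequenceMatcher

def text_score(ours: str, reference: str) -> float:
    # Per-page extracted text compared with difflib's similarity ratio (0.0-1.0)
    return SequenceMatcher(None, ours, reference).ratio()

def page_score(our_pages: int, ref_pages: int) -> float:
    # 1.0 on an exact match, otherwise the min/max ratio
    if our_pages == ref_pages:
        return 1.0
    return min(our_pages, ref_pages) / max(our_pages, ref_pages)

def overall_score(text: float, visual: float, pages: float) -> float:
    # Weighted sum: 40% text + 40% visual + 20% page count
    return 0.4 * text + 0.4 * visual + 0.2 * pages
```

The page-count term explains why pagination bugs are so costly: a 7/12 page split (as in classic09 below) caps that component at 7/12 ≈ 0.583 before text or visual quality is even considered.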


Running the Benchmark

Prerequisites

pip install openpyxl pymupdf
# LibreOffice installed (soffice on PATH)
# .NET 9 SDK

Full Pipeline

# Windows (UTF-8 output)
$env:PYTHONIOENCODING="utf-8"
python tests/MiniPdf.Benchmark/run_benchmark.py

Partial Runs

# Skip Excel generation (xlsx already exist)
python tests/MiniPdf.Benchmark/run_benchmark.py --skip-generate

# Skip both generation and LibreOffice reference (only re-run MiniPdf + compare)
python tests/MiniPdf.Benchmark/run_benchmark.py --skip-generate --skip-reference

# Only compare (assumes all PDFs already exist)
python tests/MiniPdf.Benchmark/run_benchmark.py --compare-only

The report is written to tests/MiniPdf.Benchmark/reports/:

  • comparison_report.md – human-readable Markdown with diffs
  • comparison_report.json – machine-readable scores
  • comparison_report.html – visual side-by-side page images
  • images/ – PNG renders of each PDF page (minipdf + reference)
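comparison_report.json makes follow-up work scriptable. A sketch of pulling out the weakest cases, assuming a top-level `"cases"` list of records with `"name"` and `"overall"` fields (the actual schema may differ):

```python
import json
from pathlib import Path

def weakest_cases(report: dict, threshold: float = 0.90) -> list[dict]:
    # Assumed schema: {"cases": [{"name": "...", "overall": 0.0-1.0}, ...]}
    low = [c for c in report["cases"] if c["overall"] < threshold]
    return sorted(low, key=lambda c: c["overall"])  # worst first

def load_report(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))
```

This is the kind of filtering the self-evolution loop (bottom of this page) relies on to pick its next target.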

Current Results (2026-03-05)

Average overall score: 97.1% across 150 test cases (100 Excellent, 22 Good, 16 Acceptable, 9 Fair, 3 Needs Improvement).

Test cases classic61–classic90 include image scenarios, classic91–classic120 include chart rendering scenarios, and classic121–classic150 include style/border/fill scenarios.

| # | Test Case | Text | Visual | Pages | Overall |
|---|-----------|------|--------|-------|---------|
| 1 | classic01 basic table with headers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 2 | classic02 multiple worksheets | 0.997 | 0.998 | 3/3 | 🟢 99.8% |
| 3 | classic03 empty workbook | 1.000 | 1.000 | 1/1 | 🟢 100.0% |
| 4 | classic04 single cell | 1.000 | 1.000 | 1/1 | 🟢 100.0% |
| 5 | classic05 wide table (A–Z) | 1.000 | 0.994 | 3/3 | 🟢 99.7% |
| 6 | classic06 tall table (200 rows) | 1.000 | 0.945 | 5/5 | 🟡 97.8% |
| 7 | classic07 numbers only | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 8 | classic08 mixed text and numbers | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 9 | classic09 long text | 0.675 | 0.577 | 7/12 | 🔴 60.1% |
| 10 | classic10 special XML characters | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 11 | classic11 sparse rows | 1.000 | 0.999 | 2/2 | 🟢 100.0% |
| 12 | classic12 sparse columns | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 13 | classic13 date strings | 0.974 | 0.996 | 1/1 | 🟡 98.8% |
| 14 | classic14 decimal numbers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 15 | classic15 negative numbers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 16 | classic16 percentage strings | 0.994 | 0.997 | 1/1 | 🟢 99.6% |
| 17 | classic17 currency strings | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 18 | classic18 large dataset (1000×10) | 1.000 | 0.893 | 24/24 | 🟡 95.7% |
| 19 | classic19 single column list | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 20 | classic20 all empty cells | 1.000 | 1.000 | 1/1 | 🟢 100.0% |
| 21 | classic21 header only | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 22 | classic22 long sheet name | 1.000 | 0.999 | 1/1 | 🟢 99.9% |
| 23 | classic23 unicode / CJK text | 0.792 | 0.995 | 1/1 | 🟡 91.5% |
| 24 | classic24 red text | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 25 | classic25 multiple colors | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 26 | classic26 inline strings | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 27 | classic27 single row | 1.000 | 0.999 | 1/1 | 🟢 99.9% |
| 28 | classic28 duplicate values | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 29 | classic29 formula results | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 30 | classic30 mixed empty & filled sheets | 1.000 | 0.999 | 2/2 | 🟢 100.0% |
| 31 | classic31 bold header row | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 32 | classic32 right-aligned numbers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 33 | classic33 centered text | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 34 | classic34 explicit column widths | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 35 | classic35 explicit row heights | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 36 | classic36 merged cells | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 37 | classic37 freeze panes | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 38 | classic38 hyperlink cell | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 39 | classic39 financial table | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 40 | classic40 scientific notation | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 41 | classic41 integer vs float | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 42 | classic42 boolean values | 0.990 | 0.996 | 1/1 | 🟢 99.4% |
| 43 | classic43 inventory report | 0.998 | 0.989 | 1/1 | 🟢 99.5% |
| 44 | classic44 employee roster | 0.997 | 0.984 | 1/1 | 🟢 99.2% |
| 45 | classic45 sales by region (4 sheets) | 1.000 | 0.998 | 4/4 | 🟢 99.9% |
| 46 | classic46 grade book | 1.000 | 0.991 | 1/1 | 🟢 99.6% |
| 47 | classic47 time series (31 rows) | 1.000 | 0.982 | 1/1 | 🟢 99.3% |
| 48 | classic48 survey results | 0.994 | 0.993 | 1/1 | 🟢 99.5% |
| 49 | classic49 contact list | 0.946 | 0.989 | 1/1 | 🟡 97.4% |
| 50 | classic50 budget vs actuals (3 sheets) | 1.000 | 0.990 | 3/3 | 🟢 99.6% |
| 51 | classic51 product catalog | 0.978 | 0.986 | 1/1 | 🟡 98.6% |
| 52 | classic52 pivot summary | 1.000 | 0.990 | 1/1 | 🟢 99.6% |
| 53 | classic53 invoice layout | 0.995 | 0.991 | 1/1 | 🟢 99.5% |
| 54 | classic54 multi-level header | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 55 | classic55 error values | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 56 | classic56 alternating row colors | 1.000 | 0.991 | 1/1 | 🟢 99.6% |
| 57 | classic57 CJK-only sheet | 0.783 | 0.908 | 1/1 | 🔴 87.6% |
| 58 | classic58 mixed numeric formats | 0.994 | 0.996 | 1/1 | 🟢 99.6% |
| 59 | classic59 multi-sheet summary (4 sheets) | 1.000 | 0.996 | 4/4 | 🟢 99.9% |
| 60 | classic60 large wide table (20×50) | 1.000 | 0.934 | 4/4 | 🟡 97.4% |
| 61 | classic61 product card with image | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 62 | classic62 company logo header | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 63 | classic63 two products side by side | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 64 | classic64 employee directory with photo | 0.997 | 0.993 | 1/1 | 🟢 99.6% |
| 65 | classic65 inventory with product photos | 0.991 | 0.994 | 1/1 | 🟢 99.4% |
| 66 | classic66 invoice with logo | 0.987 | 0.995 | 1/1 | 🟢 99.3% |
| 67 | classic67 real estate listing | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 68 | classic68 restaurant menu | 0.978 | 0.981 | 1/1 | 🟡 98.4% |
| 69 | classic69 image only sheet | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 70 | classic70 product catalog with images | 0.983 | 0.993 | 1/1 | 🟢 99.0% |
| 71 | classic71 multi sheet with images | 1.000 | 0.999 | 3/3 | 🟢 100.0% |
| 72 | classic72 bar chart image with data | 1.000 | 0.988 | 1/1 | 🟢 99.5% |
| 73 | classic73 event flyer with banner | 0.945 | 0.993 | 1/1 | 🟡 97.5% |
| 74 | classic74 dashboard with KPI image | 0.963 | 0.989 | 1/1 | 🟡 98.0% |
| 75 | classic75 certificate with seal | 1.000 | 0.988 | 1/1 | 🟢 99.5% |
| 76 | classic76 product image grid | 1.000 | 0.990 | 1/1 | 🟢 99.6% |
| 77 | classic77 news article with hero image | 1.000 | 0.990 | 1/1 | 🟢 99.6% |
| 78 | classic78 small icon per row | 0.990 | 0.995 | 1/1 | 🟢 99.4% |
| 79 | classic79 wide panoramic banner | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 80 | classic80 portrait tall image | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 81 | classic81 step by step with images | 1.000 | 0.991 | 1/1 | 🟢 99.7% |
| 82 | classic82 before after images | 0.993 | 0.992 | 1/1 | 🟢 99.4% |
| 83 | classic83 color swatch palette | 0.992 | 0.992 | 1/1 | 🟢 99.4% |
| 84 | classic84 travel destination cards | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 85 | classic85 lab results with image | 0.993 | 0.989 | 1/1 | 🟢 99.3% |
| 86 | classic86 software screenshot features | 0.986 | 0.994 | 1/1 | 🟢 99.2% |
| 87 | classic87 sports results with logos | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 88 | classic88 image after data | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 89 | classic89 nutrition label with image | 0.998 | 0.994 | 1/1 | 🟢 99.7% |
| 90 | classic90 project status with milestones | 0.959 | 0.987 | 1/1 | 🟡 97.8% |
| 91 | classic91 simple bar chart | 0.949 | 0.961 | 2/2 | 🟡 96.4% |
| 92 | classic92 horizontal bar chart | 0.971 | 0.967 | 2/2 | 🟡 97.5% |
| 93 | classic93 line chart | 0.826 | 0.986 | 2/2 | 🟡 92.5% |
| 94 | classic94 pie chart | 0.878 | 0.926 | 2/2 | 🟡 92.1% |
| 95 | classic95 area chart | 0.644 | 0.765 | 2/2 | 🔴 76.4% |
| 96 | classic96 scatter chart | 0.892 | 0.986 | 2/2 | 🟡 95.1% |
| 97 | classic97 doughnut chart | 0.857 | 0.932 | 2/2 | 🟡 91.5% |
| 98 | classic98 radar chart | 0.888 | 0.990 | 2/2 | 🟡 95.1% |
| 99 | classic99 bubble chart | 0.845 | 0.964 | 2/2 | 🟡 92.3% |
| 100 | classic100 stacked bar chart | 0.966 | 0.905 | 1/1 | 🟡 94.9% |
| 101 | classic101 percent stacked bar | 0.962 | 0.878 | 1/1 | 🟡 93.6% |
| 102 | classic102 line chart with markers | 0.861 | 0.988 | 2/2 | 🟡 94.0% |
| 103 | classic103 pie chart with labels | 0.737 | 0.975 | 2/2 | 🔴 88.4% |
| 104 | classic104 combo bar line chart | 0.787 | 0.753 | 2/2 | 🔴 81.6% |
| 105 | classic105 3D bar chart | 0.903 | 0.744 | 2/2 | 🔴 85.9% |
| 106 | classic106 3D pie chart | 0.824 | 0.968 | 2/2 | 🟡 91.7% |
| 107 | classic107 multi series line | 0.738 | 0.777 | 2/2 | 🔴 80.6% |
| 108 | classic108 stacked area chart | 0.964 | 0.896 | 1/1 | 🟡 94.4% |
| 109 | classic109 scatter with trendline | 0.829 | 0.986 | 2/2 | 🟡 92.6% |
| 110 | classic110 chart with legend | 0.837 | 0.781 | 2/2 | 🔴 84.7% |
| 111 | classic111 chart with axis labels | 0.827 | 0.977 | 2/2 | 🟡 92.1% |
| 112 | classic112 multiple charts | 0.875 | 0.759 | 2/2 | 🔴 85.4% |
| 113 | classic113 chart sheet | 0.926 | 0.735 | 2/2 | 🔴 86.4% |
| 114 | classic114 chart large dataset | 0.901 | 0.887 | 4/4 | 🟡 91.5% |
| 115 | classic115 chart negative values | 0.832 | 0.972 | 2/2 | 🟡 92.2% |
| 116 | classic116 percent stacked area | 0.965 | 0.880 | 1/1 | 🟡 93.8% |
| 117 | classic117 stock OHLC chart | 0.794 | 0.729 | 2/2 | 🔴 80.9% |
| 118 | classic118 bar chart custom colors | 0.927 | 0.961 | 2/2 | 🟡 95.5% |
| 119 | classic119 dashboard multi charts | 0.841 | 0.935 | 2/2 | 🟡 91.0% |
| 120 | classic120 chart with date axis | 0.574 | 0.783 | 2/2 | 🔴 74.3% |
| 121 | classic121 thin borders | 1.000 | 0.993 | 1/1 | 🟢 99.7% |
| 122 | classic122 thick outer thin inner | 1.000 | 0.991 | 1/1 | 🟢 99.6% |
| 123 | classic123 dashed borders | 0.992 | 0.995 | 1/1 | 🟢 99.5% |
| 124 | classic124 colored borders | 1.000 | 0.993 | 1/1 | 🟢 99.7% |
| 125 | classic125 solid fills | 0.992 | 0.993 | 1/1 | 🟢 99.4% |
| 126 | classic126 dark header | 0.993 | 0.991 | 1/1 | 🟢 99.4% |
| 127 | classic127 font styles | 0.991 | 0.992 | 1/1 | 🟢 99.3% |
| 128 | classic128 font sizes | 0.982 | 0.994 | 1/1 | 🟢 99.0% |
| 129 | classic129 alignment combos | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 130 | classic130 wrap and indent | 1.000 | 0.987 | 1/1 | 🟢 99.5% |
| 131 | classic131 number formats | 1.000 | 0.991 | 1/1 | 🟢 99.7% |
| 132 | classic132 striped table | 1.000 | 0.978 | 1/1 | 🟢 99.1% |
| 133 | classic133 gradient rows | 1.000 | 0.991 | 1/1 | 🟢 99.7% |
| 134 | classic134 heatmap | 1.000 | 0.971 | 1/1 | 🟡 98.9% |
| 135 | classic135 bottom border only | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 136 | classic136 financial report styled | 1.000 | 0.983 | 1/1 | 🟢 99.3% |
| 137 | classic137 checkerboard | 1.000 | 0.960 | 1/1 | 🟡 98.4% |
| 138 | classic138 color grid | 1.000 | 0.986 | 1/1 | 🟢 99.5% |
| 139 | classic139 pattern fills | 1.000 | 0.981 | 1/1 | 🟢 99.2% |
| 140 | classic140 rotated text | 0.958 | 0.994 | 1/1 | 🟡 98.1% |
| 141 | classic141 mixed edge borders | 1.000 | 0.993 | 1/1 | 🟢 99.7% |
| 142 | classic142 styled invoice | 1.000 | 0.959 | 1/1 | 🟡 98.4% |
| 143 | classic143 colored tabs | 1.000 | 0.999 | 4/4 | 🟢 100.0% |
| 144 | classic144 note style cells | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 145 | classic145 status badges | 1.000 | 0.977 | 1/1 | 🟢 99.1% |
| 146 | classic146 double border table | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 147 | classic147 multi sheet styled | 1.000 | 0.993 | 3/3 | 🟢 99.7% |
| 148 | classic148 frozen styled grid | 1.000 | 0.934 | 1/1 | 🟡 97.4% |
| 149 | classic149 merged styled sections | 1.000 | 0.968 | 1/1 | 🟡 98.7% |
| 150 | classic150 kitchen sink styles | 0.968 | 0.969 | 1/1 | 🟡 97.5% |

Known Limitations

| Issue | Affected Cases | Root Cause |
|---|---|---|
| Long text not paginated | classic09 (60.1%) | MiniPdf renders it as a clipped single line; LibreOffice paginates with text wrapping |
| CJK / Unicode glyphs | classic23 (91.5%), classic57 (87.6%) | Helvetica (built-in PDF font) does not include CJK code points; width estimation differs |
| Chart text/layout | 31 chart cases (74–96%) | Y-axis scale, legend placement, and title clipping differ from LibreOffice's Calc charting engine |
| No 3-D chart effects | classic105, classic106 | 3-D chart styles rendered as flat 2-D equivalents |
| Unsupported chart types | stock, combo | Fall back to the basic bar renderer |
| Pattern fill precision | classic137 (98.4%) | Checkerboard/hatch patterns approximated via solid fill blending |

Adding New Test Cases

  1. Add the generator in tests/MiniPdf.Scripts/generate_classic_xlsx.py:

    def classic151_my_new_scenario():
        wb = Workbook()
        ws = wb.active
        ws.append(["Col1", "Col2"])
        ws.append(["val1", "val2"])
        save(wb, "classic151_my_new_scenario.xlsx")

    Register it in main() → the generators list. (The benchmark already covers classic01–classic150, so new cases start at classic151.)

  2. Add the C# test in tests/MiniPdf.Tests/ClassicExcelToPdfTests.cs:

    [Fact]
    public void Classic151_MyNewScenario()
    {
        using var xlsx = XlsxBuilder.Simple(
            new[] { "Col1", "Col2" },
            new[] { "val1", "val2" });
        AssertValidPdf(xlsx, "Col1", "val1");
    }
  3. Run the pipeline:

    $env:PYTHONIOENCODING="utf-8"
    python tests/MiniPdf.Benchmark/run_benchmark.py
  4. Check the report in tests/MiniPdf.Benchmark/reports/comparison_report.md.


Self-Evolution Loop

Run Benchmark
     │
     ▼
Read report → find low-score cases
     │
     ▼
AI fixes ExcelToPdfConverter.cs
     │
     ▼
dotnet test (94 tests must pass)
     │
     ▼
Re-run benchmark (--skip-generate --skip-reference)
     │
     ▼
Score improved? → commit. Otherwise → iterate.
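The "score improved?" gate at the end of the loop can be a one-liner comparing the JSON reports from two consecutive runs. The `average_overall` field name here is an assumption about the report schema, and the epsilon guards against committing on floating-point noise:

```python
def improved(before: dict, after: dict, eps: float = 1e-4) -> bool:
    # Commit only when the average overall score moved up measurably
    return after["average_overall"] > before["average_overall"] + eps
```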
