

Wei Lin edited this page Mar 5, 2026 · 3 revisions

Benchmark: Self-Evolution Pipeline

MiniPdf uses a self-evolution benchmark pipeline to measure output quality against LibreOffice as the reference renderer across 150 classic Excel scenarios.


Pipeline Architecture

generate_classic_xlsx.py  →  150 .xlsx files
        │
        ├──► dotnet (MiniPdf)    →  ./pdf_output/classic*.pdf
        └──► LibreOffice         →  ./reference_pdfs/classic*.pdf
                            │
                        compare_pdfs.py
                            │
                        reports/comparison_report.{md,json,html}
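The reference PDFs come from LibreOffice's headless converter. A minimal sketch of that step, assuming one `soffice` invocation per workbook (the `soffice_args`/`render_reference` helper names are illustrative, not the pipeline's actual code):

```python
import subprocess
from pathlib import Path

def soffice_args(xlsx: Path, outdir: Path) -> list[str]:
    # LibreOffice batch conversion: writes <outdir>/<stem>.pdf for the input
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), str(xlsx)]

def render_reference(xlsx: Path, outdir: Path) -> None:
    # Fail loudly (check=True) so a broken reference render stops the run
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(soffice_args(xlsx, outdir), check=True)
```

On Windows, `soffice` must be on `PATH` (see Prerequisites below); the MiniPdf side of the fork is a plain `dotnet` invocation.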

Scoring Formula

| Component | Weight | Method |
|---|---|---|
| Text similarity | 40% | `difflib.SequenceMatcher` on per-page extracted text |
| Visual similarity | 40% | pixel-level diff via PyMuPDF rendering (150 DPI) |
| Page count match | 20% | 1.0 if equal, min/max ratio otherwise |

Overall score = text × 0.4 + visual × 0.4 + pages × 0.2
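The formula above can be sketched in a few lines of plain Python. These helper names are illustrative (the real logic lives in compare_pdfs.py, and the visual component comes from PyMuPDF pixel diffs, which this sketch takes as a precomputed input):

```python
from difflib import SequenceMatcher

def text_score(ours: str, reference: str) -> float:
    # Per-page extracted text compared with difflib's similarity ratio (0.0-1.0)
    return SequenceMatcher(None, ours, reference).ratio()

def page_score(our_pages: int, ref_pages: int) -> float:
    # 1.0 on an exact match, otherwise the min/max ratio
    if our_pages == ref_pages:
        return 1.0
    return min(our_pages, ref_pages) / max(our_pages, ref_pages)

def overall_score(text: float, visual: float, pages: float) -> float:
    # Weighted sum: 40% text + 40% visual + 20% page count
    return 0.4 * text + 0.4 * visual + 0.2 * pages
```

The page-count term explains why pagination bugs are so costly: a 7/12 page split (as in classic09 below) caps that component at 7/12 ≈ 0.583 before text or visual quality is even considered.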


Running the Benchmark

Prerequisites

pip install openpyxl pymupdf
# LibreOffice installed (soffice on PATH)
# .NET 9 SDK

Full Pipeline

# Windows (UTF-8 output)
$env:PYTHONIOENCODING="utf-8"
python tests/MiniPdf.Benchmark/run_benchmark.py

Partial Runs

# Skip Excel generation (xlsx already exist)
python tests/MiniPdf.Benchmark/run_benchmark.py --skip-generate

# Skip both generation and LibreOffice reference (only re-run MiniPdf + compare)
python tests/MiniPdf.Benchmark/run_benchmark.py --skip-generate --skip-reference

# Only compare (assumes all PDFs already exist)
python tests/MiniPdf.Benchmark/run_benchmark.py --compare-only

The report is written to tests/MiniPdf.Benchmark/reports/:

  • comparison_report.md – human-readable Markdown with diffs
  • comparison_report.json – machine-readable scores
  • comparison_report.html – visual side-by-side page images
  • images/ – PNG renders of each PDF page (minipdf + reference)
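comparison_report.json makes follow-up work scriptable. A sketch of pulling out the weakest cases, assuming a top-level `"cases"` list of records with `"name"` and `"overall"` fields (the actual schema may differ):

```python
import json
from pathlib import Path

def weakest_cases(report: dict, threshold: float = 0.90) -> list[dict]:
    # Assumed schema: {"cases": [{"name": "...", "overall": 0.0-1.0}, ...]}
    low = [c for c in report["cases"] if c["overall"] < threshold]
    return sorted(low, key=lambda c: c["overall"])  # worst first

def load_report(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))
```

This is the kind of filtering the self-evolution loop (bottom of this page) relies on to pick its next target.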

Current Results (2026-03-05)

Average overall score: 97.1% across 150 test cases (100 Excellent, 22 Good, 16 Acceptable, 9 Fair, 3 Needs Improvement).

Test cases classic61–classic90 include image scenarios, classic91–classic120 include chart rendering scenarios, and classic121–classic150 include style/border/fill scenarios.

| # | Test Case | Text | Visual | Pages | Overall |
|---|-----------|------|--------|-------|---------|
| 1 | classic01 basic table with headers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 2 | classic02 multiple worksheets | 0.997 | 0.998 | 3/3 | 🟢 99.8% |
| 3 | classic03 empty workbook | 1.000 | 1.000 | 1/1 | 🟢 100.0% |
| 4 | classic04 single cell | 1.000 | 1.000 | 1/1 | 🟢 100.0% |
| 5 | classic05 wide table (A–Z) | 1.000 | 0.994 | 3/3 | 🟢 99.7% |
| 6 | classic06 tall table (200 rows) | 1.000 | 0.945 | 5/5 | 🟡 97.8% |
| 7 | classic07 numbers only | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 8 | classic08 mixed text and numbers | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 9 | classic09 long text | 0.675 | 0.577 | 7/12 | 🔴 60.1% |
| 10 | classic10 special XML characters | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 11 | classic11 sparse rows | 1.000 | 0.999 | 2/2 | 🟢 100.0% |
| 12 | classic12 sparse columns | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 13 | classic13 date strings | 0.974 | 0.996 | 1/1 | 🟡 98.8% |
| 14 | classic14 decimal numbers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 15 | classic15 negative numbers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 16 | classic16 percentage strings | 0.994 | 0.997 | 1/1 | 🟢 99.6% |
| 17 | classic17 currency strings | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 18 | classic18 large dataset (1000×10) | 1.000 | 0.893 | 24/24 | 🟡 95.7% |
| 19 | classic19 single column list | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 20 | classic20 all empty cells | 1.000 | 1.000 | 1/1 | 🟢 100.0% |
| 21 | classic21 header only | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 22 | classic22 long sheet name | 1.000 | 0.999 | 1/1 | 🟢 99.9% |
| 23 | classic23 unicode / CJK text | 0.792 | 0.995 | 1/1 | 🟡 91.5% |
| 24 | classic24 red text | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 25 | classic25 multiple colors | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 26 | classic26 inline strings | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 27 | classic27 single row | 1.000 | 0.999 | 1/1 | 🟢 99.9% |
| 28 | classic28 duplicate values | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 29 | classic29 formula results | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 30 | classic30 mixed empty & filled sheets | 1.000 | 0.999 | 2/2 | 🟢 100.0% |
| 31 | classic31 bold header row | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 32 | classic32 right-aligned numbers | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 33 | classic33 centered text | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 34 | classic34 explicit column widths | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 35 | classic35 explicit row heights | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 36 | classic36 merged cells | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 37 | classic37 freeze panes | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 38 | classic38 hyperlink cell | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 39 | classic39 financial table | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 40 | classic40 scientific notation | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 41 | classic41 integer vs float | 1.000 | 0.997 | 1/1 | 🟢 99.9% |
| 42 | classic42 boolean values | 0.990 | 0.996 | 1/1 | 🟢 99.4% |
| 43 | classic43 inventory report | 0.998 | 0.989 | 1/1 | 🟢 99.5% |
| 44 | classic44 employee roster | 0.997 | 0.984 | 1/1 | 🟢 99.2% |
| 45 | classic45 sales by region (4 sheets) | 1.000 | 0.998 | 4/4 | 🟢 99.9% |
| 46 | classic46 grade book | 1.000 | 0.991 | 1/1 | 🟢 99.6% |
| 47 | classic47 time series (31 rows) | 1.000 | 0.982 | 1/1 | 🟢 99.3% |
| 48 | classic48 survey results | 0.994 | 0.993 | 1/1 | 🟢 99.5% |
| 49 | classic49 contact list | 0.946 | 0.989 | 1/1 | 🟡 97.4% |
| 50 | classic50 budget vs actuals (3 sheets) | 1.000 | 0.990 | 3/3 | 🟢 99.6% |
| 51 | classic51 product catalog | 0.978 | 0.986 | 1/1 | 🟡 98.6% |
| 52 | classic52 pivot summary | 1.000 | 0.990 | 1/1 | 🟢 99.6% |
| 53 | classic53 invoice layout | 0.995 | 0.991 | 1/1 | 🟢 99.5% |
| 54 | classic54 multi-level header | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 55 | classic55 error values | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 56 | classic56 alternating row colors | 1.000 | 0.991 | 1/1 | 🟢 99.6% |
| 57 | classic57 CJK-only sheet | 0.783 | 0.908 | 1/1 | 🔴 87.6% |
| 58 | classic58 mixed numeric formats | 0.994 | 0.996 | 1/1 | 🟢 99.6% |
| 59 | classic59 multi-sheet summary (4 sheets) | 1.000 | 0.996 | 4/4 | 🟢 99.9% |
| 60 | classic60 large wide table (20×50) | 1.000 | 0.934 | 4/4 | 🟡 97.4% |
| 61 | classic61 product card with image | 1.000 | 0.998 | 1/1 | 🟢 99.9% |
| 62 | classic62 company logo header | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 63 | classic63 two products side by side | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 64 | classic64 employee directory with photo | 0.997 | 0.993 | 1/1 | 🟢 99.6% |
| 65 | classic65 inventory with product photos | 0.991 | 0.994 | 1/1 | 🟢 99.4% |
| 66 | classic66 invoice with logo | 0.987 | 0.995 | 1/1 | 🟢 99.3% |
| 67 | classic67 real estate listing | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 68 | classic68 restaurant menu | 0.978 | 0.981 | 1/1 | 🟡 98.4% |
| 69 | classic69 image only sheet | 1.000 | 0.999 | 1/1 | 🟢 100.0% |
| 70 | classic70 product catalog with images | 0.983 | 0.993 | 1/1 | 🟢 99.0% |
| 71 | classic71 multi sheet with images | 1.000 | 0.999 | 3/3 | 🟢 100.0% |
| 72 | classic72 bar chart image with data | 1.000 | 0.988 | 1/1 | 🟢 99.5% |
| 73 | classic73 event flyer with banner | 0.945 | 0.993 | 1/1 | 🟡 97.5% |
| 74 | classic74 dashboard with KPI image | 0.963 | 0.989 | 1/1 | 🟡 98.0% |
| 75 | classic75 certificate with seal | 1.000 | 0.988 | 1/1 | 🟢 99.5% |
| 76 | classic76 product image grid | 1.000 | 0.990 | 1/1 | 🟢 99.6% |
| 77 | classic77 news article with hero image | 1.000 | 0.990 | 1/1 | 🟢 99.6% |
| 78 | classic78 small icon per row | 0.990 | 0.995 | 1/1 | 🟢 99.4% |
| 79 | classic79 wide panoramic banner | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 80 | classic80 portrait tall image | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 81 | classic81 step by step with images | 1.000 | 0.991 | 1/1 | 🟢 99.7% |
| 82 | classic82 before after images | 0.993 | 0.992 | 1/1 | 🟢 99.4% |
| 83 | classic83 color swatch palette | 0.992 | 0.992 | 1/1 | 🟢 99.4% |
| 84 | classic84 travel destination cards | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 85 | classic85 lab results with image | 0.993 | 0.989 | 1/1 | 🟢 99.3% |
| 86 | classic86 software screenshot features | 0.986 | 0.994 | 1/1 | 🟢 99.2% |
| 87 | classic87 sports results with logos | 1.000 | 0.995 | 1/1 | 🟢 99.8% |
| 88 | classic88 image after data | 1.000 | 0.994 | 1/1 | 🟢 99.8% |
| 89 | classic89 nutrition label with image | 0.998 | 0.994 | 1/1 | 🟢 99.7% |
| 90 | classic90 project status with milestones | 0.959 | 0.987 | 1/1 | 🟡 97.8% |
| 91 | classic91 simple bar chart | 0.949 | 0.961 | 2/2 | 🟡 96.4% |
| 92 | classic92 horizontal bar chart | 0.971 | 0.967 | 2/2 | 🟡 97.5% |
| 93 | classic93 line chart | 0.826 | 0.986 | 2/2 | 🟡 92.5% |
| 94 | classic94 pie chart | 0.878 | 0.926 | 2/2 | 🟡 92.1% |
| 95 | classic95 area chart | 0.644 | 0.765 | 2/2 | 🔴 76.4% |
| 96 | classic96 scatter chart | 0.892 | 0.986 | 2/2 | 🟡 95.1% |
| 97 | classic97 doughnut chart | 0.857 | 0.932 | 2/2 | 🟡 91.5% |
| 98 | classic98 radar chart | 0.888 | 0.990 | 2/2 | 🟡 95.1% |
| 99 | classic99 bubble chart | 0.845 | 0.964 | 2/2 | 🟡 92.3% |
| 100 | classic100 stacked bar chart | 0.966 | 0.905 | 1/1 | 🟡 94.9% |
| 101 | classic101 percent stacked bar | 0.962 | 0.878 | 1/1 | 🟡 93.6% |
| 102 | classic102 line chart with markers | 0.861 | 0.988 | 2/2 | 🟡 94.0% |
| 103 | classic103 pie chart with labels | 0.737 | 0.975 | 2/2 | 🔴 88.4% |
| 104 | classic104 combo bar line chart | 0.787 | 0.753 | 2/2 | 🔴 81.6% |
| 105 | classic105 3D bar chart | 0.903 | 0.744 | 2/2 | 🔴 85.9% |
| 106 | classic106 3D pie chart | 0.824 | 0.968 | 2/2 | 🟡 91.7% |
| 107 | classic107 multi series line | 0.738 | 0.777 | 2/2 | 🔴 80.6% |
| 108 | classic108 stacked area chart | 0.964 | 0.896 | 1/1 | 🟡 94.4% |
| 109 | classic109 scatter with trendline | 0.829 | 0.986 | 2/2 | 🟡 92.6% |
| 110 | classic110 chart with legend | 0.837 | 0.781 | 2/2 | 🔴 84.7% |
| 111 | classic111 chart with axis labels | 0.827 | 0.977 | 2/2 | 🟡 92.1% |
| 112 | classic112 multiple charts | 0.875 | 0.759 | 2/2 | 🔴 85.4% |
| 113 | classic113 chart sheet | 0.926 | 0.735 | 2/2 | 🔴 86.4% |
| 114 | classic114 chart large dataset | 0.901 | 0.887 | 4/4 | 🟡 91.5% |
| 115 | classic115 chart negative values | 0.832 | 0.972 | 2/2 | 🟡 92.2% |
| 116 | classic116 percent stacked area | 0.965 | 0.880 | 1/1 | 🟡 93.8% |
| 117 | classic117 stock OHLC chart | 0.794 | 0.729 | 2/2 | 🔴 80.9% |
| 118 | classic118 bar chart custom colors | 0.927 | 0.961 | 2/2 | 🟡 95.5% |
| 119 | classic119 dashboard multi charts | 0.841 | 0.935 | 2/2 | 🟡 91.0% |
| 120 | classic120 chart with date axis | 0.574 | 0.783 | 2/2 | 🔴 74.3% |
| 121 | classic121 thin borders | 1.000 | 0.993 | 1/1 | 🟢 99.7% |
| 122 | classic122 thick outer thin inner | 1.000 | 0.991 | 1/1 | 🟢 99.6% |
| 123 | classic123 dashed borders | 0.992 | 0.995 | 1/1 | 🟢 99.5% |
| 124 | classic124 colored borders | 1.000 | 0.993 | 1/1 | 🟢 99.7% |
| 125 | classic125 solid fills | 0.992 | 0.993 | 1/1 | 🟢 99.4% |
| 126 | classic126 dark header | 0.993 | 0.991 | 1/1 | 🟢 99.4% |
| 127 | classic127 font styles | 0.991 | 0.992 | 1/1 | 🟢 99.3% |
| 128 | classic128 font sizes | 0.982 | 0.994 | 1/1 | 🟢 99.0% |
| 129 | classic129 alignment combos | 1.000 | 0.996 | 1/1 | 🟢 99.9% |
| 130 | classic130 wrap and indent | 1.000 | 0.987 | 1/1 | 🟢 99.5% |
| 131 | classic131 number formats | 1.000 | 0.991 | 1/1 | 🟢 99.7% |
| 132 | classic132 striped table | 1.000 | 0.978 | 1/1 | 🟢 99.1% |
| 133 | classic133 gradient rows | 1.000 | 0.991 | 1/1 | 🟢 99.7% |
| 134 | classic134 heatmap | 1.000 | 0.971 | 1/1 | 🟡 98.9% |
| 135 | classic135 bottom border only | 1.000 | 0.996 | 1/1 | 🟢 99.8% |
| 136 | classic136 financial report styled | 1.000 | 0.983 | 1/1 | 🟢 99.3% |
| 137 | classic137 checkerboard | 1.000 | 0.960 | 1/1 | 🟡 98.4% |
| 138 | classic138 color grid | 1.000 | 0.986 | 1/1 | 🟢 99.5% |
| 139 | classic139 pattern fills | 1.000 | 0.981 | 1/1 | 🟢 99.2% |
| 140 | classic140 rotated text | 0.958 | 0.994 | 1/1 | 🟡 98.1% |
| 141 | classic141 mixed edge borders | 1.000 | 0.993 | 1/1 | 🟢 99.7% |
| 142 | classic142 styled invoice | 1.000 | 0.959 | 1/1 | 🟡 98.4% |
| 143 | classic143 colored tabs | 1.000 | 0.999 | 4/4 | 🟢 100.0% |
| 144 | classic144 note style cells | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 145 | classic145 status badges | 1.000 | 0.977 | 1/1 | 🟢 99.1% |
| 146 | classic146 double border table | 1.000 | 0.989 | 1/1 | 🟢 99.6% |
| 147 | classic147 multi sheet styled | 1.000 | 0.993 | 3/3 | 🟢 99.7% |
| 148 | classic148 frozen styled grid | 1.000 | 0.934 | 1/1 | 🟡 97.4% |
| 149 | classic149 merged styled sections | 1.000 | 0.968 | 1/1 | 🟡 98.7% |
| 150 | classic150 kitchen sink styles | 0.968 | 0.969 | 1/1 | 🟡 97.5% |

Known Limitations

| Issue | Affected Cases | Root Cause |
|---|---|---|
| Long text not paginated | classic09 (60.1%) | MiniPdf renders it as a clipped single line; LibreOffice paginates with text wrapping |
| CJK / Unicode glyphs | classic23 (91.5%), classic57 (87.6%) | Helvetica (built-in PDF font) does not include CJK code points; width estimation differs |
| Chart text/layout | 31 chart cases (74–96%) | Y-axis scale, legend placement, and title clipping differ from LibreOffice's Calc charting engine |
| No 3-D chart effects | classic105, classic106 | 3-D chart styles rendered as flat 2-D equivalents |
| Unsupported chart types | stock, combo | Fall back to the basic bar renderer |
| Pattern fill precision | classic137 (98.4%) | Checkerboard/hatch patterns approximated via solid fill blending |

Adding New Test Cases

  1. Add the generator in tests/MiniPdf.Scripts/generate_classic_xlsx.py:

    def classic151_my_new_scenario():
        wb = Workbook()
        ws = wb.active
        ws.append(["Col1", "Col2"])
        ws.append(["val1", "val2"])
        save(wb, "classic151_my_new_scenario.xlsx")

    Register it in main() → the generators list. (The benchmark already covers classic01–classic150, so new cases start at classic151.)

  2. Add the C# test in tests/MiniPdf.Tests/ClassicExcelToPdfTests.cs:

    [Fact]
    public void Classic151_MyNewScenario()
    {
        using var xlsx = XlsxBuilder.Simple(
            new[] { "Col1", "Col2" },
            new[] { "val1", "val2" });
        AssertValidPdf(xlsx, "Col1", "val1");
    }
  3. Run the pipeline:

    $env:PYTHONIOENCODING="utf-8"
    python tests/MiniPdf.Benchmark/run_benchmark.py
  4. Check the report in tests/MiniPdf.Benchmark/reports/comparison_report.md.


Self-Evolution Loop

Run Benchmark
     │
     ▼
Read report → find low-score cases
     │
     ▼
AI fixes ExcelToPdfConverter.cs
     │
     ▼
dotnet test (94 tests must pass)
     │
     ▼
Re-run benchmark (--skip-generate --skip-reference)
     │
     ▼
Score improved? → commit. Otherwise → iterate.
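The "score improved?" gate at the end of the loop can be a one-liner comparing the JSON reports from two consecutive runs. The `average_overall` field name here is an assumption about the report schema, and the epsilon guards against committing on floating-point noise:

```python
def improved(before: dict, after: dict, eps: float = 1e-4) -> bool:
    # Commit only when the average overall score moved up measurably
    return after["average_overall"] > before["average_overall"] + eps
```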
