@FBumann FBumann commented Jan 31, 2026

Changes proposed in this Pull Request

Optimize the LP file writing pipeline, achieving ~40-60% speedup on m.to_file() across synthetic and realistic PyPSA models.

Benchmark results

Measured with dev-scripts/benchmark_lp_writer.py (10 iterations, with warmup). Before and after were run back-to-back on the same machine.
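A minimal sketch of how such a timing loop can look (this is not the actual benchmark script; `m` stands for an already built linopy Model, and the warmup/iteration counts are placeholders):

```python
# Simplified sketch of a to_file() timing loop; the real logic lives in
# dev-scripts/benchmark_lp_writer.py. `m` is assumed to be a built linopy Model.
import tempfile
import time
from pathlib import Path


def time_to_file(m, iterations=10, warmup=1):
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.lp"
        for _ in range(warmup):
            m.to_file(path)  # warmup pass, not timed
        times = []
        for _ in range(iterations):
            start = time.perf_counter()
            m.to_file(path)
            times.append(time.perf_counter() - start)
    return min(times), sum(times) / len(times)
```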

basic_model (2 × N² vars, 2 × N² constraints):

| N | Vars | Cons | Before | After | Speedup |
|---|------|------|--------|-------|---------|
| 50 | 5,000 | 5,000 | 24ms | 10ms | 56% |
| 100 | 20,000 | 20,000 | 42ms | 18ms | 57% |
| 200 | 80,000 | 80,000 | 138ms | 51ms | 63% |
| 500 | 500,000 | 500,000 | 732ms | 251ms | 66% |
| 1000 | 2,000,000 | 2,000,000 | 3091ms | 1008ms | 67% |

knapsack_model (N binary vars, 1 constraint with N terms):

| N | Vars | Before | After | Speedup |
|---|------|--------|-------|---------|
| 100 | 100 | 5.6ms | 4.5ms | 20% |
| 1,000 | 1,000 | 7.8ms | 5.5ms | 29% |
| 10,000 | 10,000 | 11.8ms | 8.5ms | 28% |
| 50,000 | 50,000 | 36.6ms | 23.6ms | 36% |
| 100,000 | 100,000 | 63.1ms | 37.3ms | 41% |

PyPSA SciGrid-DE (realistic power system, 585 buses, 1423 generators, 852 lines):

| Model | Vars | Cons | Before | After | Speedup |
|-------|------|------|--------|-------|---------|
| scigrid-de (24 snapshots) | 59,640 | 142,968 | 294ms | 262ms | 11% |

Per-commit impact (basic_model)

Cumulative impact measured on basic_model (N=100 → 20k vars, N=500 → 500k vars):

| Commit | Description | N=100 | N=500 | Δ N=500 |
|--------|-------------|-------|-------|---------|
| Baseline | Original code | 44ms | 722ms | |
| ccb9cd2 | Replace concat+sort with join in Constraint.to_polars(), remove group_by validation in constraints_to_file() | 41ms | 613ms | -15% |
| aab95f5 | Skip group_terms when _term=1, build short DataFrame with numpy instead of xarray broadcast, fast sign column via pl.lit() | 29ms | 516ms | -16% |
| 7762659 | Skip group_terms in LinearExpression.to_polars() when no duplicate vars (objective speedup) | 28ms | 508ms | -2% |
| bdbb042 | Pre-cast rhs to String in with_columns instead of inside concat_str | 33ms | 483ms | -5% |
| 44b115f | Use Polars streaming engine for concat_str + write_csv with fallback to eager | 36ms | 390ms | -19% |
| Total | | 36ms | 390ms | -46% |

Per-commit impact (PyPSA SciGrid-DE 240h — 596,400 vars, 1,429,680 cons)

Measured with 2 warmup iterations + 8 timed iterations on the extended SciGrid-DE model (240 snapshots).

| Commit | Description | Time | Δ cumulative | Δ vs previous |
|--------|-------------|------|--------------|---------------|
| Baseline | Original code | 2724ms ± 121ms | | |
| ccb9cd2 | Replace concat+sort with join | 1926ms ± 11ms | -29% | -29% |
| aab95f5 | Skip group_terms, numpy short DF | 1560ms ± 29ms | -43% | -19% |
| 7762659 | Skip group_terms in objective | 1527ms ± 53ms | -44% | -2% |
| bdbb042 | Pre-cast rhs to String | 1576ms ± 21ms | -42% | +3% (regression) |
| 44b115f | Polars streaming engine | 1237ms ± 27ms | -55% | -21% |

Note: bdbb042 shows a small +3% regression on the PyPSA model while helping the basic model. The streaming engine commit recovers and extends the gains.
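For illustration, the streaming-with-fallback pattern from 44b115f looks roughly like this (a sketch with invented column names, assuming a recent Polars version; the actual code lives in the LP writer):

```python
# Illustrative sketch of "streaming concat_str + CSV write with eager fallback".
# Column names (coeffs, vars) are placeholders, not the linopy schema.
import polars as pl


def write_rows(lf: pl.LazyFrame, path: str) -> None:
    formatted = lf.select(
        pl.concat_str(
            pl.col("coeffs").cast(pl.String),
            pl.lit(" x"),
            pl.col("vars").cast(pl.String),
        ).alias("row")
    )
    try:
        # Streaming engine writes chunks to disk without materializing
        # the full formatted frame in memory.
        formatted.sink_csv(path, include_header=False)
    except Exception:
        # Fall back to the eager path if the plan cannot be streamed.
        formatted.collect().write_csv(path, include_header=False)
```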

Benchmark script

Run with python dev-scripts/benchmark_lp_writer.py. Tests basic_model, knapsack_model (up to 100k vars), and PyPSA SciGrid-DE.

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

FBumann and others added 11 commits January 31, 2026 17:24
Replace the vertical concat + sort approach in Constraint.to_polars()
with an inner join, so every row has all columns populated. This removes
the need for the group_by validation step in constraints_to_file() and
simplifies the formatting expressions by eliminating null checks on
coeffs/vars columns.
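As a toy illustration of the join-based approach (column names are invented for the example and not taken from the linopy source):

```python
# Toy illustration: combine per-term data (coeffs, vars) with per-row data
# (sign, rhs) via an inner join on the row label, instead of vertically
# concatenating both frames and sorting by label.
import polars as pl

terms = pl.DataFrame(
    {"labels": [0, 0, 1], "coeffs": [1.0, 2.0, 3.0], "vars": [10, 11, 12]}
)
rows = pl.DataFrame({"labels": [0, 1], "sign": ["<=", ">="], "rhs": [5.0, 7.0]})

# Every resulting row has all columns populated, so no null checks are needed
# when formatting the constraint lines downstream.
joined = terms.join(rows, on="labels", how="inner")
```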
…r short DataFrame

- Skip group_terms_polars when _term dim size is 1 (no duplicate vars)
- Build the short DataFrame (labels, rhs, sign) directly with numpy
  instead of going through xarray.broadcast + to_polars
- Add sign column via pl.lit when uniform (common case), avoiding
  costly numpy string array → polars conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
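A schematic of the two shortcuts described above, with illustrative names only (not the actual linopy code):

```python
# Sketch: build the "short" per-constraint frame directly from numpy arrays
# and attach a uniform sign via pl.lit, instead of broadcasting through
# xarray and converting a numpy string array to Polars.
import numpy as np
import polars as pl

labels = np.arange(5)
rhs = np.full(5, 10.0)
sign = "<="  # uniform sign across all constraints (the common case)

short = pl.DataFrame({"labels": labels, "rhs": rhs}).with_columns(
    pl.lit(sign).alias("sign")
)
```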
…e vars

Check n_unique before running the expensive group_by+sum. When all
variable references are unique (common case for objectives), this
saves ~31ms per 320k terms.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
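In sketch form (placeholder column names, not the actual linopy implementation):

```python
# Skip the group_by + sum when every variable reference is already unique.
import polars as pl


def maybe_group_terms(df: pl.DataFrame) -> pl.DataFrame:
    if df["vars"].n_unique() == df.height:
        return df  # no duplicate vars, grouping would be a no-op
    return df.group_by("vars").agg(pl.col("coeffs").sum())
```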
…_str

Move the rhs float→String cast into the with_columns step so it runs
once unconditionally rather than inside a when().then() per row.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
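Roughly, the idea is the following (again with placeholder column names):

```python
# Sketch: cast rhs to String once in with_columns, so concat_str operates on
# an already-string column instead of casting inside a conditional expression.
import polars as pl

df = pl.DataFrame({"sign": ["<=", ">="], "rhs": [5.0, 7.5]})

lines = df.with_columns(pl.col("rhs").cast(pl.String)).select(
    pl.concat_str(pl.col("sign"), pl.lit(" "), pl.col("rhs")).alias("line")
)
```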
Add realistic PyPSA SciGrid-DE network model and knapsack model
to the benchmark script alongside the existing basic_model.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace np.unique with faster numpy equality check for sign uniformity.
Eliminate redundant filter_nulls_polars and check_has_nulls_polars on
the short DataFrame by applying the labels mask directly during
construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
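The uniformity check boils down to something like this (illustrative only):

```python
# A single equality comparison against the first element is cheaper than
# materializing np.unique on a string array.
import numpy as np

signs = np.array(["<=", "<=", "<="])

# before: uniform = len(np.unique(signs)) == 1
uniform = bool((signs == signs[0]).all())
```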

FBumann commented Jan 31, 2026

Closed in favor of #564

@FBumann FBumann closed this Jan 31, 2026