@FBumann FBumann commented Jan 31, 2026

Changes proposed in this Pull Request

Optimize the LP file writing pipeline, achieving ~40-60% speedup on m.to_file() across synthetic and realistic PyPSA models.

Benchmark results

Measured with dev-scripts/benchmark_lp_writer.py (10 iterations, with warmup). Before and after were run back-to-back on the same machine.
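A minimal sketch of how such a timing loop can look (this is not the actual benchmark script; `m` stands for an already built linopy Model, and the warmup/iteration counts are placeholders):

```python
# Simplified sketch of a to_file() timing loop; the real logic lives in
# dev-scripts/benchmark_lp_writer.py. `m` is assumed to be a built linopy Model.
import tempfile
import time
from pathlib import Path


def time_to_file(m, iterations=10, warmup=1):
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.lp"
        for _ in range(warmup):
            m.to_file(path)  # warmup pass, not timed
        times = []
        for _ in range(iterations):
            start = time.perf_counter()
            m.to_file(path)
            times.append(time.perf_counter() - start)
    return min(times), sum(times) / len(times)
```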

basic_model (2 × N² vars, 2 × N² constraints):

| N | Vars | Cons | Before | After | Speedup |
|---|------|------|--------|-------|---------|
| 50 | 5,000 | 5,000 | 24ms | 10ms | 56% |
| 100 | 20,000 | 20,000 | 42ms | 18ms | 57% |
| 200 | 80,000 | 80,000 | 138ms | 51ms | 63% |
| 500 | 500,000 | 500,000 | 732ms | 251ms | 66% |
| 1000 | 2,000,000 | 2,000,000 | 3091ms | 1008ms | 67% |

knapsack_model (N binary vars, 1 constraint with N terms):

| N | Vars | Before | After | Speedup |
|---|------|--------|-------|---------|
| 100 | 100 | 5.6ms | 4.5ms | 20% |
| 1,000 | 1,000 | 7.8ms | 5.5ms | 29% |
| 10,000 | 10,000 | 11.8ms | 8.5ms | 28% |
| 50,000 | 50,000 | 36.6ms | 23.6ms | 36% |
| 100,000 | 100,000 | 63.1ms | 37.3ms | 41% |

PyPSA SciGrid-DE (realistic power system, 585 buses, 1423 generators, 852 lines):

| Model | Vars | Cons | Before | After | Speedup |
|-------|------|------|--------|-------|---------|
| scigrid-de (24 snapshots) | 59,640 | 142,968 | 294ms | 262ms | 11% |

Per-commit impact (basic_model)

Cumulative impact measured on basic_model (N=100 → 20k vars, N=500 → 500k vars):

| Commit | Description | N=100 | N=500 | Δ N=500 |
|--------|-------------|-------|-------|---------|
| Baseline | Original code | 44ms | 722ms | |
| ccb9cd2 | Replace concat+sort with join in Constraint.to_polars(), remove group_by validation in constraints_to_file() | 41ms | 613ms | -15% |
| aab95f5 | Skip group_terms when _term=1, build short DataFrame with numpy instead of xarray broadcast, fast sign column via pl.lit() | 29ms | 516ms | -16% |
| 7762659 | Skip group_terms in LinearExpression.to_polars() when no duplicate vars (objective speedup) | 28ms | 508ms | -2% |
| bdbb042 | Pre-cast rhs to String in with_columns instead of inside concat_str | 33ms | 483ms | -5% |
| 44b115f | Use Polars streaming engine for concat_str + write_csv with fallback to eager | 36ms | 390ms | -19% |
| Total | | 36ms | 390ms | -46% |

Per-commit impact (PyPSA SciGrid-DE 240h — 596,400 vars, 1,429,680 cons)

Measured with 2 warmup iterations + 8 timed iterations on the extended SciGrid-DE model (240 snapshots).

| Commit | Description | Time | Δ cumulative | Δ vs previous |
|--------|-------------|------|--------------|---------------|
| Baseline | Original code | 2724ms ± 121ms | | |
| ccb9cd2 | Replace concat+sort with join | 1926ms ± 11ms | -29% | -29% |
| aab95f5 | Skip group_terms, numpy short DF | 1560ms ± 29ms | -43% | -19% |
| 7762659 | Skip group_terms in objective | 1527ms ± 53ms | -44% | -2% |
| bdbb042 | Pre-cast rhs to String | 1576ms ± 21ms | -42% | +3% (regression) |
| 44b115f | Polars streaming engine | 1237ms ± 27ms | -55% | -21% |

Note: bdbb042 shows a small +3% regression on the PyPSA model while helping the basic model. The streaming engine commit recovers and extends the gains.
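For illustration, the streaming-with-fallback pattern from 44b115f looks roughly like this (a sketch with invented column names, assuming a recent Polars version; the actual code lives in the LP writer):

```python
# Illustrative sketch of "streaming concat_str + CSV write with eager fallback".
# Column names (coeffs, vars) are placeholders, not the linopy schema.
import polars as pl


def write_rows(lf: pl.LazyFrame, path: str) -> None:
    formatted = lf.select(
        pl.concat_str(
            pl.col("coeffs").cast(pl.String),
            pl.lit(" x"),
            pl.col("vars").cast(pl.String),
        ).alias("row")
    )
    try:
        # Streaming engine writes chunks to disk without materializing
        # the full formatted frame in memory.
        formatted.sink_csv(path, include_header=False)
    except Exception:
        # Fall back to the eager path if the plan cannot be streamed.
        formatted.collect().write_csv(path, include_header=False)
```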

Benchmark script

Run with python dev-scripts/benchmark_lp_writer.py. Tests basic_model, knapsack_model (up to 100k vars), and PyPSA SciGrid-DE.

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

FBumann and others added 11 commits January 31, 2026 17:24
Replace the vertical concat + sort approach in Constraint.to_polars()
with an inner join, so every row has all columns populated. This removes
the need for the group_by validation step in constraints_to_file() and
simplifies the formatting expressions by eliminating null checks on
coeffs/vars columns.
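As a toy illustration of the join-based approach (column names are invented for the example and not taken from the linopy source):

```python
# Toy illustration: combine per-term data (coeffs, vars) with per-row data
# (sign, rhs) via an inner join on the row label, instead of vertically
# concatenating both frames and sorting by label.
import polars as pl

terms = pl.DataFrame(
    {"labels": [0, 0, 1], "coeffs": [1.0, 2.0, 3.0], "vars": [10, 11, 12]}
)
rows = pl.DataFrame({"labels": [0, 1], "sign": ["<=", ">="], "rhs": [5.0, 7.0]})

# Every resulting row has all columns populated, so no null checks are needed
# when formatting the constraint lines downstream.
joined = terms.join(rows, on="labels", how="inner")
```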
…r short DataFrame

- Skip group_terms_polars when _term dim size is 1 (no duplicate vars)
- Build the short DataFrame (labels, rhs, sign) directly with numpy
  instead of going through xarray.broadcast + to_polars
- Add sign column via pl.lit when uniform (common case), avoiding
  costly numpy string array → polars conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
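A schematic of the two shortcuts described above, with illustrative names only (not the actual linopy code):

```python
# Sketch: build the "short" per-constraint frame directly from numpy arrays
# and attach a uniform sign via pl.lit, instead of broadcasting through
# xarray and converting a numpy string array to Polars.
import numpy as np
import polars as pl

labels = np.arange(5)
rhs = np.full(5, 10.0)
sign = "<="  # uniform sign across all constraints (the common case)

short = pl.DataFrame({"labels": labels, "rhs": rhs}).with_columns(
    pl.lit(sign).alias("sign")
)
```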
…e vars

Check n_unique before running the expensive group_by+sum. When all
variable references are unique (common case for objectives), this
saves ~31ms per 320k terms.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
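In sketch form (placeholder column names, not the actual linopy implementation):

```python
# Skip the group_by + sum when every variable reference is already unique.
import polars as pl


def maybe_group_terms(df: pl.DataFrame) -> pl.DataFrame:
    if df["vars"].n_unique() == df.height:
        return df  # no duplicate vars, grouping would be a no-op
    return df.group_by("vars").agg(pl.col("coeffs").sum())
```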
…_str

Move the rhs float→String cast into the with_columns step so it runs
once unconditionally rather than inside a when().then() per row.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
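Roughly, the idea is the following (again with placeholder column names):

```python
# Sketch: cast rhs to String once in with_columns, so concat_str operates on
# an already-string column instead of casting inside a conditional expression.
import polars as pl

df = pl.DataFrame({"sign": ["<=", ">="], "rhs": [5.0, 7.5]})

lines = df.with_columns(pl.col("rhs").cast(pl.String)).select(
    pl.concat_str(pl.col("sign"), pl.lit(" "), pl.col("rhs")).alias("line")
)
```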
Add realistic PyPSA SciGrid-DE network model and knapsack model
to the benchmark script alongside the existing basic_model.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace np.unique with faster numpy equality check for sign uniformity.
Eliminate redundant filter_nulls_polars and check_has_nulls_polars on
the short DataFrame by applying the labels mask directly during
construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
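The uniformity check boils down to something like this (illustrative only):

```python
# A single equality comparison against the first element is cheaper than
# materializing np.unique on a string array.
import numpy as np

signs = np.array(["<=", "<=", "<="])

# before: uniform = len(np.unique(signs)) == 1
uniform = bool((signs == signs[0]).all())
```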

FBumann commented Jan 31, 2026

Closed in favor of #564

@FBumann FBumann closed this Jan 31, 2026