
Conversation


@FBumann FBumann commented Jan 31, 2026

Changes proposed in this Pull Request

Speed up LP file writing by up to 3.9x on large models, with consistent improvements across all problem sizes. Includes a benchmark script for reproducibility.
Added an LP-file benchmarking script, which accounts for the majority of the changed lines.

Performance optimizations

  • Use Polars streaming engine for concat_str + write_csv via new _format_and_write() helper (with automatic fallback + warning)
  • Replace concat + sort with join for constraint assembly
  • Extract maybe_group_terms_polars() to skip expensive group_by when terms already reference distinct variables
  • Reduce per-constraint overhead by applying labels mask directly and using fast sign uniformity check

Bug fixes

  • Fix missing space in LP file output
  • Fix IndexError on empty constraint slices in sign_flat check

Benchmark results

Reproduce with `python dev-scripts/benchmark_lp_writer.py --model basic -o results.json --label "my run"`.

Synthetic model (2×N² vars, 2×N² constraints)

No regressions on small models; the speedup grows with problem size, up to 3.9x at 8M variables.

[figure: benchmark_lp_comparison]

PyPSA SciGrid-DE (realistic power system model, 24–1000 snapshots)

Consistent 2.5–2.7x speedup across all sizes, reaching 7.0s → 2.7s at 2.5M variables / 6M constraints.

[figure: benchmark_lp_comparison]

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

FBumann and others added 8 commits January 31, 2026 21:06
Extract _format_and_write() helper that uses lazy().collect(engine="streaming")
with automatic fallback, replacing 7 instances of df.select(concat_str(...)).write_csv(...).
Replace the vertical concat + sort approach in Constraint.to_polars()
with an inner join, so every row has all columns populated. This removes
the need for the group_by validation step in constraints_to_file() and
simplifies the formatting expressions by eliminating null checks on
coeffs/vars columns.
…r short DataFrame

- Skip group_terms_polars when _term dim size is 1 (no duplicate vars)
- Build the short DataFrame (labels, rhs, sign) directly with numpy
  instead of going through xarray.broadcast + to_polars
- Add sign column via pl.lit when uniform (common case), avoiding
  costly numpy string array → polars conversion

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…e vars

Check n_unique before running the expensive group_by+sum. When all
variable references are unique (common case for objectives), this
saves ~31ms per 320k terms.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace np.unique with faster numpy equality check for sign uniformity.
Eliminate redundant filter_nulls_polars and check_has_nulls_polars on
the short DataFrame by applying the labels mask directly during
construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Guard against IndexError when sign_flat is empty (no valid labels)
by checking len(sign_flat) > 0 before accessing sign_flat[0].

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
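
The sign-uniformity check and the empty-array guard from the two commits above can be combined in one small sketch (function name is illustrative):

```python
import numpy as np


def sign_is_uniform(sign_flat: np.ndarray) -> bool:
    """Fast uniformity check: compare every element to the first.

    Cheaper than np.unique, which sorts the array. The len() guard
    avoids the IndexError on an empty array (no valid labels).
    """
    return len(sign_flat) > 0 and bool((sign_flat == sign_flat[0]).all())
```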
@FBumann FBumann mentioned this pull request Jan 31, 2026
FBumann and others added 5 commits January 31, 2026 22:17
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…for duplicate (labels, vars) pairs before calling group_terms_polars. Use it in both Constraint.to_polars() and LinearExpression.to_polars() to avoid expensive group_by when terms already reference distinct variables
@FBumann FBumann changed the title from "perf: speed up LP file writing (2-2.7x on large models)" to "perf: speed up LP file writing (2.5-3.9x on large models)" Jan 31, 2026
@FBumann FBumann changed the title from "perf: speed up LP file writing (2.5-3.9x on large models)" to "perf: speed up LP file writing (2.5-3.9x on large models, no regressions on small)" Jan 31, 2026
@FabianHofmann
Collaborator

Wonderful @FBumann ! This is very much welcome!


FBumann commented Feb 2, 2026

@FabianHofmann Should I fix the codecov stuff?

@FBumann FBumann mentioned this pull request Feb 2, 2026