Refactor simplex and Lazy simplex improvement #813
Open
nathanneike wants to merge 6 commits into
…orage # Conflicts: # .github/workflows/build_tests.yml
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##           master     #813   +/-   ##
=======================================
  Coverage   96.85%   96.85%
=======================================
  Files         115      115
  Lines       23344    23373   +29
=======================================
+ Hits        22610    22639   +29
  Misses        734      734
Types of changes
Refactor
Performance improvement
Tests
Motivation and context / Related issue
This refactors the network simplex storage used by lazy EMD so that lazy mode avoids materializing dense per-arc storage where possible.
The lazy path now uses explicit storage/accessor modes for costs, flows, endpoints, and arc states. Real transport arc costs are computed on demand from sample coordinates, while artificial simplex root arc costs are still stored explicitly. Lazy EMD also returns sparse transport output instead of allocating a dense transport matrix internally. Dense and sparse EMD behavior are unchanged.
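As a rough illustration of the accessor idea described above, here is a minimal Python sketch. It is not the PR's C++ implementation: the class name `LazyArcCosts`, the row-major arc numbering, and the choice of one artificial arc per node are all assumptions made for the example. Real arc costs are recomputed from coordinates on each call, while artificial root arc costs live in a small explicit list.

```python
class LazyArcCosts:
    """Hypothetical cost accessor: real arcs computed on demand, artificial arcs stored."""

    def __init__(self, xs, xt, artificial_cost):
        self.xs = xs  # source sample coordinates
        self.xt = xt  # target sample coordinates
        self.n_real = len(xs) * len(xt)
        # Assumption: one artificial root arc per node, stored explicitly.
        self.artificial = [artificial_cost] * (len(xs) + len(xt))

    def cost(self, arc):
        # Artificial arcs are numbered after all real transport arcs.
        if arc >= self.n_real:
            return self.artificial[arc - self.n_real]
        # Assumption: real arcs are numbered row-major, arc = i * n_target + j.
        i, j = divmod(arc, len(self.xt))
        # Squared Euclidean distance, recomputed each time instead of stored.
        return sum((a - b) ** 2 for a, b in zip(self.xs[i], self.xt[j]))


xs = [(0.0, 0.0), (1.0, 0.0)]
xt = [(0.0, 1.0), (2.0, 0.0), (1.0, 1.0)]
costs = LazyArcCosts(xs, xt, artificial_cost=1e9)
real_arc_cost = costs.cost(1)        # arc from source 0 to target 1
artificial_arc_cost = costs.cost(6)  # first artificial root arc
```

With this layout the only per-arc storage is the artificial list, whose length grows with the number of nodes rather than with the number of source/target pairs.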
This substantially reduces memory usage for lazy EMD. For larger problems, it can also improve runtime: although lazy mode recomputes real arc costs, it avoids materializing and repeatedly accessing a multi-GiB dense cost matrix, reducing memory-bandwidth pressure.
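To give a sense of scale for the "multi-GiB" figure, the following back-of-the-envelope sketch compares the size of a dense cost matrix with the size of the sample coordinates that lazy mode needs instead. It assumes 8-byte (float64) entries and a square problem with equal source and target counts; the helper names are hypothetical and the numbers are arithmetic only, not measurements from this PR.

```python
def dense_cost_bytes(n_source, n_target, itemsize=8):
    # One stored cost per (source, target) pair.
    return n_source * n_target * itemsize


def coords_bytes(n_source, n_target, dim, itemsize=8):
    # One coordinate vector per sample; costs are recomputed from these.
    return (n_source + n_target) * dim * itemsize


n = 20000
dense = dense_cost_bytes(n, n)    # 3,200,000,000 bytes, roughly 2.98 GiB
lazy = coords_bytes(n, n, dim=2)  # 640,000 bytes
print(f"dense: {dense / 2**30:.2f} GiB, coordinates: {lazy / 2**20:.2f} MiB")
```

Under these assumptions the dense matrix grows quadratically with n, while the coordinate storage grows linearly.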
Peak memory includes the Python/import baseline. The “extra memory during solve” column better isolates solver-side growth. These timings use the default squared Euclidean distance in 2D. The runtime crossover depends on the distance metric and feature dimension: lazy mode recomputes costs on demand, so it benefits most when dense memory traffic is more expensive than recomputing the distance. In this local benchmark, the crossover happens between n=15000 and n=20000.
How has this been tested (if it applies)
Rebuilt the C++/Cython extension and ran the whole test suite.
PR checklist