
Refactor simplex and Lazy simplex improvement #813

Open
nathanneike wants to merge 6 commits into PythonOT:master from nathanneike:refactor-simplex-storage

nathanneike (Contributor) commented May 19, 2026

Types of changes

Refactor
Performance improvement
Tests

Motivation and context / Related issue

This refactors the network simplex storage used by lazy EMD so that lazy mode avoids materializing dense per-arc storage where possible.

The lazy path now uses explicit storage/accessor modes for costs, flows, endpoints, and arc states. Real transport arc costs are computed on demand from sample coordinates, while artificial simplex root arc costs are still stored explicitly. Lazy EMD also returns sparse transport output instead of allocating a dense transport matrix internally. The behavior of dense and sparse EMD is unchanged.
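As a rough illustration of the accessor idea described above, here is a minimal Python sketch. It is not the C++ code from this PR: the class name `LazyArcCosts`, the row-major arc indexing (`arc = i * m + j`), and the placement of the artificial root arcs after the real arcs are all assumptions made for the example.

```python
class LazyArcCosts:
    """Hypothetical cost accessor: real arcs computed on demand, root arcs stored."""

    def __init__(self, xs, xt, artificial_cost):
        self.xs = xs  # source sample coordinates, one tuple per sample
        self.xt = xt  # target sample coordinates, one tuple per sample
        self.n, self.m = len(xs), len(xt)
        # Assumption: one artificial root arc per node, stored explicitly.
        self.root_costs = [artificial_cost] * (self.n + self.m)

    def num_arcs(self):
        # Real transport arcs plus artificial root arcs.
        return self.n * self.m + self.n + self.m

    def cost(self, arc):
        if not 0 <= arc < self.num_arcs():
            raise IndexError(f"arc index {arc} out of range")
        n_real = self.n * self.m
        if arc < n_real:
            # Real arc: recover (source, target) and recompute the
            # squared Euclidean distance instead of reading a matrix.
            i, j = divmod(arc, self.m)
            return sum((a - b) ** 2 for a, b in zip(self.xs[i], self.xt[j]))
        # Artificial arc: read the explicitly stored cost.
        return self.root_costs[arc - n_real]


xs = [(0.0, 0.0), (1.0, 0.0)]
xt = [(0.0, 1.0), (2.0, 0.0), (1.0, 1.0)]
costs = LazyArcCosts(xs, xt, artificial_cost=1e9)
print(costs.cost(1), costs.cost(6), len(costs.root_costs), costs.num_arcs())
```

In this sketch only `n + m` costs are held in memory, while the `n * m` real arc costs are never materialized.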

This substantially reduces memory usage for lazy EMD. For larger problems, it can also improve runtime: although lazy mode recomputes real arc costs, it avoids materializing and repeatedly accessing a multi-GiB dense cost matrix, reducing memory-bandwidth pressure.
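For a sense of scale, the size of a single dense cost matrix can be estimated with a back-of-the-envelope calculation. The sketch below assumes 8-byte (float64) entries and square problems; the helper names are made up for this example. The dense solver may keep several per-arc arrays, so measured solver memory is not expected to equal one matrix.

```python
MIB = 2**20  # bytes per MiB


def dense_matrix_mib(n_source, n_target, bytes_per_entry=8):
    """Memory of one dense n_source x n_target array, in MiB."""
    return n_source * n_target * bytes_per_entry / MIB


def coordinates_mib(n_source, n_target, dim, bytes_per_entry=8):
    """Memory of the sample coordinates that lazy mode recomputes costs from."""
    return (n_source + n_target) * dim * bytes_per_entry / MIB


for n in (1000, 10000, 25000):
    print(
        f"n={n}: one dense matrix ~ {dense_matrix_mib(n, n):.1f} MiB, "
        f"2D coordinates ~ {coordinates_mib(n, n, 2):.2f} MiB"
    )
```

The quadratic term dominates quickly: a single float64 matrix at n=25000 already takes several GiB, while the coordinates stay well under 1 MiB.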

| n | method | time (s) | peak memory (MiB) | extra memory during solve (MiB) |
|---|---|---|---|---|
| 100 | dense | 0.002 | 699.2 | 0.0 |
| 100 | lazy | 0.002 | 698.7 | 0.0 |
| 300 | dense | 0.007 | 698.5 | 0.0 |
| 300 | lazy | 0.014 | 698.4 | 0.0 |
| 1000 | dense | 0.102 | 734.7 | 35.8 |
| 1000 | lazy | 0.205 | 698.0 | 0.0 |
| 3000 | dense | 1.077 | 1048.4 | 349.3 |
| 3000 | lazy | 2.197 | 698.8 | 0.0 |
| 10000 | dense | 15.682 | 4502.4 | 3803.7 |
| 10000 | lazy | 30.598 | 722.9 | 23.8 |
| 15000 | dense | 39.547 | 6695.0 | 5996.3 |
| 15000 | lazy | 72.819 | 754.5 | 55.7 |
| 20000 | dense | 175.729 | 6945.5 | 6245.9 |
| 20000 | lazy | 134.256 | 797.3 | 99.5 |
| 25000 | dense | 370.609 | 7098.5 | 6400.4 |
| 25000 | lazy | 241.820 | 853.9 | 155.4 |

Peak memory includes the Python/import baseline; the “extra memory during solve” column better isolates solver-side growth. These timings use the default squared Euclidean distance in 2D. The runtime crossover depends on the distance metric and feature dimension: lazy mode recomputes costs on demand, so it benefits most when dense memory traffic costs more than recomputing the distance. In this local benchmark, lazy mode is about 1.8x to 2x slower than dense for n from 300 to 15000 and faster from n=20000 onward, so the crossover falls between n=15000 and n=20000.
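The crossover can be read off the benchmark table with a few lines of Python. The timings below are copied from the table in this description; the helper names are only for illustration.

```python
# (n, dense_seconds, lazy_seconds), copied from the benchmark table
TIMINGS = [
    (100, 0.002, 0.002),
    (300, 0.007, 0.014),
    (1000, 0.102, 0.205),
    (3000, 1.077, 2.197),
    (10000, 15.682, 30.598),
    (15000, 39.547, 72.819),
    (20000, 175.729, 134.256),
    (25000, 370.609, 241.820),
]


def lazy_slowdown(dense_s, lazy_s):
    """Ratio of lazy to dense runtime; values below 1 mean lazy is faster."""
    return lazy_s / dense_s


def crossover_interval(timings):
    """Return (n_before, n_after) around the first size where lazy is faster."""
    prev_n = None
    for n, dense_s, lazy_s in timings:
        if lazy_s < dense_s:
            return prev_n, n
        prev_n = n
    return None  # lazy never became faster in this data


for n, dense_s, lazy_s in TIMINGS:
    print(f"n={n}: lazy/dense = {lazy_slowdown(dense_s, lazy_s):.2f}")
print("crossover between", crossover_interval(TIMINGS))
```

These ratios come from a single local run with one metric and dimension, so the interval should be taken as indicative rather than general.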

How has this been tested (if it applies)

Rebuilt the C++/Cython extension and ran the full test suite.

PR checklist

  • I have read the CONTRIBUTING document.
  • The documentation is up-to-date with the changes I made (check build artifacts).
  • All tests passed, and additional code has been covered with new tests.
  • I have added the PR and Issue fix to the RELEASES.md file.

@github-actions github-actions Bot added the CI label May 21, 2026

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.85%. Comparing base (19db3ee) to head (523a0d6).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #813   +/-   ##
=======================================
  Coverage   96.85%   96.85%           
=======================================
  Files         115      115           
  Lines       23344    23373   +29     
=======================================
+ Hits        22610    22639   +29     
  Misses        734      734           
