
perf: O(n) spatial indexing, vectorized planners, sparse storage#3

Open
OldCrow wants to merge 1 commit into tjards:master from OldCrow:master

@OldCrow OldCrow commented Apr 14, 2026

Summary

Reduces per-timestep complexity from O(n²) to O(n log n) for neighbor discovery and O(n) for bounded-density interactions, enabling simulations at 500–1000+ agents where the original was limited to ~100.

All upstream functionality is preserved. Optimizations activate automatically; set "use_optimized": false in config/config.json to revert to the original code paths. No new external dependencies.
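As a rough illustration of how such a toggle might be read, the sketch below loads the flag from the JSON config and falls back to the optimized path when the key or file is absent. The helper name `load_use_optimized` is hypothetical, and the assumption that `use_optimized` is a top-level key is not confirmed by the PR.

```python
import json
from pathlib import Path


def load_use_optimized(config_path="config/config.json", default=True):
    """Return the use_optimized flag (hypothetical helper, not from the PR).

    Assumes "use_optimized" is a top-level key in the JSON file. If the file
    or the key is missing, the optimized path stays enabled, matching the
    "activate automatically" behaviour described above.
    """
    path = Path(config_path)
    if not path.exists():
        return default
    with path.open() as f:
        config = json.load(f)
    return bool(config.get("use_optimized", default))
```

A caller could then branch once at startup, for example `planner = fast_planner if load_use_optimized() else original_planner`, where both planner names are placeholders.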

Changes

Performance:

  • Spatial indexing (utils/spatial.py): KD-tree neighbor discovery replaces O(n²) brute-force scans
  • Sparse graph (utils/swarmgraph.py): scipy.sparse.csgraph for components; node-degree connectivity replaces exponential path enumeration; sparse adjacency with lazy dense cache
  • Vectorized planners: all 8 strategies optimized — saber/reynolds/starling fully vectorized via batched NumPy; encirclement/lemniscates via broadcast; pinning/shepherding/malicious via pre-built neighbor lists (sequential learning logic preserved)
  • Sparse data storage (data/data_manager.py): n×n arrays stored as scipy.sparse per timestep
  • Vectorized order() metric and plotting computations
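The first two items can be sketched together: a KD-tree finds all agent pairs within an interaction radius, the pairs become a sparse adjacency matrix, and `scipy.sparse.csgraph` labels the connected components. This is a minimal sketch rather than the code in utils/spatial.py or utils/swarmgraph.py. The function names, the `(n, 3)` position layout, and the fixed `radius` are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def neighbor_graph(positions, radius):
    """Sparse symmetric adjacency of agents within `radius` of each other.

    positions: (n, 3) array (assumed layout; transpose a (3, n) array first).
    """
    n = positions.shape[0]
    tree = cKDTree(positions)
    # Each unordered pair (i, j) with i < j appears once.
    pairs = tree.query_pairs(r=radius, output_type="ndarray")
    # Mirror the pairs so the adjacency matrix is symmetric.
    rows = np.concatenate([pairs[:, 0], pairs[:, 1]])
    cols = np.concatenate([pairs[:, 1], pairs[:, 0]])
    data = np.ones(rows.shape[0], dtype=np.int8)
    return coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()


def brute_force_graph(positions, radius):
    """O(n^2) reference: dense boolean adjacency from all pairwise distances."""
    diff = positions[:, None, :] - positions[None, :, :]
    adj = np.linalg.norm(diff, axis=-1) <= radius
    np.fill_diagonal(adj, False)
    return adj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    positions = rng.uniform(0.0, 10.0, size=(200, 3))
    adjacency = neighbor_graph(positions, radius=1.5)
    n_components, labels = connected_components(adjacency, directed=False)
    print(n_components, adjacency.nnz)
```

The brute-force version is kept only as an equivalence check. For bounded agent density the number of stored edges grows linearly with n, which is what makes the sparse representation pay off.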

Bug fixes:

  • Heap fragmentation: per-agent methods now allocate np.zeros(3) instead of np.zeros((3, n)) and reuse a pre-allocated _I3 instead of calling np.identity(3) on every call. The original code caused RSS growth of ~185 MB per 100 steps at n=200 due to CPython pymalloc arena fragmentation.
  • Distance-from-target plot: the norm now uses position components only (0:3), correcting a dimensionally inconsistent 6D norm that also included velocity
  • Escape sequence warnings eliminated across codebase
  • Tracked __pycache__/*.pyc files removed
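To make the distance-from-target fix concrete, the sketch below contrasts the two norms. It assumes a `(6, n)` state layout with rows 0:3 holding position and rows 3:6 holding velocity. That layout and the function names are assumptions, not taken from the plotting code.

```python
import numpy as np


def distance_from_target(states, targets):
    """Position-only distance per agent; states and targets are (6, n)."""
    return np.linalg.norm(states[0:3, :] - targets[0:3, :], axis=0)


def distance_from_target_6d(states, targets):
    """The mixed norm: adds metres and metres-per-second under one root."""
    return np.linalg.norm(states - targets, axis=0)


if __name__ == "__main__":
    # One agent at (3, 4, 0) moving at (10, 0, 0); target at rest at the origin.
    states = np.array([[3.0], [4.0], [0.0], [10.0], [0.0], [0.0]])
    targets = np.zeros((6, 1))
    print(distance_from_target(states, targets))     # position-only distance is 5.0
    print(distance_from_target_6d(states, targets))  # sqrt(125), inflated by velocity
```

In this example the agent is 5 units from the target, but the 6D norm reports about 11.2 because the velocity term dominates.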

Documentation:

  • OPTIMIZATION.md: problem statement, methodology, per-file changes, design decisions (why 3 planners aren't fully vectorized), heap fragmentation analysis, validated benchmarks, future work

Tests: 78 tests + 7 benchmark scripts

Validated Results

| Metric | Before | After |
| --- | --- | --- |
| Full sim n=100 (Tf=300s) | 12.6 min | 52.5s (14.4× speedup) |
| Full sim n=200 (Tf=300s) | ~43.5 min | 1.4 min (32× speedup) |
| Graph ops n=500 | 474ms | 1.96ms (242× speedup) |
| Memory n=1000 (10s sim) | 15.0 GB | 76.6 MB (201× reduction) |
| Scaling n=50→1000 | O(n^1.79) | O(n^1.0–1.15) |
| RSS stability (n=200, 15K steps) | +925 MB (OOM) | +2.9 MB (flat) |

Full benchmark data in OPTIMIZATION.md and reproducible via tests/benchmark_*.py.
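The PR does not state how the scaling exponents in the table were obtained. One common approach, sketched here as an assumption, is a least-squares line fit in log-log space, where the slope estimates k in runtime ∝ n^k. The timings in the example are synthetic, generated from an exact quadratic, and are not benchmark data from this PR.

```python
import numpy as np


def scaling_exponent(n_values, runtimes):
    """Estimate k in runtime ~ c * n**k from a log-log linear fit."""
    slope, _intercept = np.polyfit(np.log(n_values), np.log(runtimes), 1)
    return slope


if __name__ == "__main__":
    # Synthetic timings that follow an exact n^2 law (illustrative only).
    n_values = np.array([50.0, 100.0, 200.0, 500.0, 1000.0])
    runtimes = 3e-4 * n_values**2
    print(round(scaling_exponent(n_values, runtimes), 3))
```

With real measurements the fit is noisy, so an exponent is better reported as a range across runs, as the table does for the optimized path.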

Co-Authored-By: Oz <oz-agent@warp.dev>

Reduces per-timestep complexity from O(n^2) to O(n log n) for neighbor
discovery and O(n) for bounded-density interactions, enabling simulations
at 500-1000+ agents where the original was limited to ~100.

Performance changes:
- Spatial indexing (utils/spatial.py): KD-tree neighbor discovery
- Sparse graph (utils/swarmgraph.py): csr_matrix adjacency, scipy
  connected components, node-degree connectivity, lazy dense cache
- Vectorized planners: saber/reynolds/starling via batched NumPy;
  encirclement/lemniscates via broadcast; pinning/shepherding/malicious
  via pre-built neighbor lists (sequential logic preserved)
- Sparse History storage: scipy.sparse per timestep, COO HDF5 format
- Vectorized metrics (order) and plotting computations
- Orchestrator: vectorized dispatch with use_optimized config toggle

Bug fixes:
- Heap fragmentation: np.zeros(3) replaces np.zeros((3,n)) and
  pre-allocated _I3 replaces per-call np.identity(3) in all planners.
  Original caused ~185 MB/100 steps RSS growth at n=200.
- Distance-from-target plot: position-only norm (0:3) corrects
  dimensionally incorrect 6D norm including velocity
- Escape sequence warnings eliminated across codebase
- Tracked __pycache__/*.pyc files removed

Documentation:
- OPTIMIZATION.md: problem statement, methodology, per-file changes,
  design decisions, heap fragmentation analysis, benchmarks, future work
- README.md: tests/ in project structure, use_optimized docs

Tests and benchmarks:
- 78 tests: spatial index, graph equivalence, planner vectorized-vs-
  scalar, HDF5 round-trip, plotting computations
- 7 benchmark scripts: graph ops, e2e sim, memory, scaling, extended,
  diverse agents, full 300s simulation, memory diagnostics

Validated results:
- Full sim n=100: 52.5s (was 12.6 min, 14.4x speedup)
- Full sim n=200: 1.4 min (32x speedup)
- Scaling n=50-1000: O(n^1.0-1.15), memory stable
- Graph ops n=500: 242x speedup
- Memory n=1000: 201x reduction (76.6 MB vs 15 GB)

No new external dependencies (uses scipy already in requirements.txt).

Co-Authored-By: Oz <oz-agent@warp.dev>