
fix: real benchmarks, Cypher multi-row fix, and honest README#293

Open
aepod wants to merge 6 commits into ruvnet:main from weave-logic-ai:fix/real-benchmarks

Conversation

@aepod aepod commented Mar 24, 2026

Summary

Cypher Fix (Issue #269)

The execute_match() function collapsed all match results into a single HashMap: each context.bind() call overwrote the previous binding, so with three matches only the last one survived.

Fix: Implemented proper ResultSet pipeline (Vec<ExecutionContext>) threaded through all statement executors. Cross-product expansion for multiple patterns in one MATCH.

Test results: 30 passed, 0 failed, 2 ignored

Key tests:

  • test_match_returns_multiple_rows — 3 nodes → 3 rows (was returning 1)
  • test_match_return_properties — Alice + Bob both returned correctly
  • test_match_where_filter — WHERE correctly filters multi-row results
  • test_match_many_nodes — 100 nodes → 100 rows

Also fixed column name generation for property expressions (n.name instead of ?column?).

Real Benchmark Results

All measurements are real: recall is measured against brute-force ground truth, and no competitors are simulated.

10,000 Vectors (128d, M=32, ef=200)

| Engine | QPS | Recall@10 | Build (s) | p50 (ms) |
|---|---|---|---|---|
| hnswlib (C++) | 1152.6 | 0.9895 | 7.5 | 0.73 |
| ruvector-core | 443.1 | 0.9830 | 44.0 | 1.98 |
| numpy brute-force | 134.8 | 1.0000 | 0.003 | 3.26 |

100,000 Vectors (128d, M=32, ef=200)

| Engine | QPS | Recall@10 | Build (s) | p50 (ms) | Memory |
|---|---|---|---|---|---|
| hnswlib (C++) | 249.5 | 0.7427 | 395.3 | 2.57 | |
| ruvector-core | 85.7 | 0.8675 | 855.6 | 10.14 | ~523MB |
| numpy brute-force | 69.2 | 1.0000 | 0.016 | 10.20 | 48.8MB |

Key Findings

  • Recall is competitive: 98.3% at 10K (within 0.65% of hnswlib) and 86.75% at 100K (higher than hnswlib's 74.27%)
  • QPS is 2.6-2.9x slower than hnswlib, which is expected since hnswlib has SIMD optimizations
  • Build time is 2.2-5.9x slower, the largest gap, though it narrows at scale
  • Previous benchmarks had hardcoded recall=1.0 and memory=0.0; both are now measured

README Corrections

  • Replaced "100% recall at every configuration" with actual measured recall ranges
  • Replaced "15.7x faster than Python" (simulated) with real hnswlib comparison
  • Added independent benchmark section with link to full methodology

Test plan

  • All 30 Cypher tests pass (0 failures, 2 ignored)
  • Multi-row MATCH returns correct number of rows
  • Property expressions generate correct column names
  • WHERE filtering works on multi-row results
  • No regressions in existing test suite
  • Benchmarks reproducible with included scripts and raw JSON data

🤖 Generated with claude-flow

aepod and others added 6 commits March 24, 2026 12:34
The execute_match() function previously collapsed all match results into
a single ExecutionContext via context.bind(), which overwrote previous
bindings. MATCH (n:Person) on 3 Person nodes returned only 1 row.

This commit refactors the executor to use a ResultSet pipeline:
- type ResultSet = Vec<ExecutionContext>
- Each clause transforms ResultSet → ResultSet
- execute_match() expands the set (one context per match)
- execute_return() projects one row per context
- execute_set/delete() apply to all contexts
- Cross-product semantics for multiple patterns in one MATCH

Also adds comprehensive tests:
- test_match_returns_multiple_rows (the Issue ruvnet#269 regression)
- test_match_return_properties (verify correct values per row)
- test_match_where_filter (WHERE correctly filters multi-row)
- test_match_single_result (1 match → 1 row, no regression)
- test_match_no_results (0 matches → 0 rows)
- test_match_many_nodes (100 nodes → 100 rows, stress test)

Co-Authored-By: claude-flow <ruv@ruv.net>
RETURN n.name now produces column "n.name" instead of "?column?".
Property expressions (Expression::Property) are formatted as
"object.property" for column naming, matching standard Cypher behavior.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit b2347ce

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit 2adb949

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Phase 2 of the ruvector remediation plan. Replaces simulated benchmarks
with real measurements:

- Python harness: hnswlib (C++) and numpy brute-force on same datasets
- Rust test: ruvector-core HNSW with ground-truth recall measurement
- Datasets: random-10K and random-100K, 128 dimensions
- Metrics: QPS (p50/p95), recall@10 vs ground truth, memory, build time

Key findings:
- ruvector recall@10 is good: 98.3% (10K), 86.75% (100K)
- ruvector QPS is 2.6-2.9x slower than hnswlib
- ruvector build time is 2.2-5.9x slower than hnswlib
- ruvector uses ~523MB for 100K vectors (10x raw data size)
- All numbers are REAL — no hardcoded values, no simulation

Co-Authored-By: claude-flow <ruv@ruv.net>
- Add independent benchmark report comparing ruvector-core vs hnswlib (C++) vs numpy brute-force
- 10K vectors: 443 QPS / 98.3% recall (vs hnswlib 1153 QPS / 98.95% recall)
- 100K vectors: 86 QPS / 86.75% recall (vs hnswlib 250 QPS / 74.27% recall)
- Fix README "100% recall" claim — actual recall is 86.75-98.3% depending on scale
- Fix "simulated Python baseline" — now compared against real hnswlib competitor
- Include raw JSON data and full methodology documentation

Co-Authored-By: claude-flow <ruv@ruv.net>
