
fix: real benchmarks, Cypher multi-row fix, and honest README#293

Open
aepod wants to merge 6 commits into ruvnet:main from weave-logic-ai:fix/real-benchmarks

Conversation

@aepod aepod commented Mar 24, 2026

Summary

Cypher Fix (Issue #269)

The execute_match() function collapsed all match results into a single HashMap: each context.bind() call overwrote the previous binding, so with three matches only the last one survived.

Fix: Implemented proper ResultSet pipeline (Vec<ExecutionContext>) threaded through all statement executors. Cross-product expansion for multiple patterns in one MATCH.

Test results: 30 passed, 0 failed, 2 ignored

Key tests:

  • test_match_returns_multiple_rows — 3 nodes → 3 rows (was returning 1)
  • test_match_return_properties — Alice + Bob both returned correctly
  • test_match_where_filter — WHERE correctly filters multi-row results
  • test_match_many_nodes — 100 nodes → 100 rows

Also fixed column name generation for property expressions (n.name instead of ?column?).

Real Benchmark Results

All measurements are real: recall is measured against brute-force ground truth, and no competitors are simulated.

10,000 Vectors (128d, M=32, ef=200)

| Engine | QPS | Recall@10 | Build (s) | p50 (ms) |
|---|---|---|---|---|
| hnswlib (C++) | 1152.6 | 0.9895 | 7.5 | 0.73 |
| ruvector-core | 443.1 | 0.9830 | 44.0 | 1.98 |
| numpy brute-force | 134.8 | 1.0000 | 0.003 | 3.26 |

100,000 Vectors (128d, M=32, ef=200)

| Engine | QPS | Recall@10 | Build (s) | p50 (ms) | Memory |
|---|---|---|---|---|---|
| hnswlib (C++) | 249.5 | 0.7427 | 395.3 | 2.57 | |
| ruvector-core | 85.7 | 0.8675 | 855.6 | 10.14 | ~523MB |
| numpy brute-force | 69.2 | 1.0000 | 0.016 | 10.20 | 48.8MB |

Key Findings

  • Recall is competitive: 98.3% at 10K (within 0.65% of hnswlib) and 86.75% at 100K (higher than hnswlib's 74.27%)
  • QPS is 2.6-2.9x slower than hnswlib, which is expected since hnswlib has SIMD optimizations
  • Build time is 2.2-5.9x slower, the largest gap, though it narrows at scale
  • Previous benchmarks had hardcoded recall=1.0 and memory=0.0; both are now measured

README Corrections

  • Replaced "100% recall at every configuration" with actual measured recall ranges
  • Replaced "15.7x faster than Python" (simulated) with real hnswlib comparison
  • Added independent benchmark section with link to full methodology

Test plan

  • All 30 Cypher tests pass (0 failures, 2 ignored)
  • Multi-row MATCH returns correct number of rows
  • Property expressions generate correct column names
  • WHERE filtering works on multi-row results
  • No regressions in existing test suite
  • Benchmarks reproducible with included scripts and raw JSON data

🤖 Generated with claude-flow

aepod and others added 6 commits March 24, 2026 12:34
The execute_match() function previously collapsed all match results into
a single ExecutionContext via context.bind(), which overwrote previous
bindings. MATCH (n:Person) on 3 Person nodes returned only 1 row.

This commit refactors the executor to use a ResultSet pipeline:
- type ResultSet = Vec<ExecutionContext>
- Each clause transforms ResultSet → ResultSet
- execute_match() expands the set (one context per match)
- execute_return() projects one row per context
- execute_set/delete() apply to all contexts
- Cross-product semantics for multiple patterns in one MATCH

Also adds comprehensive tests:
- test_match_returns_multiple_rows (the Issue ruvnet#269 regression)
- test_match_return_properties (verify correct values per row)
- test_match_where_filter (WHERE correctly filters multi-row)
- test_match_single_result (1 match → 1 row, no regression)
- test_match_no_results (0 matches → 0 rows)
- test_match_many_nodes (100 nodes → 100 rows, stress test)

Co-Authored-By: claude-flow <ruv@ruv.net>
RETURN n.name now produces column "n.name" instead of "?column?".
Property expressions (Expression::Property) are formatted as
"object.property" for column naming, matching standard Cypher behavior.

Co-Authored-By: claude-flow <ruv@ruv.net>
  Built from commit b2347ce

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
  Built from commit 2adb949

  Platforms updated:
  - linux-x64-gnu
  - linux-arm64-gnu
  - darwin-x64
  - darwin-arm64
  - win32-x64-msvc

  🤖 Generated by GitHub Actions
Phase 2 of the ruvector remediation plan. Replaces simulated benchmarks
with real measurements:

- Python harness: hnswlib (C++) and numpy brute-force on same datasets
- Rust test: ruvector-core HNSW with ground-truth recall measurement
- Datasets: random-10K and random-100K, 128 dimensions
- Metrics: QPS (p50/p95), recall@10 vs ground truth, memory, build time

Key findings:
- ruvector recall@10 is good: 98.3% (10K), 86.75% (100K)
- ruvector QPS is 2.6-2.9x slower than hnswlib
- ruvector build time is 2.2-5.9x slower than hnswlib
- ruvector uses ~523MB for 100K vectors (10x raw data size)
- All numbers are REAL — no hardcoded values, no simulation

Co-Authored-By: claude-flow <ruv@ruv.net>
- Add independent benchmark report comparing ruvector-core vs hnswlib (C++) vs numpy brute-force
- 10K vectors: 443 QPS / 98.3% recall (vs hnswlib 1153 QPS / 98.95% recall)
- 100K vectors: 86 QPS / 86.75% recall (vs hnswlib 250 QPS / 74.27% recall)
- Fix README "100% recall" claim — actual recall is 86.75-98.3% depending on scale
- Fix "simulated Python baseline" — now compared against real hnswlib competitor
- Include raw JSON data and full methodology documentation

Co-Authored-By: claude-flow <ruv@ruv.net>
