CASSANDRA-20086: Use score ordered iterators for ANN search by michaeljmarshall · Pull Request #4615 · apache/cassandra

michaeljmarshall · 2026-02-13T21:16:27Z

(cherry picked from commit f2e3411)

This PR implements the feature proposed in https://issues.apache.org/jira/browse/CASSANDRA-20086.

Motivation

Current ANN search does the following:

1. Get the top k vectors from each index segment (in-memory or sstable).
2. Sort them in primary key order.
3. Consume and materialize all of them (LIMIT * numSegments rows).
4. Re-query with a higher LIMIT if shadowed keys (from updates/overwrites) are encountered.
5. Re-score to sort by similarity score.
6. Take the LIMIT best vectors.
7. Sort by primary key.

This flow is inefficient because of the repeated sorts and re-queries, and it can produce incorrect results when a row's vector is updated to one with a worse score.

The new flow:

1. Get the top k vectors from each index segment (in-memory or sstable).
2. Build a score-ordered iterator for each segment.
3. Merge those iterators, ordering by score descending.
4. Consume, materialize, and validate rows from the merged (global) iterator until LIMIT rows are collected.
5. If an iterator is exhausted during step 4, re-query that iterator's graph for more vectors, using a bitset to skip already-visited nodes, until the graph itself is exhausted.
6. Sort the LIMIT rows by primary key.
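The new flow can be pictured as a k-way merge over lazily refilled, score-ordered iterators. The following is a minimal Java sketch under stated assumptions, not the PR's implementation: `Scored`, `SegmentSource`, `SegmentIterator`, and `AnnMerge` are hypothetical names, each segment's "graph" is simulated by an exact scan (so successive batches arrive in non-increasing score order, which a real approximate graph search does not strictly guarantee), and row validation is passed in as a predicate.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical candidate: primary key, similarity score, and the segment that scored it. */
record Scored(int primaryKey, float score, int segment) {}

/**
 * Hypothetical stand-in for one index segment's graph. Search here is an exact scan,
 * so batches come back in non-increasing score order; a real ANN graph is approximate.
 */
final class SegmentSource {
    private final int segment;
    private final int[] keys;      // node id -> primary key
    private final float[] scores;  // node id -> similarity to the query vector
    private final BitSet visited = new BitSet();

    SegmentSource(int segment, int[] keys, float[] scores) {
        this.segment = segment;
        this.keys = keys;
        this.scores = scores;
    }

    /** Returns up to k best unvisited nodes, best first, marking them visited; empty once exhausted. */
    List<Scored> search(int k) {
        List<Integer> unvisited = new ArrayList<>();
        for (int node = 0; node < keys.length; node++)
            if (!visited.get(node)) unvisited.add(node);
        unvisited.sort((a, b) -> Float.compare(scores[b], scores[a]));
        List<Scored> batch = new ArrayList<>();
        for (int node : unvisited.subList(0, Math.min(k, unvisited.size()))) {
            visited.set(node);
            batch.add(new Scored(keys[node], scores[node], segment));
        }
        return batch;
    }
}

/** Score-ordered view of one segment that re-queries its graph when its buffer runs dry. */
final class SegmentIterator {
    private final SegmentSource source;
    private final int k;
    private final ArrayDeque<Scored> buffer = new ArrayDeque<>();

    SegmentIterator(SegmentSource source, int k) {
        this.source = source;
        this.k = k;
    }

    /** Next-best candidate without consuming it, or null once the graph is exhausted. */
    Scored peek() {
        if (buffer.isEmpty()) buffer.addAll(source.search(k));
        return buffer.peekFirst();
    }

    Scored poll() {
        Scored next = peek();
        if (next != null) buffer.pollFirst();
        return next;
    }
}

final class AnnMerge {
    /** Merges segments by descending score until `limit` rows pass `isValid`, then sorts them by primary key. */
    static List<Integer> topRows(List<SegmentIterator> segments, int limit, Predicate<Scored> isValid) {
        PriorityQueue<SegmentIterator> heap = new PriorityQueue<>(
                Comparator.comparingDouble((SegmentIterator it) -> it.peek().score()).reversed());
        for (SegmentIterator it : segments)
            if (it.peek() != null) heap.add(it);

        List<Integer> rows = new ArrayList<>();
        while (rows.size() < limit && !heap.isEmpty()) {
            SegmentIterator best = heap.poll();
            Scored candidate = best.poll();
            // Stale versions of updated rows are skipped; the row resurfaces via the segment holding its live vector.
            if (isValid.test(candidate)) rows.add(candidate.primaryKey());
            if (best.peek() != null) heap.add(best); // peek() may re-query this segment's graph
        }
        Collections.sort(rows);
        return rows;
    }
}
```

With two segments where row 1 was rewritten in the newer segment with a worse score, the older segment's high-scoring entry for row 1 fails validation, and row 1 is accepted only when the newer segment's iterator reaches it. A batch size of `k = 1` forces the re-query path on nearly every step.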

This change removes two sort operations. It also fixes all update cases: a row's position in the score-ordered iterator is treated as valid only if the row's materialized vector cell comes from the same sstable that produced that position.
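The validation rule can be illustrated with a small model of Cassandra's last-write-wins reconciliation. In this sketch (hypothetical types `VectorCell`, `Candidate`, and `ShadowCheck`; tombstones, TTLs, and timestamp ties are ignored), a candidate scored by an older sstable is rejected once a newer sstable holds the live vector, so the row can only be returned at the position its current vector earns.

```java
import java.util.*;

/** Hypothetical model: one version of a row's vector cell as stored in a particular sstable. */
record VectorCell(int sstableId, long writeTimestamp, float[] vector) {}

/** Hypothetical model: a candidate emitted by one sstable's score-ordered iterator. */
record Candidate(String primaryKey, int sstableId, float score) {}

final class ShadowCheck {
    /** Last-write-wins: the materialized cell is the version with the highest write timestamp. */
    static VectorCell materialize(List<VectorCell> versions) {
        return Collections.max(versions, Comparator.comparingLong(VectorCell::writeTimestamp));
    }

    /** A candidate's score position is trusted only if the live vector came from the sstable that scored it. */
    static boolean isValid(Candidate c, List<VectorCell> versions) {
        return materialize(versions).sstableId() == c.sstableId();
    }
}
```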

Other minor changes

Increased test coverage by parameterizing the tests to exercise brute force, graph search, and "graph's choice".
Implemented a custom brute-force iterator that uses two priority queues: the first ranks by PQ (product quantization) scores, the second re-ranks by full-resolution scores.
Fixed a bug in the mapping of LIMIT to topk that could produce topk < LIMIT for a LIMIT of 1000.
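The two-queue brute-force iterator can be sketched as follows. This is a hedged illustration only: `TwoQueueBruteForce` is a hypothetical name, the approximate and exact scorers are passed in as functions (standing in for PQ-compressed and full-resolution similarity), and the re-rank batch size `rerankK` is an assumed parameter. Ordering is exact within each re-ranked batch; across batches it relies on the approximate scores.

```java
import java.util.*;
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical brute-force iterator: ranks every node by a cheap approximate score, then
 * re-ranks batches of the best approximate candidates by the exact (full-resolution) score.
 */
final class TwoQueueBruteForce implements Iterator<Integer> {
    private record Node(int id, double score) {}
    private static final Comparator<Node> BEST_FIRST = Comparator.comparingDouble(Node::score).reversed();

    private final PriorityQueue<Node> approxQueue = new PriorityQueue<>(BEST_FIRST); // first queue: approximate scores
    private final PriorityQueue<Node> exactQueue = new PriorityQueue<>(BEST_FIRST);  // second queue: exact scores
    private final IntToDoubleFunction exactScore;
    private final int rerankK;

    TwoQueueBruteForce(int numNodes, IntToDoubleFunction approxScore, IntToDoubleFunction exactScore, int rerankK) {
        for (int id = 0; id < numNodes; id++) approxQueue.add(new Node(id, approxScore.applyAsDouble(id)));
        this.exactScore = exactScore;
        this.rerankK = rerankK;
    }

    /** Moves the next rerankK approximate winners into the exact queue, scored at full resolution. */
    private void refill() {
        for (int i = 0; i < rerankK && !approxQueue.isEmpty(); i++) {
            int id = approxQueue.poll().id();
            exactQueue.add(new Node(id, exactScore.applyAsDouble(id)));
        }
    }

    @Override
    public boolean hasNext() {
        if (exactQueue.isEmpty()) refill();
        return !exactQueue.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return exactQueue.poll().id();
    }
}
```

Here the approximate queue ranks node 1 above node 0, and re-ranking that batch by exact score restores the correct order, which is the point of the second queue.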

I plan to complete some basic benchmarks next week to demonstrate the value of this change. For now, I am opening this PR to start getting reviews.


michaeljmarshall added 2 commits February 13, 2026 15:01

CASSANDRA-20086: Use score ordered iterators for ANN search

059477b

(cherry picked from commit f2e3411)

Address interface changes to Cell and Row

fbb96e1


michaeljmarshall wants to merge 2 commits into apache:trunk from michaeljmarshall:cassandra-20086-trunk
