Add VecSimIndex_RelabelVector to re-key a vector without re-insertion #978
Draft
ofiryanai wants to merge 4 commits into
Adds a public C API VecSimIndex_RelabelVector(index, old_label, new_label) that changes a stored vector's external label in place, without touching the graph topology. HNSW neighbor edges reference internal ids, which a relabel leaves unchanged, so only the label<->internal-id mapping and idToMetaData[id].label are updated (O(1) per stored vector).

This lets a caller re-key an unchanged vector (e.g. a search module that assigns a new doc id on update but whose vector value did not change) and avoid the delete + re-insert HNSW graph churn.

- vec_sim_interface.h: non-pure virtual relabelVector with a NOT_SUPPORTED default, so index types without an implementation (e.g. SVS) are unaffected.
- BruteForce single/multi: O(1) label-map relabel (no internal lock; the caller provides mutual exclusion, matching addVector/deleteVector).
- HNSW single/multi: relabelVectorUnsafe updates labelLookup + idToMetaData; relabelVector wraps it under an exclusive indexDataGuard.
- Tiered HNSW: under flatIndexGuard + mainIndexGuard, relabels the flat copy, rewrites any pending HNSWInsertJob.label and rekeys labelToInsertJobs (so a not-yet-ingested vector is ingested under the new label), then relabels the HNSW backend via the lock-free variant (no recursive lock).
- Tests: RelabelVectorInBackend and RelabelVectorPendingInFlat (tiered, covering single/multi and float/double).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
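The Tiered HNSW step can be pictured with a small standalone model. The sketch below is not the VecSim code: ToyTiered, PendingJob, and every member name are hypothetical stand-ins, and two std::mutex objects replace the real guards. Under those assumptions it shows the two ideas from the commit message: the locked entry point takes both guards once and then calls an unlocked helper for the backend (so no lock is acquired recursively), and a pending insert job is re-keyed so that later ingestion uses the new label.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Label = std::size_t;

// Hypothetical stand-in for a queued insert job.
struct PendingJob {
    Label label;
};

// Toy model of a tiered index: a flat buffer in front of a backend.
class ToyTiered {
public:
    // Buffer a label in the flat tier and queue a job to ingest it later.
    void addToFlat(Label label) {
        std::scoped_lock lock(flatGuard_, mainGuard_);
        flatLabels_.insert(label);
        jobs_[label].push_back(PendingJob{label});
    }

    // Move every pending job into the backend under the label the job carries.
    void ingestAll() {
        std::scoped_lock lock(flatGuard_, mainGuard_);
        for (const auto &entry : jobs_) {
            for (const PendingJob &job : entry.second) backendLabels_.insert(job.label);
        }
        jobs_.clear();
        flatLabels_.clear();
    }

    // Locked entry point. Returns 1 if any tier held old_label, else 0.
    // Assumes new_label is not already in use (the caller's guarantee).
    int relabel(Label old_label, Label new_label) {
        std::scoped_lock lock(flatGuard_, mainGuard_); // both guards, taken once
        int changed = rekeySet(flatLabels_, old_label, new_label); // flat copy
        auto node = jobs_.extract(old_label);                      // pending jobs
        if (!node.empty()) {
            for (PendingJob &job : node.mapped()) job.label = new_label;
            node.key() = new_label;
            jobs_.insert(std::move(node));
            changed = 1;
        }
        // Guards are already held, so call the unlocked backend helper.
        changed |= relabelBackendUnsafe(old_label, new_label);
        return changed;
    }

    bool hasPending(Label label) {
        std::scoped_lock lock(flatGuard_, mainGuard_);
        return jobs_.count(label) != 0;
    }

    bool inBackend(Label label) {
        std::scoped_lock lock(flatGuard_, mainGuard_);
        return backendLabels_.count(label) != 0;
    }

private:
    // Replace old_label with new_label in a label set; 0 if old_label is absent.
    static int rekeySet(std::unordered_set<Label> &labels, Label old_label, Label new_label) {
        if (labels.erase(old_label) == 0) return 0;
        labels.insert(new_label);
        return 1;
    }

    // Unlocked variant: the caller must already hold mainGuard_.
    int relabelBackendUnsafe(Label old_label, Label new_label) {
        return rekeySet(backendLabels_, old_label, new_label);
    }

    std::mutex flatGuard_;
    std::mutex mainGuard_;
    std::unordered_set<Label> flatLabels_;
    std::unordered_set<Label> backendLabels_;
    std::unordered_map<Label, std::vector<PendingJob>> jobs_;
};
```

The node-handle `extract`/`insert` pair (C++17) moves the job list to its new key without copying it, which is one plausible way to re-key a map entry; the PR text does not say how labelToInsertJobs is actually re-keyed.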
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The RelabelVectorInBackend / RelabelVectorPendingInFlat tests access the tiered index internals (frontendIndex, backendIndex, labelToInsertJobs), like the other HNSWTieredIndexTest cases. gtest friendship is not inherited, so each generated test class must be friend-declared explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
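The friendship rule behind this commit is plain C++ and can be seen without gtest. In the sketch below, ToyIndex, FixtureBase, and GeneratedTest are hypothetical names standing in for the tiered index, the shared fixture, and one generated test class; it only illustrates why each derived class needs its own friend declaration.

```cpp
#include <cstddef>

class FixtureBase;
class GeneratedTest;

// Hypothetical index with private internals that tests want to inspect.
class ToyIndex {
    std::size_t pendingJobs_ = 3;

    friend class FixtureBase;   // lets the fixture itself read pendingJobs_
    friend class GeneratedTest; // needed separately: friendship is not inherited
};

class FixtureBase {
protected:
    static std::size_t pendingFromBase(const ToyIndex &index) { return index.pendingJobs_; }
};

// Stands in for a class that a test macro generates by deriving from the fixture.
class GeneratedTest : public FixtureBase {
public:
    // Compiles only because GeneratedTest is itself a friend of ToyIndex;
    // deriving from the friend FixtureBase is not enough.
    static std::size_t pendingDirect(const ToyIndex &index) { return index.pendingJobs_; }

    // Going through a base-class helper works without the extra friend line.
    static std::size_t pendingViaBase(const ToyIndex &index) { return pendingFromBase(index); }
};
```

Removing the `friend class GeneratedTest;` line makes `pendingDirect` fail to compile while `pendingViaBase` keeps working, which is the situation the explicit friend declarations address.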
Covers the standalone relabelVector path (the locking wrapper for HNSW; the no-lock BF path) and the edge cases not exercised by the tiered tests: relabel of a missing label is a no-op (returns 0), and relabel onto an already-existing label is refused (returns 0), leaving the vector unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
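The behaviour this commit tests can be summarised with a toy single-label index. ToyGraphIndex below is a hypothetical model, not the VecSim classes; only the name labelLookup is borrowed from the PR text, and idToLabel is a simplified stand-in for idToMetaData[id].label. Under those assumptions it shows a relabel that touches the two mappings and nothing else, returns 0 for a missing old label, and refuses (returns 0) when the new label is already taken.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

using Label = std::size_t;
using InternalId = std::size_t;

struct ToyGraphIndex {
    std::unordered_map<Label, InternalId> labelLookup; // label -> internal id
    std::vector<Label> idToLabel;                      // internal id -> label
    std::vector<std::vector<InternalId>> neighbors;    // edges between internal ids

    // Append a slot for `label` and link it to the previously added id.
    InternalId add(Label label) {
        InternalId id = idToLabel.size();
        idToLabel.push_back(label);
        neighbors.emplace_back();
        if (id > 0) {
            neighbors[id].push_back(id - 1);
            neighbors[id - 1].push_back(id);
        }
        labelLookup[label] = id;
        return id;
    }

    // Returns 1 on success, 0 if old_label is missing or new_label is taken.
    // The edge lists are never read or written here.
    int relabel(Label old_label, Label new_label) {
        auto it = labelLookup.find(old_label);
        if (it == labelLookup.end()) return 0;           // missing label: no-op
        if (labelLookup.count(new_label) != 0) return 0; // target exists: refuse
        InternalId id = it->second;
        labelLookup.erase(it);
        labelLookup.emplace(new_label, id);
        idToLabel[id] = new_label;
        return 1;
    }
};
```

Because edges are stored between internal ids, comparing `neighbors` before and after a relabel is a cheap way to check that the graph was left alone.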
What

Adds a public C API VecSimIndex_RelabelVector(index, old_label, new_label) that changes a stored vector's external label in place, without re-inserting the vector or modifying the graph topology.

HNSW neighbor links reference internal ids, which a relabel leaves unchanged, so only the label↔internal-id mapping and idToMetaData[id].label are updated: O(1) per stored vector, zero graph edges touched.

Why
Enables a caller to re-key an unchanged vector cheaply instead of DeleteVector + AddVector. Motivating case (RediSearch partial updates): on an HSET that doesn't modify the vector fields, RediSearch still assigns a new internal doc id, which today forces a delete + re-insert of every vector into HNSW, churning the graph (ghost/marked-deleted nodes, repair jobs) even though the vector is identical. With relabel, the existing graph node is simply re-pointed to the new label. (This is the // TODO: use VecSimReplace noted in RediSearch src/indexer.c.)

How
- vec_sim_interface.h: non-pure virtual relabelVector with a VECSIM_RELABEL_NOT_SUPPORTED default, so index types without an implementation (e.g. SVS) are unaffected and callers can fall back to delete + add.
- BruteForce single/multi: O(1) label-map relabel (no internal lock; the caller provides mutual exclusion, matching addVector/deleteVector).
- HNSW single/multi: relabelVectorUnsafe updates labelLookup + idToMetaData[id].label; relabelVector wraps it under an exclusive indexDataGuard.
- Tiered HNSW: under flatIndexGuard + mainIndexGuard (canonical order), relabels the flat-buffer copy, rewrites any pending HNSWInsertJob.label and rekeys labelToInsertJobs (so a not-yet-ingested vector is later ingested under the new label), then relabels the HNSW backend via the lock-free variant (avoids a recursive indexDataGuard lock).

Return: 1 relabeled, 0 old_label not found, -1 not supported. Caller must guarantee new_label does not already exist.

Tests
tests/unit/test_hnsw_tiered.cpp:

- RelabelVectorInBackend: relabel a vector already ingested into HNSW.
- RelabelVectorPendingInFlat: relabel a vector still in the flat buffer with a pending insert job; verifies the buffered vector and the pending job are re-keyed and the later ingestion lands under the new label.

Both run across single/multi and float/double.
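The return contract listed under How (1 relabeled, 0 old_label not found, -1 not supported) suggests a caller-side pattern along the following lines. This is a sketch under stated assumptions: IndexOps, RekeyOutcome, and rekey are hypothetical names, and the three callbacks stand in for whatever relabel, delete, and add calls the caller actually has. It does not call the VecSim API itself.

```cpp
#include <cstddef>
#include <functional>

// Return codes as listed in the PR description.
constexpr int kRelabeled = 1;
constexpr int kNotFound = 0;

enum class RekeyOutcome { Relabeled, Missing, Reinserted };

// Hypothetical bundle of the operations a caller has on one index.
struct IndexOps {
    std::function<int(std::size_t, std::size_t)> relabel; // returns 1, 0, or -1
    std::function<void(std::size_t)> remove;              // delete by label
    std::function<void(std::size_t)> add;                 // re-insert under a label
};

// Try the cheap in-place relabel first; fall back to delete + add only when
// the index type reports that relabel is not supported.
RekeyOutcome rekey(const IndexOps &ops, std::size_t old_label, std::size_t new_label) {
    switch (ops.relabel(old_label, new_label)) {
    case kRelabeled:
        return RekeyOutcome::Relabeled;
    case kNotFound:
        return RekeyOutcome::Missing; // nothing stored under old_label
    default: // -1 (not supported); any other code is treated the same way here
        ops.remove(old_label);
        ops.add(new_label);
        return RekeyOutcome::Reinserted;
    }
}
```

Keeping the fallback in one place means index types that later gain a relabel implementation pick up the cheap path without changes at the call sites.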
Validation note
The relabel method bodies were type-checked locally via explicit template instantiation (clean). The full gtest suite was not run locally — VecSim's tiered factory hard-includes the SVS headers, and the SVS dependency chain (LVQ / fmt) doesn't configure on macOS ARM. Please rely on CI (Linux) to build and run the new tests.
🤖 Generated with Claude Code