Add VecSimIndex_RelabelVector to re-key a vector without re-insertion #978

Draft
ofiryanai wants to merge 4 commits into 8.4 from ofir-relabel-vector

@ofiryanai
Collaborator

What

Adds a public C API VecSimIndex_RelabelVector(index, old_label, new_label) that changes a stored vector's external label in place, without re-inserting the vector or modifying the graph topology.

HNSW neighbor links reference internal ids, which a relabel leaves unchanged, so only the label<->internal-id mapping and idToMetaData[id].label are updated: O(1) per stored vector, zero graph edges touched.
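
As a rough illustration of why a relabel leaves the graph alone, here is a minimal sketch under stated assumptions. `ToyGraphIndex`, `idToLabel`, and the type aliases are hypothetical stand-ins and not VecSim's real classes; the collision check on `new_label` is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy model for illustration; not VecSim's real classes.
using labelType = std::uint64_t; // external label chosen by the caller
using idType = std::uint32_t;    // internal id assigned by the index

struct ToyGraphIndex {
    std::unordered_map<labelType, idType> labelLookup; // label -> internal id
    std::vector<labelType> idToLabel;                  // internal id -> label
    std::vector<std::vector<idType>> neighbors;        // edges hold internal ids only

    idType add(labelType label, std::vector<idType> links) {
        idType id = static_cast<idType>(idToLabel.size());
        idToLabel.push_back(label);
        neighbors.push_back(std::move(links));
        labelLookup[label] = id;
        return id;
    }

    // Returns 1 if relabeled, 0 if old_label is absent or new_label is taken.
    int relabel(labelType old_label, labelType new_label) {
        auto it = labelLookup.find(old_label);
        if (it == labelLookup.end() || labelLookup.count(new_label)) return 0;
        idType id = it->second;
        labelLookup.erase(it);
        labelLookup.emplace(new_label, id);
        idToLabel[id] = new_label; // the graph node keeps its id and its edges
        return 1;
    }
};

// Usage: relabeling changes both mappings and leaves the adjacency lists alone.
inline void demoRelabelKeepsGraph() {
    ToyGraphIndex index;
    index.add(10, {});
    index.add(20, {0}); // node 1 links to node 0
    auto edgesBefore = index.neighbors;
    assert(index.relabel(10, 11) == 1);
    assert(index.labelLookup.at(11) == 0 && index.idToLabel[0] == 11);
    assert(index.neighbors == edgesBefore);
}
```

Because the adjacency lists store internal ids, the relabel never reads or writes `neighbors`.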

Why

Enables a caller to re-key an unchanged vector cheaply instead of DeleteVector + AddVector. Motivating case (RediSearch partial updates): on an HSET that doesn't modify the vector fields, RediSearch still assigns a new internal doc id, which today forces a delete + re-insert of every vector into HNSW — churning the graph (ghost/marked-deleted nodes, repair jobs) even though the vector is identical. With relabel, the existing graph node is simply re-pointed to the new label. (This is the // TODO: use VecSimReplace noted in RediSearch src/indexer.c.)

How

  • vec_sim_interface.h — non-pure virtual relabelVector with a VECSIM_RELABEL_NOT_SUPPORTED default, so index types without an implementation (e.g. SVS) are unaffected and callers can fall back to delete + add.
  • BruteForce single/multi — O(1) label-map relabel. No internal lock; the caller provides mutual exclusion (matching addVector/deleteVector).
  • HNSW single/multi: relabelVectorUnsafe updates labelLookup + idToMetaData[id].label; relabelVector wraps it under an exclusive indexDataGuard.
  • Tiered HNSW — under flatIndexGuard + mainIndexGuard (canonical order), relabels the flat-buffer copy, rewrites any pending HNSWInsertJob.label and rekeys labelToInsertJobs (so a not-yet-ingested vector is later ingested under the new label), then relabels the HNSW backend via the lock-free variant (avoids a recursive indexDataGuard lock).
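
The tiered flow in the last bullet can be sketched roughly as follows. This is a simplified model and not the PR's code: `ToyBackend`, `ToyTiered`, `InsertJob`, and `executeInsertJob` are hypothetical, plain `std::mutex` replaces the real guards, one job per label is assumed, and the sketch assumes the main guard is the same lock the backend's locking wrapper takes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

using labelType = std::uint64_t;
using Vec = std::vector<float>;

// Hypothetical backend: a locking wrapper plus an Unsafe variant for callers
// that already hold the lock.
struct ToyBackend {
    std::mutex indexDataGuard;
    std::unordered_map<labelType, Vec> vectors;

    int relabelVectorUnsafe(labelType old_label, labelType new_label) {
        auto it = vectors.find(old_label);
        if (it == vectors.end() || vectors.count(new_label)) return 0;
        Vec v = std::move(it->second);
        vectors.erase(it);
        vectors.emplace(new_label, std::move(v));
        return 1;
    }
    int relabelVector(labelType old_label, labelType new_label) {
        std::lock_guard<std::mutex> lock(indexDataGuard);
        return relabelVectorUnsafe(old_label, new_label);
    }
};

struct InsertJob { labelType label; }; // pending "ingest into backend" job

struct ToyTiered {
    ToyBackend backend;
    std::mutex flatIndexGuard;
    // Assumption of this sketch: the main guard is the backend's data guard.
    std::mutex& mainIndexGuard = backend.indexDataGuard;
    std::unordered_map<labelType, Vec> flat; // frontend buffer
    std::unordered_map<labelType, std::shared_ptr<InsertJob>> labelToInsertJobs;

    void addVector(const Vec& v, labelType label) {
        std::lock_guard<std::mutex> flatLock(flatIndexGuard);
        flat[label] = v;
        labelToInsertJobs[label] = std::make_shared<InsertJob>(InsertJob{label});
    }

    // Simulates a background worker ingesting one pending job.
    void executeInsertJob(std::shared_ptr<InsertJob> job) {
        std::lock_guard<std::mutex> flatLock(flatIndexGuard);
        std::lock_guard<std::mutex> mainLock(mainIndexGuard);
        auto it = flat.find(job->label);
        if (it == flat.end()) return;
        backend.vectors[job->label] = std::move(it->second);
        flat.erase(it);
        labelToInsertJobs.erase(job->label);
    }

    int relabelVector(labelType old_label, labelType new_label) {
        // Same order everywhere: flat guard first, then main guard.
        std::lock_guard<std::mutex> flatLock(flatIndexGuard);
        std::lock_guard<std::mutex> mainLock(mainIndexGuard);
        int relabeled = 0;
        auto f = flat.find(old_label);
        if (f != flat.end()) { // re-key the buffered copy
            Vec v = std::move(f->second);
            flat.erase(f);
            flat.emplace(new_label, std::move(v));
            relabeled = 1;
        }
        auto j = labelToInsertJobs.find(old_label);
        if (j != labelToInsertJobs.end()) { // re-key the pending job
            auto job = j->second;
            job->label = new_label;
            labelToInsertJobs.erase(j);
            labelToInsertJobs.emplace(new_label, job);
        }
        // The guard is already held, so the locking wrapper would lock it twice.
        if (backend.relabelVectorUnsafe(old_label, new_label)) relabeled = 1;
        return relabeled;
    }
};

// Usage: a vector relabeled while still buffered is ingested under the new label.
inline void demoPendingRelabel() {
    ToyTiered tiered;
    tiered.addVector({1.f, 2.f}, 7);
    auto job = tiered.labelToInsertJobs.at(7);
    assert(tiered.relabelVector(7, 8) == 1);
    assert(job->label == 8 && tiered.flat.count(8) == 1);
    tiered.executeInsertJob(job);
    assert(tiered.backend.vectors.count(8) == 1 && tiered.backend.vectors.count(7) == 0);
}
```

Rewriting the job's label in place is what makes the later ingestion land under the new label without re-enqueuing anything.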

Return: 1 if relabeled, 0 if old_label was not found, -1 if not supported. The caller must guarantee that new_label does not already exist.
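
A caller-side fallback built on these return values might look like the sketch below. It is illustrative only: `ToyIndexInterface`, `ToyUnsupportedIndex`, `ToyRelabelIndex`, and `rekey` are hypothetical names, and only the -1 "not supported" value and the non-pure virtual default are taken from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using labelType = std::uint64_t;
using Vec = std::vector<float>;

// -1 matches the "not supported" return value; the class names are hypothetical.
constexpr int VECSIM_RELABEL_NOT_SUPPORTED = -1;

struct ToyIndexInterface {
    std::unordered_map<labelType, Vec> vectors;
    virtual ~ToyIndexInterface() = default;

    int addVector(const Vec& v, labelType label) { vectors[label] = v; return 1; }
    int deleteVector(labelType label) { return static_cast<int>(vectors.erase(label)); }

    // Non-pure virtual: index types that do not override it report "not supported".
    virtual int relabelVector(labelType, labelType) { return VECSIM_RELABEL_NOT_SUPPORTED; }
};

struct ToyUnsupportedIndex : ToyIndexInterface {}; // inherits the default

struct ToyRelabelIndex : ToyIndexInterface {
    int relabelVector(labelType old_label, labelType new_label) override {
        auto it = vectors.find(old_label);
        if (it == vectors.end()) return 0;
        Vec v = std::move(it->second);
        vectors.erase(it);
        vectors[new_label] = std::move(v);
        return 1;
    }
};

// Caller-side helper (hypothetical): prefer relabel, fall back to delete + add.
inline int rekey(ToyIndexInterface& index, labelType old_label, labelType new_label,
                 const Vec& v) {
    int rc = index.relabelVector(old_label, new_label);
    if (rc != VECSIM_RELABEL_NOT_SUPPORTED) return rc;
    if (index.deleteVector(old_label) == 0) return 0; // nothing stored under old_label
    index.addVector(v, new_label);
    return 1;
}

// Usage: both index kinds end up re-keyed; only the path taken differs.
inline void demoRekey() {
    ToyRelabelIndex fast;
    ToyUnsupportedIndex slow;
    Vec v{1.f, 2.f};
    fast.addVector(v, 1);
    slow.addVector(v, 1);
    assert(rekey(fast, 1, 2, v) == 1 && fast.vectors.count(2) == 1);
    assert(rekey(slow, 1, 2, v) == 1 && slow.vectors.count(2) == 1);
    assert(rekey(fast, 99, 100, v) == 0);
}
```

The fallback path still needs the vector data, which is why `rekey` takes `v` even though the relabel path ignores it.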

Tests

tests/unit/test_hnsw_tiered.cpp:

  • RelabelVectorInBackend — relabel a vector already ingested into HNSW.
  • RelabelVectorPendingInFlat — relabel a vector still in the flat buffer with a pending insert job; verifies the buffered vector and the pending job are re-keyed and the later ingestion lands under the new label.

Both run across single/multi and float/double.

Validation note

The relabel method bodies were type-checked locally via explicit template instantiation (clean). The full gtest suite was not run locally — VecSim's tiered factory hard-includes the SVS headers, and the SVS dependency chain (LVQ / fmt) doesn't configure on macOS ARM. Please rely on CI (Linux) to build and run the new tests.
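
For readers unfamiliar with the technique, the sketch below shows what type-checking via explicit template instantiation can look like. `ToyIndex` is a hypothetical class template and not one of the PR's classes. A file like this could, for instance, be compiled with GCC's or Clang's `-fsyntax-only`; the exact command used for this PR is not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical class template standing in for an index templated on its data type.
template <typename DataType>
class ToyIndex {
    std::unordered_map<std::size_t, DataType> values;

public:
    void add(std::size_t label, DataType value) { values[label] = value; }
    bool has(std::size_t label) const { return values.count(label) != 0; }

    // No code in this file calls relabelVector. Without the explicit
    // instantiations below, its body would never be instantiated, so errors
    // in type-dependent expressions would go unnoticed.
    int relabelVector(std::size_t old_label, std::size_t new_label) {
        auto it = values.find(old_label);
        if (it == values.end() || values.count(new_label)) return 0;
        DataType value = it->second;
        values.erase(it);
        values.emplace(new_label, value);
        return 1;
    }
};

// Explicit instantiation definitions: every member function is instantiated,
// and therefore type-checked, for these data types.
template class ToyIndex<float>;
template class ToyIndex<double>;

// Usage that touches only add/has; relabelVector is still compiled above.
inline void demoAddOnly() {
    ToyIndex<float> index;
    index.add(1, 0.5f);
    assert(index.has(1));
}
```

This checks that the bodies compile for each data type; it does not exercise their runtime behavior, which is what the unit tests are for.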

🤖 Generated with Claude Code

Adds a public C API VecSimIndex_RelabelVector(index, old_label, new_label) that
changes a stored vector's external label in place, without touching the graph
topology. HNSW neighbor edges reference internal ids, which a relabel leaves
unchanged, so only the label<->internal-id mapping and idToMetaData[id].label are
updated (O(1) per stored vector). This lets a caller re-key an unchanged vector
(e.g. a search module that assigns a new doc id on update but whose vector value
did not change) and avoid the delete + re-insert HNSW graph churn.

- vec_sim_interface.h: non-pure virtual relabelVector with a NOT_SUPPORTED
  default, so index types without an implementation (e.g. SVS) are unaffected.
- BruteForce single/multi: O(1) label-map relabel (no internal lock; the caller
  provides mutual exclusion, matching addVector/deleteVector).
- HNSW single/multi: relabelVectorUnsafe updates labelLookup + idToMetaData;
  relabelVector wraps it under an exclusive indexDataGuard.
- Tiered HNSW: under flatIndexGuard + mainIndexGuard, relabels the flat copy,
  rewrites any pending HNSWInsertJob.label and rekeys labelToInsertJobs (so a
  not-yet-ingested vector is ingested under the new label), then relabels the
  HNSW backend via the lock-free variant (no recursive lock).
- Tests: RelabelVectorInBackend and RelabelVectorPendingInFlat (tiered, covering
  single/multi and float/double).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jit-ci

jit-ci Bot commented Jun 7, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR


Security scan by Jit

ofiryanai and others added 3 commits June 7, 2026 13:09
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The RelabelVectorInBackend / RelabelVectorPendingInFlat tests access the tiered
index internals (frontendIndex, backendIndex, labelToInsertJobs), like the other
HNSWTieredIndexTest cases. gtest friendship is not inherited, so each generated
test class must be friend-declared explicitly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
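
The friendship point in the commit message above comes from a general C++ rule: a friend declaration grants access to the named class only, not to classes derived from it. The sketch below uses plain C++ with hypothetical names and no gtest. `ToyFixture_Relabel` stands in for the kind of class a test macro generates on top of a fixture.

```cpp
#include <cassert>

// Hypothetical names; gtest itself is not used here.
class ToyTieredIndex {
    int pendingJobs = 3; // private internal state a test wants to inspect

    friend class ToyFixture;         // access for the fixture itself
    friend class ToyFixture_Relabel; // each derived test class must be listed too
};

class ToyFixture {
public:
    int peekFromFixture(const ToyTieredIndex& index) const { return index.pendingJobs; }
};

// Stands in for the class a test macro would generate on top of the fixture.
class ToyFixture_Relabel : public ToyFixture {
public:
    // Compiles only because of the second friend declaration above:
    // friendship granted to ToyFixture does not extend to its subclasses.
    int peekFromTest(const ToyTieredIndex& index) const { return index.pendingJobs; }
};

// Usage: both access paths work once both classes are declared friends.
inline void demoFriendship() {
    ToyTieredIndex index;
    ToyFixture fixture;
    ToyFixture_Relabel test;
    assert(fixture.peekFromFixture(index) == 3);
    assert(test.peekFromTest(index) == 3);
}
```

Removing the `friend class ToyFixture_Relabel;` line makes `peekFromTest` fail to compile, which is the situation the commit fixes for the new tests.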
Covers the standalone relabelVector path (the locking wrapper for HNSW; the
no-lock BF path) and the edge cases not exercised by the tiered tests: relabel
of a missing label is a no-op (returns 0), and relabel onto an already-existing
label is refused (returns 0) leaving the vector unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
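
The two edge cases named in this commit message can be illustrated on a toy multi-value label map. This is a sketch under stated assumptions: `ToyMultiLabelMap` and its members are hypothetical, and the real tests run against the actual BruteForce and HNSW indexes.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using labelType = std::uint64_t;
using idType = std::uint32_t;

// Hypothetical multi-value label map: one label may own several stored vectors.
struct ToyMultiLabelMap {
    std::unordered_map<labelType, std::vector<idType>> labelToIds;
    std::vector<labelType> idToLabel;

    idType add(labelType label) {
        idType id = static_cast<idType>(idToLabel.size());
        idToLabel.push_back(label);
        labelToIds[label].push_back(id);
        return id;
    }

    int relabel(labelType old_label, labelType new_label) {
        auto it = labelToIds.find(old_label);
        if (it == labelToIds.end()) return 0;      // missing label: no-op
        if (labelToIds.count(new_label)) return 0; // target exists: refused
        std::vector<idType> ids = std::move(it->second);
        labelToIds.erase(it);
        for (idType id : ids) idToLabel[id] = new_label; // O(1) per stored vector
        labelToIds.emplace(new_label, std::move(ids));
        return 1;
    }
};

// Usage: the two refused cases leave the map untouched; the valid one moves all ids.
inline void demoEdgeCases() {
    ToyMultiLabelMap index;
    index.add(1);
    index.add(1);
    index.add(2);
    assert(index.relabel(42, 43) == 0); // old label missing
    assert(index.relabel(1, 2) == 0);   // new label already exists
    assert(index.labelToIds.at(1).size() == 2 && index.idToLabel[0] == 1);
    assert(index.relabel(1, 3) == 1);
    assert(index.labelToIds.count(1) == 0 && index.labelToIds.at(3).size() == 2);
    assert(index.idToLabel[0] == 3 && index.idToLabel[1] == 3 && index.idToLabel[2] == 2);
}
```

Checking both conditions before moving anything is what keeps a refused relabel from leaving the map half-updated.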