feat: add TurboQuant compressed vector backend by dudegladiator · Pull Request #182 · cocoindex-io/cocoindex-code

dudegladiator · 2026-06-07T10:10:29Z

What

Adds an optional TurboQuant compressed vector-search backend alongside the existing sqlite-vec path, selectable at ccc init:

ccc init --backend turbo-quant            # 4-bit (default)
ccc init --backend turbo-quant --tq-bits 2
ccc init --backend sqlite-vec             # explicit default (unchanged)

TurboQuant (Zandieh et al., 2025) is a data-oblivious quantizer: random rotation → per-coordinate Lloyd-Max scalar quantization → 1-bit QJL residual for an unbiased inner-product estimate. No training or calibration.
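The first two stages of that pipeline can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, the codebook is fitted by running Lloyd's algorithm on standard-normal samples (a Gaussian approximation of the rotated-coordinate distribution), and inputs are assumed to be unit-norm.

```python
import numpy as np

def random_rotation(d, seed):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix keeps the rotation uniformly distributed

def lloyd_max_codebook(bits, n_samples=100_000, iters=30, seed=0):
    """Approximate Lloyd-Max levels for N(0, 1) via Lloyd's algorithm on samples."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    k = 2 ** bits
    levels = np.quantile(x, (np.arange(k) + 0.5) / k)  # sorted starting levels
    for _ in range(iters):
        edges = (levels[:-1] + levels[1:]) / 2          # nearest-level boundaries
        idx = np.searchsorted(edges, x)
        levels = np.array([x[idx == j].mean() for j in range(k)])
    return levels

def quantize(x, rot, levels):
    """Rotate a unit-norm vector, rescale to ~N(0, 1), pick the nearest level."""
    d = x.shape[-1]
    y = (rot @ x) * np.sqrt(d)
    edges = (levels[:-1] + levels[1:]) / 2
    return np.searchsorted(edges, y).astype(np.uint8)

def dequantize(codes, rot, levels):
    """Look up levels, undo the rescaling, and rotate back."""
    d = codes.shape[-1]
    return rot.T @ (levels[codes] / np.sqrt(d))
```

Nothing here depends on the indexed data: the rotation comes from a seed and the codebook from a fixed distribution, which is what "data-oblivious" means in the description above.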

Motivation

sqlite-vec stores raw float32 vectors and is exact and fast, but its on-disk index grows large on big codebases. TurboQuant compresses the index ~8× at 4-bit with recall@10 ≈ 0.9, for projects where index size matters more than the last bit of ranking precision.
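The ~8× figure follows from the per-coordinate bit budget alone. A minimal sketch of the arithmetic, assuming d=384 and ignoring per-row overhead such as norms, ids, and metadata (the helper name is hypothetical):

```python
def bytes_per_vector(d, bits_per_coord):
    """Payload size of one vector when each coordinate uses a fixed bit budget."""
    return d * bits_per_coord / 8

d = 384
float32_bytes = bytes_per_vector(d, 32)   # raw float32 row
tq4_bytes = bytes_per_vector(d, 4)        # 4-bit TurboQuant row
tq2_bytes = bytes_per_vector(d, 2)        # 2-bit TurboQuant row

ratio_4bit = float32_bytes / tq4_bytes
ratio_2bit = float32_bytes / tq2_bytes
```

With these assumptions `ratio_4bit` is 8.0 and `ratio_2bit` is 16.0; real on-disk ratios will differ somewhat because of the ignored overhead.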

Measured results

On a real 64.8k-chunk Go/TS repo (d=384, 4-bit):

| | sqlite-vec | turbo-quant (4-bit) |
|---|---|---|
| on-disk index | 7.2 MB | 0.9 MB (~8×) |
| recall@10 vs exact | 1.0 | ~0.92 |
| warm query latency | ~130 ms | ~165 ms |
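For readers unfamiliar with the metric, recall@10 vs exact is commonly computed as the fraction of the exact top-10 results that also appear in the approximate top-10, averaged over queries. The sketch below uses that common definition; how the PR's benchmark computes it is not shown here, and the function name is hypothetical.

```python
import numpy as np

def recall_at_k(exact_scores, approx_scores, k=10):
    """Mean overlap between exact and approximate top-k sets.

    Both inputs have shape (n_queries, n_items); higher score means more similar.
    """
    exact_top = np.argsort(-exact_scores, axis=1)[:, :k]
    approx_top = np.argsort(-approx_scores, axis=1)[:, :k]
    hits = [len(set(e) & set(a)) for e, a in zip(exact_top, approx_top)]
    return float(np.mean(hits)) / k
```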

Distortion bounds and estimator unbiasedness are unit-tested against the paper's Theorem 1 / Theorem 2.

Design

turbo_quant.py — rotation, Lloyd-Max codebooks (b=1..4), MSE quantizer, two-stage unbiased inner-product estimator, bit-packing. Pure NumPy, no app coupling.
tq_store.py — SQLite-backed compressed store with vectorized inner-product search; full filter parity (language/path/limit/offset) with the sqlite-vec path. Only a seed is persisted; rotation/QJL matrices are regenerated on load.
wiring — index-time quantization, query-time dispatch with a daemon-lifetime store cache (row-count invalidated), backend-agnostic index status, settings validation, --backend/--tq-bits CLI flags + interactive prompt.
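The row-count-invalidated store cache mentioned under wiring could look roughly like the sketch below. It is an assumption-laden illustration: the class name, the `tq_vectors` table name, and the `load_fn` callback are all hypothetical and not taken from the PR.

```python
import sqlite3

class StoreCache:
    """Keep one loaded store per index and reload when the row count changes."""

    def __init__(self, load_fn):
        self._load_fn = load_fn   # hypothetical: builds an in-memory store from a connection
        self._cache = {}          # key -> (row_count, store)

    def get(self, conn, key, table="tq_vectors"):
        # The table name is interpolated, so it must come from trusted code.
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        cached = self._cache.get(key)
        if cached is None or cached[0] != count:
            # Cache miss or stale entry: decode the rows again.
            cached = (count, self._load_fn(conn))
            self._cache[key] = cached
        return cached[1]
```

A row-count check is cheap, but in a sketch like this one it would not notice an update that replaces rows while leaving the count unchanged.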

Compatibility

Not a breaking change. sqlite-vec remains the default and its code path is untouched. The backend is recorded per-index; switching requires re-init + re-index.

Testing

New unit + e2e + benchmark coverage. All prek hooks pass (ruff, ruff-format, mypy on src/ and tests/, full pytest — 245 passed).

Data-oblivious vector quantizer: random rotation + per-coordinate Lloyd-Max codebooks (b=1..4), MSE quantizer, and an unbiased two-stage inner-product estimator (MSE + 1-bit QJL residual). Pure NumPy, no storage or app coupling. Verified against the paper's distortion bounds and estimator unbiasedness.
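The unbiased two-stage estimate can be illustrated as follows. This is a sketch of the general QJL idea, not the PR's code: the function names are hypothetical, `S` is assumed to be a Gaussian matrix with i.i.d. N(0, 1) entries, and `x_mse` stands for whatever reconstruction the MSE stage produced.

```python
import numpy as np

def qjl_encode(residual, S):
    """Store one sign bit per projected coordinate plus the residual norm."""
    proj = S @ residual
    signs = np.where(proj >= 0, 1, -1).astype(np.int8)
    return signs, float(np.linalg.norm(residual))

def qjl_estimate(query, signs, residual_norm, S):
    """Estimate <query, residual> from the sign bits.

    For Gaussian rows s, E[<s, q> * sign(<s, r>)] = sqrt(2/pi) * <q, r> / ||r||,
    so rescaling by sqrt(pi/2) * ||r|| gives an unbiased estimate.
    """
    m = S.shape[0]
    return float(np.sqrt(np.pi / 2) / m * residual_norm * ((S @ query) @ signs))

def two_stage_estimate(query, x_mse, signs, residual_norm, S):
    """MSE-stage inner product plus the QJL correction for the residual."""
    return float(query @ x_mse) + qjl_estimate(query, signs, residual_norm, S)
```

The MSE reconstruction alone gives a biased inner product; the sign-bit term corrects that bias in expectation, at the cost of some extra variance.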

SQLite-backed compressed vector store: bit-packed rows, seed-reproducible matrices (only the seed is persisted), and a vectorized NumPy inner-product search with language/path/limit/offset filter parity. Batched bitpack decode keeps load cheap at scale.
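Two of the ideas in this commit message, bit-packed rows and seed-reproducible matrices, can be sketched as below. The helpers are hypothetical and assume 4-bit codes with an even dimension; the PR's actual row layout is not shown here.

```python
import numpy as np

def pack_nibbles(codes):
    """Pack 4-bit codes (values 0..15) two per byte along the last axis."""
    codes = np.asarray(codes, dtype=np.uint8)
    return (codes[..., 0::2] << 4) | codes[..., 1::2]

def unpack_nibbles(packed):
    """Inverse of pack_nibbles; works on a single row or a whole batch."""
    packed = np.asarray(packed, dtype=np.uint8)
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = packed >> 4
    out[..., 1::2] = packed & 0x0F
    return out

def matrices_from_seed(seed, d):
    """Regenerate the rotation and projection matrices from one stored seed."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    rotation = q * np.sign(np.diag(r))
    projection = rng.standard_normal((d, d))  # Gaussian matrix for the 1-bit stage
    return rotation, projection
```

Because `unpack_nibbles` operates on the last axis, a whole table of packed rows can be decoded in one vectorized call instead of row by row. Reproducing the matrices from a seed assumes the same NumPy bit generator on every load.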

Make TurboQuant selectable at `ccc init` via `--backend turbo-quant` (`--tq-bits`), alongside the default sqlite-vec path. Index-time quantization, query-time dispatch with a daemon-lifetime store cache, backend-agnostic index status, and settings validation. sqlite-vec remains the default and its path is unchanged.

Add type annotations to the new TurboQuant tests (required by the mypy pre-commit hook, which checks tests/ too), document the turbo-quant backend and `--backend` / `--tq-bits` flags in the README, and apply ruff-format normalizations.

dudegladiator-devrev · 2026-06-08T09:07:09Z

@georgeh0 @badmonster0, could you please review this new capability?

dudegladiator added 4 commits June 7, 2026 14:50

chore: annotate tests, document backends, format

3c202db



dudegladiator wants to merge 4 commits into cocoindex-io:main from dudegladiator:feat/turboquant-vector-backend
