feat(logosdb): add LogosDB vector database integration by jose-compu · Pull Request #782 · zilliztech/VectorDBBench

jose-compu · 2026-05-17T13:20:54Z

Summary

Adds LogosDB as a supported vector database backend — a fast, embedded HNSW vector store written in C/C++ with Python bindings, backed by memory-mapped binary storage and hnswlib.
Implements the full VectorDB interface: __init__, init context manager, insert_embeddings (via put_batch), search_embedding, and optimize.
Registers DB.LogosDB in the enum and wires up init_cls, config_cls, and case_config_cls.
Adds the logosdb CLI subcommand with a --uri flag (local directory path).
Adds logosdb as an optional extra in pyproject.toml.

Design notes

LogosDB is an embedded (single-process, file-based) database — no server required. The DB directory is passed via --uri.
Distance metric is derived from the case MetricType at runtime (COSINE / L2 / IP). COSINE is the default and auto-normalizes vectors.
Benchmark metadata IDs are stored as the text field (str(id)) and parsed back on search, since LogosDB's internal row IDs are independent of the benchmark ID space.
HNSW index is built incrementally on insert; optimize() is a no-op with a log message.
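The ID round-trip in the design notes can be sketched roughly as follows. This is only an illustration of the idea, not the PR's code: the helper names `ids_to_texts` and `texts_to_ids` are hypothetical, and the internal row IDs are assumed to be simple 0-based integers, which this thread does not confirm for LogosDB.

```python
from typing import Iterable


def ids_to_texts(ids: Iterable[int]) -> list[str]:
    """Encode benchmark IDs as the text payload stored next to each vector."""
    return [str(i) for i in ids]


def texts_to_ids(texts: Iterable[str]) -> list[int]:
    """Recover benchmark IDs from the text payloads returned by a search."""
    return [int(t) for t in texts]


# Assumption: internal row IDs are 0, 1, 2, ... and unrelated to benchmark IDs.
benchmark_ids = [1007, 42, 990001]
stored = dict(enumerate(ids_to_texts(benchmark_ids)))  # row_id -> text payload

# Pretend the index returned internal rows 2 and 0 for some query.
hit_rows = [2, 0]
print(texts_to_ids(stored[r] for r in hit_rows))  # [990001, 1007]
```

The point of the indirection is that recall is computed against benchmark IDs, so the text payload is the only place the original ID survives.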

Benchmark result

Tested on Performance1536D50K (OpenAI embeddings, 50K vectors, 1536 dim, COSINE) on Apple M-series:

| Metric | Value |
| --- | --- |
| Load duration | 340 s |
| Serial latency p99 | 4.6 ms |
| Serial latency p95 | 4.0 ms |
| Recall@100 | 0.9347 |
| NDCG | 0.9464 |

Test plan

pip install logosdb (binary wheels for Linux x86_64/aarch64 and macOS x86_64/arm64, CPython 3.9-3.13)
vectordbbench logosdb --uri /tmp/vdbbench_logosdb --case-type Performance1536D50K --skip-search-concurrent
Verify recall, latency, and result JSON written to vectordb_bench/results/LogosDB/
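For the last step, a small helper along these lines could be used to check that result files were actually written. It is a sketch under assumptions: `load_result_files` is a hypothetical name, and nothing is assumed about the JSON schema beyond the files being valid JSON.

```python
import json
from pathlib import Path


def load_result_files(results_dir: str) -> dict[str, object]:
    """Parse every *.json file directly under results_dir, keyed by file name."""
    root = Path(results_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"no results directory at {root}")
    return {
        p.name: json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(root.glob("*.json"))
    }
```

Calling `load_result_files("vectordb_bench/results/LogosDB")` after the run should return at least one entry; an empty dict would suggest the benchmark did not write its result file.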

- Add LogosDB embedded HNSW client (local file-based, mmap, hnswlib)
- Config: LogosDBConfig (uri path) + LogosDBIndexConfig (metric type)
- Supports COSINE, L2, and IP distance metrics
- Uses put_batch for efficient bulk insert; metadata IDs stored as text
- Register DB.LogosDB enum, init_cls, config_cls, case_config_cls
- Register 'logosdb' CLI command in vectordbbench
- Add logosdb optional extra in pyproject.toml

Benchmark result (50K OpenAI 1536-dim, COSINE): recall@100=0.9347 ndcg=0.9464 p99=4.6ms p95=4.0ms

sre-ci-robot · 2026-05-17T13:21:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jose-compu
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jose-compu · 2026-05-20T19:00:16Z

can you please review @sre-ci-robot @jkatz @javiervegas @claude ?

XuanYang-cn

I found blockers in the current LogosDB integration. CI is red on the changed files, and the default command still enables a concurrent-search mode that conflicts with LogosDB's single-process database-path constraint.

XuanYang-cn · 2026-05-29T09:15:26Z

+
+@cli.command()
+@click_parameter_decorators_from_typed_dict(LogosDBTypedDict)
+def LogosDB(**parameters: Unpack[LogosDBTypedDict]):


must-change: LogosDB inherits search_concurrent=True from CommonTypedDict, but LogosDB documents one DB directory as single-process while VDBBench concurrent search starts multiple ProcessPoolExecutor workers against the same --uri. The default command can fail or report invalid concurrent-search results after loading. Set parameters["search_concurrent"] = False or reject --search-concurrent for LogosDB until a supported single-process concurrent runner exists.

Thanks for catching this. Fixed in the latest commit by hard-setting parameters["search_concurrent"] = False in the CLI handler.

Quick note: I did test multi-process concurrent reads empirically (4 Pool workers opening the same DB path and running 50 searches each) and all succeeded without errors (LogosDB's memory-mapped storage appears safe for concurrent readers). That said, since the official docs declare it single-process, disabling concurrent search is the right conservative call for now. Can revisit if/when LogosDB formally documents multi-reader support.

Fixed here: b932872
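The fix described in this thread amounts to overriding one key before the benchmark is launched. A minimal sketch, assuming the CLI parameters arrive as a plain dict; `force_serial_search` is a hypothetical helper name, and the actual handler in b932872 may set the key in place instead:

```python
from typing import Any


def force_serial_search(parameters: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the CLI parameters with concurrent search disabled."""
    patched = dict(parameters)
    # Overrides whatever default was inherited or passed on the command line.
    patched["search_concurrent"] = False
    return patched


cli_params = {"uri": "/tmp/vdbbench_logosdb", "search_concurrent": True}
print(force_serial_search(cli_params)["search_concurrent"])  # False
```

Hard-setting the value means a user-supplied `--search-concurrent` is silently ignored; rejecting the flag with an error, the other option the reviewer suggested, would make that more visible.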

XuanYang-cn · 2026-05-29T09:15:26Z

+        self.uri = db_config["uri"]
+        self.db = None
+
+        if drop_old and os.path.exists(self.uri):


must-change: this os.path.exists() call fails the repo ruff gate with PTH110, so all PR test jobs are red before unit tests run. Replace it with Path(self.uri).exists() and update the imports.

ok thanks a lot, fixed here: 01e0fd0
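The requested change swaps `os.path.exists()` for its `pathlib` equivalent, which is what ruff's PTH110 rule asks for. A rough sketch of the pattern follows; what happens after the check is not visible in the quoted diff, so the `shutil.rmtree` call and the `prepare_db_dir` name are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def prepare_db_dir(uri: str, drop_old: bool) -> bool:
    """Delete an existing DB directory when drop_old is set.

    Returns True if a directory was removed, False otherwise.
    """
    db_path = Path(uri)
    # Path.exists() replaces os.path.exists(uri) and satisfies PTH110.
    if drop_old and db_path.exists():
        shutil.rmtree(db_path)  # assumption: old data is removed recursively
        return True
    return False
```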

XuanYang-cn · 2026-05-29T09:15:26Z

 from ..backend.clients.endee.cli import Endee
 from ..backend.clients.hologres.cli import HologresHGraph
 from ..backend.clients.lancedb.cli import LanceDB
+from ..backend.clients.logosdb.cli import LogosDB


must-change: this import placement fails ruff I001, so CI stays red after adding the command. Run ruff check --fix vectordb_bench/cli/vectordbbench.py or move the import to the order ruff expects.

fixed here, possibly: 812eea6

jose-compu · 2026-05-29T15:16:18Z

@XuanYang-cn the benchmark is running now locally, can you try CI again please?

jose-compu · 2026-05-29T15:23:07Z

New results:

Processed from run 8bfdce6fe0a64bd8a43011bdbeb8c298 (2026-05-29). Serial search only — concurrent search was disabled for LogosDB.

Test configuration

| Parameter | Value |
| --- | --- |
| Database | LogosDB |
| Case | Search Performance Test (50K Dataset, 1536 Dim) |
| Dataset | OpenAI-SMALL-50K |
| Vectors | 50,000 |
| Dimensions | 1,536 |
| Distance | Cosine |
| Top-K | 100 |
| Stages | drop_old → load → search_serial |
| DB path | `/tmp/vectordbbench_logosdb` |
| Run ID | `8bfdce6fe0a64bd8a43011bdbeb8c298` |

Results

| Metric | Value |
| --- | --- |
| Status | Success |
| Load duration | 342.8 s (~5.7 min) |
| Insert duration | 341.9 s |
| Optimize duration | 0.9 s |
| Vectors loaded | 50,000 |
| Load throughput | ~146 vectors/s |
| Recall@100 | 0.9347 (93.47%) |
| NDCG@100 | 0.9464 (94.64%) |
| Search queries | 1,000 |
| Search wall time | 3.25 s |
| Serial QPS (derived) | ~308 |
| Mean latency | 3.2 ms |
| P95 latency | 4.0 ms |
| P99 latency | 4.2 ms |
| Concurrent QPS | N/A (disabled) |

Notes

Summary qps=0.0 is expected: that field is for the concurrent-search stage, which LogosDB skips (search_concurrent=False).
Serial QPS ≈ 1000 queries ÷ 3.25 s ≈ 308 QPS.
Recall 93.5% and sub-5 ms P99 latency are solid for a 50K × 1536-dim cosine workload on embedded LogosDB.
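The derived figures can be reproduced from the raw numbers in the results table. This is plain arithmetic on the reported values, not output from VectorDBBench:

```python
# Values copied from the results table above.
vectors, load_s = 50_000, 342.8
queries, search_s = 1_000, 3.25

load_throughput = vectors / load_s            # vectors inserted per second
serial_qps = queries / search_s               # queries per second, single client
mean_latency_ms = 1_000 * search_s / queries  # wall time per query, in ms

print(f"{load_throughput:.0f} vectors/s, {serial_qps:.0f} QPS, {mean_latency_ms:.2f} ms")
# 146 vectors/s, 308 QPS, 3.25 ms
```

The wall-time average (3.25 ms) sits slightly above the reported mean latency (3.2 ms), which would be consistent with the wall time including some per-query overhead outside the timed search call.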

XuanYang-cn requested changes May 29, 2026

jose-compu and others added 3 commits May 29, 2026 16:49

01e0fd0 · fix(logosdb): replace os.path.exists with Path.exists to satisfy PTH110 ruff rule
Co-authored-by: Cursor <cursoragent@cursor.com>

812eea6 · fix(logosdb): move import to correct alphabetical position to satisfy ruff I001
Co-authored-by: Cursor <cursoragent@cursor.com>

b932872 · fix(logosdb): disable concurrent search (single-process embedded DB)
Co-authored-by: Cursor <cursoragent@cursor.com>

jose-compu wants to merge 4 commits into zilliztech:main from jose-compu:feat/logosdb-integration
