feat(logosdb): add LogosDB vector database integration#782

Open
jose-compu wants to merge 4 commits into
zilliztech:mainfrom
jose-compu:feat/logosdb-integration

@jose-compu

Summary

  • Adds LogosDB as a supported vector database backend — a fast, embedded HNSW vector store written in C/C++ with Python bindings, backed by memory-mapped binary storage and hnswlib.
  • Implements the full VectorDB interface: __init__, init context manager, insert_embeddings (via put_batch), search_embedding, and optimize.
  • Registers DB.LogosDB in the enum and wires up init_cls, config_cls, and case_config_cls.
  • Adds the logosdb CLI subcommand with a --uri flag (local directory path).
  • Adds logosdb as an optional extra in pyproject.toml.

Design notes

  • LogosDB is an embedded (single-process, file-based) database — no server required. The DB directory is passed via --uri.
  • Distance metric is derived from the case MetricType at runtime (COSINE / L2 / IP). COSINE is the default and auto-normalizes vectors.
  • Benchmark metadata IDs are stored as the text field (str(id)) and parsed back on search, since LogosDB's internal row IDs are independent of the benchmark ID space.
  • HNSW index is built incrementally on insert; optimize() is a no-op with a log message.
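The ID handling described in the design notes can be sketched as a plain round trip. This is a minimal illustration only: the helper names `encode_benchmark_id` and `decode_benchmark_ids` are hypothetical and do not come from the PR, and no LogosDB call is shown, because the client code itself is not quoted in this conversation.

```python
from __future__ import annotations


def encode_benchmark_id(benchmark_id: int) -> str:
    """Serialize a benchmark ID into the text field stored next to the vector."""
    return str(benchmark_id)


def decode_benchmark_ids(texts: list[str]) -> list[int]:
    """Parse the text fields returned by a search back into benchmark IDs."""
    return [int(text) for text in texts]


# Round trip: the IDs written at insert time are recovered at search time.
ids = [0, 17, 49_999]
stored = [encode_benchmark_id(i) for i in ids]
recovered = decode_benchmark_ids(stored)
```

The round trip keeps recall computation independent of whatever row IDs the store assigns internally.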

Benchmark result

Tested on Performance1536D50K (OpenAI embeddings, 50K vectors, 1536 dim, COSINE) on Apple M-series:

| Metric | Value |
| --- | --- |
| Load duration | 340 s |
| Serial latency p99 | 4.6 ms |
| Serial latency p95 | 4.0 ms |
| Recall@100 | 0.9347 |
| NDCG | 0.9464 |

Test plan

  • pip install logosdb (binary wheels for Linux x86_64/aarch64 and macOS x86_64/arm64, CPython 3.9-3.13)
  • vectordbbench logosdb --uri /tmp/vdbbench_logosdb --case-type Performance1536D50K --skip-search-concurrent
  • Verify recall, latency, and result JSON written to vectordb_bench/results/LogosDB/

- Add LogosDB embedded HNSW client (local file-based, mmap, hnswlib)
- Config: LogosDBConfig (uri path) + LogosDBIndexConfig (metric type)
- Supports COSINE, L2, and IP distance metrics
- Uses put_batch for efficient bulk insert; metadata IDs stored as text
- Register DB.LogosDB enum, init_cls, config_cls, case_config_cls
- Register 'logosdb' CLI command in vectordbbench
- Add logosdb optional extra in pyproject.toml

Benchmark result (50K OpenAI 1536-dim, COSINE):
  recall@100=0.9347  ndcg=0.9464  p99=4.6ms  p95=4.0ms
@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jose-compu
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jose-compu
Author

Can you please review, @sre-ci-robot @jkatz @javiervegas @claude?

Collaborator

@XuanYang-cn left a comment

I found blockers in the current LogosDB integration. CI is red on the changed files, and the default command still enables a concurrent-search mode that conflicts with LogosDB's single-process database-path constraint.


@cli.command()
@click_parameter_decorators_from_typed_dict(LogosDBTypedDict)
def LogosDB(**parameters: Unpack[LogosDBTypedDict]):
Collaborator

must-change: LogosDB inherits search_concurrent=True from CommonTypedDict. LogosDB documents each DB directory as single-process, but VDBBench's concurrent search starts multiple ProcessPoolExecutor workers against the same --uri, so the default command can fail or report invalid concurrent-search results after loading. Set parameters["search_concurrent"] = False or reject --search-concurrent for LogosDB until a supported single-process concurrent runner exists.
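The two options above (force the flag off, or reject it) could look roughly like the sketch below. It assumes the CLI handler receives its options as a plain dict with a `search_concurrent` key, as the suggestion implies. The helper names `force_serial_search` and `reject_concurrent_search` are hypothetical and are not part of VDBBench or this PR.

```python
from __future__ import annotations

from typing import Any


def force_serial_search(parameters: dict[str, Any]) -> dict[str, Any]:
    """Option 1: return a copy of the parameters with concurrent search disabled."""
    patched = dict(parameters)
    patched["search_concurrent"] = False
    return patched


def reject_concurrent_search(parameters: dict[str, Any]) -> None:
    """Option 2: fail fast when concurrent search is requested."""
    if parameters.get("search_concurrent"):
        raise ValueError(
            "concurrent search is not supported for an embedded, "
            "single-process database path"
        )


params = {"uri": "/tmp/vdbbench_logosdb", "search_concurrent": True}
serial_params = force_serial_search(params)
```

Forcing the flag keeps the default command usable, while rejecting it makes the limitation visible to anyone who passes the option explicitly.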

Author

Thanks for catching this. Fixed in the latest commit by hard-setting parameters["search_concurrent"] = False in the CLI handler.

Quick note: I did test multi-process concurrent reads empirically (4 Pool workers opening the same DB path and running 50 searches each) and all succeeded without errors (LogosDB's memory-mapped storage appears safe for concurrent readers). That said, since the official docs declare it single-process, disabling concurrent search is the right conservative call for now. Can revisit if/when LogosDB formally documents multi-reader support.

Fixed here: b932872

self.uri = db_config["uri"]
self.db = None

if drop_old and os.path.exists(self.uri):
Collaborator

must-change: this os.path.exists() call fails the repo ruff gate with PTH110, so all PR test jobs are red before unit tests run. Replace it with Path(self.uri).exists() and update the imports.
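A minimal sketch of the pathlib form follows. The diff only shows the `if` line, so what it guards is an assumption here: the sketch assumes the branch deletes the old database directory, and the function name `reset_db_dir` is hypothetical.

```python
import shutil
from pathlib import Path


def reset_db_dir(uri: str, drop_old: bool) -> bool:
    """Remove an existing DB directory when drop_old is set.

    Returns True if a directory was removed, False otherwise.
    """
    db_path = Path(uri)
    # Path.exists() replaces os.path.exists(), which ruff flags as PTH110.
    if drop_old and db_path.exists():
        shutil.rmtree(db_path)  # assumed cleanup step, not shown in the diff
        return True
    return False
```

With this form the `os` import can go away if nothing else in the module uses it.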

Author

ok thanks a lot, fixed here: 01e0fd0

Comment thread vectordb_bench/cli/vectordbbench.py Outdated
from ..backend.clients.endee.cli import Endee
from ..backend.clients.hologres.cli import HologresHGraph
from ..backend.clients.lancedb.cli import LanceDB
from ..backend.clients.logosdb.cli import LogosDB
Collaborator

must-change: this import placement fails ruff I001, so CI stays red after adding the command. Run ruff check --fix vectordb_bench/cli/vectordbbench.py or move the import to the order ruff expects.

Author

fixed here, possibly: 812eea6

jose-compu and others added 3 commits May 29, 2026 16:49
…10 ruff rule

Co-authored-by: Cursor <cursoragent@cursor.com>
… ruff I001

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@jose-compu
Author

@XuanYang-cn the benchmark is running now locally, can you try CI again please?

@jose-compu
Author

New results:

Processed from run 8bfdce6fe0a64bd8a43011bdbeb8c298 (2026-05-29). Serial search only — concurrent search was disabled for LogosDB.

Test configuration

| Parameter | Value |
| --- | --- |
| Database | LogosDB |
| Case | Search Performance Test (50K Dataset, 1536 Dim) |
| Dataset | OpenAI-SMALL-50K |
| Vectors | 50,000 |
| Dimensions | 1,536 |
| Distance | Cosine |
| Top-K | 100 |
| Stages | drop_old → load → search_serial |
| DB path | /tmp/vectordbbench_logosdb |
| Run ID | 8bfdce6fe0a64bd8a43011bdbeb8c298 |

Results

| Metric | Value |
| --- | --- |
| Status | Success |
| Load duration | 342.8 s (~5.7 min) |
| Insert duration | 341.9 s |
| Optimize duration | 0.9 s |
| Vectors loaded | 50,000 |
| Load throughput | ~146 vectors/s |
| Recall@100 | 0.9347 (93.47%) |
| NDCG@100 | 0.9464 (94.64%) |
| Search queries | 1,000 |
| Search wall time | 3.25 s |
| Serial QPS (derived) | ~308 |
| Mean latency | 3.2 ms |
| P95 latency | 4.0 ms |
| P99 latency | 4.2 ms |
| Concurrent QPS | N/A (disabled) |

Notes

  • Summary qps=0.0 is expected: that field is for the concurrent-search stage, which LogosDB skips (search_concurrent=False).
  • Serial QPS ≈ 1000 queries ÷ 3.25 s ≈ 308 QPS.
  • Recall 93.5% and sub-5 ms P99 latency are solid for a 50K × 1536-dim cosine workload on embedded LogosDB.
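The derived figures can be re-checked from the reported raw numbers with a few lines of arithmetic. This sketch uses only the values posted in this comment; the variable names are illustrative.

```python
# Raw values as reported for run 8bfdce6fe0a64bd8a43011bdbeb8c298.
search_queries = 1_000
search_wall_s = 3.25
vectors_loaded = 50_000
insert_duration_s = 341.9

# Serial QPS: queries divided by wall-clock search time.
serial_qps = search_queries / search_wall_s

# Load throughput: vectors divided by insert time.
load_throughput = vectors_loaded / insert_duration_s

print(f"serial QPS ~ {serial_qps:.0f}")                      # serial QPS ~ 308
print(f"load throughput ~ {load_throughput:.0f} vectors/s")  # load throughput ~ 146 vectors/s
```

Using the full load duration (342.8 s) instead of the insert duration also rounds to about 146 vectors/s.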
