Commit 804004e

docs: add vector similarity search and update DSN options in README
1 parent 1fb1cde commit 804004e

1 file changed: +74 -2 lines changed

README.md

Lines changed: 74 additions & 2 deletions
@@ -201,10 +201,16 @@ db = Database.open("file:///path/to/mydata?sync=none&snapshot_interval=600")
 | `compression` | `on`, `off` | `on` | Enable WAL + snapshot compression (LZ4) |
 | `wal_compression` | `on`, `off` | `on` | WAL compression only |
 | `snapshot_compression` | `on`, `off` | `on` | Snapshot compression only |
+| `compression_threshold` | bytes | `64` | Minimum data size before compression |
 | `sync_interval_ms` | milliseconds | `10` | Background sync interval |
 | `wal_buffer_size` | bytes | `65536` | WAL write buffer size |
+| `wal_flush_trigger` | bytes | `32768` | WAL size before flush |
 | `wal_max_size` | bytes | `67108864` | WAL size before forced snapshot |
 | `commit_batch_size` | count | `100` | Commits batched before flush |
+| `cleanup` | `on`, `off` | `on` | Enable background cleanup thread |
+| `cleanup_interval` | seconds | `60` | Interval between cleanup runs |
+| `deleted_row_retention` | seconds | `300` | Keep deleted rows before permanent removal |
+| `transaction_retention` | seconds | `3600` | Keep stale transaction metadata |
 
 ## Type Mapping

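The DSN options in the hunk above are passed as query parameters on the `file://` URL, as the `Database.open(...)` call in the hunk header shows. A minimal sketch of assembling such a DSN with the standard library follows. `build_dsn` is a hypothetical helper, not part of stoolap, and the option values are illustrative only.

```python
from urllib.parse import urlencode


def build_dsn(path: str, **options) -> str:
    """Return a file:// DSN with options appended as a query string.

    Hypothetical helper for illustration; stoolap itself only needs the
    final string passed to Database.open().
    """
    base = f"file://{path}"
    if not options:
        return base
    # Convert every value to str so integers such as 600 serialize cleanly.
    query = urlencode({key: str(value) for key, value in options.items()})
    return f"{base}?{query}"


# Example: tune the background cleanup options added in this commit.
dsn = build_dsn(
    "/path/to/mydata",
    sync="none",
    cleanup="on",
    cleanup_interval=120,
    deleted_row_retention=600,
)
print(dsn)
```

The resulting string can then be handed to `Database.open(dsn)`. Building it from keyword arguments keeps option names in one place and avoids hand-escaping values.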
@@ -217,6 +223,71 @@ db = Database.open("file:///path/to/mydata?sync=none&snapshot_interval=600")
 | `None` | `NULL` | |
 | `datetime.datetime` | `TIMESTAMP` | Converted to/from UTC |
 | `dict` / `list` | `JSON` | Serialized via `json.dumps` |
+| `Vector` | `VECTOR(N)` | `list[float]` on output |
+
+## Vector Similarity Search
+
+Store embeddings and perform k-NN similarity search using HNSW indexes:
+
+```python
+from stoolap import Database, Vector
+
+db = Database.open(":memory:")
+
+# Create a table with a VECTOR column
+db.exec("""
+    CREATE TABLE documents (
+        id INTEGER PRIMARY KEY,
+        title TEXT,
+        embedding VECTOR(3)
+    );
+    CREATE INDEX idx_emb ON documents(embedding) USING HNSW WITH (metric = 'cosine');
+""")
+
+# Insert vectors using the Vector wrapper
+db.execute(
+    "INSERT INTO documents VALUES ($1, $2, $3)",
+    [1, "Hello world", Vector([0.1, 0.2, 0.3])],
+)
+db.execute(
+    "INSERT INTO documents VALUES ($1, $2, $3)",
+    [2, "Goodbye world", Vector([0.9, 0.1, 0.0])],
+)
+
+# k-NN search: find 5 nearest neighbors
+results = db.query(
+    "SELECT id, title, VEC_DISTANCE_COSINE(embedding, '[0.1, 0.2, 0.3]') AS dist "
+    "FROM documents ORDER BY dist LIMIT 5"
+)
+
+# Read vectors back as list[float]
+row = db.query_one("SELECT embedding FROM documents WHERE id = 1")
+emb = row["embedding"]  # [0.1, 0.2, 0.3]
+```
+
+### Distance Functions
+
+| Function | Description |
+|----------|-------------|
+| `VEC_DISTANCE_L2(a, b)` | Euclidean distance |
+| `VEC_DISTANCE_COSINE(a, b)` | Cosine distance (1 - similarity) |
+| `VEC_DISTANCE_IP(a, b)` | Negative inner product |
+
+### Vector Utilities
+
+| Function | Description |
+|----------|-------------|
+| `VEC_DIMS(v)` | Number of dimensions |
+| `VEC_NORM(v)` | L2 norm (magnitude) |
+| `VEC_TO_TEXT(v)` | Convert to string `[1.0, 2.0, 3.0]` |
+
+### HNSW Index Options
+
+```sql
+CREATE INDEX idx ON table(column) USING HNSW WITH (metric = 'cosine');
+```
+
+Supported metrics: `l2` (default), `cosine`, `ip` (inner product).
 
 ## Features

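For readers checking query results by hand, the distance functions and `VEC_NORM` described in the hunk above can be mirrored in plain Python. This is a sketch based only on the table descriptions, not stoolap's implementation. The function names are local stand-ins, and the handling of mismatched lengths and zero-length vectors is an assumption.

```python
import math


def _check_dims(a, b):
    # Assumption: vectors must have the same number of dimensions.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def vec_norm(v):
    """L2 norm (magnitude), as described for VEC_NORM."""
    return math.sqrt(sum(x * x for x in v))


def vec_distance_l2(a, b):
    """Euclidean distance, as described for VEC_DISTANCE_L2."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vec_distance_ip(a, b):
    """Negative inner product, as described for VEC_DISTANCE_IP."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def vec_distance_cosine(a, b):
    """Cosine distance (1 - cosine similarity).

    Raises ZeroDivisionError for a zero vector; how stoolap treats that
    case is not stated in the README.
    """
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (vec_norm(a) * vec_norm(b))


# The two sample embeddings from the README example.
hello = [0.1, 0.2, 0.3]
goodbye = [0.9, 0.1, 0.0]
print(vec_distance_l2(hello, goodbye))
print(vec_distance_cosine(hello, goodbye))
print(vec_distance_ip(hello, goodbye))
```

For all three functions a smaller value means a closer match, which is why the inner product is negated and why `ORDER BY dist` in ascending order returns nearest neighbors first.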
@@ -230,8 +301,9 @@ Stoolap is a full-featured embedded SQL database:
 - **Window functions**: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE
 - **CTEs**: WITH and WITH RECURSIVE
 - **Aggregations**: GROUP BY, HAVING, ROLLUP, CUBE, GROUPING SETS
-- **Indexes**: B-tree, Hash, Bitmap (auto-selected), multi-column composite
-- **110 built-in functions**: string, math, date/time, JSON, aggregate
+- **Vector similarity search** with HNSW indexes (L2, cosine, inner product)
+- **Indexes**: B-tree, Hash, Bitmap (auto-selected), HNSW, multi-column composite
+- **110+ built-in functions**: string, math, date/time, JSON, vector, aggregate
 - **WAL + snapshots** for crash recovery
 - **Semantic query caching** with predicate subsumption

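HNSW is an approximate nearest-neighbor index, so an exact brute-force scan can serve as a baseline when checking what a k-NN query should return on a small dataset. The sketch below ranks in-memory rows by cosine distance in plain Python. It does not use stoolap. `knn` and `cosine_distance` are hypothetical names, and the rows reuse the sample data from the README example.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn(rows, query, k):
    """Return up to k (distance, id, title) tuples, nearest first.

    rows is an iterable of (id, title, embedding) tuples. Every row is
    scored, so the cost grows linearly with the number of rows.
    """
    scored = (
        (cosine_distance(embedding, query), row_id, title)
        for row_id, title, embedding in rows
    )
    return heapq.nsmallest(k, scored)


documents = [
    (1, "Hello world", [0.1, 0.2, 0.3]),
    (2, "Goodbye world", [0.9, 0.1, 0.0]),
]

results = knn(documents, [0.1, 0.2, 0.3], k=5)
for dist, row_id, title in results:
    print(row_id, title, round(dist, 4))
```

Asking for `k=5` with only two rows simply returns both, ordered by distance, which matches the behavior of `ORDER BY dist LIMIT 5` in SQL.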