@@ -201,10 +201,16 @@ db = Database.open("file:///path/to/mydata?sync=none&snapshot_interval=600")
 | `compression` | `on`, `off` | `on` | Enable WAL + snapshot compression (LZ4) |
 | `wal_compression` | `on`, `off` | `on` | WAL compression only |
 | `snapshot_compression` | `on`, `off` | `on` | Snapshot compression only |
+| `compression_threshold` | bytes | `64` | Minimum data size before compression is applied |
 | `sync_interval_ms` | milliseconds | `10` | Background sync interval |
 | `wal_buffer_size` | bytes | `65536` | WAL write buffer size |
+| `wal_flush_trigger` | bytes | `32768` | WAL size before flush |
 | `wal_max_size` | bytes | `67108864` | WAL size before forced snapshot |
 | `commit_batch_size` | count | `100` | Commits batched before flush |
+| `cleanup` | `on`, `off` | `on` | Enable background cleanup thread |
+| `cleanup_interval` | seconds | `60` | Interval between cleanup runs |
+| `deleted_row_retention` | seconds | `300` | How long deleted rows are kept before permanent removal |
+| `transaction_retention` | seconds | `3600` | How long stale transaction metadata is kept |
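
The options in the table above are written as query parameters on the connection string, in the same way as `sync=none&snapshot_interval=600` in the `Database.open(...)` call this README shows. A minimal sketch of assembling such a string with the standard library follows. `build_dsn` is a hypothetical helper and is not part of the `stoolap` package; it also assumes that every option in the table is accepted as a query parameter, which this diff does not confirm.

```python
from urllib.parse import urlencode


def build_dsn(path: str, **options: object) -> str:
    """Return a file:// DSN with options appended as query parameters.

    Hypothetical helper for illustration; not part of the stoolap package.
    """
    if not options:
        return f"file://{path}"
    # Sort the keys so the resulting string is deterministic.
    pairs = sorted((key, str(value)) for key, value in options.items())
    return f"file://{path}?{urlencode(pairs)}"


dsn = build_dsn(
    "/path/to/mydata",
    compression_threshold=128,  # bytes
    cleanup_interval=120,       # seconds
    deleted_row_retention=600,  # seconds
)
print(dsn)
```

The resulting string can then be passed to `Database.open(dsn)`.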
 
 ## Type Mapping
 
@@ -217,6 +223,71 @@ db = Database.open("file:///path/to/mydata?sync=none&snapshot_interval=600")
 | `None` | `NULL` | |
 | `datetime.datetime` | `TIMESTAMP` | Converted to/from UTC |
 | `dict` / `list` | `JSON` | Serialized via `json.dumps` |
+| `Vector` | `VECTOR(N)` | `list[float]` on output |
+
+## Vector Similarity Search
+
+Store embeddings and perform k-NN similarity search using HNSW indexes:
+
+```python
+from stoolap import Database, Vector
+
+db = Database.open(":memory:")
+
+# Create a table with a VECTOR column and an HNSW index on it
+db.exec("""
+    CREATE TABLE documents (
+        id INTEGER PRIMARY KEY,
+        title TEXT,
+        embedding VECTOR(3)
+    );
+    CREATE INDEX idx_emb ON documents(embedding) USING HNSW WITH (metric = 'cosine');
+""")
+
+# Insert vectors using the Vector wrapper
+db.execute(
+    "INSERT INTO documents VALUES ($1, $2, $3)",
+    [1, "Hello world", Vector([0.1, 0.2, 0.3])],
+)
+db.execute(
+    "INSERT INTO documents VALUES ($1, $2, $3)",
+    [2, "Goodbye world", Vector([0.9, 0.1, 0.0])],
+)
+
+# k-NN search: find up to 5 nearest neighbors
+results = db.query(
+    "SELECT id, title, VEC_DISTANCE_COSINE(embedding, '[0.1, 0.2, 0.3]') AS dist "
+    "FROM documents ORDER BY dist LIMIT 5"
+)
+
+# Read vectors back as list[float]
+row = db.query_one("SELECT embedding FROM documents WHERE id = 1")
+emb = row["embedding"]  # [0.1, 0.2, 0.3]
+```
+
+### Distance Functions
+
+| Function | Description |
+|----------|-------------|
+| `VEC_DISTANCE_L2(a, b)` | Euclidean distance |
+| `VEC_DISTANCE_COSINE(a, b)` | Cosine distance (1 - similarity) |
+| `VEC_DISTANCE_IP(a, b)` | Negative inner product |
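
To make the three descriptions above concrete, here is a pure-Python sketch of the quantities they name, assuming the standard textbook definitions. The function names are illustrative only, and Stoolap's own implementation (numeric precision, handling of zero vectors) may differ.

```python
import math


def _check_same_length(a, b):
    # All three distances are only defined for vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product, so that a larger dot product sorts first."""
    _check_same_length(a, b)
    return -sum(x * y for x, y in zip(a, b))


query = [0.1, 0.2, 0.3]
doc = [0.9, 0.1, 0.0]
print(l2_distance(query, doc))
print(cosine_distance(query, doc))
print(negative_inner_product(query, doc))
```

In all three cases a smaller value means a closer match, which is why the k-NN query sorts by distance in ascending order.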
+
+### Vector Utilities
+
+| Function | Description |
+|----------|-------------|
+| `VEC_DIMS(v)` | Number of dimensions |
+| `VEC_NORM(v)` | L2 norm (magnitude) |
+| `VEC_TO_TEXT(v)` | Convert to string `[1.0, 2.0, 3.0]` |
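
As a rough client-side illustration of what these utilities compute, the sketch below mirrors them in plain Python. The helper names are hypothetical, and the exact text format produced by `VEC_TO_TEXT` in Stoolap may differ from Python's float formatting.

```python
import math


def vec_dims(v):
    """Number of dimensions, i.e. the number of components."""
    return len(v)


def vec_norm(v):
    """L2 norm: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


def vec_to_text(v):
    """Bracketed, comma-separated rendering of the components as floats."""
    return "[" + ", ".join(str(float(x)) for x in v) + "]"


v = [3.0, 4.0]
print(vec_dims(v), vec_norm(v), vec_to_text(v))
```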
+
+### HNSW Index Options
+
+```sql
+CREATE INDEX idx ON table(column) USING HNSW WITH (metric = 'cosine');
+```
+
+Supported metrics: `l2` (default), `cosine`, `ip` (inner product).
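
When index statements are generated from application code, it can help to reject an unsupported metric before the SQL reaches the database. The sketch below does that for the three metrics listed above. `hnsw_index_sql` is a hypothetical helper, not part of the `stoolap` package, and it only accepts plain identifiers because the names are interpolated into the statement.

```python
SUPPORTED_METRICS = ("l2", "cosine", "ip")


def hnsw_index_sql(index, table, column, metric="l2"):
    """Build a CREATE INDEX ... USING HNSW statement (illustrative helper)."""
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric {metric!r}; expected one of {SUPPORTED_METRICS}")
    # Names are interpolated directly, so allow only simple identifiers.
    for name in (index, table, column):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    return (
        f"CREATE INDEX {index} ON {table}({column}) "
        f"USING HNSW WITH (metric = '{metric}');"
    )


sql = hnsw_index_sql("idx_emb", "documents", "embedding", metric="cosine")
print(sql)
```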
 
 ## Features
 
@@ -230,8 +301,9 @@ Stoolap is a full-featured embedded SQL database:
 - **Window functions**: ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, NTILE
 - **CTEs**: WITH and WITH RECURSIVE
 - **Aggregations**: GROUP BY, HAVING, ROLLUP, CUBE, GROUPING SETS
-- **Indexes**: B-tree, Hash, Bitmap (auto-selected), multi-column composite
-- **110 built-in functions**: string, math, date/time, JSON, aggregate
+- **Vector similarity search** with HNSW indexes (L2, cosine, inner product)
+- **Indexes**: B-tree, Hash, Bitmap (auto-selected), HNSW, multi-column composite
+- **110+ built-in functions**: string, math, date/time, JSON, vector, aggregate
 - **WAL + snapshots** for crash recovery
 - **Semantic query caching** with predicate subsumption
 