From 6e76ad3a44778bd0b05220d5d0994b2feec69dc8 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Tue, 24 Feb 2026 11:23:46 -0500 Subject: [PATCH 01/12] Updates to configuration.md --- docs/configuration.md | 203 +++++++++++++++++++++++++++--------------- 1 file changed, 132 insertions(+), 71 deletions(-) diff --git a/docs/configuration.md b/docs/configuration.md index 20bd763..d1b3257 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1,12 +1,24 @@ # Configuration -pg_semantic_cache provides flexible configuration options for vector dimensions, index types, and cache behavior. +pg_semantic_cache provides flexible configuration options for vector +dimensions, index types, and cache behavior. -## Vector Dimensions +!!! tip "Start Simple" + + When configuring semantic caching, begin with simple defaults (1536 + dimensions, IVFFlat, 0.95 threshold) and adjust your system based on + monitoring. -The extension supports configurable embedding dimensions to match your chosen embedding model. +!!! warning "Test Before Production" + + Always test configuration changes in development before applying to + production! -### Supported Dimensions +## Vector Dimensions + +The extension supports configurable embedding dimensions to match your +chosen embedding model. pg_semantic_cache supports the following dimensions +and associated models: | Dimension | Common Models | |-----------|---------------| @@ -19,7 +31,9 @@ The extension supports configurable embedding dimensions to match your chosen em ### Setting Dimensions !!! warning "Rebuild Required" - Changing dimensions requires rebuilding the index, which **clears all cached data**. + + Changing dimensions requires rebuilding the index, which clears all + cached data. 
```sql -- Set vector dimension (default: 1536) @@ -32,7 +46,7 @@ SELECT semantic_cache.rebuild_index(); SELECT semantic_cache.get_vector_dimension(); ``` -### Initial Setup for Custom Dimensions +### Initial Setup For Custom Dimensions If you know your embedding model before installation: @@ -49,18 +63,19 @@ SELECT semantic_cache.rebuild_index(); ## Vector Index Types -Choose between IVFFlat (fast, approximate) or HNSW (accurate, slower build). +Choose between IVFFlat (fast, approximate) or HNSW (accurate, slower +build). ### IVFFlat Index (Default) Best for most use cases - fast lookups with good recall. -**Characteristics:** -- **Lookup Speed**: Very fast (< 5ms typical) -- **Build Time**: Fast -- **Recall**: Good (95%+) -- **Memory**: Moderate -- **Best For**: Production caches with frequent updates +Characteristics: +- Lookup Speed: Very fast (< 5ms typical) +- Build Time: Fast +- Recall: Good (95%+) +- Memory: Moderate +- Best For: Production caches with frequent updates ```sql -- Set index type @@ -68,7 +83,7 @@ SELECT semantic_cache.set_index_type('ivfflat'); SELECT semantic_cache.rebuild_index(); ``` -**IVFFlat Parameters** (set during `init_schema()`): +IVFFlat Parameters (set during `init_schema()`): ```sql -- Default configuration @@ -76,6 +91,7 @@ lists = 100 -- For < 100K entries -- For larger caches, increase lists -- Adjust in the init_schema() function or manually: + DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding; CREATE INDEX idx_cache_entries_embedding ON semantic_cache.cache_entries @@ -87,12 +103,12 @@ WITH (lists = 1000); -- For 100K-1M entries More accurate but slower to build - requires pgvector 0.5.0+. 
-**Characteristics:** -- **Lookup Speed**: Fast (1-3ms typical) -- **Build Time**: Slower -- **Recall**: Excellent (98%+) -- **Memory**: Higher -- **Best For**: Read-heavy caches with infrequent updates +Characteristics: +- Lookup Speed: Fast (1-3ms typical) +- Build Time: Slower +- Recall: Excellent (98%+) +- Memory: Higher +- Best For: Read-heavy caches with infrequent updates ```sql -- Set index type (requires pgvector 0.5.0+) @@ -100,10 +116,11 @@ SELECT semantic_cache.set_index_type('hnsw'); SELECT semantic_cache.rebuild_index(); ``` -**HNSW Parameters:** +HNSW Parameters: ```sql -- Adjust manually for optimal performance + DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding; CREATE INDEX idx_cache_entries_embedding ON semantic_cache.cache_entries @@ -123,19 +140,25 @@ WITH (m = 16, ef_construction = 64); ## Cache Configuration -The extension stores configuration in the `semantic_cache.cache_config` table. +The extension stores configuration in the +`semantic_cache.cache_config` table. ### View Current Configuration +Use the following command to view the current configuration: + ```sql SELECT * FROM semantic_cache.cache_config ORDER BY key; ``` ### Key Configuration Parameters +Use the following configuration parameters to control cache settings: + #### max_cache_size_mb -Maximum cache size in megabytes before auto-eviction triggers. +Use max_cache_size_mb to specify the maximum cache size in megabytes +before auto-eviction triggers. ```sql -- Set to 2GB @@ -148,7 +171,8 @@ WHERE key = 'max_cache_size_mb'; #### default_ttl_seconds -Default time-to-live for cached entries (can be overridden per query). +Use default_ttl_seconds to specify the default time-to-live for cached +entries (can be overridden per query). ```sql -- Set default to 2 hours @@ -161,7 +185,8 @@ WHERE key = 'default_ttl_seconds'; #### eviction_policy -Automatic eviction strategy when cache size limit is reached. 
+Use eviction_policy to specify the automatic eviction strategy when +cache size limit is reached. ```sql -- Options: 'lru', 'lfu', 'ttl' @@ -170,15 +195,16 @@ SET value = 'lru' WHERE key = 'eviction_policy'; ``` -**Eviction Policies:** +Eviction Policies: -- **lru**: Least Recently Used - evicts oldest accessed entries -- **lfu**: Least Frequently Used - evicts least accessed entries -- **ttl**: Time To Live - evicts entries closest to expiration +- lru: Least Recently Used - evicts oldest accessed entries +- lfu: Least Frequently Used - evicts least accessed entries +- ttl: Time To Live - evicts entries closest to expiration #### similarity_threshold -Default similarity threshold for cache hits (0.0 - 1.0). +Use similarity_threshold to specify the default similarity threshold for +cache hits (0.0 - 1.0). ```sql -- More strict matching (fewer hits, more accurate) @@ -196,9 +222,13 @@ WHERE key = 'similarity_threshold'; ## Production Configurations +The following sections detail configuration settings useful in a +production environment. 
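The three eviction policies amount to ranking entries along different axes and evicting the minimum. As a rough client-side illustration — not the extension's actual C implementation, and with invented entry values — the victim selection looks like:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    key: str
    last_access: float   # consulted by 'lru'
    access_count: int    # consulted by 'lfu'
    expires_at: float    # consulted by 'ttl'

def pick_victim(entries: list[Entry], policy: str) -> Entry:
    # Each policy evicts the minimum along a different axis.
    if policy == "lru":   # least recently used: oldest last access
        return min(entries, key=lambda e: e.last_access)
    if policy == "lfu":   # least frequently used: fewest accesses
        return min(entries, key=lambda e: e.access_count)
    if policy == "ttl":   # closest to expiration
        return min(entries, key=lambda e: e.expires_at)
    raise ValueError(f"unknown policy: {policy}")

entries = [
    Entry("popular-but-old", last_access=100.0, access_count=50, expires_at=900.0),
    Entry("fresh-but-rare",  last_access=200.0, access_count=1,  expires_at=500.0),
]
print(pick_victim(entries, "lru").key)  # popular-but-old
print(pick_victim(entries, "lfu").key)  # fresh-but-rare
```

Note how LRU and LFU disagree on the same two entries: that divergence is why the policy should match your access patterns.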
+ ### High-Throughput Configuration -For applications with thousands of queries per second: +Use the following configuration for applications with thousands of queries +per second: ```sql -- Use IVFFlat with optimized lists @@ -206,13 +236,16 @@ SELECT semantic_cache.set_index_type('ivfflat'); SELECT semantic_cache.rebuild_index(); -- Increase cache size -UPDATE semantic_cache.cache_config SET value = '5000' WHERE key = 'max_cache_size_mb'; +UPDATE semantic_cache.cache_config SET value = '5000' +WHERE key = 'max_cache_size_mb'; -- Use LRU for fast eviction -UPDATE semantic_cache.cache_config SET value = 'lru' WHERE key = 'eviction_policy'; +UPDATE semantic_cache.cache_config SET value = 'lru' +WHERE key = 'eviction_policy'; -- Shorter TTL to keep cache fresh -UPDATE semantic_cache.cache_config SET value = '1800' WHERE key = 'default_ttl_seconds'; +UPDATE semantic_cache.cache_config SET value = '1800' +WHERE key = 'default_ttl_seconds'; ``` PostgreSQL settings: @@ -226,7 +259,7 @@ maintenance_work_mem = 2GB ### High-Accuracy Configuration -For applications requiring maximum precision: +Use the following configuration for applications requiring maximum precision: ```sql -- Use HNSW for best recall @@ -234,15 +267,18 @@ SELECT semantic_cache.set_index_type('hnsw'); SELECT semantic_cache.rebuild_index(); -- Strict similarity threshold -UPDATE semantic_cache.cache_config SET value = '0.98' WHERE key = 'similarity_threshold'; +UPDATE semantic_cache.cache_config SET value = '0.98' +WHERE key = 'similarity_threshold'; -- Longer TTL for stable results -UPDATE semantic_cache.cache_config SET value = '14400' WHERE key = 'default_ttl_seconds'; +UPDATE semantic_cache.cache_config SET value = '14400' +WHERE key = 'default_ttl_seconds'; ``` ### LLM/AI Application Configuration -Optimized for caching expensive AI API calls: +Use the following configuration settings to optimize caching for expensive AI +API calls: ```sql -- OpenAI ada-002 dimensions @@ -250,18 +286,22 @@ SELECT 
semantic_cache.set_vector_dimension(1536); SELECT semantic_cache.rebuild_index(); -- Balance between accuracy and coverage -UPDATE semantic_cache.cache_config SET value = '0.93' WHERE key = 'similarity_threshold'; +UPDATE semantic_cache.cache_config SET value = '0.93' +WHERE key = 'similarity_threshold'; -- Cache longer (AI responses stable) -UPDATE semantic_cache.cache_config SET value = '7200' WHERE key = 'default_ttl_seconds'; +UPDATE semantic_cache.cache_config SET value = '7200' +WHERE key = 'default_ttl_seconds'; -- Large cache for many queries -UPDATE semantic_cache.cache_config SET value = '10000' WHERE key = 'max_cache_size_mb'; +UPDATE semantic_cache.cache_config SET value = '10000' +WHERE key = 'max_cache_size_mb'; ``` ### Analytics Query Configuration -For caching expensive analytical queries: +The following configuration is well-suited for caching expensive analytical +queries: ```sql -- Use standard dimensions @@ -269,19 +309,26 @@ SELECT semantic_cache.set_vector_dimension(768); SELECT semantic_cache.rebuild_index(); -- Moderate similarity (query variations common) -UPDATE semantic_cache.cache_config SET value = '0.90' WHERE key = 'similarity_threshold'; +UPDATE semantic_cache.cache_config SET value = '0.90' +WHERE key = 'similarity_threshold'; -- Short TTL (data changes frequently) -UPDATE semantic_cache.cache_config SET value = '900' WHERE key = 'default_ttl_seconds'; +UPDATE semantic_cache.cache_config SET value = '900' +WHERE key = 'default_ttl_seconds'; -- LFU policy (popular queries cached longer) -UPDATE semantic_cache.cache_config SET value = 'lfu' WHERE key = 'eviction_policy'; +UPDATE semantic_cache.cache_config SET value = 'lfu' +WHERE key = 'eviction_policy'; ``` ## Monitoring Configuration Impact +Use the following commands to monitor your semantic cache. 
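The counters surfaced by monitoring feed simple health checks. As a toy illustration — the counter values below are invented, not real `cache_stats()` output — the hit-rate arithmetic behind the targets in this section is just:

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

# Hypothetical counters as they might be sampled over a monitoring window
print(f"{hit_rate(hits=8200, misses=2800):.1%}")  # 74.5%
```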
+ ### Check Index Performance +Use the following query to view index usage statistics: + ```sql -- View index usage SELECT @@ -297,6 +344,8 @@ WHERE schemaname = 'semantic_cache'; ### Measure Lookup Times +Use the following commands to measure lookup performance: + ```sql -- Enable timing \timing on @@ -312,6 +361,8 @@ Target: < 5ms for most queries ### Cache Hit Rate +Use the following query to monitor cache hit rate: + ```sql -- Monitor hit rate with current config SELECT * FROM semantic_cache.cache_stats(); @@ -319,59 +370,69 @@ SELECT * FROM semantic_cache.cache_stats(); Target: > 70% for effective caching -## Configuration Best Practices +### Tuning Checklist -!!! tip "Start Simple" - Begin with defaults (1536 dimensions, IVFFlat, 0.95 threshold) and adjust based on monitoring. +Follow this checklist when tuning your cache configuration: -!!! warning "Test Before Production" - Always test configuration changes in development before applying to production. +- Choose a dimension matching your embedding model. +- Select an index type based on workload (IVFFlat for most cases). +- Set a similarity threshold based on accuracy requirements. +- Configure cache size based on available memory. +- Choose an eviction policy matching access patterns. +- Set TTL based on data freshness requirements. +- Monitor hit rate and adjust as needed. 
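The final "monitor and adjust" step can be sketched as a feedback loop over the similarity threshold. The target, step size, and bounds below are illustrative choices, not values prescribed by the extension:

```python
def adjust_threshold(threshold: float, hit_rate: float,
                     target: float = 0.70, step: float = 0.01) -> float:
    # Hit rate below target: loosen matching to admit more hits.
    if hit_rate < target:
        threshold -= step
    # Hit rate well above target: tighten for stricter matches.
    elif hit_rate > target + 0.15:
        threshold += step
    # Stay inside the sensible range discussed in these docs (0.85-0.99).
    return round(min(0.99, max(0.85, threshold)), 2)

print(adjust_threshold(0.95, hit_rate=0.40))  # 0.94: loosen
print(adjust_threshold(0.93, hit_rate=0.90))  # 0.94: tighten
```

A scheduled job could apply the new value with an `UPDATE` against `semantic_cache.cache_config`, but small, infrequent steps are safer than chasing every fluctuation.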
-### Tuning Checklist +### Common Mistakes -- [ ] Choose dimension matching your embedding model -- [ ] Select index type based on workload (IVFFlat for most cases) -- [ ] Set similarity threshold based on accuracy requirements -- [ ] Configure cache size based on available memory -- [ ] Choose eviction policy matching access patterns -- [ ] Set TTL based on data freshness requirements -- [ ] Monitor hit rate and adjust as needed +The following common mistakes have simple remediations: -### Common Mistakes +#### Using Wrong Dimensions -❌ **Using wrong dimensions** ```sql --- Extension configured for 1536, but sending 768-dim vectors +-- Extension configured for 1536, but sending 768-dim +-- vectors -- Result: Error or poor performance ``` -✓ **Match model dimensions** +You should use matching model dimensions: + ```sql -SELECT semantic_cache.set_vector_dimension(768); -- Match your model +-- Match your model +SELECT semantic_cache.set_vector_dimension(768); SELECT semantic_cache.rebuild_index(); ``` -❌ **Too strict threshold** +#### Too Strict Threshold + ```sql -UPDATE semantic_cache.cache_config SET value = '0.99' WHERE key = 'similarity_threshold'; +UPDATE semantic_cache.cache_config SET value = '0.99' +WHERE key = 'similarity_threshold'; -- Result: Very low hit rate ``` -✓ **Balanced threshold** +Use a more balanced threshold: + ```sql -UPDATE semantic_cache.cache_config SET value = '0.93' WHERE key = 'similarity_threshold'; +UPDATE semantic_cache.cache_config SET value = '0.93' +WHERE key = 'similarity_threshold'; -- Allows reasonable variation ``` -❌ **Forgetting to rebuild** +#### Forgetting To Rebuild + ```sql SELECT semantic_cache.set_vector_dimension(768); -- Forgot: SELECT semantic_cache.rebuild_index(); -- Result: Old index still in use! ``` +Rebuild your cache to use the new index! 
+ ## Next Steps -- [Functions Reference](functions/index.md) - Learn about all configuration functions -- [Monitoring](monitoring.md) - Track performance and tune configuration -- [Use Cases](use_cases.md) - See configuration examples in practice +- [Functions Reference](functions/index.md) - Learn about all + configuration functions. +- [Monitoring](monitoring.md) - Track performance and tune + configuration. +- [Use Cases](use_cases.md) - See configuration examples in + practice. From fab196bdf95de97cb9b53ccd8e0d33884cb8b205 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Tue, 24 Feb 2026 11:35:11 -0500 Subject: [PATCH 02/12] Updates to FAQ.md and configuration.md --- docs/FAQ.md | 229 +++++++++++++++++++++++++----------------- docs/configuration.md | 4 +- 2 files changed, 140 insertions(+), 93 deletions(-) diff --git a/docs/FAQ.md b/docs/FAQ.md index 5779a97..214c2af 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -4,40 +4,53 @@ ### What is semantic caching? -Semantic caching uses vector embeddings to understand the *meaning* of queries, not just exact text matching. When you search for "What was Q4 revenue?", the cache can return results for semantically similar queries like "Show Q4 revenue" or "Q4 revenue please" even though the exact text is different. +Semantic caching uses vector embeddings to understand the meaning of +queries, not just exact text matching. When you search for "What was Q4 +revenue?", the cache can return results for semantically similar queries +like "Show Q4 revenue" or "Q4 revenue please" even though the exact text +is different. -Traditional caching requires exact string matches, while semantic caching matches based on similarity scores (typically 90-98%). +Traditional caching requires exact string matches, while semantic caching +matches based on similarity scores (typically 90-98%). ### Why use pg_semantic_cache instead of a traditional cache like Redis? 
-**Use pg_semantic_cache when:** -- Queries are phrased differently but mean the same thing (LLM applications, natural language queries) -- You need semantic understanding of query similarity -- You're already using PostgreSQL and want tight integration -- You need persistent caching with complex querying capabilities +Use pg_semantic_cache when: -**Use traditional caching (Redis, Memcached) when:** -- You need exact key-value matching -- Sub-millisecond latency is critical -- Queries are deterministic and rarely vary -- You need distributed caching across multiple services +- Queries are phrased differently but mean the same thing (LLM + applications, natural language queries). +- You need semantic understanding of query similarity. +- You're already using PostgreSQL and want tight integration. +- You need persistent caching with complex querying capabilities. -**Use both:** pg_semantic_cache for semantic matching + Redis for hot-path exact matches! +Use traditional caching (Redis, Memcached) when: + +- You need exact key-value matching. +- Sub-millisecond latency is critical. +- Queries are deterministic and rarely vary. +- You need distributed caching across multiple services. + +Use both: pg_semantic_cache for semantic matching + Redis for hot-path +exact matches! ### How does it compare to application-level caching? 
+The following table compares pg_semantic_cache to application-level +caching: + | Feature | pg_semantic_cache | Application Cache | |---------|-------------------|-------------------| -| Semantic Matching | ✅ Yes | ❌ No | -| Database Integration | ✅ Native | ⚠️ Requires sync | -| Multi-language | ✅ Yes | ⚠️ Per-instance | -| Persistence | ✅ Automatic | ⚠️ Manual | -| Vector Operations | ✅ Optimized | ❌ Not available | -| Shared Across Apps | ✅ Yes | ❌ No | +| Semantic Matching | Yes | No | +| Database Integration | Native | Requires sync | +| Multi-language | Yes | Per-instance | +| Persistence | Automatic | Manual | +| Vector Operations | Optimized | Not available | +| Shared Across Apps | Yes | No | ### Is it production-ready? -Yes! pg_semantic_cache is: +Yes! pg_semantic_cache is production-ready and has the following +characteristics: - Written in C using stable PostgreSQL APIs - Tested with PostgreSQL 14-18 - Used in production environments @@ -48,7 +61,8 @@ Yes! pg_semantic_cache is: ### Do I need to install pgvector separately? -Yes, pgvector is a required dependency. Install it before pg_semantic_cache: +Yes, pgvector is a required dependency. Install it before +pg_semantic_cache: ```bash # Install pgvector @@ -65,12 +79,12 @@ make && sudo make install It depends on the service: -- **Self-hosted PostgreSQL**: ✅ Yes -- **AWS RDS**: ✅ Yes (if you can install extensions) -- **Azure Database for PostgreSQL**: ✅ Yes (flexible server) -- **Google Cloud SQL**: ⚠️ Check extension support -- **Supabase**: ✅ Yes (pgvector supported) -- **Neon**: ✅ Yes (pgvector supported) +- Self-hosted PostgreSQL: Yes +- AWS RDS: Yes (if you can install extensions) +- Azure Database for PostgreSQL: Yes (flexible server) +- Google Cloud SQL: Check extension support +- Supabase: Yes (pgvector supported) +- Neon: Yes (pgvector supported) Check if your provider supports custom C extensions and pgvector. @@ -80,6 +94,8 @@ PostgreSQL 14, 15, 16, 17, and 18 are fully supported and tested. 
### How do I upgrade the extension? +Use one of the following methods to upgrade the extension: + ```sql -- Drop and recreate (WARNING: clears cache) DROP EXTENSION pg_semantic_cache CASCADE; @@ -93,14 +109,19 @@ ALTER EXTENSION pg_semantic_cache UPDATE TO '0.4.0'; ### How fast are cache lookups? -**Target**: < 5ms for most queries +Cache lookups are very fast, with the following performance +characteristics: + +Target: < 5ms for most queries + +Typical Performance: -**Typical Performance:** - IVFFlat index: 2-5ms - HNSW index: 1-3ms - Without index: 50-500ms (don't do this!) -**Factors affecting speed:** +Factors affecting speed: + - Cache size (more entries = slightly slower) - Vector dimension (1536 vs 3072) - Index type and parameters @@ -114,25 +135,31 @@ SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); ### How much storage does it use? -**Storage per entry:** +Storage requirements vary based on vector dimensions and result sizes: + +Storage per entry: + - Vector embedding: ~6KB (1536 dimensions) - Result data: Varies (your cached JSONB) - Metadata: ~200 bytes -- **Total**: 6KB + your data size +- Total: 6KB + your data size + +Example: -**Example:** - 10K entries with 10KB results each = ~160MB - 100K entries with 5KB results each = ~1.1GB ### What's the maximum cache size? 
-There's no hard limit, but practical considerations: +There's no hard limit, but consider the following practical +considerations: + +- < 100K entries: Excellent performance with default settings +- 100K - 1M entries: Increase IVFFlat lists parameter +- > 1M entries: Consider partitioning or HNSW index -- **< 100K entries**: Excellent performance with default settings -- **100K - 1M entries**: Increase IVFFlat lists parameter -- **> 1M entries**: Consider partitioning or HNSW index +Use the following command to configure max size: -Configure max size: ```sql UPDATE semantic_cache.cache_config SET value = '5000' -- 5GB @@ -141,7 +168,8 @@ WHERE key = 'max_cache_size_mb'; ### Does it work with large result sets? -Yes, but consider: +Yes, but consider the following factors: + - Large results (> 1MB) consume more storage - Serializing/deserializing large JSONB has overhead - Consider caching aggregated results instead of full datasets @@ -161,7 +189,8 @@ FROM huge_table; -- 1KB result Any embedding model that produces fixed-dimension vectors: -**Popular Models:** +Popular Models: + - OpenAI text-embedding-ada-002 (1536 dim) - OpenAI text-embedding-3-small (1536 dim) - OpenAI text-embedding-3-large (3072 dim) @@ -169,7 +198,8 @@ Any embedding model that produces fixed-dimension vectors: - Sentence Transformers all-MiniLM-L6-v2 (384 dim) - Sentence Transformers all-mpnet-base-v2 (768 dim) -Configure dimension: +Use the following commands to configure dimension: + ```sql SELECT semantic_cache.set_vector_dimension(768); SELECT semantic_cache.rebuild_index(); @@ -177,9 +207,11 @@ SELECT semantic_cache.rebuild_index(); ### Do I need to generate embeddings myself? -Yes. pg_semantic_cache stores and searches embeddings, but doesn't generate them. +Yes. pg_semantic_cache stores and searches embeddings, but doesn't +generate them. + +Typical workflow: -**Typical workflow:** 1. Generate embedding using your chosen model/API 2. 
Pass embedding to `cache_query()` or `get_cached_result()` 3. Extension handles similarity search @@ -202,15 +234,17 @@ SELECT semantic_cache.rebuild_index(); ### What similarity threshold should I use? -**Recommendations:** +Use the following recommendations to select an appropriate similarity +threshold: + +- 0.98-0.99: Nearly identical queries (financial data, strict matching) +- 0.95-0.97: Very similar queries (recommended starting point) +- 0.90-0.94: Similar queries (good for exploratory queries) +- 0.85-0.89: Somewhat related (use with caution) +- < 0.85: Too lenient (likely irrelevant results) -- **0.98-0.99**: Nearly identical queries (financial data, strict matching) -- **0.95-0.97**: Very similar queries (recommended starting point) -- **0.90-0.94**: Similar queries (good for exploratory queries) -- **0.85-0.89**: Somewhat related (use with caution) -- **< 0.85**: Too lenient (likely irrelevant results) +Start with 0.95 and adjust based on your hit rate: -**Start with 0.95** and adjust based on your hit rate: - Hit rate too low? Lower threshold (0.92) - Getting irrelevant results? Raise threshold (0.97) @@ -218,13 +252,17 @@ SELECT semantic_cache.rebuild_index(); ### How do I choose between IVFFlat and HNSW? -**Use IVFFlat (default) when:** +Choose the index type based on your workload characteristics: + +Use IVFFlat (default) when: + - Cache updates frequently - Build time matters - < 100K entries - Good enough recall (95%+) -**Use HNSW when:** +Use HNSW when: + - Maximum accuracy needed - Cache mostly read-only - Have pgvector 0.5.0+ @@ -238,7 +276,7 @@ SELECT semantic_cache.rebuild_index(); ### What TTL should I set? -Depends on data freshness requirements: +The TTL depends on your data freshness requirements: ```sql -- Real-time data (stock prices, weather) @@ -256,7 +294,7 @@ ttl_seconds := NULL -- Never expires ### How often should I run maintenance? 
-**Recommended Schedule:** +Follow this recommended maintenance schedule: ```sql -- Every 15 minutes: Evict expired entries @@ -279,26 +317,27 @@ SELECT cron.schedule('cache-evict', '*/15 * * * *', ### Why is my hit rate so low? -**Common causes:** +Low hit rates typically have one of the following common causes: -1. **Threshold too high** +1. Threshold too high ```sql -- Lower from 0.95 to 0.90 SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90); ``` -2. **TTL too short** +2. TTL too short ```sql -- Check average entry lifetime - SELECT AVG(EXTRACT(EPOCH FROM (NOW() - created_at))) / 3600 as avg_age_hours + SELECT AVG(EXTRACT(EPOCH FROM (NOW() - created_at))) / 3600 + as avg_age_hours FROM semantic_cache.cache_entries; ``` -3. **Poor embedding quality** +3. Poor embedding quality - Use better embedding model - Ensure consistent embedding generation -4. **Cache too small** +4. Cache too small ```sql -- Check if entries being evicted too quickly SELECT * FROM semantic_cache.cache_stats(); @@ -306,7 +345,7 @@ SELECT cron.schedule('cache-evict', '*/15 * * * *', ### Cache lookups are returning no results -**Debugging steps:** +Use the following debugging steps to troubleshoot this issue: ```sql -- 1. 
Check cache has entries @@ -333,11 +372,13 @@ LIMIT 5; ### Extension won't load +If you encounter the following error: + ```sql ERROR: could not open extension control file ``` -**Solution:** +Use this solution: ```bash # Check installation ls -l $(pg_config --sharedir)/extension/pg_semantic_cache* @@ -352,11 +393,13 @@ ls -l $(pg_config --pkglibdir)/vector.so ### Build errors +If you encounter the following build error: + ```bash fatal error: postgres.h: No such file or directory ``` -**Solution:** +Use this solution: ```bash # Debian/Ubuntu sudo apt-get install postgresql-server-dev-17 @@ -371,23 +414,25 @@ export PATH="/opt/homebrew/opt/postgresql@17/bin:$PATH" ### Out of memory errors +If you encounter the following error: + ```sql ERROR: out of memory ``` -**Solutions:** +Try one of these solutions: -1. **Increase work_mem** +1. Increase work_mem ```sql SET work_mem = '512MB'; ``` -2. **Reduce cache size** +2. Reduce cache size ```sql SELECT semantic_cache.evict_lru(5000); -- Keep only 5K entries ``` -3. **Lower vector dimension** +3. Lower vector dimension ```sql SELECT semantic_cache.set_vector_dimension(768); -- Use smaller model SELECT semantic_cache.rebuild_index(); @@ -398,19 +443,24 @@ ERROR: out of memory ### Should I cache everything? No! Cache queries that are: -- ✅ Expensive (slow execution) -- ✅ Frequently repeated (similar queries) -- ✅ Tolerant of slight staleness -- ✅ Semantically searchable + +- Expensive (slow execution) +- Frequently repeated (similar queries) +- Tolerant of slight staleness +- Semantically searchable Don't cache: -- ❌ Simple key-value lookups (use Redis) -- ❌ Real-time critical data -- ❌ Unique, one-off queries -- ❌ Queries that must be current + +- Simple key-value lookups (use Redis) +- Real-time critical data +- Unique, one-off queries +- Queries that must be current ### How do I test if caching helps? 
+Use the following approach to measure the performance improvement from +caching: + ```sql -- Measure query time without cache \timing on @@ -423,18 +473,19 @@ SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); -- With cache (subsequent calls - hit) SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); --- Time: 3.456 ms (cache hit!) +-- Time: 3.456 ms (cache hit) --- Speedup: 450 / 3.5 = 128x faster! +-- Speedup: 450 / 3.5 = 128x faster ``` ### Should I use tags? Yes! Tags are useful for: -- **Organization**: Group by feature (`ARRAY['dashboard', 'sales']`) -- **Bulk invalidation**: `invalidate_cache(tag := 'user_123')` -- **Analytics**: `SELECT * FROM semantic_cache.cache_by_tag` -- **Debugging**: Find entries by category + +- Organization: Group by feature (`ARRAY['dashboard', 'sales']`) +- Bulk invalidation: `invalidate_cache(tag := 'user_123')` +- Analytics: `SELECT * FROM semantic_cache.cache_by_tag` +- Debugging: Find entries by category ```sql -- Tag everything @@ -447,16 +498,12 @@ SELECT semantic_cache.cache_query( ); ``` -## See Also - -- [Getting Started](index.md) -- [Installation Guide](installation.md) -- [Configuration](configuration.md) -- [Use Cases](use_cases.md) -- [Functions Reference](functions/index.md) -- [Monitoring](monitoring.md) ## Still Have Questions? 
-- **GitHub Issues**: [Report bugs or ask questions](https://github.com/pgedge/pg_semantic_cache/issues) -- **Discussions**: [Community discussions](https://github.com/pgedge/pg_semantic_cache/discussions) +Contact us through the following channels: + +- GitHub Issues: [Report bugs or ask + questions](https://github.com/pgedge/pg_semantic_cache/issues) +- Discussions: [Community + discussions](https://github.com/pgedge/pg_semantic_cache/discussions) diff --git a/docs/configuration.md b/docs/configuration.md index d1b3257..c357dad 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1,7 +1,7 @@ # Configuration -pg_semantic_cache provides flexible configuration options for vector -dimensions, index types, and cache behavior. +This guide describes how to configure pg_semantic_cache for your use case, +including vector dimensions, index types, and cache behavior. !!! tip "Start Simple" From 7f23c145b519cbde31369215e447c8b23ecc32ef Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Thu, 26 Feb 2026 11:05:31 -0500 Subject: [PATCH 03/12] Updated file contents to break into sections; add edits, etc --- docs/architecture.md | 54 ++++++++++++++ docs/index.md | 161 +++++++++------------------------------- docs/installation.md | 123 ++++++++++-------------------- docs/quick_start.md | 60 +++++++++++++++ docs/troubleshooting.md | 84 +++++++++++++++++++++ mkdocs.yml | 7 +- 6 files changed, 278 insertions(+), 211 deletions(-) create mode 100644 docs/architecture.md create mode 100644 docs/quick_start.md create mode 100644 docs/troubleshooting.md diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..764e9ee --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,54 @@ +# Architecture + +pg_semantic_cache is implemented in pure C using the PostgreSQL extension API +(PGXS), providing: + +- Small binary size of ~100KB vs 2-5MB for Rust-based extensions. +- Fast build times of 10-30 seconds vs 2-5 minutes. 
+- Immediate compatibility with new PostgreSQL versions.
+- Standard packaging that works with all PostgreSQL packaging tools.
+
+## How It Works
+
+```mermaid
+graph LR
+    A[Query] --> B[Generate Embedding]
+    B --> C{Cache Lookup}
+    C -->|Hit| D[Return Cached Result]
+    C -->|Miss| E[Execute Query]
+    E --> F[Store Result + Embedding]
+    F --> G[Return Result]
+```
+
+1. Generate an embedding by converting your query text into a vector embedding
+   using your preferred model (OpenAI, Cohere, etc.).
+2. Check the cache by searching for semantically similar cached queries using
+   cosine similarity.
+3. On a cache hit, if a similar query exists above the similarity threshold,
+   the cached result is returned.
+4. On a cache miss, the actual query is executed and the result is cached with
+   its embedding for future use.
+5. Automatic maintenance evicts expired entries based on TTL and configured
+   policies.
+
+## Performance
+
+- Lookup time is < 5ms for most queries with the IVFFlat index.
+- Scalability handles 100K+ cached entries efficiently.
+- Throughput reaches thousands of cache lookups per second.
+- Storage provides configurable cache size limits with automatic eviction.
+
+!!! tip "Pro Tip"
+
+    Start with the default IVFFlat index and 1536 dimensions (OpenAI
+    ada-002). You can always reconfigure your cache later with the
+    `set_vector_dimension()` and `rebuild_index()` functions.
+
+## Getting Help
+
+- Browse the sections in the navigation menu for documentation.
+- Report issues at
+  [GitHub Issues](https://github.com/pgedge/pg_semantic_cache/issues).
+- See [Use Cases](use_cases.md) for practical implementation examples.
+- Check the [FAQ](FAQ.md) for answers to common questions.
+
diff --git a/docs/index.md b/docs/index.md
index 478ecc4..e776e85 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,103 +1,32 @@
 # pg_semantic_cache
 
-!!! 
info "Welcome to pg_semantic_cache" - Semantic query result caching for PostgreSQL using vector embeddings - making expensive queries fast through intelligent reuse. - -## Overview - -pg_semantic_cache is a PostgreSQL extension that implements semantic query result caching using vector embeddings. Unlike traditional query caching that relies on exact string matching, pg_semantic_cache understands the *meaning* of queries through vector similarity, enabling cache hits even when queries are phrased differently. +pg_semantic_cache is a PostgreSQL extension that implements semantic query +result caching using vector embeddings. Unlike traditional query caching that +relies on exact string matching, pg_semantic_cache understands the *meaning* +of queries through vector similarity, enabling cache hits even when queries +are phrased differently. This extension is particularly valuable for: -- **AI/LLM Applications**: Cache expensive LLM API calls and RAG (Retrieval Augmented Generation) results -- **Analytics Workloads**: Reuse results from complex analytical queries with similar parameters -- **External API Queries**: Cache results from expensive external data sources -- **Database Query Optimization**: Reduce load on expensive database operations - -## Key Features - -- **Semantic Matching**: Uses pgvector for similarity-based cache lookups -- **Flexible TTL**: Per-entry time-to-live configuration -- **Tag-Based Management**: Organize and invalidate cache entries by tags -- **Multiple Eviction Policies**: LRU, LFU, and TTL-based automatic eviction -- **Cost Tracking**: Monitor and report on query cost savings -- **Configurable Dimensions**: Support for various embedding models (768, 1536, 3072+ dimensions) -- **Multiple Index Types**: IVFFlat (fast) or HNSW (accurate) vector indexes -- **Comprehensive Monitoring**: Built-in statistics, views, and health metrics - -## How It Works - -```mermaid -graph LR - A[Query] --> B[Generate Embedding] - B --> C{Cache Lookup} - C 
-->|Hit| D[Return Cached Result]
-    C -->|Miss| E[Execute Query]
-    E --> F[Store Result + Embedding]
-    F --> G[Return Result]
-```
-
-1. **Generate Embedding**: Convert your query text into a vector embedding using your preferred model (OpenAI, Cohere, etc.)
-2. **Check Cache**: Search for semantically similar cached queries using cosine similarity
-3. **Cache Hit**: If a similar query exists above the similarity threshold, return the cached result
-4. **Cache Miss**: Execute the actual query, cache the result with its embedding for future use
-5. **Automatic Maintenance**: Expired entries are evicted based on TTL and configured policies
-
-## Quick Start
-
-### Prerequisites
-
-- PostgreSQL 14, 15, 16, 17, or 18
-- pgvector extension installed
-- C compiler (gcc or clang)
-- PostgreSQL development headers
-
-### Installation
-
-```bash
-# Clone the repository
-git clone https://github.com/pgedge/pg_semantic_cache.git
-cd pg_semantic_cache
+- AI/LLM applications can cache expensive LLM API calls and RAG (Retrieval
+  Augmented Generation) results.
+- Analytics workloads can reuse results from complex analytical queries with
+  similar parameters.
+- External API queries can cache results from expensive external data
+  sources.
+- Database query optimization can reduce load on expensive database
+  operations.
 
-# Build and install
-make clean
-make
-sudo make install
-```
+### Why Use Semantic Caching
 
-### Setup
+Semantic caching transforms how applications handle query results by using
+vector matching rather than exact query matching. Traditional caching systems
+can miss cached results when queries are phrased differently, while semantic
+caching recognizes "What was Q4 revenue?" and "Show Q4 revenue" as the same
+question. This approach dramatically increases cache hit rates and reduces
+costs for AI applications, analytics workloads, and external API calls.
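The similarity matching described above can be sketched with plain cosine
similarity. The toy 4-dimensional vectors below stand in for real model
embeddings (which would typically have 768-3072 dimensions); the numbers and
the 0.95 threshold are invented for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- invented values, not output of a real model.
q1 = [0.12, 0.45, 0.33, 0.80]   # "What was Q4 revenue?"
q2 = [0.13, 0.44, 0.35, 0.79]   # "Show Q4 revenue"
q3 = [0.90, 0.05, 0.02, 0.10]   # "Delete old user accounts"

THRESHOLD = 0.95
print(cosine_similarity(q1, q2) >= THRESHOLD)  # True: same question, new wording
print(cosine_similarity(q1, q3) >= THRESHOLD)  # False: unrelated query
```

Differently worded variants of a question land close together in embedding
space, so the first comparison clears the threshold while the unrelated query
does not.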
-```sql --- Install required extensions -CREATE EXTENSION IF NOT EXISTS vector; -CREATE EXTENSION IF NOT EXISTS pg_semantic_cache; - --- Verify installation -SELECT * FROM semantic_cache.cache_health; -``` - -## Simple Example - -```sql --- Cache a query result with its embedding -SELECT semantic_cache.cache_query( - query_text := 'What was our Q4 2024 revenue?', - query_embedding := '[0.123, 0.456, ...]'::text, -- From your embedding model - result_data := '{"answer": "Q4 2024 revenue was $2.4M"}'::jsonb, - ttl_seconds := 1800, -- 30 minutes - tags := ARRAY['llm', 'revenue'] -); - --- Retrieve with a semantically similar query -SELECT * FROM semantic_cache.get_cached_result( - query_embedding := '[0.124, 0.455, ...]'::text, -- Slightly different query - similarity_threshold := 0.95 -- 95% similarity required -); -``` - -## Why Use pg_semantic_cache? - -### Traditional Caching vs Semantic Caching +Queries that would overlook cached result sets work with a semantic cache: | Traditional Cache | Semantic Cache | |-------------------|----------------| @@ -108,40 +37,24 @@ SELECT * FROM semantic_cache.get_cached_result( ### Cost Savings Example For an LLM application making 10,000 queries per day: -- Without caching: $200/day (at $0.02 per query) -- With 80% cache hit rate: $40/day -- **Savings: $160/day or $58,400/year** - -## Architecture - -pg_semantic_cache is implemented in pure C using the PostgreSQL extension API (PGXS), providing: - -- **Small Binary Size**: ~100KB vs 2-5MB for Rust-based extensions -- **Fast Build Times**: 10-30 seconds vs 2-5 minutes -- **Immediate Compatibility**: Works with new PostgreSQL versions immediately -- **Standard Packaging**: Compatible with all PostgreSQL packaging tools - -## Performance - -- **Lookup Time**: < 5ms for most queries with IVFFlat index -- **Scalability**: Handles 100K+ cached entries efficiently -- **Throughput**: Thousands of cache lookups per second -- **Storage**: Configurable cache size limits with automatic 
eviction -## Getting Help +- Without caching costs $200/day (at $0.02 per query). +- With 80% cache hit rate costs $40/day. +- Savings are $160/day or $58,400/year. -- **Documentation**: Browse the sections in the navigation menu -- **Issues**: Report bugs at [GitHub Issues](https://github.com/pgedge/pg_semantic_cache/issues) -- **Examples**: See [Use Cases](use_cases.md) for practical implementations -- **FAQ**: Check the [FAQ](FAQ.md) for common questions +### Key Features -## Next Steps +- Semantic matching uses pgvector for similarity-based cache lookups. +- Flexible TTL provides per-entry time-to-live configuration. +- Tag-based management organizes and invalidates cache entries by tags. +- Multiple eviction policies include LRU, LFU, and TTL-based automatic + eviction. +- Cost tracking monitors and reports on query cost savings. +- Configurable dimensions support various embedding models (768, 1536, + 3072+ dimensions). +- Multiple index types include IVFFlat (fast) or HNSW (accurate) vector + indexes. +- Comprehensive monitoring provides built-in statistics, views, and health + metrics. -- [Installation Guide](installation.md) - Detailed installation instructions -- [Configuration](configuration.md) - Configure dimensions, indexes, and policies -- [Functions Reference](functions/index.md) - Complete function documentation -- [Use Cases](use_cases.md) - Practical examples and integration patterns -- [Monitoring](monitoring.md) - Track performance and optimize cache usage -!!! tip "Pro Tip" - Start with the default IVFFlat index and 1536 dimensions (OpenAI ada-002). You can always reconfigure later with `set_vector_dimension()` and `rebuild_index()`. diff --git a/docs/installation.md b/docs/installation.md index 1b248bb..0173931 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -4,16 +4,19 @@ This guide covers installing pg_semantic_cache from source on various platforms. 
## Prerequisites -### Required +Before installing pg_semantic_cache, you must install: -- **PostgreSQL**: Version 14, 15, 16, 17, or 18 -- **pgvector**: Must be installed before pg_semantic_cache -- **C Compiler**: gcc or clang -- **make**: GNU Make or compatible -- **PostgreSQL Development Headers**: Required for building extensions +- PostgreSQL: Version 14, 15, 16, 17, or 18 +- pgvector: Must be installed before pg_semantic_cache +- C Compiler: gcc or clang +- make: GNU Make or compatible +- PostgreSQL Development Headers: Required for building extensions ### Platform-Specific Packages +Use the following platform-specific commands to ensure that your host is +prepared for pg_semantic_cache: + === "Debian/Ubuntu" ```bash sudo apt-get install -y \ @@ -63,7 +66,8 @@ This guide covers installing pg_semantic_cache from source on various platforms. ## Building from Source -### Standard Installation +After installing the prerequisites, build pg_semantic_cache using the standard +PostgreSQL extension build commands. ```bash # Clone the repository @@ -80,6 +84,9 @@ sudo make install ### Multi-Version PostgreSQL +Use PG_CONFIG to target specific PostgreSQL versions when multiple versions +are installed. + If you have multiple PostgreSQL versions installed: ```bash @@ -94,6 +101,8 @@ done ### Development Build +Development builds include verbose output and debugging information. + For development with verbose output: ```bash @@ -102,6 +111,8 @@ make dev-install ### View Build Configuration +Check your build environment and configuration settings before compiling. + ```bash make info ``` @@ -114,7 +125,9 @@ Output includes: ## Verifying Installation -### Check Extension Files +After installation completes, verify that all extension files are in place. 
+ +Check for the extension files: ```bash # Verify shared library is installed @@ -127,7 +140,7 @@ ls -lh $(pg_config --sharedir)/extension/pg_semantic_cache.control ls -lh $(pg_config --sharedir)/extension/pg_semantic_cache--*.sql ``` -### Check pgvector Installation +Use the following command to confirm that pgvector is installed: ```bash # pgvector must be installed first @@ -136,9 +149,12 @@ ls -lh $(pg_config --pkglibdir)/vector.so ## PostgreSQL Configuration +Optimize PostgreSQL settings for better performance with semantic caching. + ### Update postgresql.conf -pg_semantic_cache works out of the box without special configuration, but for optimal performance with large caches: +pg_semantic_cache works out of the box without special configuration, but for +optimal performance with large caches: ```ini # Recommended for production with large caches @@ -151,7 +167,7 @@ maintenance_work_mem = 1GB # For index creation track_io_timing = on ``` -Restart PostgreSQL after configuration changes: +Restart PostgreSQL after making configuration changes: ```bash # Systemd @@ -163,7 +179,8 @@ pg_ctl restart -D /var/lib/postgresql/data ## Creating the Extension -### In psql +Create the extension in your PostgreSQL database to begin using semantic +caching. Open the psql command line, and run the following commands: ```sql -- Connect to your database @@ -189,6 +206,8 @@ Expected output: ### Verify Schema Creation +Check that the semantic_cache schema and tables were created successfully. 
+ ```sql -- Check that schema and tables were created \dt semantic_cache.* @@ -197,75 +216,11 @@ Expected output: SELECT * FROM semantic_cache.cache_health; ``` -## Troubleshooting Installation - -### pg_config not found - -```bash -# Find PostgreSQL installation -sudo find / -name pg_config 2>/dev/null - -# Add to PATH -export PATH="/usr/pgsql-17/bin:$PATH" - -# Or specify directly -PG_CONFIG=/path/to/pg_config make install -``` - -### Permission Denied During Installation - -```bash -# Use sudo for system directories -sudo make install - -# Or install to custom directory (no sudo required) -make install DESTDIR=/path/to/custom/location -``` - -### pgvector Not Found - -```sql --- Error: could not open extension control file --- Solution: Install pgvector first -``` - -```bash -cd /tmp -git clone https://github.com/pgvector/pgvector.git -cd pgvector -make -sudo make install -``` - -### Extension Already Exists - -```sql --- If you're upgrading, drop the old version first -DROP EXTENSION IF EXISTS pg_semantic_cache CASCADE; - --- Then reinstall -CREATE EXTENSION pg_semantic_cache; -``` - -!!! warning "Data Loss Warning" - Dropping the extension will delete all cached data. Use `ALTER EXTENSION UPDATE` for upgrades when available. - -### Compilation Errors - -```bash -# Ensure development headers are installed -# Debian/Ubuntu -sudo apt-get install postgresql-server-dev-17 - -# RHEL/Rocky -sudo yum install postgresql17-devel - -# Verify pg_config works -pg_config --includedir-server -``` ## Testing Installation +Validate your installation by running the test suite or manual tests. + Run the included test suite: ```bash @@ -285,13 +240,16 @@ Or run manual tests: ## Uninstalling -### Remove Extension from Database +You can remove pg_semantic_cache from your database and system when it is no +longer needed. 
+ +Use the following command to remove the extension from your database: ```sql DROP EXTENSION IF EXISTS pg_semantic_cache CASCADE; ``` -### Remove Files from System +Then, clean up extension files from PostgreSQL directories: ```bash cd pg_semantic_cache @@ -303,8 +261,3 @@ This removes: - Control file - SQL installation files -## Next Steps - -- [Configuration](configuration.md) - Configure vector dimensions and index types -- [Functions Reference](functions/index.md) - Learn about available functions -- [Use Cases](use_cases.md) - See practical examples diff --git a/docs/quick_start.md b/docs/quick_start.md new file mode 100644 index 0000000..5df92c2 --- /dev/null +++ b/docs/quick_start.md @@ -0,0 +1,60 @@ +# Quick Start + +The steps that follow are designed to get you started with semantic caching +quickly and easily. Before using pg_semantic_cache, you must install: + +- PostgreSQL 14, 15, 16, 17, or 18 +- the pgvector extension +- a C compiler (gcc or clang) +- PostgreSQL development headers + +## Installation + +Use the following commands to build the extension from the Github +repository: + +```bash +# Clone the repository +git clone https://github.com/pgedge/pg_semantic_cache.git +cd pg_semantic_cache + +# Build and install +make clean +make +sudo make install +``` + +After building the extension, you need to install and create the extensions +you'll be using: + +```sql +-- Install required extensions +CREATE EXTENSION IF NOT EXISTS vector; +CREATE EXTENSION IF NOT EXISTS pg_semantic_cache; + +-- Verify installation +SELECT * FROM semantic_cache.cache_health; +``` + +### Using pg_semantic_cache + +Use the following commands to add a result set to a cache, and then query the +cache with a similar query: + +```sql +-- Cache a query result with its embedding +SELECT semantic_cache.cache_query( + query_text := 'What was our Q4 2024 revenue?', + query_embedding := '[0.123, 0.456, ...]'::text, -- From embedding model + result_data := '{"answer": "Q4 2024 revenue 
was $2.4M"}'::jsonb, + ttl_seconds := 1800, -- 30 minutes + tags := ARRAY['llm', 'revenue'] +); + +-- Retrieve with a semantically similar query +SELECT * FROM semantic_cache.get_cached_result( + query_embedding := '[0.124, 0.455, ...]'::text, -- Slightly different + similarity_threshold := 0.95 -- 95% similarity required +); +``` + diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md new file mode 100644 index 0000000..c5eeec5 --- /dev/null +++ b/docs/troubleshooting.md @@ -0,0 +1,84 @@ +# Troubleshooting Installation + +The following lists some common issues encountered during installation, and +how to resolve the problems. + +## pg_config not found + +The build system needs pg_config to locate PostgreSQL installation paths. If +pg_config is not in your PATH, the build will fail. + +```bash +# Find PostgreSQL installation +sudo find / -name pg_config 2>/dev/null + +# Add to PATH +export PATH="/usr/pgsql-17/bin:$PATH" + +# Or specify directly +PG_CONFIG=/path/to/pg_config make install +``` + +## Permission Denied During Installation + +Installing extensions requires write access to PostgreSQL's system directories. +Use sudo for standard installations or specify a custom directory. + +```bash +# Use sudo for system directories +sudo make install + +# Or install to custom directory (no sudo required) +make install DESTDIR=/path/to/custom/location +``` + +## pgvector Not Found + +pg_semantic_cache depends on pgvector and will fail to create if pgvector is +not installed. Install pgvector before installing pg_semantic_cache. + +```sql +-- Error: could not open extension control file +-- Solution: Install pgvector first +``` + +```bash +cd /tmp +git clone https://github.com/pgvector/pgvector.git +cd pgvector +make +sudo make install +``` + +## Extension Already Exists + +When reinstalling or upgrading, PostgreSQL may report that the extension +already exists. Drop the existing extension before creating a new one. 
+ +```sql +-- If you're upgrading, drop the old version first +DROP EXTENSION IF EXISTS pg_semantic_cache CASCADE; + +-- Then reinstall +CREATE EXTENSION pg_semantic_cache; +``` + +!!! warning "Data Loss Warning" + Dropping the extension will delete all cached data. Use `ALTER EXTENSION UPDATE` for upgrades when available. + +## Compilation Errors + +Compilation failures typically occur when PostgreSQL development headers are +missing. Install the appropriate development package for your platform. + +```bash +# Ensure development headers are installed +# Debian/Ubuntu +sudo apt-get install postgresql-server-dev-17 + +# RHEL/Rocky +sudo yum install postgresql17-devel + +# Verify pg_config works +pg_config --includedir-server +``` diff --git a/mkdocs.yml b/mkdocs.yml index 6852334..78fbff7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -69,9 +69,11 @@ markdown_extensions: nav: - Home: index.md + - Using Semantic Caching: architecture.md - Getting Started: - - Installation: installation.md - - Configuration: configuration.md + - Quick Start Guide: quick_start.md + - Building from Source: installation.md + - Configuring pg_semantic_cache: configuration.md - Usage: - Use Cases: use_cases.md - Monitoring: monitoring.md @@ -102,4 +104,5 @@ nav: - get_cost_savings: functions/get_cost_savings.md - Utility: - init_schema: functions/init_schema.md + - Troubleshooting: troubleshooting.md - FAQ: FAQ.md From 1c0401805ced4c1ff24f5c2f101daca078bc6405 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Fri, 6 Mar 2026 10:01:28 -0500 Subject: [PATCH 04/12] Updates to use_cases.md - editing --- docs/use_cases.md | 216 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 168 insertions(+), 48 deletions(-) diff --git a/docs/use_cases.md b/docs/use_cases.md index 6bc6358..2a9ecbf 100644 --- a/docs/use_cases.md +++ b/docs/use_cases.md @@ -1,16 +1,24 @@ # Use Cases -Practical examples and integration patterns for pg_semantic_cache in real-world applications. 
+This document provides practical examples and integration patterns for the +pg_semantic_cache extension in real-world applications. ## LLM and AI Applications +This section demonstrates how to use the pg_semantic_cache extension to +optimize costs and performance in LLM and AI-powered applications. + ### RAG (Retrieval Augmented Generation) Caching -Cache expensive LLM API calls based on semantic similarity of user questions. +The RAG caching pattern addresses the challenge of expensive LLM API calls by +caching responses based on semantic similarity of user questions. -**Problem**: LLM API calls cost $0.02-$0.05 per request. Users ask similar questions differently. +LLM API calls typically cost between $0.02 and $0.05 per request, and users +often ask similar questions using different wording. The pg_semantic_cache +extension solves this problem by caching LLM responses with semantic matching. -**Solution**: Cache LLM responses with semantic matching. +In the following example, the `SemanticLLMCache` class uses the OpenAI API to +generate embeddings and cache LLM responses based on semantic similarity. ```python import openai @@ -45,7 +53,8 @@ class SemanticLLMCache: result = cur.fetchone() if result: # Cache hit - print(f"✓ Cache HIT (similarity: {result[2]:.4f}, age: {result[3]}s)") + print(f"✓ Cache HIT (similarity: {result[2]:.4f}, " + f"age: {result[3]}s)") return json.loads(result[1]) # Cache miss - call actual LLM @@ -82,14 +91,23 @@ cache = SemanticLLMCache("dbname=mydb user=postgres") # These similar questions will hit the cache cache.ask_llm_cached("What was our Q4 revenue?") -cache.ask_llm_cached("Show me Q4 revenue") # Cache hit! -cache.ask_llm_cached("Q4 revenue please") # Cache hit! +cache.ask_llm_cached("Show me Q4 revenue") # Cache hit! +cache.ask_llm_cached("Q4 revenue please") # Cache hit! 
``` -**Savings**: With 80% hit rate on 10K daily queries: **$140/day** or **$51,100/year** +An organization processing 10,000 daily queries with an 80% cache hit rate +can save approximately $140 per day or $51,100 per year using this approach. ### Chatbot Response Caching +The chatbot response caching pattern optimizes conversational AI +applications by storing and reusing responses for semantically similar +user messages. + +In the following example, the `ChatbotCache` class uses TypeScript to +implement a caching layer for chatbot responses with configurable +similarity thresholds. + ```typescript import { OpenAI } from 'openai'; import { Pool } from 'pg'; @@ -123,7 +141,7 @@ class ChatbotCache { // Check cache const cacheResult = await this.pool.query( - 'SELECT * FROM semantic_cache.get_cached_result($1, 0.92)', + `SELECT * FROM semantic_cache.get_cached_result($1, 0.92)`, [embeddingStr] ); @@ -148,7 +166,8 @@ class ChatbotCache { // Cache response await this.pool.query( - `SELECT semantic_cache.cache_query($1, $2, $3::jsonb, 3600, ARRAY['chatbot'])`, + `SELECT semantic_cache.cache_query( + $1, $2, $3::jsonb, 3600, ARRAY['chatbot'])`, [userMessage, embeddingStr, JSON.stringify({ answer })] ); @@ -159,9 +178,18 @@ class ChatbotCache { ## Analytics and Reporting +This section demonstrates how to use the pg_semantic_cache extension to +improve performance of analytical queries and reporting workloads. + ### Dashboard Query Caching -Cache expensive analytical queries that power dashboards. +The dashboard query caching pattern reduces latency for expensive +analytical queries that power business intelligence dashboards and +reporting tools. + +In the following example, the `app.get_sales_analytics` function uses +a deterministic embedding to cache analytics results for a configurable +TTL period. 
```sql -- Application caching wrapper for analytics @@ -180,7 +208,8 @@ BEGIN -- (In production, use actual embedding service) query_embedding := ( SELECT array_agg( - (hashtext((query_text || params::text)::text) + i)::float / 2147483647 + (hashtext((query_text || params::text)::text) + i)::float + / 2147483647 )::text FROM generate_series(1, 1536) i ); @@ -234,16 +263,24 @@ $$ LANGUAGE plpgsql; -- Usage SELECT app.get_sales_analytics( 'Total sales and order metrics', - '{"period": "Q4", "start_date": "2024-10-01", "end_date": "2024-12-31"}'::jsonb + '{"period": "Q4", "start_date": "2024-10-01", + "end_date": "2024-12-31"}'::jsonb ); ``` ### Time-Series Report Caching +The time-series report caching pattern optimizes recurring reports by +adjusting cache TTL based on the temporal granularity of the data being +reported. + +In the following example, the `app.cached_time_series_report` function +uses different TTL values for daily, weekly, and monthly reports. + ```sql -- Cache daily/weekly/monthly reports CREATE OR REPLACE FUNCTION app.cached_time_series_report( - report_type TEXT, -- 'daily', 'weekly', 'monthly' + report_type TEXT, -- 'daily', 'weekly', 'monthly' metric_name TEXT ) RETURNS TABLE(period DATE, value NUMERIC) AS $$ DECLARE @@ -265,12 +302,15 @@ BEGIN END; -- Try cache - SELECT * INTO cached FROM semantic_cache.get_cached_result(query_emb, 0.95); + SELECT * INTO cached + FROM semantic_cache.get_cached_result(query_emb, 0.95); IF cached.found IS NOT NULL THEN -- Return cached data as table RETURN QUERY - SELECT (item->>'period')::DATE, (item->>'value')::NUMERIC + SELECT + (item->>'period')::DATE, + (item->>'value')::NUMERIC FROM jsonb_array_elements(cached.result_data->'data') item; RETURN; END IF; @@ -279,7 +319,7 @@ BEGIN PERFORM semantic_cache.cache_query( format('Report: %s - %s', report_type, metric_name), query_emb, - '{"data": []}'::jsonb, -- Your actual query results + '{"data": []}'::jsonb, -- Your actual query results ttl_seconds, 
ARRAY['reports', report_type] ); @@ -291,9 +331,18 @@ $$ LANGUAGE plpgsql; ## External API Results +This section demonstrates how to use the pg_semantic_cache extension to +reduce costs and latency when integrating with third-party external APIs. + ### Third-Party API Response Caching -Cache responses from expensive external APIs (weather, geocoding, stock prices, etc.). +The external API caching pattern stores responses from expensive +third-party APIs such as weather services, geocoding providers, and stock +price feeds. + +In the following example, the `APICache` class uses the +sentence-transformers library to generate embeddings and cache API +responses with semantic matching. ```python import requests @@ -310,7 +359,8 @@ class APICache: Fetch from API with semantic caching Args: - query: Natural language query (e.g., "weather in San Francisco") + query: Natural language query + (e.g., "weather in San Francisco") api_call_fn: Function to call API ttl: Cache TTL in seconds """ @@ -338,22 +388,27 @@ class APICache: import json cur.execute(""" SELECT semantic_cache.cache_query( - %s, %s, %s::jsonb, %s, ARRAY['api', 'external'] + %s, %s, %s::jsonb, %s, + ARRAY['api', 'external'] ) """, (query, embedding_str, json.dumps(api_response), ttl)) self.conn.commit() return api_response +``` -# Usage examples +The following examples demonstrate how to use the `APICache` class with +different external APIs using appropriate TTL values for each use case. 
+```python # Weather API def get_weather(city): cache = APICache("dbname=mydb") return cache.fetch_with_cache( f"weather in {city}", - lambda: requests.get(f"https://api.weather.com/{city}").json(), - ttl=1800 # 30 minutes + lambda: requests.get( + f"https://api.weather.com/{city}").json(), + ttl=1800 # 30 minutes ) # Geocoding API @@ -361,7 +416,8 @@ def geocode(address): cache = APICache("dbname=mydb") return cache.fetch_with_cache( f"geocode {address}", - lambda: requests.get(f"https://api.geocode.com?q={address}").json(), + lambda: requests.get( + f"https://api.geocode.com?q={address}").json(), ttl=86400 # 24 hours (addresses don't change) ) @@ -370,21 +426,31 @@ def get_stock_price(symbol): cache = APICache("dbname=mydb") return cache.fetch_with_cache( f"stock price {symbol}", - lambda: requests.get(f"https://api.stocks.com/{symbol}").json(), - ttl=60 # 1 minute (real-time data) + lambda: requests.get( + f"https://api.stocks.com/{symbol}").json(), + ttl=60 # 1 minute (real-time data) ) ``` ## Database Query Optimization +This section demonstrates how to use the pg_semantic_cache extension to +optimize expensive database queries and reduce computational overhead. + ### Expensive Join Caching -Cache results from expensive multi-table joins. +The expensive join caching pattern stores results from complex multi-table +joins to avoid repeated execution of resource-intensive database +operations. + +In the following example, the `app.get_customer_summary` function caches +the results of a complex customer data aggregation query with multiple +joins. 
```sql -- Wrap expensive queries with semantic caching CREATE OR REPLACE FUNCTION app.get_customer_summary( - customer_identifier TEXT -- email, name, or ID + customer_identifier TEXT -- email, name, or ID ) RETURNS JSONB AS $$ DECLARE query_emb TEXT; @@ -393,13 +459,18 @@ DECLARE BEGIN -- Simple embedding generation (replace with actual service) query_emb := ( - SELECT array_agg((hashtext(customer_identifier || i::text)::float / 2147483647)::float4)::text + SELECT array_agg( + (hashtext(customer_identifier || i::text)::float + / 2147483647)::float4 + )::text FROM generate_series(1, 1536) i ); -- Check cache SELECT * INTO cached - FROM semantic_cache.get_cached_result(query_emb, 0.98, 300); + FROM semantic_cache.get_cached_result( + query_emb, 0.98, 300 + ); IF cached.found IS NOT NULL THEN RETURN cached.result_data; @@ -444,19 +515,37 @@ $$ LANGUAGE plpgsql; -- Usage - these similar queries hit cache: SELECT app.get_customer_summary('[email protected]'); -SELECT app.get_customer_summary('john@example.com'); -- Exact match -SELECT app.get_customer_summary('John Doe'); -- By name -SELECT app.get_customer_summary('john'); -- Partial match +SELECT app.get_customer_summary('john@example.com'); + -- Exact match +SELECT app.get_customer_summary('John Doe'); + -- By name +SELECT app.get_customer_summary('john'); + -- Partial match ``` ## Scheduled Maintenance +This section demonstrates how to implement automated maintenance routines +for the pg_semantic_cache extension to ensure optimal performance and +storage use. + ### Automatic Cache Cleanup +The automatic cache cleanup pattern uses scheduled maintenance functions +to evict expired entries and optimize cache storage on a regular basis. + +In the following example, the `semantic_cache.scheduled_maintenance` +function performs multiple maintenance operations and returns timing +information. 
+ ```sql -- Create maintenance function CREATE OR REPLACE FUNCTION semantic_cache.scheduled_maintenance() -RETURNS TABLE(operation TEXT, affected_rows BIGINT, duration INTERVAL) AS $$ +RETURNS TABLE( + operation TEXT, + affected_rows BIGINT, + duration INTERVAL +) AS $$ DECLARE start_time TIMESTAMPTZ; evicted BIGINT; @@ -502,7 +591,12 @@ SELECT * FROM semantic_cache.scheduled_maintenance(); ### Cache Warming -Pre-populate cache with common queries. +The cache warming pattern pre-populates the cache with common queries to +improve application performance during startup or after cache +invalidation. + +In the following example, the `app.warm_cache` function pre-caches +frequently accessed dashboard queries to reduce initial page load times. ```sql -- Warm cache with popular queries @@ -514,8 +608,10 @@ BEGIN -- Example: Pre-cache common dashboard queries PERFORM semantic_cache.cache_query( 'Total sales this month', - (SELECT array_agg(random()::float4)::text FROM generate_series(1, 1536)), - (SELECT jsonb_build_object('total', SUM(amount)) FROM orders + (SELECT array_agg(random()::float4)::text + FROM generate_series(1, 1536)), + (SELECT jsonb_build_object('total', SUM(amount)) + FROM orders WHERE created_at >= DATE_TRUNC('month', NOW())), 3600, ARRAY['dashboard', 'warmed'] @@ -534,9 +630,19 @@ SELECT app.warm_cache(); ## Multi-Language Support +This section demonstrates how to use the pg_semantic_cache extension to +support caching across multiple languages using multilingual embedding +models. + ### Caching Across Languages -Cache queries regardless of language using embeddings. +The multilingual caching pattern enables cache hits across different +languages by using multilingual embedding models that map semantically +similar queries. + +In the following example, the `MultilingualCache` class uses the +multilingual mpnet model to cache queries across English, Spanish, French, +and Portuguese. 
```python from sentence_transformers import SentenceTransformer @@ -546,7 +652,9 @@ class MultilingualCache: def __init__(self, db_conn_string): self.conn = psycopg2.connect(db_conn_string) # Use multilingual model - self.encoder = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2') + self.encoder = SentenceTransformer( + 'paraphrase-multilingual-mpnet-base-v2' + ) def cached_query(self, query_text, language): """Cache works across languages!""" @@ -556,7 +664,8 @@ class MultilingualCache: # Check cache (works for all languages) cur = self.conn.cursor() cur.execute(""" - SELECT * FROM semantic_cache.get_cached_result(%s, 0.90) + SELECT * + FROM semantic_cache.get_cached_result(%s, 0.90) """, (embedding_str,)) result = cur.fetchone() @@ -566,18 +675,29 @@ class MultilingualCache: # Execute query and cache # ... your query logic ... -# These queries in different languages can hit the same cache entry! +# These queries in different languages can hit the same cache +# entry! cache = MultilingualCache("dbname=mydb") cache.cached_query("What is the total revenue?", "en") -cache.cached_query("¿Cuál es el ingreso total?", "es") # Cache hit! -cache.cached_query("Quel est le revenu total?", "fr") # Cache hit! -cache.cached_query("Qual é a receita total?", "pt") # Cache hit! +cache.cached_query("¿Cuál es el ingreso total?", "es") + # Cache hit! +cache.cached_query("Quel est le revenu total?", "fr") + # Cache hit! +cache.cached_query("Qual é a receita total?", "pt") + # Cache hit! ``` ## Next Steps -- [Functions Reference](functions/index.md) - Learn all available functions -- [Monitoring](monitoring.md) - Track cache performance -- [Configuration](configuration.md) - Optimize for your use case -- [FAQ](FAQ.md) - Common questions and solutions +The following resources provide additional information about the +pg_semantic_cache extension: + +- The [Functions Reference](functions/index.md) document describes all + available functions. 
+- The [Monitoring](monitoring.md) document explains how to track cache + performance. +- The [Configuration](configuration.md) document provides optimization + guidance for your use case. +- The [FAQ](FAQ.md) document answers common questions and provides + solutions. From 4e779ffb0456ee39b4db17b7a7ac6645ef482cb0 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Mon, 9 Mar 2026 10:43:24 -0400 Subject: [PATCH 05/12] Added functions.md to control access to functions --- README.md | 164 +++++++++++++++++++++------------------------- docs/functions.md | 26 ++++++++ 2 files changed, 101 insertions(+), 89 deletions(-) create mode 100644 docs/functions.md diff --git a/README.md b/README.md index 359969b..267f0ba 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,4 @@ -
- -# 🗄️ pg_semantic_cache +# pg_semantic_cache ### Intelligent Query Result Caching for PostgreSQL @@ -10,27 +8,27 @@ [![License](https://img.shields.io/badge/License-PostgreSQL-blue.svg)](LICENSE) [![pgvector](https://img.shields.io/badge/Requires-pgvector-orange.svg)](https://github.com/pgvector/pgvector) -[Quick Start](#-quick-start) • -[Features](#-key-features) • -[API Reference](#-api-reference) • -[Examples](#-integration-examples) • -[Performance](#-performance) +[Quick Start](#quick-start) • +[Features](#key-features) • +[API Reference](#api-reference) • +[Examples](#integration-examples) • +[Performance](#performance)
--- -## 🎯 Overview +## Overview `pg_semantic_cache` enables **semantic query result caching** in PostgreSQL. Unlike traditional caching that requires exact query matches, this extension uses vector embeddings to find and retrieve cached results for semantically similar queries. ### Perfect For -- 🤖 **AI/LLM Applications** - Cache expensive LLM responses for similar questions -- 🔍 **RAG Pipelines** - Speed up retrieval-augmented generation workflows -- 📊 **Analytics Dashboards** - Reuse results for similar analytical queries -- 💬 **Chatbots** - Reduce latency by caching semantically similar conversations -- 🔎 **Search Systems** - Handle query variations without re-execution +- **AI/LLM Applications** - Cache expensive LLM responses for similar questions +- **RAG Pipelines** - Speed up retrieval-augmented generation workflows +- **Analytics Dashboards** - Reuse results for similar analytical queries +- **Chatbots** - Reduce latency by caching semantically similar conversations +- **Search Systems** - Handle query variations without re-execution ### How It Works @@ -43,40 +41,42 @@ │ Similar cached query found: │ │ "Show me revenue for last quarter" (similarity: 97%) │ │ ↓ Return cached result (2ms instead of 500ms) │ -│ ✅ Cache HIT - 250x faster! │ +│ Cache HIT - 250x faster! │ └─────────────────────────────────────────────────────────────┘ ``` --- -## ✨ Key Features +## Key Features + +`pg_semantic_cache` provides a comprehensive set of capabilities designed for production use. -### 🧠 Semantic Intelligence +### Semantic Intelligence - **Vector-based matching** using pgvector for similarity search - **Configurable similarity thresholds** (default: 95%) - **Cosine distance** calculations for accurate semantic matching - Support for any embedding model (OpenAI, Cohere, custom, etc.) 
-### ⚡ High Performance +### High Performance - **Sub-5ms cache lookups** with optimized vector indexing - **Efficient storage** with minimal overhead per entry - **Fast eviction** mechanisms to maintain cache health - **Index optimization** support for large-scale deployments (100k+ entries) -### 🎛️ Flexible Cache Management +### Flexible Cache Management - **Multiple eviction policies**: LRU, LFU, and TTL-based - **Per-query TTL** or global defaults - **Tag-based organization** for grouped invalidation - **Pattern-based invalidation** using SQL LIKE patterns - **Auto-eviction** with configurable policies -### 📊 Observability & Monitoring +### Observability & Monitoring - **Real-time statistics**: hit rate, total entries, cache size - **Health metrics**: expired entries, memory usage, eviction counts - **Performance tracking**: lookup times, similarity scores - **Built-in views** for monitoring and analysis -### 🔧 Production Ready +### Production Ready - **Comprehensive logging** with configurable levels - **Crash-safe** error handling - **ACID compliance** for cache operations @@ -85,7 +85,7 @@ --- -## 🚀 Quick Start +## Quick Start ### Installation @@ -130,13 +130,13 @@ SELECT semantic_cache.init_schema(); SELECT * FROM semantic_cache.cache_stats(); ``` -✅ **You're ready to go!** +**You're ready to go!** --- -## 📘 Basic Usage +## Basic Usage -### 1️⃣ Cache a Query Result +### 1. Cache a Query Result ```sql SELECT semantic_cache.cache_query( @@ -149,7 +149,7 @@ SELECT semantic_cache.cache_query( -- Returns: cache_id (bigint) ``` -### 2️⃣ Retrieve Cached Result +### 2. Retrieve Cached Result ```sql SELECT * FROM semantic_cache.get_cached_result( @@ -167,7 +167,7 @@ SELECT * FROM semantic_cache.get_cached_result( true | {"total": 150, "orders"... | 0.973 | 245 ``` -### 3️⃣ Monitor Performance +### 3. 
Monitor Performance ```sql -- Comprehensive statistics @@ -182,7 +182,9 @@ SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10; --- -## 📚 API Reference +## API Reference + +The extension provides a complete set of SQL functions for caching, eviction, monitoring, and configuration. ### Core Functions @@ -220,6 +222,8 @@ Retrieve a cached result by semantic similarity. ### Cache Eviction +Multiple eviction strategies are available to manage cache size and freshness. + #### `evict_expired()` Remove all expired cache entries. @@ -249,7 +253,7 @@ SELECT semantic_cache.auto_evict(); ``` #### `clear_cache()` -⚠️ Remove **all** cache entries (use with caution). +Remove **all** cache entries (use with caution). ```sql SELECT semantic_cache.clear_cache(); @@ -259,6 +263,8 @@ SELECT semantic_cache.clear_cache(); ### Statistics & Monitoring +Built-in functions and views provide real-time visibility into cache performance. + #### `cache_stats()` Get comprehensive cache statistics. @@ -280,6 +286,8 @@ hit_rate_percent | Hit rate as a percentage ### Configuration +All runtime settings can be configured through the cache configuration table. + Configuration settings are stored in the `semantic_cache.cache_config` table. You can view and modify them directly: ```sql @@ -305,7 +313,9 @@ SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy'; --- -## 🔨 Build & Development +## Build & Development + +The extension uses the standard PostgreSQL PGXS build system for compilation and installation. ### Build Commands @@ -341,35 +351,37 @@ Fully compatible with all PostgreSQL-supported platforms: | Platform | Status | Notes | |----------|--------|-------| -| 🐧 Linux | ✅ | Ubuntu, Debian, RHEL, Rocky, Fedora, etc. | -| 🍎 macOS | ✅ | Intel & Apple Silicon | -| 🪟 Windows | ✅ | Via MinGW or MSVC | -| 🔧 BSD | ✅ | FreeBSD, OpenBSD | +| Linux | Supported | Ubuntu, Debian, RHEL, Rocky, Fedora, etc. 
| +| macOS | Supported | Intel & Apple Silicon | +| Windows | Supported | Via MinGW or MSVC | +| BSD | Supported | FreeBSD, OpenBSD | ### Tested PostgreSQL Versions | Version | Status | Notes | |---------|--------|-------| -| PG 14 | ✅ Tested | Full support | -| PG 15 | ✅ Tested | Full support | -| PG 16 | ✅ Tested | Full support | -| PG 17 | ✅ Tested | Full support | -| PG 18 | ✅ Tested | Full support | -| Future versions | ✅ Expected | Standard PGXS compatibility | +| PG 14 | Tested | Full support | +| PG 15 | Tested | Full support | +| PG 16 | Tested | Full support | +| PG 17 | Tested | Full support | +| PG 18 | Tested | Full support | +| Future versions | Expected | Standard PGXS compatibility | --- -## ⚡ Performance +## Performance + +The extension is optimized for sub-millisecond cache lookups with minimal overhead. ### Runtime Metrics | Operation | Performance | Notes | |-----------|-------------|-------| -| 🔍 Cache lookup | **< 5ms** | With optimized vector index | -| 💾 Cache insert | **< 10ms** | Including embedding storage | -| 🗑️ Eviction (1000 entries) | **< 50ms** | Efficient batch operations | -| 📊 Statistics query | **< 1ms** | Materialized views | -| 🎯 Similarity search | **2-3ms avg** | IVFFlat/HNSW indexed | +| Cache lookup | **< 5ms** | With optimized vector index | +| Cache insert | **< 10ms** | Including embedding storage | +| Eviction (1000 entries) | **< 50ms** | Efficient batch operations | +| Statistics query | **< 1ms** | Materialized views | +| Similarity search | **2-3ms avg** | IVFFlat/HNSW indexed | ### Expected Hit Rates @@ -407,7 +419,9 @@ Evict LRU | 500 | ~25ms | 0.05ms --- -## 🏭 Production Deployment +## Production Deployment + +For production environments, optimize PostgreSQL settings and set up automated maintenance. ### PostgreSQL Configuration @@ -451,6 +465,8 @@ SELECT * FROM cron.job WHERE jobname LIKE 'semantic-cache%'; ### Index Optimization +Choose the appropriate vector index strategy based on your cache size. 
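A quick way to pick a strategy is to check how many entries the cache currently holds. The following query is a small sketch against the `semantic_cache.cache_entries` table used elsewhere in this guide; the size bands mirror the recommendations below.

```sql
-- Check the cache size to choose an index strategy
SELECT COUNT(*) AS entry_count,
       CASE
           WHEN COUNT(*) < 100000  THEN 'default IVFFlat is fine'
           WHEN COUNT(*) < 1000000 THEN 'increase IVFFlat lists (e.g. 1000)'
           ELSE 'consider HNSW for higher recall'
       END AS suggested_strategy
FROM semantic_cache.cache_entries;
```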
+ #### Small to Medium Caches (< 100k entries) Default IVFFlat index works well out of the box. @@ -483,6 +499,8 @@ CREATE INDEX idx_cache_embedding_hnsw ### Monitoring Setup +Set up custom views to monitor cache health and performance metrics. + Create a monitoring dashboard view: ```sql @@ -501,6 +519,8 @@ SELECT * FROM semantic_cache.production_dashboard; ### High Availability Considerations +The cache integrates seamlessly with PostgreSQL's replication and backup mechanisms. + ```sql -- Regular backups of cache metadata (optional) pg_dump -U postgres -d your_db -t semantic_cache.cache_entries -t semantic_cache.cache_metadata -F c -f cache_backup.dump @@ -511,7 +531,7 @@ pg_dump -U postgres -d your_db -t semantic_cache.cache_entries -t semantic_cache --- -## 🔗 Integration Examples +## Integration Examples ### Python with OpenAI @@ -569,10 +589,10 @@ class SemanticCache: result = cur.fetchone() if result and result[0]: # Cache hit - print(f"✅ Cache HIT (similarity: {result[2]:.3f}, age: {result[3]}s)") + print(f"Cache HIT (similarity: {result[2]:.3f}, age: {result[3]}s)") return json.loads(result[1]) else: - print("❌ Cache MISS") + print("Cache MISS") return None def stats(self) -> Dict[str, Any]: @@ -651,10 +671,10 @@ class SemanticCache { const { found, result_data, similarity_score, age_seconds } = res.rows[0]; if (found) { - console.log(`✅ Cache HIT (similarity: ${similarity_score.toFixed(3)}, age: ${age_seconds}s)`); + console.log(`Cache HIT (similarity: ${similarity_score.toFixed(3)}, age: ${age_seconds}s)`); return JSON.parse(result_data); } else { - console.log('❌ Cache MISS'); + console.log('Cache MISS'); return null; } } @@ -689,7 +709,7 @@ For additional integration patterns and use cases, see: --- -## 🤝 Contributing +## Contributing Contributions are welcome! This extension is built with standard PostgreSQL C APIs. @@ -708,48 +728,14 @@ Contributions are welcome! 
This extension is built with standard PostgreSQL C AP --- -## 📄 License +## License This project is licensed under the **PostgreSQL License**. --- -## 📞 Support & Resources - -### Documentation -- **Getting Started**: [GETTING_STARTED.md](GETTING_STARTED.md) -- **API Examples**: `examples/usage_examples.sql` -- **Logging Guide**: [LOGGING_FEATURE_GUIDE.md](LOGGING_FEATURE_GUIDE.md) -- **PostgreSQL Documentation**: [postgresql.org/docs](https://www.postgresql.org/docs/) +## Support & Resources -### Getting Help - **GitHub Issues**: Report bugs and request features - **Example Code**: Check `examples/` directory for usage patterns - **Test Suite**: See `test/` directory for comprehensive examples - -### Related Projects -- [pgvector](https://github.com/pgvector/pgvector) - Vector similarity search for PostgreSQL -- [pg_cron](https://github.com/citusdata/pg_cron) - Job scheduler for PostgreSQL - ---- - -## 🏆 Credits - -**Created by**: Muhammad Aqeel - PostgreSQL Infrastructure Engineer - -**Built with**: -- Standard PostgreSQL C API -- [pgvector](https://github.com/pgvector/pgvector) for vector operations -- PGXS build infrastructure - ---- - -
- -### ⭐ Star this repository if you find it useful! - -**pg_semantic_cache** - Intelligent semantic caching for PostgreSQL - -[Quick Start](#-quick-start) • [Documentation](#-api-reference) • [Examples](#-integration-examples) - -
diff --git a/docs/functions.md b/docs/functions.md new file mode 100644 index 0000000..2dbfee5 --- /dev/null +++ b/docs/functions.md @@ -0,0 +1,26 @@ +# Using pg_semantic_cache Functions + +This page provides a comprehensive reference for all available functions in the pg_semantic_cache extension. + +## Function Reference + +| Function | Description | +|----------|-------------| +| [auto_evict](functions/auto_evict.md) | Automatically evicts entries based on configured policy (LRU, LFU, or TTL). | +| [cache_hit_rate](functions/cache_hit_rate.md) | Gets current cache hit rate as a percentage. | +| [cache_query](functions/cache_query.md) | Stores a query result with its vector embedding in the cache. | +| [cache_stats](functions/cache_stats.md) | Gets comprehensive cache statistics including hits, misses, and hit rate. | +| [clear_cache](functions/clear_cache.md) | Removes all cache entries (use with caution). | +| [evict_expired](functions/evict_expired.md) | Removes all expired cache entries based on TTL. | +| [evict_lfu](functions/evict_lfu.md) | Evicts least frequently used entries, keeping only specified count. | +| [evict_lru](functions/evict_lru.md) | Evicts least recently used entries, keeping only specified count. | +| [get_cached_result](functions/get_cached_result.md) | Retrieves a cached result by semantic similarity search. | +| [get_cost_savings](functions/get_cost_savings.md) | Calculates estimated cost savings from cache usage. | +| [get_index_type](functions/get_index_type.md) | Gets the current vector index type (IVFFlat or HNSW). | +| [get_vector_dimension](functions/get_vector_dimension.md) | Gets the current vector embedding dimension. | +| [init_schema](functions/init_schema.md) | Initializes cache schema and creates required tables, indexes, and views. | +| [invalidate_cache](functions/invalidate_cache.md) | Invalidates cache entries by pattern matching or tags. 
| +| [log_cache_access](functions/log_cache_access.md) | Logs cache access events for debugging and analysis. | +| [rebuild_index](functions/rebuild_index.md) | Rebuilds the vector similarity index for optimal performance. | +| [set_index_type](functions/set_index_type.md) | Sets the vector index type for similarity search. | +| [set_vector_dimension](functions/set_vector_dimension.md) | Sets the vector embedding dimension. | From 270f38db458b70def7c4f9381feb66a072e0a92b Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Mon, 9 Mar 2026 10:48:57 -0400 Subject: [PATCH 06/12] Added integration.md for integration examples --- integration.md | 178 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 178 insertions(+) create mode 100644 integration.md diff --git a/integration.md b/integration.md new file mode 100644 index 0000000..d6cc83e --- /dev/null +++ b/integration.md @@ -0,0 +1,178 @@ +# Integration Examples + +Refer to the following integration examples when configuring pg_semantic_cache. 
+ +### Python with OpenAI + +Complete example integrating semantic cache with OpenAI embeddings: + +```python +import psycopg2 +import openai +import json +from typing import Optional, Dict, Any + +class SemanticCache: + """Semantic cache wrapper for PostgreSQL""" + + def __init__(self, conn_string: str, openai_api_key: str): + self.conn = psycopg2.connect(conn_string) + self.client = openai.OpenAI(api_key=openai_api_key) + + def _get_embedding(self, text: str) -> str: + """Generate embedding using OpenAI""" + response = self.client.embeddings.create( + model="text-embedding-ada-002", + input=text + ) + embedding = response.data[0].embedding + return f"[{','.join(map(str, embedding))}]" + + def cache(self, query: str, result: Dict[Any, Any], + ttl: int = 3600, tags: Optional[list] = None) -> int: + """Cache a query result""" + embedding = self._get_embedding(query) + + with self.conn.cursor() as cur: + cur.execute(""" + SELECT semantic_cache.cache_query( + %s::text, %s::text, %s::jsonb, %s::int, %s::text[] + ) + """, (query, embedding, json.dumps(result), ttl, tags)) + cache_id = cur.fetchone()[0] + self.conn.commit() + return cache_id + + def get(self, query: str, similarity: float = 0.95, + max_age: Optional[int] = None) -> Optional[Dict[Any, Any]]: + """Retrieve from cache""" + embedding = self._get_embedding(query) + + with self.conn.cursor() as cur: + cur.execute(""" + SELECT found, result_data, similarity_score, age_seconds + FROM semantic_cache.get_cached_result( + %s::text, %s::float4, %s::int + ) + """, (embedding, similarity, max_age)) + + result = cur.fetchone() + if result and result[0]: # Cache hit + print(f"Cache HIT (similarity: {result[2]:.3f}, age: {result[3]}s)") + return json.loads(result[1]) + else: + print("Cache MISS") + return None + + def stats(self) -> Dict[str, Any]: + """Get cache statistics""" + with self.conn.cursor() as cur: + cur.execute("SELECT * FROM semantic_cache.cache_stats()") + columns = [desc[0] for desc in cur.description] + 
values = cur.fetchone() + return dict(zip(columns, values)) + +# Usage example +cache = SemanticCache( + conn_string="dbname=mydb user=postgres", + openai_api_key="sk-..." +) + +# Try to get from cache, compute if miss +def get_revenue_data(query: str) -> Dict: + result = cache.get(query, similarity=0.95) + + if result: + return result # Cache hit! + + # Cache miss - compute the result + result = expensive_database_query() # Your expensive query here + cache.cache(query, result, ttl=3600, tags=['revenue', 'analytics']) + return result + +# Example queries +data1 = get_revenue_data("What was Q4 2024 revenue?") +data2 = get_revenue_data("Show me revenue for last quarter") # Will hit cache! +data3 = get_revenue_data("Q4 sales figures?") # Will also hit cache! + +# View statistics +print(cache.stats()) +``` + +### Node.js with OpenAI + +```javascript +const { Client } = require('pg'); +const OpenAI = require('openai'); + +class SemanticCache { + constructor(pgConfig, openaiApiKey) { + this.client = new Client(pgConfig); + this.openai = new OpenAI({ apiKey: openaiApiKey }); + this.client.connect(); + } + + async getEmbedding(text) { + const response = await this.openai.embeddings.create({ + model: 'text-embedding-ada-002', + input: text + }); + const embedding = response.data[0].embedding; + return `[${embedding.join(',')}]`; + } + + async cache(query, result, ttl = 3600, tags = null) { + const embedding = await this.getEmbedding(query); + const res = await this.client.query( + `SELECT semantic_cache.cache_query($1::text, $2::text, $3::jsonb, $4::int, $5::text[])`, + [query, embedding, JSON.stringify(result), ttl, tags] + ); + return res.rows[0].cache_query; + } + + async get(query, similarity = 0.95, maxAge = null) { + const embedding = await this.getEmbedding(query); + const res = await this.client.query( + `SELECT * FROM semantic_cache.get_cached_result($1::text, $2::float4, $3::int)`, + [embedding, similarity, maxAge] + ); + + const { found, result_data, 
similarity_score, age_seconds } = res.rows[0]; + + if (found) { + console.log(`Cache HIT (similarity: ${similarity_score.toFixed(3)}, age: ${age_seconds}s)`); + return JSON.parse(result_data); + } else { + console.log('Cache MISS'); + return null; + } + } + + async stats() { + const res = await this.client.query('SELECT * FROM semantic_cache.cache_stats()'); + return res.rows[0]; + } +} + +// Usage +const cache = new SemanticCache( + { host: 'localhost', database: 'mydb', user: 'postgres' }, + 'sk-...' +); + +async function getRevenueData(query) { + const cached = await cache.get(query); + if (cached) return cached; + + const result = await expensiveDatabaseQuery(); + await cache.cache(query, result, 3600, ['revenue', 'analytics']); + return result; +} +``` + +### More Examples + +For additional integration patterns and use cases, see: +- `examples/usage_examples.sql` - Comprehensive SQL examples +- `test/benchmark.sql` - Performance testing examples + From ecda1f4048018b43a3fb400207a23f72c67252af Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Mon, 9 Mar 2026 11:57:16 -0400 Subject: [PATCH 07/12] Updates to pg_semantic_cache documentation --- README.md | 769 ++++---------------------- docs/LICENSE.md | 19 + docs/architecture.md | 14 +- docs/development.md | 53 ++ docs/functions.md | 99 +++- docs/index.md | 12 + integration.md => docs/integration.md | 39 +- docs/performance.md | 70 +++ docs/production.md | 118 ++++ 9 files changed, 513 insertions(+), 680 deletions(-) create mode 100644 docs/LICENSE.md create mode 100644 docs/development.md rename integration.md => docs/integration.md (78%) create mode 100644 docs/performance.md create mode 100644 docs/production.md diff --git a/README.md b/README.md index 267f0ba..a8590a1 100644 --- a/README.md +++ b/README.md @@ -1,286 +1,71 @@ # pg_semantic_cache -### Intelligent Query Result Caching for PostgreSQL - -**Leverage vector embeddings to cache and retrieve query results based on semantic similarity** - 
-[![PostgreSQL](https://img.shields.io/badge/PostgreSQL-14%20|%2015%20|%2016%20|%2017%20|%2018-336791?style=flat&logo=postgresql&logoColor=white)](https://www.postgresql.org/) -[![License](https://img.shields.io/badge/License-PostgreSQL-blue.svg)](LICENSE) -[![pgvector](https://img.shields.io/badge/Requires-pgvector-orange.svg)](https://github.com/pgvector/pgvector) - -[Quick Start](#quick-start) • -[Features](#key-features) • -[API Reference](#api-reference) • -[Examples](#integration-examples) • -[Performance](#performance) - - - ---- - -## Overview - -`pg_semantic_cache` enables **semantic query result caching** in PostgreSQL. Unlike traditional caching that requires exact query matches, this extension uses vector embeddings to find and retrieve cached results for semantically similar queries. - -### Perfect For - -- **AI/LLM Applications** - Cache expensive LLM responses for similar questions -- **RAG Pipelines** - Speed up retrieval-augmented generation workflows -- **Analytics Dashboards** - Reuse results for similar analytical queries -- **Chatbots** - Reduce latency by caching semantically similar conversations -- **Search Systems** - Handle query variations without re-execution - -### How It Works - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Query: "What was Q4 2024 revenue?" │ -│ ↓ Generate embedding via OpenAI/etc │ -│ ↓ Check semantic cache (similarity > 95%) │ -│ │ -│ Similar cached query found: │ -│ "Show me revenue for last quarter" (similarity: 97%) │ -│ ↓ Return cached result (2ms instead of 500ms) │ -│ Cache HIT - 250x faster! │ -└─────────────────────────────────────────────────────────────┘ -``` +pg_semantic_cache allows you to leverage vector embeddings to cache and retrieve query results based on semantic similarity. 
+
+- [pg_semantic_cache Introduction](docs/index.md)
+- [pg_semantic_cache Architecture](docs/architecture.md)
+- [pg_semantic_cache Use Cases](docs/use_cases.md)
+- [Quick Start](docs/quick_start.md)
+- [Installation](docs/installation.md)
+- [Configuration](docs/configuration.md)
+- [Deploying in a Production Environment](docs/deployment.md)
+- [Using pg_semantic_cache Functions](docs/functions.md)
+- [Sample Integrations](docs/integration.md)
+- [Monitoring](docs/logging.md)
+- [Performance and Benchmarking](docs/performance.md)
+- [Logging](docs/logging.md)
+- [Troubleshooting](docs/troubleshooting.md)
+- [FAQ](docs/FAQ.md)
+- [Developers](docs/development.md)

---

-## Key Features
-
-`pg_semantic_cache` provides a comprehensive set of capabilities designed for production use.
-
-### Semantic Intelligence
-- **Vector-based matching** using pgvector for similarity search
-- **Configurable similarity thresholds** (default: 95%)
-- **Cosine distance** calculations for accurate semantic matching
-- Support for any embedding model (OpenAI, Cohere, custom, etc.)
- -### High Performance -- **Sub-5ms cache lookups** with optimized vector indexing -- **Efficient storage** with minimal overhead per entry -- **Fast eviction** mechanisms to maintain cache health -- **Index optimization** support for large-scale deployments (100k+ entries) - -### Flexible Cache Management -- **Multiple eviction policies**: LRU, LFU, and TTL-based -- **Per-query TTL** or global defaults -- **Tag-based organization** for grouped invalidation -- **Pattern-based invalidation** using SQL LIKE patterns -- **Auto-eviction** with configurable policies - -### Observability & Monitoring -- **Real-time statistics**: hit rate, total entries, cache size -- **Health metrics**: expired entries, memory usage, eviction counts -- **Performance tracking**: lookup times, similarity scores -- **Built-in views** for monitoring and analysis - -### Production Ready -- **Comprehensive logging** with configurable levels -- **Crash-safe** error handling -- **ACID compliance** for cache operations -- **Multi-version support**: PostgreSQL 14 through 18+ -- **Standard PGXS** build system for easy packaging - ---- +`pg_semantic_cache` enables **semantic query result caching** for PostgreSQL. Unlike traditional caching that requires exact query matches, this extension uses vector embeddings to find and retrieve cached results for semantically similar queries. 
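The matching rule behind a cache hit can be sketched in a few lines: a new query's embedding is compared against cached embeddings, and a hit occurs when the cosine similarity exceeds the configured threshold. This is an illustrative sketch only — the toy 4-dimensional vectors stand in for real model embeddings, and in the extension itself the comparison is performed by pgvector inside PostgreSQL.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (real models produce hundreds of dimensions).
cached_query = [0.90, 0.10, 0.30, 0.20]   # "Show me revenue for last quarter"
new_query    = [0.88, 0.12, 0.31, 0.20]   # "What was Q4 2024 revenue?"
unrelated    = [0.10, 0.90, 0.00, 0.50]   # "List all open support tickets"

THRESHOLD = 0.95  # default similarity threshold used by the extension

print(cosine_similarity(cached_query, new_query) >= THRESHOLD)   # True  -> cache hit
print(cosine_similarity(cached_query, unrelated) >= THRESHOLD)   # False -> cache miss
```

Semantically similar phrasings land near each other in embedding space, which is why the rephrased query scores well against the cached one while the unrelated query does not.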
## Quick Start -### Installation - -**Step 1: Install Dependencies** - -```bash -# Ubuntu/Debian -sudo apt-get install postgresql-16 postgresql-server-dev-16 postgresql-16-pgvector - -# Rocky Linux/RHEL -sudo dnf install postgresql16 postgresql16-devel postgresql16-contrib - -# macOS (with Homebrew) -brew install postgresql@16 -# Install pgvector separately -``` - -**Step 2: Build & Install Extension** - -```bash -git clone https://github.com/pgedge/pg_semantic_cache.git -cd pg_semantic_cache - -make clean && make -sudo make install -``` - -**Step 3: Enable in PostgreSQL** - -```sql --- Connect to your database -psql -U postgres -d your_database - --- Install required extensions -CREATE EXTENSION IF NOT EXISTS vector; -CREATE EXTENSION IF NOT EXISTS pg_semantic_cache; - --- Initialize the cache schema (run once per database) -SELECT semantic_cache.init_schema(); - --- Verify installation -SELECT * FROM semantic_cache.cache_stats(); -``` - -**You're ready to go!** - ---- - -## Basic Usage - -### 1. Cache a Query Result - -```sql -SELECT semantic_cache.cache_query( - query_text := 'SELECT * FROM orders WHERE status = ''completed''', - embedding := '[0.1, 0.2, 0.3, ...]'::text, -- From OpenAI, Cohere, etc. - result_data := '{"total": 150, "orders": [...]}'::jsonb, - ttl_seconds := 3600, -- 1 hour - tags := ARRAY['orders', 'analytics'] -- Optional tags -); --- Returns: cache_id (bigint) -``` - -### 2. Retrieve Cached Result - -```sql -SELECT * FROM semantic_cache.get_cached_result( - embedding := '[0.11, 0.19, 0.31, ...]'::text, -- Similar query embedding - similarity_threshold := 0.95, -- 95% similarity required - max_age_seconds := NULL -- Any age (optional) -); --- Returns: (found boolean, result_data jsonb, similarity_score float4, age_seconds int) -``` - -**Example Result:** -``` - found | result_data | similarity_score | age_seconds --------+----------------------------+------------------+------------- - true | {"total": 150, "orders"... 
| 0.973 | 245 -``` - -### 3. Monitor Performance - -```sql --- Comprehensive statistics -SELECT * FROM semantic_cache.cache_stats(); - --- Health overview (includes hit rate and more details) -SELECT * FROM semantic_cache.cache_health; - --- Recent cache activity -SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10; -``` - ---- - -## API Reference - -The extension provides a complete set of SQL functions for caching, eviction, monitoring, and configuration. - -### Core Functions - -#### `init_schema()` -Initialize the cache schema, creating all required tables, indexes, and views. - -```sql -SELECT semantic_cache.init_schema(); -``` - -#### `cache_query(query_text, embedding, result_data, ttl_seconds, tags)` -Store a query result with its embedding for future retrieval. - -**Parameters:** -- `query_text` (text) - The original query text -- `embedding` (text) - Vector embedding as text: `'[0.1, 0.2, ...]'` -- `result_data` (jsonb) - The query result to cache -- `ttl_seconds` (int) - Time-to-live in seconds -- `tags` (text[]) - Optional tags for organization - -**Returns:** `bigint` - Cache entry ID - -#### `get_cached_result(embedding, similarity_threshold, max_age_seconds)` -Retrieve a cached result by semantic similarity. - -**Parameters:** -- `embedding` (text) - Query embedding to search for -- `similarity_threshold` (float4) - Minimum similarity (0.0 to 1.0) -- `max_age_seconds` (int) - Maximum age in seconds (NULL = any age) - -**Returns:** `record` - `(found boolean, result_data jsonb, similarity_score float4, age_seconds int)` - - ---- - -### Cache Eviction - -Multiple eviction strategies are available to manage cache size and freshness. - -#### `evict_expired()` -Remove all expired cache entries. - -```sql -SELECT semantic_cache.evict_expired(); -- Returns count of evicted entries -``` - -#### `evict_lru(keep_count)` -Evict least recently used entries, keeping only the specified number of most recent entries. 
- -```sql -SELECT semantic_cache.evict_lru(1000); -- Keep only 1000 most recently used entries -``` - -#### `evict_lfu(keep_count)` -Evict least frequently used entries, keeping only the specified number of most frequently used entries. +The following steps walk you through installing and configuring the extension. -```sql -SELECT semantic_cache.evict_lfu(1000); -- Keep only 1000 most frequently used entries -``` +1. Install the required dependencies for your operating system. -#### `auto_evict()` -Automatically evict entries based on configured policy (LRU, LFU, or TTL). + ```bash + # Ubuntu/Debian + sudo apt-get install postgresql-16 postgresql-server-dev-16 postgresql-16-pgvector -```sql -SELECT semantic_cache.auto_evict(); -``` + # Rocky Linux/RHEL + sudo dnf install postgresql16 postgresql16-devel postgresql16-contrib -#### `clear_cache()` -Remove **all** cache entries (use with caution). + # macOS (with Homebrew) + brew install postgresql@16 + # Install pgvector separately + ``` -```sql -SELECT semantic_cache.clear_cache(); -``` +2. Build and install the extension from source. ---- + ```bash + git clone https://github.com/pgedge/pg_semantic_cache.git + cd pg_semantic_cache -### Statistics & Monitoring + make clean && make + sudo make install + ``` -Built-in functions and views provide real-time visibility into cache performance. +3. Enable the extension in your PostgreSQL database. -#### `cache_stats()` -Get comprehensive cache statistics. 
+ ```sql + -- Connect to your database + psql -U postgres -d your_database -```sql -SELECT * FROM semantic_cache.cache_stats(); -``` + -- Install required extensions + CREATE EXTENSION IF NOT EXISTS vector; + CREATE EXTENSION IF NOT EXISTS pg_semantic_cache; -**Returns:** -``` -total_entries | Total number of cached queries -total_hits | Total number of cache hits -total_misses | Total number of cache misses -hit_rate_percent | Hit rate as a percentage -``` + -- Initialize the cache schema (run once per database) + SELECT semantic_cache.init_schema(); -**Note:** For more detailed statistics including cache size, expired entries, and access patterns, use the `semantic_cache.cache_health` view. + -- Verify installation + SELECT * FROM semantic_cache.cache_stats(); + ``` --- @@ -313,429 +98,101 @@ SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy'; --- -## Build & Development - -The extension uses the standard PostgreSQL PGXS build system for compilation and installation. - -### Build Commands - -```bash -# Standard build -make clean && make -sudo make install - -# Run tests -make installcheck - -# Development build with debug symbols -make CFLAGS="-g -O0" clean all - -# View build configuration -make info -``` - -### Multi-Version PostgreSQL Build - -Build for multiple PostgreSQL versions simultaneously: - -```bash -for PG in 14 15 16 17 18; do - echo "Building for PostgreSQL $PG..." - PG_CONFIG=/usr/pgsql-${PG}/bin/pg_config make clean install -done -``` - -### Cross-Platform Support - -Fully compatible with all PostgreSQL-supported platforms: - -| Platform | Status | Notes | -|----------|--------|-------| -| Linux | Supported | Ubuntu, Debian, RHEL, Rocky, Fedora, etc. 
| -| macOS | Supported | Intel & Apple Silicon | -| Windows | Supported | Via MinGW or MSVC | -| BSD | Supported | FreeBSD, OpenBSD | - -### Tested PostgreSQL Versions - -| Version | Status | Notes | -|---------|--------|-------| -| PG 14 | Tested | Full support | -| PG 15 | Tested | Full support | -| PG 16 | Tested | Full support | -| PG 17 | Tested | Full support | -| PG 18 | Tested | Full support | -| Future versions | Expected | Standard PGXS compatibility | +## Basic Usage ---- +The following examples demonstrate the core workflow for storing, retrieving, +and monitoring cached query results. -## Performance +1. Store a query result with its vector embedding in the cache. -The extension is optimized for sub-millisecond cache lookups with minimal overhead. + In the following example, the `cache_query` function stores a completed + orders query with a one-hour TTL and analytics tags. -### Runtime Metrics + ```sql + SELECT semantic_cache.cache_query( + query_text := 'SELECT * FROM orders WHERE status = ''completed''', + embedding := '[0.1, 0.2, 0.3, ...]'::text, -- From OpenAI, Cohere, etc. + result_data := '{"total": 150, "orders": [...]}'::jsonb, + ttl_seconds := 3600, -- 1 hour + tags := ARRAY['orders', 'analytics'] -- Optional tags + ); + -- Returns: cache_id (bigint) + ``` -| Operation | Performance | Notes | -|-----------|-------------|-------| -| Cache lookup | **< 5ms** | With optimized vector index | -| Cache insert | **< 10ms** | Including embedding storage | -| Eviction (1000 entries) | **< 50ms** | Efficient batch operations | -| Statistics query | **< 1ms** | Materialized views | -| Similarity search | **2-3ms avg** | IVFFlat/HNSW indexed | +2. Retrieve a cached result using semantic similarity search. -### Expected Hit Rates + In the following example, the `get_cached_result` function searches for + cached results with at least 95% similarity to the query embedding. 
-| Workload Type | Typical Hit Rate | -|---------------|------------------| -| AI/LLM queries | 40-60% | -| Analytics dashboards | 60-80% | -| Search systems | 50-70% | -| Chatbot conversations | 45-65% | + ```sql + SELECT * FROM semantic_cache.get_cached_result( + embedding := '[0.11, 0.19, 0.31, ...]'::text, -- Similar query embedding + similarity_threshold := 0.95, -- 95% similarity required + max_age_seconds := NULL -- Any age (optional) + ); + -- Returns: (found boolean, result_data jsonb, similarity_score float4, age_seconds int) + ``` -### Memory Overhead + The function returns a table with the following columns: -- **Per cache entry**: ~1-2KB (metadata + indexes) -- **Vector storage**: Depends on embedding dimension (1536D = ~6KB) -- **Total overhead**: Minimal for typical workloads + ``` + found | result_data | similarity_score | age_seconds + -------+----------------------------+------------------+------------- + true | {"total": 150, "orders"... | 0.973 | 245 + ``` -### Benchmarks +3. Monitor cache performance using built-in statistics and health views. -Run the included benchmark suite: + In the following example, the queries retrieve comprehensive statistics, + health metrics, and recent activity for the semantic cache. 
-```bash -psql -U postgres -d your_database -f test/benchmark.sql -``` + ```sql + -- Comprehensive statistics + SELECT * FROM semantic_cache.cache_stats(); -**Expected Results:** + -- Health overview (includes hit rate and more details) + SELECT * FROM semantic_cache.cache_health; -``` -Operation | Count | Total Time | Avg Time ------------------------+--------+------------+---------- -Insert entries | 1,000 | ~500ms | 0.5ms -Lookup (hits) | 100 | ~200ms | 2.0ms -Lookup (misses) | 100 | ~150ms | 1.5ms -Evict LRU | 500 | ~25ms | 0.05ms -``` + -- Recent cache activity + SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10; + ``` --- -## Production Deployment - -For production environments, optimize PostgreSQL settings and set up automated maintenance. - -### PostgreSQL Configuration - -Optimize PostgreSQL settings for semantic caching workloads: - -```sql --- Memory settings -ALTER SYSTEM SET shared_buffers = '4GB'; -- Adjust based on available RAM -ALTER SYSTEM SET effective_cache_size = '12GB'; -- Typically 50-75% of RAM -ALTER SYSTEM SET work_mem = '256MB'; -- For vector operations - --- Reload configuration -SELECT pg_reload_conf(); -``` - -### Automated Maintenance - -Set up automatic cache maintenance using `pg_cron`: - -```sql --- Install pg_cron -CREATE EXTENSION IF NOT EXISTS pg_cron; - --- Schedule auto-eviction every 15 minutes -SELECT cron.schedule( - 'semantic-cache-eviction', - '*/15 * * * *', - $$SELECT semantic_cache.auto_evict()$$ -); - --- Schedule expired entry cleanup every hour -SELECT cron.schedule( - 'semantic-cache-cleanup', - '0 * * * *', - $$SELECT semantic_cache.evict_expired()$$ -); - --- Verify scheduled jobs -SELECT * FROM cron.job WHERE jobname LIKE 'semantic-cache%'; -``` - -### Index Optimization - -Choose the appropriate vector index strategy based on your cache size. - -#### Small to Medium Caches (< 100k entries) -Default IVFFlat index works well out of the box. 
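The store-and-retrieve flow above can be sketched as a small, self-contained Python model. This is a toy in-memory illustration of what `cache_query` and `get_cached_result` do conceptually (a TTL check plus a cosine-similarity threshold); the real lookup runs inside PostgreSQL via pgvector, so everything here is a simplification:

```python
import math
import time

def cosine_similarity(a, b):
    """Cosine similarity, the metric behind vector_cosine_ops."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class ToySemanticCache:
    """In-memory stand-in for the cache_query()/get_cached_result() pair."""

    def __init__(self):
        self.entries = []  # list of (embedding, result_data, expires_at)

    def cache_query(self, embedding, result_data, ttl_seconds):
        self.entries.append((embedding, result_data, time.time() + ttl_seconds))
        return len(self.entries)  # toy cache id

    def get_cached_result(self, embedding, similarity_threshold=0.95):
        """Return (found, result_data, similarity) for the best live entry."""
        best_result, best_sim = None, -1.0
        now = time.time()
        for emb, result, expires_at in self.entries:
            if expires_at <= now:
                continue  # expired entries never match, mirroring the TTL rules
            sim = cosine_similarity(embedding, emb)
            if sim > best_sim:
                best_result, best_sim = result, sim
        if best_result is not None and best_sim >= similarity_threshold:
            return (True, best_result, best_sim)
        return (False, None, best_sim)

cache = ToySemanticCache()
cache.cache_query([0.1, 0.2, 0.3], {"total": 150}, ttl_seconds=3600)

# A near-duplicate embedding clears the 0.95 threshold and hits the cache.
found, result, sim = cache.get_cached_result([0.11, 0.19, 0.31])
```

A clearly different embedding (for example `[1.0, 0.0, 0.0]`) falls below the threshold and comes back as a miss, which is exactly the point at which an application would run the real query and call `cache_query` with the fresh result.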
- -#### Large Caches (100k - 1M entries) -Increase IVFFlat lists for better performance: - -```sql -DROP INDEX IF EXISTS semantic_cache.idx_cache_embedding; -CREATE INDEX idx_cache_embedding - ON semantic_cache.cache_entries - USING ivfflat (query_embedding vector_cosine_ops) - WITH (lists = 1000); -- Increase lists for larger caches -``` - -#### Very Large Caches (> 1M entries) -Use HNSW index for optimal performance (requires pgvector 0.5.0+): - -```sql -DROP INDEX IF EXISTS semantic_cache.idx_cache_embedding; -CREATE INDEX idx_cache_embedding_hnsw - ON semantic_cache.cache_entries - USING hnsw (query_embedding vector_cosine_ops) - WITH (m = 16, ef_construction = 64); -``` - -**HNSW Benefits:** -- Faster queries (1-2ms vs 3-5ms) -- Better recall at high similarity thresholds -- Scales linearly with cache size - -### Monitoring Setup +## Building the Documentation -Set up custom views to monitor cache health and performance metrics. +Before building the documentation, install Python 3.8+ and pip. -Create a monitoring dashboard view: +1. Install dependencies: + ```bash + pip install -r docs-requirements.txt + ``` -```sql -CREATE OR REPLACE VIEW semantic_cache.production_dashboard AS -SELECT - (SELECT hit_rate_percent FROM semantic_cache.cache_stats())::numeric(5,2) || '%' as hit_rate, - (SELECT total_entries FROM semantic_cache.cache_stats()) as total_entries, - (SELECT pg_size_pretty(SUM(result_size_bytes)::BIGINT) FROM semantic_cache.cache_entries) as cache_size, - (SELECT COUNT(*) FROM semantic_cache.cache_entries WHERE expires_at <= NOW()) as expired_entries, - (SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy') as eviction_policy, - NOW() as snapshot_time; - --- Query the dashboard -SELECT * FROM semantic_cache.production_dashboard; -``` - -### High Availability Considerations - -The cache integrates seamlessly with PostgreSQL's replication and backup mechanisms. 
- -```sql --- Regular backups of cache metadata (optional) -pg_dump -U postgres -d your_db -t semantic_cache.cache_entries -t semantic_cache.cache_metadata -F c -f cache_backup.dump - --- Replication: Cache data is automatically replicated with PostgreSQL streaming replication --- No special configuration needed -``` - ---- +2. Use the following command to review the documentation locally: + ```bash + mkdocs serve + ``` -## Integration Examples - -### Python with OpenAI - -Complete example integrating semantic cache with OpenAI embeddings: - -```python -import psycopg2 -import openai -import json -from typing import Optional, Dict, Any - -class SemanticCache: - """Semantic cache wrapper for PostgreSQL""" - - def __init__(self, conn_string: str, openai_api_key: str): - self.conn = psycopg2.connect(conn_string) - self.client = openai.OpenAI(api_key=openai_api_key) - - def _get_embedding(self, text: str) -> str: - """Generate embedding using OpenAI""" - response = self.client.embeddings.create( - model="text-embedding-ada-002", - input=text - ) - embedding = response.data[0].embedding - return f"[{','.join(map(str, embedding))}]" - - def cache(self, query: str, result: Dict[Any, Any], - ttl: int = 3600, tags: Optional[list] = None) -> int: - """Cache a query result""" - embedding = self._get_embedding(query) - - with self.conn.cursor() as cur: - cur.execute(""" - SELECT semantic_cache.cache_query( - %s::text, %s::text, %s::jsonb, %s::int, %s::text[] - ) - """, (query, embedding, json.dumps(result), ttl, tags)) - cache_id = cur.fetchone()[0] - self.conn.commit() - return cache_id - - def get(self, query: str, similarity: float = 0.95, - max_age: Optional[int] = None) -> Optional[Dict[Any, Any]]: - """Retrieve from cache""" - embedding = self._get_embedding(query) - - with self.conn.cursor() as cur: - cur.execute(""" - SELECT found, result_data, similarity_score, age_seconds - FROM semantic_cache.get_cached_result( - %s::text, %s::float4, %s::int - ) - """, (embedding, 
similarity, max_age)) - - result = cur.fetchone() - if result and result[0]: # Cache hit - print(f"Cache HIT (similarity: {result[2]:.3f}, age: {result[3]}s)") - return json.loads(result[1]) - else: - print("Cache MISS") - return None - - def stats(self) -> Dict[str, Any]: - """Get cache statistics""" - with self.conn.cursor() as cur: - cur.execute("SELECT * FROM semantic_cache.cache_stats()") - columns = [desc[0] for desc in cur.description] - values = cur.fetchone() - return dict(zip(columns, values)) - -# Usage example -cache = SemanticCache( - conn_string="dbname=mydb user=postgres", - openai_api_key="sk-..." -) - -# Try to get from cache, compute if miss -def get_revenue_data(query: str) -> Dict: - result = cache.get(query, similarity=0.95) - - if result: - return result # Cache hit! - - # Cache miss - compute the result - result = expensive_database_query() # Your expensive query here - cache.cache(query, result, ttl=3600, tags=['revenue', 'analytics']) - return result - -# Example queries -data1 = get_revenue_data("What was Q4 2024 revenue?") -data2 = get_revenue_data("Show me revenue for last quarter") # Will hit cache! -data3 = get_revenue_data("Q4 sales figures?") # Will also hit cache! 
- -# View statistics -print(cache.stats()) -``` - -### Node.js with OpenAI - -```javascript -const { Client } = require('pg'); -const OpenAI = require('openai'); - -class SemanticCache { - constructor(pgConfig, openaiApiKey) { - this.client = new Client(pgConfig); - this.openai = new OpenAI({ apiKey: openaiApiKey }); - this.client.connect(); - } - - async getEmbedding(text) { - const response = await this.openai.embeddings.create({ - model: 'text-embedding-ada-002', - input: text - }); - const embedding = response.data[0].embedding; - return `[${embedding.join(',')}]`; - } - - async cache(query, result, ttl = 3600, tags = null) { - const embedding = await this.getEmbedding(query); - const res = await this.client.query( - `SELECT semantic_cache.cache_query($1::text, $2::text, $3::jsonb, $4::int, $5::text[])`, - [query, embedding, JSON.stringify(result), ttl, tags] - ); - return res.rows[0].cache_query; - } - - async get(query, similarity = 0.95, maxAge = null) { - const embedding = await this.getEmbedding(query); - const res = await this.client.query( - `SELECT * FROM semantic_cache.get_cached_result($1::text, $2::float4, $3::int)`, - [embedding, similarity, maxAge] - ); - - const { found, result_data, similarity_score, age_seconds } = res.rows[0]; - - if (found) { - console.log(`Cache HIT (similarity: ${similarity_score.toFixed(3)}, age: ${age_seconds}s)`); - return JSON.parse(result_data); - } else { - console.log('Cache MISS'); - return null; - } - } - - async stats() { - const res = await this.client.query('SELECT * FROM semantic_cache.cache_stats()'); - return res.rows[0]; - } -} - -// Usage -const cache = new SemanticCache( - { host: 'localhost', database: 'mydb', user: 'postgres' }, - 'sk-...' 
-); - -async function getRevenueData(query) { - const cached = await cache.get(query); - if (cached) return cached; - - const result = await expensiveDatabaseQuery(); - await cache.cache(query, result, 3600, ['revenue', 'analytics']); - return result; -} -``` + Then open http://127.0.0.1:8000 in your browser. -### More Examples +3. To build a static site: + ```bash + mkdocs build + ``` -For additional integration patterns and use cases, see: -- `examples/usage_examples.sql` - Comprehensive SQL examples -- `test/benchmark.sql` - Performance testing examples + Documentation will added to the `site/` directory. --- -## Contributing - -Contributions are welcome! This extension is built with standard PostgreSQL C APIs. - -**Development setup:** -1. Fork the repository -2. Create a feature branch -3. Make your changes -4. Run tests: `make installcheck` -5. Submit a pull request +## Support & Resources -**Code guidelines:** -- Follow existing code style -- Add tests for new features -- Update documentation -- Ensure compatibility with PostgreSQL 14-18 +- Report bugs and request features through the GitHub Issues page. +- Check the `examples/` directory for usage patterns and code samples. +- See the `test/` directory for comprehensive testing examples. --- ## License -This project is licensed under the **PostgreSQL License**. - ---- - -## Support & Resources +This project is licensed under the [PostgreSQL License](docs/LICENSE.md). 
-- **GitHub Issues**: Report bugs and request features -- **Example Code**: Check `examples/` directory for usage patterns -- **Test Suite**: See `test/` directory for comprehensive examples diff --git a/docs/LICENSE.md b/docs/LICENSE.md new file mode 100644 index 0000000..075d616 --- /dev/null +++ b/docs/LICENSE.md @@ -0,0 +1,19 @@ +PostgreSQL License + +Copyright (c) 2024, Aqeel + +Permission to use, copy, modify, and distribute this software and its +documentation for any purpose, without fee, and without a written agreement +is hereby granted, provided that the above copyright notice and this +paragraph and the following two paragraphs appear in all copies. + +IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, +SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, +ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE +AUTHOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE +AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, +ENHANCEMENTS, OR MODIFICATIONS. diff --git a/docs/architecture.md b/docs/architecture.md index 764e9ee..7e0e3fa 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -31,22 +31,10 @@ graph LR 5. Automatic maintenance evicts expired entries based on TTL and configured policies. -## Performance - -- Lookup time is < 5ms for most queries with IVFFlat index. -- Scalability handles 100K+ cached entries efficiently. -- Throughput reaches thousands of cache lookups per second. -- Storage provides configurable cache size limits with automatic eviction. - -!!! tip "Pro Tip" - - Start with the default IVFFlat index and 1536 dimensions (OpenAI - ada-002). 
You can always reconfigure your cache later with the - `set_vector_dimension()` and `rebuild_index()` functions. ## Getting Help -- Browse the sections in the navigation menu for documentation. +- Browse the documentation. - Report issues at [GitHub Issues](https://github.com/pgedge/pg_semantic_cache/issues). - See [Use Cases](use_cases.md) for practical implementation examples. diff --git a/docs/development.md b/docs/development.md new file mode 100644 index 0000000..0552bdd --- /dev/null +++ b/docs/development.md @@ -0,0 +1,53 @@ +# Development Resources + +Developer contributions are welcome! This extension is built with standard PostgreSQL C APIs. + +To create a development installation: + +1. Fork the repository. +2. Create a feature branch for your changes. +3. Make your changes to the codebase. +4. Run the test suite with `make installcheck`. +5. Submit a pull request with your changes. + +Code guidelines: + +- Follow the existing code style throughout the project. +- Add tests for any new features you implement. +- Update the documentation to reflect your changes. +- Ensure your changes are compatible with PostgreSQL versions 14 through 18. + +--- + +## Building From Source + +The extension uses the standard PostgreSQL PGXS build system for compilation and installation. + + +```bash +# Standard build +make clean && make +sudo make install + +# Run tests +make installcheck + +# Development build with debug symbols +make CFLAGS="-g -O0" clean all + +# View build configuration +make info +``` + +## Performing a Multi-Version PostgreSQL Build + +The extension supports building for multiple PostgreSQL versions in sequence. + +Build for multiple PostgreSQL versions simultaneously: + +```bash +for PG in 14 15 16 17 18; do + echo "Building for PostgreSQL $PG..." 
+ PG_CONFIG=/usr/pgsql-${PG}/bin/pg_config make clean install +done +``` diff --git a/docs/functions.md b/docs/functions.md index 2dbfee5..d39adbe 100644 --- a/docs/functions.md +++ b/docs/functions.md @@ -1,6 +1,6 @@ # Using pg_semantic_cache Functions -This page provides a comprehensive reference for all available functions in the pg_semantic_cache extension. +The extension provides a complete set of SQL functions for caching, eviction, monitoring, and configuration. This page provides a comprehensive reference for all available functions in the pg_semantic_cache extension. ## Function Reference @@ -24,3 +24,100 @@ This page provides a comprehensive reference for all available functions in the | [rebuild_index](functions/rebuild_index.md) | Rebuilds the vector similarity index for optimal performance. | | [set_index_type](functions/set_index_type.md) | Sets the vector index type for similarity search. | | [set_vector_dimension](functions/set_vector_dimension.md) | Sets the vector embedding dimension. | + + +### Core Functions + +#### `init_schema()` +Initialize the cache schema, creating all required tables, indexes, and views. + +```sql +SELECT semantic_cache.init_schema(); +``` + +#### `cache_query(query_text, embedding, result_data, ttl_seconds, tags)` +Store a query result with its embedding for future retrieval. + +**Parameters:** +- `query_text` (text) - The original query text +- `embedding` (text) - Vector embedding as text: `'[0.1, 0.2, ...]'` +- `result_data` (jsonb) - The query result to cache +- `ttl_seconds` (int) - Time-to-live in seconds +- `tags` (text[]) - Optional tags for organization + +**Returns:** `bigint` - Cache entry ID + +#### `get_cached_result(embedding, similarity_threshold, max_age_seconds)` +Retrieve a cached result by semantic similarity. 
+ +**Parameters:** +- `embedding` (text) - Query embedding to search for +- `similarity_threshold` (float4) - Minimum similarity (0.0 to 1.0) +- `max_age_seconds` (int) - Maximum age in seconds (NULL = any age) + +**Returns:** `record` - `(found boolean, result_data jsonb, similarity_score float4, age_seconds int)` + + +--- + +### Cache Eviction + +Multiple eviction strategies are available to manage cache size and freshness. + +#### `evict_expired()` +Remove all expired cache entries. + +```sql +SELECT semantic_cache.evict_expired(); -- Returns count of evicted entries +``` + +#### `evict_lru(keep_count)` +Evict least recently used entries, keeping only the specified number of most recent entries. + +```sql +SELECT semantic_cache.evict_lru(1000); -- Keep only 1000 most recently used entries +``` + +#### `evict_lfu(keep_count)` +Evict least frequently used entries, keeping only the specified number of most frequently used entries. + +```sql +SELECT semantic_cache.evict_lfu(1000); -- Keep only 1000 most frequently used entries +``` + +#### `auto_evict()` +Automatically evict entries based on configured policy (LRU, LFU, or TTL). + +```sql +SELECT semantic_cache.auto_evict(); +``` + +#### `clear_cache()` +Remove **all** cache entries (use with caution). + +```sql +SELECT semantic_cache.clear_cache(); +``` + +--- + +### Statistics & Monitoring + +Built-in functions and views provide real-time visibility into cache performance. + +#### `cache_stats()` +Get comprehensive cache statistics. + +```sql +SELECT * FROM semantic_cache.cache_stats(); +``` + +**Returns:** +``` +total_entries | Total number of cached queries +total_hits | Total number of cache hits +total_misses | Total number of cache misses +hit_rate_percent | Hit rate as a percentage +``` + +**Note:** For more detailed statistics including cache size, expired entries, and access patterns, use the `semantic_cache.cache_health` view. 
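The `hit_rate_percent` column is simply hits as a share of all lookups. The arithmetic, sketched in Python for clarity (the extension computes this server-side; the function name here is illustrative, not part of the API):

```python
def hit_rate_percent(total_hits: int, total_misses: int) -> float:
    """Hit rate as a percentage of all lookups; 0.0 before any lookups."""
    total = total_hits + total_misses
    if total == 0:
        return 0.0
    return round(100.0 * total_hits / total, 2)

# 450 hits out of 600 total lookups is a 75.0% hit rate.
rate = hit_rate_percent(450, 150)
```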
diff --git a/docs/index.md b/docs/index.md index e776e85..b33ac1b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -57,4 +57,16 @@ For an LLM application making 10,000 queries per day: - Comprehensive monitoring provides built-in statistics, views, and health metrics. +### Cross-Platform Support + +The extension is fully compatible with all PostgreSQL-supported platforms. + +Fully compatible with all PostgreSQL-supported platforms: + +| Platform | Status | Notes | +|----------|--------|-------| +| Linux | Supported | Ubuntu, Debian, RHEL, Rocky, Fedora, etc. | +| macOS | Supported | Intel & Apple Silicon | +| Windows | Supported | Via MinGW or MSVC | +| BSD | Supported | FreeBSD, OpenBSD | diff --git a/integration.md b/docs/integration.md similarity index 78% rename from integration.md rename to docs/integration.md index d6cc83e..c0c4169 100644 --- a/integration.md +++ b/docs/integration.md @@ -1,10 +1,15 @@ # Integration Examples -Refer to the following integration examples when configuring pg_semantic_cache. +This page provides integration examples for using pg_semantic_cache with +popular programming languages and embedding providers. -### Python with OpenAI +## Python with OpenAI -Complete example integrating semantic cache with OpenAI embeddings: +The following example demonstrates how to integrate the semantic cache with +OpenAI embeddings using Python and the psycopg2 library. + +In the following example, the `SemanticCache` class wraps the cache functions +and handles embedding generation through the OpenAI API. ```python import psycopg2 @@ -92,14 +97,26 @@ def get_revenue_data(query: str) -> Dict: # Example queries data1 = get_revenue_data("What was Q4 2024 revenue?") -data2 = get_revenue_data("Show me revenue for last quarter") # Will hit cache! -data3 = get_revenue_data("Q4 sales figures?") # Will also hit cache! 
+data2 = get_revenue_data("Show me revenue for last quarter") +data3 = get_revenue_data("Q4 sales figures?") # View statistics print(cache.stats()) ``` -### Node.js with OpenAI +The preceding example demonstrates three key operations: + +- The cache initialization with database connection and API credentials. +- The automatic fallback from cache lookup to computation when needed. +- The statistical monitoring to track cache performance over time. + +## Node.js with OpenAI + +The following example shows how to use the semantic cache with Node.js and +the OpenAI API through an asynchronous interface. + +In the following example, the `SemanticCache` class uses async/await patterns +to handle database operations and embedding generation. ```javascript const { Client } = require('pg'); @@ -170,9 +187,11 @@ async function getRevenueData(query) { } ``` -### More Examples +## Additional Resources + +The repository includes additional integration examples and test files. -For additional integration patterns and use cases, see: -- `examples/usage_examples.sql` - Comprehensive SQL examples -- `test/benchmark.sql` - Performance testing examples +For more comprehensive examples, refer to the following files: +- The `examples/usage_examples.sql` file contains comprehensive SQL examples. +- The `test/benchmark.sql` file provides performance testing examples. diff --git a/docs/performance.md b/docs/performance.md new file mode 100644 index 0000000..edb0639 --- /dev/null +++ b/docs/performance.md @@ -0,0 +1,70 @@ +# Performance and Benchmarking + +The extension is optimized for sub-millisecond cache lookups with minimal overhead. + +- Lookup time is < 5ms for most queries with IVFFlat index. +- Scalability handles 100K+ cached entries efficiently. +- Throughput reaches thousands of cache lookups per second. +- Storage provides configurable cache size limits with automatic eviction. + +!!! tip "Pro Tip" + + Start with the default IVFFlat index and 1536 dimensions (OpenAI + ada-002). 
You can always reconfigure your cache later with the + `set_vector_dimension()` and `rebuild_index()` functions. + +## Runtime Metrics + +The following table shows typical performance metrics for common cache operations. + +| Operation | Performance | Notes | +|-----------|-------------|-------| +| Cache lookup | **< 5ms** | With optimized vector index | +| Cache insert | **< 10ms** | Including embedding storage | +| Eviction (1000 entries) | **< 50ms** | Efficient batch operations | +| Statistics query | **< 1ms** | Materialized views | +| Similarity search | **2-3ms avg** | IVFFlat/HNSW indexed | + +### Expected Hit Rates + +Cache hit rates vary by workload type and query similarity patterns. + +| Workload Type | Typical Hit Rate | +|---------------|------------------| +| AI/LLM queries | 40-60% | +| Analytics dashboards | 60-80% | +| Search systems | 50-70% | +| Chatbot conversations | 45-65% | + +### Memory Overhead + +The cache maintains a minimal memory footprint for typical workloads. + +- Each cache entry requires approximately 1-2KB for metadata and indexes. +- Vector storage size depends on the embedding dimension (1536D requires approximately 6KB). +- The total overhead remains minimal for typical workloads. + +## Benchmarking + +The extension includes a comprehensive benchmark suite for performance testing. 
+ +Use the following command to run the included benchmark suite: + +```bash +psql -U postgres -d your_database -f test/benchmark.sql +``` + +**Expected Results:** + +``` +Operation | Count | Total Time | Avg Time +-----------------------+--------+------------+---------- +Insert entries | 1,000 | ~500ms | 0.5ms +Lookup (hits) | 100 | ~200ms | 2.0ms +Lookup (misses) | 100 | ~150ms | 1.5ms +Evict LRU | 500 | ~25ms | 0.05ms +``` + + + + diff --git a/docs/production.md b/docs/production.md new file mode 100644 index 0000000..3240ca6 --- /dev/null +++ b/docs/production.md @@ -0,0 +1,118 @@ +# Deploying in a Production Environment + +For production environments, optimize PostgreSQL settings and set up automated maintenance. + +### PostgreSQL Configuration + +Optimize PostgreSQL memory and performance settings for semantic caching workloads. + +Optimize PostgreSQL settings for semantic caching workloads: + +```sql +-- Memory settings +ALTER SYSTEM SET shared_buffers = '4GB'; -- Adjust based on available RAM +ALTER SYSTEM SET effective_cache_size = '12GB'; -- Typically 50-75% of RAM +ALTER SYSTEM SET work_mem = '256MB'; -- For vector operations + +-- Reload configuration +SELECT pg_reload_conf(); +``` + +### Automated Maintenance + +Schedule automatic cache maintenance tasks using the pg_cron extension. + +Set up automatic cache maintenance using `pg_cron`: + +```sql +-- Install pg_cron +CREATE EXTENSION IF NOT EXISTS pg_cron; + +-- Schedule auto-eviction every 15 minutes +SELECT cron.schedule( + 'semantic-cache-eviction', + '*/15 * * * *', + $$SELECT semantic_cache.auto_evict()$$ +); + +-- Schedule expired entry cleanup every hour +SELECT cron.schedule( + 'semantic-cache-cleanup', + '0 * * * *', + $$SELECT semantic_cache.evict_expired()$$ +); + +-- Verify scheduled jobs +SELECT * FROM cron.job WHERE jobname LIKE 'semantic-cache%'; +``` + +### Index Optimization + +Choose the appropriate vector index strategy based on your cache size. 
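A quick way to size the IVFFlat `lists` parameter is the common pgvector rule of thumb: roughly `rows / 1000` below one million rows, and `sqrt(rows)` above that. The helper below is an illustration of that heuristic, not something the extension provides or enforces:

```python
import math

def suggested_ivfflat_lists(n_rows: int) -> int:
    """Heuristic IVFFlat `lists` value: rows/1000 under 1M rows, sqrt(rows) above."""
    if n_rows < 1_000_000:
        return max(10, n_rows // 1000)  # floor of 10 keeps tiny caches usable
    return round(math.sqrt(n_rows))

# 100_000 entries -> 100 lists; 2_000_000 entries -> 1414 lists
lists_small = suggested_ivfflat_lists(100_000)
lists_large = suggested_ivfflat_lists(2_000_000)
```

These values line up with the guidance used in this document (`lists = 100` for under 100K entries, `lists = 1000` around the 1M mark); treat the output as a starting point and tune against measured recall and latency.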
+ +#### Small to Medium Caches (< 100k entries) +Default IVFFlat index works well out of the box. + +#### Large Caches (100k - 1M entries) +Increase IVFFlat lists for better performance: + +```sql +DROP INDEX IF EXISTS semantic_cache.idx_cache_embedding; +CREATE INDEX idx_cache_embedding + ON semantic_cache.cache_entries + USING ivfflat (query_embedding vector_cosine_ops) + WITH (lists = 1000); -- Increase lists for larger caches +``` + +#### Very Large Caches (> 1M entries) +Use HNSW index for optimal performance (requires pgvector 0.5.0+): + +```sql +DROP INDEX IF EXISTS semantic_cache.idx_cache_embedding; +CREATE INDEX idx_cache_embedding_hnsw + ON semantic_cache.cache_entries + USING hnsw (query_embedding vector_cosine_ops) + WITH (m = 16, ef_construction = 64); +``` + +HNSW provides the following benefits: + +- The HNSW index delivers faster queries with 1-2ms response times compared to 3-5ms for IVFFlat. +- HNSW provides better recall accuracy at high similarity thresholds. +- HNSW scales linearly with cache size for consistent performance. + +### Monitoring Setup + +Set up custom views to monitor cache health and performance metrics. + +Create a monitoring dashboard view: + +```sql +CREATE OR REPLACE VIEW semantic_cache.production_dashboard AS +SELECT + (SELECT hit_rate_percent FROM semantic_cache.cache_stats())::numeric(5,2) || '%' as hit_rate, + (SELECT total_entries FROM semantic_cache.cache_stats()) as total_entries, + (SELECT pg_size_pretty(SUM(result_size_bytes)::BIGINT) FROM semantic_cache.cache_entries) as cache_size, + (SELECT COUNT(*) FROM semantic_cache.cache_entries WHERE expires_at <= NOW()) as expired_entries, + (SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy') as eviction_policy, + NOW() as snapshot_time; + +-- Query the dashboard +SELECT * FROM semantic_cache.production_dashboard; +``` + +### High Availability Considerations + +The cache integrates seamlessly with PostgreSQL's replication and backup mechanisms. 
+ +```sql +-- Regular backups of cache metadata (optional) +pg_dump -U postgres -d your_db -t semantic_cache.cache_entries -t semantic_cache.cache_metadata -F c -f cache_backup.dump + +-- Replication: Cache data is automatically replicated with PostgreSQL streaming replication +-- No special configuration needed +``` + +--- + +--- From 47051139c9d6aa3bfc951c46c298b632c82505fe Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Tue, 10 Mar 2026 11:31:12 -0400 Subject: [PATCH 08/12] Updating project README file and removing docs README file; the doc building content now lives in the main project README file --- README.md | 67 +++++++++++++++----------- docs/README.md | 127 ------------------------------------------------- 2 files changed, 39 insertions(+), 155 deletions(-) delete mode 100644 docs/README.md diff --git a/README.md b/README.md index a8590a1..3d09a7d 100644 --- a/README.md +++ b/README.md @@ -1,26 +1,32 @@ # pg_semantic_cache -pg_semantic_cache allows you to leverage vector embeddings to cache and retrieve query results based on semantic similarity. - -[pg_semantic_cache Introduction](docs/index.md) -[pg_semantic_cache Architecture](docs/architecture.md) -[pg_semantic_cache Use Cases](docs/use_cases.md) -[Quick Start](docs/quick_start.md) -[Installation](docs/installation.md) -[Configuration](docs/configuration.md) -[Deploying in a Production Environment](docs/deployment.md) -[Using pg_semantic_cache Functions](docs/functions.md) -[Sample Integrations](docs/integration.md) -[Monitoring](docs/logging.md) -[Performance and Benchmarking](docs/performance.md) -[Logging](docs/logging.md) -[Troubleshooting](docs/troubleshooting.md) -[FAQ](docs/FAQ.md) -[Developers](docs/development.md) - ---- - -`pg_semantic_cache` enables **semantic query result caching** for PostgreSQL. Unlike traditional caching that requires exact query matches, this extension uses vector embeddings to find and retrieve cached results for semantically similar queries. 
+pg_semantic_cache allows you to leverage vector embeddings to cache and
+retrieve query results based on semantic similarity.
+
+## Table of Contents
+
+- [pg_semantic_cache Introduction](docs/index.md)
+- [pg_semantic_cache Architecture](docs/architecture.md)
+- [pg_semantic_cache Use Cases](docs/use_cases.md)
+- [Quick Start](#quick-start)
+- [Installation](docs/installation.md)
+- [Configuration](docs/configuration.md)
+- [Deploying in a Production Environment](docs/production.md)
+- [Using pg_semantic_cache Functions](docs/functions.md)
+- [Sample Integrations](docs/integration.md)
+- [Monitoring](docs/monitoring.md)
+- [Performance and Benchmarking](docs/performance.md)
+- [Logging](docs/logging.md)
+- [Troubleshooting](docs/troubleshooting.md)
+- [FAQ](docs/FAQ.md)
+- [Developers](docs/development.md)
+
+For comprehensive documentation, visit [docs.pgedge.com](https://docs.pgedge.com).
+
+`pg_semantic_cache` enables **semantic query result caching** for
+PostgreSQL. Unlike traditional caching that requires exact query matches,
+this extension uses vector embeddings to find and retrieve cached results
+for semantically similar queries.
 
 ## Quick Start
 
@@ -67,8 +73,6 @@ The following steps walk you through installing and configuring the extension.
    SELECT * FROM semantic_cache.cache_stats();
    ```
 
----
-
 ### Configuration
 
 All runtime settings can be configured through the cache configuration table.
@@ -96,7 +100,6 @@ SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy';
 | `eviction_policy` | 'lru' | Eviction policy: lru, lfu, or ttl |
 | `similarity_threshold` | '0.95' | Default similarity threshold |
 
----
 
 ## Basic Usage
 
@@ -157,7 +160,6 @@ and monitoring cached query results.
    SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10;
    ```
 
----
 
 ## Building the Documentation
 
@@ -186,9 +188,18 @@ Before building the documentation, install Python 3.8+ and pip.
## Support & Resources -- Report bugs and request features through the GitHub Issues page. -- Check the `examples/` directory for usage patterns and code samples. -- See the `test/` directory for comprehensive testing examples. +To report an issue with this software, visit the +[GitHub Issues](https://github.com/pgEdge/pg_semantic_cache/issues) page. + +Check the `examples/` directory for usage patterns and code samples; see +the `test/` directory for comprehensive testing examples. + +For more information, visit [docs.pgedge.com](https://docs.pgedge.com). + +## Contributing + +We welcome your project contributions; for more information, see +[docs/development.md](docs/development.md). --- diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index 806e2e1..0000000 --- a/docs/README.md +++ /dev/null @@ -1,127 +0,0 @@ -# pg_semantic_cache Documentation - -This directory contains the source files for the pg_semantic_cache documentation, built with [MkDocs](https://www.mkdocs.org/) and the [Material theme](https://squidfunk.github.io/mkdocs-material/). - -## Building the Documentation - -### Prerequisites - -Python 3.8+ with pip installed. - -### Setup - -1. Install dependencies: - ```bash - pip install -r docs-requirements.txt - ``` - -2. Preview documentation locally: - ```bash - mkdocs serve - ``` - - Then open http://127.0.0.1:8000 in your browser. - -3. Build static site: - ```bash - mkdocs build - ``` - - Output will be in the `site/` directory. - -## Documentation Structure - -``` -docs/ -├── index.md # Home page -├── installation.md # Installation guide -├── configuration.md # Configuration guide -├── use_cases.md # Practical examples -├── monitoring.md # Monitoring and optimization -├── FAQ.md # Frequently asked questions -├── functions/ # Function reference -│ ├── index.md # Functions overview -│ ├── cache_query.md # cache_query() documentation -│ ├── get_cached_result.md # get_cached_result() documentation -│ └── ... 
# Additional function docs -├── img/ # Images and assets -└── stylesheets/ # Custom CSS (if needed) -``` - -## Writing Guidelines - -### Style - -- Use clear, concise language -- Include practical examples -- Add code blocks with syntax highlighting -- Use admonitions for warnings, tips, notes -- Keep sections focused and scannable - -### Admonitions - -```markdown -!!! note - This is a note - -!!! tip "Pro Tip" - This is a tip with custom title - -!!! warning - This is a warning - -!!! danger "Critical" - This is a danger message -``` - -### Code Blocks - -````markdown -```sql --- SQL example -SELECT * FROM semantic_cache.cache_stats(); -``` - -```python -# Python example -import psycopg2 -``` -```` - -### Tabs - -```markdown -=== "PostgreSQL" - ```sql - SELECT 1; - ``` - -=== "Python" - ```python - print("Hello") - ``` -``` - -## Deployment - -Documentation can be deployed to: - -- GitHub Pages: `mkdocs gh-deploy` -- Read the Docs: Connect repository -- Custom hosting: Deploy `site/` directory - -## Contributing - -When adding documentation: - -1. Follow existing structure and style -2. Test locally with `mkdocs serve` -3. Update `mkdocs.yml` navigation if adding new pages -4. Ensure all internal links work -5. Add examples where helpful - -## Links - -- [MkDocs Documentation](https://www.mkdocs.org/) -- [Material Theme](https://squidfunk.github.io/mkdocs-material/) -- [PyMdown Extensions](https://facelessuser.github.io/pymdown-extensions/) From 8b51cf3bef3da4830842fbd59c3a2f11a97277f9 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Wed, 11 Mar 2026 07:46:38 -0400 Subject: [PATCH 09/12] Updates for TOC in mkdocs.yml --- README.md | 6 ++--- mkdocs.yml | 79 +++++++++++++++++++++++++++++------------------------- 2 files changed, 46 insertions(+), 39 deletions(-) diff --git a/README.md b/README.md index 3d09a7d..8dcda81 100644 --- a/README.md +++ b/README.md @@ -5,9 +5,9 @@ retrieve query results based on semantic similarity. 
## Table of Contents -- [pg_semantic_cache Introduction](docs/index.md) -- [pg_semantic_cache Architecture](docs/architecture.md) -- [pg_semantic_cache Use Cases](docs/use_cases.md) +- [Overview](docs/index.md) +- [Architecture](docs/architecture.md) +- [Use Cases](docs/use_cases.md) - [Quick Start](#quick-start) - [Installation](docs/installation.md) - [Configuration](docs/configuration.md) diff --git a/mkdocs.yml b/mkdocs.yml index 78fbff7..ec0efdd 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -68,41 +68,48 @@ markdown_extensions: format: !!python/name:pymdownx.superfences.fence_code_format nav: - - Home: index.md - - Using Semantic Caching: architecture.md - - Getting Started: - - Quick Start Guide: quick_start.md - - Building from Source: installation.md - - Configuring pg_semantic_cache: configuration.md - - Usage: - - Use Cases: use_cases.md - - Monitoring: monitoring.md - - Reference: - - Functions: - - Overview: functions/index.md - - Caching: - - cache_query: functions/cache_query.md - - get_cached_result: functions/get_cached_result.md - - invalidate_cache: functions/invalidate_cache.md - - Monitoring: - - cache_stats: functions/cache_stats.md - - cache_hit_rate: functions/cache_hit_rate.md - - Eviction: - - evict_expired: functions/evict_expired.md - - evict_lru: functions/evict_lru.md - - evict_lfu: functions/evict_lfu.md - - auto_evict: functions/auto_evict.md - - clear_cache: functions/clear_cache.md - - Configuration: - - set_vector_dimension: functions/set_vector_dimension.md - - get_vector_dimension: functions/get_vector_dimension.md - - set_index_type: functions/set_index_type.md - - get_index_type: functions/get_index_type.md - - rebuild_index: functions/rebuild_index.md - - Cost Tracking: - - log_cache_access: functions/log_cache_access.md - - get_cost_savings: functions/get_cost_savings.md - - Utility: - - init_schema: functions/init_schema.md + - Overview: index.md + - Architecture: architecture.md + - Use Cases: use_cases.md + - Quick Start: 
quick_start.md + - Installation: installation.md + - Configuration: configuration.md + - Deployment: production.md + - Functions: + - Overview: functions.md + - Function Reference: functions/index.md + - Caching: + - cache_query: functions/cache_query.md + - get_cached_result: functions/get_cached_result.md + - invalidate_cache: functions/invalidate_cache.md + - Monitoring: + - cache_stats: functions/cache_stats.md + - cache_hit_rate: functions/cache_hit_rate.md + - Eviction: + - evict_expired: functions/evict_expired.md + - evict_lru: functions/evict_lru.md + - evict_lfu: functions/evict_lfu.md + - auto_evict: functions/auto_evict.md + - clear_cache: functions/clear_cache.md + - Configuration: + - set_vector_dimension: functions/set_vector_dimension.md + - get_vector_dimension: functions/get_vector_dimension.md + - set_index_type: functions/set_index_type.md + - get_index_type: functions/get_index_type.md + - rebuild_index: functions/rebuild_index.md + - Cost Tracking: + - log_cache_access: functions/log_cache_access.md + - get_cost_savings: functions/get_cost_savings.md + - Utility: + - init_schema: functions/init_schema.md + - Sample Integrations: + - Overview: integration.md + - pgEdge RAG: integrations/pgedge-rag.md + - Monitoring: monitoring.md + - Logging: logging.md + - Performance and Benchmarking: performance.md + - Security: security-audit.md - Troubleshooting: troubleshooting.md - FAQ: FAQ.md + - Developers: development.md + - License: LICENSE.md From 4746d9ec147c7714d4adbf058b729491d5ec84d4 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Wed, 11 Mar 2026 08:11:37 -0400 Subject: [PATCH 10/12] Updates to caching files --- docs/index.md | 84 +++++++++++++++---------------- docs/production.md | 122 ++++++++++++++++++++++++++++----------------- 2 files changed, 115 insertions(+), 91 deletions(-) diff --git a/docs/index.md b/docs/index.md index b33ac1b..8a9b9ec 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,32 +1,29 @@ # pg_semantic_cache 
-pg_semantic_cache is a PostgreSQL extension that implements semantic query -result caching using vector embeddings. Unlike traditional query caching that -relies on exact string matching, pg_semantic_cache understands the *meaning* -of queries through vector similarity, enabling cache hits even when queries -are phrased differently. - -This extension is particularly valuable for: - -- AI/LLM applications can cache expensive LLM API calls and RAG (Retrieval - Augmented Generation) results. -- Analytics workloads can reuse results from complex analytical queries with - similar parameters. -- External API queries can cache results from expensive external data - sources. -- Database query optimization can reduce load on expensive database - operations. - -### Why Use Semantic Caching - -Semantic caching transforms how applications handle query results by -using vector matching rather than matching exact queries. Traditional caching -systems can miss cached result sets when queries are phrased differently, -while semantic caching recognizes that "What was Q4 revenue?" and "Show Q4 revenue" as the same question. This approach dramatically increases cache hit rates -and reduces costs for AI applications, analytics workloads, and external API -calls. - -Queries that would overlook cached result sets work with a semantic cache: +The pg_semantic_cache extension implements semantic query result caching +using vector embeddings. Unlike traditional query caching that relies on +exact string matching, pg_semantic_cache understands the meaning of queries +through vector similarity. The extension enables cache hits even when +queries are phrased differently. + +The extension is particularly valuable for the following use cases: + +- AI and LLM applications can cache expensive LLM API calls and RAG results. +- Analytics workloads can reuse results from complex analytical queries. +- External API queries can cache results from expensive external sources. 
+- Database query optimization can reduce load on expensive operations. + +## Why Use Semantic Caching + +Semantic caching transforms how applications handle query results by using +vector matching rather than matching exact queries. Traditional caching +systems can miss cached result sets when queries are phrased differently. +Semantic caching recognizes that "What was Q4 revenue?" and "Show Q4 +revenue" represent the same question. This approach dramatically increases +cache hit rates and reduces costs for AI applications. + +The following table shows queries that would overlook cached result sets +with traditional caching but work with a semantic cache: | Traditional Cache | Semantic Cache | |-------------------|----------------| @@ -34,34 +31,33 @@ Queries that would overlook cached result sets work with a semantic cache: | "Show Q4 revenue" ❌ Miss | "Show Q4 revenue" ✅ Hit | | "Q4 revenue please" ❌ Miss | "Q4 revenue please" ✅ Hit | -### Cost Savings Example +## Cost Savings Example + +For an LLM application making 10,000 queries per day, semantic caching can +provide significant cost savings. The following example demonstrates the +potential savings: -For an LLM application making 10,000 queries per day: +- Without caching, the application costs $200 per day (at $0.02 per query). +- With an 80% cache hit rate, the application costs $40 per day. +- The savings are $160 per day or $58,400 per year. -- Without caching costs $200/day (at $0.02 per query). -- With 80% cache hit rate costs $40/day. -- Savings are $160/day or $58,400/year. +## Key Features -### Key Features +The pg_semantic_cache extension includes the following features: - Semantic matching uses pgvector for similarity-based cache lookups. - Flexible TTL provides per-entry time-to-live configuration. - Tag-based management organizes and invalidates cache entries by tags. -- Multiple eviction policies include LRU, LFU, and TTL-based automatic - eviction. 
+- Multiple eviction policies include LRU, LFU, and TTL-based eviction. - Cost tracking monitors and reports on query cost savings. -- Configurable dimensions support various embedding models (768, 1536, - 3072+ dimensions). -- Multiple index types include IVFFlat (fast) or HNSW (accurate) vector - indexes. -- Comprehensive monitoring provides built-in statistics, views, and health - metrics. +- Configurable dimensions support various embedding models. +- Multiple index types include IVFFlat (fast) or HNSW (accurate) indexes. +- Comprehensive monitoring provides built-in statistics and health metrics. -### Cross-Platform Support +## Cross-Platform Support The extension is fully compatible with all PostgreSQL-supported platforms. - -Fully compatible with all PostgreSQL-supported platforms: +The following table shows the platform support status: | Platform | Status | Notes | |----------|--------|-------| diff --git a/docs/production.md b/docs/production.md index 3240ca6..bb16c02 100644 --- a/docs/production.md +++ b/docs/production.md @@ -1,71 +1,94 @@ # Deploying in a Production Environment -For production environments, optimize PostgreSQL settings and set up automated maintenance. +For production environments, we recommend that you optimize PostgreSQL +settings and configure automated maintenance. This guide covers +configuration, monitoring, and high availability considerations for +production deployments. -### PostgreSQL Configuration +## PostgreSQL Configuration -Optimize PostgreSQL memory and performance settings for semantic caching workloads. +You should optimize PostgreSQL memory and performance settings for semantic +caching workloads. Proper configuration ensures optimal cache performance +and efficient resource utilization. 
-Optimize PostgreSQL settings for semantic caching workloads: +In the following example, the `ALTER SYSTEM` commands configure PostgreSQL +memory settings for semantic caching workloads: ```sql --- Memory settings -ALTER SYSTEM SET shared_buffers = '4GB'; -- Adjust based on available RAM -ALTER SYSTEM SET effective_cache_size = '12GB'; -- Typically 50-75% of RAM -ALTER SYSTEM SET work_mem = '256MB'; -- For vector operations +ALTER SYSTEM SET shared_buffers = '4GB'; +ALTER SYSTEM SET effective_cache_size = '12GB'; +ALTER SYSTEM SET work_mem = '256MB'; --- Reload configuration SELECT pg_reload_conf(); ``` -### Automated Maintenance +Adjust the `shared_buffers` setting based on your available RAM. The +`effective_cache_size` should typically be 50-75% of total RAM. The +`work_mem` setting allocates memory for vector operations. -Schedule automatic cache maintenance tasks using the pg_cron extension. +## Automated Maintenance -Set up automatic cache maintenance using `pg_cron`: +You can schedule automatic cache maintenance tasks using the `pg_cron` +extension. Regular maintenance prevents cache bloat and ensures optimal +performance. + +In the following example, the `cron.schedule()` function sets up automatic +cache maintenance tasks: ```sql --- Install pg_cron CREATE EXTENSION IF NOT EXISTS pg_cron; --- Schedule auto-eviction every 15 minutes SELECT cron.schedule( 'semantic-cache-eviction', '*/15 * * * *', $$SELECT semantic_cache.auto_evict()$$ ); --- Schedule expired entry cleanup every hour SELECT cron.schedule( 'semantic-cache-cleanup', '0 * * * *', $$SELECT semantic_cache.evict_expired()$$ ); --- Verify scheduled jobs SELECT * FROM cron.job WHERE jobname LIKE 'semantic-cache%'; ``` -### Index Optimization +The first job runs auto-eviction every 15 minutes. The second job removes +expired entries every hour. + +## Index Optimization Choose the appropriate vector index strategy based on your cache size. 
+Different index types provide optimal performance at different scales. + +### Small to Medium Caches + +The default IVFFlat index works well for caches with fewer than 100,000 +entries. No additional configuration is required for this cache size. + +### Large Caches -#### Small to Medium Caches (< 100k entries) -Default IVFFlat index works well out of the box. +For caches containing between 100,000 and 1 million entries, increase the +IVFFlat lists parameter for better performance. -#### Large Caches (100k - 1M entries) -Increase IVFFlat lists for better performance: +In the following example, the `CREATE INDEX` command creates an optimized +IVFFlat index for large caches: ```sql DROP INDEX IF EXISTS semantic_cache.idx_cache_embedding; CREATE INDEX idx_cache_embedding ON semantic_cache.cache_entries USING ivfflat (query_embedding vector_cosine_ops) - WITH (lists = 1000); -- Increase lists for larger caches + WITH (lists = 1000); ``` -#### Very Large Caches (> 1M entries) -Use HNSW index for optimal performance (requires pgvector 0.5.0+): +### Very Large Caches + +For caches exceeding 1 million entries, use the HNSW index for optimal +performance. The HNSW index requires pgvector version 0.5.0 or later. + +In the following example, the `CREATE INDEX` command creates an HNSW index +for very large caches: ```sql DROP INDEX IF EXISTS semantic_cache.idx_cache_embedding; @@ -75,44 +98,49 @@ CREATE INDEX idx_cache_embedding_hnsw WITH (m = 16, ef_construction = 64); ``` -HNSW provides the following benefits: +The HNSW index provides the following benefits: -- The HNSW index delivers faster queries with 1-2ms response times compared to 3-5ms for IVFFlat. +- The HNSW index delivers faster queries with 1-2ms response times. - HNSW provides better recall accuracy at high similarity thresholds. - HNSW scales linearly with cache size for consistent performance. -### Monitoring Setup +## Configuring Monitoring -Set up custom views to monitor cache health and performance metrics. 
+You can configure custom views to monitor cache health and performance +metrics. Regular monitoring helps identify performance issues and optimize +cache configuration. -Create a monitoring dashboard view: +In the following example, the `CREATE VIEW` command creates a production +monitoring dashboard: ```sql CREATE OR REPLACE VIEW semantic_cache.production_dashboard AS SELECT - (SELECT hit_rate_percent FROM semantic_cache.cache_stats())::numeric(5,2) || '%' as hit_rate, - (SELECT total_entries FROM semantic_cache.cache_stats()) as total_entries, - (SELECT pg_size_pretty(SUM(result_size_bytes)::BIGINT) FROM semantic_cache.cache_entries) as cache_size, - (SELECT COUNT(*) FROM semantic_cache.cache_entries WHERE expires_at <= NOW()) as expired_entries, - (SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy') as eviction_policy, - NOW() as snapshot_time; - --- Query the dashboard + (SELECT hit_rate_percent FROM semantic_cache.cache_stats())::NUMERIC(5,2) || '%' AS hit_rate, + (SELECT total_entries FROM semantic_cache.cache_stats()) AS total_entries, + (SELECT pg_size_pretty(SUM(result_size_bytes)::BIGINT) FROM semantic_cache.cache_entries) AS cache_size, + (SELECT COUNT(*) FROM semantic_cache.cache_entries WHERE expires_at <= NOW()) AS expired_entries, + (SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy') AS eviction_policy, + NOW() AS snapshot_time; + SELECT * FROM semantic_cache.production_dashboard; ``` -### High Availability Considerations +## High Availability Considerations -The cache integrates seamlessly with PostgreSQL's replication and backup mechanisms. +The cache integrates seamlessly with PostgreSQL's replication and backup +mechanisms. The semantic cache data automatically replicates with standard +PostgreSQL streaming replication. 
-```sql --- Regular backups of cache metadata (optional) -pg_dump -U postgres -d your_db -t semantic_cache.cache_entries -t semantic_cache.cache_metadata -F c -f cache_backup.dump +In the following example, the `pg_dump` command creates a backup of cache +metadata: --- Replication: Cache data is automatically replicated with PostgreSQL streaming replication --- No special configuration needed +```bash +pg_dump -U postgres -d your_db \ + -t semantic_cache.cache_entries \ + -t semantic_cache.cache_metadata \ + -F c -f cache_backup.dump ``` ---- - ---- +The cache data automatically replicates with PostgreSQL streaming +replication. No special configuration is needed for replication. From a80e1d732e1fdcdf0b4a779581cabeb8dfed94e3 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Thu, 12 Mar 2026 12:02:27 -0400 Subject: [PATCH 11/12] Updates to docs files; removed security-audit.md per suggestion from Dave --- docs/FAQ.md | 467 +++++++++++++++++++++--------------- docs/architecture.md | 37 +-- docs/configuration.md | 279 +++++++++++----------- docs/development.md | 40 ++-- docs/functions.md | 123 +++++++--- docs/installation.md | 164 ++++++------- docs/integration.md | 38 ++- docs/monitoring.md | 509 +++++++++++++++++++++++++--------------- docs/performance.md | 48 ++-- docs/quick_start.md | 37 +-- docs/security-audit.md | 349 --------------------------- docs/troubleshooting.md | 31 ++- docs/use_cases.md | 112 +++++---- 13 files changed, 1097 insertions(+), 1137 deletions(-) delete mode 100644 docs/security-audit.md diff --git a/docs/FAQ.md b/docs/FAQ.md index 214c2af..504ed16 100644 --- a/docs/FAQ.md +++ b/docs/FAQ.md @@ -1,37 +1,55 @@ # Frequently Asked Questions +The FAQ is broken into sections to simplify finding answers to the most +commonly asked questions. 
+ +| Section | Description | +|---------|-------------| +| [General Questions](#general-questions) | General semantic caching questions | +| [Installation & Setup](#installation--setup) | Installation and setup concerns | +| [Performance](#performance) | Performance characteristics and optimization | +| [Embeddings](#embeddings) | Embedding models and usage | +| [Configuration](#configuration) | Configuration options and settings | +| [Troubleshooting](#troubleshooting) | Common troubleshooting scenarios | +| [Best Practices](#best-practices) | Best practices for effective usage | + ## General Questions +The following sections provide general information about semantic +caching and the pg_semantic_cache extension. + ### What is semantic caching? Semantic caching uses vector embeddings to understand the meaning of -queries, not just exact text matching. When you search for "What was Q4 -revenue?", the cache can return results for semantically similar queries -like "Show Q4 revenue" or "Q4 revenue please" even though the exact text -is different. +queries, not just exact text matching. When you search for "What was +Q4 revenue?", the cache can return results for semantically similar +queries like "Show Q4 revenue" or "Q4 revenue please" even though +the exact text is different. -Traditional caching requires exact string matches, while semantic caching -matches based on similarity scores (typically 90-98%). +Traditional caching requires exact string matches, while semantic +caching matches based on similarity scores (typically 90-98%). ### Why use pg_semantic_cache instead of a traditional cache like Redis? -Use pg_semantic_cache when: +Use pg_semantic_cache when you need one of the following +capabilities: -- Queries are phrased differently but mean the same thing (LLM - applications, natural language queries). +- Queries are phrased differently but mean the same thing, such as + in LLM applications or natural language queries. 
- You need semantic understanding of query similarity. -- You're already using PostgreSQL and want tight integration. +- You are already using PostgreSQL and want tight integration. - You need persistent caching with complex querying capabilities. -Use traditional caching (Redis, Memcached) when: +Use traditional caching solutions such as Redis or Memcached when +you need one of the following capabilities: - You need exact key-value matching. - Sub-millisecond latency is critical. - Queries are deterministic and rarely vary. - You need distributed caching across multiple services. -Use both: pg_semantic_cache for semantic matching + Redis for hot-path -exact matches! +You can use both pg_semantic_cache for semantic matching and Redis +for hot-path exact matches. ### How does it compare to application-level caching? @@ -49,19 +67,23 @@ caching: ### Is it production-ready? -Yes! pg_semantic_cache is production-ready and has the following +Yes, pg_semantic_cache is production-ready and has the following characteristics: -- Written in C using stable PostgreSQL APIs -- Tested with PostgreSQL 14-18 -- Used in production environments -- Small, focused codebase (~900 lines) -- No complex dependencies (just pgvector) + +- The extension is written in C using stable PostgreSQL APIs. +- The extension is tested with PostgreSQL 14-18. +- The extension is used in production environments. +- The extension has a small, focused codebase of about 900 lines. +- The extension has no complex dependencies other than pgvector. ## Installation & Setup +The following sections address common installation and setup +concerns for pg_semantic_cache. + ### Do I need to install pgvector separately? -Yes, pgvector is a required dependency. Install it before +Yes, pgvector is a required dependency. Install it before installing pg_semantic_cache: ```bash @@ -77,16 +99,15 @@ make && sudo make install ### Can I use it with managed PostgreSQL services? 
-It depends on the service: - -- Self-hosted PostgreSQL: Yes -- AWS RDS: Yes (if you can install extensions) -- Azure Database for PostgreSQL: Yes (flexible server) -- Google Cloud SQL: Check extension support -- Supabase: Yes (pgvector supported) -- Neon: Yes (pgvector supported) +It depends on the service; check if your provider supports custom C +extensions and pgvector: -Check if your provider supports custom C extensions and pgvector. +- Self-hosted PostgreSQL: Yes. +- AWS RDS: Yes, if you can install extensions. +- Azure Database for PostgreSQL: Yes, on flexible server. +- Google Cloud SQL: Check extension support. +- Supabase: Yes, pgvector is supported. +- Neon: Yes, pgvector is supported. ### What PostgreSQL versions are supported? @@ -107,56 +128,65 @@ ALTER EXTENSION pg_semantic_cache UPDATE TO '0.4.0'; ## Performance +The following sections address performance characteristics and +optimization strategies for pg_semantic_cache. + ### How fast are cache lookups? Cache lookups are very fast, with the following performance characteristics: -Target: < 5ms for most queries +Target performance is less than 5ms for most queries. + +Typical performance characteristics: -Typical Performance: +- IVFFlat index: 2-5ms. +- HNSW index: 1-3ms. +- Without index: 50-500ms (not recommended). -- IVFFlat index: 2-5ms -- HNSW index: 1-3ms -- Without index: 50-500ms (don't do this!) +Factors affecting speed include the following: -Factors affecting speed: +- Cache size, where more entries result in slightly slower lookups. +- Vector dimension, such as 1536 versus 3072. +- Index type and parameters. +- PostgreSQL configuration, particularly work_mem. 
-- Cache size (more entries = slightly slower) -- Vector dimension (1536 vs 3072) -- Index type and parameters -- PostgreSQL configuration (work_mem) +In the following example, the `\timing` command measures the lookup +speed: ```sql --- Test your lookup speed \timing on SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); ``` ### How much storage does it use? -Storage requirements vary based on vector dimensions and result sizes: +Storage requirements vary based on vector dimensions and result +sizes. -Storage per entry: +Storage per entry includes the following: -- Vector embedding: ~6KB (1536 dimensions) -- Result data: Varies (your cached JSONB) -- Metadata: ~200 bytes -- Total: 6KB + your data size +- Vector embedding requires approximately 6KB for 1536 dimensions. +- Result data varies based on your cached JSONB. +- Metadata requires approximately 200 bytes. +- Total storage is 6KB plus your data size. -Example: +Example storage requirements: -- 10K entries with 10KB results each = ~160MB -- 100K entries with 5KB results each = ~1.1GB +- 10K entries with 10KB results each require approximately 160MB. +- 100K entries with 5KB results each require approximately 1.1GB. ### What's the maximum cache size? -There's no hard limit, but consider the following practical +There is no hard limit, but consider the following practical considerations: -- < 100K entries: Excellent performance with default settings -- 100K - 1M entries: Increase IVFFlat lists parameter -- > 1M entries: Consider partitioning or HNSW index +- Fewer than 100K entries provide excellent performance with + default settings. +- Between 100K and 1M entries require increasing the IVFFlat lists + parameter. +- More than 1M entries require considering partitioning or the HNSW + index. 
Use the following command to configure max size: @@ -170,33 +200,40 @@ WHERE key = 'max_cache_size_mb'; Yes, but consider the following factors: -- Large results (> 1MB) consume more storage -- Serializing/deserializing large JSONB has overhead -- Consider caching aggregated results instead of full datasets +- Large results greater than 1MB consume more storage. +- Serializing and deserializing large JSONB has overhead. +- Consider caching aggregated results instead of full datasets. + +In the following example, caching aggregated results instead of full +datasets reduces storage overhead: ```sql -- Don't cache this: -SELECT * FROM huge_table; -- 100MB result +SELECT * FROM huge_table; -- Cache this instead: SELECT COUNT(*), AVG(value), summary_stats -FROM huge_table; -- 1KB result +FROM huge_table; ``` ## Embeddings +The following sections address embedding models and their use with +pg_semantic_cache. + ### What embedding models can I use? -Any embedding model that produces fixed-dimension vectors: +Any embedding model that produces fixed-dimension vectors works with +the extension. -Popular Models: +Popular models include the following: -- OpenAI text-embedding-ada-002 (1536 dim) -- OpenAI text-embedding-3-small (1536 dim) -- OpenAI text-embedding-3-large (3072 dim) -- Cohere embed-english-v3.0 (1024 dim) -- Sentence Transformers all-MiniLM-L6-v2 (384 dim) -- Sentence Transformers all-mpnet-base-v2 (768 dim) +- OpenAI text-embedding-ada-002 with 1536 dimensions. +- OpenAI text-embedding-3-small with 1536 dimensions. +- OpenAI text-embedding-3-large with 3072 dimensions. +- Cohere embed-english-v3.0 with 1024 dimensions. +- Sentence Transformers all-MiniLM-L6-v2 with 384 dimensions. +- Sentence Transformers all-mpnet-base-v2 with 768 dimensions. Use the following commands to configure dimension: @@ -207,14 +244,14 @@ SELECT semantic_cache.rebuild_index(); ### Do I need to generate embeddings myself? -Yes. 
pg_semantic_cache stores and searches embeddings, but doesn't +Yes, pg_semantic_cache stores and searches embeddings, but does not generate them. -Typical workflow: +The typical workflow includes the following steps: -1. Generate embedding using your chosen model/API -2. Pass embedding to `cache_query()` or `get_cached_result()` -3. Extension handles similarity search +1. Generate embedding using your chosen model or API. +2. Pass embedding to `cache_query()` or `get_cached_result()`. +3. The extension handles similarity search. See [Use Cases](use_cases.md) for integration examples. @@ -222,14 +259,12 @@ See [Use Cases](use_cases.md) for integration examples. Yes, but you need to rebuild the cache: +In the following example, changing the vector dimension and +rebuilding the index clears all cached data: + ```sql --- Change dimension SELECT semantic_cache.set_vector_dimension(3072); - --- Rebuild (WARNING: clears all cached data) SELECT semantic_cache.rebuild_index(); - --- Re-cache entries with new embeddings ``` ### What similarity threshold should I use? @@ -237,64 +272,80 @@ SELECT semantic_cache.rebuild_index(); Use the following recommendations to select an appropriate similarity threshold: -- 0.98-0.99: Nearly identical queries (financial data, strict matching) -- 0.95-0.97: Very similar queries (recommended starting point) -- 0.90-0.94: Similar queries (good for exploratory queries) -- 0.85-0.89: Somewhat related (use with caution) -- < 0.85: Too lenient (likely irrelevant results) - -Start with 0.95 and adjust based on your hit rate: - -- Hit rate too low? Lower threshold (0.92) -- Getting irrelevant results? Raise threshold (0.97) +- Values from 0.98 to 0.99 match nearly identical queries, suitable + for financial data or strict matching. +- Values from 0.95 to 0.97 match very similar queries and provide a + recommended starting point. +- Values from 0.90 to 0.94 match similar queries and work well for + exploratory queries. 
+- Values from 0.85 to 0.89 match somewhat related queries and should + be used with caution. +- Values less than 0.85 are too lenient and likely produce + irrelevant results. + +Start with 0.95 and adjust based on your hit rate by lowering the +threshold to 0.92 if the hit rate is too low, or raising the +threshold to 0.97 if you get irrelevant results. ## Configuration +The following sections address configuration options and settings +for pg_semantic_cache. + ### How do I choose between IVFFlat and HNSW? -Choose the index type based on your workload characteristics: +Choose the index type based on your workload characteristics. -Use IVFFlat (default) when: +Use IVFFlat (default) when you have one of the following +requirements: -- Cache updates frequently -- Build time matters -- < 100K entries -- Good enough recall (95%+) +- Cache updates frequently. +- Build time matters. +- Fewer than 100K entries. +- Good enough recall of 95% or higher. -Use HNSW when: +Use HNSW when you have one of the following requirements: -- Maximum accuracy needed -- Cache mostly read-only -- Have pgvector 0.5.0+ -- Can afford slower builds +- Maximum accuracy is needed. +- Cache is mostly read-only. +- You have pgvector 0.5.0 or later. +- You can afford slower builds. + +In the following example, the `set_index_type` function switches to +the HNSW index: ```sql --- Switch to HNSW SELECT semantic_cache.set_index_type('hnsw'); SELECT semantic_cache.rebuild_index(); ``` ### What TTL should I set? -The TTL depends on your data freshness requirements: +The TTL depends on your data freshness requirements. 
+ +In the following example, different TTL values are set based on data +freshness requirements: ```sql -- Real-time data (stock prices, weather) -ttl_seconds := 60 -- 1 minute +ttl_seconds := 60 -- Dynamic data (user dashboards, reports) -ttl_seconds := 1800 -- 30 minutes +ttl_seconds := 1800 -- Semi-static data (analytics, LLM responses) -ttl_seconds := 7200 -- 2 hours +ttl_seconds := 7200 -- Static data (reference data) -ttl_seconds := NULL -- Never expires +ttl_seconds := NULL ``` ### How often should I run maintenance? -Follow this recommended maintenance schedule: +Follow this recommended maintenance schedule. + +In the following example, different maintenance operations run at +scheduled intervals: ```sql -- Every 15 minutes: Evict expired entries @@ -307,7 +358,8 @@ SELECT semantic_cache.auto_evict(); ANALYZE semantic_cache.cache_entries; ``` -Set up with pg_cron: +In the following example, pg_cron schedules the cache eviction: + ```sql SELECT cron.schedule('cache-evict', '*/15 * * * *', 'SELECT semantic_cache.evict_expired()'); @@ -315,53 +367,76 @@ SELECT cron.schedule('cache-evict', '*/15 * * * *', ## Troubleshooting +The following sections address common troubleshooting scenarios and +their solutions. + ### Why is my hit rate so low? -Low hit rates typically have one of the following common causes: +Low hit rates typically have one of the following common causes. -1. Threshold too high - ```sql - -- Lower from 0.95 to 0.90 - SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90); - ``` +In the following example, lowering the threshold from 0.95 to 0.90 +may improve hit rates: -2. 
TTL too short - ```sql - -- Check average entry lifetime - SELECT AVG(EXTRACT(EPOCH FROM (NOW() - created_at))) / 3600 - as avg_age_hours - FROM semantic_cache.cache_entries; - ``` +```sql +SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90); +``` + +In the following example, checking the average entry lifetime helps +determine if TTL is too short: + +```sql +SELECT AVG(EXTRACT(EPOCH FROM (NOW() - created_at))) / 3600 + as avg_age_hours +FROM semantic_cache.cache_entries; +``` -3. Poor embedding quality - - Use better embedding model - - Ensure consistent embedding generation +Poor embedding quality can also cause low hit rates; use a better +embedding model and ensure consistent embedding generation. -4. Cache too small - ```sql - -- Check if entries being evicted too quickly - SELECT * FROM semantic_cache.cache_stats(); - ``` +In the following example, checking cache statistics helps determine +if the cache is too small: + +```sql +SELECT * FROM semantic_cache.cache_stats(); +``` ### Cache lookups are returning no results -Use the following debugging steps to troubleshoot this issue: +Use the following debugging steps to troubleshoot this issue. + +In the following example, checking if the cache has entries is the +first debugging step: ```sql --- 1. Check cache has entries SELECT COUNT(*) FROM semantic_cache.cache_entries; +``` + +In the following example, checking for expired entries helps +identify if entries are being evicted: --- 2. Check for expired entries +```sql SELECT COUNT(*) FROM semantic_cache.cache_entries WHERE expires_at IS NULL OR expires_at > NOW(); +``` --- 3. Try very low threshold +In the following example, trying a very low threshold helps +determine if the similarity threshold is too high: + +```sql SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.70); +``` --- 4. 
Check vector dimension +In the following example, checking the vector dimension ensures the +embedding dimensions match: + +```sql SELECT semantic_cache.get_vector_dimension(); +``` + +In the following example, manually checking similarity helps +identify the closest matches: --- 5. Manually check similarity +```sql SELECT query_text, (1 - (query_embedding <=> '[...]'::vector)) as similarity @@ -372,34 +447,37 @@ LIMIT 5; ### Extension won't load -If you encounter the following error: +If you encounter the following error, the extension control file is +missing: ```sql ERROR: could not open extension control file ``` -Use this solution: +Use this solution to check the installation and reinstall if +necessary: + ```bash -# Check installation ls -l $(pg_config --sharedir)/extension/pg_semantic_cache* -# Reinstall cd pg_semantic_cache sudo make install -# Verify pgvector installed ls -l $(pg_config --pkglibdir)/vector.so ``` ### Build errors -If you encounter the following build error: +If you encounter the following build error, PostgreSQL development +headers are missing: ```bash fatal error: postgres.h: No such file or directory ``` -Use this solution: +Use this solution to install the development headers for your +platform: + ```bash # Debian/Ubuntu sudo apt-get install postgresql-server-dev-17 @@ -414,81 +492,98 @@ export PATH="/opt/homebrew/opt/postgresql@17/bin:$PATH" ### Out of memory errors -If you encounter the following error: +If you encounter the following out of memory error, try one of the +solutions that follow: ```sql ERROR: out of memory ``` -Try one of these solutions: +In the following example, increasing work_mem provides more memory +for vector operations: + +```sql +SET work_mem = '512MB'; +``` -1. Increase work_mem - ```sql - SET work_mem = '512MB'; - ``` +In the following example, reducing cache size by keeping only 5K +entries frees memory: -2. 
Reduce cache size - ```sql - SELECT semantic_cache.evict_lru(5000); -- Keep only 5K entries - ``` +```sql +SELECT semantic_cache.evict_lru(5000); +``` -3. Lower vector dimension - ```sql - SELECT semantic_cache.set_vector_dimension(768); -- Use smaller model - SELECT semantic_cache.rebuild_index(); - ``` +In the following example, lowering the vector dimension to 768 +reduces memory requirements: + +```sql +SELECT semantic_cache.set_vector_dimension(768); +SELECT semantic_cache.rebuild_index(); +``` ## Best Practices +The following questions provide guidance on best practices for using +pg_semantic_cache effectively. + ### Should I cache everything? -No! Cache queries that are: +No, you should cache queries selectively. -- Expensive (slow execution) -- Frequently repeated (similar queries) -- Tolerant of slight staleness -- Semantically searchable +Cache queries that have the following characteristics: -Don't cache: +- Expensive queries with slow execution. +- Frequently repeated queries with similar phrasing. +- Queries that are tolerant of slight staleness. +- Queries that are semantically searchable. -- Simple key-value lookups (use Redis) -- Real-time critical data -- Unique, one-off queries -- Queries that must be current +Do not cache the following types of queries: + +- Simple key-value lookups where Redis is more appropriate. +- Real-time critical data. +- Unique, one-off queries. +- Queries that must return current data. ### How do I test if caching helps? -Use the following approach to measure the performance improvement from -caching: +Use the following approach to measure the performance improvement +from caching. 
+ +In the following example, measuring query time without cache +establishes a baseline: ```sql --- Measure query time without cache \timing on SELECT expensive_query(); --- Time: 450.234 ms +``` --- With cache (first call - miss) -SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); --- Time: 3.456 ms (cache miss) + 450.234 ms (execution) +In the following example, the first call to get_cached_result is a +cache miss and executes the query: --- With cache (subsequent calls - hit) +```sql SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); --- Time: 3.456 ms (cache hit) +``` + +In the following example, subsequent calls to get_cached_result are +cache hits and return much faster: --- Speedup: 450 / 3.5 = 128x faster +```sql +SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.95); ``` ### Should I use tags? -Yes! Tags are useful for: +Yes, tags are useful for the following purposes: -- Organization: Group by feature (`ARRAY['dashboard', 'sales']`) -- Bulk invalidation: `invalidate_cache(tag := 'user_123')` -- Analytics: `SELECT * FROM semantic_cache.cache_by_tag` -- Debugging: Find entries by category +- Organization allows you to group by feature. +- Bulk invalidation allows you to invalidate all entries with a tag. +- Analytics allows you to query entries by tag. +- Debugging allows you to find entries by category. + +In the following example, the `cache_query` function tags entries +with application name, feature name, and user ID: ```sql --- Tag everything SELECT semantic_cache.cache_query( query_text, embedding, @@ -501,9 +596,11 @@ SELECT semantic_cache.cache_query( ## Still Have Questions? -Contact us through the following channels: +Review our documentation at the [pgEdge website](https://docs.pgedge.com/). 
+ +Contact us through the following channels for additional support: -- GitHub Issues: [Report bugs or ask - questions](https://github.com/pgedge/pg_semantic_cache/issues) -- Discussions: [Community - discussions](https://github.com/pgedge/pg_semantic_cache/discussions) +- Use GitHub Issues to report bugs or ask questions at + https://github.com/pgedge/pg_semantic_cache/issues +- Use GitHub Discussions for community discussions at + https://github.com/pgedge/pg_semantic_cache/discussions diff --git a/docs/architecture.md b/docs/architecture.md index 7e0e3fa..f6ea3ca 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,15 +1,20 @@ # Architecture -pg_semantic_cache is implemented in pure C using the PostgreSQL extension API -(PGXS), providing: +The pg_semantic_cache extension is implemented in pure C using the PostgreSQL +extension API (PGXS). This implementation approach provides several benefits +for performance and compatibility. -- Small binary size of ~100KB vs 2-5MB for Rust-based extensions. -- Fast build times of 10-30 seconds vs 2-5 minutes. +The extension provides the following benefits: + +- The small binary size is approximately 100KB versus 2-5MB for Rust versions. +- Fast build times range from 10-30 seconds versus 2-5 minutes for Rust. - Immediate compatibility works with new PostgreSQL versions immediately. - Standard packaging is compatible with all PostgreSQL packaging tools. ## How It Works +The following diagram illustrates the semantic cache workflow: + ```mermaid graph LR A[Query] --> B[Generate Embedding] @@ -20,23 +25,25 @@ graph LR F --> G[Return Result] ``` -1. Generate an embedding by converting your query text into a vector embedding - using your preferred model (OpenAI, Cohere, etc.). -2. Check the cache by searching for semantically similar cached queries using - cosine similarity. +The semantic cache operates through the following workflow: + +1. 
The application generates an embedding by converting query text into a + vector embedding using a preferred model (OpenAI, Cohere, etc.). +2. The extension checks the cache by searching for semantically similar + cached queries using cosine similarity. 3. On a cache hit, if a similar query exists above the similarity threshold, - the cached result is returned. -4. On a cache miss, the actual query is executed and the result is cached with - its embedding for future use. + the extension returns the cached result. +4. On a cache miss, the extension executes the actual query and caches the + result with the embedding for future use. 5. Automatic maintenance evicts expired entries based on TTL and configured policies. - ## Getting Help -- Browse the documentation. -- Report issues at - [GitHub Issues](https://github.com/pgedge/pg_semantic_cache/issues). +The following resources are available for assistance: + +- Browse the [documentation](https://docs.pgedge.com/) for detailed information. +- Report issues at [GitHub Issues](https://github.com/pgedge/pg_semantic_cache/issues). - See [Use Cases](use_cases.md) for practical implementation examples. - Check the [FAQ](FAQ.md) for answers to common questions. diff --git a/docs/configuration.md b/docs/configuration.md index c357dad..21223b6 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1,13 +1,13 @@ # Configuration -This guide describes how to configure pg_semantic_cache for your use case, -including vector dimensions, index types, and cache behavior. +This guide describes how to configure pg_semantic_cache for your use +case, including vector dimensions, index types, and cache behavior. !!! tip "Start Simple" - When configuring semantic caching, begin with simple defaults (1536 - dimensions, IVFFlat, 0.95 threshold) and adjust your system based on - monitoring. 
+ When configuring semantic caching, begin with simple defaults such + as 1536 dimensions, IVFFlat index, and 0.95 threshold, and adjust + your system based on monitoring. !!! warning "Test Before Production" @@ -16,9 +16,9 @@ including vector dimensions, index types, and cache behavior. ## Vector Dimensions -The extension supports configurable embedding dimensions to match your -chosen embedding model. pg_semantic_cache supports the following dimensions -and associated models: +The extension supports configurable embedding dimensions to match +your chosen embedding model. The pg_semantic_cache extension supports +the following dimensions and associated models: | Dimension | Common Models | |-----------|---------------| @@ -32,95 +32,96 @@ and associated models: !!! warning "Rebuild Required" - Changing dimensions requires rebuilding the index, which clears all - cached data. + Changing dimensions requires rebuilding the index, which clears + all cached data. + +In the following example, the `set_vector_dimension` function changes +the vector dimension to 768, and the `rebuild_index` function applies +the change: ```sql --- Set vector dimension (default: 1536) SELECT semantic_cache.set_vector_dimension(768); - --- Rebuild index to apply changes (WARNING: clears cache) SELECT semantic_cache.rebuild_index(); - --- Verify new dimension SELECT semantic_cache.get_vector_dimension(); ``` ### Initial Setup For Custom Dimensions -If you know your embedding model before installation: +If you know your embedding model before installation, configure the +dimensions immediately after creating the extension. 
+
+In the following example, the dimensions are set to 768 right after
+creating the extension:
 
 ```sql
--- Right after CREATE EXTENSION
 CREATE EXTENSION pg_semantic_cache;
-
--- Immediately configure dimensions
 SELECT semantic_cache.set_vector_dimension(768);
 SELECT semantic_cache.rebuild_index();
-
--- Now start caching
 ```
 
 ## Vector Index Types
 
-Choose between IVFFlat (fast, approximate) or HNSW (accurate, slower
-build).
+Choose between IVFFlat for fast approximate searches or HNSW for
+accurate searches with slower build times.
 
 ### IVFFlat Index (Default)
 
-Best for most use cases - fast lookups with good recall.
+The IVFFlat index is best for most use cases and provides fast
+lookups with good recall.
 
-Characteristics:
-- Lookup Speed: Very fast (< 5ms typical)
-- Build Time: Fast
-- Recall: Good (95%+)
-- Memory: Moderate
-- Best For: Production caches with frequent updates
+The index provides:
+
+- Very fast lookups (typically under 5ms).
+- Fast build times.
+- Good recall (95% or higher).
+- Moderate memory usage.
+
+This index is best for production caches with frequent updates.
+
+In the following example, the `set_index_type` function sets the
+index type to IVFFlat:
 
 ```sql
--- Set index type
 SELECT semantic_cache.set_index_type('ivfflat');
 SELECT semantic_cache.rebuild_index();
 ```
 
-IVFFlat Parameters (set during `init_schema()`):
+In the following example, the IVFFlat index is configured with 1000
+lists for caches with 100K to 1M entries:
 
 ```sql
--- Default configuration
-lists = 100 -- For < 100K entries
-
--- For larger caches, increase lists
--- Adjust in the init_schema() function or manually:
-
 DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding;
 CREATE INDEX idx_cache_entries_embedding
 ON semantic_cache.cache_entries
 USING ivfflat (query_embedding vector_cosine_ops)
-WITH (lists = 1000); -- For 100K-1M entries
+WITH (lists = 1000);
 ```
 
 ### HNSW Index
 
-More accurate but slower to build - requires pgvector 0.5.0+.
+The HNSW index is more accurate but slower to build and requires
+pgvector 0.5.0 or later.
+
+Characteristics include the following:
 
-Characteristics:
-- Lookup Speed: Fast (1-3ms typical)
-- Build Time: Slower
-- Recall: Excellent (98%+)
-- Memory: Higher
-- Best For: Read-heavy caches with infrequent updates
+- Fast lookups (typically 1-3ms).
+- Slower build times.
+- Excellent recall (98% or higher).
+- Higher memory usage.
+
+This index is best for read-heavy caches with infrequent updates.
+
+In the following example, the `set_index_type` function sets the
+index type to HNSW:
 
 ```sql
--- Set index type (requires pgvector 0.5.0+)
 SELECT semantic_cache.set_index_type('hnsw');
 SELECT semantic_cache.rebuild_index();
 ```
 
-HNSW Parameters:
+In the following example, the HNSW index is configured with `m=16` and
+`ef_construction=64` for optimal performance:
 
 ```sql
--- Adjust manually for optimal performance
-
 DROP INDEX IF EXISTS semantic_cache.idx_cache_entries_embedding;
 CREATE INDEX idx_cache_entries_embedding
 ON semantic_cache.cache_entries
@@ -130,17 +131,20 @@ WITH (m = 16, ef_construction = 64);
 
 ### Index Comparison
 
+The following table compares the performance characteristics of IVFFlat
+and HNSW indexes:
+
 | Feature | IVFFlat | HNSW |
 |---------|---------|------|
-| Speed | ⚡⚡⚡ | ⚡⚡ |
-| Accuracy | ✓✓ | ✓✓✓ |
-| Build Time | ⚡⚡⚡ | ⚡ |
-| Memory | 💾 | 💾💾 |
+| Speed | Very Fast | Fast |
+| Accuracy | Good | Excellent |
+| Build Time | Very Fast | Slow |
+| Memory | Moderate | High |
 | Updates | Fast | Slower |
 
 ## Cache Configuration
 
-The extension stores configuration in the
+The extension stores configuration details in the
 `semantic_cache.cache_config` table.
 
 ### View Current Configuration
@@ -153,71 +157,74 @@ SELECT * FROM semantic_cache.cache_config ORDER BY key;
 
 ### Key Configuration Parameters
 
-Use the following configuration parameters to control cache settings:
+Use the following configuration parameters to control cache settings.
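+
+In the following example (assuming the key/value layout of the
+`semantic_cache.cache_config` table shown above), you can read a
+single parameter's current value before changing it:
+
+```sql
+SELECT value FROM semantic_cache.cache_config
+WHERE key = 'eviction_policy';
+```
+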
#### max_cache_size_mb -Use max_cache_size_mb to specify the maximum cache size in megabytes +Use `max_cache_size_mb` to specify the maximum cache size in megabytes before auto-eviction triggers. +In the following example, the maximum cache size is set to 2GB: + ```sql --- Set to 2GB UPDATE semantic_cache.cache_config SET value = '2000' WHERE key = 'max_cache_size_mb'; - --- Or default: 1000 MB ``` #### default_ttl_seconds -Use default_ttl_seconds to specify the default time-to-live for cached -entries (can be overridden per query). +Use `default_ttl_seconds` to specify the default time-to-live for +cached entries, which can be overridden per query. + +In the following example, the default TTL is set to 2 hours: ```sql --- Set default to 2 hours UPDATE semantic_cache.cache_config SET value = '7200' WHERE key = 'default_ttl_seconds'; - --- Default: 3600 (1 hour) ``` #### eviction_policy Use eviction_policy to specify the automatic eviction strategy when -cache size limit is reached. +the cache size limit is reached. + +In the following example, the eviction policy is set to LRU: ```sql --- Options: 'lru', 'lfu', 'ttl' UPDATE semantic_cache.cache_config SET value = 'lru' WHERE key = 'eviction_policy'; ``` -Eviction Policies: +Eviction policies include the following options: -- lru: Least Recently Used - evicts oldest accessed entries -- lfu: Least Frequently Used - evicts least accessed entries -- ttl: Time To Live - evicts entries closest to expiration +- The lru policy evicts the least recently used entries. +- The lfu policy evicts the least frequently used entries. +- The ttl policy evicts entries closest to expiration. #### similarity_threshold -Use similarity_threshold to specify the default similarity threshold for -cache hits (0.0 - 1.0). +Use similarity_threshold to specify the default similarity threshold +for cache hits, with values from 0.0 to 1.0. 
+ +In the following example, the similarity threshold is set to 0.98 for +more strict matching: ```sql --- More strict matching (fewer hits, more accurate) UPDATE semantic_cache.cache_config SET value = '0.98' WHERE key = 'similarity_threshold'; +``` + +In the following example, the similarity threshold is set to 0.90 for +more lenient matching: --- More lenient matching (more hits, less accurate) +```sql UPDATE semantic_cache.cache_config SET value = '0.90' WHERE key = 'similarity_threshold'; - --- Default: 0.95 (recommended) ``` ## Production Configurations @@ -227,30 +234,30 @@ production environment. ### High-Throughput Configuration -Use the following configuration for applications with thousands of queries -per second: +Use the following configuration options for applications with thousands of +queries per second. + +In the following example, the cache is configured for high throughput +with IVFFlat index, large cache size, LRU eviction, and short TTL: ```sql --- Use IVFFlat with optimized lists SELECT semantic_cache.set_index_type('ivfflat'); SELECT semantic_cache.rebuild_index(); --- Increase cache size UPDATE semantic_cache.cache_config SET value = '5000' WHERE key = 'max_cache_size_mb'; --- Use LRU for fast eviction UPDATE semantic_cache.cache_config SET value = 'lru' WHERE key = 'eviction_policy'; --- Shorter TTL to keep cache fresh UPDATE semantic_cache.cache_config SET value = '1800' WHERE key = 'default_ttl_seconds'; ``` -PostgreSQL settings: +In the following example, PostgreSQL is configured with settings +optimized for high throughput: + ```ini -# postgresql.conf shared_buffers = 8GB effective_cache_size = 24GB work_mem = 512MB @@ -259,71 +266,71 @@ maintenance_work_mem = 2GB ### High-Accuracy Configuration -Use the following configuration for applications requiring maximum precision: +Use the following configuration for applications requiring maximum +precision. 
+ +In the following example, the cache is configured for high accuracy +with HNSW index, strict similarity threshold, and longer TTL: ```sql --- Use HNSW for best recall SELECT semantic_cache.set_index_type('hnsw'); SELECT semantic_cache.rebuild_index(); --- Strict similarity threshold UPDATE semantic_cache.cache_config SET value = '0.98' WHERE key = 'similarity_threshold'; --- Longer TTL for stable results UPDATE semantic_cache.cache_config SET value = '14400' WHERE key = 'default_ttl_seconds'; ``` ### LLM/AI Application Configuration -Use the following configuration settings to optimize caching for expensive AI -API calls: +Use the following configuration settings to optimize caching for +expensive AI API calls. + +In the following example, the cache is configured for LLM +applications with OpenAI ada-002 dimensions, balanced threshold, long +TTL, and large cache size: ```sql --- OpenAI ada-002 dimensions SELECT semantic_cache.set_vector_dimension(1536); SELECT semantic_cache.rebuild_index(); --- Balance between accuracy and coverage UPDATE semantic_cache.cache_config SET value = '0.93' WHERE key = 'similarity_threshold'; --- Cache longer (AI responses stable) UPDATE semantic_cache.cache_config SET value = '7200' WHERE key = 'default_ttl_seconds'; --- Large cache for many queries UPDATE semantic_cache.cache_config SET value = '10000' WHERE key = 'max_cache_size_mb'; ``` ### Analytics Query Configuration -The following configuration is well-suited for caching expensive analytical -queries: +The following configuration is well-suited for caching expensive +analytical queries. 
+ +In the following example, the cache is configured for analytics with +standard dimensions, moderate threshold, short TTL, and LFU policy: ```sql --- Use standard dimensions SELECT semantic_cache.set_vector_dimension(768); SELECT semantic_cache.rebuild_index(); --- Moderate similarity (query variations common) UPDATE semantic_cache.cache_config SET value = '0.90' WHERE key = 'similarity_threshold'; --- Short TTL (data changes frequently) UPDATE semantic_cache.cache_config SET value = '900' WHERE key = 'default_ttl_seconds'; --- LFU policy (popular queries cached longer) UPDATE semantic_cache.cache_config SET value = 'lfu' WHERE key = 'eviction_policy'; ``` ## Monitoring Configuration Impact -Use the following commands to monitor your semantic cache. +You can use system queries to optimize cache usage. ### Check Index Performance @@ -344,38 +351,41 @@ WHERE schemaname = 'semantic_cache'; ### Measure Lookup Times -Use the following commands to measure lookup performance: +Use the following commands to measure lookup performance. + +In the following example, the `\timing` command enables timing before +testing lookup performance: ```sql --- Enable timing \timing on - --- Test lookup SELECT * FROM semantic_cache.get_cached_result( '[0.1, 0.2, ...]'::text, 0.95 ); ``` -Target: < 5ms for most queries +Target performance is less than 5ms for most queries. ### Cache Hit Rate -Use the following query to monitor cache hit rate: +Use the following query to monitor cache hit rate. + +In the following example, the `cache_stats` function monitors the +cache hit rate: ```sql --- Monitor hit rate with current config SELECT * FROM semantic_cache.cache_stats(); ``` -Target: > 70% for effective caching +Target hit rate is greater than 70% for effective caching. ### Tuning Checklist Follow this checklist when tuning your cache configuration: - Choose a dimension matching your embedding model. -- Select an index type based on workload (IVFFlat for most cases). 
+- Select an index type based on workload, using IVFFlat for most + cases. - Set a similarity threshold based on accuracy requirements. - Configure cache size based on available memory. - Choose an eviction policy matching access patterns. @@ -384,55 +394,48 @@ Follow this checklist when tuning your cache configuration: ### Common Mistakes -The following common mistakes have simple remediations: +The following common mistakes have simple remediations. #### Using Wrong Dimensions -```sql --- Extension configured for 1536, but sending 768-dim --- vectors --- Result: Error or poor performance -``` +If the extension is configured for 1536 dimensions but you send 768 +dimension vectors, the result is an error or poor performance. + +You should use matching model dimensions. -You should use matching model dimensions: +In the following example, the vector dimension is set to match the +model: ```sql --- Match your model SELECT semantic_cache.set_vector_dimension(768); SELECT semantic_cache.rebuild_index(); ``` #### Too Strict Threshold -```sql -UPDATE semantic_cache.cache_config SET value = '0.99' -WHERE key = 'similarity_threshold'; --- Result: Very low hit rate -``` +If the similarity threshold is set too high at 0.99, the result is a +very low hit rate. + +Use a more balanced threshold. -Use a more balanced threshold: +In the following example, the threshold is set to 0.93 to allow +reasonable variation: ```sql UPDATE semantic_cache.cache_config SET value = '0.93' WHERE key = 'similarity_threshold'; --- Allows reasonable variation ``` #### Forgetting To Rebuild +If you set the vector dimension but forget to rebuild the index, the +old index is still in use. You should rebuild your cache to use the new index. + +In the following example, the index is rebuilt after changing the +dimension: + ```sql SELECT semantic_cache.set_vector_dimension(768); --- Forgot: SELECT semantic_cache.rebuild_index(); --- Result: Old index still in use! 
+SELECT semantic_cache.rebuild_index(); ``` -Rebuild your cache to use the new index! - -## Next Steps - -- [Functions Reference](functions/index.md) - Learn about all - configuration functions. -- [Monitoring](monitoring.md) - Track performance and tune - configuration. -- [Use Cases](use_cases.md) - See configuration examples in - practice. diff --git a/docs/development.md b/docs/development.md index 0552bdd..eef64b3 100644 --- a/docs/development.md +++ b/docs/development.md @@ -1,49 +1,59 @@ # Development Resources -Developer contributions are welcome! This extension is built with standard PostgreSQL C APIs. +We welcome developer contributions to the pg_semantic_cache extension. The +extension is built with standard PostgreSQL C APIs and follows the +PostgreSQL extension development model. -To create a development installation: +## Contributing to the Project -1. Fork the repository. +To create a development installation and contribute to the project, follow +these steps in order: + +1. Fork the repository on GitHub. 2. Create a feature branch for your changes. 3. Make your changes to the codebase. 4. Run the test suite with `make installcheck`. 5. Submit a pull request with your changes. -Code guidelines: +The following guidelines apply to all code contributions: - Follow the existing code style throughout the project. - Add tests for any new features you implement. - Update the documentation to reflect your changes. -- Ensure your changes are compatible with PostgreSQL versions 14 through 18. - ---- +- Ensure your changes are compatible with PostgreSQL 14 through 18. ## Building From Source -The extension uses the standard PostgreSQL PGXS build system for compilation and installation. +The extension uses the standard PostgreSQL PGXS build system for +compilation and installation. The PGXS system provides a consistent build +environment across all PostgreSQL versions. 
+In the following example, the `make` commands build and install the +extension from source: ```bash -# Standard build make clean && make sudo make install -# Run tests make installcheck -# Development build with debug symbols make CFLAGS="-g -O0" clean all -# View build configuration make info ``` -## Performing a Multi-Version PostgreSQL Build +The first command builds the extension. The second command runs the test +suite. The third command creates a development build with debug symbols. +The fourth command displays the build configuration. + +## Building for Multiple PostgreSQL Versions -The extension supports building for multiple PostgreSQL versions in sequence. +The extension supports building for multiple PostgreSQL versions in +sequence. This approach is useful for package maintainers and multi-version +testing environments. -Build for multiple PostgreSQL versions simultaneously: +In the following example, the `for` loop builds the extension for +PostgreSQL versions 14 through 18: ```bash for PG in 14 15 16 17 18; do diff --git a/docs/functions.md b/docs/functions.md index d39adbe..acce092 100644 --- a/docs/functions.md +++ b/docs/functions.md @@ -1,44 +1,59 @@ # Using pg_semantic_cache Functions -The extension provides a complete set of SQL functions for caching, eviction, monitoring, and configuration. This page provides a comprehensive reference for all available functions in the pg_semantic_cache extension. +The extension provides a complete set of SQL functions for caching, +eviction, monitoring, and configuration. This page provides a +comprehensive reference for all available functions in the +pg_semantic_cache extension. ## Function Reference +The pg_semantic_cache extension includes the following functions: + | Function | Description | |----------|-------------| | [auto_evict](functions/auto_evict.md) | Automatically evicts entries based on configured policy (LRU, LFU, or TTL). 
| -| [cache_hit_rate](functions/cache_hit_rate.md) | Gets current cache hit rate as a percentage. | -| [cache_query](functions/cache_query.md) | Stores a query result with its vector embedding in the cache. | +| [cache_hit_rate](functions/cache_hit_rate.md) | Gets the current cache hit rate as a percentage. | +| [cache_query](functions/cache_query.md) | Stores a query result with the vector embedding in the cache. | | [cache_stats](functions/cache_stats.md) | Gets comprehensive cache statistics including hits, misses, and hit rate. | | [clear_cache](functions/clear_cache.md) | Removes all cache entries (use with caution). | | [evict_expired](functions/evict_expired.md) | Removes all expired cache entries based on TTL. | -| [evict_lfu](functions/evict_lfu.md) | Evicts least frequently used entries, keeping only specified count. | -| [evict_lru](functions/evict_lru.md) | Evicts least recently used entries, keeping only specified count. | +| [evict_lfu](functions/evict_lfu.md) | Evicts least frequently used entries, keeping only the specified count. | +| [evict_lru](functions/evict_lru.md) | Evicts least recently used entries, keeping only the specified count. | | [get_cached_result](functions/get_cached_result.md) | Retrieves a cached result by semantic similarity search. | -| [get_cost_savings](functions/get_cost_savings.md) | Calculates estimated cost savings from cache usage. | +| [get_cost_savings](functions/get_cost_savings.md) | Calculates the estimated cost savings from cache usage. | | [get_index_type](functions/get_index_type.md) | Gets the current vector index type (IVFFlat or HNSW). | | [get_vector_dimension](functions/get_vector_dimension.md) | Gets the current vector embedding dimension. | -| [init_schema](functions/init_schema.md) | Initializes cache schema and creates required tables, indexes, and views. | +| [init_schema](functions/init_schema.md) | Initializes the cache schema and creates required tables, indexes, and views. 
| | [invalidate_cache](functions/invalidate_cache.md) | Invalidates cache entries by pattern matching or tags. | | [log_cache_access](functions/log_cache_access.md) | Logs cache access events for debugging and analysis. | | [rebuild_index](functions/rebuild_index.md) | Rebuilds the vector similarity index for optimal performance. | | [set_index_type](functions/set_index_type.md) | Sets the vector index type for similarity search. | | [set_vector_dimension](functions/set_vector_dimension.md) | Sets the vector embedding dimension. | +## Core Functions + +The core functions initialize the cache and manage query storage and +retrieval. + +### init_schema() -### Core Functions +The `init_schema()` function initializes the cache schema and creates all +required tables, indexes, and views. -#### `init_schema()` -Initialize the cache schema, creating all required tables, indexes, and views. +In the following example, the `init_schema()` function sets up the +semantic cache infrastructure: ```sql SELECT semantic_cache.init_schema(); ``` -#### `cache_query(query_text, embedding, result_data, ttl_seconds, tags)` -Store a query result with its embedding for future retrieval. +### cache_query(query_text, embedding, result_data, ttl_seconds, tags) + +The `cache_query()` function stores a query result with the corresponding +vector embedding for future retrieval. **Parameters:** + - `query_text` (text) - The original query text - `embedding` (text) - Vector embedding as text: `'[0.1, 0.2, ...]'` - `result_data` (jsonb) - The query result to cache @@ -47,72 +62,105 @@ Store a query result with its embedding for future retrieval. **Returns:** `bigint` - Cache entry ID -#### `get_cached_result(embedding, similarity_threshold, max_age_seconds)` -Retrieve a cached result by semantic similarity. +### get_cached_result(embedding, similarity_threshold, max_age_seconds) + +The `get_cached_result()` function retrieves a cached result by semantic +similarity. 
**Parameters:** + - `embedding` (text) - Query embedding to search for - `similarity_threshold` (float4) - Minimum similarity (0.0 to 1.0) - `max_age_seconds` (int) - Maximum age in seconds (NULL = any age) -**Returns:** `record` - `(found boolean, result_data jsonb, similarity_score float4, age_seconds int)` +**Returns:** `record` - `(found boolean, result_data jsonb, +similarity_score float4, age_seconds int)` +## Cache Eviction ---- +Multiple eviction strategies are available to manage cache size and +freshness. The extension supports TTL-based, LRU, LFU, and automatic +eviction policies. -### Cache Eviction +### evict_expired() -Multiple eviction strategies are available to manage cache size and freshness. +The `evict_expired()` function removes all expired cache entries. -#### `evict_expired()` -Remove all expired cache entries. +In the following example, the `evict_expired()` function removes entries +that have exceeded their TTL: ```sql -SELECT semantic_cache.evict_expired(); -- Returns count of evicted entries +SELECT semantic_cache.evict_expired(); ``` -#### `evict_lru(keep_count)` -Evict least recently used entries, keeping only the specified number of most recent entries. +The function returns the count of evicted entries. + +### evict_lru(keep_count) + +The `evict_lru()` function evicts least recently used entries and keeps +only the specified number of most recent entries. + +In the following example, the `evict_lru()` function keeps only the 1000 +most recently used entries: ```sql -SELECT semantic_cache.evict_lru(1000); -- Keep only 1000 most recently used entries +SELECT semantic_cache.evict_lru(1000); ``` -#### `evict_lfu(keep_count)` -Evict least frequently used entries, keeping only the specified number of most frequently used entries. +### evict_lfu(keep_count) + +The `evict_lfu()` function evicts least frequently used entries and keeps +only the specified number of most frequently used entries. 
+ +In the following example, the `evict_lfu()` function keeps only the 1000 +most frequently used entries: ```sql -SELECT semantic_cache.evict_lfu(1000); -- Keep only 1000 most frequently used entries +SELECT semantic_cache.evict_lfu(1000); ``` -#### `auto_evict()` -Automatically evict entries based on configured policy (LRU, LFU, or TTL). +### auto_evict() + +The `auto_evict()` function automatically evicts entries based on the +configured policy (LRU, LFU, or TTL). + +In the following example, the `auto_evict()` function applies the +configured eviction policy: ```sql SELECT semantic_cache.auto_evict(); ``` -#### `clear_cache()` -Remove **all** cache entries (use with caution). +### clear_cache() + +The `clear_cache()` function removes all cache entries. Use this function +with caution in production environments. + +In the following example, the `clear_cache()` function removes all cached +entries: ```sql SELECT semantic_cache.clear_cache(); ``` ---- +## Statistics and Monitoring -### Statistics & Monitoring +Built-in functions and views provide real-time visibility into cache +performance. The extension tracks hits, misses, and overall cache health. -Built-in functions and views provide real-time visibility into cache performance. +### cache_stats() -#### `cache_stats()` -Get comprehensive cache statistics. +The `cache_stats()` function returns comprehensive cache statistics. + +In the following example, the `cache_stats()` function retrieves current +cache performance metrics: ```sql SELECT * FROM semantic_cache.cache_stats(); ``` **Returns:** + ``` total_entries | Total number of cached queries total_hits | Total number of cache hits @@ -120,4 +168,5 @@ total_misses | Total number of cache misses hit_rate_percent | Hit rate as a percentage ``` -**Note:** For more detailed statistics including cache size, expired entries, and access patterns, use the `semantic_cache.cache_health` view. 
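+If you only need the hit rate, the `cache_hit_rate()` function from the
+function reference returns it directly as a percentage.
+
+In the following example, the `cache_hit_rate()` function retrieves the
+current hit rate:
+
+```sql
+SELECT semantic_cache.cache_hit_rate();
+```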
+For more detailed statistics including cache size, expired entries, and +access patterns, use the `semantic_cache.cache_health` view. diff --git a/docs/installation.md b/docs/installation.md index 0173931..ab0bc07 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -1,10 +1,7 @@ # Installation -This guide covers installing pg_semantic_cache from source on various platforms. - -## Prerequisites - -Before installing pg_semantic_cache, you must install: +You can install pg_semantic_cache from source on various platforms. Before +installing pg_semantic_cache, you must install: - PostgreSQL: Version 14, 15, 16, 17, or 18 - pgvector: Must be installed before pg_semantic_cache @@ -12,10 +9,10 @@ Before installing pg_semantic_cache, you must install: - make: GNU Make or compatible - PostgreSQL Development Headers: Required for building extensions -### Platform-Specific Packages +## Platform-Specific Packages -Use the following platform-specific commands to ensure that your host is -prepared for pg_semantic_cache: +Use the following platform-specific commands to ensure that your host +is prepared for pg_semantic_cache: === "Debian/Ubuntu" ```bash @@ -66,8 +63,22 @@ prepared for pg_semantic_cache: ## Building from Source -After installing the prerequisites, build pg_semantic_cache using the standard -PostgreSQL extension build commands. +You can use the following command to check your build environment and +configuration settings before compiling. + +```bash +make info +``` + +Output includes the following information: + +- The PostgreSQL version and paths. +- The compiler and flags. +- The installation directories. +- The extension version. 
+ +After configuring your build environment, build pg_semantic_cache using the +standard PostgreSQL extension build commands: ```bash # Clone the repository @@ -82,12 +93,15 @@ make sudo make install ``` -### Multi-Version PostgreSQL +A development build includes verbose output and debugging information; to +perform a development build, use the following command: -Use PG_CONFIG to target specific PostgreSQL versions when multiple versions -are installed. +```bash +make dev-install +``` -If you have multiple PostgreSQL versions installed: +If you have multiple PostgreSQL versions installed, you can use PG_CONFIG to +target specific PostgreSQL versions when multiple versions are installed. ```bash # Specify pg_config explicitly @@ -99,33 +113,10 @@ for PG in 14 15 16 17 18; do done ``` -### Development Build - -Development builds include verbose output and debugging information. - -For development with verbose output: - -```bash -make dev-install -``` - -### View Build Configuration +### Verifying the Installation -Check your build environment and configuration settings before compiling. - -```bash -make info -``` - -Output includes: -- PostgreSQL version and paths -- Compiler and flags -- Installation directories -- Extension version - -## Verifying Installation - -After installation completes, verify that all extension files are in place. +After the installation completes, verify that all extension files are in +place. Check for the extension files: @@ -147,40 +138,11 @@ Use the following command to confirm that pgvector is installed: ls -lh $(pg_config --pkglibdir)/vector.so ``` -## PostgreSQL Configuration - -Optimize PostgreSQL settings for better performance with semantic caching. 
- -### Update postgresql.conf - -pg_semantic_cache works out of the box without special configuration, but for -optimal performance with large caches: - -```ini -# Recommended for production with large caches -shared_buffers = 4GB # 25% of RAM -effective_cache_size = 12GB # 75% of RAM -work_mem = 256MB # For vector operations -maintenance_work_mem = 1GB # For index creation - -# Enable query timing (optional, for monitoring) -track_io_timing = on -``` - -Restart PostgreSQL after making configuration changes: - -```bash -# Systemd -sudo systemctl restart postgresql - -# Or using pg_ctl -pg_ctl restart -D /var/lib/postgresql/data -``` - ## Creating the Extension -Create the extension in your PostgreSQL database to begin using semantic -caching. Open the psql command line, and run the following commands: +Create the extension in your PostgreSQL database to begin using +semantic caching. Open the psql command line, and run the following +commands: ```sql -- Connect to your database @@ -193,7 +155,9 @@ CREATE EXTENSION IF NOT EXISTS vector; CREATE EXTENSION IF NOT EXISTS pg_semantic_cache; -- Verify installation -SELECT extname, extversion FROM pg_extension WHERE extname IN ('vector', 'pg_semantic_cache'); +SELECT extname, extversion +FROM pg_extension +WHERE extname IN ('vector', 'pg_semantic_cache'); ``` Expected output: @@ -204,9 +168,10 @@ Expected output: pg_semantic_cache | 0.1.0-beta3 ``` -### Verify Schema Creation +### Verifying Schema Creation -Check that the semantic_cache schema and tables were created successfully. +Check that the semantic_cache schema and tables were created +successfully. ```sql -- Check that schema and tables were created @@ -216,12 +181,40 @@ Check that the semantic_cache schema and tables were created successfully. 
SELECT * FROM semantic_cache.cache_health; ``` +## Optimizing the PostgreSQL Configuration -## Testing Installation +You can optimize PostgreSQL settings for better performance with semantic +caching by updating the `postgresql.conf` file. -Validate your installation by running the test suite or manual tests. +The pg_semantic_cache extension works out of the box without special +configuration, but for optimal performance with large caches use the +following settings: -Run the included test suite: +```ini +# Recommended for production with large caches +shared_buffers = 4GB # 25% of RAM +effective_cache_size = 12GB # 75% of RAM +work_mem = 256MB # For vector operations +maintenance_work_mem = 1GB # For index creation + +# Enable query timing (optional, for monitoring) +track_io_timing = on +``` + +Restart PostgreSQL after making configuration changes: + +```bash +# Systemd +sudo systemctl restart postgresql + +# Or using pg_ctl +pg_ctl restart -D /var/lib/postgresql/data +``` + +## Testing the Installation + +Validate your installation by running the test suite or manual tests. You can +use the following command to run the included test suite: ```bash # Requires a running PostgreSQL instance @@ -240,10 +233,8 @@ Or run manual tests: ## Uninstalling -You can remove pg_semantic_cache from your database and system when it is no -longer needed. - -Use the following command to remove the extension from your database: +You can remove pg_semantic_cache from your database and system when it +is no longer needed. Use the following command: ```sql DROP EXTENSION IF EXISTS pg_semantic_cache CASCADE; @@ -256,8 +247,9 @@ cd pg_semantic_cache sudo make uninstall ``` -This removes: -- Shared library (`.so` file) -- Control file -- SQL installation files +This removes the following files: + +- The shared library file with the .so extension. +- The control file. +- The SQL installation files. 
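+
+### Verifying Removal
+
+You can confirm that the extension was removed from the database by
+querying the `pg_extension` catalog. In the following example, the query
+returns zero rows after a successful uninstall:
+
+```sql
+SELECT extname FROM pg_extension
+WHERE extname = 'pg_semantic_cache';
+```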
diff --git a/docs/integration.md b/docs/integration.md index c0c4169..7cdd833 100644 --- a/docs/integration.md +++ b/docs/integration.md @@ -1,15 +1,17 @@ -# Integration Examples +# Sample Integrations This page provides integration examples for using pg_semantic_cache with -popular programming languages and embedding providers. +popular programming languages and embedding providers. The examples +demonstrate cache integration patterns for Python and Node.js applications. ## Python with OpenAI The following example demonstrates how to integrate the semantic cache with -OpenAI embeddings using Python and the psycopg2 library. +OpenAI embeddings using Python and the `psycopg2` library. The integration +provides a simple Python class that wraps the cache operations. -In the following example, the `SemanticCache` class wraps the cache functions -and handles embedding generation through the OpenAI API. +In the following example, the `SemanticCache` class wraps the cache +functions and handles embedding generation through the OpenAI API: ```python import psycopg2 @@ -62,7 +64,7 @@ class SemanticCache: """, (embedding, similarity, max_age)) result = cur.fetchone() - if result and result[0]: # Cache hit + if result and result[0]: print(f"Cache HIT (similarity: {result[2]:.3f}, age: {result[3]}s)") return json.loads(result[1]) else: @@ -77,34 +79,29 @@ class SemanticCache: values = cur.fetchone() return dict(zip(columns, values)) -# Usage example cache = SemanticCache( conn_string="dbname=mydb user=postgres", openai_api_key="sk-..." ) -# Try to get from cache, compute if miss def get_revenue_data(query: str) -> Dict: result = cache.get(query, similarity=0.95) if result: - return result # Cache hit! 
+ return result - # Cache miss - compute the result - result = expensive_database_query() # Your expensive query here + result = expensive_database_query() cache.cache(query, result, ttl=3600, tags=['revenue', 'analytics']) return result -# Example queries data1 = get_revenue_data("What was Q4 2024 revenue?") data2 = get_revenue_data("Show me revenue for last quarter") data3 = get_revenue_data("Q4 sales figures?") -# View statistics print(cache.stats()) ``` -The preceding example demonstrates three key operations: +The preceding example demonstrates the following key operations: - The cache initialization with database connection and API credentials. - The automatic fallback from cache lookup to computation when needed. @@ -113,10 +110,11 @@ The preceding example demonstrates three key operations: ## Node.js with OpenAI The following example shows how to use the semantic cache with Node.js and -the OpenAI API through an asynchronous interface. +the OpenAI API through an asynchronous interface. The Node.js integration +uses modern async/await patterns for clean asynchronous code. -In the following example, the `SemanticCache` class uses async/await patterns -to handle database operations and embedding generation. +In the following example, the `SemanticCache` class uses async/await +patterns to handle database operations and embedding generation: ```javascript const { Client } = require('pg'); @@ -171,7 +169,6 @@ class SemanticCache { } } -// Usage const cache = new SemanticCache( { host: 'localhost', database: 'mydb', user: 'postgres' }, 'sk-...' @@ -189,9 +186,8 @@ async function getRevenueData(query) { ## Additional Resources -The repository includes additional integration examples and test files. 
- -For more comprehensive examples, refer to the following files: +The repository includes additional integration examples and test files; see +the following resources for more comprehensive examples: - The `examples/usage_examples.sql` file contains comprehensive SQL examples. - The `test/benchmark.sql` file provides performance testing examples. diff --git a/docs/monitoring.md b/docs/monitoring.md index e724aec..f87c72a 100644 --- a/docs/monitoring.md +++ b/docs/monitoring.md @@ -1,15 +1,21 @@ # Monitoring -Comprehensive guide to monitoring and optimizing pg_semantic_cache performance. +This guide provides comprehensive information about monitoring and +optimizing pg_semantic_cache performance. ## Quick Health Check +The following sections describe how to perform a quick health check on +your semantic cache. + +In the following example, the `cache_health` view provides an overview +of cache performance metrics: + ```sql --- View overall cache health SELECT * FROM semantic_cache.cache_health; ``` -**Sample Output:** +The query produces output similar to the following example: ``` total_entries | expired_entries | total_size | avg_access_count | total_hits | total_misses | hit_rate_pct ---------------+-----------------+------------+------------------+------------+--------------+-------------- @@ -18,9 +24,16 @@ SELECT * FROM semantic_cache.cache_health; ## Key Metrics -### 1. Cache Hit Rate +The following sections describe the key metrics for monitoring cache +performance and effectiveness. + +### Cache Hit Rate -The most important metric for cache effectiveness. +The cache hit rate is the most important metric for measuring cache +effectiveness. 
+ +In the following example, the query calculates the current hit rate +with a rating based on performance thresholds: ```sql -- Get current hit rate @@ -30,26 +43,30 @@ SELECT (total_hits + total_misses) as total_queries, hit_rate_percent, CASE - WHEN hit_rate_percent >= 80 THEN '🟢 Excellent' - WHEN hit_rate_percent >= 60 THEN '🟡 Good' - WHEN hit_rate_percent >= 40 THEN '🟠 Fair' - ELSE '🔴 Poor' + WHEN hit_rate_percent >= 80 THEN 'Excellent' + WHEN hit_rate_percent >= 60 THEN 'Good' + WHEN hit_rate_percent >= 40 THEN 'Fair' + ELSE 'Poor' END as rating FROM semantic_cache.cache_stats(); ``` -**Target Hit Rates:** -- LLM/AI: 70-85% -- Analytics: 60-75% -- API Caching: 75-90% -- Real-time Data: 40-60% +The following list shows target hit rates for different use cases: + +- LLM and AI applications should achieve 70 to 85 percent hit rates. +- Analytics workloads should achieve 60 to 75 percent hit rates. +- API caching should achieve 75 to 90 percent hit rates. +- Real-time data should achieve 40 to 60 percent hit rates. + +### Cache Size and Growth -### 2. Cache Size and Growth +The cache size and growth metrics help you monitor storage usage and +identify growth trends. -Monitor storage usage and growth trends. 
+In the following example, the query calculates the current cache size +and entry count statistics: ```sql --- Current size and entry count SELECT COUNT(*) as total_entries, pg_size_pretty(SUM(result_size_bytes)::BIGINT) as total_size, @@ -59,38 +76,41 @@ SELECT FROM semantic_cache.cache_entries; ``` -**Track Growth:** +In the following example, the queries create a tracking table and log +cache size over time to identify growth trends: + ```sql --- Create size tracking table CREATE TABLE IF NOT EXISTS monitoring.cache_size_history ( timestamp TIMESTAMPTZ DEFAULT NOW(), entry_count BIGINT, total_bytes BIGINT ); - --- Log current size INSERT INTO monitoring.cache_size_history (entry_count, total_bytes) SELECT COUNT(*), SUM(result_size_bytes) FROM semantic_cache.cache_entries; --- View growth trend SELECT timestamp, entry_count, pg_size_pretty(total_bytes) as size, - entry_count - LAG(entry_count) OVER (ORDER BY timestamp) as entry_delta, - pg_size_pretty((total_bytes - LAG(total_bytes) OVER (ORDER BY timestamp))::BIGINT) as size_delta + entry_count - LAG(entry_count) OVER (ORDER BY timestamp) + as entry_delta, + pg_size_pretty((total_bytes - LAG(total_bytes) + OVER (ORDER BY timestamp))::BIGINT) as size_delta FROM monitoring.cache_size_history ORDER BY timestamp DESC LIMIT 20; ``` -### 3. Access Patterns +### Access Patterns + +The access pattern metrics help you understand which cache entries are +most valuable to your application. -Understand which entries are most valuable. 
+In the following example, the query identifies the most frequently +accessed cache entries: ```sql --- Most accessed entries SELECT id, LEFT(query_text, 60) as query_preview, @@ -104,9 +124,10 @@ ORDER BY access_count DESC LIMIT 20; ``` -**Access Distribution:** +In the following example, the query groups cache entries by access +frequency to show the distribution of cache usage: + ```sql --- Group entries by access frequency SELECT CASE WHEN access_count = 0 THEN '0 (Never)' @@ -123,12 +144,15 @@ GROUP BY 1 ORDER BY 1; ``` -### 4. Entry Age and Freshness +### Entry Age and Freshness + +The entry age metrics help you monitor how old cached entries are and +identify stale data. -Monitor how old cached entries are. +In the following example, the query groups cache entries by age to +show the distribution of entry freshness: ```sql --- Age distribution SELECT CASE WHEN age_minutes < 5 THEN '< 5 min' @@ -152,51 +176,71 @@ ORDER BY 1; ## Built-in Monitoring Views +The extension provides several built-in views for monitoring cache +performance and health. + ### cache_health -Real-time cache health metrics. +The `cache_health` view provides real-time cache health metrics. + +In the following example, the query retrieves the current cache health +status: ```sql SELECT * FROM semantic_cache.cache_health; ``` -Includes: -- Total entries and expired entries -- Total cache size -- Average access count -- Hit/miss statistics -- Hit rate percentage +The view includes: + +- the total entries and expired entries. +- the total cache size in megabytes. +- the average access count per entry. +- hit and miss statistics. +- the hit rate percentage. ### recent_cache_activity -Most recently accessed entries. +The `recent_cache_activity` view shows the most recently accessed cache +entries. 
+ +In the following example, the query retrieves the ten most recently +accessed cache entries: ```sql SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10; ``` -Shows: -- Query preview (first 80 chars) -- Access count -- Timestamps (created, last accessed, expires) -- Result size +The view shows: + +- a query preview with the first 80 characters. +- the access count for each entry. +- timestamps for creation, last access, and expiration. +- the result size in bytes. ### cache_by_tag -Entries grouped by tag. +The `cache_by_tag` view shows cache entries grouped by tag. + +In the following example, the query retrieves cache statistics grouped +by tag: ```sql SELECT * FROM semantic_cache.cache_by_tag; ``` -Useful for: -- Understanding cache composition -- Identifying which features use cache most -- Targeted invalidation planning +The view is useful for: + +- understanding cache composition by feature. +- identifying which features use the cache most. +- planning targeted invalidation strategies. ### cache_access_summary -Hourly access statistics with cost savings. +The `cache_access_summary` view provides hourly access statistics with +cost savings information. + +In the following example, the query retrieves hourly access statistics +for the last 24 hours: ```sql SELECT * FROM semantic_cache.cache_access_summary @@ -206,7 +250,11 @@ LIMIT 24; ### cost_savings_daily -Daily cost savings breakdown. +The `cost_savings_daily` view provides a daily breakdown of cost +savings from cache hits. + +In the following example, the query retrieves daily cost savings for +the last 30 days: ```sql SELECT * FROM semantic_cache.cost_savings_daily @@ -216,7 +264,11 @@ LIMIT 30; ### top_cached_queries -Top queries by cost savings. +The `top_cached_queries` view shows the queries that provide the +greatest cost savings. 
+ +In the following example, the query retrieves the ten queries with the +highest cost savings: ```sql SELECT * FROM semantic_cache.top_cached_queries @@ -225,26 +277,33 @@ LIMIT 10; ## Performance Monitoring +The following sections describe how to monitor cache performance and +optimize query execution. + ### Query Performance -Track how fast cache lookups are. +The query performance metrics help you track how fast cache lookups +execute. + +In the following example, the timing is enabled and a cache lookup is +tested with a random embedding vector: ```sql --- Enable timing \timing on --- Test lookup speed SELECT * FROM semantic_cache.get_cached_result( - (SELECT array_agg(random()::float4)::text FROM generate_series(1, 1536)), + (SELECT array_agg(random()::float4)::text + FROM generate_series(1, 1536)), 0.95 ); - --- Expected: < 5ms ``` -**Benchmarking:** +Target performance is less than 5ms for most queries. + +In the following example, the benchmark code measures average cache +lookup time over 100 iterations: + ```sql --- Benchmark cache lookups DO $$ DECLARE start_time TIMESTAMPTZ; @@ -266,16 +325,20 @@ BEGIN end_time := clock_timestamp(); RAISE NOTICE 'Average lookup time: % ms', - ROUND((EXTRACT(MILLISECONDS FROM (end_time - start_time)) / 100)::NUMERIC, 2); + ROUND((EXTRACT(MILLISECONDS FROM (end_time - start_time)) + / 100)::NUMERIC, 2); END $$; ``` ### Index Performance -Monitor vector index effectiveness. +The index performance metrics help you monitor vector index +effectiveness and usage. 
+ +In the following example, the query checks index usage statistics for +the semantic cache schema: ```sql --- Check index usage SELECT schemaname, tablename, @@ -289,16 +352,18 @@ WHERE schemaname = 'semantic_cache' ORDER BY idx_scan DESC; ``` -**Index Statistics:** +In the following example, the query retrieves detailed index +statistics including tuples per scan: + ```sql --- Detailed index info SELECT i.indexrelname as index_name, t.tablename as table_name, pg_size_pretty(pg_relation_size(i.indexrelid)) as index_size, idx_scan as scans, idx_tup_read as tuples_read, - ROUND(idx_tup_read::NUMERIC / NULLIF(idx_scan, 0), 2) as tuples_per_scan + ROUND(idx_tup_read::NUMERIC / NULLIF(idx_scan, 0), 2) + as tuples_per_scan FROM pg_stat_user_indexes i JOIN pg_stat_user_tables t ON i.relid = t.relid WHERE i.schemaname = 'semantic_cache'; @@ -306,8 +371,13 @@ WHERE i.schemaname = 'semantic_cache'; ### PostgreSQL Statistics +The PostgreSQL statistics views provide detailed information about +table and index operations. + +In the following example, the query retrieves table statistics for the +semantic cache schema: + ```sql --- Table statistics SELECT schemaname, tablename, @@ -326,10 +396,18 @@ WHERE schemaname = 'semantic_cache'; ## Alerting +The following sections describe how to set up automated alerts for +cache health monitoring. + ### Set Up Alerts +The alert function monitors cache health and returns warnings when +metrics fall outside acceptable ranges. 
+ +In the following example, the function creates a monitoring alert +system that checks for common cache health issues: + ```sql --- Create alert function CREATE OR REPLACE FUNCTION monitoring.check_cache_alerts() RETURNS TABLE( alert_level TEXT, @@ -338,7 +416,6 @@ RETURNS TABLE( metric_value NUMERIC ) AS $$ BEGIN - -- Alert: Low hit rate RETURN QUERY SELECT 'WARNING'::TEXT, @@ -348,7 +425,6 @@ BEGIN FROM semantic_cache.cache_stats() WHERE hit_rate_percent < 60; - -- Alert: Cache too large RETURN QUERY SELECT 'WARNING'::TEXT, @@ -356,9 +432,8 @@ BEGIN 'Cache size exceeding 80% of limit'::TEXT, (SUM(result_size_bytes) / 1024 / 1024)::NUMERIC FROM semantic_cache.cache_entries - HAVING SUM(result_size_bytes) / 1024 / 1024 > 800; -- If max is 1000MB + HAVING SUM(result_size_bytes) / 1024 / 1024 > 800; - -- Alert: Too many expired entries RETURN QUERY SELECT 'INFO'::TEXT, @@ -367,9 +442,9 @@ BEGIN COUNT(*)::NUMERIC FROM semantic_cache.cache_entries WHERE expires_at <= NOW() - HAVING COUNT(*) > (SELECT COUNT(*) * 0.1 FROM semantic_cache.cache_entries); + HAVING COUNT(*) > (SELECT COUNT(*) * 0.1 + FROM semantic_cache.cache_entries); - -- Alert: No activity RETURN QUERY SELECT 'CRITICAL'::TEXT, @@ -378,33 +453,38 @@ BEGIN 0::NUMERIC FROM semantic_cache.cache_entries WHERE last_accessed_at < NOW() - INTERVAL '1 hour' - HAVING COUNT(*) = (SELECT COUNT(*) FROM semantic_cache.cache_entries); + HAVING COUNT(*) = (SELECT COUNT(*) + FROM semantic_cache.cache_entries); END; $$ LANGUAGE plpgsql; --- Check for alerts SELECT * FROM monitoring.check_cache_alerts(); ``` ### Schedule Alert Checks +You can use pg_cron to schedule regular alert checks and notifications. 
+ +In the following example, the pg_cron schedule checks for cache alerts +every 15 minutes: + ```sql --- With pg_cron (if available) SELECT cron.schedule( 'cache-alerts', - '*/15 * * * *', -- Every 15 minutes + '*/15 * * * *', $$ DO $$ DECLARE alert RECORD; BEGIN - FOR alert IN SELECT * FROM monitoring.check_cache_alerts() LOOP + FOR alert IN + SELECT * FROM monitoring.check_cache_alerts() + LOOP RAISE WARNING '[%] %: % (value: %)', alert.alert_level, alert.alert_type, alert.message, alert.metric_value; - -- Add your notification logic here (email, Slack, etc.) END LOOP; END $$; $$ @@ -413,12 +493,18 @@ SELECT cron.schedule( ## Integration with Monitoring Tools -### Prometheus/Grafana +The following sections describe how to integrate cache metrics with +external monitoring tools. + +### Prometheus and Grafana + +You can export cache metrics in Prometheus format for visualization in +Grafana. -Export metrics in Prometheus format. +In the following example, the function exports cache statistics in +Prometheus text format: ```sql --- Create metrics export function CREATE OR REPLACE FUNCTION monitoring.prometheus_metrics() RETURNS TEXT AS $$ DECLARE @@ -427,32 +513,44 @@ DECLARE BEGIN SELECT * INTO stats FROM semantic_cache.cache_stats(); - result := result || '# HELP cache_entries_total Total number of cached entries' || E'\n'; + result := result || '# HELP cache_entries_total Total entries' + || E'\n'; result := result || '# TYPE cache_entries_total gauge' || E'\n'; - result := result || 'cache_entries_total ' || stats.total_entries || E'\n'; + result := result || 'cache_entries_total ' || stats.total_entries + || E'\n'; - result := result || '# HELP cache_hits_total Total cache hits' || E'\n'; + result := result || '# HELP cache_hits_total Total cache hits' + || E'\n'; result := result || '# TYPE cache_hits_total counter' || E'\n'; result := result || 'cache_hits_total ' || stats.total_hits || E'\n'; - result := result || '# HELP cache_misses_total Total cache 
misses' || E'\n'; + result := result || '# HELP cache_misses_total Total cache misses' + || E'\n'; result := result || '# TYPE cache_misses_total counter' || E'\n'; - result := result || 'cache_misses_total ' || stats.total_misses || E'\n'; + result := result || 'cache_misses_total ' || stats.total_misses + || E'\n'; - result := result || '# HELP cache_hit_rate Cache hit rate percentage' || E'\n'; + result := result || '# HELP cache_hit_rate Cache hit rate percent' + || E'\n'; result := result || '# TYPE cache_hit_rate gauge' || E'\n'; - result := result || 'cache_hit_rate ' || stats.hit_rate_percent || E'\n'; + result := result || 'cache_hit_rate ' || stats.hit_rate_percent + || E'\n'; RETURN result; END; $$ LANGUAGE plpgsql; --- Export metrics SELECT monitoring.prometheus_metrics(); ``` ### Application Logging +You can integrate cache metrics into your application logging and +monitoring infrastructure. + +In the following example, the Python code logs cache metrics to +application logs and optionally sends them to a metrics service: + ```python import psycopg2 import logging @@ -468,118 +566,149 @@ def log_cache_metrics(): stats = cur.fetchone() logger.info( - "Cache Stats - Entries: %d, Hits: %d, Misses: %d, Hit Rate: %.2f%%", + "Cache Stats - Entries: %d, Hits: %d, Misses: %d, " + + "Hit Rate: %.2f%%", stats[0], stats[1], stats[2], stats[3] ) - - # Also log to metrics service (DataDog, New Relic, etc.) - # metrics.gauge('cache.entries', stats[0]) - # metrics.counter('cache.hits', stats[1]) - # metrics.counter('cache.misses', stats[2]) - # metrics.gauge('cache.hit_rate', stats[3]) ``` ## Optimization Guidelines -### When Hit Rate is Low (< 60%) - -1. **Lower similarity threshold** - ```sql - -- Try 0.90 instead of 0.95 - SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90); - ``` - -2. **Check TTL settings** - ```sql - -- Entries expiring too quickly? 
- SELECT COUNT(*), AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) - FROM semantic_cache.cache_entries - WHERE expires_at IS NOT NULL; - ``` - -3. **Verify embedding quality** - ```sql - -- Look at similarity scores - SELECT - query_text, - (1 - (query_embedding <=> (SELECT query_embedding FROM semantic_cache.cache_entries LIMIT 1))) as similarity - FROM semantic_cache.cache_entries - ORDER BY similarity DESC - LIMIT 10; - ``` +The following sections provide guidelines for optimizing cache +performance based on common issues. + +### When Hit Rate is Low + +If your cache hit rate is below 60 percent, use the following +optimization strategies. + +In the following example, the similarity threshold is lowered to 0.90 +to allow more cache hits: + +```sql +SELECT * FROM semantic_cache.get_cached_result('[...]'::text, 0.90); +``` + +In the following example, the query checks if entries are expiring too +quickly by calculating the average TTL: + +```sql +SELECT COUNT(*), AVG(EXTRACT(EPOCH FROM (expires_at - created_at))) +FROM semantic_cache.cache_entries +WHERE expires_at IS NOT NULL; +``` + +In the following example, the query examines similarity scores to +verify embedding quality: + +```sql +SELECT + query_text, + (1 - (query_embedding <=> + (SELECT query_embedding + FROM semantic_cache.cache_entries + LIMIT 1))) as similarity +FROM semantic_cache.cache_entries +ORDER BY similarity DESC +LIMIT 10; +``` ### When Cache Size is Growing Too Fast -1. **Reduce TTL** - ```sql - -- Cache for shorter periods - UPDATE semantic_cache.cache_config - SET value = '1800' -- 30 minutes instead of 1 hour - WHERE key = 'default_ttl_seconds'; - ``` - -2. **Enable aggressive eviction** - ```sql - -- Lower max size - UPDATE semantic_cache.cache_config - SET value = '500' - WHERE key = 'max_cache_size_mb'; - - -- Run auto-eviction - SELECT semantic_cache.auto_evict(); - ``` - -3. 
**Remove low-value entries** - ```sql - -- Delete entries with 0 accesses older than 1 hour - DELETE FROM semantic_cache.cache_entries - WHERE access_count = 0 - AND created_at < NOW() - INTERVAL '1 hour'; - ``` - -### When Lookups are Slow (> 10ms) - -1. **Rebuild index with more lists** (for IVFFlat) - ```sql - DROP INDEX semantic_cache.idx_cache_entries_embedding; - CREATE INDEX idx_cache_entries_embedding - ON semantic_cache.cache_entries - USING ivfflat (query_embedding vector_cosine_ops) - WITH (lists = 1000); - ``` - -2. **Consider HNSW index** - ```sql - SELECT semantic_cache.set_index_type('hnsw'); - SELECT semantic_cache.rebuild_index(); - ``` - -3. **Increase work_mem** - ```sql - -- In postgresql.conf or session - SET work_mem = '512MB'; - ``` +If your cache is growing faster than expected, use the following +optimization strategies. + +In the following example, the TTL is reduced to 30 minutes to expire +entries more quickly: + +```sql +UPDATE semantic_cache.cache_config +SET value = '1800' +WHERE key = 'default_ttl_seconds'; +``` + +In the following example, the maximum cache size is reduced and +auto-eviction is triggered: + +```sql +UPDATE semantic_cache.cache_config +SET value = '500' +WHERE key = 'max_cache_size_mb'; + +SELECT semantic_cache.auto_evict(); +``` + +In the following example, entries with zero accesses that are older +than one hour are deleted: + +```sql +DELETE FROM semantic_cache.cache_entries +WHERE access_count = 0 + AND created_at < NOW() - INTERVAL '1 hour'; +``` + +### When Lookups are Slow + +If cache lookups are taking more than 10ms, use the following +optimization strategies. 
+
+In the following example, the IVFFlat index is rebuilt with more lists
+for better performance on larger caches:
+
+```sql
+DROP INDEX semantic_cache.idx_cache_entries_embedding;
+CREATE INDEX idx_cache_entries_embedding
+ON semantic_cache.cache_entries
+USING ivfflat (query_embedding vector_cosine_ops)
+WITH (lists = 1000);
+```
+
+In the following example, the index type is switched to HNSW for
+better query performance:
+
+```sql
+SELECT semantic_cache.set_index_type('hnsw');
+SELECT semantic_cache.rebuild_index();
+```
+
+In the following example, the work_mem setting is increased to provide
+more memory for vector operations:
+
+```sql
+SET work_mem = '512MB';
+```
 
 ## Regular Maintenance Checklist
 
-Daily:
-- [ ] Check hit rate: `SELECT * FROM semantic_cache.cache_stats()`
-- [ ] Review cache size: `SELECT * FROM semantic_cache.cache_health`
-- [ ] Clear expired: `SELECT semantic_cache.evict_expired()`
+The following checklist provides recommended maintenance tasks at
+different intervals.
 
-Weekly:
-- [ ] Review top queries: `SELECT * FROM semantic_cache.recent_cache_activity`
-- [ ] Check for alerts: `SELECT * FROM monitoring.check_cache_alerts()`
-- [ ] Analyze tables: `ANALYZE semantic_cache.cache_entries`
+Daily tasks include:
 
-Monthly:
-- [ ] Review configuration settings
-- [ ] Optimize index if needed
-- [ ] Archive old access logs
-- [ ] Review cost savings: `SELECT * FROM semantic_cache.get_cost_savings(30)`
+- checking the hit rate with the `cache_stats` function.
+- reviewing the cache size with the `cache_health` view.
+- clearing expired entries with the `evict_expired` function.
+
+Weekly tasks include:
+
+- reviewing top queries with the `recent_cache_activity` view.
+- checking for alerts with the `check_cache_alerts` function.
+- analyzing tables with the `ANALYZE` command.
+
+Monthly tasks include:
+
+- reviewing configuration settings for optimization opportunities.
+- optimizing the index if needed based on cache size.
+- archiving old access logs to prevent table bloat. +- reviewing cost savings with `get_cost_savings` function. ## See Also -- [Functions Reference](functions/index.md) - All monitoring functions -- [Configuration](configuration.md) - Tuning parameters -- [Use Cases](use_cases.md) - Monitoring patterns in practice +The following resources provide additional information: + +- the [Functions Reference](functions/index.md) document describes all + monitoring functions. +- the [Configuration](configuration.md) document explains tuning + parameters. +- the [Use Cases](use_cases.md) document provides monitoring patterns + in practice. diff --git a/docs/performance.md b/docs/performance.md index edb0639..919aa1f 100644 --- a/docs/performance.md +++ b/docs/performance.md @@ -1,9 +1,13 @@ # Performance and Benchmarking -The extension is optimized for sub-millisecond cache lookups with minimal overhead. +The extension is optimized for sub-millisecond cache lookups with minimal +overhead. The following sections describe performance characteristics, +benchmarking results, and optimization strategies for production workloads. -- Lookup time is < 5ms for most queries with IVFFlat index. -- Scalability handles 100K+ cached entries efficiently. +The extension provides the following performance characteristics: + +- Lookup time is less than 5ms for most queries with an IVFFlat index. +- Scalability handles more than 100,000 cached entries efficiently. - Throughput reaches thousands of cache lookups per second. - Storage provides configurable cache size limits with automatic eviction. @@ -15,19 +19,21 @@ The extension is optimized for sub-millisecond cache lookups with minimal overhe ## Runtime Metrics -The following table shows typical performance metrics for common cache operations. 
+The following table shows typical performance metrics for common cache +operations: | Operation | Performance | Notes | |-----------|-------------|-------| -| Cache lookup | **< 5ms** | With optimized vector index | -| Cache insert | **< 10ms** | Including embedding storage | -| Eviction (1000 entries) | **< 50ms** | Efficient batch operations | -| Statistics query | **< 1ms** | Materialized views | -| Similarity search | **2-3ms avg** | IVFFlat/HNSW indexed | +| Cache lookup | < 5ms | With optimized vector index | +| Cache insert | < 10ms | Including embedding storage | +| Eviction (1000 entries) | < 50ms | Efficient batch operations | +| Statistics query | < 1ms | Materialized views | +| Similarity search | 2-3ms avg | IVFFlat/HNSW indexed | -### Expected Hit Rates +## Expected Hit Rates -Cache hit rates vary by workload type and query similarity patterns. +Cache hit rates vary by workload type and query similarity patterns. The +following table shows typical hit rates for different workload types: | Workload Type | Typical Hit Rate | |---------------|------------------| @@ -36,25 +42,30 @@ Cache hit rates vary by workload type and query similarity patterns. | Search systems | 50-70% | | Chatbot conversations | 45-65% | -### Memory Overhead +## Memory Overhead -The cache maintains a minimal memory footprint for typical workloads. +The cache maintains a minimal memory footprint for typical workloads. The +following list describes the memory requirements for cache entries: - Each cache entry requires approximately 1-2KB for metadata and indexes. -- Vector storage size depends on the embedding dimension (1536D requires approximately 6KB). +- Vector storage size depends on the embedding dimension. +- A 1536-dimension vector requires approximately 6KB of storage. - The total overhead remains minimal for typical workloads. ## Benchmarking -The extension includes a comprehensive benchmark suite for performance testing. 
+The extension includes a comprehensive benchmark suite for performance +testing. The benchmark suite exercises all major cache operations and +reports detailed timing information. -Use the following command to run the included benchmark suite: +In the following example, the `psql` command runs the included benchmark +suite: ```bash psql -U postgres -d your_database -f test/benchmark.sql ``` -**Expected Results:** +The benchmark produces results similar to the following output: ``` Operation | Count | Total Time | Avg Time @@ -65,6 +76,3 @@ Lookup (misses) | 100 | ~150ms | 1.5ms Evict LRU | 500 | ~25ms | 0.05ms ``` - - - diff --git a/docs/quick_start.md b/docs/quick_start.md index 5df92c2..707aa85 100644 --- a/docs/quick_start.md +++ b/docs/quick_start.md @@ -1,16 +1,17 @@ # Quick Start -The steps that follow are designed to get you started with semantic caching -quickly and easily. Before using pg_semantic_cache, you must install: +The steps that follow are designed to get you started with semantic +caching quickly and easily. Before using pg_semantic_cache, you must +install the following components: -- PostgreSQL 14, 15, 16, 17, or 18 -- the pgvector extension -- a C compiler (gcc or clang) -- PostgreSQL development headers +- PostgreSQL version 14, 15, 16, 17, or 18. +- The pgvector extension. +- A C compiler such as gcc or clang. +- PostgreSQL development headers. 
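Before building, you can confirm the required tools are on your PATH;
the following sketch checks for the commands the build uses (package
names and install paths vary by platform):

```shell
#!/bin/sh
# Report which build prerequisites are present on this machine.
checked=0
missing=""
for tool in pg_config gcc make; do
    checked=$((checked + 1))
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "MISSING: $tool"
        missing="$missing $tool"
    fi
done
if [ -z "$missing" ]; then
    echo "all build prerequisites found"
fi
```

`clang` can stand in for `gcc`; add it to the list if that is your
compiler.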
## Installation -Use the following commands to build the extension from the Github +Use the following commands to build the extension from the GitHub repository: ```bash @@ -24,8 +25,8 @@ make sudo make install ``` -After building the extension, you need to install and create the extensions -you'll be using: +After building the extension, you need to install and create the +extensions you will be using: ```sql -- Install required extensions @@ -38,23 +39,25 @@ SELECT * FROM semantic_cache.cache_health; ### Using pg_semantic_cache -Use the following commands to add a result set to a cache, and then query the -cache with a similar query: +Use the following commands to add a result set to a cache, and then +query the cache with a similar query: + +In the following example, the `cache_query` function stores a query +result with its embedding, and the `get_cached_result` function +retrieves a semantically similar cached result: ```sql --- Cache a query result with its embedding SELECT semantic_cache.cache_query( query_text := 'What was our Q4 2024 revenue?', - query_embedding := '[0.123, 0.456, ...]'::text, -- From embedding model + query_embedding := '[0.123, 0.456, ...]'::text, result_data := '{"answer": "Q4 2024 revenue was $2.4M"}'::jsonb, - ttl_seconds := 1800, -- 30 minutes + ttl_seconds := 1800, tags := ARRAY['llm', 'revenue'] ); --- Retrieve with a semantically similar query SELECT * FROM semantic_cache.get_cached_result( - query_embedding := '[0.124, 0.455, ...]'::text, -- Slightly different - similarity_threshold := 0.95 -- 95% similarity required + query_embedding := '[0.124, 0.455, ...]'::text, + similarity_threshold := 0.95 ); ``` diff --git a/docs/security-audit.md b/docs/security-audit.md deleted file mode 100644 index 9b110fd..0000000 --- a/docs/security-audit.md +++ /dev/null @@ -1,349 +0,0 @@ -# Security Audit Report - pg_semantic_cache Extension - -**Date**: 2024-12-18 -**Version**: 0.1.0-beta3 -**Auditor**: Security Review Process - ---- - -## 
Executive Summary - -This security audit reviews the pg_semantic_cache PostgreSQL extension for common vulnerabilities including SQL injection, buffer overflows, input validation issues, and PostgreSQL-specific security concerns. - -**Overall Risk Level**: LOW-MEDIUM -**Critical Issues**: 1 (SQL Injection vulnerability) -**High Issues**: 0 -**Medium Issues**: 2 -**Low Issues**: 3 - ---- - -## Critical Issues - -### 1. SQL Injection Vulnerability in Query Construction - -**Location**: `pg_semantic_cache.c:365-373` (evict_lru), `pg_semantic_cache.c:400-408` (evict_lfu), `pg_semantic_cache.c:260-270` (get_cached_result) - -**Issue**: User-controlled integer is used in SQL query construction via `appendStringInfo` without proper parameterization. - -```c -// VULNERABLE CODE (lines 366-373) -appendStringInfo(&buf, - "DELETE FROM semantic_cache.cache_entries " - "WHERE id NOT IN (" - " SELECT id FROM semantic_cache.cache_entries " - " ORDER BY last_accessed_at DESC " - " LIMIT %d" // ← Integer format, but what if keep_count is manipulated? - ")", - keep_count); -``` - -**Risk**: While `%d` format specifier for integers provides some protection, the value is not properly validated before use. - -**Status**: ✅ **MITIGATED** - Input validation added (lines 357-363, 392-398) -```c -if (keep_count < 0) - elog(ERROR, "evict_lru: keep_count must be non-negative"); -``` - -**Remaining Concern**: No upper bound validation (could cause performance issues with very large values) - -**Recommendation**: -```c -if (keep_count < 0 || keep_count > 1000000) - elog(ERROR, "evict_lru: keep_count must be between 0 and 1000000"); -``` - ---- - -### 2. String Escaping Function - Potential Issues - -**Location**: `pg_semantic_cache.c:45-76` (pg_escape_string) - -**Issue**: Custom string escaping function instead of using PostgreSQL's built-in functions. 
- -```c -static char * -pg_escape_string(const char *str) -{ - size_t len = strlen(str); - char *result = palloc(len * 2 + 3); // Allocates enough space - // ... manual escaping logic -} -``` - -**Risk**: Custom escaping is error-prone. PostgreSQL provides `quote_literal_cstr()` for this purpose. - -**Status**: ⚠️ **NEEDS REVIEW** - -**Recommendation**: Replace with PostgreSQL's built-in functions: -```c -#include "utils/quote.h" -// Use quote_literal_cstr(str) instead of pg_escape_string(str) -``` - ---- - -## High Issues - -None identified. - ---- - -## Medium Issues - -### 3. Unbounded String Concatenation - -**Location**: `pg_semantic_cache.c:161-200` (cache_query) - -**Issue**: Large JSONB results are converted to strings and concatenated without size limits. - -```c -rstr = JsonbToCString(NULL, &result->root, VARSIZE(result)); -// No size check before using rstr -``` - -**Risk**: Very large JSONB documents could cause memory exhaustion. - -**Recommendation**: -```c -if (strlen(rstr) > 10 * 1024 * 1024) // 10MB limit - elog(ERROR, "Result data too large (max 10MB)"); -``` - -### 4. Missing NULL Checks in Helper Functions - -**Location**: `pg_semantic_cache.c:38-43` (execute_sql) - -**Issue**: No NULL check on input parameter. - -```c -static void execute_sql(const char *query) -{ - int ret = SPI_execute(query, false, 0); - // What if query is NULL? -} -``` - -**Recommendation**: -```c -static void execute_sql(const char *query) -{ - if (query == NULL) - elog(ERROR, "execute_sql: query is NULL"); - int ret = SPI_execute(query, false, 0); - if (ret < 0) - elog(ERROR, "SPI_execute failed: %d", ret); -} -``` - ---- - -## Low Issues - -### 5. Memory Leak Potential - -**Location**: `pg_semantic_cache.c:165-238` (cache_query) - -**Issue**: If error occurs after memory allocation, `pfree()` calls may be skipped. 
- -**Current Code**: -```c -qstr = text_to_cstring(query_text); -estr = text_to_cstring(emb_text); -rstr = JsonbToCString(NULL, &result->root, VARSIZE(result)); -// ... lots of code that could error ... -pfree(qstr); // Only freed at the end -``` - -**Status**: ✅ **LOW RISK** - PostgreSQL's memory context system will clean up on error. - -**Recommendation**: Use `PG_TRY/PG_CATCH` for explicit cleanup in critical paths. - -### 6. Integer Overflow in String Length Calculation - -**Location**: `pg_semantic_cache.c:194-200` - -**Issue**: `strlen(rstr)` result is cast to `int` without overflow check. - -```c -appendStringInfo(&buf, - // ... - "%d, %d, " // result_size_bytes - // ... - (int)strlen(rstr), ttl, ttl); -``` - -**Risk**: If `rstr` length exceeds INT_MAX, the cast will overflow. - -**Recommendation**: -```c -size_t result_len = strlen(rstr); -if (result_len > INT_MAX) - elog(ERROR, "Result data too large"); -int result_size = (int)result_len; -``` - -### 7. SPI Error Handling Inconsistency - -**Location**: Multiple locations - -**Issue**: Some SPI operations check return codes, others don't. - -**Examples**: -- ✅ Good: `pg_semantic_cache.c:217-218` checks `ret < 0` -- ⚠️ Inconsistent: `pg_semantic_cache.c:279` doesn't check SPI_connect result -- ⚠️ Inconsistent: `pg_semantic_cache.c:321` checks SELECT result but not all operations - -**Recommendation**: Standardize SPI error handling: -```c -if (SPI_connect() != SPI_OK_CONNECT) - elog(ERROR, "SPI_connect failed"); -// ... do work ... -if (SPI_finish() != SPI_OK_FINISH) - elog(WARNING, "SPI_finish failed"); -``` - ---- - -## Input Validation Summary - -### ✅ Well-Validated Inputs - -1. **evict_lru/evict_lfu**: Checks for NULL and negative values -2. **get_cost_savings**: Uses default for NULL days parameter -3. **cache_query**: Checks for NULL tags parameter - -### ⚠️ Needs Validation - -1. **Embedding vectors**: No dimension validation (assumes 1536) -2. 
**Similarity thresholds**: No range check (should be 0.0-1.0) -3. **TTL values**: No upper bound (could be set to extreme values) -4. **JSONB result size**: No size limit - ---- - -## Recommendations - -### Immediate Actions (P0) - -1. ✅ **Replace custom escaping** with PostgreSQL's `quote_literal_cstr()` -2. ✅ **Add upper bounds** to eviction function parameters -3. ✅ **Add result size limits** for cached data - -### Short-term (P1) - -4. **Validate embedding dimensions** against expected size -5. **Validate similarity thresholds** (0.0 - 1.0 range) -6. **Standardize SPI error handling** - -### Long-term (P2) - -7. **Add rate limiting** for cache writes -8. **Implement query allowlisting** for production use -9. **Add encryption** for sensitive cached data -10. **Add audit logging** for security events - ---- - -## Code Fixes - -### Fix 1: Replace Custom Escaping - -```c -// BEFORE -#include "utils/builtins.h" - -static char * -pg_escape_string(const char *str) -{ - // ... custom escaping logic ... 
-} - -// AFTER -#include "utils/builtins.h" -#include "utils/quote.h" - -// Remove pg_escape_string function entirely -// Use quote_literal_cstr() directly: -char *escaped = quote_literal_cstr(qstr); -``` - -### Fix 2: Add Input Validation Function - -```c -static void validate_similarity_threshold(float4 threshold) -{ - if (threshold < 0.0 || threshold > 1.0) - elog(ERROR, "Similarity threshold must be between 0.0 and 1.0"); -} - -static void validate_ttl(int32 ttl) -{ - if (ttl < 0) - elog(ERROR, "TTL must be non-negative"); - if (ttl > 86400 * 365) // 1 year max - elog(ERROR, "TTL exceeds maximum (1 year)"); -} - -static void validate_embedding_size(text *embedding_text) -{ - char *emb_str = text_to_cstring(embedding_text); - // Parse and count dimensions - // Ensure it matches expected vector size - pfree(emb_str); -} -``` - -### Fix 3: Standardized Error Handling - -```c -#define SPI_CONNECT_OR_ERROR() \ - do { \ - if (SPI_connect() != SPI_OK_CONNECT) \ - elog(ERROR, "%s: SPI_connect failed", __func__); \ - } while(0) - -#define SPI_FINISH_OR_WARN() \ - do { \ - if (SPI_finish() != SPI_OK_FINISH) \ - elog(WARNING, "%s: SPI_finish failed", __func__); \ - } while(0) -``` - ---- - -## Security Testing Checklist - -- [x] SQL injection attempts (integers, strings) -- [x] NULL input handling -- [x] Negative value handling -- [ ] Extremely large values (INT_MAX, LONG_MAX) -- [ ] Very long strings (> 1GB) -- [ ] Malformed embeddings -- [ ] Concurrent access testing -- [ ] Resource exhaustion testing -- [ ] Privilege escalation attempts - ---- - -## Conclusion - -The pg_semantic_cache extension has **reasonable security** for a prototype but requires **hardening for production use**. The main concerns are: - -1. Custom string escaping should be replaced with PostgreSQL built-ins -2. Input validation should be more comprehensive -3. Resource limits should be enforced - -**Recommended Actions Before Production**: -1. Implement all P0 fixes -2. 
Add comprehensive fuzz testing -3. Perform load testing with malicious inputs -4. Add security documentation for users -5. Consider third-party security audit - ---- - -**Next Review Date**: Before 1.0.0 release -**Status**: CONDITIONALLY APPROVED for development use -**Production Readiness**: BLOCKED pending P0 fixes diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index c5eeec5..e403928 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -1,12 +1,12 @@ # Troubleshooting Installation -The following lists some common issues encountered during installation, and -how to resolve the problems. +The Troubleshooting page lists some common issues encountered during +installation, and how to resolve the problems. ## pg_config not found -The build system needs pg_config to locate PostgreSQL installation paths. If -pg_config is not in your PATH, the build will fail. +The build system needs pg_config to locate PostgreSQL installation +paths. If pg_config is not in your PATH, the build will fail. ```bash # Find PostgreSQL installation @@ -21,8 +21,9 @@ PG_CONFIG=/path/to/pg_config make install ## Permission Denied During Installation -Installing extensions requires write access to PostgreSQL's system directories. -Use sudo for standard installations or specify a custom directory. +Installing an extension requires write access to PostgreSQL's system +directories. Use sudo for standard installations or specify a custom +directory. ```bash # Use sudo for system directories @@ -34,8 +35,9 @@ make install DESTDIR=/path/to/custom/location ## pgvector Not Found -pg_semantic_cache depends on pgvector and will fail to create if pgvector is -not installed. Install pgvector before installing pg_semantic_cache. +The pg_semantic_cache extension depends on pgvector and will fail to +create if pgvector is not installed. You must install pgvector before +installing pg_semantic_cache. 
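You can check whether pgvector is already available to the server
before attempting the install; a quick sketch:

```sql
-- Lists pgvector if its files are installed on the server
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'vector';
```

An empty result means the pgvector files are missing and the extension
must be built or installed from packages first.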
```sql -- Error: could not open extension control file @@ -52,8 +54,9 @@ sudo make install ## Extension Already Exists -When reinstalling or upgrading, PostgreSQL may report that the extension -already exists. Drop the existing extension before creating a new one. +When reinstalling or upgrading, PostgreSQL may report that the +extension already exists. Drop the existing extension before creating +a new one. ```sql -- If you're upgrading, drop the old version first @@ -64,12 +67,14 @@ CREATE EXTENSION pg_semantic_cache; ``` !!! warning "Data Loss Warning" - Dropping the extension will delete all cached data. Use `ALTER EXTENSION UPDATE` for upgrades when available. + Dropping the extension will delete all cached data. Use `ALTER + EXTENSION UPDATE` for upgrades when available. ## Compilation Errors -Compilation failures typically occur when PostgreSQL development headers are -missing. Install the appropriate development package for your platform. +Compilation failures typically occur when PostgreSQL development +headers are missing. Install the appropriate development package for +your platform. ```bash # Ensure development headers are installed diff --git a/docs/use_cases.md b/docs/use_cases.md index 2a9ecbf..5380a47 100644 --- a/docs/use_cases.md +++ b/docs/use_cases.md @@ -5,20 +5,23 @@ pg_semantic_cache extension in real-world applications. ## LLM and AI Applications -This section demonstrates how to use the pg_semantic_cache extension to -optimize costs and performance in LLM and AI-powered applications. +The following sections demonstrate how to use the pg_semantic_cache extension +to optimize costs and performance in LLM and AI-powered applications. ### RAG (Retrieval Augmented Generation) Caching -The RAG caching pattern addresses the challenge of expensive LLM API calls by -caching responses based on semantic similarity of user questions. 
+The RAG caching pattern addresses the challenge of expensive LLM API +calls by caching responses based on semantic similarity of user +questions. -LLM API calls typically cost between $0.02 and $0.05 per request, and users -often ask similar questions using different wording. The pg_semantic_cache -extension solves this problem by caching LLM responses with semantic matching. +LLM API calls typically cost between $0.02 and $0.05 per request, and +users often ask similar questions using different wording. The +pg_semantic_cache extension solves this problem by caching LLM +responses with semantic matching. -In the following example, the `SemanticLLMCache` class uses the OpenAI API to -generate embeddings and cache LLM responses based on semantic similarity. +In the following example, the `SemanticLLMCache` class uses the +OpenAI API to generate embeddings and cache LLM responses based on +semantic similarity. ```python import openai @@ -95,14 +98,15 @@ cache.ask_llm_cached("Show me Q4 revenue") # Cache hit! cache.ask_llm_cached("Q4 revenue please") # Cache hit! ``` -An organization processing 10,000 daily queries with an 80% cache hit rate -can save approximately $140 per day or $51,100 per year using this approach. +An organization processing 10,000 daily queries with an 80% cache hit +rate can save approximately $140 per day or $51,100 per year using this +approach. ### Chatbot Response Caching The chatbot response caching pattern optimizes conversational AI -applications by storing and reusing responses for semantically similar -user messages. +applications by storing and reusing responses for semantically +similar user messages. 
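Semantic matching works because paraphrases map to nearby embedding
vectors. The following self-contained sketch computes cosine similarity
on toy 3-dimensional vectors to show how a 0.95 threshold separates a
close paraphrase from an unrelated message (real embeddings have
hundreds of dimensions, but the arithmetic is the same):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: a paraphrase points in nearly the same direction
# as the original message, an unrelated message does not.
original   = [0.70, 0.69, 0.20]
paraphrase = [0.68, 0.71, 0.19]
unrelated  = [0.10, -0.30, 0.95]

print(round(cosine_similarity(original, paraphrase), 4))  # close to 1.0
print(round(cosine_similarity(original, unrelated), 4))   # well below 0.95
```

The `<=>` operator used throughout this document returns cosine
*distance*, which is `1 - similarity`, so a 0.95 similarity threshold
corresponds to a distance of at most 0.05.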
In the following example, the `ChatbotCache` class uses TypeScript to implement a caching layer for chatbot responses with configurable @@ -178,8 +182,8 @@ class ChatbotCache { ## Analytics and Reporting -This section demonstrates how to use the pg_semantic_cache extension to -improve performance of analytical queries and reporting workloads. +This section demonstrates how to use the pg_semantic_cache extension +to improve performance of analytical queries and reporting workloads. ### Dashboard Query Caching @@ -187,9 +191,9 @@ The dashboard query caching pattern reduces latency for expensive analytical queries that power business intelligence dashboards and reporting tools. -In the following example, the `app.get_sales_analytics` function uses -a deterministic embedding to cache analytics results for a configurable -TTL period. +In the following example, the `app.get_sales_analytics` function +uses a deterministic embedding to cache analytics results for a +configurable TTL period. ```sql -- Application caching wrapper for analytics @@ -271,11 +275,12 @@ SELECT app.get_sales_analytics( ### Time-Series Report Caching The time-series report caching pattern optimizes recurring reports by -adjusting cache TTL based on the temporal granularity of the data being -reported. +adjusting cache TTL based on the temporal granularity of the data +being reported. -In the following example, the `app.cached_time_series_report` function -uses different TTL values for daily, weekly, and monthly reports. +In the following example, the `app.cached_time_series_report` +function uses different TTL values for daily, weekly, and monthly +reports. ```sql -- Cache daily/weekly/monthly reports @@ -331,14 +336,15 @@ $$ LANGUAGE plpgsql; ## External API Results -This section demonstrates how to use the pg_semantic_cache extension to -reduce costs and latency when integrating with third-party external APIs. 
+This section demonstrates how to use the pg_semantic_cache extension +to reduce costs and latency when integrating with third-party +external APIs. ### Third-Party API Response Caching The external API caching pattern stores responses from expensive -third-party APIs such as weather services, geocoding providers, and stock -price feeds. +third-party APIs such as weather services, geocoding providers, and +stock price feeds. In the following example, the `APICache` class uses the sentence-transformers library to generate embeddings and cache API @@ -397,8 +403,9 @@ class APICache: return api_response ``` -The following examples demonstrate how to use the `APICache` class with -different external APIs using appropriate TTL values for each use case. +The following examples demonstrate how to use the `APICache` class +with different external APIs using appropriate TTL values for each +use case. ```python # Weather API @@ -434,18 +441,19 @@ def get_stock_price(symbol): ## Database Query Optimization -This section demonstrates how to use the pg_semantic_cache extension to -optimize expensive database queries and reduce computational overhead. +This section demonstrates how to use the pg_semantic_cache extension +to optimize expensive database queries and reduce computational +overhead. ### Expensive Join Caching -The expensive join caching pattern stores results from complex multi-table -joins to avoid repeated execution of resource-intensive database -operations. +The expensive join caching pattern stores results from complex +multi-table joins to avoid repeated execution of resource-intensive +database operations. -In the following example, the `app.get_customer_summary` function caches -the results of a complex customer data aggregation query with multiple -joins. +In the following example, the `app.get_customer_summary` function +caches the results of a complex customer data aggregation query with +multiple joins. 
```sql -- Wrap expensive queries with semantic caching @@ -525,14 +533,15 @@ SELECT app.get_customer_summary('john'); ## Scheduled Maintenance -This section demonstrates how to implement automated maintenance routines -for the pg_semantic_cache extension to ensure optimal performance and -storage use. +This section demonstrates how to implement automated maintenance +routines for the pg_semantic_cache extension to ensure optimal +performance and storage use. ### Automatic Cache Cleanup -The automatic cache cleanup pattern uses scheduled maintenance functions -to evict expired entries and optimize cache storage on a regular basis. +The automatic cache cleanup pattern uses scheduled maintenance +functions to evict expired entries and optimize cache storage on a +regular basis. In the following example, the `semantic_cache.scheduled_maintenance` function performs multiple maintenance operations and returns timing @@ -591,12 +600,13 @@ SELECT * FROM semantic_cache.scheduled_maintenance(); ### Cache Warming -The cache warming pattern pre-populates the cache with common queries to -improve application performance during startup or after cache +The cache warming pattern pre-populates the cache with common queries +to improve application performance during startup or after cache invalidation. In the following example, the `app.warm_cache` function pre-caches -frequently accessed dashboard queries to reduce initial page load times. +frequently accessed dashboard queries to reduce initial page load +times. ```sql -- Warm cache with popular queries @@ -630,19 +640,19 @@ SELECT app.warm_cache(); ## Multi-Language Support -This section demonstrates how to use the pg_semantic_cache extension to -support caching across multiple languages using multilingual embedding -models. +This section demonstrates how to use the pg_semantic_cache extension +to support caching across multiple languages using multilingual +embedding models. 
### Caching Across Languages The multilingual caching pattern enables cache hits across different -languages by using multilingual embedding models that map semantically -similar queries. +languages by using multilingual embedding models that map +semantically similar queries. In the following example, the `MultilingualCache` class uses the -multilingual mpnet model to cache queries across English, Spanish, French, -and Portuguese. +multilingual mpnet model to cache queries across English, Spanish, +French, and Portuguese. ```python from sentence_transformers import SentenceTransformer From a601bf409cefc7a51212569091ca79a3aac73fc1 Mon Sep 17 00:00:00 2001 From: Susan Douglas Date: Fri, 13 Mar 2026 08:54:11 -0400 Subject: [PATCH 12/12] Updates to doc files - ready for review --- README.md | 183 ++++++++++++++-------------- docs/logging.md | 318 ++++++++++++++++++++++++++++-------------------- 2 files changed, 282 insertions(+), 219 deletions(-) diff --git a/README.md b/README.md index 8dcda81..844a1d6 100644 --- a/README.md +++ b/README.md @@ -30,24 +30,28 @@ for semantically similar queries. ## Quick Start -The following steps walk you through installing and configuring the extension. +The following steps walk you through installing and configuring the +extension. 1. Install the required dependencies for your operating system. + In the following example, the commands install dependencies on + Ubuntu, Rocky Linux, or macOS: + ```bash - # Ubuntu/Debian - sudo apt-get install postgresql-16 postgresql-server-dev-16 postgresql-16-pgvector + sudo apt-get install postgresql-16 postgresql-server-dev-16 \ + postgresql-16-pgvector - # Rocky Linux/RHEL sudo dnf install postgresql16 postgresql16-devel postgresql16-contrib - # macOS (with Homebrew) brew install postgresql@16 - # Install pgvector separately ``` 2. Build and install the extension from source. 
+ In the following example, the commands clone the repository, build + the extension, and install it: + ```bash git clone https://github.com/pgedge/pg_semantic_cache.git cd pg_semantic_cache @@ -58,152 +62,151 @@ The following steps walk you through installing and configuring the extension. 3. Enable the extension in your PostgreSQL database. + In the following example, the SQL commands create the required + extensions and initialize the cache schema: + ```sql - -- Connect to your database psql -U postgres -d your_database - -- Install required extensions CREATE EXTENSION IF NOT EXISTS vector; CREATE EXTENSION IF NOT EXISTS pg_semantic_cache; - -- Initialize the cache schema (run once per database) SELECT semantic_cache.init_schema(); - -- Verify installation SELECT * FROM semantic_cache.cache_stats(); ``` ### Configuration -All runtime settings can be configured through the cache configuration table. +All runtime settings can be configured through the cache configuration +table. -Configuration settings are stored in the `semantic_cache.cache_config` table. You can view and modify them directly: +Configuration settings are stored in the `semantic_cache.cache_config` +table. 
+ +In the following example, the SQL commands view and modify +configuration settings: ```sql --- View all configuration SELECT * FROM semantic_cache.cache_config ORDER BY key; --- Update configuration (direct SQL) INSERT INTO semantic_cache.cache_config (key, value) VALUES ('max_cache_size_mb', '2000') ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value; --- Get specific config value -SELECT value FROM semantic_cache.cache_config WHERE key = 'eviction_policy'; +SELECT value FROM semantic_cache.cache_config +WHERE key = 'eviction_policy'; ``` -**Common Configuration Keys:** +The following table describes common configuration keys: + | Key | Example Value | Description | |-----|---------------|-------------| -| `max_cache_size_mb` | '1000' | Maximum cache size in megabytes | -| `default_ttl_seconds` | '3600' | Default TTL for cached entries | -| `eviction_policy` | 'lru' | Eviction policy: lru, lfu, or ttl | -| `similarity_threshold` | '0.95' | Default similarity threshold | +| max_cache_size_mb | 1000 | Maximum cache size in megabytes | +| default_ttl_seconds | 3600 | Default TTL for cached entries | +| eviction_policy | lru | Eviction policy | +| similarity_threshold | 0.95 | Default similarity threshold | ## Basic Usage -The following examples demonstrate the core workflow for storing, retrieving, -and monitoring cached query results. +The following examples demonstrate the core workflow for storing, +retrieving, and monitoring cached query results. -1. Store a query result with its vector embedding in the cache. +In the following example, the `cache_query` function stores a +completed orders query with a one-hour TTL and analytics tags: - In the following example, the `cache_query` function stores a completed - orders query with a one-hour TTL and analytics tags. - - ```sql - SELECT semantic_cache.cache_query( - query_text := 'SELECT * FROM orders WHERE status = ''completed''', - embedding := '[0.1, 0.2, 0.3, ...]'::text, -- From OpenAI, Cohere, etc. 
- result_data := '{"total": 150, "orders": [...]}'::jsonb, - ttl_seconds := 3600, -- 1 hour - tags := ARRAY['orders', 'analytics'] -- Optional tags - ); - -- Returns: cache_id (bigint) - ``` +```sql +SELECT semantic_cache.cache_query( + query_text := 'SELECT * FROM orders WHERE status = ''completed''', + embedding := '[0.1, 0.2, 0.3, ...]'::text, + result_data := '{"total": 150, "orders": [...]}'::jsonb, + ttl_seconds := 3600, + tags := ARRAY['orders', 'analytics'] +); +``` -2. Retrieve a cached result using semantic similarity search. +In the following example, the `get_cached_result` function searches +for cached results with at least 95 percent similarity to the query +embedding: - In the following example, the `get_cached_result` function searches for - cached results with at least 95% similarity to the query embedding. +```sql +SELECT * FROM semantic_cache.get_cached_result( + embedding := '[0.11, 0.19, 0.31, ...]'::text, + similarity_threshold := 0.95, + max_age_seconds := NULL +); +``` - ```sql - SELECT * FROM semantic_cache.get_cached_result( - embedding := '[0.11, 0.19, 0.31, ...]'::text, -- Similar query embedding - similarity_threshold := 0.95, -- 95% similarity required - max_age_seconds := NULL -- Any age (optional) - ); - -- Returns: (found boolean, result_data jsonb, similarity_score float4, age_seconds int) - ``` +The function returns a table with the following columns: - The function returns a table with the following columns: +``` + found | result_data | similarity_score | age_seconds +-------+----------------------------+------------------+------------- + true | {"total": 150, "orders"... | 0.973 | 245 +``` - ``` - found | result_data | similarity_score | age_seconds - -------+----------------------------+------------------+------------- - true | {"total": 150, "orders"... | 0.973 | 245 - ``` +In the following example, the queries retrieve comprehensive +statistics, health metrics, and recent activity for the semantic +cache: -3. 
Monitor cache performance using built-in statistics and health views. +```sql +SELECT * FROM semantic_cache.cache_stats(); - In the following example, the queries retrieve comprehensive statistics, - health metrics, and recent activity for the semantic cache. +SELECT * FROM semantic_cache.cache_health; - ```sql - -- Comprehensive statistics - SELECT * FROM semantic_cache.cache_stats(); +SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10; +``` - -- Health overview (includes hit rate and more details) - SELECT * FROM semantic_cache.cache_health; - -- Recent cache activity - SELECT * FROM semantic_cache.recent_cache_activity LIMIT 10; - ``` +## Building the Documentation +Before building the documentation, install Python 3.8 or later and +pip. -## Building the Documentation +In the following example, the command installs documentation +dependencies: -Before building the documentation, install Python 3.8+ and pip. +```bash +pip install -r docs-requirements.txt +``` -1. Install dependencies: - ```bash - pip install -r docs-requirements.txt - ``` +In the following example, the command starts a local documentation +server: -2. Use the following command to review the documentation locally: - ```bash - mkdocs serve - ``` +```bash +mkdocs serve +``` - Then open http://127.0.0.1:8000 in your browser. +Open http://127.0.0.1:8000 in your browser to view the documentation. -3. To build a static site: - ```bash - mkdocs build - ``` +In the following example, the command builds a static documentation +site: - Documentation will added to the `site/` directory. +```bash +mkdocs build +``` ---- +Documentation will be added to the `site/` directory. -## Support & Resources +## Support and Resources To report an issue with this software, visit the -[GitHub Issues](https://github.com/pgEdge/pg_semantic_cache/issues) page. +[GitHub Issues](https://github.com/pgEdge/pg_semantic_cache/issues) +page. 
-Check the `examples/` directory for usage patterns and code samples; see -the `test/` directory for comprehensive testing examples. +Check the `examples/` directory for usage patterns and code samples. +See the `test/` directory for comprehensive testing examples. For more information, visit [docs.pgedge.com](https://docs.pgedge.com). ## Contributing -We welcome your project contributions; for more information, see +We welcome your project contributions. For more information, see [docs/development.md](docs/development.md). ---- - ## License -This project is licensed under the [PostgreSQL License](docs/LICENSE.md). +This project is licensed under the +[PostgreSQL License](docs/LICENSE.md). diff --git a/docs/logging.md b/docs/logging.md index bf65d2a..ba8caeb 100644 --- a/docs/logging.md +++ b/docs/logging.md @@ -1,80 +1,106 @@ -# Logging & Cost Tracking +# Logging and Cost Tracking -Track cache hits/misses and calculate cost savings from avoided LLM API calls. +This guide describes how to track cache hits and misses and calculate +cost savings from avoided LLM API calls. -## Quick Start +The following sections provide a quick introduction to logging cache +access and tracking cost savings. + +In the following example, the `log_cache_access` function logs cache +access events and retrieves cost savings reports: ```sql --- Log a cache miss (cost incurred) SELECT semantic_cache.log_cache_access('query_hash', false, NULL, 0.006); --- Log a cache hit (cost saved) SELECT semantic_cache.log_cache_access('query_hash', true, 0.95, 0.006); --- Get cost savings for last 7 days SELECT * FROM semantic_cache.get_cost_savings(7); --- View daily summary -SELECT * FROM semantic_cache.cost_savings_daily ORDER BY date DESC LIMIT 7; +SELECT * FROM semantic_cache.cost_savings_daily +ORDER BY date DESC +LIMIT 7; ``` ---- - ## Functions -### `log_cache_access()` +The following sections describe the functions available for logging +cache access and calculating cost savings. 
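Before the individual functions are described, the following sketch shows the assumed bookkeeping behind them: each access is logged with a hit flag and a query cost, and a hit counts its cost as savings. The event list and dollar values here are hypothetical illustrations, not output from the extension:

```python
# Hypothetical logged events: (cache_hit, query_cost in dollars).
events = [
    (False, 0.008),  # miss: the LLM was called, cost incurred
    (True, 0.008),   # hit: cached result served, cost avoided
    (True, 0.008),
    (False, 0.006),
]

total_queries = len(events)
cache_hits = sum(1 for hit, _ in events if hit)
hit_rate = 100.0 * cache_hits / total_queries              # percentage
total_cost_saved = sum(cost for hit, cost in events if hit)

print(total_queries, cache_hits, hit_rate, round(total_cost_saved, 3))
```

These derived values correspond to the `total_queries`, `cache_hits`, `hit_rate`, and `total_cost_saved` columns that the reporting functions below return.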
-Record a cache access event with cost information. +### log_cache_access + +The `log_cache_access` function records a cache access event with cost +information. + +In the following example, the function signature shows the required +parameters for logging cache access: ```sql SELECT semantic_cache.log_cache_access( - query_hash text, -- Unique identifier for the query (e.g., SHA-256 hash) - cache_hit boolean, -- true = hit, false = miss - similarity_score float4, -- Similarity score (0-1), NULL for misses - query_cost numeric -- Cost of the query in dollars (e.g., 0.006) + query_hash text, + cache_hit boolean, + similarity_score float4, + query_cost numeric ); ``` -**Examples:** +In the following example, the function logs a cache miss when the LLM +API must be called: + ```sql --- Log a cache miss (had to call LLM) SELECT semantic_cache.log_cache_access('abc123...', false, NULL, 0.008); +``` + +In the following example, the function logs a cache hit when a cached +result is returned: --- Log a cache hit (saved LLM call) +```sql SELECT semantic_cache.log_cache_access('def456...', true, 0.97, 0.008); ``` -### `get_cost_savings()` +### get_cost_savings + +The `get_cost_savings` function generates a cost savings report for a +specified time period. -Get cost savings report for a time period. 
+In the following example, the function signature shows the optional +days parameter defaulting to 30: ```sql SELECT * FROM semantic_cache.get_cost_savings( - days integer DEFAULT 30 -- Number of days to analyze + days integer DEFAULT 30 ); ``` -**Returns:** +The following table describes the columns returned by the function: | Column | Type | Description | |--------|------|-------------| | total_queries | bigint | Total number of queries | | cache_hits | bigint | Number of cache hits | | cache_misses | bigint | Number of cache misses | -| hit_rate | float4 | Hit rate percentage (0-100) | +| hit_rate | float4 | Hit rate percentage | | total_cost_saved | float8 | Total money saved | | avg_cost_per_hit | float8 | Average savings per hit | -| total_cost_if_no_cache | float8 | What it would have cost without cache | +| total_cost_if_no_cache | float8 | Cost without cache | + +In the following example, the function returns cost savings for the +last 30 days: -**Examples:** ```sql --- Last 30 days (default) SELECT * FROM semantic_cache.get_cost_savings(); +``` + +In the following example, the function returns cost savings for the +last 7 days: --- Last 7 days +```sql SELECT * FROM semantic_cache.get_cost_savings(7); +``` --- Formatted output +In the following example, the query formats the cost savings output +for display: + +```sql SELECT total_queries, cache_hits, @@ -84,13 +110,18 @@ SELECT FROM semantic_cache.get_cost_savings(30); ``` ---- - ## Views -### `cache_access_summary` +The following sections describe the views available for monitoring +cache access patterns and cost savings. + +### cache_access_summary + +The `cache_access_summary` view provides hourly cache access +statistics with cost savings information. -Hourly cache access statistics with cost savings. 
+In the following example, the query retrieves hourly statistics for +the last 24 hours: ```sql SELECT * FROM semantic_cache.cache_access_summary @@ -98,17 +129,21 @@ ORDER BY hour DESC LIMIT 24; ``` -**Columns:** -- `hour` - Hour timestamp -- `total_accesses` - Total accesses in that hour -- `hits` - Number of hits -- `misses` - Number of misses -- `hit_rate_pct` - Hit rate percentage -- `cost_saved` - Total cost saved +The view includes: + +- the hour timestamp. +- the total accesses in that hour. +- the number of hits and misses. +- the hit rate percentage. +- the total cost saved. + +### cost_savings_daily -### `cost_savings_daily` +The `cost_savings_daily` view provides a daily breakdown of cost +savings and query statistics. -Daily cost breakdown and savings analysis. +In the following example, the query retrieves daily cost savings for +the last 7 days: ```sql SELECT * FROM semantic_cache.cost_savings_daily @@ -116,37 +151,45 @@ ORDER BY date DESC LIMIT 7; ``` -**Columns:** -- `date` - Date -- `total_queries` - Total queries that day -- `cache_hits` - Number of hits -- `cache_misses` - Number of misses -- `hit_rate_pct` - Hit rate percentage -- `total_cost_saved` - Total cost saved -- `avg_cost_per_hit` - Average savings per hit +The view includes: -### `top_cached_queries` +- the date. +- the total queries for that day. +- the number of cache hits and misses. +- the hit rate percentage. +- the total cost saved. +- the average savings per hit. -Top queries ranked by total cost savings. +### top_cached_queries + +The `top_cached_queries` view ranks queries by total cost savings. 
+ +In the following example, the query retrieves the top ten queries with +the highest cost savings: ```sql SELECT * FROM semantic_cache.top_cached_queries LIMIT 10; ``` -**Columns:** -- `query_hash` - Query identifier -- `hit_count` - Number of times served from cache -- `avg_similarity` - Average similarity score -- `total_cost_saved` - Total cost saved by this query -- `last_access` - Last access time +The view includes: ---- +- the query hash identifier. +- the number of times served from cache. +- the average similarity score. +- the total cost saved by this query. +- the last access time. ## Integration Examples +The following sections provide integration examples for Python and +Node.js applications. + ### Python with OpenAI +In the following example, the Python code integrates cache logging +with OpenAI API calls: + ```python import psycopg2 import openai @@ -159,37 +202,28 @@ def query_with_cache(query_text, embedding): cur = conn.cursor() query_hash = hashlib.sha256(query_text.encode()).hexdigest() - # Check cache cur.execute(""" SELECT * FROM semantic_cache.get_cached_result(%s, 0.95) """, (embedding,)) result = cur.fetchone() - if result and result[0]: # Cache HIT + if result and result[0]: cur.execute(""" SELECT semantic_cache.log_cache_access(%s, true, %s, 0.008) """, (query_hash, result[2])) conn.commit() return result[1] - - # Cache MISS - call API response = client.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": query_text}] ) - - # Calculate cost usage = response.usage cost = (usage.prompt_tokens / 1000) * 0.03 + \ (usage.completion_tokens / 1000) * 0.06 - - # Cache result result_json = response.choices[0].message.content cur.execute(""" SELECT semantic_cache.cache_query(%s, %s, %s::jsonb, 3600) """, (query_text, embedding, result_json)) - - # Log miss cur.execute(""" SELECT semantic_cache.log_cache_access(%s, false, NULL, %s) """, (query_hash, cost)) @@ -200,6 +234,9 @@ def query_with_cache(query_text, embedding): 
### Node.js with Anthropic +In the following example, the Node.js code integrates cache logging +with Anthropic API calls: + ```javascript const { Pool } = require('pg'); const Anthropic = require('@anthropic-ai/sdk'); @@ -210,42 +247,33 @@ const anthropic = new Anthropic(); async function queryWithCache(queryText, embedding) { const client = await pool.connect(); - const queryHash = crypto.createHash('sha256').update(queryText).digest('hex'); + const queryHash = crypto.createHash('sha256') + .update(queryText).digest('hex'); try { - // Check cache const cache = await client.query( 'SELECT * FROM semantic_cache.get_cached_result($1, 0.95)', [embedding] ); if (cache.rows[0]?.found) { - // Cache HIT await client.query( 'SELECT semantic_cache.log_cache_access($1, $2, $3, $4)', [queryHash, true, cache.rows[0].similarity_score, 0.008] ); return cache.rows[0].result_data; } - - // Cache MISS - call API const message = await anthropic.messages.create({ model: "claude-3-5-sonnet-20241022", max_tokens: 1024, messages: [{ role: "user", content: queryText }] }); - - // Calculate cost const cost = (message.usage.input_tokens / 1_000_000) * 3.00 + (message.usage.output_tokens / 1_000_000) * 15.00; - - // Cache result await client.query( 'SELECT semantic_cache.cache_query($1, $2, $3, 3600)', [queryText, embedding, JSON.stringify(message.content)] ); - - // Log miss await client.query( 'SELECT semantic_cache.log_cache_access($1, $2, $3, $4)', [queryHash, false, null, cost] @@ -258,23 +286,30 @@ async function queryWithCache(queryText, embedding) { } ``` ---- - ## Cost Calculation +The following sections describe how to calculate costs for logging +cache access. + ### Where Costs Come From -You provide the cost when calling `log_cache_access()`. Calculate it from your LLM API response: +You must calculate the cost from your LLM API response and provide it +when calling the `log_cache_access` function. 
+ +In the following example, the Python code calculates the cost for an +OpenAI GPT-4 API call: ```python -# OpenAI GPT-4 example usage = response['usage'] -input_cost = (usage['prompt_tokens'] / 1000) * 0.03 # $0.03/1K tokens -output_cost = (usage['completion_tokens'] / 1000) * 0.06 # $0.06/1K tokens +input_cost = (usage['prompt_tokens'] / 1000) * 0.03 +output_cost = (usage['completion_tokens'] / 1000) * 0.06 total_cost = input_cost + output_cost ``` -### Current Pricing (Jan 2026) +### Current Pricing + +The following table shows current pricing for common LLM models as of +January 2026: | Model | Input (per 1M tokens) | Output (per 1M tokens) | |-------|----------------------|------------------------| @@ -283,55 +318,58 @@ total_cost = input_cost + output_cost | Claude 3.5 Sonnet | $3.00 | $15.00 | | Claude 3 Haiku | $0.25 | $1.25 | ---- - ## Monitoring Dashboard +The following sections describe how to create monitoring dashboards +for cache performance. + +In the following example, the query retrieves key cache metrics for a +monitoring dashboard: + ```sql SELECT - -- Last 24 hours (SELECT COUNT(*) FILTER (WHERE cache_hit = true) FROM semantic_cache.cache_access_log WHERE access_time >= NOW() - INTERVAL '24 hours') as hits_24h, - (SELECT ROUND(SUM(cost_saved)::numeric, 4) FROM semantic_cache.cache_access_log WHERE access_time >= NOW() - INTERVAL '24 hours') as saved_24h, - - -- All time (SELECT total_cost_saved FROM semantic_cache.cache_metadata WHERE id = 1) as saved_all_time, - - -- Current cache size (SELECT COUNT(*) FROM semantic_cache.cache_entries) as entries; ``` ---- - ## Maintenance +The following sections describe maintenance tasks for the cache access +log. 
+ ### Manual Cleanup +In the following example, the commands perform manual cleanup of old +log entries: + ```sql --- Delete logs older than 30 days DELETE FROM semantic_cache.cache_access_log WHERE access_time < NOW() - INTERVAL '30 days'; --- Check table size -SELECT pg_size_pretty(pg_total_relation_size('semantic_cache.cache_access_log')); +SELECT pg_size_pretty( + pg_total_relation_size('semantic_cache.cache_access_log')); --- Reclaim space VACUUM semantic_cache.cache_access_log; ``` -### Automated Cleanup (pg_cron) +### Automated Cleanup + +You can use the pg_cron extension to schedule automated cleanup +tasks. + +In the following example, the pg_cron extension schedules daily +cleanup at 2 AM: ```sql --- Install pg_cron extension CREATE EXTENSION pg_cron; - --- Schedule daily cleanup at 2 AM SELECT cron.schedule( 'semantic-cache-log-cleanup', '0 2 * * *', @@ -340,13 +378,15 @@ SELECT cron.schedule( ); ``` ---- - ## Database Schema +The following sections describe the database schema for logging and +cost tracking. + ### Tables -**cache_metadata:** +The `cache_metadata` table tracks overall cache statistics: + ```sql id SERIAL PRIMARY KEY total_hits BIGINT DEFAULT 0 @@ -354,7 +394,8 @@ total_misses BIGINT DEFAULT 0 total_cost_saved NUMERIC(12,6) DEFAULT 0.0 ``` -**cache_access_log:** +The `cache_access_log` table records individual cache access events: + ```sql id BIGSERIAL PRIMARY KEY access_time TIMESTAMPTZ DEFAULT NOW() @@ -365,44 +406,65 @@ query_cost NUMERIC(10,6) cost_saved NUMERIC(10,6) ``` -Indexes: -- `idx_access_log_time` on `access_time` -- `idx_access_log_hash` on `query_hash` +The table includes the following indexes: ---- +- the `idx_access_log_time` index on `access_time`. +- the `idx_access_log_hash` index on `query_hash`. ## Troubleshooting +The following sections address common troubleshooting scenarios for +logging and cost tracking. + ### No data in reports +If reports show no data, use the following troubleshooting queries. 
+ +The following query checks if logging is being performed: + ```sql --- Check if logging is happening SELECT COUNT(*) FROM semantic_cache.cache_access_log; +``` + +The following query checks the date range of logs: --- Check date range of logs +```sql SELECT MIN(access_time), MAX(access_time) FROM semantic_cache.cache_access_log; +``` --- Try longer time period +The following query tries a longer time period: + +```sql SELECT * FROM semantic_cache.get_cost_savings(365); ``` -### Costs showing as $0 +### Costs showing as zero + +Ensure you are passing actual costs to the `log_cache_access` +function. -Ensure you're passing actual costs to `log_cache_access()`: +In the following example, an incorrect call passes zero as the cost: ```sql --- Wrong: passing 0 SELECT semantic_cache.log_cache_access('hash', true, 0.95, 0); +``` + +In the following example, the correct call passes the actual cost: --- Correct: passing actual cost +```sql SELECT semantic_cache.log_cache_access('hash', true, 0.95, 0.008); ``` ### Storage growing too large +If the cache access log table is growing too large, archive old logs +before deleting them. + +In the following example, the commands archive and delete old log +entries: + ```sql --- Archive old logs before deleting CREATE TABLE semantic_cache.cache_access_log_archive AS SELECT * FROM semantic_cache.cache_access_log WHERE access_time < NOW() - INTERVAL '90 days'; @@ -413,15 +475,13 @@ WHERE access_time < NOW() - INTERVAL '90 days'; VACUUM semantic_cache.cache_access_log; ``` ---- - ## Performance -- **Overhead:** ~1-2ms per log entry -- **Storage:** ~100 bytes per log entry -- **Indexes:** Automatic on `access_time` and `query_hash` -- **Recommendation:** Archive logs older than 30-90 days +The logging system has the following performance characteristics: ---- +- The overhead is approximately 1 to 2ms per log entry. +- The storage requirement is approximately 100 bytes per log entry. 
+- The system creates automatic indexes on `access_time` and + `query_hash`. +- The recommendation is to archive logs older than 30 to 90 days. -For more information, see the main [README](https://github.com/pgEdge/pg_semantic_cache#readme) and [CHANGELOG](https://github.com/pgEdge/pg_semantic_cache/blob/main/CHANGELOG.md).
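As a rough check on the storage figure above, the following sketch estimates access-log growth; the workload and retention values are assumptions for illustration only:

```python
# Rough access-log storage estimate (inputs are assumptions).
bytes_per_entry = 100        # approximate size per log entry, from the text
queries_per_day = 10_000     # hypothetical workload
retention_days = 90          # upper end of the recommended 30-90 day window

total_mb = bytes_per_entry * queries_per_day * retention_days / 1_000_000
print(f"~{total_mb:.0f} MB retained")
```

At this assumed workload the log stays under 100 MB even at the longest recommended retention, which is why a simple scheduled archive-and-delete job is usually sufficient.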