diff --git a/docs/SIE Blog/SIE_choosing_vdb.md b/docs/SIE Blog/SIE_choosing_vdb.md new file mode 100644 index 00000000..5e61c237 --- /dev/null +++ b/docs/SIE Blog/SIE_choosing_vdb.md @@ -0,0 +1,221 @@ +# A Practical Guide for Choosing a Vector Database: Considerations and Trade-Offs + +**Not sure which vector DB fits your architecture?** +Get a free 15-min technical consultation → [let's chat](https://getdemo.superlinked.com/?utm_source=vdb_table_article) + +Choosing a vector database for large-scale AI or search applications is less about comparing feature checklists and more about understanding your system’s architecture and constraints. The goal is to pick a solution that aligns with your use case’s scale, performance needs, and operational limits. + +For many production systems the vector database is only half of the story. The inference layer that encodes queries, runs rerankers and extracts structured context has an equally large operational footprint. SIE, the Superlinked Inference Engine, is an inference orchestration layer that exposes three primitives, encode, score and extract, and is designed to work with any vector DB to simplify hybrid indexing, reranking and large-model serving. + +If you’re exploring what might work best for your use case or want to discuss different architecture choices, you can book a short technical chat using [this link](https://getdemo.superlinked.com/?utm_source=vdb_table_article). 
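To make the hybrid indexing and reranking vocabulary used throughout this guide concrete, here is a minimal, dependency-free sketch of dense-plus-sparse score fusion. Every name in it is illustrative rather than any vendor's API; a hybrid-capable vector DB (or an inference layer in front of one) performs this kind of combination for you:

```typescript
// Illustrative only: fuse a dense (semantic) score with a sparse (keyword)
// score. Function names and the weighting are assumptions, not a real API.

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const na = Math.sqrt(a.reduce((s, x) => s + x * x, 0));
  const nb = Math.sqrt(b.reduce((s, x) => s + x * x, 0));
  return dot / (na * nb);
}

function keywordOverlap(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\s+/));
  const d = new Set(doc.toLowerCase().split(/\s+/));
  let hits = 0;
  q.forEach((t) => { if (d.has(t)) hits += 1; });
  return q.size === 0 ? 0 : hits / q.size;
}

// Weighted fusion of the two signals; alpha is a tuning assumption.
function hybridScore(dense: number, sparse: number, alpha = 0.7): number {
  return alpha * dense + (1 - alpha) * sparse;
}
```

The weight `alpha` is a tuning knob; production systems typically use BM25 for the sparse signal rather than raw keyword overlap.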

## An overview of key factors to compare when selecting a vector database

| **Dimension** | **Key Considerations** | **Trade-Offs / Recommendations** |
| --- | --- | --- |
| **Prototype → Production** | - In-process vs. standalone deployment<br>- Ephemeral vs. durable storage | - Use embedded/in-memory DBs for prototyping; migrate to managed or self-hosted clusters for production.<br>- Ephemeral (fast but volatile) vs. durable (persistent, reliable). |
| **Workload Type** | - Write-heavy vs. read-heavy access patterns<br>- Hybrid workloads | - Write-heavy: need async indexing, buffering, or real-time insert support.<br>- Read-heavy: pre-built indexes (HNSW, IVF) offer speed at higher memory cost.<br>- Hybrid: mix mutable "fresh" and static "main" indexes. |
| **Memory vs. Disk** | - In-memory vs. disk-backed indexing<br>- Sharding and scaling<br>- Metadata and payload size | - In-memory = fastest but costly and limited.<br>- Disk-based = larger scale, slower but persistent.<br>- Hybrid (memory + disk) balances both.<br>- Store only embeddings in the vector DB; offload large documents elsewhere. |
| **Deployment Model** | - Fully managed service vs. self-hosted<br>- In-process vs. standalone server | - Managed: minimal ops, faster deployment, higher cost & less control.<br>- Self-hosted: full control, cheaper at scale, higher ops burden.<br>- Start embedded → move to networked as the system scales. |
| **Tenant Model** | - Single-tenant vs. multi-tenant architecture | - Single-tenant: simpler, faster.<br>- Multi-tenant: cost-efficient but adds isolation and scaling complexity.<br>- Use namespaces/collections for isolation if needed. |
| **Query Features** | - Metadata filters<br>- Hybrid (dense + sparse) search<br>- Multi-vector or mixture-of-encoders<br>- Specialized queries (geo, facets) | - Strong filtering support is critical for scalability.<br>- Hybrid search merges semantics + keywords.<br>- Multi-vector support or mixture-of-encoders simplifies multi-modal search.<br>- Few DBs support geo/faceted search natively. |
| **Indexing Strategy** | - ANN vs. brute-force<br>- HNSW, IVF, PQ, LSH variants<br>- Index rebuild costs | - ANN offers a massive speed-up with a small recall trade-off.<br>- Brute-force only for small datasets or accuracy-critical cases.<br>- Evaluate on the latency-recall curve, not the index name.<br>- Index rebuilds can be expensive; plan for them. |
| **Operational Costs** | - Expensive ops: index rebuilds, bulk inserts, unindexed filters, strict consistency<br>- Cheap ops: flat inserts, ANN queries, buffered writes, lazy deletes | - Avoid frequent rebuilds and unindexed filters.<br>- Use async writes and lazy deletes.<br>- ANN queries are efficient; design updates to be batched. |
| **Decision Factors** | - Scale & latency goals<br>- Operational capacity<br>- Required query features<br>- Acceptable trade-offs | - Focus on fit to architecture and constraints, not feature lists.<br>- No universal "best" DB; choose based on workload, ops tolerance, and cost. |

## From Prototype to Production Scale

> **Planning a migration from prototype to production?**
> Get a quick architecture review → **[Book here](https://getdemo.superlinked.com/?utm_source=vdb_table_article)**

When starting out, you might use an in-process or embedded vector store for quick prototyping. An in-process library (running inside your application) is simple to set up and offers full control over your data in development. This works well for single-user scenarios or small datasets, where you can load vectors into memory and iterate rapidly. However, as you move to production scale, the requirements change:

- **Deployment Model:** Production deployments often demand a standalone or distributed database service rather than an embedded library. Fully managed services handle infrastructure scaling, replication, and maintenance for you, which accelerates deployment and reduces DevOps burden. You focus on using the database, not running it. In contrast, self-hosting gives complete control (important for data privacy or custom tuning) but means you manage servers, updates, and scaling yourself. The trade-off comes down to ease versus control.

- **Ephemeral vs. Durable:** During prototyping you might tolerate an ephemeral in-memory index. This can be faster since it avoids disk I/O, but it's not suitable for production, where persistence and recovery are required. For large-scale applications, ensure the vector DB supports disk-backed storage so indexes and data persist across restarts.

The same prototype-to-production migration applies to inference. Local encode calls are fine at the start, but they become hard to manage when you need multiple models, dedicated rerankers, and predictable p99 latency. An inference layer such as SIE centralises model lifecycle management, batching, memory-safe multi-model serving, and hot reload.
That lets you run multi-output encoders and dedicated rerankers in production without rewriting your retrieval path. + +In summary, use lightweight embedded solutions for early development. As you scale, plan for a robust deployment: either a self-managed cluster tuned for your needs or a managed cloud service that offloads maintenance. + +--- + +## Write-Heavy vs. Read-Heavy Workloads + +> **Unsure how to structure mutable vs. static indexes?** +> Get a free technical consult → **[Book here](https://getdemo.superlinked.com/?utm_source=vdb_table_article)** + +Consider your application’s access patterns. Is it ingesting vectors constantly (high write throughput), or mainly querying existing vectors (read-heavy)? Different vector database architectures handle writes and reads differently: + +- **High Write Throughput:** If you need to index new vectors continuously, look for systems optimized for fast inserts and updates. Some indexes can handle incremental additions but with caveats (e.g. HNSW insert performance degradation over time). Many databases separate write/read paths by buffering new vectors and merging later. + +- **Read-Heavy Patterns:** If your dataset is mostly static, you can use heavier, more advanced index structures to accelerate reads. This adds memory overhead but delivers very low-latency queries at scale. + +- **Balancing Both:** Many real-time applications use a hybrid: a mutable index for recent data and a static ANN index for older data. Queries hit both, ensuring freshness + performance. + +On the inference side, decide whether reranking and extraction will be online or offline. A practical hybrid approach is to do multi-output encoding at ingest, fuse dense and sparse signals in the vector DB, and perform lightweight online reranking only over a small candidate set. Use encode and score to split work between batched background tasks and low-cost online scoring so you avoid expensive reranking on large candidate sets. 
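The mutable-plus-static pattern described above can be sketched concretely. The sketch below is an illustrative toy, not any particular database's API (the `FreshBuffer` name, the dot-product scoring, and the `mainSearch` callback are all assumptions): recent vectors sit in a small buffer scanned exactly, older vectors live behind a prebuilt ANN index, and a query merges both result sets, keeping the best score per document.

```typescript
// Illustrative "fresh + main" hybrid: a mutable buffer searched exactly,
// merged with results from a prebuilt ANN index. All names are assumptions.

type Hit = { id: string; score: number };

class FreshBuffer {
  private items: { id: string; vec: number[] }[] = [];

  // New vectors land here immediately, no index rebuild needed.
  add(id: string, vec: number[]): void {
    this.items.push({ id, vec });
  }

  // Brute-force scan; cheap because the buffer stays small.
  search(query: number[], k: number): Hit[] {
    const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);
    return this.items
      .map(({ id, vec }) => ({ id, score: dot(query, vec) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

// Query both tiers and keep the best score per document id.
function searchHybrid(
  query: number[],
  fresh: FreshBuffer,
  mainSearch: (q: number[], k: number) => Hit[],
  k = 5,
): Hit[] {
  const best = new Map<string, number>();
  for (const h of [...fresh.search(query, k), ...mainSearch(query, k)]) {
    best.set(h.id, Math.max(h.score, best.get(h.id) ?? -Infinity));
  }
  return [...best.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

A real system would periodically drain the fresh buffer into the main ANN index in the background.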
In short, match indexing strategy to your workload profile to avoid unnecessary latency or reindexing overhead.

---

## Memory vs. Disk: Index Storage and Sharding

Another fundamental consideration is whether your vector index can reside fully in memory or must be disk-backed:

- **In-Memory Indexes:** Deliver the fastest latencies but are limited by RAM scale and cost. They may require heavy sharding for very large datasets.

- **Disk-Based Indexes:** Scale beyond RAM limits and provide persistence, though with higher latency. Modern approaches (DiskANN, SPANN) optimize disk access for speed.

- **Hybrid Models:** Many databases mix both, keeping coarse structures in RAM while storing full vectors on disk.

- **Payload Size:** Large metadata or documents stored inside the vector DB increase memory/disk footprint. Often it's better to store embeddings in the DB and content elsewhere.

Do not forget model memory. Large embedding models and cross-encoder rerankers require GPUs or significant host RAM. Plan node sizes for both DB and inference hosts so neither side becomes the bottleneck. Production inference platforms should offer memory-safe multi-model serving with pre-load eviction and safe drain semantics, so you can host multiple models on constrained GPU capacity without OOMing critical traffic or disrupting live requests.

Choose based on your data scale and memory constraints to avoid bottlenecks or unnecessary complexity.

---

## Managed Services vs. Self-Hosted Operations

Operational considerations can be as important as raw performance. Vector databases come as fully managed cloud services, as self-hosted software packages, and as in-between options (such as enterprise appliances or cloud-managed open source). Your choice will affect development speed, cost structure, and maintenance work:

- **Fully Managed Services:** Cloud-hosted vector DB offerings where the provider runs the infrastructure.
They handle scaling, replication, upgrades, and often provide high-level APIs. Ideal when you want to integrate semantic search quickly without building ML ops expertise. The trade-off is less customization and potentially higher long-term cost. + +- **Self-Hosted (Open Source or Enterprise):** Running your own vector database (or using an open-source library) gives maximum flexibility and control, at the cost of operational complexity and DevOps investment. + +- **In-Process Libraries vs. Standalone Servers:** Embedded libraries are great for prototyping or single-application usage with no network overhead. Standalone services are better for multi-client architectures or horizontal scaling. + +The inference layer mirrors these tradeoffs. A managed inference service simplifies ops but may limit model choice. SIE is OSS-friendly and designed to run in your environment so teams can self-host multi-model inference, or integrate SIE into managed platform stacks if they prefer an external service. SIE's encode, score and extract primitives make both deployment models practical by enabling precomputed multi-output encodes for hybrid indexing in managed environments and colocated encoders and dedicated rerankers when teams self-host. + +Ultimately, decide based on your priorities and resources. If speed to market and minimal ops are paramount, lean towards a managed service. If customizability, cost control at scale, or data sovereignty are top concerns, be prepared to self-host and invest in the necessary infrastructure work. + +--- + +## Single-Tenant Simplicity vs. Multi-Tenant Architecture + +Consider whether your application serves one dataset or many isolated datasets (tenants). Multi-tenancy is common in SaaS platforms: a single system must handle vectors for multiple clients or domains, keeping their data separate. 
Your choice of vector DB should align with how you plan to isolate and organize data: + +- **Single-Tenant (Simplicity):** One large corpus or a few related ones, in one or a few indexes. Straightforward to manage and typically higher per-query performance. + +- **Multi-Tenant Support:** A single cluster serving many isolated datasets. Enables cost sharing but adds complexity around isolation, quotas, indexing strategies, and scaling. + +In summary, design for multi-tenancy only if you need to. A vector DB that shines for one big dataset may not handle thousands of tiny ones efficiently, and vice versa. + +--- + +## Search Features and Query Functionality + +Not all vector databases offer the same query capabilities beyond basic nearest neighbor search. Consider what search functionality your application requires: + +- **Metadata Filters:** Essential for combining similarity with constraints like category, time, or price. Strong pre-filtering support is critical for performance at scale. + +- **Hybrid Search (Dense + Sparse):** Combining semantic similarity with keyword or sparse scoring is important when some terms must match exactly. + +- **Multi-Vector per Document or Mixture-of-Encoders:** Multi-vector support helps for multi-modal content; mixture-of-encoders can also embed multiple signals into a single vector. + +- **Geospatial and Other Specialized Queries:** Check for native support if you need geo-distance, faceting, or analytical-style queries. + +List the query types you expect and confirm the DB can support them efficiently. + +**Practical SIE + DB pattern:** Below is a compact pattern you can use with any vector DB. It shows the typical flow: use SIE to encode the query, generate candidates from the DB, and then rerank with a cross-encoder via SIE. 
```python
from sie_sdk import SIEClient, Item
from qdrant_client import QdrantClient

sie = SIEClient("http://localhost:8080")
qdrant = QdrantClient(url="http://localhost:6333")

query_text = "How does hybrid search improve RAG systems?"

# encode the query with SIE
q_vec = sie.encode("BAAI/bge-m3", Item(text=query_text), output_types=["dense"])

# dense candidate generation in the vector DB
hits = qdrant.search(collection_name="docs", query_vector=q_vec.dense, limit=100)

# rerank all candidates with a cross-encoder via SIE
# (score every candidate so the scores align one-to-one with the hits)
candidate_texts = [h.payload["text"] for h in hits]
scores = sie.score(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    query=query_text,
    documents=candidate_texts,
)

# present the top 10 results
ranked = sorted(zip(hits, scores), key=lambda x: x[1], reverse=True)[:10]
for hit, score in ranked:
    print(f"{score:.3f}\t{hit.payload.get('title', '')}\t{hit.payload.get('text', '')[:180]}")
```

---

## Indexing Strategies vs. Query Performance

![](../assets/use_cases/choosing_vdb/accuracy-query.png)

At the heart of any vector database is the index type it uses for nearest neighbor search. You’ll encounter terms like brute-force (flat) search, HNSW graphs, IVF, PQ, LSH, and others. Rather than focusing on names, focus on how index choice affects query latency and recall:

- **Brute-Force vs. Approximate:** Brute-force delivers exact results but does not scale. ANN trades a bit of accuracy for orders-of-magnitude faster queries and is standard at scale.

- **Index Types:** Different structures (HNSW, IVF, PQ, DiskANN, etc.) have different strengths, but all live on a latency/recall trade-off curve. The important question is whether the DB can hit your targets on that curve.

- **Guaranteed Recall:** For cases demanding 100 percent recall, a common pattern is ANN for candidates plus exact reranking on a smaller subset.
  Using an inference score endpoint for reranking keeps heavy compute off the vector DB and ensures consistent reranking across datasets.

- **Build and Maintenance Costs:** Some indexes build slowly or handle deletions poorly, which affects how often you can re-embed or restructure data.

Do not choose a vector DB just because it advertises a specific index type. Choose based on its ability to meet your latency and recall goals at your scale.

---

## Operational Cost: Expensive vs. Cheap Operations

Finally, it’s useful to know which operations or features will cost the most in terms of performance or complexity:

### Expensive Operations

- Index rebuilds or major reconfiguration
- Large bulk insertions into complex ANN structures
- Filterable search on unindexed metadata
- Very high-dimensional vectors or large payloads
- Strict, real-time consistency guarantees

### Cheaper Operations

- Flat (brute-force) inserts into an unindexed collection
- Approximate k-NN queries on a built index
- Adding new vectors to a write buffer
- Lazy deletes with deferred cleanup

**Expensive inference operations:**

- Cross-encoder reranking at high QPS
- Hosting many 7-8B models on GPUs
- Frequent model churn and hot-reloads that require repeated CI evals

**Cheaper inference patterns:**

- ANN candidate generation plus batched reranking via an inference layer, which is cheaper than running rerankers over very large candidate sets
- Batched multi-output encode calls at ingest, which reduce online costs for hybrid indexing

Knowing this helps you design a system that leans on cheap operations in the hot path and schedules expensive ones carefully.

---

## Conclusion

Selecting a vector database is a strategic decision that hinges on your specific needs and constraints.
Instead of asking “which product has the most features?”, ask how each option fits your scenario in terms of architecture: + +- Data volume and scale +- Latency and recall requirements +- Write/read patterns +- Operational model and team capacity +- Feature requirements (filters, hybrid search, multi-tenancy) +- Which trade-offs you are willing to accept + +By focusing on these architectural and practical considerations, you move beyond marketing checklists and choose a vector database that will serve your application well in the long run. No single option is the best in all scenarios; the best vector database is the one that fits your constraints and makes your engineers’ lives easier while delivering the performance your application and users demand. + +Want help evaluating DB + inference architectures? Book a demo to see SIE (Superlinked Inference Engine) integrated with a vector DB on your data, or request a technical review to discuss self-hosted and managed inference patterns. + +> **Need help evaluating vector DB architectures for your use case?** +> Get a technical review from Superlinked → **[Book here](https://getdemo.superlinked.com/?utm_source=vdb_table_article)** diff --git a/docs/SIE Blog/embeddings_on_browser.md b/docs/SIE Blog/embeddings_on_browser.md new file mode 100644 index 00000000..4a3043f4 --- /dev/null +++ b/docs/SIE Blog/embeddings_on_browser.md @@ -0,0 +1,527 @@ +# Vector Embeddings in the browser + + + +![Visual Summary of our Tutorial](../assets/use_cases/embeddings_on_browser/embeddings-browser-animation.gif) + +--- +## Vector Embeddings, just for specialists? + +Let's say you want to build an app that assesses the similarity of content using vector embeddings. You know a little about what you'll need: first, obviously, a way of creating vector embeddings, maybe also some retrieval augmented generation. But how do you operationalize your idea into a real-world application? 
Don't you require a substantial hardware setup or expensive cloud APIs? Even if you had the requisite backend resources, who's going to develop and configure them? Don't you also need highly specialized machine learning engineers or data scientists even to get started? Don't you have to at least know Python? + +Happily, the answer to all of these concerns is No. + +**You can start building AI apps without having to learn a new programming language or adopt an entirely new set of skills**. + +You don't require high-end equipment, or powerful GPUs. You _don't_ need ML and data science experts. Thanks to pre-trained machine learning models, **you can create an intuitive component that generates and compares vector embeddings right within your browser, on a local machine, tailored to your data**. You also don't require library installations or complex configurations for end-users. You don't have to know Python; you can do it directly in TypeScript. And you can start immediately. + +The following tutorial in creating a small-scale AI application demonstrates just how straightforward and efficient the process can be. Though our component is a very specific use case, you can apply its basic approach to operationalizing vector embeddings for all kinds of practical applications. + +Intrigued? Ready to start building? + +Note for production users: Running embeddings and rerankers entirely in the browser is a great way to prototype. For production workloads or for models that produce multiple outputs (for example dense plus sparse encodings), teams often run inference in a server-side layer. SIE (Superlinked Inference Engine) is an inference orchestration layer that exposes `encode`, `score`, and `extract` primitives and is designed to work with client-side components or any vector database when you need a production inference surface. + +## An app that generates, compares, and visualizes vector embeddings in your browser! 
Our component takes input content, produces vector embeddings from it, assesses its parts - in our case, sentences - and provides a user-friendly visual display of the results. And you can build it right within your web browser.

In our tutorial, we will take some user input text, split it into sentences, and derive vector embeddings for each sentence using TensorFlow.js. To assess the quality of our embeddings, we will generate a similarity matrix mapping the distance between vectors as a colorful heatmap. Our component enables this by managing all the necessary state and UI logic.

Let's take a closer look at our component's parts.

## Specific parts of our application

1. We import all necessary dependencies: React, Material-UI components, TensorFlow.js, and D3 (for color interpolation).
2. Our code defines a React functional component that generates sentence embeddings and visualizes their similarity matrix in a user interface.
3. We declare various state variables using the **`useState`** hook, in order to manage user input, loading states, and results.
4. The **`handleSimilarityMatrix`** function toggles the display of the similarity matrix, and calculates it when necessary.
5. The **`handleGenerateEmbedding`** function is responsible for starting the sentence embedding generation process. It splits the input sentences into individual sentences and triggers the **`embeddingGenerator`** function.
6. The **`calculateSimilarityMatrix`** function is *memoized* using the **`useCallback`** hook. It calculates the similarity matrix based on sentence embeddings.
7. The **`embeddingGenerator`** is an asynchronous function that loads the Universal Sentence Encoder model and generates sentence embeddings.
8. We use the **`useEffect`** hook to render the similarity matrix as a colorful canvas when **`similarityMatrix`** changes.
9. The component's return statement defines the user interface, including input fields, buttons, and result displays.
10. The user input section includes a text area where the user can input sentences.
11. The embeddings output section displays the generated embeddings.
12. We provide two buttons. One generates the embeddings, and the other shows/hides the similarity matrix.
13. The code handles loading and model-loaded states, displaying loading indicators or model-loaded messages.
14. The similarity matrix section displays the colorful similarity matrix as a canvas when the user chooses to show it.


## Our encoder

The [Universal Sentence Encoder](https://arxiv.org/pdf/1803.11175.pdf) is a pre-trained machine learning model built on the transformer architecture. It creates context-aware representations for each word in a sentence, using the attention mechanism - i.e., carefully considering the order and identity of all other words. The Encoder employs element-wise summation to combine these word representations into a fixed-length sentence vector. To normalize these vectors, the Encoder then divides them by the square root of the sentence length - to prevent shorter sentences from dominating solely due to their brevity.

The Encoder takes sentences or paragraphs of text as input, and outputs vectors that effectively capture the meaning of the text. This lets us assess vector similarity (i.e., distance) - a result you can use in a wide variety of natural language processing (NLP) tasks, including ours.

### Encoder, Lite

For our application, we'll utilize a scaled-down and faster 'Lite' variant of the full model. The Lite model maintains strong performance while demanding less computational power, making it ideal for deployment in client-side code, mobile devices, or even directly within web browsers. And because the Lite variant doesn't require any kind of complex installation or a dedicated GPU, it's more accessible to a broader range of users.
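The summation-and-normalization step described above is easy to state in code. The sketch below is a simplification for intuition only (real word representations inside the Encoder are context-aware, not fixed vectors):

```typescript
// Simplified sketch of the encoder's pooling step: element-wise sum of the
// word vectors, scaled by 1/sqrt(number of words) so that short sentences
// don't dominate solely because of their brevity.
function poolSentenceVector(wordVectors: number[][]): number[] {
  const dim = wordVectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const w of wordVectors) {
    for (let i = 0; i < dim; i++) sum[i] += w[i];
  }
  const scale = 1 / Math.sqrt(wordVectors.length);
  return sum.map((x) => x * scale);
}
```

Dividing by the square root of the word count, rather than the count itself, damps length effects without fully averaging them away.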
+ +### Why a pre-trained model + +The rationale behind pre-trained models is straightforward. Most NLP projects in research and industry contexts only have access to relatively small training datasets. It's not feasible, then, to use data-hungry deep learning models. And annotating more supervised training data is often prohibitively expensive. Here, **pre-trained models can fill the data gap**. + +Many NLP projects employ pre-trained word embeddings like word2vec or GloVe, which transform individual words into vectors. However, recent developments have shown that, on many tasks, **pre-trained sentence-level embeddings excel at capturing higher level semantics** than word embeddings can. The Universal Sentence Encoder's fixed-length vector embeddings are extremely effective for computing semantic similarity between sentences, with high scores in various semantic textual similarity benchmarks. + +Though our Encoder's sentence embeddings are pre-trained, they can also be fine-tuned for specific tasks, even when there isn't much task-specific training data. (If we needed, we could even make the encoder more versatile, supporting _multiple_ downstream tasks, by training it with multi-task learning.) + +If your production system needs features that the browser cannot provide, such as multi-output encodes, model hot-reload, or memory-safe multi-model serving for larger models and rerankers, SIE provides a server-side inference surface that implements `encode`, `score`, and `extract` APIs. You can keep the client-side UI and call SIE for production-quality encodes and reranking. This lets you prototype in the browser and graduate to a server-side inference layer without rewriting your application logic. + +Okay, let's get started, using TypeScript. 
## Our step-by-step tutorial

### Import modules
```tsx
import React, { FC, useState, useEffect, useCallback } from 'react';
import {
  Box,
  Grid,
  Typography,
  TextField,
  Paper,
  Button,
  CircularProgress,
} from "@mui/material";

import '@tensorflow/tfjs-backend-cpu';
import '@tensorflow/tfjs-backend-webgl';

import * as use from '@tensorflow-models/universal-sentence-encoder';
import * as tf from '@tensorflow/tfjs-core';
import { interpolateGreens } from 'd3-scale-chromatic';
```

### State variables to manage user input, loading state, and results

We use the **`useState`** hook in a React functional component to manage user input, track loading states of machine learning models, and update the user interface with the results, including the similarity matrix visualization.

```tsx
// State variables to manage user input, loading state, and results

// sentences - stores user input
  const [sentences, setSentences] = useState('');

// sentencesList - stores sentences split into an array
  const [sentencesList, setSentencesList] = useState<string[]>([]);

// embeddings - stores generated embeddings (as a JSON string)
  const [embeddings, setEmbeddings] = useState('');

// modelLoading - tracks model loading state
  const [modelLoading, setModelLoading] = useState(false);

// modelComputing - tracks model computing state
  const [modelComputing, setModelComputing] = useState(false);

// showSimilarityMatrix - controls matrix visibility
  const [showSimilarityMatrix, setShowSimilarityMatrix] = useState(false);

// embeddingModel - stores the sentence embeddings tensor (placeholder until generated)
  const [embeddingModel, setEmbeddingModel] = useState(tf.tensor2d([[1, 2], [3, 4]]));

// canvasSize - size of canvas for matrix
  const [canvasSize, setCanvasSize] = useState(0);

// similarityMatrix - stores similarity matrix
  const [similarityMatrix, setSimilarityMatrix] = useState<number[][] | null>(null);
```

### Function to toggle the display of the similarity matrix

The
**`handleSimilarityMatrix`** function is called in response to user input, toggling the display of a UI similarity matrix - by changing the **`showSimilarityMatrix`** state variable. If the matrix was previously shown, the **`handleSimilarityMatrix`** hides it by setting it to **`null`**. If the matrix wasn't shown, the **`handleSimilarityMatrix`** calculates the matrix and sets it to display in the UI. + +```tsx +// Toggles display of similarity matrix + const handleSimilarityMatrix = () => { + // Toggle showSimilarityMatrix state + setShowSimilarityMatrix(!showSimilarityMatrix); + + // If showing matrix, clear it + if (showSimilarityMatrix) { + setSimilarityMatrix(null); + + // Else calculate and set matrix + } else { + const calculatedMatrix = calculateSimilarityMatrix(embeddingModel, sentencesList); + setSimilarityMatrix(calculatedMatrix); + } + }; +``` + +### Function to generate sentence embeddings and populate state + +The **`handleGenerateEmbedding`** function, called when a user clicks the "Generate Embedding" button, initiates the process of generating sentence embeddings. It sets the **`modelComputing`** state variable to **`true`** to indicate that the model is working, splits the user's input into individual sentences, updates the **`sentencesList`** state variable with these sentences, and then calls the **`embeddingGenerator`** function to start generating embeddings based on the individual sentences. 
+ +```tsx +// Generate embeddings for input sentences + const handleGenerateEmbedding = async () => { + // Set model as computing + setModelComputing(true); + + // Split input into individual sentences + const individualSentences = sentences.split('.').filter(sentence => sentence.trim() !== ''); + + // Save individual sentences to state + setSentencesList(individualSentences); + + // Generate embeddings + await embeddingGenerator(individualSentences); + }; +``` + +### Function to calculate the similarity matrix for sentence embeddings + +The **`calculateSimilarityMatrix`** function computes a similarity matrix for a set of sentences by comparing the embeddings of each sentence with all other sentence embeddings. The matrix contains similarity scores for all possible sentence pairs. You can use it to perform further visualization and analysis. + +This function is memoized using the **`useCallback`** hook, which ensures that its behavior will remain consistent across renders unless its dependencies change. 
+ +```tsx + // Calculates similarity matrix for sentence embeddings + const calculateSimilarityMatrix = useCallback( + + // Embeddings and sentences arrays + (embeddings: tf.Tensor2D, sentences: string[]) => { + + // Matrix to store scores + const matrix = []; + + // Loop through each sentence + for (let i = 0; i < sentences.length; i++) { + + // Row to store scores for sentence i + const row = []; + + // Loop through each other sentence + for (let j = 0; j < sentences.length; j++) { + + // Get embeddings for sentences + const sentenceI = tf.slice(embeddings, [i, 0], [1]); + const sentenceJ = tf.slice(embeddings, [j, 0], [1]); + + const sentenceITranspose = false; + const sentenceJTranspose = true; + + // Calculate similarity score + const score = tf.matMul(sentenceI, sentenceJ, sentenceITranspose, sentenceJTranspose).dataSync(); + + // Add score to row + row.push(score[0]); + } + + // Add row to matrix + matrix.push(row); + } + + // Return final matrix + return matrix; + }, + [] + ); +``` + +### Function to generate sentence embeddings using the Universal Sentence Encoder + +The **`embeddingGenerator`** function is called when the user clicks a "Generate Embedding" button. It loads the Universal Sentence Encoder model, generates sentence embeddings for a list of sentences, and updates the component's state with the results. It also handles potential errors. 
+
+```tsx
+// Generate embeddings using the Universal Sentence Encoder (Cer et al., 2018)
+const embeddingGenerator = useCallback(async (sentencesList: string[]) => {
+  // Only run if model is not already loading
+  if (!modelLoading) {
+    try {
+      // Set model as loading
+      setModelLoading(true);
+
+      // Load model
+      const model = await use.load();
+
+      // Array to store embeddings
+      const sentenceEmbeddingArray: number[][] = [];
+
+      // Generate embeddings
+      const embeddings = await model.embed(sentencesList);
+
+      // Process each sentence
+      for (let i = 0; i < sentencesList.length; i++) {
+        // Get embedding for sentence i
+        const sentenceI = tf.slice(embeddings, [i, 0], [1, -1]);
+
+        // Add to array
+        sentenceEmbeddingArray.push(Array.from(sentenceI.dataSync()));
+      }
+
+      // Update embeddings state
+      setEmbeddings(JSON.stringify(sentenceEmbeddingArray));
+
+      // Update model state
+      setEmbeddingModel(embeddings);
+
+      // Reset loading states
+      setModelLoading(false);
+      setModelComputing(false);
+    } catch (error) {
+      // Handle errors
+      console.error('Error loading model or generating embeddings:', error);
+      setModelLoading(false);
+      setModelComputing(false);
+    }
+  }
+}, [modelLoading]);
+```
+
+### useEffect hook to render the similarity matrix as a colorful canvas
+
+**`useEffect`** is triggered when **`similarityMatrix`** or **`canvasSize`** changes, and draws the similarity matrix on an HTML canvas element. The matrix is represented as a grid of colored cells, with each cell's hue determined by the similarity value between two sentences. The resulting visualization is a dynamic part of the user interface.
+
+```tsx
+// Render similarity matrix as colored canvas
+useEffect(() => {
+  // If matrix exists
+  if (similarityMatrix) {
+    // Get canvas element
+    const canvas = document.querySelector('#similarity-matrix') as HTMLCanvasElement;
+
+    // Use a fixed canvas size; read it from a local constant so we
+    // don't depend on the state value before it has updated
+    const size = 250;
+    setCanvasSize(size);
+
+    // Set canvas dimensions
+    canvas.width = size;
+    canvas.height = size;
+
+    // Get canvas context
+    const ctx = canvas.getContext('2d');
+
+    // If context available
+    if (ctx) {
+      // Calculate cell size
+      const cellSize = size / similarityMatrix.length;
+
+      // Loop through matrix
+      for (let i = 0; i < similarityMatrix.length; i++) {
+        for (let j = 0; j < similarityMatrix[i].length; j++) {
+          // Set cell color based on value
+          ctx.fillStyle = interpolateGreens(similarityMatrix[i][j]);
+
+          // Draw cell
+          ctx.fillRect(j * cellSize, i * cellSize, cellSize, cellSize);
+        }
+      }
+    }
+  }
+}, [similarityMatrix, canvasSize]);
+```
+
+### User Input Section
+
+This code represents the UI fields where users can input multiple sentences. It includes a heading, a multiline text input field, and React state management to control and update the input, storing user-entered sentences in the **`sentences`** state variable for further processing in the component. (The markup below assumes Material UI components; any component library will work.)
+
+```tsx
+{/* User Input Section */}
+<>
+  {/* Heading */}
+  <Typography variant="h6">Encode Sentences</Typography>
+
+  {/* Multiline text input */}
+  <TextField
+    multiline
+    fullWidth
+    minRows={4}
+    value={sentences}
+    onChange={(e) => setSentences(e.target.value)}
+  />
+</>
+```
+
+### Embeddings Output Section
+
+The embeddings output section of the UI displays the embeddings stored in the **`embeddings`** state variable, including a heading and a multiline text output field. React state management lets you control and update the displayed content.
+
+
+```tsx
+{/* Embeddings Output Section (markup assumes Material UI components) */}
+<>
+  {/* Heading */}
+  <Typography variant="h6">Embeddings</Typography>
+
+  {/* Multiline text field to display embeddings */}
+  <TextField
+    multiline
+    fullWidth
+    minRows={4}
+    value={embeddings}
+    InputProps={{ readOnly: true }}
+  />
+</>
+```
+
+### Generate Embedding Button
+
+The following code represents a raised, solid button in the UI that triggers the **`handleGenerateEmbedding`** function to initiate the embedding generation process. The generate embedding button is disabled if there are no input sentences (**`!sentences`**) or if the model is currently loading (**`modelLoading`**).
+
+```tsx
+{/* Generate Embedding Button */}
+<Button
+  variant="contained"
+  onClick={handleGenerateEmbedding}
+  disabled={!sentences || modelLoading}
+>
+  Generate Embedding
+</Button>
+```
+
+### Model Indicator
+
+This code uses the values of the **`modelComputing`** and **`modelLoading`** state variables to control what's displayed in the user interface. If **`modelComputing`** and **`modelLoading`** are both **`true`**, a loading indicator is displayed. If **`modelLoading`** is **`false`**, the model has already loaded and we display a message indicating this.
+
+```tsx
+{/* Display model loading or loaded message */}
+{modelComputing ? (
+  modelLoading ? (
+    // Show loading indicator
+    <Grid container alignItems="center" spacing={1}>
+      <Grid item>
+        <CircularProgress size={20} />
+      </Grid>
+      <Grid item>
+        <Typography>Loading the model...</Typography>
+      </Grid>
+    </Grid>
+  ) : (
+    // Show model loaded message
+    <Grid container alignItems="center" spacing={1}>
+      <Grid item>
+        <Typography>Model Loaded</Typography>
+      </Grid>
+    </Grid>
+  )
+) : null}
+```
+
+### Similarity Matrix
+
+The following code displays the similarity matrix in the user interface if the **`showSimilarityMatrix`** state variable is **`true`**. This section of the UI includes a title, "Similarity Matrix," and a canvas element for rendering the matrix. If **`false`**, the similarity matrix is hidden.
+
+```tsx
+{/* Similarity Matrix Section */}
+    {showSimilarityMatrix ? (
+      <Grid container direction="column" alignItems="center">
+        <Grid item>
+          <Typography variant="h6">Similarity Matrix</Typography>
+        </Grid>
+        <Grid item>
+          {/* Canvas the useEffect hook draws into */}
+          <canvas id="similarity-matrix" />
+        </Grid>
+      </Grid>
+    ) : null}
+
+  );
+};
+```
+
+Optional server-side inference: If you want to run heavier models, support multi-output encodes, or centralise reranking, you can call a server-side inference API from your client. 
SIE provides a simple HTTP client surface. For example, from TypeScript you can POST a query to SIE's `encode` endpoint and receive a dense vector, then call your vector DB for candidate retrieval and call SIE's `score` endpoint to rerank. This is optional and does not replace the browser-based tutorial above. + +```ts +// TypeScript: minimal SIE integration example (illustrative) +async function encodeWithSIE(query: string) { + const resp = await fetch('https://sie.example.com/encode', { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ model: 'sentence-transformers/all-mpnet-base-v2', text: query, output_types: ['dense'] }) + }); + return (await resp.json()).dense; +} + +async function scoreWithSIE(query: string, docs: string[]) { + const resp = await fetch('https://sie.example.com/score', { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ model: 'cross-encoder/ms-marco-MiniLM-L-6-v2', query, documents: docs }) + }); + return resp.json(); +} +``` + +## The test drive: functionality & embedding quality + +Before we launch our intuitive semantic search application into production, we should test it. Let's check its functionality, and the quality of our model's vector embeddings. + +Functionality is easy. We just run and test it. Checking embedding quality is a bit more complex. We are dealing with arrays of 512 elements. How do we gauge their effectiveness? + +Here is where our **similarity matrix** comes into play. We employ the dot product between vectors for each pair of sentences to discern their proximity or dissimilarity. To illustrate this, let's take two random pages from Wikipedia, each containing different paragraphs. These two pages will provide us with a total of seven sentences for comparison. 
+ +1) [The quick brown fox jumps over the lazy dog](https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog) + +2) [Los Angeles Herald](https://en.wikipedia.org/wiki/Los_Angeles_Herald) + +### Paragraph 1 input + +> "The quick brown fox jumps over the lazy dog" is an English-language pangram – a sentence that contains all the letters of the alphabet at least once. The phrase is commonly used for touch-typing practice, testing typewriters and computer keyboards, displaying examples of fonts, and other applications involving text where the use of all letters in the alphabet is desired. +> + +### Paragraph 2 input + +> The Los Angeles Herald or the Evening Herald was a newspaper published in Los Angeles in the late 19th and early 20th centuries. Founded in 1873 by Charles A. Storke, the newspaper was acquired by William Randolph Hearst in 1931. It merged with the Los Angeles Express and became an evening newspaper known as the Los Angeles Herald-Express. A 1962 combination with Hearst's morning Los Angeles Examiner resulted in its final incarnation as the evening Los Angeles Herald-Examiner. +> + +When we input these sentences to our model and generate the similarity matrix, we can observe some remarkable patterns. + +![Similarity Matrix for seven sentences from two documents](../assets/use_cases/embeddings_on_browser/embeddings-browser-similarity-matrix.png) +(Note: the 7x7 matrix represents seven sentences; Paragraph 2's second sentence breaks at the "A." of "Charles A. Storke." The third sentence begins with "Storke.") + +Our similarity matrix uses color hue to illustrate that same-paragraph sentence pairs are more similar (darker green) than different-paragraph sentence pairs (lighter green). The darker the hue of green, the more similar the vectors representing the sentences are - i.e., the closer they are in semantic meaning. 
For example, pairing Paragraph 1's first sentence ("The quick brown fox...") and second sentence ("The phrase is commonly...") displays as medium green squares - [1,2] and [2,1]. So does pairing Paragraph 2's first ("The Los Angeles Herald...") and second ("Founded in 1873...") - [3,4] and [4,3]. The darkest green squares represent the dot product values of identical pairs - [1,1], [2,2], [3,3], and so on.
+
+![Numbered sentence pairs in similarity matrix](../assets/use_cases/embeddings_on_browser/embeddings-browser-numbered-similarity-matrix.png)
+
+As a result, each paragraph's same-paragraph sentence pairs form their own notably darker regions within the larger matrix above. Conversely, different-paragraph sentence pairs are less similar, and therefore display as lighter green squares. For example, the pairings of Paragraph 1's first sentence [1] and Paragraph 2's first sentence [3] are distinctly lighter green (i.e., more distant in meaning) - [1,3] and [3,1] - and lie outside our two same-paragraph sentence pair regions.
+
+
+## A real-world-ready semantic search application
+
+And that's it!
+
+You can now build this low-cost, intuitive, ready-to-deploy, in-browser vector embedding generator and visualizer, and use it in your own real-world applications.
+
+This is just one example of the kind of AI app any developer can build using pre-trained models configured with TypeScript, without cloud models, expensive hardware, or specialized engineering knowledge.
+
+
+## Contributors
+
+- [Rod Rivera](http://twitter.com/rorcde)
+- [Robert Turner, editor](https://robertturner.co/copyedit)
+
+If you need higher performance, multi-output encodes, or production reranking and extraction, consider a server-side inference layer such as SIE (Superlinked Inference Engine). 
It provides `encode`, `score`, and `extract` primitives, supports multi-model serving, and lets you move from a browser prototype to a production inference surface without rewriting your application logic.
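
As a final sanity check on those similarity patterns, the dot-product score that the matrix visualizes is easy to reproduce on plain number arrays, independent of TensorFlow.js. The sketch below uses made-up three-dimensional vectors purely for illustration; real Universal Sentence Encoder embeddings have 512 dimensions:

```typescript
// Dot product of two embedding vectors of equal length
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Build the full pairwise similarity matrix for a list of embeddings
function similarityMatrix(embeddings: number[][]): number[][] {
  return embeddings.map(a => embeddings.map(b => dot(a, b)));
}

// Toy 3-dimensional "embeddings": the first two vectors point in
// nearly the same direction; the third is orthogonal to both
const toyEmbeddings = [
  [1, 0, 0],
  [0.9, 0.1, 0],
  [0, 0, 1],
];

const matrix = similarityMatrix(toyEmbeddings);

// The similar pair scores higher than the dissimilar pair, mirroring
// the darker and lighter cells in the canvas visualization
console.log(matrix[0][1] > matrix[0][2]); // true (0.9 > 0)
```

Because the Universal Sentence Encoder produces approximately unit-length vectors, the raw dot product behaves much like cosine similarity, which is why it works as a proximity measure here.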