
- [Installation](#installation)
- [Usage](#usage)
- [Supported Models](#supported-models)
- [Example](#example)
- [Multimodal Embeddings](#multimodal-embeddings)
- [Contextualized Embeddings Example](#contextualized-embeddings-example)

[Voyage AI](https://voyageai.com/)'s embedding and ranking models are state-of-the-art in retrieval accuracy. The integration supports the models listed in [Supported Models](#supported-models) below.

For the complete list of available models, see the [Embeddings Documentation](https://docs.voyageai.com/embeddings/) and [Contextualized Chunk Embeddings](https://docs.voyageai.com/docs/contextualized-chunk-embeddings).

## Supported Models

### Text Embedding Models

| Model | Description | Dimensions |
|-------|-------------|------------|
| `voyage-3.5` | Latest general-purpose embedding model | 1024 |
| `voyage-3.5-lite` | Efficient model with lower latency | 1024 |
| `voyage-3-large` | High-capacity embedding model | 1024 |
| `voyage-3` | High-performance general-purpose model | 1024 |
| `voyage-code-3` | Optimized for code retrieval | 1024 |
| `voyage-finance-2` | Optimized for financial documents | 1024 |
| `voyage-law-2` | Optimized for legal documents | 1024 |
| `voyage-2` | Proven general-purpose model | 1024 |
| `voyage-large-2` | Larger proven model | 1536 |
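
Any model name in this table is passed as the `model` parameter when requesting embeddings. Below is a minimal sketch using the `voyageai` SDK directly; the sample texts are illustrative, and the same model names apply to the Haystack embedder components from `voyage-embedders-haystack`:

```python
import os

import voyageai

# The client falls back to the VOYAGE_API_KEY environment variable if api_key is omitted
client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))

# Any model name from the table above works here, e.g. the code-optimized model
result = client.embed(
    ["def quicksort(arr): ...", "iterative binary search over a sorted list"],
    model="voyage-code-3",
    input_type="document",  # use "query" when embedding search queries
)

print(len(result.embeddings))     # one vector per input text
print(len(result.embeddings[0]))  # vector dimensionality (1024 for voyage-code-3)
print(result.total_tokens)        # tokens billed for this request
```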

### Multimodal Embedding Models

| Model | Description | Dimensions | Modalities |
|-------|-------------|------------|------------|
| `voyage-multimodal-3` | Multimodal embedding model | 1024 | Text, Images |
| `voyage-multimodal-3.5` | Multimodal embedding model (preview) | 256, 512, 1024, 2048 | Text, Images, Video |

### Reranker Models

| Model | Description |
|-------|-------------|
| `rerank-2` | High-accuracy reranker model |
| `rerank-2-lite` | Efficient reranker with lower latency |
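
A minimal reranking sketch using the `voyageai` SDK directly; the query and documents below are illustrative placeholders:

```python
import os

import voyageai

client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))

query = "When does Apple hold its developer conference?"
documents = [
    "The Mediterranean diet emphasizes fish, olive oil, and vegetables.",
    "Apple's annual developer conference, WWDC, typically takes place in June.",
    "Photosynthesis converts light energy into chemical energy in plants.",
]

# Swap in "rerank-2-lite" to trade a little accuracy for lower latency
reranking = client.rerank(query, documents, model="rerank-2", top_k=2)

for r in reranking.results:
    # Each result exposes the original document, its index, and a relevance score
    print(f"{r.relevance_score:.3f}  {r.document}")
```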

## Installation

```bash
pip install voyage-embedders-haystack
```

## Multimodal Embeddings

Voyage AI's `voyage-multimodal-3.5` model transforms unstructured data from multiple modalities (text, images, video) into a shared vector space. This enables mixed-media document retrieval and cross-modal semantic search.

### Features

- **Multiple modalities**: Supports text, images, and video in a single input
- **Variable dimensions**: Output dimensions of 256, 512, 1024 (default), or 2048
- **Interleaved content**: Mix text, images, and video in single inputs
- **No preprocessing required**: Process documents with embedded images directly

### Limits

- Images: Max 20MB, 16 million pixels
- Video: Max 20MB
- Context: 32,000 tokens
- Token counting: one token per 560 image pixels and one token per 1,120 video pixels (see the worked estimate below)
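
The pixel-to-token rules above give a rough way to budget the 32,000-token context. A small illustrative estimate (the image size is arbitrary and the figures are approximations):

```python
# Rough budgeting based on the limits listed above:
#   1 token per 560 image pixels, 1 token per 1,120 video pixels,
#   32,000-token context window.
CONTEXT_TOKENS = 32_000

def image_tokens(width: int, height: int) -> int:
    """Approximate token cost of a single image."""
    return (width * height) // 560

tokens = image_tokens(1920, 1080)  # ~3,702 tokens for a Full HD image
print(f"{tokens} tokens per image; about {CONTEXT_TOKENS // tokens} such images fit in one context")
```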

### Multimodal API Example

The multimodal model uses a separate API endpoint (`/v1/multimodalembeddings`), exposed in the Python SDK as the `multimodal_embed` method:

```python
import os
import voyageai
from PIL import Image

# Initialize client (uses VOYAGE_API_KEY environment variable)
client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))

# Text-only embedding
result = client.multimodal_embed(
inputs=[["Your text here"]],
model="voyage-multimodal-3.5"
)

# Text + Image embedding
image = Image.open("document.jpg")
result = client.multimodal_embed(
inputs=[["Caption or context", image]],
model="voyage-multimodal-3.5",
output_dimension=1024 # Optional: 256, 512, 1024, or 2048
)

print(f"Dimensions: {len(result.embeddings[0])}")
print(f"Tokens used: {result.total_tokens}")
```
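
Because text and images share one vector space, a text query embedding can be scored directly against image embeddings for cross-modal search. A minimal sketch using cosine similarity via `numpy` (an extra dependency; the image file names are placeholders):

```python
import os

import numpy as np
import voyageai
from PIL import Image

client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))

def cosine(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed a text query and two images with the same multimodal model
query_emb = client.multimodal_embed(
    inputs=[["diagram of a transformer architecture"]],
    model="voyage-multimodal-3.5",
).embeddings[0]

image_embs = client.multimodal_embed(
    inputs=[[Image.open("figure1.png")], [Image.open("figure2.png")]],
    model="voyage-multimodal-3.5",
).embeddings

# Rank the images against the query
scores = [cosine(query_emb, emb) for emb in image_embs]
best = max(range(len(scores)), key=scores.__getitem__)
print(f"Best match: figure{best + 1}.png (score={scores[best]:.3f})")
```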

### Video Embedding Example

Video inputs require the `voyageai.video_utils` module. Use `optimize_video` to fit videos within the 32K token context:

```python
import os
import voyageai
from voyageai.video_utils import optimize_video

client = voyageai.Client(api_key=os.environ.get("VOYAGE_API_KEY"))

# Load and optimize video (videos can be large in tokens)
with open("video.mp4", "rb") as f:
    video_bytes = f.read()

# Optimize to fit within token budget
optimized_video = optimize_video(
    video_bytes,
    model="voyage-multimodal-3.5",
    max_video_tokens=5000  # Limit tokens used by video
)
print(f"Optimized: {optimized_video.num_frames} frames, ~{optimized_video.estimated_num_tokens} tokens")

# Embed video (optionally with text context)
result = client.multimodal_embed(
    inputs=[[optimized_video]],
    model="voyage-multimodal-3.5"
)

print(f"Dimensions: {len(result.embeddings[0])}")
print(f"Tokens used: {result.total_tokens}")
```

### Use Cases

- Mixed-media document retrieval (PDFs, slides with images)
- Image-text similarity search
- Video content retrieval and search
- Cross-modal semantic search

For more information, see the [Multimodal Embeddings Documentation](https://docs.voyageai.com/docs/multimodal-embeddings).

> **Note:** The `voyage-multimodal-3.5` model is currently in preview. Video input requires `voyageai` SDK version 0.3.6 or later.

## Contextualized Embeddings Example

The `voyage-context-3` model enables contextualized chunk embeddings, which preserve relationships between document chunks for better retrieval accuracy. Documents with the same `source_id` are embedded together in context: