diff --git a/notebooks/advanced_techniques/agentic_video_search.ipynb b/notebooks/advanced_techniques/agentic_video_search.ipynb new file mode 100644 index 0000000..6c4c50d --- /dev/null +++ b/notebooks/advanced_techniques/agentic_video_search.ipynb @@ -0,0 +1,1008 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3743e5aa-5a08-4b37-a545-4e7461e14468", + "metadata": {}, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/advanced_techniques/agentic_video_search.ipynb)\n", + "\n", + "[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/company/blog/technical/agentic-video-search/?utm_campaign=devrel&utm_source=cross-post&utm_medium=organic_social&utm_content=https%3A%2F%2Fgithub.com%2Fmongodb-developer%2FGenAI-Showcase&utm_term=apoorva.joshi)" + ] + }, + { + "cell_type": "markdown", + "id": "becc445d-5a51-4a8a-abeb-ceab0b54c167", + "metadata": {}, + "source": [ + "# Building an Agentic Video Search System using Voyage AI and MongoDB" + ] + }, + { + "cell_type": "markdown", + "id": "8beabeb0-f231-4e56-9af3-3178be3c92cf", + "metadata": {}, + "source": [ + "## Step 1: Install required packages\n", + "\n", + "- **voyageai**: Voyage AI's Python SDK\n", + "- **pymongo**: MongoDB's Python driver\n", + "- **anthropic**: Anthropic's Python SDK\n", + "- **huggingface_hub**: Python library for interacting with the Hugging Face Hub\n", + "- **ffmpeg-python**: Python wrapper for `ffmpeg`\n", + "- **tqdm**: Python library to display progress bars for loops" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "e9d5d1c4-614b-42de-9e15-c02146e8be89", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -qU voyageai==0.3.7 pymongo==4.15.5 anthropic==0.75.0 huggingface-hub==1.2.3 ffmpeg-python==0.2.0 tqdm==4.67.1" + ] + }, + { + "cell_type": "markdown", + "id": 
"d4803c8c-eb59-44aa-9306-5e49088add6b", + "metadata": {}, + "source": [ + "You'll also need to install the `ffmpeg` binary itself. To do this, run the following commands from the terminal and note the path to the `ffmpeg` installation:\n", + "\n", + "#### macOS\n", + "\n", + "```\n", + "brew install ffmpeg\n", + "```\n", + "\n", + "#### Linux\n", + "\n", + "```\n", + "sudo apt-get install ffmpeg\n", + "```\n", + "\n", + "#### Windows\n", + "* Download a Windows build from [ffmpeg.org](https://ffmpeg.org/download.html#build-windows)\n", + "* Extract the downloaded zip file\n", + "* Note the path to the `bin` folder" + ] + }, + { + "cell_type": "markdown", + "id": "aafea857-7c11-48ff-a5cc-21a44de2f02b", + "metadata": {}, + "source": [ + "## Step 2: Set up prerequisites\n", + "\n", + "**Voyage AI**\n", + "- [Obtain a Voyage AI API key](https://dashboard.voyageai.com/organization/api-keys)\n", + "\n", + "**MongoDB**\n", + "- Register for a [free MongoDB Atlas account](https://www.mongodb.com/cloud/atlas/register)\n", + "- [Create a new database cluster](https://www.mongodb.com/docs/guides/atlas/cluster/)\n", + "- [Obtain the connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/) for your database cluster\n", + "\n", + "**Anthropic**\n", + "- [Obtain an Anthropic API key](https://platform.claude.com/settings/keys)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "12b1c0b3-b18a-4fda-b730-0693b5259a8f", + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import os\n", + "\n", + "import anthropic\n", + "import voyageai\n", + "from pymongo import MongoClient" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "id": "d05bb72f-bebc-4255-8206-7a3b70b3d302", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "Enter your Voyage API key: ········\n" + ] + } + ], + "source": [ + "# Set Voyage API key as an environment variable\n", +
"os.environ[\"VOYAGE_API_KEY\"] = getpass.getpass(\"Enter your Voyage API key:\")\n", + "# Initialize the Voyage AI client\n", + "voyage_client = voyageai.Client()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "662fc622-7d64-4d12-acb4-8eaa425aa829", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "Enter your MongoDB connection string: ········\n" + ] + }, + { + "data": { + "text/plain": [ + "{'ok': 1.0,\n", + " '$clusterTime': {'clusterTime': Timestamp(1767387291, 1),\n", + " 'signature': {'hash': b'\\xf8\\xbcI\\xcf\\x81DR\\xc1\\xcdO\\xcf\\xa8\\x1d\\xc9\\x1do\\x14dH\\xf2',\n", + " 'keyId': 7558184680432861186}},\n", + " 'operationTime': Timestamp(1767387291, 1)}" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Set the MongoDB connection string\n", + "MONGODB_URI = getpass.getpass(\"Enter your MongoDB connection string:\")\n", + "# Initialize the MongoDB client\n", + "mongodb_client = MongoClient(\n", + " MONGODB_URI, appname=\"devrel.showcase.agentic_video_search\"\n", + ")\n", + "# Check MongoDB connection\n", + "mongodb_client.admin.command(\"ping\")" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "f0baa8a5-8625-4d41-a2ab-13404b13f3cd", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "Enter your Anthropic API key: ········\n" + ] + } + ], + "source": [ + "# Set Anthropic API key as an environment variable\n", + "os.environ[\"ANTHROPIC_API_KEY\"] = getpass.getpass(\"Enter your Anthropic API key:\")\n", + "# Initialize the Anthropic client\n", + "anthropic_client = anthropic.Anthropic()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "b252c917-c714-4e6b-bd55-a420161bb96f", + "metadata": {}, + "outputs": [], + "source": [ + "# Make ffmpeg accessible from the notebook\n", + "# Replace /path/to/ffmpeg with your ffmpeg path\n", + 
"os.environ[\"PATH\"] = f\"/path/to/ffmpeg{os.pathsep}{os.environ['PATH']}\"" + ] + }, + { + "cell_type": "markdown", + "id": "c58c9820-c53b-49e5-8023-b5234cf7d817", + "metadata": {}, + "source": [ + "## Step 3: Download the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 172, + "id": "6b1cef44-0d97-44fb-878e-7e3872c0d8fe", + "metadata": {}, + "outputs": [], + "source": [ + "from huggingface_hub import snapshot_download" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "id": "d698eba9-fa77-4714-b7d5-558aecd9e93d", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5153fe076e7c460eb032a40783accb03", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Downloading (incomplete total...): 0.00B [00:00, ?B/s]" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "71591b99eceb486b808022f317715839", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Fetching 10 files: 0%| | 0/10 [00:00 list[list]:\n", + " \"\"\"\n", + " Generate embeddings using Voyage AI's latest multimodal embedding model.\n", + "\n", + " Args:\n", + " inputs (list[list]): Inputs as a list of lists\n", + " input_type (str): Type of input.
Can be one of \"document\" or \"query\"\n", + "\n", + " Returns:\n", + " list[list]: List of embeddings\n", + " \"\"\"\n", + " embeddings = voyage_client.multimodal_embed(\n", + " inputs=inputs, model=MODEL_NAME, input_type=input_type\n", + " ).embeddings\n", + " return embeddings" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "id": "af2e7192-1bf1-4f84-8726-fc296ba3c7b7", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + " 0%| | 0/17 [00:00 str:\n", + " \"\"\"\n", + " Format a time offset in seconds as min:sec.\n", + "\n", + " Args:\n", + " seconds (int): Time in seconds\n", + "\n", + " Returns:\n", + " str: Formatted timestamp\n", + " \"\"\"\n", + " mins = int(seconds // 60)\n", + " secs = int(seconds % 60)\n", + " return f\"{mins}:{secs:02d}\"" + ] + }, + { + "cell_type": "code", + "execution_count": 194, + "id": "360aa134-ca0f-4a21-84ba-0aa4edc12692", + "metadata": {}, + "outputs": [], + "source": [ + "def vector_search(query: str) -> None:\n", + " \"\"\"\n", + " Retrieve relevant video segments using vector search.\n", + "\n", + " Args:\n", + " query (str): User query string\n", + " \"\"\"\n", + " query_embedding = generate_embeddings([[query]], \"query\")[0]\n", + " pipeline = [\n", + " {\n", + " \"$vectorSearch\": {\n", + " \"index\": \"vector-index\",\n", + " \"queryVector\": query_embedding,\n", + " \"path\": \"embedding\",\n", + " \"numCandidates\": 200,\n", + " \"limit\": 3,\n", + " }\n", + " },\n", + " {\n", + " \"$project\": {\n", + " \"_id\": 0,\n", + " \"video_title\": \"$metadata.video_title\",\n", + " \"start\": \"$metadata.start\",\n", + " \"end\": \"$metadata.end\",\n", + " \"score\": {\"$meta\": \"vectorSearchScore\"},\n", + " }\n", + " },\n", + " ]\n", + "\n", + " results = collection.aggregate(pipeline)\n", + " for result in results:\n", + " print(\n", + " f\"{result.get('video_title')} ({format_time(result.get('start'))} - {format_time(result.get('end'))})\"\n", + " )" + ] +
}, + { + "cell_type": "code", + "execution_count": 201, + "id": "765fb1f3-a410-4ea6-89b4-689b57c83e15", + "metadata": {}, + "outputs": [], + "source": [ + "def hybrid_search(query: str) -> None:\n", + " \"\"\"\n", + " Retrieve relevant video segments using hybrid search.\n", + "\n", + " Args:\n", + " query (str): User query string\n", + " \"\"\"\n", + " query_embedding = generate_embeddings([[query]], \"query\")[0]\n", + " pipeline = [\n", + " {\n", + " \"$rankFusion\": {\n", + " \"input\": {\n", + " \"pipelines\": {\n", + " \"vector_pipeline\": [\n", + " {\n", + " \"$vectorSearch\": {\n", + " \"index\": \"vector-index\",\n", + " \"path\": \"embedding\",\n", + " \"queryVector\": query_embedding,\n", + " \"numCandidates\": 200,\n", + " \"limit\": 10,\n", + " }\n", + " }\n", + " ],\n", + " \"fts_pipeline\": [\n", + " {\n", + " \"$search\": {\n", + " \"index\": \"fts-index\",\n", + " \"text\": {\"query\": query, \"path\": \"caption\"},\n", + " }\n", + " },\n", + " {\"$limit\": 10},\n", + " ],\n", + " }\n", + " },\n", + " \"combination\": {\n", + " \"weights\": {\"vector_pipeline\": 0.5, \"fts_pipeline\": 0.5}\n", + " },\n", + " \"scoreDetails\": True,\n", + " }\n", + " },\n", + " {\n", + " \"$project\": {\n", + " \"_id\": 0,\n", + " \"video_title\": \"$metadata.video_title\",\n", + " \"start\": \"$metadata.start\",\n", + " \"end\": \"$metadata.end\",\n", + " \"score\": \"$scoreDetails.value\",\n", + " }\n", + " },\n", + " {\"$limit\": 3},\n", + " ]\n", + "\n", + " results = collection.aggregate(pipeline)\n", + " for result in results:\n", + " print(\n", + " f\"{result.get('video_title')} ({format_time(result.get('start'))} - {format_time(result.get('end'))})\"\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 196, + "id": "17923831-57bf-4aed-9ee1-ed8196ddaa8c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)\n", + "Classic French 
Croissants with Chef Marguerite Dubois (0:59 - 1:01)\n", + "Classic French Croissants with Chef Marguerite Dubois (0:00 - 0:07)\n" + ] + } + ], + "source": [ + "vector_search(\"Rolling croissant dough\")" + ] + }, + { + "cell_type": "code", + "execution_count": 202, + "id": "3490f293-0b1d-4a66-82f3-30f9f443318f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Artisan Sourdough Bread Folding Technique (0:10 - 0:18)\n", + "Artisan Sourdough Bread Folding Technique (0:19 - 0:20)\n", + "Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)\n" + ] + } + ], + "source": [ + "hybrid_search(\"Coil fold technique\")" + ] + }, + { + "cell_type": "markdown", + "id": "61a57c6f-7981-48b7-a0c8-0e5306241ecb", + "metadata": {}, + "source": [ + "## Step 9: Build the agentic search pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "id": "6da79c72-de42-4f62-b8c1-697327f9f821", + "metadata": {}, + "outputs": [], + "source": [ + "# Define structured output schema\n", + "output_schema = {\n", + " \"type\": \"object\",\n", + " \"properties\": {\"search\": {\"type\": \"string\", \"enum\": [\"vector\", \"hybrid\"]}},\n", + " \"required\": [\"search\"],\n", + " \"additionalProperties\": False,\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "id": "704d4120-fbce-4b8c-b5f1-360d8216a3aa", + "metadata": {}, + "outputs": [], + "source": [ + "SYSTEM_PROMPT = \"\"\"Given a query, choose the optimal search strategy to retrieve the most relevant video segments for it: \n", + "\n", + "vector\n", + "- Best for: Visual actions and details, methods, concepts or general descriptions.\n", + "- Examples: \"How to chop onions\", \"Grilling vegetables\"\n", + "\n", + "hybrid\n", + "- Best for: Specific names and terms such as techniques, chef names, dietary restrictions etc.\n", + "- Examples: \"Coil fold technique\", \"Egg wash ingredients\"\n", + "\n", + "Default to vector unless exact word
matching is critical.\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 182, + "id": "93fc66bb-ae15-4c4b-a868-7bad3bc3426c", + "metadata": {}, + "outputs": [], + "source": [ + "def get_search_type(query: str) -> str:\n", + " \"\"\"\n", + " Use an LLM to determine the search strategy based on the query.\n", + "\n", + " Args:\n", + " query (str): User query string\n", + "\n", + " Returns:\n", + " str: Search type. One of \"vector\" or \"hybrid\"\n", + " \"\"\"\n", + " print(\"Determining search type...\")\n", + " response = anthropic_client.beta.messages.create(\n", + " model=\"claude-sonnet-4-5\",\n", + " max_tokens=50,\n", + " temperature=0,\n", + " betas=[\"structured-outputs-2025-11-13\"],\n", + " system=SYSTEM_PROMPT,\n", + " messages=[{\"role\": \"user\", \"content\": f\"Query: {query}\"}],\n", + " output_format={\"type\": \"json_schema\", \"schema\": output_schema},\n", + " )\n", + " search_type = json.loads(response.content[0].text).get(\"search\", \"unknown\")\n", + " print(f\"Using search type: {search_type}\")\n", + " return search_type" + ] + }, + { + "cell_type": "code", + "execution_count": 183, + "id": "b92e7a4d-fb05-4838-bc42-2b00a21fe1da", + "metadata": {}, + "outputs": [], + "source": [ + "def search(query: str) -> None:\n", + " \"\"\"\n", + " Given a query, determine the search type and execute the search.\n", + "\n", + " Args:\n", + " query (str): User query string\n", + " \"\"\"\n", + " search_type = get_search_type(query)\n", + " if search_type == \"vector\":\n", + " vector_search(query)\n", + " elif search_type == \"hybrid\":\n", + " hybrid_search(query)\n", + " else:\n", + " print(f\"Not a supported search type: {search_type}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 184, + "id": "19357309-57ec-42b1-896e-b45bb147661d", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Determining search type...\n", + "Using search type: vector\n", + "Classic French Croissants
with Chef Marguerite Dubois (0:24 - 0:37)\n", + "Classic French Croissants with Chef Marguerite Dubois (0:59 - 1:01)\n", + "Classic French Croissants with Chef Marguerite Dubois (0:00 - 0:07)\n" + ] + } + ], + "source": [ + "search(\"Rolling croissant dough\")" + ] + }, + { + "cell_type": "code", + "execution_count": 203, + "id": "0da06647-0fb9-4d6a-a7af-a9ef9cea32a0", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Determining search type...\n", + "Using search type: hybrid\n", + "Artisan Sourdough Bread Folding Technique (0:10 - 0:18)\n", + "Artisan Sourdough Bread Folding Technique (0:19 - 0:20)\n", + "Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)\n" + ] + } + ], + "source": [ + "search(\"Coil fold technique\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.19" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {} + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}