**File:** `examples/multimodal_ai/cpu-whisper/README.md` (new, +108 lines)
# 🎙️ Whisper Speech-to-Text: Saturn Cloud Template

This template provides a production-ready environment for deploying **OpenAI Whisper** for high-accuracy speech-to-text tasks. It is optimized for **Saturn Cloud** GPU/CPU resources, allowing for seamless scaling from single-file transcription to large-scale batch processing.

## 📋 Overview

* **Title**: cpu-whisper Speech-to-Text
* **Tech Stack**: Whisper AI, PyTorch, FFmpeg, Librosa, Matplotlib
* **Resource Type**: Saturn Cloud Deployment / Jupyter Server
* **Description**: Transcribes sample audio with Whisper and PyTorch, produces a waveform plot and a transcript file, and logs progress to the terminal.

---

## 🚀 Environment Setup

The environment configuration is automated via a dedicated setup script designed for the Saturn Cloud file system.

### 1. Initialize the Environment

Run the custom setup script to install system dependencies (FFmpeg), configure your Python environment, and install the Whisper library.

```bash
# Execute your pre-configured setup script
bash setup_saturn.sh

```

### 2. Activate the Environment

Once the script completes, ensure you are working within the correct virtual environment:

```bash
source whisper_env/bin/activate

```
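Optionally, run a quick sanity check once the environment is active. This is a hedged convenience check, not part of the template itself; it only assumes the packages installed by the setup script (Whisper pulls in PyTorch as a dependency):

```bash
# Should print the Whisper CLI usage and the installed PyTorch version
whisper --help
python -c "import torch, whisper; print(torch.__version__)"
```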

---

## 🧪 Testing & Verification

Your environment contains two primary test scripts to verify the full functionality of the pipeline.

### 1. Running `test.py` (Audio Acquisition)

This script verifies network connectivity and hardware detection. It automatically downloads a high-quality sample audio file from Hugging Face and transcribes it, printing the output to the terminal.

**Command:**

```bash
python test.py

```

**Terminal Output:**

* **Device Detection**: Shows `Testing on Device: CUDA` (or CPU).
* **Download Log**: Displays `Downloading sample audio...` followed by `Download complete.`.
* **Model Loading**: Shows a progress bar while the Whisper `base` model (~139 MB) downloads.
* **Transcription**: Prints a raw text block of the transcribed audio to the terminal.

### 2. Running `test2.py` (Visualization & Export)

This script tests the advanced features of the template, including waveform generation and local file processing.

**Command:**

```bash
python test2.py

```

**Terminal Output:**

* **Status**: `Loading model and transcribing...`.
* **Visualization Log**: `Generating waveform...` using Librosa and Matplotlib.
* **Success Message**: `Verification Complete: Check transcript.txt and waveform.png`.

---

## 📂 Expected Output Files

After running the tests, verify that these files appear in your workspace file browser:

* **`sample1.flac`**: The downloaded test audio.
* **`transcript.txt`**: The saved text version of the transcription.
* **`waveform.png`**: The visual representation of the audio waves.
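If you prefer the terminal over the file browser, a simple listing confirms the same outputs (a hypothetical one-liner, assuming you run it from the project directory):

```bash
ls -lh sample1.flac transcript.txt waveform.png
```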

---

## 📊 Model Selection Guide

Choose the model size that best fits your hardware constraints on Saturn Cloud.

| Model | Parameters | Required VRAM | Relative Speed |
| --- | --- | --- | --- |
| **Tiny** | 39 M | ~1 GB | ~10x |
| **Base** | 74 M | ~1 GB | ~7x |
| **Small** | 244 M | ~2 GB | ~4x |
| **Medium** | 769 M | ~5 GB | ~2x |
| **Large** | 1550 M | ~10 GB | 1x |
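Switching sizes only requires changing the model name passed to `whisper.load_model`. A minimal sketch, assuming `sample1.flac` has already been downloaded by `test.py`:

```python
import whisper

# Pick any of: "tiny", "base", "small", "medium", "large"
model = whisper.load_model("small")
result = model.transcribe("sample1.flac")
print(result["text"].strip())
```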

---

## 🔗 Reference Links

* **Platform**: [Saturn Cloud Dashboard](https://saturncloud.io/)
* **Support**: [Saturn Cloud Documentation](https://saturncloud.io/docs/)
* **Community**: [Whisper AI Discussions](https://github.com/openai/whisper/discussions)
**File:** `examples/multimodal_ai/cpu-whisper/setup_saturn.sh` (new, +48 lines)
#!/bin/bash

# Exit on any error
set -e

echo "--- 1. Environment Pre-flight Check ---"

# Update package list
echo "Updating system package repositories..."
sudo apt update -y

# Install FFmpeg and Python dependencies
echo "Installing FFmpeg and Python tools..."
sudo apt install -y ffmpeg python3-pip python3-venv

# Verify FFmpeg installation
if ffmpeg -version > /dev/null 2>&1; then
    echo "SUCCESS: FFmpeg is installed and ready."
else
    echo "ERROR: FFmpeg installation failed."
    exit 1
fi

echo "--- 2. Setting Up Python Environment ---"

# Create and activate a virtual environment
echo "Creating virtual environment: whisper_env..."
python3 -m venv whisper_env
source whisper_env/bin/activate

# Install Whisper along with the visualization dependencies
echo "Installing OpenAI Whisper, Librosa, and Matplotlib..."
pip install -U openai-whisper librosa matplotlib

# Verify Whisper installation
if whisper --help > /dev/null 2>&1; then
    echo "SUCCESS: Whisper AI is installed."
else
    echo "ERROR: Whisper AI installation failed."
    exit 1
fi

echo "--- Setup Complete ---"
echo "You can now run your transcription tests using 'whisper <audio_file>'."
**File:** `examples/multimodal_ai/cpu-whisper/test.py` (new, +40 lines)
import torch
import whisper
import os
import urllib.request

# 1. Hardware Detection
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Testing on Device: {device.upper()}")

# 2. Verified Stable Test Audio
# This is a sample1.flac file from Hugging Face spaces
audio_url = "https://huggingface.co/spaces/speechbox/whisper-restore-punctuation/resolve/main/sample1.flac"
audio_file = "sample1.flac"

try:
    if not os.path.exists(audio_file):
        print(f"Downloading sample audio from {audio_url}...")
        # Standard headers to ensure the server accepts the request
        req = urllib.request.Request(audio_url, headers={'User-Agent': 'Mozilla/5.0'})
        with urllib.request.urlopen(req) as response, open(audio_file, 'wb') as out_file:
            out_file.write(response.read())
        print("Download complete.")
except Exception as e:
    print(f"Error downloading audio: {e}")
    exit(1)

# 3. Load Model and Transcribe
print("Loading Whisper 'base' model...")
# The 'base' model requires ~1GB VRAM and is ~7x faster than the large model
model = whisper.load_model("base", device=device)

print("Starting transcription...")
# Ensure ffmpeg is installed as it is required for audio processing
result = model.transcribe(audio_file)

# 4. Final Output Verification
print("-" * 30)
print("TRANSCRIPT OUTPUT:")
print(result["text"].strip())
print("-" * 30)
**File:** `examples/multimodal_ai/cpu-whisper/test2.py` (new, +24 lines)
import whisper
import torch
import librosa
import librosa.display  # explicit import needed for waveshow on older librosa versions
import matplotlib.pyplot as plt

# Check hardware
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device.upper()}")

# Load model and transcribe
model = whisper.load_model("base", device=device)
result = model.transcribe("sample1.flac")

# Export Transcript
with open("transcript.txt", "w") as f:
f.write(result["text"])

# Generate Waveform
y, sr = librosa.load("sample1.flac")
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.savefig("waveform.png")

print("Verification Complete: Check transcript.txt and waveform.png")
**File:** `examples/multimodal_ai/nvidia-video-rag/README.md` (new, +47 lines)
# 🎥 Video Q&A Pipeline (LangChain + Transformers)

A lightweight, modular pipeline that enables question-answering from video content using frame extraction, image captioning, semantic retrieval, and LLM-based response generation.

## 🚀 Features

* ✅ Frame extraction from video (OpenCV)
* 🧠 Image captioning using ViT-GPT2 (Hugging Face)
* 🔍 Semantic retrieval with ChromaDB + LangChain
* 🤖 Q&A using `flan-t5-small` (Text2Text pipeline)
* 💻 Works in CPU/GPU environments

## 📦 Dependencies

* `torch`, `transformers`, `opencv-python-headless`
* `langchain`, `langchain-community`, `langchain-huggingface`
* `sentence-transformers`, `chromadb`, `Pillow`


## 🧩 Pipeline Overview

```text
Video → Frames → Captions → Embeddings → ChromaDB → Retriever + LLM → Answer
```
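The notebook implements each stage; the sketch below is a hedged, condensed outline of the same flow. The video filename, the one-frame-per-second sampling, the `nlpconnect/vit-gpt2-image-captioning` and `all-MiniLM-L6-v2` checkpoints, and the `k=4` retrieval depth are illustrative assumptions, not fixed choices of the template.

```python
import cv2
from PIL import Image
from transformers import pipeline
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Extract roughly one frame per second from the video with OpenCV
cap = cv2.VideoCapture("video.mp4")  # assumed filename
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 1
frames, i = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if i % fps == 0:
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    i += 1
cap.release()

# 2. Caption each frame with a ViT-GPT2 image-captioning model
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
captions = [captioner(f)[0]["generated_text"] for f in frames]

# 3. Embed the captions and store them in ChromaDB
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma.from_texts(captions, embedding=embeddings)

# 4. Retrieve the most relevant captions and answer with flan-t5-small
question = "What is happening in the video?"
context = " ".join(d.page_content for d in db.similarity_search(question, k=4))
qa = pipeline("text2text-generation", model="google/flan-t5-small")
prompt = f"Answer the question based on the context.\nContext: {context}\nQuestion: {question}"
print(qa(prompt)[0]["generated_text"])
```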

## 🛠️ Usage

1. **Run in Jupyter**

2. **Open the notebook** and follow steps:

* 📥 Download video
* 🖼️ Extract frames
* 🧾 Generate captions
* 💾 Store in ChromaDB
* ❓ Ask questions via LLM

## 🧠 Example Questions

* What is happening in the video?
* What objects or people appear?
* Describe the main activity.

## ✅ Conclusion

This template provides a clean foundation for building **video understanding** applications using modern AI tooling. Extend it with your own videos, models, or use cases.
