RAG-Anything: Local Multimodal Document Intelligence

RAG-Anything is a state-of-the-art Multimodal Retrieval-Augmented Generation (RAG) system running entirely on local hardware. It leverages MinerU for high-fidelity document parsing and LightRAG for advanced graph-based retrieval, all powered by local LLMs via Ollama.

🚀 Key Features

All-in-One Processing: Handles text, images, tables, and mathematical equations from complex PDFs and Office documents.
100% Local: No API keys required. Your data never leaves your machine.
Multimodal Knowledge Graph: Automatically extracts entities and relationships across different content types for deeper understanding.
Vision-Aware Retrieval: Uses vision-language models (VLM) to analyze figures and charts directly.

🛠️ Tech Stack

Framework: raganything (HKUDS)
Indexing Engine: LightRAG (Graph-based RAG)
Document Parser: MinerU (VLM-based parsing)
Local LLM Server: Ollama
Models Used:
- LLM: gemma4:latest (Reasoning & Chat)
- Vision: qwen3-vl:4b (Image & Chart understanding)
- Embeddings: nomic-embed-text:latest (Vector search)

📁 Code Modules

ollama_rag.py: The main entry point. Configures the RAG-Anything pipeline to use local Ollama endpoints for text completion, vision tasks, and embeddings.
rag_storage/: Directory containing the LightRAG knowledge graph, vector database, and document status.
output/: Contains the structured output from MinerU (JSON, Markdown, and extracted images).

🏃 How to Run

Prerequisites

Install Ollama.

Pull the required models:

ollama run gemma4:e2b   
ollama run qwen3-vl:4b
ollama run nomic-embed-text

Installation

pip install "raganything[all]"

Execution

Navigate to the project directory:

cd "D:\Ray Codes\AG Projects\RAGAnything"

Start the indexing and interactive query session:

python ollama_rag.py "path/to/your/document.pdf"

🌟 Use Cases

Academic Research: Index research papers and ask for comparisons between graphs, formulas, and text across multiple documents.
Technical Datasheets: Extract precise specifications from complex tables and circuit diagrams in engineering PDFs.
Financial Analysis: Analyze annual reports where key data is often trapped in charts and nested tables.
Legal Discovery: Parse and index large contracts where cross-referencing between sections and exhibits is critical.

🔮 Future Features

Multi-Document Support: Expand the UI to manage and search across large libraries of documents simultaneously.
Gradio Dashboard: A full web interface for document upload, visual graph exploration, and chat.
Adaptive Parsing: Dynamically switch between MinerU, Docling, and OCR based on document complexity to save resources.
Export to Obsidian/Logseq: Automatically convert processed multimodal documents into linked notes for personal knowledge management.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md
ollama_rag.py		ollama_rag.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RAG-Anything: Local Multimodal Document Intelligence

🚀 Key Features

🛠️ Tech Stack

📁 Code Modules

🏃 How to Run

Prerequisites

Installation

Execution

🌟 Use Cases

🔮 Future Features

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RAG-Anything: Local Multimodal Document Intelligence

🚀 Key Features

🛠️ Tech Stack

📁 Code Modules

🏃 How to Run

Prerequisites

Installation

Execution

🌟 Use Cases

🔮 Future Features

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages