Skip to content

47thtechcorner/RayCodes_RAGAnything

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

1 Commit
ย 
ย 
ย 
ย 

Repository files navigation

RAG-Anything: Local Multimodal Document Intelligence

RAG-Anything is a state-of-the-art Multimodal Retrieval-Augmented Generation (RAG) system running entirely on local hardware. It leverages MinerU for high-fidelity document parsing and LightRAG for advanced graph-based retrieval, all powered by local LLMs via Ollama.

๐Ÿš€ Key Features

  • All-in-One Processing: Handles text, images, tables, and mathematical equations from complex PDFs and Office documents.
  • 100% Local: No API keys required. Your data never leaves your machine.
  • Multimodal Knowledge Graph: Automatically extracts entities and relationships across different content types for deeper understanding.
  • Vision-Aware Retrieval: Uses vision-language models (VLM) to analyze figures and charts directly.

๐Ÿ› ๏ธ Tech Stack

  • Framework: raganything (HKUDS)
  • Indexing Engine: LightRAG (Graph-based RAG)
  • Document Parser: MinerU (VLM-based parsing)
  • Local LLM Server: Ollama
  • Models Used:
    • LLM: gemma4:latest (Reasoning & Chat)
    • Vision: qwen3-vl:4b (Image & Chart understanding)
    • Embeddings: nomic-embed-text:latest (Vector search)

๐Ÿ“ Code Modules

  • ollama_rag.py: The main entry point. Configures the RAG-Anything pipeline to use local Ollama endpoints for text completion, vision tasks, and embeddings.
  • rag_storage/: Directory containing the LightRAG knowledge graph, vector database, and document status.
  • output/: Contains the structured output from MinerU (JSON, Markdown, and extracted images).

๐Ÿƒ How to Run

Prerequisites

  1. Install Ollama.
  2. Pull the required models:
    ollama run gemma4:e2b   
    ollama run qwen3-vl:4b
    ollama run nomic-embed-text

Installation

pip install "raganything[all]"

Execution

  1. Navigate to the project directory:
    cd "D:\Ray Codes\AG Projects\RAGAnything"
  2. Start the indexing and interactive query session:
    python ollama_rag.py "path/to/your/document.pdf"

๐ŸŒŸ Use Cases

  1. Academic Research: Index research papers and ask for comparisons between graphs, formulas, and text across multiple documents.
  2. Technical Datasheets: Extract precise specifications from complex tables and circuit diagrams in engineering PDFs.
  3. Financial Analysis: Analyze annual reports where key data is often trapped in charts and nested tables.
  4. Legal Discovery: Parse and index large contracts where cross-referencing between sections and exhibits is critical.

๐Ÿ”ฎ Future Features

  • Multi-Document Support: Expand the UI to manage and search across large libraries of documents simultaneously.
  • Gradio Dashboard: A full web interface for document upload, visual graph exploration, and chat.
  • Adaptive Parsing: Dynamically switch between MinerU, Docling, and OCR based on document complexity to save resources.
  • Export to Obsidian/Logseq: Automatically convert processed multimodal documents into linked notes for personal knowledge management.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages