RagForge is a high-performance, developer-centric platform for building, versioning, and deploying Retrieval-Augmented Generation (RAG) pipelines. Effortlessly transform your documents into searchable, intelligent knowledge bases powered by the latest LLMs.
- Multi-Project Support: Organize your workflows into distinct projects.
- Linear Versioning: Experiment with different configurations (chunk size, overlap, models) by creating multiple versions (v1, v2, ...) for each pipeline.
- Granular Control: Fine-tune how your data is processed at the version level.
- Multi-Format Support: Process PDF, DOCX, and TXT files seamlessly.
- Smart Storage: Integrated with S3/Cloudflare R2 for scalable file management.
- Intelligent OCR: Automatically detects scanned documents with low text density and offers LLM-powered OCR.
- Real-time Feedback: Detailed progress tracking with granular states: uploading ➔ extracting ➔ embedding ➔ ready.
- Asynchronous Processing: Heavy ingestion tasks are offloaded to BullMQ background workers, ensuring the UI remains responsive even during large batch uploads.
- Wait-Time Estimation: Smart calculation of processing time based on document size and server load.
- Optimized Embeddings: Parallelized batch processing (10x concurrency) with exponential backoff to maximize throughput while respecting API limits.
- Semantic Retrieval: Advanced vector search using Cosine Similarity to find the most relevant context.
- Grounded Chat: Interactive chat interface that provides LLM responses strictly grounded in your uploaded documents.
- Source Citations: Every answer comes with precise references (document name, page number, and text snippet).
- API Key Management: Securely generate, hash, and manage API keys for programmatic access.
- Usage Analytics: Track document counts, chunk statistics, query response times, and token usage through a beautiful dashboard.
- End-to-End Type Safety: Built with tRPC and TypeScript for a rock-solid developer experience.
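The "parallelized batch processing with exponential backoff" feature above can be sketched roughly as follows. This is an illustrative TypeScript sketch, not RagForge's actual implementation: `embedChunk` stands in for the real Gemini embedding call, and the retry/delay values are assumptions.

```typescript
// Retry an async call with exponential backoff (500ms, 1s, 2s, ...)
// so transient rate-limit errors don't fail the whole ingestion job.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Embed chunks in parallel batches (10 calls in flight at once),
// each call retried independently via withBackoff.
async function embedAll(
  chunks: string[],
  embedChunk: (text: string) => Promise<number[]>,
  concurrency = 10,
): Promise<number[][]> {
  const results: number[][] = [];
  for (let i = 0; i < chunks.length; i += concurrency) {
    const batch = chunks.slice(i, i + concurrency);
    const vectors = await Promise.all(
      batch.map((c) => withBackoff(() => embedChunk(c))),
    );
    results.push(...vectors);
  }
  return results;
}
```

Batching caps concurrent API calls while the backoff keeps each individual call resilient, which is the throughput/rate-limit trade-off the feature describes.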
- Ingestion & Background Processing: When a document is uploaded, it is stored in S3/R2. A background worker (powered by BullMQ and Redis) is immediately triggered. This ensures the main web server remains responsive even while processing massive documents.
- Extraction & Chunking: The worker extracts text (using high-performance PDF/DOCX parsers and LLM-powered OCR if necessary) and breaks it into overlapping chunks based on your pipeline's configuration.
- Vectorization: Chunks are processed in parallel batches (10x concurrency) and sent to the Gemini Embedding API to generate high-dimensional vectors.
- Retrieval: When you ask a question, your query is embedded. The system performs a similarity search across all chunks in the active pipeline version using native database-side vector operations.
- Generation: The top context chunks are injected into a prompt for Gemma 4 / Gemini, which generates a grounded response with precise source citations.
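The extraction-and-chunking step can be sketched as below. This is a minimal illustration only; `chunkSize` and `overlap` mirror the per-version configuration mentioned above, and the default values here are assumptions, not RagForge's defaults.

```typescript
// Split text into overlapping chunks: each chunk starts
// (chunkSize - overlap) characters after the previous one, so
// neighbouring chunks share `overlap` characters of context.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```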
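The retrieval step's cosine-similarity ranking can be shown in-memory. Note that, as described above, RagForge runs this search with database-side vector operations; this sketch only illustrates the math, and the `Chunk` shape and `topK` name are assumptions for the example.

```typescript
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding, return the k best.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.c);
}
```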
- Bun runtime
- MySQL (or TiDB) database
- Redis (for background job processing)
- Google AI API Key (for Gemini/Gemma)
- Cloudflare R2 or AWS S3 credentials
- Clone the repository:

  ```bash
  git clone https://github.com/devshayan101/RAGForge.git
  cd RAGForge
  ```

- Install dependencies:

  ```bash
  bun install
  ```

- Configure environment: Create a `.env` file in the root directory (refer to `.env.example`).

- Run migrations:

  ```bash
  bun run db:push
  ```

- Start the development server:

  ```bash
  bun run dev
  ```
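A `.env` for this setup might look like the sketch below. Every variable name here is an assumption inferred from the prerequisites (MySQL/TiDB, Redis, Google AI, S3/R2); defer to `.env.example` for the authoritative keys.

```env
# Illustrative only — see .env.example for the real variable names.
DATABASE_URL="mysql://user:pass@localhost:3306/ragforge"  # MySQL / TiDB
REDIS_URL="redis://localhost:6379"                        # BullMQ job queue
GOOGLE_API_KEY="your-google-ai-key"                       # Gemini / Gemma
S3_ENDPOINT="https://<account>.r2.cloudflarestorage.com"  # R2 or AWS S3
S3_ACCESS_KEY_ID="..."
S3_SECRET_ACCESS_KEY="..."
S3_BUCKET="ragforge-documents"
```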
- Create a Project: Start by defining a project container (e.g., "Customer Support Knowledge Base").
- Define a Pipeline: Create a pipeline and its initial version (v1).
- Upload Documents: Drag and drop your PDFs or text files. Watch the real-time progress bar as they are indexed.
- Chat & Test: Head to the Chat tab to start querying your data. Verify the accuracy using the source citations.
- Go Programmatic: Generate an API key and use the `/api` endpoints to integrate RagForge into your own applications.
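A programmatic call might look like the sketch below. The endpoint path, payload shape, and auth header scheme are assumptions for illustration (check the actual `/api` routes of your deployment); only the pattern of sending the generated key with each request is what the step above describes.

```typescript
type FetchLike = typeof fetch;

// Hypothetical client: POST a question to a RagForge deployment,
// authenticating with the generated API key.
async function queryRagForge(
  apiKey: string,
  question: string,
  baseUrl = "https://your-ragforge-host",
  fetchImpl: FetchLike = fetch, // injectable for testing
): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/api/query`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ question }),
  });
  if (!res.ok) throw new Error(`RagForge request failed: ${res.status}`);
  return res.json();
}
```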
- Frontend: React 19, Tailwind CSS 4, wouter, Lucide Icons, shadcn/ui.
- Backend: Express.js, tRPC, Bun.
- Database: MySQL (via Drizzle ORM).
- Queueing: BullMQ & Redis.
- AI: Google Gemini Gemma 4 31B (LLM) & Gemini Embedding 2.
- Storage: AWS S3 / Cloudflare R2.
This project is licensed under the MIT License.