Quick Start · Key Features · Web UI · How it Works · FAQ
🔍 Agentic Search • 🧠 Knowledge Clustering • 📊 Monte Carlo Evidence Sampling
⚡ Indexless Retrieval • 🔄 Self-Evolving Knowledge Base • 💬 Real-time Chat
Intelligence pipelines built upon vector-based retrieval can be rigid and brittle. They rely on static vector embeddings that are expensive to compute, blind to real-time changes, and detached from the raw context. We introduce Sirchmunk to usher in a more agile paradigm, where data is no longer treated as a snapshot, and insights can evolve together with the data.
Sirchmunk works directly with raw data -- bypassing the heavy overhead of squeezing your rich files into fixed-dimensional vectors.
- Instant Search: No complex pre-processing pipelines or hours-long indexing; just drop your files and search immediately.
- Full Fidelity: Zero information loss; stay true to your data without vector approximation.
Data is a stream, not a snapshot. Sirchmunk is dynamic by design, whereas a vector DB can become obsolete the moment your data changes.
- Context-Aware: Evolves in real-time with your data context.
- LLM-Powered Autonomy: Designed for Agents that perceive data as it lives, utilizing token-efficient reasoning that triggers LLM inference only when necessary to maximize intelligence while minimizing cost.
Sirchmunk bridges massive local repositories and the web with high-scale throughput and real-time awareness.
It serves as a unified intelligent hub for AI agents, delivering deep insights across vast datasets at the speed of thought.
| Dimension | Traditional RAG | ✨ Sirchmunk |
|---|---|---|
| 💰 Setup Cost | High overhead (VectorDB, GraphDB, complex document parsers...) | ✅ Zero infrastructure: direct-to-data retrieval without vector silos |
| 🕒 Data Freshness | Stale (batch re-indexing) | ✅ Instant & dynamic: self-evolving index that reflects live changes |
| 📈 Scalability | Linear cost growth | ✅ Extremely low RAM/CPU consumption: native elastic support efficiently handles large-scale datasets |
| 🎯 Accuracy | Approximate vector matches | ✅ Deterministic & contextual: hybrid logic ensuring semantic precision |
| ⚙️ Workflow | Complex ETL pipelines | ✅ Drop-and-search: zero-config integration for rapid deployment |
- 🚀 Feb 5, 2026: Release v0.0.2 — MCP Support, CLI Commands & Knowledge Persistence!
  - MCP Integration: Full Model Context Protocol support; works seamlessly with Claude Desktop and Cursor IDE.
  - CLI Commands: New `sirchmunk` CLI with `init`, `serve`, `search`, `web`, and `mcp` commands.
  - KnowledgeCluster Persistence: DuckDB-powered storage with Parquet export for efficient knowledge management.
  - Knowledge Reuse: Semantic similarity-based cluster retrieval for faster searches via embedding vectors.
- 🎉🎉 Jan 22, 2026: Introducing Sirchmunk: Initial Release v0.0.1 Now Available!
- Python 3.10+
- LLM API Key (OpenAI-compatible endpoint, local or remote)
- Node.js 18+ (Optional, for web interface)
# Create virtual environment (recommended)
conda create -n sirchmunk python=3.13 -y && conda activate sirchmunk
pip install sirchmunk
# Or via UV:
uv pip install sirchmunk
# Alternatively, install from source:
git clone https://github.com/modelscope/sirchmunk.git && cd sirchmunk
pip install -e .

import asyncio

from sirchmunk import AgenticSearch
from sirchmunk.llm import OpenAIChat

llm = OpenAIChat(
    api_key="your-api-key",
    base_url="your-base-url",  # e.g., https://api.openai.com/v1
    model="your-model-name"    # e.g., gpt-4o
)

async def main():
    searcher = AgenticSearch(llm=llm)
    result: str = await searcher.search(
        query="How does transformer attention work?",
        paths=["/path/to/documents"],
    )
    print(result)

asyncio.run(main())

- Upon initialization, `AgenticSearch` automatically checks whether `ripgrep-all` and `ripgrep` are installed. If they are missing, it will attempt to install them automatically. If the automatic installation fails, please install them manually.
- Replace `"your-api-key"`, `"your-base-url"`, `"your-model-name"`, and `/path/to/documents` with your actual values.
Sirchmunk provides a powerful CLI for server management and search operations.
pip install "sirchmunk[web]"
# or install via UV
uv pip install "sirchmunk[web]"

# Initialize Sirchmunk with default settings (Default work path: `~/.sirchmunk/`)
sirchmunk init
# Alternatively, initialize with custom work path
sirchmunk init --work-path /path/to/workspace

# Start backend API server only
sirchmunk serve
# Custom host and port
sirchmunk serve --host 0.0.0.0 --port 8000

# Search in current directory
sirchmunk search "How does authentication work?"
# Search in specific paths
sirchmunk search "find all API endpoints" ./src ./docs
# Quick filename search
sirchmunk search "config" --mode FILENAME_ONLY
# Output as JSON
sirchmunk search "database schema" --output json
# Use API server (requires running server)
sirchmunk search "query" --api --api-url http://localhost:8584

| Command | Description |
|---|---|
| `sirchmunk init` | Initialize working directory, `.env`, and MCP config |
| `sirchmunk serve` | Start the backend API server |
| `sirchmunk search` | Perform search queries |
| `sirchmunk web init` | Build WebUI frontend (requires Node.js 18+) |
| `sirchmunk web serve` | Start API + WebUI (single port) |
| `sirchmunk web serve --dev` | Start API + Next.js dev server (hot-reload) |
| `sirchmunk mcp serve` | Start the MCP server (stdio/HTTP) |
| `sirchmunk mcp version` | Show MCP version information |
| `sirchmunk version` | Show version information |
Sirchmunk provides a Model Context Protocol (MCP) server that exposes its intelligent search capabilities as MCP tools. This enables seamless integration with AI assistants like Claude Desktop and Cursor IDE.
# Install with MCP support
pip install sirchmunk[mcp]
# Initialize (generates .env and mcp_config.json)
sirchmunk init
# Edit ~/.sirchmunk/.env with your LLM API key
# Test with MCP Inspector
npx @modelcontextprotocol/inspector sirchmunk mcp serve

After running `sirchmunk init`, a `~/.sirchmunk/mcp_config.json` file is generated. Copy it to your MCP client configuration directory.
Example:
{
"mcpServers": {
"sirchmunk": {
"command": "sirchmunk",
"args": ["mcp", "serve"],
"env": {
"SIRCHMUNK_SEARCH_PATHS": "/path/to/your_docs,/another/path"
}
}
}
}

| Parameter | Description |
|---|---|
| `command` | The command to start the MCP server. Use the full path (e.g. `/path/to/venv/bin/sirchmunk`) if running in a virtual environment. |
| `args` | Command arguments. `["mcp", "serve"]` starts the MCP server in stdio mode. |
| `env.SIRCHMUNK_SEARCH_PATHS` | Default document search directories (comma-separated). Supports both the English `,` and Chinese `，` as delimiters. When set, these paths are used as defaults if no `paths` parameter is provided during tool invocation. |
Tip: MCP Inspector is a great way to test the integration before connecting to your AI assistant. In MCP Inspector: Connect → Tools → List Tools → `sirchmunk_search` → Input parameters (`query` and `paths`, e.g. `["/path/to/your_docs"]`) → Run Tool.
- Multi-Mode Search: DEEP mode for comprehensive analysis, FILENAME_ONLY for fast file discovery
- Knowledge Cluster Management: Automatic extraction, storage, and reuse of knowledge
- Standard MCP Protocol: Works with stdio and Streamable HTTP transports
📖 For detailed documentation, see Sirchmunk MCP README.
The web UI is built for fast, transparent workflows: chat, knowledge analytics, and system monitoring in one place.
Build the frontend once, then serve everything from a single port — no Node.js needed at runtime.
# Build WebUI frontend (requires Node.js 18+ at build time)
sirchmunk web init
# Start server with embedded WebUI
sirchmunk web serve

Access: http://localhost:8584 (API + WebUI on the same port)
For frontend development with hot-reload:
# Start backend + Next.js dev server
sirchmunk web serve --dev

Access:
- Frontend (hot-reload): http://localhost:8585
- Backend APIs: http://localhost:8584/docs
# Start frontend and backend via script
python scripts/start_web.py
# Stop all services
python scripts/stop_web.py

Configuration:
- Access `Settings` → `Environment Variables` to configure the LLM API and other parameters.
| Component | Description |
|---|---|
| AgenticSearch | Search orchestrator with LLM-enhanced retrieval capabilities |
| KnowledgeBase | Transforms raw results into structured knowledge clusters with supporting evidence |
| EvidenceProcessor | Evidence processing based on Monte Carlo importance sampling (see the sketch below) |
| GrepRetriever | High-performance indexless file search with parallel processing |
| OpenAIChat | Unified LLM interface supporting streaming and usage tracking |
| MonitorTracker | Real-time system and application metrics collection |
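The EvidenceProcessor row above refers to Monte Carlo importance sampling over evidence. As a rough, self-contained illustration of that general idea (not Sirchmunk's actual implementation; the function names and scorers below are hypothetical), one can sample candidate snippets from a cheap proposal distribution and re-weight each draw by a more expensive relevance score:

```python
import random

def select_evidence(candidates, proposal_score, relevance_score, n_draws=200, top_k=3):
    """Pick top_k snippets by accumulated importance weight (illustrative only)."""
    # Cheap proposal distribution over all candidates (e.g. keyword overlap).
    props = [max(proposal_score(c), 1e-9) for c in candidates]
    total = sum(props)
    q = [p / total for p in props]
    weights = {c: 0.0 for c in candidates}
    cache = {}  # avoid re-running the expensive scorer on the same snippet
    for _ in range(n_draws):
        i = random.choices(range(len(candidates)), weights=q, k=1)[0]
        c = candidates[i]
        if c not in cache:
            cache[c] = relevance_score(c)
        # Importance weight: target relevance divided by proposal probability.
        weights[c] += cache[c] / q[i]
    return sorted(weights, key=weights.get, reverse=True)[:top_k]

# Toy usage with stand-in scorers.
snippets = [
    "attention weights are softmax(QK^T / sqrt(d_k))",
    "multi-head attention concatenates per-head outputs",
    "this section covers installation prerequisites",
]
query_terms = {"attention", "softmax", "heads"}

def cheap(s):
    return len(query_terms & set(s.split())) + 0.1   # stand-in for keyword overlap

def expensive(s):
    return 1.0 if "attention" in s else 0.05         # stand-in for an LLM judgment

print(select_evidence(snippets, cheap, expensive, top_k=2))
```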
All persistent data is stored in the configured SIRCHMUNK_WORK_PATH (default: ~/.sirchmunk/):
{SIRCHMUNK_WORK_PATH}/
├── .cache/
├── history/ # Chat session history (DuckDB)
│ └── chat_history.db
├── knowledge/ # Knowledge clusters (Parquet)
│ └── knowledge_clusters.parquet
└── settings/ # User settings (DuckDB)
└── settings.db
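If you want to peek inside these files, a minimal sketch using DuckDB (assuming `pip install duckdb`; the table schemas are not documented here, so this only lists what exists):

```python
import os
import duckdb

work_path = os.path.expanduser("~/.sirchmunk")  # or your configured SIRCHMUNK_WORK_PATH
for db in ("history/chat_history.db", "settings/settings.db"):
    con = duckdb.connect(os.path.join(work_path, db), read_only=True)
    print(db, "->", con.execute("SHOW TABLES").fetchall())
    con.close()
```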
When the server is running (sirchmunk serve or sirchmunk web serve), the Search API is accessible via any HTTP client.
API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/search` | Execute a search query |
| `GET` | `/api/v1/search/status` | Check server and LLM configuration status |
Interactive Docs: http://localhost:8584/docs (Swagger UI)
cURL Examples
# Basic search (DEEP mode)
curl -X POST http://localhost:8584/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "How does authentication work?",
"paths": ["/path/to/project"],
"mode": "DEEP"
}'
# Filename search (fast, no LLM required)
curl -X POST http://localhost:8584/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "config",
"paths": ["/path/to/project"],
"mode": "FILENAME_ONLY"
}'
# Full parameters
curl -X POST http://localhost:8584/api/v1/search \
-H "Content-Type: application/json" \
-d '{
"query": "database connection pooling",
"paths": ["/path/to/project/src"],
"mode": "DEEP",
"max_depth": 10,
"top_k_files": 20,
"keyword_levels": 3,
"include_patterns": ["*.py", "*.java"],
"exclude_patterns": ["*test*", "*__pycache__*"],
"return_cluster": true
}'
# Check server status
curl http://localhost:8584/api/v1/search/status

Python Client Examples
Using requests:
import requests
response = requests.post(
    "http://localhost:8584/api/v1/search",
    json={
        "query": "How does authentication work?",
        "paths": ["/path/to/project"],
        "mode": "DEEP"
    },
    timeout=300  # DEEP mode may take a while
)
data = response.json()
if data["success"]:
    print(data["data"]["result"])

Using httpx (async):
import httpx
import asyncio
async def search():
    async with httpx.AsyncClient(timeout=300) as client:
        resp = await client.post(
            "http://localhost:8584/api/v1/search",
            json={
                "query": "find all API endpoints",
                "paths": ["/path/to/project"],
                "mode": "DEEP"
            }
        )
        data = resp.json()
        print(data["data"]["result"])

asyncio.run(search())

JavaScript Client Example
const response = await fetch("http://localhost:8584/api/v1/search", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    query: "How does authentication work?",
    paths: ["/path/to/project"],
    mode: "DEEP"
  })
});
const data = await response.json();
if (data.success) {
  console.log(data.data.result);
}

Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `string` | required | Search query or question |
| `paths` | `string[]` | required | Directories or files to search (min 1) |
| `mode` | `string` | `"DEEP"` | `DEEP` or `FILENAME_ONLY` |
| `max_depth` | `int` | `null` | Maximum directory depth |
| `top_k_files` | `int` | `null` | Number of top files to return |
| `keyword_levels` | `int` | `null` | Keyword granularity levels |
| `include_patterns` | `string[]` | `null` | File glob patterns to include |
| `exclude_patterns` | `string[]` | `null` | File glob patterns to exclude |
| `return_cluster` | `bool` | `false` | Return the full KnowledgeCluster object |
Note: `FILENAME_ONLY` mode does not require an LLM API key. `DEEP` mode requires a configured LLM.
How is this different from traditional RAG systems?
Sirchmunk takes an indexless approach:
- No pre-indexing: Direct file search without vector database setup
- Self-evolving: Knowledge clusters evolve based on search patterns
- Multi-level retrieval: Adaptive keyword granularity for better recall
- Evidence-based: Monte Carlo sampling for precise content extraction
What LLM providers are supported?
Any OpenAI-compatible API endpoint, including (but not limited to):
- OpenAI (GPT-4, GPT-4o, GPT-3.5)
- Local models served via Ollama, llama.cpp, vLLM, SGLang etc.
- Claude via API proxy
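For the local-model case, a minimal sketch (assuming a local server that exposes an OpenAI-compatible endpoint, e.g. Ollama serves one at `http://localhost:11434/v1`; the model name is a placeholder for whatever you run locally):

```python
from sirchmunk.llm import OpenAIChat

# Point the same OpenAIChat interface from the Quick Start at a local server.
llm = OpenAIChat(
    api_key="not-needed-for-local",        # most local servers ignore the key
    base_url="http://localhost:11434/v1",  # e.g., Ollama's OpenAI-compatible endpoint
    model="your-local-model-name",         # placeholder: the model you have pulled locally
)
```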
How do I add documents to search?
Simply specify the path in your search query:
result = await searcher.search(
    query="Your question",
    paths=["/path/to/folder", "/path/to/file.pdf"]
)

No pre-processing or indexing required!
Where are knowledge clusters stored?
Knowledge clusters are persisted in Parquet format at:
{SIRCHMUNK_WORK_PATH}/.cache/knowledge/knowledge_clusters.parquet
You can query them using DuckDB or the KnowledgeManager API.
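For example, a minimal DuckDB query against the path given above (adjust it to your `SIRCHMUNK_WORK_PATH`; the column layout is not documented here, so this only prints the schema and row count):

```python
import os
import duckdb

# Path as stated in this FAQ entry; adjust if your layout differs.
parquet_path = os.path.expanduser(
    "~/.sirchmunk/.cache/knowledge/knowledge_clusters.parquet"
)
con = duckdb.connect()
print(con.execute(f"DESCRIBE SELECT * FROM read_parquet('{parquet_path}')").fetchall())
print(con.execute(f"SELECT count(*) FROM read_parquet('{parquet_path}')").fetchone())
```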
How do I monitor LLM token usage?
- Web Dashboard: Visit the Monitor page for real-time statistics
- API: `GET /api/v1/monitor/llm` returns usage metrics
- Code: Access `searcher.llm_usages` after search completion (see the sketch below)
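A minimal sketch of checking usage both ways (assuming the server from the CLI section is running on the default port; the exact shape of the metrics is not documented here, so it just prints them):

```python
import requests

# 1) Via the monitor endpoint of a running server
print(requests.get("http://localhost:8584/api/v1/monitor/llm", timeout=10).json())

# 2) In code, after a search completes (`searcher` from the Quick Start example)
# print(searcher.llm_usages)
```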
- Text-retrieval from raw files
- Knowledge structuring & persistence
- Real-time chat with RAG
- Web UI support
- Web search integration
- Multi-modal support (images, videos)
- Distributed search across nodes
- Knowledge visualization and deep analytics
- More file type support
We welcome contributions!
This project is licensed under the Apache License 2.0.
ModelScope · ⭐ Star us · 🐛 Report a bug · 💬 Discussions
✨ Sirchmunk: Raw data to self-evolving intelligence, real-time.



