A fully local, privacy-first multi-agent system built with LangGraph and Ollama. This project features an Orchestrator agent that autonomously evaluates prompts and routes queries either directly to a Summarizer or to a Web Search worker for live data extraction.
- Orchestrator Node (`gemma4-fast`): Evaluates the user's terminal input and decides whether to answer directly or to generate an optimized, keyword-rich query for live web searching.
- Web Search Worker Node: Uses `ddgs` (DuckDuckGo Search) to fetch the top 5 most relevant text snippets for current events and real-time facts, without running into the bot protections that often block direct page scraping.
- Summarizer Worker Node (`llama3.2:1b`): Ingests the user's query together with any search snippets and constructs a concise final response based on that context.
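The hand-off between the Web Search and Summarizer nodes amounts to turning up to five snippets into a single prompt. The sketch below shows one way to do that. `build_summary_prompt` and its prompt wording are hypothetical (the project's actual prompt is not shown here), and it assumes each search result is a dict with `title`, `href`, and `body` keys, the shape `ddgs` text results typically have.

```python
from typing import Dict, List


def build_summary_prompt(
    question: str, results: List[Dict[str, str]], max_snippets: int = 5
) -> str:
    """Hypothetical helper: combine the question with up to `max_snippets` snippets."""
    if not results:
        # Direct-answer path: the Orchestrator skipped web search.
        return f"Answer the question concisely.\n\nQuestion: {question}"

    lines = []
    for i, result in enumerate(results[:max_snippets], start=1):
        # Assumes ddgs-style result dicts with 'title', 'href', and 'body' keys.
        title = result.get("title", "").strip()
        body = result.get("body", "").strip()
        href = result.get("href", "")
        lines.append(f"[{i}] {title}: {body} ({href})")

    context = "\n".join(lines)
    return (
        "Using only the snippets below, answer the question concisely.\n\n"
        f"Snippets:\n{context}\n\nQuestion: {question}"
    )
```

Numbering the snippets and keeping each URL lets a small model such as `llama3.2:1b` refer back to its sources. Capping the list at five keeps the prompt short.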
- Ensure Ollama is installed and serving models locally.
- Ensure you have the necessary models installed.
To download the fast summarization model, run the pull command in your terminal:

```bash
ollama pull llama3.2:1b
```

To speed up routing decisions, we build a customized `gemma4-fast` model whose system prompt instructs the base model to skip its token-heavy "reasoning/thinking" output.
- Ensure you have pulled the base model first:
  ```bash
  ollama pull gemma4:e2b
  ```
- Create a file named `Modelfile_GemmaFast` in your project directory with the following exact content:
  ```
  FROM gemma4:e2b
  SYSTEM """You are an expert, direct assistant. You must provide the final answer immediately. DO NOT use <think> tags, DO NOT output internal reasoning, and DO NOT brainstorm before answering. Output only the final response."""
  ```
- Build the fast router model by running this command:
  ```bash
  ollama create gemma4-fast -f Modelfile_GemmaFast
  ```
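A `SYSTEM` prompt only asks the model to skip its reasoning. It does not strictly prevent it. As a defensive fallback, the router's raw output can be cleaned before it is parsed. The sketch below is hypothetical (it is not part of the project as described) and removes any closed `<think>...</think>` block that slips through.

```python
import re

# Matches a complete <think>...</think> block, non-greedily, across newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think(text: str) -> str:
    """Hypothetical helper: drop reasoning blocks and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

For example, `strip_think("<think>hmm</think>\nSEARCH: UEFA scores 2026")` leaves only the `SEARCH:` line for the routing logic to inspect.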
- `langgraph` & `langchain-core`: Foundation for building the multi-agent state machine.
- `langchain-ollama`: Interface for communicating with the local Ollama models.
- `ddgs` (DuckDuckGo Search): Internet search library used to fetch short text snippets quickly without tripping anti-bot protections.
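Before you run the app, you can confirm that these packages are importable with a small standard-library check. This is a sketch rather than part of the project. It assumes the import names `langgraph`, `langchain_core`, `langchain_ollama`, and `ddgs`: pip package names use hyphens, while import names use underscores.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the dependencies listed above.
REQUIRED_MODULES = ("langgraph", "langchain_core", "langchain_ollama", "ddgs")


def check_dependencies(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {module_name: is_installed} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:18} {'ok' if ok else 'MISSING'}")
```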
- `AgentState`: A `TypedDict` that tracks data as it flows between agents (`input_text`, `search_query`, etc.).
- `orchestrator(state)`: The routing brain. Uses `gemma4-fast` to decide whether the query requires live web facts and, if so, translates the natural-language question into an optimized keyword query (e.g., `"SEARCH: UEFA scores 2026"`).
- `web_search(state)`: The search worker. Queries `ddgs`, extracts the top 5 search-engine text snippets, and injects them into the state.
- `route_decision(state)`: LangGraph conditional-edge function that maps the Orchestrator's decision string to the correct graph node.
- `summarize(state)`: The synthesizer. Feeds your query (along with any extracted search snippets) into `llama3.2:1b` to compose a concise final answer.
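The state and routing pieces described above can be sketched in plain Python. `input_text`, `search_query`, and the `SEARCH:` prefix come from this README. The `search_results` and `final_answer` fields, the `parse_orchestrator_output` helper, and the node names `"web_search"` / `"summarize"` are assumptions made for illustration.

```python
from typing import List, Literal, Optional, TypedDict


class AgentState(TypedDict, total=False):
    input_text: str              # raw user input
    search_query: Optional[str]  # keyword query, or None to answer directly
    search_results: List[str]    # hypothetical: snippets added by web_search
    final_answer: str            # hypothetical: output of summarize


SEARCH_PREFIX = "SEARCH:"


def parse_orchestrator_output(raw: str) -> Optional[str]:
    """Hypothetical helper: return the keyword query if a search was requested."""
    text = raw.strip()
    if text.upper().startswith(SEARCH_PREFIX):
        query = text[len(SEARCH_PREFIX):].strip()
        return query or None  # an empty "SEARCH:" counts as no search
    return None


def route_decision(state: AgentState) -> Literal["web_search", "summarize"]:
    """Conditional-edge logic: choose the next node name from the state."""
    return "web_search" if state.get("search_query") else "summarize"
```

In a LangGraph graph, a function like `route_decision` would typically be passed to `StateGraph.add_conditional_edges` on the orchestrator node. Because the function only reads the state, you can unit-test it without a running model.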
Install the required Python packages (e.g., LangGraph, LangChain Ollama, and `ddgs`):

```bash
pip install -r requirements.txt
```

Run the application locally via the terminal:

```bash
python main.py
```

The app will prompt you for input. Try asking for live data such as "Current US inflation rate 2026" (which triggers the Web Search node) versus general-knowledge questions (which bypass the search entirely).
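To see which path each kind of question takes without Ollama or network access, the flow can be traced with stand-in functions. Everything below is hypothetical. `fake_router` uses a crude keyword heuristic in place of `gemma4-fast`'s actual decision, and the `"ANSWER"` token is invented. The sketch shows only the orchestrator → (web_search) → summarize control flow.

```python
from typing import Callable, Dict, List, Optional


def fake_router(question: str) -> str:
    # Keyword heuristic standing in for the LLM's routing decision (assumption).
    live_markers = ("current", "latest", "today", "2026")
    if any(marker in question.lower() for marker in live_markers):
        return f"SEARCH: {question}"
    return "ANSWER"  # hypothetical "answer directly" token


def fake_search(query: str) -> List[str]:
    return [f"snippet about {query}"]


def fake_summarize(question: str, snippets: List[str]) -> str:
    source = "web snippets" if snippets else "model knowledge"
    return f"answer to {question!r} from {source}"


def run_pipeline(
    question: str,
    router: Callable[[str], str] = fake_router,
    search: Callable[[str], List[str]] = fake_search,
    summarize: Callable[[str, List[str]], str] = fake_summarize,
) -> Dict[str, object]:
    """Trace one question through orchestrator -> (web_search) -> summarize."""
    visited = ["orchestrator"]
    decision = router(question)
    query: Optional[str] = None
    if decision.startswith("SEARCH:"):
        query = decision[len("SEARCH:"):].strip() or None
    snippets: List[str] = []
    if query:
        snippets = search(query)
        visited.append("web_search")
    answer = summarize(question, snippets)
    visited.append("summarize")
    return {
        "input_text": question,
        "search_query": query,
        "search_results": snippets,
        "final_answer": answer,
        "visited": visited,
    }


if __name__ == "__main__":
    for q in ("Current US inflation rate 2026", "What is a prime number?"):
        print(q, "->", run_pipeline(q)["visited"])
```

With these stand-ins, the inflation question visits `orchestrator`, `web_search`, and `summarize`, while the general-knowledge question goes straight from `orchestrator` to `summarize`.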