tinyMem gives small and medium language models (7B–13B) reliable long-term memory in complex codebases. It sits between you and the LLM, injecting verified context and capturing validated facts, all locally, without model retraining or cloud dependencies.
- Purpose
- Key Features
- Quick Start
- Installation
- Usage
- The Ralph Loop
- Integration
- Architecture
- Token Economics
- Configuration
- Development
- Contributing
- License
If you've ever used an AI for a large project, you know it eventually starts to "forget." It forgets which database you chose, it forgets the naming conventions you agreed on, and it starts making things up (hallucinating).
tinyMem is a "Hard Drive for your AI's Brain."
Instead of the AI trying to remember everything in its limited "short-term memory" (the chat window), tinyMem saves important facts and decisions to a local database on your computer. When the AI needs to answer a question or write code, tinyMem "reminds" it of the relevant facts.
- No more repeating yourself: "Remember, we use Go for the backend."
- No more AI hallucinations: If the AI isn't sure, it checks its memory.
- Total Privacy: Your project data never leaves your machine to "train" a model.
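For a concrete feel of what this means day to day, here is the shape of the workflow using the CLI commands documented below (the summary text is a made-up placeholder):

```bash
# Tell tinyMem something once...
tinymem write --type decision --summary "Backend is written in Go"

# ...and any later session (yours or the AI's) can recall it instead of asking again.
tinymem query "backend language"
```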
- Evidence-Based Truth: Typed memories (`fact`, `claim`, `decision`, etc.). Only verified claims become facts.
- Chain-of-Verification (CoVe): Optional LLM-based quality filter to reduce hallucinations before storage.
- Local & Private: Runs as a single binary. Data lives in `.tinyMem/`.
- Zero Configuration: Works out of the box.
- Dual Mode: Works as an HTTP Proxy or Model Context Protocol (MCP) server.
- Hybrid Search: FTS (lexical) + Optional Semantic Search.
- Recall Tiers: Prioritizes `Always` (facts) > `Contextual` (decisions) > `Opportunistic` (notes).
Get up and running in seconds.
Go to your project root and initialize the memory database:

```bash
cd /path/to/your/project
tinymem health
```

Start the server (choose one mode):

Option A: Proxy Mode (for generic LLM clients)

```bash
tinymem proxy
# Then point your client (e.g., OpenAI SDK) to http://localhost:8080/v1
```

Option B: MCP Mode (for Claude Desktop, Cursor, VS Code)

```bash
tinymem mcp
# Configure your IDE to run this command
```

See the Quick Start Guide for Beginners for a detailed walkthrough.
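As a quick smoke test for Proxy Mode, you can send an ordinary OpenAI-style request through tinyMem; the model name and prompt below are placeholders for whatever your upstream backend actually serves:

```bash
# Placeholder model and prompt; substitute your own backend's values.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder-7b", "messages": [{"role": "user", "content": "Which database does this project use?"}]}'
```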
Download from the Releases Page.
macOS / Linux:

```bash
curl -L "https://github.com/andrzejmarczewski/tinyMem/releases/latest/download/tinymem-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o tinymem
chmod +x tinymem
sudo mv tinymem /usr/local/bin/
```

Windows:

Download `tinymem-windows-amd64.exe`, rename it to `tinymem.exe`, and add it to your system PATH.
Requires Go 1.25.6+.
```bash
git clone https://github.com/andrzejmarczewski/tinyMem.git
cd tinyMem
./build/build.sh   # macOS/Linux
# or
.\build\build.bat  # Windows
```

The tinyMem CLI is your primary way to interact with the system from your terminal.
| Command | What it is | Why use it? | Example |
|---|---|---|---|
| `health` | System Check | To make sure tinyMem is installed correctly and can talk to its database. | `tinymem health` |
| `stats` | Memory Overview | To see how many memories you've stored and how your tasks are progressing. | `tinymem stats` |
| `dashboard` | Visual Status | To get a quick, beautiful summary of your project's memory "health." | `tinymem dashboard` |
| `query` | Search | To find specific information you or the AI saved previously. | `tinymem query "API"` |
| `recent` | Recent History | To see the last few things tinyMem learned or recorded. | `tinymem recent` |
| `write` | Manual Note | To tell the AI something important that it should never forget. | `tinymem write --type decision --summary "Use Go 1.25"` |
| `run` | Command Wrapper | To run a script or tool (like `make` or `npm test`) while "reminding" it of project context. | `tinymem run make build` |
| `proxy` / `mcp` | Server Modes | To start the "brain" that connects tinyMem to your IDE or AI client. | `tinymem mcp` |
| `doctor` | Diagnostics | To fix the system if it stops working or has configuration issues. | `tinymem doctor` |
| `addContract` | Agent Setup | To automatically configure your AI agents to use tinyMem properly. | `tinymem addContract` |
Think of writing memories as "tagging" reality for the AI.
```bash
# Record a decision so the AI doesn't suggest an alternative later
tinymem write --type decision --summary "Switching to REST" --detail "GraphQL was too complex for this scale."

# Add a simple note for yourself or the AI
tinymem write --type note --summary "The database password is in the vault, not .env"
```

| Type | Evidence Required? | Truth State | Recall Tier |
|---|---|---|---|
| Fact | Yes | Verified | Always |
| Decision | Yes (Confirmation) | Asserted | Contextual |
| Constraint | Yes | Asserted | Always |
| Claim | No | Tentative | Contextual |
| Plan | No | Tentative | Opportunistic |

Evidence types supported: `file_exists`, `grep_hit`, `cmd_exit0`, `test_pass`.
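For example, an unverified observation belongs in a `claim` (no evidence required, per the table above); it stays Tentative until it is verified and promoted to a fact:

```bash
# Claims carry no evidence and remain "Tentative" until verified.
tinymem write --type claim --summary "The CI pipeline caches Go modules"
```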
The Ralph Loop (memory_ralph) is a deterministic governor for autonomous codebase repair. It is not automatic; the AI must explicitly choose to "engage" it when it detects a complex failure that requires iterative fixing. Once triggered, tinyMem takes control, iterating until evidence passes or limits are reached.
| Phase | Action | Purpose |
|---|---|---|
| Execute | Run Command | Executes the target verification (e.g., go test). |
| Evidence | Validate | Checks predicates (test_pass, file_exists). |
| Recall | Search | Retrieves failure patterns from long-term memory. |
| Repair | Apply Fix | tinyMem's internal LLM applies code changes. |
- Evidence is King: Only successful evidence checks can terminate the loop.
- Safety First: Supports path blacklisting (`forbid_paths`) and command blocking.
- Durable Memory: The loop results are stored even if the agent is reset.
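Conceptually, a pass through the loop looks like the sketch below. This is an illustration of the four phases only, not tinyMem's actual governor, and the commands are examples:

```bash
# Illustrative sketch of the Ralph loop phases; not the real implementation.
for attempt in 1 2 3; do
  go test ./... && break          # Execute + Evidence: only passing evidence ends the loop
  tinymem query "test failure"    # Recall: pull known failure patterns from memory
  # Repair: tinyMem's internal LLM would apply a fix here, then the loop re-runs
done
```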
Intercepts standard OpenAI-compatible requests.
```bash
export OPENAI_API_BASE_URL=http://localhost:8080/v1
# Your existing scripts now use tinyMem automatically
```

While proxying, tinyMem reports recall activity back to the client so that downstream UIs or agents can show "memory checked" indicators:

- Streaming responses append an SSE event of type `tinymem.memory_status` once the upstream LLM finishes. The payload includes `recall_count`, `recall_status` (none/injected/failed), and a timestamp.
- Non-streaming responses carry the same data via two headers: `X-TinyMem-Recall-Status` and `X-TinyMem-Recall-Count`. Agents or dashboards that read those fields can show whether recall was applied or the proxy skipped it.
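To see the non-streaming indicators for yourself, print the response headers; the request body is a placeholder and only the `X-TinyMem-Recall-*` headers matter here:

```bash
# -i prints headers, including X-TinyMem-Recall-Status and X-TinyMem-Recall-Count.
# Model name and prompt are placeholders for whatever your backend serves.
curl -i http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder-7b", "messages": [{"role": "user", "content": "ping"}], "stream": false}'
```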
Compatible with Claude Desktop, Cursor, and other MCP clients.
Claude Desktop Configuration (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "tinymem": {
      "command": "/absolute/path/to/tinymem",
      "args": ["mcp"]
    }
  }
}
```

Run `./verify_mcp.sh` to validate your setup.
When tinyMem is running in MCP mode, your AI agent (like Claude or Gemini) gains these "superpowers":
- `memory_query`: Search the past. The AI uses this to find facts, decisions, or notes related to its current task.
- `memory_recent`: Get up to speed. The AI uses this when it first starts, to see what has happened recently in the project.
- `memory_write`: Learn something new. The AI uses this to save a new fact or decision it just discovered or made. Facts require "Evidence" (like checking if a file exists).
- `memory_ralph`: Self-Repair. This is the "Nuclear Option." The AI uses this to try to fix a bug autonomously by running tests, reading errors, and retrying until it works.
- `memory_stats` & `memory_health`: System Check. The AI uses these to check whether its memory is working correctly and how much it has learned.
- `memory_doctor`: Self-Diagnosis. If the AI feels "confused" or senses memory issues, it can run this to identify problems.
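These tools roughly mirror the CLI commands in the table above, so you can preview what the agent sees by running the equivalents by hand (the one-to-one mapping is an assumption for illustration, not a guarantee):

```bash
tinymem query "API"   # ~ memory_query
tinymem recent        # ~ memory_recent
tinymem stats         # ~ memory_stats
tinymem doctor        # ~ memory_doctor
```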
CRITICAL: If you are building an AI agent, you MUST include the appropriate directive in its system prompt to ensure it uses tinyMem correctly.
Quick Setup: Run `tinymem addContract` to automatically create these files in your project.

- Claude: `docs/agents/CLAUDE.md`
- Gemini: `docs/agents/GEMINI.md`
- Qwen: `docs/agents/QWEN.md`
- Other: `docs/agents/AGENT_CONTRACT.md`
```mermaid
flowchart TD
    User[LLM Client / IDE] <-->|Request/Response| Proxy[TinyMem Proxy / MCP]

    subgraph "1. Recall Phase"
        Proxy --> Recall[Recall Engine]
        Recall -->|FTS + Semantic| DB[(SQLite)]
        Recall -->|Filter| Tiers{Recall Tiers}
        Tiers -->|Always/Contextual| Context[Context Injection]
    end

    subgraph "2. Extraction Phase"
        LLM[LLM Backend] -->|Stream| Proxy
        Proxy --> Extractor[Extractor]
        Extractor -->|Parse| CoVe{CoVe Filter}
        CoVe -->|High Conf| Evidence{Evidence Check}
        Evidence -->|Verified| DB
    end

    Context --> LLM
```
```
.
├── .tinyMem/     # Project-scoped storage (DB, logs, config)
├── assets/       # Logos and icons
├── build/        # Build scripts
├── cmd/          # Application entry points
├── docs/         # Documentation & Agent Contracts
├── internal/     # Core logic (Memory, Evidence, Recall)
└── README.md     # This file
```
tinyMem uses more tokens per minute but significantly fewer tokens per task compared to standard agents.
| Feature | Token Impact | Why? |
|---|---|---|
| Recall Engine | Saves | Replaces "Read All Files" with targeted context snippets. |
| Context Reset | Saves | Prevents chat history from snowballing by starting iterations fresh. |
| Truth Discipline | Saves | Stops expensive "hallucination rabbit holes" before they start. |
| Ralph Loop | Uses | Requires multiple internal completions to reach autonomous success. |
The Verdict: tinyMem acts as a "Sniper Rifle" for context. By ensuring the few tokens sent are the correct ones, it avoids the massive waste of re-reading files and un-breaking hallucinated code.
Zero-config by default. Override in .tinyMem/config.toml:
```toml
[recall]
max_items = 10
semantic_enabled = false   # Set true if you have an embedding model

[cove]
enabled = true             # Chain-of-Verification
confidence_threshold = 0.6
```

See Configuration Docs for details.
```bash
# Run tests
go test ./...

# Build
./build/build.sh
```

See Task Management for how we track work.
We value truth and reliability.
- Truth Discipline: No shortcuts on verification.
- Streaming: No buffering allowed.
- Tests: Must pass `go test ./...`.
See CONTRIBUTING.md.
MIT © 2026 Andrzej Marczewski
