# Open Data Agent

A local CLI tool (`oda`) that lets developers, data analysts, data scientists, and AI agents query databases safely — with auto-generated schema context, read-only enforcement, and self-healing diagnostics.

---

## What is Open Data Agent?

Open Data Agent (`oda`) is a local command-line tool that bridges the gap between natural language questions and SQL databases. It is designed to be used by:

- **Data analysts and data scientists** doing exploratory data analysis (EDA) — query any connected database without memorising schema details
- **Developers** running ad-hoc SQL safely, with automatic LIMIT injection and query history
- **AI coding agents** (primarily [OpenCode](https://opencode.ai)) — `oda` acts as a structured, read-only database interface that an agent can call as a tool, using auto-generated schema documentation as context

Key properties:

- **Read-only by default** — write operations are hard-blocked with no bypass mechanism
- **Schema docs as context** — human- and agent-readable markdown catalog generated from your live schema
- **Self-healing diagnostics** — zero-row results and errors emit structured diagnostic context (row counts, sample values, NULL counts) so an AI agent can self-correct and retry
- **Local-first** — no server, no network service, all data on disk
- **Multi-dialect** — supports PostgreSQL, MySQL, and SQLite

---

## How It Works

### End-to-End Flow

```
Question (natural language or SQL)
├─ [AI agent] reads .opencode/rules/data-agent.md   ← generated by `oda connect`
├─ [AI agent] reads docs/data-catalog/_index.md     ← generated by `oda docs generate`
├─ [AI agent] navigates to relevant table docs
├─ [AI agent] checks memory/ for known data quirks
└─ oda query "SELECT ..."
   ├─ SafetyChecker    read-only whitelist, dangerous pattern detection
   ├─ LIMIT injection  auto-appends or clamps LIMIT (default 1000, max 10000)
   ├─ Timeout          server-side (PostgreSQL/MySQL) or thread-based (SQLite)
   ├─ Execution        results printed to stdout
   ├─ History          log appended to ~/.config/open-data-agent/history.jsonl
   └─ (zero rows or error)
      └─ DiagnosticEngine → structured context to stderr → agent retries
```

### Core Components

**1. Schema Catalog**

`oda docs generate` introspects your connected database and produces a hierarchical markdown catalog under `docs/data-catalog/`. Each table gets its own file with columns, types, nullability, sample rows, and (optionally) column statistics. This catalog is the primary context source — neither you nor the agent needs to run live introspection queries.

**2. Memory Store**

`memory/` contains curated markdown files (with YAML frontmatter) that capture data quirks, business logic, known anomalies, and tribal knowledge about your data. The agent checks memory before constructing a query.
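
For example, a memory entry could look like this (the frontmatter fields shown here are illustrative, not a documented schema):

```markdown
---
title: Revenue column
category: data_quality
tables: [orders]
---

Use `net_item_price`, not `item_price`: `item_price` includes tax.
```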

**3. Query Engine**

Every query passes through a safety pipeline:
- **Whitelist check** (primary guard) — only explicitly allowed prefixes pass: `SELECT`, `WITH`, `EXPLAIN`, and dialect-specific commands (`PRAGMA` for SQLite; `SHOW` for PostgreSQL and MySQL; `TABLE` for MySQL). Any statement whose opening keyword is not on the list is blocked outright.
- **Blacklist check** (secondary guard) — catches injection patterns inside otherwise-valid SQL: `INSERT`, `UPDATE`, `DELETE`, `DROP`, `CREATE`, `ALTER`, `TRUNCATE`, `REPLACE`, `MERGE`, `GRANT`, `REVOKE`, `CALL`, `EXEC`, `EXECUTE`, and dialect-specific patterns such as `COPY`, `LOAD DATA`, `ATTACH DATABASE`
- **LIMIT injection** — if no LIMIT is present, one is appended automatically; if the limit exceeds the configured maximum, it is clamped
- **Timeout** — queries are killed after a configurable timeout (default 30 seconds)
- **History** — every query is appended to a local JSONL log with timing and metadata
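
A minimal sketch of what the first three stages could look like (hypothetical helper, not the actual `SafetyChecker` implementation; the real tool likely uses proper SQL parsing rather than regexes):

```python
import re

# Illustrative subsets of the whitelist and blacklist described above
ALLOWED_PREFIXES = ("SELECT", "WITH", "EXPLAIN", "PRAGMA", "SHOW", "TABLE")
BLOCKED_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|REPLACE|MERGE|"
    r"GRANT|REVOKE|CALL|EXEC|EXECUTE)\b",
    re.IGNORECASE,
)

def check_and_limit(sql: str, default_limit: int = 1000, max_limit: int = 10000) -> str:
    """Whitelist check, then blacklist scan, then LIMIT injection/clamping."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith(ALLOWED_PREFIXES):
        raise ValueError("blocked: statement type not on read-only whitelist")
    if BLOCKED_KEYWORDS.search(stripped):
        raise ValueError("blocked: write/DDL keyword detected")
    match = re.search(r"\bLIMIT\s+(\d+)\s*$", stripped, re.IGNORECASE)
    if match is None:
        # No LIMIT present: append the default
        return f"{stripped} LIMIT {default_limit}"
    if int(match.group(1)) > max_limit:
        # LIMIT present but too large: clamp to the ceiling
        return re.sub(r"\bLIMIT\s+\d+\s*$", f"LIMIT {max_limit}", stripped,
                      flags=re.IGNORECASE)
    return stripped
```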

**4. Diagnostics**

When a query returns zero rows or errors, `DiagnosticEngine` automatically emits structured context to `stderr`: table row counts, sample column values, and NULL counts for relevant columns. This gives an AI agent (or a developer) enough signal to self-correct and retry without running additional queries manually.
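
The kind of text such an engine might write to stderr can be sketched as follows (the function name and message format are illustrative, and a real engine would gather these values by querying the database):

```python
def format_diagnostics(table: str, row_count: int,
                       samples: dict[str, list], null_counts: dict[str, int]) -> str:
    """Render zero-row diagnostic context as human- and agent-readable text."""
    lines = [
        f"Diagnostic context for '{table}' (query returned zero rows):",
        f"  Table row count: {row_count}",
    ]
    for col, values in samples.items():
        lines.append(f"  Column '{col}' sample values: {values}")
    for col, nulls in null_counts.items():
        lines.append(f"  Column '{col}' NULL count: {nulls}")
    return "\n".join(lines)
```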

### Using with OpenCode

`oda` is designed as a first-class tool for the [OpenCode](https://opencode.ai) AI coding agent. When you run `oda connect <name>`, it renders a rules file at `.opencode/rules/data-agent.md` that tells OpenCode:

- Which database is active and how to explore it
- How to navigate the schema catalog
- How to call `oda query`, `oda memory`, and `oda docs` commands
- How to interpret diagnostic output and self-heal on failures

This means **no extra LLM setup is required for natural language queries** — OpenCode reads your schema docs, understands your data, constructs the SQL, and calls `oda query` as a tool. The full NL→SQL→results loop works out of the box.

See [Asking Questions in Natural Language with OpenCode](#asking-questions-in-natural-language-with-opencode) for a step-by-step walkthrough.

---

## How to Use It

### Prerequisites

- Python 3.12+
- [`uv`](https://docs.astral.sh/uv/) — install with `curl -LsSf https://astral.sh/uv/install.sh | sh`

### Install

```bash
git clone <repository-url>
cd open-data-agent
uv sync
```

### Initialise

```bash
uv run oda init
```

Creates `~/.config/open-data-agent/` with default config files.

### Add a Connection

```bash
uv run oda connections add
```

Prompts for connection name, database type (postgresql / mysql / sqlite), host, port, database name, username, and password. Passwords are stored in the OS keychain where available (macOS Keychain, GNOME Keyring, Windows Credential Manager). On headless or CI environments without a keychain backend, passwords fall back to plaintext in `~/.config/open-data-agent/connections.yaml` with a warning.

### Activate a Connection

```bash
uv run oda connect <name>
```

Sets the active connection and renders `.opencode/rules/data-agent.md` for OpenCode.

> **Note:** `oda connect` writes the rules file relative to the current working directory. Always run this command from your project root, otherwise the file will be created in the wrong location.

### Generate Schema Docs

```bash
uv run oda docs generate

# Include column statistics (null counts, distinct counts, min/max):
uv run oda docs generate --enrich
```

### Run a Query

```bash
uv run oda query "SELECT * FROM orders LIMIT 10"

# Output as JSON or CSV:
uv run oda query "SELECT * FROM orders" --format json
uv run oda query "SELECT * FROM orders" --format csv
```

### Command Reference

| Command | Description |
|---|---|
| `oda init` | First-run setup: create config directories and default files |
| `oda connect <name>` | Activate a connection; render OpenCode rules file |
| `oda connections list` | List all configured connections |
| `oda connections add` | Add a new connection (interactive) |
| `oda connections remove <name>` | Remove a connection |
| `oda connections test <name>` | Test live connectivity |
| `oda schemas` | List schemas in the active database |
| `oda tables [<schema>]` | List tables (optional schema filter) |
| `oda describe <table>` | Show columns and types |
| `oda sample <table> [--n N]` | Show N sample rows (default 5) |
| `oda profile <table>` | Column statistics: null count, distinct, min, max |
| `oda query "<sql>"` | Execute SQL; auto-logged to history |
| `oda docs generate [--enrich]` | Generate schema documentation catalog |
| `oda docs status` | Show freshness report for schema docs |
| `oda memory list` | List memory entries |
| `oda memory add` | Add a memory entry |
| `oda memory search <term>` | Search memory entries |
| `oda history list [--n N]` | Show most recent N query history entries |
| `oda history search <term>` | Search query history |
| `oda history stats` | Query history statistics |
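
Each executed query becomes one JSON line in `~/.config/open-data-agent/history.jsonl`. A record could plausibly look like this (the field names are illustrative, not `oda`'s actual schema):

```python
import json
from datetime import datetime, timezone

def history_record(connection: str, sql: str, rows: int, duration_ms: int) -> str:
    """Build one JSONL history line with timing and metadata."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "connection": connection,
        "sql": sql,
        "rows_returned": rows,
        "duration_ms": duration_ms,
    }
    # json.dumps emits no newlines, so each record stays on a single line
    return json.dumps(entry)
```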

### Configuration

Global config lives at `~/.config/open-data-agent/config.yaml`:

```yaml
row_limit: 1000 # default LIMIT auto-injected if absent
max_row_limit: 10000 # hard ceiling; never exceeded
query_timeout_seconds: 30 # query execution timeout
docs_staleness_days: 7 # warn if schema docs older than this
log_level: INFO
strict_mode: false # if true: block queries when docs are stale
# equivalent to passing --strict on every oda query call
```
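
A freshness check driven by `docs_staleness_days` could be as simple as comparing the catalog index's modification time against the threshold (a sketch under the assumption that the catalog root is `docs/data-catalog/_index.md`; the real `oda docs status` may track freshness differently):

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

def docs_are_stale(catalog_dir: str, staleness_days: int = 7) -> bool:
    """Treat missing docs as stale; otherwise compare _index.md mtime to the threshold."""
    index = Path(catalog_dir) / "_index.md"
    if not index.exists():
        return True
    modified = datetime.fromtimestamp(index.stat().st_mtime, tz=timezone.utc)
    return datetime.now(timezone.utc) - modified > timedelta(days=staleness_days)
```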

### Development

```bash
# Unit tests (no external database required)
uv run pytest tests/unit/ -q

# Lint and format
uv run ruff check .
uv run ruff format --check .

# Type check
uv run mypy src/open_data_agent

# Integration tests (requires Docker)
docker compose up -d
uv run pytest -m integration
docker compose down
```

---

## Asking Questions in Natural Language with OpenCode

[OpenCode](https://opencode.ai) is an AI coding agent that runs in your terminal. When paired with `oda`, it acts as a natural language interface to your database — you ask a question in plain English, OpenCode reads your schema docs, writes the SQL, and calls `oda query` to execute it. No extra LLM configuration or API keys are needed beyond your OpenCode setup.

### Prerequisites

- OpenCode installed and configured (`opencode` available in your PATH)
- `oda` set up with a connection and schema docs generated (see [How to Use It](#how-to-use-it))

### Step 1 — Connect and generate schema docs

```bash
# Run from your project root — oda connect writes .opencode/rules/data-agent.md
# relative to the current working directory
uv run oda connect my-db

# Generate the schema catalog OpenCode will use as context
uv run oda docs generate
```

After this, `docs/data-catalog/` contains a markdown file for every table in your database. OpenCode reads these files to understand your schema without running any live introspection queries.

### Step 2 — Open OpenCode in your project

```bash
opencode
```

OpenCode automatically loads `.opencode/rules/data-agent.md` on startup. This file tells it which database is active, how to navigate the schema catalog, and how to use `oda` commands.

### Step 3 — Ask a question in natural language

Type your question directly in the OpenCode chat. Examples:

```
How many orders were placed last month, broken down by status?
```

```
Which customers have spent more than $10,000 in total?
```

```
Show me the top 10 products by revenue in Q1 2025.
```

OpenCode will:

1. Read the relevant table docs from `docs/data-catalog/`
2. Check `memory/` for any known data quirks affecting those tables
3. Construct a safe, read-only SQL query
4. Call `oda query "..."` to execute it
5. Present the results — and if zero rows are returned, use the diagnostic output to self-correct and retry

### Step 4 — Refine with follow-up questions

You can ask follow-up questions in the same session:

```
Filter that to just the 'enterprise' customer segment.
```

```
Now group by region instead of status.
```

```
Export that as CSV.
```

OpenCode maintains conversation context, so follow-ups build on the previous query.

### Step 5 — Save data knowledge to memory

When you or OpenCode discovers something important about your data (a quirky column, a misleading field name, a known data quality issue), save it so future sessions benefit:

```bash
uv run oda memory add --title "Revenue column" \
  --category data_quality \
  --content "Use net_item_price not item_price — item_price includes tax"
```

Or ask OpenCode to do it:

```
Remember that the revenue column to use is net_item_price, not item_price.
```

### How OpenCode self-heals

If a query returns zero rows, `oda query` automatically emits structured diagnostics to stderr:

- Row counts for each table referenced in the SQL
- Sample values for filter columns (e.g. `Column 'status' sample values: ['active', 'pending']`)
- NULL counts for filter columns

OpenCode reads this output and retries with a corrected query, without any input from you.

### Keeping schema docs fresh

If your database schema changes, regenerate the catalog before asking questions:

```bash
uv run oda docs generate

# Check freshness at any time:
uv run oda docs status
```

Use `--strict` mode to block queries against stale docs:

```bash
uv run oda query "SELECT ..." --strict
```

---

## License

MIT License

Copyright (c) 2026 Adoit

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.