Skip to content

Latest commit

 

History

History
298 lines (201 loc) · 9.42 KB

File metadata and controls

298 lines (201 loc) · 9.42 KB

dryrun - Tutorial

dryrun gives your AI assistant (or CLI) full visibility into your PostgreSQL schema, query plans, and migration risks. This tutorial covers three workflows, pick the one that fits.

Build

go build -o bin/dryrun ./cmd/dryrun

Binary: bin/dryrun


Part A: Offline workflow (no database needed)

This is the recommended starting point for evaluation. Someone dumps the schema once, everyone else works from the JSON file.

1. Get a schema file

Either from a teammate, CI, or dump it yourself:

export DATABASE_URL="postgres://readonly_user:pass@host:5432/your_db"
dryrun dump-schema --source "$DATABASE_URL" --pretty --name "production" -o schema.json

2. Import it

dryrun import schema.json

This validates the JSON and copies it to .dryrun/schema.json. Add .dryrun/ to .gitignore.

3. Lint

dryrun lint

Checks naming conventions, PK types, varchar vs text, timestamps, FK indexes, etc. Works entirely from the saved snapshot.

# JSON output for CI
dryrun lint --json --pretty

# filter to one schema
dryrun lint --schema-name public

Customize via dryrun.toml:

[conventions]
table_name = "snake_singular"
require_timestamps = true
prefer_text_over_varchar = true

[conventions.disabled_rules]
rules = ["naming/index_pattern"]

4. MCP server (offline)

Install in Claude Code, it auto-discovers .dryrun/schema.json:

claude mcp add dryrun -- dryrun mcp-serve

No DB credentials needed. Available tools: list_tables, describe_table, search_schema, find_related, validate_query, check_migration, lint_schema.

Not available without a live DB: explain_query, advise, check_drift.


Part B: Online workflow (live database)

For full capabilities including EXPLAIN and schema refresh.

1. Probe the connection

export DATABASE_URL="postgres://readonly_user:pass@host:5432/your_db"
dryrun probe --db "$DATABASE_URL"

Expected output:

PostgreSQL 16.3
  PostgreSQL 16.3 on x86_64-pc-linux-gnu ...
Privileges:
  pg_catalog:           ok
  information_schema:   ok
  pg_stat_user_tables:  ok

2. Initialize

dryrun init --db "$DATABASE_URL"

Creates dryrun.toml, the .dryrun/ directory, and .dryrun/schema.json. Snapshot history lives in ~/.dryrun/history.db (shared across projects, keyed by (project_id, database_id)). If a data-masking-policy.yml resolves, init masks planner stats in-process before writing; for projects with PII, set require_masks = true in dryrun.toml to fail closed when the policy is missing. See SECURITY.md.

3. Snapshots and diffing

dryrun snapshot take --db "$DATABASE_URL"          # saves to history
dryrun snapshot list --db "$DATABASE_URL"          # show all snapshots
dryrun snapshot diff --db "$DATABASE_URL" --latest --pretty  # diff last saved vs live

4. Profiles

Instead of passing --db every time:

# dryrun.toml
[default]
profile = "development"

[profiles.development]
db_url = "${DEV_DATABASE_URL}"

[profiles.staging]
schema_file = ".dryrun/staging-schema.json"

[profiles.production]
db_url = "${PROD_DATABASE_URL}"
dryrun profile list
dryrun --profile staging lint

5. MCP server (live)

claude mcp add dryrun -- env DATABASE_URL=postgres://user:pass@host:5432/db dryrun mcp-serve

All tools available including EXPLAIN ANALYZE (runs in rolled-back transactions, safe on read replicas).


Part C: Multi-node workflow

For setups with one primary and N replicas serving different query patterns. Activity counters (seq_scan, idx_scan, n_dead_tup) differ per node and only live where the queries actually run, on the replicas. dryrun captures schema + planner stats from the primary and activity stats from each replica, then aggregates them.

A snapshot is split into three rows in ~/.dryrun/history.db: schema, planner_stats, activity_stats. snapshot take writes all three from the primary, with the activity row labeled primary. snapshot activity writes one additional activity_stats row per replica, tagged with --label. Planner stats are masked per data-masking-policy.yml at capture time; see SECURITY.md.

1. Schema + planner + activity from the primary

dryrun --profile primary snapshot take

Refuses to run on a standby. Writes schema (DDL), planner_stats (reltuples, relpages, pg_statistic; masked per policy), and one activity_stats row labeled primary.

2. Activity stats from each replica

dryrun --profile replica1 snapshot activity --from "$REPLICA1_URL" --label replica1
dryrun --profile replica2 snapshot activity --from "$REPLICA2_URL" --label replica2
dryrun --profile replica3 snapshot activity --from "$REPLICA3_URL" --label replica3

--label is required and identifies the node in compare_nodes and detect. snapshot activity refuses to run on the primary. Activity rows attach to the most recent schema row by schema_ref_hash; pass --allow-orphan to capture before a schema exists.

3. Define profiles for repeatable runs

# dryrun.toml
[project]
id = "myapp"

[profiles.primary]
db_url = "${PRIMARY_DATABASE_URL}"

[profiles.replica1]
db_url = "${REPLICA1_DATABASE_URL}"

[profiles.replica2]
db_url = "${REPLICA2_DATABASE_URL}"

4. Cron

Schema changes rarely; activity counters shift daily. Capture each on its own schedule:

# /etc/cron.d/dryrun-stats
0  2 * * * app dryrun --profile primary  snapshot take
15 2 * * * app dryrun --profile replica1 snapshot activity --from "$REPLICA1_URL" --label replica1
15 2 * * * app dryrun --profile replica2 snapshot activity --from "$REPLICA2_URL" --label replica2

5. Verify

dryrun snapshot list

Each row prints its kind (schema / planner_stats / activity_stats), node_label for activity rows, and the schema_ref_hash linking activity to schema. The MCP compare_nodes tool then exposes per-node idx_scan for any table.


Part D: MCP setup reference

Claude Code (recommended)

# offline (auto-discover .dryrun/schema.json)
claude mcp add dryrun -- dryrun mcp-serve

# offline (explicit schema file)
claude mcp add dryrun -- dryrun mcp-serve --schema-file /path/to/schema.json

# live database
claude mcp add dryrun -- env DATABASE_URL=postgres://user:pass@host:5432/db dryrun mcp-serve

# project-scope (creates .mcp.json, shared with the team via version control)
claude mcp add --scope project dryrun -- dryrun mcp-serve

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "dryrun": {
      "command": "dryrun",
      "args": ["mcp-serve", "--schema-file", "/path/to/schema.json"]
    }
  }
}

SSE mode (remote / Docker)

DATABASE_URL="$DB" dryrun mcp-serve --transport sse --port 3000

Connect your MCP client to http://host:3000/sse.


Part E: Tool reference

Tool Needs DB? Description
list_tables No List all tables with row estimates and comments
describe_table No Full table detail: columns, constraints, indexes, stats
search_schema No Search across table/column names, comments, constraints
find_related No Foreign key relationships with sample JOIN patterns
validate_query No Parse SQL, check table/column existence, detect anti-patterns
check_migration No Migration safety: lock types, rewrite risk, safe alternatives
lint_schema No Convention checks: naming, types, constraints, timestamps
schema_diff No* Compare snapshots for schema changes
vacuum_health No Autovacuum analysis with effective settings and recommendations
detect No Health checks: stale stats, unused indexes, seq-scan anomalies
compare_nodes No Per-node breakdown for a specific table with anomaly detection
analyze_plan No Analyze a pre-existing EXPLAIN JSON plan
advise Hybrid Comprehensive query analysis: EXPLAIN + anti-patterns + index suggestions
explain_query Yes EXPLAIN with structured plan and warnings
check_drift Yes Compare live database schema against saved snapshot

* schema_diff needs snapshot history; without live DB it compares saved snapshots only.


Part F: Troubleshooting

"connection refused" - Check your connection string. If using Docker, PG host may differ from localhost.

"permission denied for pg_stat_user_tables" - Grant pg_monitor to your user:

GRANT pg_monitor TO your_readonly_user;

EXPLAIN ANALYZE times out - The query actually runs (rolled back). Use analyze=false (default) for cost estimates only.

Schema is stale - Ask Claude to "refresh the schema", or re-run init / import.

MCP connection issues - Server logs to stderr, MCP protocol to stdout. For SSE mode, test with curl http://host:port/sse.

"invalid schema JSON" - The file must be a valid SchemaSnapshot. If you renamed fields or edited by hand, re-dump from the database.

Multi-node stats not showing - Run dryrun snapshot list and confirm you see a schema row plus activity_stats rows for each node sharing the same schema_ref_hash. The primary contributes its own activity_stats row (labeled primary) via snapshot take; each replica contributes one via snapshot activity --label .... Activity captured before any schema exists needs --allow-orphan and won't reattach automatically.