Pi package with custom extensions, themes, and configurations for the Pi Coding Agent
VTSTech • Website • Extensions • Themes • Install
A Pi package containing extensions, themes, and configuration for the Pi Coding Agent. These tools are built and optimized for running Pi on resource-constrained environments such as Google Colab (CPU-only, 12GB RAM) with Ollama serving small local models (0.3B–2B parameters), as well as with cloud providers like OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, and more.
Everything here is battle-tested on real hardware with real models — from small local Ollama models on budget machines to cloud providers via OpenRouter.
```
pi install git:github.com/VTSTech/pi-coding-agent
```

Pi clones the repo, auto-discovers the `extensions/` and `themes/` directories, and loads everything automatically. Restart Pi and you're done.

Update to the latest version:

```
pi update
```

Pin to a specific tag:

```
pi install git:github.com/VTSTech/pi-coding-agent@v1.2.0
```

Each extension is published separately to npm. Install only what you need:
```
# Install individual extensions
pi install "npm:@vtstech/pi-api"
pi install "npm:@vtstech/pi-diag"
pi install "npm:@vtstech/pi-model-test"
pi install "npm:@vtstech/pi-ollama-sync"
pi install "npm:@vtstech/pi-openrouter-sync"
pi install "npm:@vtstech/pi-react-fallback"
pi install "npm:@vtstech/pi-security"
pi install "npm:@vtstech/pi-status"

# Or install everything as one bundle via GitHub
pi install git:github.com/VTSTech/pi-coding-agent
```

All extensions depend on `@vtstech/pi-shared`, which is installed automatically as a dependency.
```
git clone https://github.com/VTSTech/pi-coding-agent.git
cd pi-coding-agent
cp extensions/*.ts ~/.pi/agent/extensions/
cp themes/*.json ~/.pi/agent/themes/
pi -c
```

Requirements:

- Pi Coding Agent v0.66+ installed
- Ollama running locally or on a remote machine (for Ollama features)
- An API key for any supported cloud provider (for cloud provider features)
This repo is a standard Pi package. The `package.json` contains a `pi` manifest that tells Pi where to find resources:
```json
{
  "name": "@vtstech/pi-coding-agent-extensions",
  "version": "1.2.0",
  "keywords": ["pi-package"],
  "pi": {
    "extensions": ["./extensions"],
    "themes": ["./themes"]
  }
}
```

Pi auto-discovers from conventional directories (`extensions/`, `themes/`, `skills/`, `prompts/`) even without the manifest. The manifest is included for explicit declaration.
All extensions support remote Ollama instances out of the box — no extra configuration needed. The Ollama URL is resolved automatically from models.json:
```
models.json ollama provider baseUrl → OLLAMA_HOST env var → http://localhost:11434
```
This means you can:
- Run Ollama on a separate machine and tunnel it (e.g., Cloudflare Tunnel, Tailscale, SSH)
- Use `/ollama-sync https://your-tunnel-url` to sync models from a remote instance
- The sync writes the remote URL back into `models.json`, so all other extensions (`model-test`, `status`, `diag`) automatically use it
- Set `OLLAMA_HOST` as an environment-variable fallback if no `models.json` config exists
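The resolution order can be sketched as a small helper. This is illustrative only — the function name and config shape here are assumptions, not the extension's actual API:

```typescript
// Resolve the Ollama base URL in the documented priority order:
// models.json ollama provider baseUrl → OLLAMA_HOST env var → localhost default.
interface ModelsConfig {
  providers?: { ollama?: { baseUrl?: string } };
}

function resolveOllamaUrl(
  config: ModelsConfig,
  env: Record<string, string | undefined>
): string {
  return (
    config.providers?.ollama?.baseUrl ?? // 1. models.json wins
    env.OLLAMA_HOST ??                   // 2. environment fallback
    "http://localhost:11434"             // 3. default
  );
}
```

Once `/ollama-sync` has written a tunnel URL into models.json, tier 1 wins and `OLLAMA_HOST` is ignored.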
Model testing and diagnostics work with cloud providers out of the box. The extensions auto-detect the active provider and adapt their behavior:
Supported providers (built-in registry):
| Provider | API Mode | Base URL |
|---|---|---|
| OpenRouter | openai-completions | https://openrouter.ai/api/v1 |
| Anthropic | anthropic-messages | https://api.anthropic.com |
| Google | gemini | https://generativelanguage.googleapis.com |
| OpenAI | openai-completions | https://api.openai.com/v1 |
| Groq | openai-completions | https://api.groq.com |
| DeepSeek | openai-completions | https://api.deepseek.com |
| Mistral | openai-completions | https://api.mistral.ai |
| xAI | openai-completions | https://api.x.ai |
| Together | openai-completions | https://api.together.xyz |
| Fireworks | openai-completions | https://api.fireworks.ai/inference/v1 |
| Cohere | cohere-chat | https://api.cohere.com |
Provider detection uses a three-tier lookup: user-defined providers in models.json → built-in provider registry → unknown fallback.
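A minimal sketch of that lookup, assuming a registry keyed by provider name (`lookupProvider` and `BUILTIN_REGISTRY` are illustrative names, not the shared library's real exports):

```typescript
interface ProviderInfo { api: string; baseUrl: string }

// Abbreviated stand-in for the built-in registry described above
const BUILTIN_REGISTRY: Record<string, ProviderInfo> = {
  openrouter: { api: "openai-completions", baseUrl: "https://openrouter.ai/api/v1" },
  anthropic: { api: "anthropic-messages", baseUrl: "https://api.anthropic.com" },
};

function lookupProvider(
  name: string,
  userProviders: Record<string, ProviderInfo>
): { source: "models.json" | "builtin" | "unknown"; info?: ProviderInfo } {
  // Tier 1: user-defined providers in models.json take precedence
  if (userProviders[name]) return { source: "models.json", info: userProviders[name] };
  // Tier 2: built-in provider registry
  if (BUILTIN_REGISTRY[name]) return { source: "builtin", info: BUILTIN_REGISTRY[name] };
  // Tier 3: unknown fallback
  return { source: "unknown" };
}
```

The ordering means a user-defined entry always overrides the built-in defaults for the same provider name.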
Run a full system diagnostic of your Pi environment.
```
/diag
```
Checks:
- System — OS, CPU, RAM usage, uptime, Node.js version
- Disk — Disk usage via `df -h`
- Ollama — Running? Version? Response latency? Models pulled? Currently loaded in VRAM?
- models.json — Valid JSON? Provider config? Models listed? Cross-references with Ollama
- Settings — settings.json exists? Valid?
- Extensions — Extension files found? Active tools?
- Themes — Theme files? Valid JSON?
- Session — Active model? API mode? Provider? Base URL? Context window? Context usage? Thinking level?
- Security — Active security mode, effective blocklist sizes (mode-aware), command/SSRF/path validation tests, audit log status
Also registers a `self_diagnostic` tool so the AI agent can run diagnostics on command.
Test any model for reasoning, tool usage, and instruction following — works with Ollama and cloud providers.
```
/model-test              # Test current Pi model (auto-detects provider)
/model-test qwen3:0.6b   # Test a specific Ollama model
/model-test --all        # Test every Ollama model
```

The extension auto-detects whether the active model is on Ollama or a cloud provider (OpenRouter, Anthropic, Google, OpenAI, Groq, DeepSeek, Mistral, xAI, Together, Fireworks, Cohere) and runs the appropriate test suite.
Ollama tests:

| Test | Method | Scoring |
|---|---|---|
| Reasoning | Snail wall puzzle — "climbs 3ft/day, slides 2ft/night, 10ft wall" — the answer (8) never appears in the prompt, preventing false positives. The answer is extracted as the last number in the response. | STRONG / MODERATE / WEAK / FAIL |
| Thinking | Extended thinking/reasoning token support (`<think>` tags or native API) — "Multiply 37 × 43" prompt | SUPPORTED / NOT SUPPORTED |
| Tool Usage | Tool call generation — detects both the native Ollama `tool_calls` API and JSON tool calls embedded in text responses | STRONG / MODERATE / WEAK / FAIL |
| ReAct Parse | Text-based tool calling without a native API — tests `Action:` / `Action Input:` pattern parsing | STRONG / MODERATE / WEAK / FAIL |
| Instruction Following | Strict JSON output format compliance — 4 specific keys with typed values, automatic repair of truncated output | STRONG / MODERATE / WEAK / FAIL |
| Tool Support | Probes the model for tool calling capability level (native API, ReAct text, or none) — cached for future runs | NATIVE / REACT / NONE |
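The "last number in the response" extraction used by the reasoning test is straightforward; a sketch with a hypothetical helper name:

```typescript
// Return the last number appearing in a model response, or null if none.
// Mirrors the scoring approach described above for the reasoning test.
function extractLastNumber(text: string): number | null {
  const matches = text.match(/-?\d+(?:\.\d+)?/g);
  return matches ? Number(matches[matches.length - 1]) : null;
}
```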
Cloud provider tests:

| Test | Method | Scoring |
|---|---|---|
| Connectivity | Verifies API reachability and authentication — sends a ping request, expects a response within 30s | OK / FAIL |
| Reasoning | Same snail wall puzzle, sent via OpenAI-compatible chat completions API | STRONG / MODERATE / WEAK / FAIL |
| Instruction Following | Strict JSON output format compliance — 4 specific keys with typed values | STRONG / MODERATE / WEAK / FAIL |
| Tool Usage | Tool call generation using OpenAI function calling format | STRONG / MODERATE / WEAK / FAIL |
Ollama-specific tests (thinking, ReAct parsing, tool support cache, model metadata) are skipped for cloud providers.
Features:

- Automatic provider detection — classifies the active model as `ollama`, `builtin`, or `unknown` using a three-tier lookup (models.json → built-in registry → fallback)
- Built-in provider registry — 11 known cloud providers with API modes, base URLs, and env var keys
- Calls Ollama `/api/chat` or cloud provider APIs directly — no Pi agent round-trip
- Automatic remote Ollama URL — reads from `models.json`, no manual config
- Timeout resilience — 180s default with `--connect-timeout`, auto-retry on empty responses and connection failures (handles flaky tunnels)
- Rate limit delay — configurable delay (default 30s) between tests to avoid upstream rate limiting on free-tier providers
- Thinking model fallback — if a model returns empty output without `think:true`, automatically retries with thinking enabled (supports qwen3 and similar models)
- Displays API mode — shows the active API mode (e.g., `openai-completions`, `openai-responses`) from `models.json`
- Native context length — displays the model's true max context from Ollama `/api/show`, not the configured `num-ctx`
- Model metadata — retrieves size, params, quantization, and family from `/api/tags`
- Auto-updates the `models.json` `reasoning` field based on thinking test results
- Tool support cache — persistent cache at `~/.pi/agent/cache/tool_support.json` avoids re-probing on every run
- Text-based tool call detection — models that output tool call JSON as text (instead of using the native API) are still correctly identified and scored
- JSON repair — automatically fixes truncated JSON output (missing closing braces) caused by `num_predict` limits
- Thinking token fallback — models that put reasoning in thinking tokens (e.g., qwen3) are detected even when `content` is empty
- Complete response display — full model responses are shown with markdown code fences stripped for clean rendering
- Tab-completion for model names in the `/model-test` command
- Final recommendation: STRONG / GOOD / USABLE / WEAK
Sample output (cloud provider):
```
[model-test-report]
⚡ Pi Model Benchmark v1.2.0
Written by VTSTech
GitHub: https://github.com/VTSTech
Website: www.vts-tech.org

── MODEL: openai/gpt-oss-120b:free ─────────────────────────
ℹ️ Provider: openrouter (built-in)
ℹ️ API: openai-completions
ℹ️ Base URL: https://openrouter.ai/api/v1
ℹ️ API Key: ****d9ef

── CONNECTIVITY TEST ───────────────────────────────────────
ℹ️ Sending minimal request to verify API reachability and key validity...
ℹ️ Time: 1.9s
✅ API reachable and authenticated

── REASONING TEST ──────────────────────────────────────────
ℹ️ Prompt: A snail climbs 3ft up a wall each day, slides 2ft back
   each night. Wall is 10ft. How many days?
ℹ️ Testing...
ℹ️ Waiting 30.0s to avoid rate limiting...
ℹ️ Time: 693ms
✅ Answer: 8 — Correct with clear reasoning (STRONG)
ℹ️ Response: The snail gains a net of (3 - 2 = 1) foot each
   full day-night cycle.
   - After 7 full days (and nights) it has risen (7 × 1 = 7) feet.
   - At the start of the 8th day it is 7 feet up. It climbs 3 feet
     during that day, reaching (7 + 3 = 10) feet, the top of the
     wall. Once it reaches the top it does not slide back.
   Thus, the snail reaches the top on the 8th day.
   ANSWER: 8

── INSTRUCTION FOLLOWING TEST ──────────────────────────────
ℹ️ Prompt: Respond with ONLY a JSON object with keys: name,
   can_count, sum (15+27), language
ℹ️ Testing...
ℹ️ Waiting 30.0s to avoid rate limiting...
ℹ️ Time: 525ms
✅ JSON output valid with correct values (STRONG)
ℹ️ Output: {"name":"ChatGPT","can_count":true,
   "sum":42,"language":"English"}

── TOOL USAGE TEST ─────────────────────────────────────────
ℹ️ Prompt: "What's the weather in Paris?" (with get_weather
   tool available)
ℹ️ Testing...
ℹ️ Waiting 30.0s to avoid rate limiting...
ℹ️ Time: 518ms
✅ Tool call: get_weather({"location":"Paris",
   "unit":"celsius"}) (STRONG)

── SKIPPED TESTS (OLLAMA-ONLY) ─────────────────────────────
⚠️ Thinking test — Ollama-specific think:true option
⚠️ ReAct parsing test — only relevant for Ollama models
⚠️ Tool support detection — Ollama-specific tool support cache
⚠️ Model metadata — Ollama-specific /api/tags endpoint

── SUMMARY ─────────────────────────────────────────────────
✅ Connectivity: OK
✅ Reasoning: STRONG
✅ Instructions: STRONG
✅ Tool Usage: STRONG
ℹ️ Total time: 1.7m
ℹ️ Score: 4/4 tests passed

── RECOMMENDATION ──────────────────────────────────────────
✅ openai/gpt-oss-120b:free is a STRONG model via openrouter
```
Runtime switching of API modes, base URLs, thinking settings, and compat flags in models.json.
Supports all 10 Pi API modes:
anthropic-messages · openai-completions · openai-responses · azure-openai-responses · openai-codex-responses · mistral-conversations · google-generative-ai · google-gemini-cli · google-vertex · bedrock-converse-stream
```
/api                      # Show current provider config (mode, URL, compat flags)
/api mode <mode>          # Switch API mode (partial match supported)
/api url <url>            # Switch base URL
/api think on|off|auto    # Toggle thinking for all models in provider
/api compat <key>         # View compat flags
/api compat <key> <val>   # Set compat flag
/api modes                # List all 10 supported API modes
/api providers            # List all configured providers
/api reload               # Hint to run /reload
```

Features:

- Partial mode matching — `/api mode openai-r` matches `openai-responses`
- Auto-detect local provider — targets the first `localhost`/`ollama` provider by default
- Batch thinking toggle — set `reasoning: true/false` across all models at once
- Compat flag management — get/set `supportsDeveloperRole`, `thinkingFormat`, `maxTokensField`, etc.
- Tab-completion for sub-commands
Command, path, and network security layer for Pi's tool execution with a configurable security mode.
Automatically loaded — protects against:
- Partitioned command blocklist — 41 CRITICAL commands (always blocked: system modification, privilege escalation, network attacks, shell escapes) + 25 EXTENDED commands (blocked in max mode: package management, process control, development tools)
- Mode-aware SSRF protection — 22 ALWAYS_BLOCKED URL patterns (loopback, RFC1918 private ranges, cloud metadata endpoints) + 7 MAX_ONLY patterns (localhost by name, broadcast, link-local, current network) that are allowed in basic mode
- Security mode toggle — switch between `basic` and `max` modes at runtime; persisted to `~/.pi/agent/security.json`
- Path validation — prevents filesystem escape and access to critical system directories; symlinks are dereferenced via `fs.realpathSync()` to block `/tmp/evil → /etc/passwd` bypasses
- Shell injection detection — regex patterns for command chaining, substitution, and redirection
- Audit logging — JSON-lines audit log at `~/.pi/agent/audit.log` with the security mode recorded per entry (path exported as `AUDIT_LOG_PATH`)
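The shell injection checks can be illustrated with a few representative patterns. These are simplified examples of the three categories named above (chaining, substitution, redirection), not the extension's actual rule set:

```typescript
// Representative shell-injection patterns — illustrative, deliberately coarse.
const INJECTION_PATTERNS: RegExp[] = [
  /;|&&|\|\|/, // command chaining: "ls; rm", "a && b", "a || b"
  /\$\(|`/,    // command substitution: "$(whoami)" or backticks
  />>|>\s*\//, // redirection: append, or overwrite an absolute path
];

function looksLikeInjection(command: string): boolean {
  return INJECTION_PATTERNS.some((re) => re.test(command));
}
```

A real validator layers this with tokenization and an allowlist; bare regexes alone over-block legitimate commands (e.g., quoted semicolons).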
```
/security mode basic   # Relaxed mode — CRITICAL commands blocked, localhost URLs allowed
/security mode max     # Full lockdown — all 66 commands blocked, strict SSRF
```

Default mode: max — if security.json doesn't exist, the extension starts in max mode and creates it on first use. The current mode is displayed in the status bar (`SEC:BASIC` or `SEC:MAX`).
Text-based tool calling bridge for models without native function calling support.
Automatically loaded — no commands needed. When a model lacks native tool calling:
- Parses `Thought:`, `Action:`, `Action Input:` patterns from model output
- Multi-dialect support: classic ReAct (`Action:`), Function (`Function:`), Tool (`Tool:`), Call (`Call:`) — each with dynamically built regex patterns
- Multiple regex strategies, including parenthetical style and loose matching
- Bridges text-based tool calls into Pi's native tool execution pipeline
- Disabled by default; toggle via `/react-mode` with persistent config across restarts
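A minimal sketch of parsing the classic ReAct dialect. The real parser in `shared/react-parser.ts` handles more dialects and looser formats; this shows only the core `Action:` / `Action Input:` case:

```typescript
interface ReactCall { tool: string; input: unknown }

// Extract a single flat-JSON tool call from ReAct-style text output.
// Lazy matching stops at the first "}", so nested objects need the
// fuller strategies the extension actually ships.
function parseReactCall(text: string): ReactCall | null {
  const m = text.match(/Action:\s*(\S+)\s*[\r\n]+Action Input:\s*(\{[\s\S]*?\})/);
  if (!m) return null;
  try {
    return { tool: m[1], input: JSON.parse(m[2]) };
  } catch {
    return null; // malformed JSON input
  }
}
```

A bridge layer then feeds the parsed `{ tool, input }` pair into the native tool execution pipeline as if the model had emitted a real tool call.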
Auto-populate models.json with all available Ollama models — works with local and remote instances.
```
/ollama-sync                           # Sync from models.json URL (or localhost)
/ollama-sync https://your-tunnel-url   # Sync from a specific remote URL
```

- Queries Ollama `/api/tags` for available models (local or remote)
- Writes the actual Ollama URL back into `models.json` so other extensions pick it up automatically
- URL priority: CLI argument → existing `models.json` baseUrl → `OLLAMA_HOST` env → localhost
- Preserves existing provider config (apiKey, compat settings)
- Defaults to the `openai-completions` API mode (correct for Ollama's `/v1/chat/completions` endpoint)
- Sorts models by size (smallest first)
- Auto-detects reasoning-capable models (deepseek-r1, qwq, qwen3, o1, o3, think, reason)
- Merges with existing per-model settings
- Per-model metadata in the sync report (parameter size, quantization level, model family)
- Registered as both the `/ollama-sync` slash command and the `ollama_sync` tool
Add OpenRouter models to models.json from URLs or bare model IDs.
```
/or-sync <url-or-id> [url-or-id ...]           # Alias
/openrouter-sync <url-or-id> [url-or-id ...]
```

- Accepts full OpenRouter URLs (`https://openrouter.ai/model/name:free`) or bare IDs (`model/name:free`)
- Multiple models in one command
- Strips query parameters and fragments from URLs before extracting the model name
- Creates the `openrouter` provider in models.json if missing (inherits baseUrl/api from the built-in provider registry)
- Appends models, never removes existing entries
- Reorders providers so openrouter sits above ollama
- Registered as both the `/openrouter-sync` slash command (alias `/or-sync`) and the `openrouter_sync` tool
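The URL-to-ID normalization can be sketched as follows (hypothetical helper name; the WHATWG `URL` parser drops the query string and fragment for us):

```typescript
// Accept either a full OpenRouter URL or a bare model ID and return the ID.
function extractModelId(input: string): string {
  // Bare IDs ("vendor/model:variant") pass through unchanged
  if (!/^https?:\/\//i.test(input)) return input;
  // URL.pathname excludes ?query and #fragment; strip the leading slash
  return new URL(input).pathname.replace(/^\//, "");
}
```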
Adds composable named status items to the framework footer using `ctx.ui.setStatus()`. Each metric gets its own slot, so it coexists cleanly with other extensions' status items.
CPU/RAM/Swap are only shown when using a local Ollama provider (not for cloud/remote). For cloud providers, system metrics are omitted. Model name, session tokens, and context usage are shown by the framework — not duplicated here. All labels use dimmed coloring; all values use green highlighting.
Status slots (updated every 5s, 1s for active tool):
- CtxMax + RespMax — combined slot showing native model context window and max response/completion tokens (e.g., `CtxMax:33k RespMax:16.4k`)
- Resp — agent loop duration via `agent_start`/`agent_end` events
- CPU% — per-core delta via `os.cpus()` (local Ollama only)
- RAM — used/total via `os.totalmem()`/`os.freemem()` (local Ollama only)
- Swap — used/total from `/proc/meminfo` (shown only when swap is active, local only)
- Generation params — temperature, top_p, top_k, num_predict, context size, reasoning_effort (dimmed)
- SEC — security mode indicator (`SEC:BASIC` or `SEC:MAX`) + session-scoped blocked count + 3s flash on blocked tools (resets on shutdown)
- Active tool — live elapsed timer with `>` indicator while a tool is running
- Prompt — system prompt size shown as `chars chr tokens tok` on agent start
- Pi version — `pi:0.66.1` fetched once at `session_start` (dim label + green value, always last slot)
All slots are cleared on session shutdown. Metrics that the framework already provides (model name, session tokens, context usage, thinking level) are intentionally omitted to avoid duplication.
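The per-core delta behind the CPU% slot works by sampling `os.cpus()` twice and comparing the cumulative tick counters. A sketch of the calculation (helper names are illustrative):

```typescript
import os from "node:os";

// Shape of os.cpus()[i].times: cumulative ms spent in each state since boot
type CpuTimes = { user: number; nice: number; sys: number; idle: number; irq: number };

function totalTicks(t: CpuTimes): number {
  return t.user + t.nice + t.sys + t.idle + t.irq;
}

// Aggregate CPU% across all cores between two snapshots
function cpuPercent(prev: CpuTimes[], next: CpuTimes[]): number {
  let busy = 0;
  let total = 0;
  for (let i = 0; i < next.length; i++) {
    const dTotal = totalTicks(next[i]) - totalTicks(prev[i]);
    const dIdle = next[i].idle - prev[i].idle;
    busy += dTotal - dIdle; // everything that isn't idle counts as busy
    total += dTotal;
  }
  return total > 0 ? (100 * busy) / total : 0;
}

// Usage: take a = os.cpus().map(c => c.times), wait for the update
// interval, take b the same way, then display cpuPercent(a, b).
```

Deltas are required because the counters are cumulative; a single snapshot only tells you the average load since boot.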
A Matrix movie-inspired theme with neon green on pure black. Designed for terminal aesthetics and extended coding sessions.
```
/theme matrix
```
Color palette:
| Token | Color | Usage |
|---|---|---|
| `green` | `#39ff14` | Primary text — neon green |
| `brightGreen` | `#7fff00` | Accents, headings, inline code, highlights |
| `phosphor` | `#66ff33` | Links, tool titles, code block text, secondary text |
| `glowGreen` | `#00ff41` | Thinking text, quotes |
| `fadeGreen` | `#00cc33` | Muted text, borders |
| `hotGreen` | `#b2ff59` | Numbers, emphasis |
| `yellow` | `#eeff00` | Status bar active tool timer |
| Background | `#000000` | Pure black base |
```
# 1. Install the package
pi install git:github.com/VTSTech/pi-coding-agent

# 2. Restart Pi
pi -c

# 3. Sync your Ollama models into Pi (or use a cloud provider)
/ollama-sync                           # Local Ollama
/ollama-sync https://your-tunnel-url   # Remote Ollama (e.g., Cloudflare Tunnel)

# 4. Reload Pi to pick up model changes
/reload

# 5. Run diagnostics to verify everything
/diag

# 6. Benchmark your models
/model-test --all
```

If Ollama is running on a different machine, expose it via a tunnel and point Pi at it:
```
# On the Ollama machine — create a tunnel (example with cloudflared)
cloudflared tunnel --url http://localhost:11434

# In Pi — sync models from the tunnel URL
/ollama-sync https://your-tunnel-url.trycloudflare.com
```

The URL gets saved to models.json, and all extensions use it automatically. No need to set OLLAMA_HOST or pass the URL again.
Pi handles cloud providers natively — just set your API key in the environment and select a model:
```
export OPENROUTER_API_KEY="sk-or-..."
```

```
# In Pi — select a cloud model
/model openrouter/openai/gpt-oss-120b:free

# Test it
/model-test
```

A minimal models.json for a local Ollama provider looks like this:

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": []
    }
  }
}
```

Use `/ollama-sync` to auto-populate the models array and set the correct `baseUrl` from your Ollama instance.
Optimized for CPU-only environments with limited RAM:
```json
{
  "defaultProvider": "ollama",
  "defaultModel": "granite4:350m",
  "defaultThinkingLevel": "off",
  "theme": "matrix",
  "compaction": {
    "enabled": true,
    "reserveTokens": 2048,
    "keepRecentTokens": 8000
  }
}
```

Pi supports multiple API backends via the `api` field in models.json. For Ollama, use `openai-completions`, which maps to Ollama's native `/v1/chat/completions` endpoint. Other available modes:
| API Mode | Use Case |
|---|---|
| `openai-completions` | Ollama, OpenAI-compatible `/v1/chat/completions` |
| `openai-responses` | OpenAI Responses API (`/v1/responses`) |
| `anthropic-messages` | Anthropic native API |
| `google-generative-ai` | Gemini API |
| `google-vertex` | Google Vertex AI |
| `mistral-conversations` | Mistral API |
| `bedrock-converse-stream` | Amazon Bedrock |
See Pi's AI package docs for the full list.
These extensions are optimized for running Pi on Google Colab with CPU-only and 12GB RAM. Here's the recommended Ollama launch configuration:
```python
import subprocess, os

# Install Ollama — the install script must be piped to sh
# (fetching it with curl alone only downloads it, it doesn't run it)
subprocess.run("curl -fsSL https://ollama.com/install.sh | sh", shell=True, check=True)

# Environment tuning for CPU-only 12GB
# (server-side knobs use the OLLAMA_ prefix)
os.environ["OLLAMA_HOST"] = "0.0.0.0:11434"
os.environ["OLLAMA_CONTEXT_LENGTH"] = "4096"    # Reduce from 262k default
os.environ["OLLAMA_MAX_LOADED_MODELS"] = "1"    # Only one model in memory
os.environ["OLLAMA_KEEP_ALIVE"] = "2m"          # Unload after 2min idle
os.environ["OLLAMA_KV_CACHE_TYPE"] = "f16"      # Use f16 for KV cache
os.environ["OLLAMA_MODELS"] = "/tmp/ollama"     # Store in tmpfs (RAM disk)
os.environ["BATCH_SIZE"] = "512"                # Smaller batches for CPU
os.environ["CUDA_VISIBLE_DEVICES"] = ""         # Force CPU mode (hide any GPUs)

# Start Ollama
subprocess.Popen(["ollama", "serve"])
```

Recommended models for this setup:

| Model | Params | Size | Reasoning | Tools | Best For |
|---|---|---|---|---|---|
| `granite4:350m` | 352M | 676 MB | ❌ | ✅ | Fast tasks, tool calling |
| `qwen3:0.6b` | 752M | 498 MB | ❌ | ✅ | Small footprint, native tools |
| `qwen3.5:0.8b` | ~800M | 1.0 GB | ❌ | ✅ | Daily driver |
| `qwen2.5-coder:1.5b` | 1.5B | 940 MB | ❌ | ✅ | Code tasks |
| `llama3.2:1b` | 1.2B | 1.2 GB | ❌ | ✅ | General use |
| `qwen3.5:2b` | 2.3B | 2.7 GB | ✅ | ✅ | Best quality (fits 12GB) |
See TESTS.md for full benchmark results across all tested Ollama and cloud provider models.
```
pi-coding-agent/
├── extensions/
│   ├── api.ts               # API mode switcher — modes, URLs, thinking, compat flags
│   ├── diag.ts              # System diagnostic suite
│   ├── model-test.ts        # Model benchmark — Ollama & cloud providers
│   ├── ollama-sync.ts       # Ollama ↔ models.json sync
│   ├── openrouter-sync.ts   # OpenRouter → models.json sync
│   ├── react-fallback.ts    # ReAct fallback for non-native tool models
│   ├── security.ts          # Command/path/SSRF protection
│   └── status.ts            # System resource monitor & status bar
├── shared/
│   ├── debug.ts             # Conditional debug logging
│   ├── format.ts            # Shared formatting utilities
│   ├── model-test-utils.ts  # Shared test utilities, config, history
│   ├── ollama.ts            # Ollama API helpers, provider detection, mutex, retry
│   ├── react-parser.ts      # Multi-dialect ReAct text parser
│   ├── security.ts          # Security validation, SSRF, DNS rebinding, audit log
│   └── types.ts             # TypeScript types & error classes
├── themes/
│   └── matrix.json          # Matrix movie theme
├── npm-packages/            # Per-extension npm package manifests
│   ├── shared/              # @vtstech/pi-shared
│   ├── api/                 # @vtstech/pi-api
│   ├── diag/                # @vtstech/pi-diag
│   ├── model-test/          # @vtstech/pi-model-test
│   ├── ollama-sync/         # @vtstech/pi-ollama-sync
│   ├── openrouter-sync/     # @vtstech/pi-openrouter-sync
│   ├── react-fallback/      # @vtstech/pi-react-fallback
│   ├── security/            # @vtstech/pi-security
│   └── status/              # @vtstech/pi-status
├── scripts/
│   ├── build-packages.sh    # Build all npm packages (esbuild TS→ESM)
│   ├── bump-version.sh      # Linux/macOS version bump script
│   ├── bump-version.ps1     # Windows PowerShell version bump script
│   └── publish-packages.sh  # Publish to npm (shared first, then extensions)
├── CHANGELOG.md             # Version history
├── TESTS.md                 # Model benchmark results
├── VERSION                  # Single source of truth for version
├── package.json             # Pi package manifest
├── README.md
└── LICENSE
```
Written by VTSTech
🌐 www.vts-tech.org • 🐙 GitHub • 📧 veritas@vts-tech.org
Optimizing AI agent development for resource-constrained environments.