Open Source Agent Harness Framework. Any LLM. Any platform. Agentic Programming Paradigm.
Built on the Agentic Programming paradigm. Current LLM agent frameworks let the LLM control everything — what to do, when, and how. The result? Unpredictable execution, context explosion, and no output guarantees. OpenProgram flips this: Python controls the flow, LLM only reasons when asked. See philosophy for the full rationale.
Requires Python 3.11+.
pip install openprogram # install (TUI + web UI included)
openprogram setup # connect a provider (interactive)
Then chat with it, either in the terminal or the browser:
openprogram # full-screen TUI
For the web UI, just open your browser at http://localhost:3000. openprogram setup starts the worker in the background (frontend on 3000, FastAPI backend on 8109), so the page is already live.
Both surfaces share the same backend: sessions, settings, and web-search defaults are persisted in ~/.agentic/ and visible from either entry point.
openprogram setup is a wizard that imports credentials from any CLI you've already logged into and asks for missing API keys. Skip it by setting one of these env vars yourself:
export ANTHROPIC_API_KEY=sk-ant-... # Claude
export OPENAI_API_KEY=sk-... # GPT
export GOOGLE_API_KEY=... # Gemini
Or use a CLI provider (no API key, uses your existing subscription):
npm i -g @anthropic-ai/claude-code && claude login
npm i -g @openai/codex && codex auth
npm i -g @google/gemini-cli && gemini auth login
Check what's detected with openprogram providers. Auto-detection order: Claude Code → Codex → Gemini CLI → Anthropic API → OpenAI API → Gemini API.
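The detection order above can be pictured as a first-match scan over a priority list. The sketch below is an illustration only, not OpenProgram's actual detection code: the function name `detect_provider`, the provider labels, and the assumption that a CLI counts as available once its binary is on PATH are all hypothetical (the real check may also verify login state).

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Priority order taken from the text above; the tuple layout is illustrative.
# kind "cli" -> probe is a binary name, kind "env" -> probe is an env var.
DETECTION_ORDER = [
    ("claude-code", "cli", "claude"),
    ("codex", "cli", "codex"),
    ("gemini-cli", "cli", "gemini"),
    ("anthropic-api", "env", "ANTHROPIC_API_KEY"),
    ("openai-api", "env", "OPENAI_API_KEY"),
    ("gemini-api", "env", "GOOGLE_API_KEY"),
]


def detect_provider(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first available provider label, or None if nothing matches."""
    for label, kind, probe in DETECTION_ORDER:
        if kind == "cli" and which(probe):
            return label
        if kind == "env" and env.get(probe):
            return label
    return None
```

Passing `env` and `which` as parameters keeps the scan testable without touching the real environment. A `None` result would correspond to the "No provider available" case.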
| Extra | Adds | Post-install |
|---|---|---|
| [anthropic] / [openai] / [gemini] | Provider SDKs | none |
| [browser] | Playwright (~150 MB) | playwright install chromium |
| [browser-stealth] | Cloudflare-bypassing browsers | patchright install chromium && camoufox fetch |
| [gui] | Vision/control deps for GUI harness (~2 GB) | none |
| [channels] | Discord / Slack / WeChat bots | none |
| [all] | Everything except [browser-stealth] | run post-install commands as needed |
"No provider available"
openprogram providers shows what's detected. Common causes: forgot claude login / codex auth; API key set in a different shell than you're running in; token expired (re-login).
"command not found: openprogram"
pip install dir not on PATH. Use python3 -m openprogram <args> instead, or add $(python3 -m site --user-base)/bin to your PATH.
Web UI port in use
Set OPENPROGRAM_WEB_PORT=8101 (frontend) or OPENPROGRAM_BACKEND_PORT=8102 (FastAPI) before starting the worker. Or store the preference via openprogram config ui.
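To check whether a port is actually taken before overriding it, a small probe like the one below can help. This is a generic standard-library sketch, not part of OpenProgram: `resolve_port` and `port_is_free` are hypothetical helpers, and the assumption that the two environment variables fall back to 3000 and 8109 is inferred from the defaults mentioned earlier.

```python
import os
import socket


def resolve_port(var_name: str, default: int) -> int:
    """Read a port from the environment, falling back to the default."""
    return int(os.environ.get(var_name, default))


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # Assumed defaults: frontend on 3000, FastAPI backend on 8109.
    web_port = resolve_port("OPENPROGRAM_WEB_PORT", 3000)
    backend_port = resolve_port("OPENPROGRAM_BACKEND_PORT", 8109)
    for name, port in (("frontend", web_port), ("backend", backend_port)):
        state = "free" if port_is_free(port) else "in use"
        print(f"{name} port {port}: {state}")
```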
Local-development install (multi-repo)
For GUI-Agent-Harness / Research-Agent-Harness:
pip install -e "$OPENPROGRAM_DIR" # first
pip install -e "$GUI_HARNESS_DIR" # depends on openprogram
pip install -e "$RESEARCH_HARNESS_DIR"
openprogram/functions/agentics/{GUI,Research}-Agent-Harness are symlinks. Recreate them if a repo moves:
cd openprogram/functions/agentics
rm -f GUI-Agent-Harness && ln -s "$GUI_HARNESS_DIR" GUI-Agent-Harness
rm -f Research-Agent-Harness && ln -s "$RESEARCH_HARNESS_DIR" Research-Agent-Harness
pip install -e writes absolute paths, so rerun it from the new location if you rename a parent folder.
For platform-builder topics — Runtime retry semantics, the full agentic_function decorator API, the flat-DAG context model — see docs/API.md and the per-topic notes under docs/api/.
Three agent applications ship with OpenProgram, under openprogram/functions/agentics/ — each a full agent built on the @agentic_function paradigm:
| Project | What it does |
|---|---|
| GUI Agent Harness | Autonomous GUI agent — operates desktop apps (and OSWorld VMs) by vision: observe → plan → act → verify. Python drives the loop, the LLM only reasons when asked. |
| Research Agent Harness | Autonomous research agent — literature survey → idea → experiments → paper writing → cross-model review. Full pipeline from topic to submission-ready paper. |
| Wiki Agent Harness | Autonomous wiki builder — ingests notes, documents and conversations into a structured, Obsidian-compatible knowledge vault with [[wikilinks]]. |
| Principle | How |
|---|---|
| Deterministic flow | Python controls if/else/for/while. Execution is guaranteed, not suggested. |
| Minimal LLM calls | Call the LLM only when reasoning is needed. 2 calls, not 10. |
| Prompt in code | The per-call prompt lives in the function body (runtime.exec(content=...)), not in scattered prompt files. |
| Self-evolving | Agents author, fix and improve functions directly — guided by the agentic-programming skill. |
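The first two principles can be illustrated without any framework at all. In the sketch below, `reason` is a stand-in for an LLM call (the role `runtime.exec(content=...)` plays in OpenProgram); its signature, the `triage_tickets` function, and the ticket example are assumptions made for illustration.

```python
from typing import Callable, List

# Stand-in for an LLM call; the signature here is an assumption.
Reason = Callable[[str], str]


def triage_tickets(tickets: List[str], reason: Reason) -> List[dict]:
    """Python owns the loop and branching; the LLM is consulted only for hard cases."""
    results = []
    for ticket in tickets:                  # deterministic iteration
        if "refund" in ticket.lower():      # cheap rule, no LLM call needed
            label = "billing"
        else:                               # only ambiguous tickets reach the LLM
            label = reason(f"Classify this support ticket in one word: {ticket}")
        results.append({"ticket": ticket, "label": label})
    return results


calls = []


def fake_reason(prompt: str) -> str:
    """Fake model that records prompts so the call count is observable."""
    calls.append(prompt)
    return "technical"


out = triage_tickets(["Refund my order", "App crashes on start"], fake_reason)
```

Two tickets are processed but the fake model is called once: the loop, the branch, and the output shape are fixed by Python, and the model only fills in the one label that needs judgment.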
The problem with current frameworks
Current LLM agent frameworks place the LLM as the central scheduler. This creates three fundamental problems:
- Unpredictable execution — the LLM may skip, repeat, or invent steps regardless of defined workflows
- Context explosion — each tool-call round-trip accumulates history
- No output guarantees — the LLM interprets instructions rather than executing them
The core issue: the LLM controls the flow, but nothing enforces it. Skills, prompts, and system messages are suggestions, not guarantees.
| | Tool-Calling / MCP | Agentic Programming |
|---|---|---|
| Who schedules? | LLM decides | Python decides |
| Functions contain | Code only | Code + LLM reasoning |
| Context | Flat conversation | Structured tree |
| Prompt | Hidden in agent config | Docstring = prompt |
| Self-improvement | Not built-in | create → fix → evolve |
MCP is the transport. Agentic Programming is the execution model. They're orthogonal.
Every @agentic_function call creates a Context node. Nodes form a tree that is automatically injected into LLM calls:
login_flow ✓ 8.8s
├── observe ✓ 3.1s → "found login form at (200, 300)"
├── click ✓ 2.5s → "clicked login button"
└── verify ✓ 3.2s → "dashboard confirmed"
When verify calls the LLM, it automatically sees what observe and click returned. No manual context management.
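A toy model of that tree makes the injection rule concrete. The `Node` class, `render`, and `visible_to` below are hypothetical names, not OpenProgram's Context API, and the rule "a call sees the outputs of its earlier siblings" is a simplification of whatever the real context model computes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Context node: one per function call."""
    name: str
    seconds: float
    output: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def render(node: Node) -> str:
    """Render a parent and its direct children in the tree layout shown above."""
    lines = [f"{node.name} ✓ {node.seconds}s"]
    for i, child in enumerate(node.children):
        branch = "└──" if i == len(node.children) - 1 else "├──"
        lines.append(f'{branch} {child.name} ✓ {child.seconds}s → "{child.output}"')
    return "\n".join(lines)


def visible_to(parent: Node, child_name: str) -> List[str]:
    """Outputs of earlier siblings: what a later call would receive as context."""
    seen = []
    for child in parent.children:
        if child.name == child_name:
            break
        seen.append(f"{child.name}: {child.output}")
    return seen


flow = Node("login_flow", 8.8, children=[
    Node("observe", 3.1, "found login form at (200, 300)"),
    Node("click", 2.5, "clicked login button"),
    Node("verify", 3.2, "dashboard confirmed"),
])
```

Here `visible_to(flow, "verify")` yields the `observe` and `click` results, which is the context the `verify` step would reason over.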
For complex tasks that demand sustained effort and high standards, deep_work runs an autonomous plan-execute-evaluate loop until the result meets the specified quality level:
from openprogram.functions.agentics.deep_work import deep_work
from openprogram.providers.registry import create_runtime

runtime = create_runtime()  # auto-detects a provider
result = deep_work(
    task="Write a survey on context management in LLM agents.",
    level="phd",  # high_school → bachelor → master → phd → professor
    runtime=runtime,
)
The agent clarifies requirements upfront, then works fully autonomously: executing, self-evaluating, and revising until the output passes quality review. State is persisted to disk, so interrupted work resumes where it left off.
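The revise-until-it-passes loop with on-disk checkpoints can be sketched in plain Python. This is not the `deep_work` implementation: `work_loop`, the JSON state layout, and the toy `execute` / `evaluate` callables are assumptions, and the planning step is left out for brevity.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def work_loop(
    state_path: Path,
    execute: Callable[[str, int], str],
    evaluate: Callable[[str], float],
    task: str,
    threshold: float,
    max_rounds: int = 5,
) -> dict:
    """Revise a draft until evaluate() reaches threshold, checkpointing each round."""
    # Resume from disk if an earlier run was interrupted.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"task": task, "round": 0, "draft": "", "score": 0.0}

    while state["score"] < threshold and state["round"] < max_rounds:
        state["round"] += 1
        state["draft"] = execute(task, state["round"])   # produce or revise
        state["score"] = evaluate(state["draft"])        # self-evaluation
        state_path.write_text(json.dumps(state))         # checkpoint
    return state


# Toy stand-ins: each round adds one section; quality grows with length.
def toy_execute(task: str, round_no: int) -> str:
    return " ".join(f"section{i}" for i in range(1, round_no + 1))


def toy_evaluate(draft: str) -> float:
    return len(draft.split()) / 3


with tempfile.TemporaryDirectory() as tmp:
    result = work_loop(Path(tmp) / "state.json", toy_execute, toy_evaluate,
                       task="survey", threshold=1.0)
```

Because the state is written after every round, a second call with the same `state_path` picks up at the recorded round instead of starting over.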
Writing, fixing and scaffolding @agentic_functions is itself agent work — done with ordinary file-editing tools, guided by the agentic-programming skill (skills/agentic-programming/SKILL.md). There are no dedicated create() / fix() framework calls: they only ever wrapped one LLM call plus a file write, which an agent does directly.
The skill is the complete spec — where the file goes, the decorator's metadata, the docstring vs content split, a rule-based validation checklist, and a smoke test. An agent reads it, writes the function, validates it, runs it; the write → run → fail → fix cycle still means programs improve through use.
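A rule-based check of this kind can be written with Python's `ast` module. The two rules below (at least one `@agentic_function`, and a docstring on each decorated function) are invented for illustration; the actual checklist is defined in SKILL.md and may differ, and `validate_source` is a hypothetical helper.

```python
import ast
from typing import List


def _decorator_name(node: ast.expr) -> str:
    """Handle both @agentic_function and @agentic_function(...)."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return ""


def validate_source(source: str) -> List[str]:
    """Return a list of rule violations; an empty list means the checks passed."""
    problems = []
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    decorated = [
        fn for fn in functions
        if any(_decorator_name(d) == "agentic_function" for d in fn.decorator_list)
    ]
    if not decorated:
        problems.append("no @agentic_function found")
    for fn in decorated:
        if not ast.get_docstring(fn):
            problems.append(f"{fn.name}: missing docstring")
    return problems
```

Since the check only parses the source, it can run before the function is ever imported or executed, which fits the validate step of the write → run → fail → fix cycle.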
Session history is stored like a git repository, not a flat list. Every exchange is a commit, branches are first-class, and the right sidebar exposes the usual git operations:
- Branch off any past exchange to explore an alternative without losing the original thread
- Attach context from another session (cross-session reuse) as a labelled user message
- Merge two threads when their branches converge
- Cherry-pick specific commits across branches
Branches that touch files run in isolated git worktrees under the hood, so two concurrent agents on different branches can't fight over the same source tree. Other frameworks fork conversations by copying messages; we fork the underlying repo.
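The commit-and-branch model behind these operations is small enough to sketch in memory. The `History` class below is a toy with hypothetical names; it does not touch git or worktrees and leaves out merge, but it shows why branching from a past exchange leaves the original thread intact.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    """One exchange; parent links form the history graph."""
    id: int
    message: str
    parent: Optional[int]


class History:
    def __init__(self) -> None:
        self.commits: Dict[int, Commit] = {}
        self.branches: Dict[str, Optional[int]] = {"main": None}

    def commit(self, branch: str, message: str) -> int:
        """Append an exchange to the tip of a branch."""
        new_id = len(self.commits) + 1
        self.commits[new_id] = Commit(new_id, message, self.branches[branch])
        self.branches[branch] = new_id
        return new_id

    def branch(self, name: str, at: int) -> None:
        """Start a new branch at any past commit; the original is untouched."""
        self.branches[name] = at

    def cherry_pick(self, commit_id: int, onto: str) -> int:
        """Replay one commit's content on top of another branch."""
        return self.commit(onto, self.commits[commit_id].message)

    def log(self, branch: str) -> List[str]:
        """Messages from the root to the branch tip."""
        out, cursor = [], self.branches[branch]
        while cursor is not None:
            out.append(self.commits[cursor].message)
            cursor = self.commits[cursor].parent
        return out[::-1]


h = History()
first = h.commit("main", "ask: plan the migration")
h.commit("main", "answer: use approach A")
h.branch("alt", at=first)
picked = h.commit("alt", "answer: use approach B")
h.cherry_pick(picked, onto="main")
```

After the cherry-pick, `main` carries both answers while `alt` still holds only the alternative, since each branch is just a pointer into the shared commit graph.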
Memory isn't a single bag. Six separate stores under ~/.agentic/memory/ cover different timescales and purposes:
| Layer | What goes there |
|---|---|
| journal | Short-term: recent observations, raw notes |
| wiki | Durable: facts the agent decided to keep around |
| sleep | Periodic consolidation (offline daemon merges journal → wiki) |
| scheduler | Cron-driven recalls that surface a memory at a specific time |
| recall_counts | Hit counts that boost frequently-used memories |
| store | Project-scoped key/value |
Open /memory to inspect or hand-edit any layer; the agent decides which layer to write to based on what it learned. The split exists because "remember this until I tell you to forget" and "remember this for the next 10 turns" want different storage strategies.
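How short-term and durable layers might interact can be sketched with plain lists and a counter. Everything here is an assumption made for illustration: the promotion rule (a journal note moves to the wiki once it has been recalled `min_hits` times), the function names, and the in-memory data shapes are not taken from OpenProgram's memory code.

```python
from collections import Counter
from typing import List, Tuple


def consolidate(journal: List[str], wiki: List[str],
                recall_counts: Counter, min_hits: int = 2) -> Tuple[List[str], List[str]]:
    """Move journal notes recalled at least min_hits times into the wiki.

    Returns (remaining_journal, updated_wiki); inputs are not mutated.
    """
    keep, promoted = [], list(wiki)
    for note in journal:
        if recall_counts[note] >= min_hits:
            if note not in promoted:
                promoted.append(note)   # durable from now on
        else:
            keep.append(note)           # stays short-term
    return keep, promoted


def rank(memories: List[str], recall_counts: Counter) -> List[str]:
    """Order memories so frequently recalled ones surface first."""
    return sorted(memories, key=lambda m: -recall_counts[m])


counts = Counter({"uses pnpm": 3, "likes tabs": 1})
journal = ["uses pnpm", "likes tabs"]
wiki = ["repo is a monorepo"]
remaining, updated_wiki = consolidate(journal, wiki, counts)
```

Keeping the counts in a separate structure means ranking and promotion can both change without rewriting the stored notes themselves.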
| Import | What it does |
|---|---|
| from openprogram import agentic_function | Decorator. Records each call as a node in the session DAG |
| from openprogram.agentic_programming.runtime import Runtime | LLM runtime. exec() calls the LLM with DAG-derived context |
| from openprogram.providers.registry import create_runtime | Create a Runtime with auto-detection or explicit provider (create_runtime() checks API keys and CLIs in priority order) |
There are no create() / fix() meta-functions — writing, editing and
validating @agentic_functions is done directly with ordinary
file-editing tools, guided by the agentic-programming skill
(skills/agentic-programming/SKILL.md). That skill is the complete spec:
file layout, the decorator's metadata, the docstring vs content split,
and a rule-based validation checklist.
| Import | What it does |
|---|---|
| from openprogram.functions.agentics.deep_work import deep_work | Autonomous plan-execute-evaluate loop with quality levels |
| from openprogram.functions.agentics.ask_user import ask_user | Ask the user a clarifying question and block until an answer arrives |
Six built-in providers: Anthropic, OpenAI, Gemini (API), Claude Code, Codex, Gemini (CLI). All CLI providers maintain session continuity across calls. See Provider docs for details.
- agentic_function: decorator behavior, DAG node recording, the docstring / content split
- Runtime: exec(), retries, response formats, provider wiring
- Providers: built-in runtimes, detection order, CLI vs API tradeoffs
| Guide | Description |
|---|---|
| Getting Started | 3-minute setup and runnable examples |
| Claude Code | Use without API key via Claude Code CLI |
| OpenClaw | Use as OpenClaw skill |
| API Reference | Full API documentation |
Project Structure
openprogram/
├── __init__.py # agentic_function re-export
├── cli.py # `openprogram` command entry point
├── agentic_programming/ # engine — paradigm-essential primitives
│ ├── function.py # @agentic_function decorator
│ ├── runtime.py # Runtime (exec + retry + DAG context)
│ ├── session.py # session lifecycle
│ └── skills.py # SKILL.md discovery
├── context/ # flat-DAG context model — nodes, storage, render, compute_reads
├── providers/ # Anthropic, OpenAI, Gemini, Claude Code, Codex, Gemini CLI
├── functions/
│ ├── _registry.py # unified registry for tools + agentic functions
│ ├── tools/ # @function leaves — bash, read, edit, grep, semble_search, web_search, …
│ └── agentics/ # @agentic_function modules (each its own dir, code in __init__.py)
│ ├── ask_user/ # ask the user a clarifying question
│ ├── deep_work/ # autonomous plan-execute-evaluate loop
│ ├── extract_pdf_figures/ # PDF figure extraction
│ ├── … # other agentics …
│ ├── GUI-Agent-Harness/ # GUI agent (separate repo, symlink)
│ ├── Research-Agent-Harness/ # Research agent (separate repo, symlink)
│ └── Wiki-Agent-Harness/ # Wiki agent (separate repo, symlink)
└── webui/ # `openprogram web` — browser UI
skills/ # SKILL.md files for agent integration
examples/ # runnable demos
tests/ # pytest suite
This is a paradigm proposal with a reference implementation. We welcome discussions, alternative implementations in other languages, use cases that validate or challenge the approach, and bug reports.
See CONTRIBUTING.md for details.
OpenProgram stands on shoulders. The tool framework, provider abstraction, and several tool implementations were ported or adapted from the projects below — each under its own license. Enormous thanks to their authors.
- OpenClaw (MIT): layout of the tool registry (name / description / parameters / execute), provider abstraction with check_fn + requires_env gating, TOOLSETS presets, skill loading via SKILL.md frontmatter + late-bound read. Our full clone lives under references/openclaw/ (gitignored) for browsing.
- hermes-agent (MIT): starting point for execute_code (we trimmed the Docker / Modal layers), mixture_of_agents, and the general shape of the multi-provider web_search / image_generate / image_analyze tools.
- pi-coding-agent (MIT): via OpenClaw's import, the canonical AgentSkill shape (<available_skills> XML formatter, name / description / location).
- Claude Code: overall ergonomics of the DEFAULT_TOOLS set (bash + read / write / edit + glob / grep / list + apply_patch + todo_read / todo_write) and the todo tool's JSON schema.
- Anthropic / OpenAI / Google SDKs: provider HTTP contracts; our providers call the raw HTTP APIs to keep SDK dependencies optional.
Individual tool files call out their direct inspirations in file-level docstrings where the lineage is more specific.
MIT


