Releases: BackendStack21/odek
v1.0.0
odek v1.0.0 — First Stable Release
Minimal Go autonomous agent runtime — 385 commits, 191 releases, one binary.
What is odek?
odek is a runtime, not a framework. It's the smallest possible surface area between an LLM and your tools: a single ~12 MB static binary, zero frameworks (stdlib + 2 packages), instant startup.
At its core is a ReAct loop (Reasoning + Acting): observe → think → act → repeat. The LLM reasons about the current state, decides what to do, and odek executes those actions — in parallel when possible, with systematic recovery when things fail.
$ go install github.com/BackendStack21/odek/cmd/odek@v1.0.0
$ export ODEK_API_KEY=sk-...
$ odek run "Run the tests and fix any failures"
The Journey to 1.0
| Milestone | What shipped |
|---|---|
| v0.1.0 – v0.40.0 | Core loop, tool registry, CLI, REPL, browser, file tools, MCP server, Docker sandbox |
| v0.41.0 – v0.52.0 | Systematic tool-failure recovery, persistent memory (facts + episodes), Telegram bot, parallel tool execution, batch approval gate, Web UI, session resolver, security hardening |
| v0.53.0 – v0.55.0 | Context-limit protection (trimToSurvival), sub-agent delegation, skill auto-learning, bypass-resistant danger classifier |
| v0.56.0 – v0.58.0 | Async post-processing (no more hang), semantic session search, artifact-aware file search, MCP client, episode + skill provenance gating, FD-based API key handoff |
| v1.0.0 | Audit system with divergence heuristic, untrusted-content wrapper with per-call nonce, approver friction mode, sub-agent risk caps, UI refactor — stability and security complete |
385 commits. 191 tagged releases. One binary. We shipped fast, fixed fast, and never let a regression survive longer than a release.
Architecture at a Glance
CLI / REPL / Web UI / Telegram bot
                │
         ┌──────▼──────┐
         │ ReAct Loop  │  observe → think → parallel-act → repeat
         │ (300 iter)  │
         └──────┬──────┘
                │
       ┌────────┼────────┐
       ▼        ▼        ▼
     Tools   Memory  Sub-agents
     (25+)   (3-tier) (up to 8)
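The loop in the diagram can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not odek's actual code: the `Action`, `Step`, `Model`, `ToolRunner`, and `RunLoop` names are hypothetical, and only the observe, think, act cycle and the 300-iteration cap come from the notes above.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is a hypothetical tool call requested by the model.
type Action struct {
	Tool  string
	Input string
}

// Step is what the model returns for one turn: either a final answer
// or a list of actions to execute.
type Step struct {
	Final   string
	Actions []Action
}

// Model and ToolRunner are stand-ins for the LLM client and the tool registry.
type Model func(history []string) Step
type ToolRunner func(a Action) (string, error)

// ErrMaxIterations is returned when the loop hits its iteration cap.
var ErrMaxIterations = errors.New("iteration limit reached")

// RunLoop repeats observe -> think -> act until the model produces a final
// answer or maxIter turns have elapsed.
func RunLoop(task string, think Model, run ToolRunner, maxIter int) (string, error) {
	history := []string{"user: " + task}
	for i := 0; i < maxIter; i++ {
		step := think(history) // think: reason over everything observed so far
		if step.Final != "" {
			return step.Final, nil
		}
		for _, a := range step.Actions { // act: execute each requested tool call
			out, err := run(a)
			if err != nil {
				// A failed tool becomes an observation instead of aborting the loop.
				out = "error: " + err.Error()
			}
			history = append(history, fmt.Sprintf("tool %s: %s", a.Tool, out)) // observe
		}
	}
	return "", ErrMaxIterations
}

func main() {
	// Scripted model: first asks for a shell call, then finishes.
	think := func(history []string) Step {
		if len(history) == 1 {
			return Step{Actions: []Action{{Tool: "shell", Input: "go test ./..."}}}
		}
		return Step{Final: "done after " + history[len(history)-1]}
	}
	run := func(a Action) (string, error) { return "ok", nil }

	answer, err := RunLoop("run the tests", think, run, 300)
	fmt.Println(answer, err)
}
```

The sketch runs actions sequentially to keep the control flow visible; the real runtime is described as running independent calls in parallel.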
Core Engine
- Parallel tool execution — independent tool calls run concurrently (default: 4, configurable)
- Batch approval gate — multiple risky tools shown in a single prompt, reducing fatigue
- Context-limit protection — trimToSurvival drops oldest messages when approaching the model's context window, keeping the agent functional under extended sessions
- Tool-failure recovery — systematic recovery: retry transient errors, skip permanently failed tools, continue without crashing
- Async post-processing — skill learning and episode extraction run in background goroutines (eliminated the 2-5 second hang after every run)
- Interaction modes — engaging (narrated), enhance (persistent), verbose (raw), off
Security — 12-Layer Defense
odek is an LLM agent that executes shell commands, reads/writes files, fetches URLs, and spawns sub-agents. That capability is the point. It's also the security problem. v1.0.0 ships layered defenses against prompt injection and approval fatigue:
| # | Layer | What it does |
|---|---|---|
| 1 | Sandboxed execution | Isolated Docker container per session — no network, no host mounts beyond cwd, zero capabilities, destroyed on exit. odek serve enables it by default. |
| 2 | Untrusted-content wrapper | Every tool output from outside the trust boundary (browser, shell, read_file, MCP tools, transcribe) is wrapped in <untrusted_content_<nonce>>. Per-call nonce defeats wrapper-escape attacks. |
| 3 | Audit log + divergence heuristic | Every ingest is recorded with source + content-hash + turn. After each turn, a heuristic flags suspicious_divergence when the agent references resources the user didn't mention. Inspect with odek audit <session-id>. |
| 4 | Tainted memory episodes | Episodes from sessions that ingested untrusted content are stored but never auto-replayed. Search() filters them out. |
| 5 | Skill provenance gate | Skills auto-learned from untrusted contexts are pinned to Lazy (never auto-load). odek skill promote clears the flag after user review. |
| 6 | Sub-agent risk caps | delegate_tasks carries trust_level + max_risk. Untrusted → all dangerous actions forced to Deny. max_risk → everything above cap Deny. |
| 7 | FD-based API key handoff | Parent writes key to a 0600 tempfile, immediately unlink()s, passes the FD via cmd.ExtraFiles. Key never in /proc/<pid>/environ. |
| 8 | Bypass-resistant classifier | normalize() expands $IFS, extracts $() and backtick substitutions, strips command/exec/builtin wrappers, collapses unquoted backslashes, basenames absolute paths. |
| 9 | Approver friction mode | After 3 approvals of the same class in 60 seconds: requires typing literal approve, enforces 1.5s pause. The shortcut stays disabled for destructive + blocked classes regardless. |
| 10 | WS Origin allowlist | Rejects non-localhost WebSocket upgrades. Closes CSRF-on-localhost. |
| 11 | Secret redaction | 20+ patterns: OpenAI, Anthropic, GitHub PAT, AWS, PEM, JWT, Vault, Google OAuth, SendGrid, Discord, DB URLs. |
| 12 | Regression bar | Every documented mitigation has a corresponding test in security_report_validation_test.go. |
Full threat model: docs/SECURITY.md
What's in the Binary
25+ Built-in Tools (zero subprocess forks)
read_file, write_file, search_files, patch, batch_read, batch_patch, glob, file_info, shell, parallel_shell, browser, http_batch, math_eval, diff, count_lines, multi_grep, json_query, tree, checksum, sort, head_tail, base64, tr, word_count, transcribe, delegate_tasks, session_search
Persistent Memory — 3 Tiers
- Facts — agent-managed durable key-value entries
- Session buffer — auto-appended turn summaries
- Episodes — LLM-extracted knowledge from past sessions. Merge-on-write via go-vector RandomProjections (cosine >0.7 auto-merges, <0.3 auto-adds). Saves ~80% of LLM calls.
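The merge-on-write thresholds can be illustrated with plain cosine similarity. This sketch does not use the go-vector API; `Cosine`, `Decide`, and the `Decision` values are hypothetical, and the handling of the band between 0.3 and 0.7 (deferring to the LLM) is an assumption inferred from the "saves LLM calls" remark, not something the notes state.

```go
package main

import (
	"fmt"
	"math"
)

// Decision is the outcome of comparing a new episode with its nearest neighbour.
type Decision string

const (
	Merge  Decision = "merge"   // cosine > 0.7: fold into the existing episode
	Add    Decision = "add"     // cosine < 0.3: store as a new episode
	AskLLM Decision = "ask-llm" // assumed handling for everything in between
)

// Cosine returns the cosine similarity of a and b, or 0 if either vector is
// empty, all zeros, or the lengths differ.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Decide applies the two thresholds quoted in the release notes.
func Decide(similarity float64) Decision {
	switch {
	case similarity > 0.7:
		return Merge
	case similarity < 0.3:
		return Add
	default:
		return AskLLM
	}
}

func main() {
	stored := []float64{1, 0}
	for _, candidate := range [][]float64{{1, 0}, {1, 2}, {0, 1}} {
		s := Cosine(stored, candidate)
		fmt.Printf("similarity %.2f -> %s\n", s, Decide(s))
	}
}
```

Under this reading, only the ambiguous middle band costs a model call, which would be consistent with the reported reduction.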
Skill System (on by default)
Skill-matched SKILL.md files load on-demand. Auto-learns patterns from every session — detects multi-step procedures, error recoveries, repeated actions, and user corrections. LLM-enriched with names, descriptions, triggers, and structured bodies. Import from any URI with automatic LLM risk assessment.
Sub-Agent Delegation
Parallel OS-process sub-agents via delegate_tasks. True isolation — each sub-agent is a fresh odek subagent process with its own config, tools, and timeout. Up to 8 concurrent workers. Risk-based trust caps.
MCP — Model Context Protocol
Full server implementation (stdio + SSE transport) and client (connect to external MCP servers). Tools are discovered and usable within the agent loop.
Platform Support
CLI, REPL (with raw-mode terminal editor), Web UI (HTTP + WebSocket), Telegram bot — all from one binary.
Performance
| Metric | Value |
|---|---|
| Binary size | ~12 MB (static) |
| Startup time | Instant (< 50ms) |
| Dependencies | 5 packages (3 stdlib + 2 focused) |
| Benchmark | AIEB v2.0 — 80.3% (highest published agent score) |
| Test coverage | 200+ unit + E2E tests across all tools |
Breaking Changes from v0.x
None. v1.0.0 is backwards-compatible with all v0.58.x configurations and workflows. The 1.0 designation marks stability, not a rewrite.
Upgrade
go install github.com/BackendStack21/odek/cmd/odek@v1.0.0
odek --version # → odek v1.0.0
What's Next
1.0 means the core is stable. Upcoming:
- Streaming tool output — real-time shell and browser output in the terminal
- Multi-model routing — route different workloads to different LLMs automatically
- Remote sandbox — execute in cloud VMs, not just local Docker
- Plugin system — load external tools as shared libraries
385 commits. 191 releases. 1 binary. Let's build.
Full Changelog: v0.58.8...v1.0.0
v0.58.8 — Archive sessions on /new, fix deepsearch test
Features
- archive sessions on /new instead of deleting + fix deepsearch test
Documentation
- reverse CHANGELOG order to newest-first
- regenerate full CHANGELOG.md from git history via generate-changelog.sh
- deprecate manual changelog edits — point to generate-changelog.sh
Infrastructure
- add generate-changelog.sh — conventional-commit changelog generator
Full Changelog: v0.58.7...v0.58.8
v0.58.7 — Dynamic release badge on landing page
Changes
🌐 Landing Page
- Replaced the hardcoded v0.48.0 version badge in the hero section with a dynamic Shields.io badge linked to GitHub Releases
- Zero JavaScript — the badge auto-updates via CDN-cached metadata from the latest release tag
- Clicking the badge now takes you straight to the releases page
Full Changelog: v0.58.6...v0.58.7
v0.58.6 — session recall edge-case tests
Tests
6 new edge-case tests for the session recall pipeline
TestSessionSearch_DeepSearchTwoTokens — Verifies the v0.58.4 fix: a session where only "changes" appears in message content does NOT match query "go-vector changes". A session with both "go-vector" AND "changes" DOES match. Prevents the false positive that plagued the events fetcher analysis session.
TestSessionSearch_GetReturnsSessionMessages — Verifies the v0.58.3 fix: get returns the full session_messages array with correct role and content for every user/assistant message. System messages are excluded.
TestSessionSearch_PreSavePersistence — Verifies the v0.58.5 fix: a session saved to the Store is immediately findable by session_search. This simulates the pre-agent-loop save that ensures the current turn's data is visible to search tools inside the ReAct loop.
TestSessionSearch_DeepSearchEdgeCases — Three sub-tests:
- Empty messages don't cause panics
- System-only messages are excluded from deepSearch matching
- Unicode content with two matching tokens works correctly
Full Changelog: v0.58.5...v0.58.6
v0.58.5 — save user message before agent loop
Fixes
Telegram bot: save user message before agent loop
session_search inside the agent loop could never find the current turn's data. The user message was appended to an in-memory slice (line 998 of telegram.go) but only persisted to disk AFTER RunWithMessages completed (line 1534).
The entire ReAct loop ran with the current turn's messages invisible to both:
- Vector search (Phase 1) — the index didn't have the current content
- Deep search (Phase 2) — Store.Load() read a stale file from disk
Now the user message is saved to the Store immediately after being appended, using a direct Store.Save() call that bypasses the TurnCount increment. The normal end-of-turn save at line 1534 still runs and overwrites with the final state (including tool results and bot responses).
This ensures that any session_search call inside the agent loop can find the current turn's conversation content on disk and in the vector index.
Full Changelog: v0.58.4...v0.58.5
v0.58.4 — deepSearch requires 2+ distinct token matches
Fixes
session_search no longer matches on a single common word
Query "go-vector changes" was matching the events fetcher analysis session because "changes" appeared once in a message like "Events changed: +8 -8 = 10 total". deepSearch accepted any single token match across 100+ messages.
Now deepSearch tracks distinct matched tokens and requires at least 2 (or all for single-token queries). A single common word like "changes", "release", or "update" can no longer qualify an unrelated session.
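The distinct-token rule can be sketched as follows. This is an illustration only: `Matches` and `tokens` are hypothetical names, and whitespace tokenisation with case-insensitive substring matching is an assumption about how tokens are compared.

```go
package main

import (
	"fmt"
	"strings"
)

// tokens lowercases s and splits it on whitespace, dropping duplicates.
func tokens(s string) []string {
	seen := map[string]bool{}
	var out []string
	for _, t := range strings.Fields(strings.ToLower(s)) {
		if !seen[t] {
			seen[t] = true
			out = append(out, t)
		}
	}
	return out
}

// Matches reports whether the messages contain enough distinct query tokens:
// at least 2, or all of them when the query has a single token.
func Matches(query string, messages []string) bool {
	q := tokens(query)
	if len(q) == 0 {
		return false
	}
	need := 2
	if len(q) < need {
		need = len(q)
	}
	matched := 0
	for _, tok := range q {
		for _, m := range messages {
			if strings.Contains(strings.ToLower(m), tok) {
				matched++ // count each query token at most once
				break
			}
		}
	}
	return matched >= need
}

func main() {
	unrelated := []string{"Summary of changes to the events fetcher"}
	related := []string{"Reviewed go-vector", "Listed the changes since last release"}
	fmt.Println(Matches("go-vector changes", unrelated))
	fmt.Println(Matches("go-vector changes", related))
}
```

Counting each query token once, no matter how many messages contain it, is what stops a single common word from qualifying a session on its own.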
Tool description updated
Added guidance telling the LLM to use get after search to read the actual conversation content. In v0.58.3, get was updated to return full session_messages but the LLM didn't know to use it.
Full Changelog: v0.58.3...v0.58.4
v0.58.3 — session message content + recursive glob
Fixes
session_search get now returns actual message content
Previously get only returned message count + buffer summaries. The LLM couldn't read what was actually said in past sessions — it only saw 2-line buffer snippets. Now session_messages includes every user and assistant message with role + content, so the bot can directly read and understand past conversations.
glob tool now supports ** recursive patterns
Go's filepath.Match and filepath.Glob don't support ** (globstar): they treat ** like a single *, which never crosses a path separator. When the bot called glob {"pattern":"**/*.json","path":"..."}, it got {"matches":null} every time. The pattern **/*.json was silently failing because filepath.Match("**/*.json", path) cannot match files nested more than one directory deep.
Now ** patterns are detected and converted to equivalent regex (e.g. **/*.json -> ^.*/[^/]*\.json$), so recursive globs actually work.
Full Changelog: v0.58.2...v0.58.3
v0.58.2 — stale vector cleanup on prune
Fixes
/prune no longer leaves orphaned vectors
The primary path of Store.Cleanup() (with index) bypassed Store.Delete() and directly removed session files + index entries, but never called Vec.Remove(). Every /prune command left stale vectors in vectors.gob.
The fallback path (no index) was correct — it used Store.Delete() which includes Vec.Remove().
Now the index-based path also calls Vec.Remove(id) alongside file removal.
Impact: No data corruption (stale vectors are skipped during search since Load() returns nil for deleted files, and the threshold filter vr.Score < 0.40 drops them). But the store accumulated uncompacted garbage that would never be cleaned up.
Full Changelog: v0.58.1...v0.58.2
v0.58.1 — session_search false-positive fix
Fixes
session_search no longer returns garbage results
Problem: Two bugs caused handleSearch to return "say hello" sessions as false positives:
- Vector score threshold too low (0.05): Random Projections (bag-of-words) matches generic tech queries against "say hello" sessions at 0.30-0.36. Querying "odek project molty agent skill" returned irrelevant hello sessions with scores above the old threshold.
- Deep search pool too narrow (20 sessions): Keyword fallback only searched the 20 most recent sessions. With 115+ "say hello" heartbeat tests occupying the recent list, substantive older sessions were never reached.
Fix:
- Raised vector score threshold to 0.40 — only strong matches pass Phase 1
- Changed deep search to List(0) — scans ALL sessions, not just the 20 most recent
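The two fixes combine roughly as in the sketch below. The names `VectorResult`, `FilterStrong`, and `Search` are hypothetical, and running the keyword scan only when Phase 1 returns nothing is an assumption about how the fallback is triggered; the 0.40 threshold and the unbounded session scan are taken from the notes.

```go
package main

import "fmt"

// VectorResult is a hypothetical Phase 1 hit.
type VectorResult struct {
	ID    string
	Score float64
}

// MinScore is the threshold quoted in the release notes.
const MinScore = 0.40

// FilterStrong drops every result scoring below MinScore.
func FilterStrong(results []VectorResult) []VectorResult {
	var strong []VectorResult
	for _, vr := range results {
		if vr.Score < MinScore {
			continue
		}
		strong = append(strong, vr)
	}
	return strong
}

// Search returns strong vector hits, or falls back to a keyword scan over
// every known session when Phase 1 yields nothing (assumed trigger).
func Search(vector []VectorResult, allSessions []string, keywordMatch func(id string) bool) []string {
	var ids []string
	for _, vr := range FilterStrong(vector) {
		ids = append(ids, vr.ID)
	}
	if len(ids) > 0 {
		return ids
	}
	for _, id := range allSessions { // full scan, no "20 most recent" cap
		if keywordMatch(id) {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	weak := []VectorResult{{ID: "hello-1", Score: 0.33}, {ID: "hello-2", Score: 0.36}}
	all := []string{"hello-1", "hello-2", "odek-skill-design"}
	match := func(id string) bool { return id == "odek-skill-design" }
	fmt.Println(Search(weak, all, match))
}
```

With the old 0.05 threshold both hello sessions would have passed Phase 1; at 0.40 they are dropped and the older, relevant session is reached by the full scan.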
Full Changelog: v0.58.0...v0.58.1
v0.58.0
Full Changelog: v0.57.0...v0.58.0