Skip to content

Releases: BackendStack21/odek

v1.0.0

29 May 19:38

Choose a tag to compare

odek v1.0.0 — First Stable Release

Minimal Go autonomous agent runtime — 385 commits, 191 releases, one binary.


What is odek?

odek is a runtime, not a framework. It's the smallest possible surface area between an LLM and your tools: a single ~12 MB static binary, zero frameworks (stdlib + 2 packages), instant startup.

At its core is a ReAct loop (Reasoning + Acting): observe → think → act → repeat. The LLM reasons about the current state, decides what to do, and odek executes those actions — in parallel when possible, with systematic recovery when things fail.

$ go install github.com/BackendStack21/odek/cmd/odek@v1.0.0
$ export ODEK_API_KEY=sk-...
$ odek run "Run the tests and fix any failures"

The Journey to 1.0

Milestone What shipped
v0.1.0 – v0.40.0 Core loop, tool registry, CLI, REPL, browser, file tools, MCP server, Docker sandbox
v0.41.0 – v0.52.0 Systematic tool-failure recovery, persistent memory (facts + episodes), Telegram bot, parallel tool execution, batch approval gate, Web UI, session resolver, security hardening
v0.53.0 – v0.55.0 Context-limit protection (trimToSurvival), sub-agent delegation, skill auto-learning, bypass-resistant danger classifier
v0.56.0 – v0.58.0 Async post-processing (no more hang), semantic session search, artifact-aware file search, MCP client, episode + skill provenance gating, FD-based API key handoff
v1.0.0 Audit system with divergence heuristic, untrusted-content wrapper with per-call nonce, approver friction mode, sub-agent risk caps, UI refactor — stability and security complete

385 commits. 191 tagged releases. One binary. We shipped fast, fixed fast, and never let a regression survive longer than a release.


Architecture at a Glance

CLI / REPL / Web UI / Telegram bot
            │
     ┌──────▼──────┐
     │  ReAct Loop  │  observe → think → parallel-act → repeat
     │  (300 iter)  │
     └──────┬──────┘
            │
   ┌────────┼────────┐
   ▼        ▼        ▼
Tools    Memory   Sub-agents
(25+)   (3-tier)  (up to 8)

Core Engine

  • Parallel tool execution — independent tool calls run concurrently (default: 4, configurable)
  • Batch approval gate — multiple risky tools shown in a single prompt, reducing fatigue
  • Context-limit protection — trimToSurvival drops oldest messages when approaching the model's context window, keeping the agent functional under extended sessions
  • Tool-failure recovery — systematic recovery: retry transient errors, skip permanently failed tools, continue without crashing
  • Async post-processing — skill learning and episode extraction run in background goroutines (eliminated the 2-5 second hang after every run)
  • Interaction modes — engaging (narrated), enhance (persistent), verbose (raw), off

Security — 12-Layer Defense

odek is an LLM agent that executes shell commands, reads/writes files, fetches URLs, and spawns sub-agents. That capability is the point. It's also the security problem. v1.0.0 ships layered defenses against prompt injection and approval fatigue:

# Layer What it does
1 Sandboxed execution Isolated Docker container per session — no network, no host mounts beyond cwd, zero capabilities, destroyed on exit. odek serve enables it by default.
2 Untrusted-content wrapper Every tool output from outside the trust boundary (browser, shell, read_file, MCP tools, transcribe) is wrapped in <untrusted_content_<nonce>>. Per-call nonce defeats wrapper-escape attacks.
3 Audit log + divergence heuristic Every ingest is recorded with source + content-hash + turn. After each turn, a heuristic flags suspicious_divergence when the agent references resources the user didn't mention. Inspect with odek audit <session-id>.
4 Tainted memory episodes Episodes from sessions that ingested untrusted content are stored but never auto-replayed. Search() filters them out.
5 Skill provenance gate Skills auto-learned from untrusted contexts are pinned to Lazy (never auto-load). odek skill promote clears the flag after user review.
6 Sub-agent risk caps delegate_tasks carries trust_level + max_risk. Untrusted → all dangerous actions forced to Deny. max_risk → everything above cap Deny.
7 FD-based API key handoff Parent writes key to a 0600 tempfile, immediately unlink()s, passes the FD via cmd.ExtraFiles. Key never in /proc/<pid>/environ.
8 Bypass-resistant classifier normalize() expands $IFS, extracts $() and backtick substitutions, strips command/exec/builtin wrappers, collapses unquoted backslashes, basenames absolute paths.
9 Approver friction mode After 3 approvals of the same class in 60 seconds: requires typing literal approve, enforces 1.5s pause. Disabled shortcut for destructive + blocked regardless.
10 WS Origin allowlist Rejects non-localhost WebSocket upgrades. Closes CSRF-on-localhost.
11 Secret redaction 20+ patterns: OpenAI, Anthropic, GitHub PAT, AWS, PEM, JWT, Vault, Google OAuth, SendGrid, Discord, DB URLs.
12 Regression bar Every documented mitigation has a corresponding test in security_report_validation_test.go.

Full threat model: docs/SECURITY.md


What's in the Binary

25+ Built-in Tools (zero subprocess forks)

read_file, write_file, search_files, patch, batch_read, batch_patch, glob, file_info, shell, parallel_shell, browser, http_batch, math_eval, diff, count_lines, multi_grep, json_query, tree, checksum, sort, head_tail, base64, tr, word_count, transcribe, delegate_tasks, session_search

Persistent Memory — 3 Tiers

  • Facts — agent-managed durable key-value entries
  • Session buffer — auto-appended turn summaries
  • Episodes — LLM-extracted knowledge from past sessions. Merge-on-write via go-vector RandomProjections (cosine >0.7 auto-merges, <0.3 auto-adds). Saves ~80% LLM calls.

Skill System (on by default)

Skill-matched SKILL.md files load on-demand. Auto-learns patterns from every session — detects multi-step procedures, error recoveries, repeated actions, and user corrections. LLM-enriched with names, descriptions, triggers, and structured bodies. Import from any URI with automatic LLM risk assessment.

Sub-Agent Delegation

Parallel OS-process sub-agents via delegate_tasks. True isolation — each sub-agent is a fresh odek subagent process with its own config, tools, and timeout. Up to 8 concurrent workers. Risk-based trust caps.

MCP — Model Context Protocol

Full server implementation (stdio + SSE transport) and client (connect to external MCP servers). Tools are discovered and usable within the agent loop.

Platform Support

CLI, REPL (with raw-mode terminal editor), Web UI (HTTP + WebSocket), Telegram bot — all from one binary.


Performance

Metric Value
Binary size ~12 MB (static)
Startup time Instant (< 50ms)
Dependencies 5 packages (3 stdlib + 2 focused)
Benchmark AIEB v2.0 — 80.3% (highest published agent score)
Test coverage 200+ unit + E2E tests across all tools

Breaking Changes from v0.x

None. v1.0.0 is backwards-compatible with all v0.58.x configurations and workflows. The 1.0 designation marks stability, not a rewrite.


Upgrade

go install github.com/BackendStack21/odek/cmd/odek@v1.0.0
odek --version  # → odek v1.0.0

What's Next

1.0 means the core is stable. Upcoming:

  • Streaming tool output — real-time shell and browser output in the terminal
  • Multi-model routing — route different workloads to different LLMs automatically
  • Remote sandbox — execute in cloud VMs, not just local Docker
  • Plugin system — load external tools as shared libraries

385 commits. 191 releases. 1 binary. Let's build.

Full Changelog: v0.58.8...v1.0.0

v0.58.8 — Archive sessions on /new, fix deepsearch test

26 May 07:09

Choose a tag to compare

Features

  • archive sessions on /new instead of deleting + fix deepsearch test

Documentation

  • reverse CHANGELOG order to newest-first
  • regenerate full CHANGELOG.md from git history via generate-changelog.sh
  • deprecate manual changelog edits — point to generate-changelog.sh

Infrastructure

  • add generate-changelog.sh — conventional-commit changelog generator

Full Changelog: v0.58.7...v0.58.8

v0.58.7 — Dynamic release badge on landing page

26 May 06:26

Choose a tag to compare

Changes

🌐 Landing Page

  • Replaced the hardcoded v0.48.0 version badge in the hero section with a dynamic Shields.io badge linked to GitHub Releases
  • Zero JavaScript — the badge auto-updates via CDN-cached metadata from the latest release tag
  • Clicking the badge now takes you straight to the releases page

Full Changelog: v0.58.6...v0.58.7

v0.58.6 — session recall edge-case tests

26 May 06:10

Choose a tag to compare

Tests

6 new edge-case tests for the session recall pipeline

TestSessionSearch_DeepSearchTwoTokens — Verifies the v0.58.4 fix: a session where only "changes" appears in message content does NOT match query "go-vector changes". A session with both "go-vector" AND "changes" DOES match. Prevents the false positive that plagued the events fetcher analysis session.

TestSessionSearch_GetReturnsSessionMessages — Verifies the v0.58.3 fix: get returns the full session_messages array with correct role and content for every user/assistant message. System messages are excluded.

TestSessionSearch_PreSavePersistence — Verifies the v0.58.5 fix: a session saved to the Store is immediately findable by session_search. This simulates the pre-agent-loop save that ensures the current turn's data is visible to search tools inside the ReAct loop.

TestSessionSearch_DeepSearchEdgeCases — Three sub-tests:

  • Empty messages don't cause panics
  • System-only messages are excluded from deepSearch matching
  • Unicode content with two matching tokens works correctly

Full Changelog: v0.58.5...v0.58.6

v0.58.5 — save user message before agent loop

26 May 06:07

Choose a tag to compare

Fixes

Telegram bot: save user message before agent loop

session_search inside the agent loop could never find the current turn's data. The user message was appended to an in-memory slice (line 998 of telegram.go) but only persisted to disk AFTER RunWithMessages completed (line 1534).

The entire ReAct loop ran with the current turn's messages invisible to both:

  • Vector search (Phase 1) — the index didn't have the current content
  • Deep search (Phase 2) — Store.Load() read a stale file from disk

Now the user message is saved to the Store immediately after being appended, using a direct Store.Save() call that bypasses the TurnCount increment. The normal end-of-turn save at line 1534 still runs and overwrites with the final state (including tool results and bot responses).

This ensures that any session_search call inside the agent loop can find the current turn's conversation content on disk and in the vector index.

Full Changelog: v0.58.4...v0.58.5

v0.58.4 — deepSearch requires 2+ distinct token matches

26 May 05:51

Choose a tag to compare

Fixes

session_search no longer matches on a single common word

Query "go-vector changes" was matching the events fetcher analysis session because "changes" appeared once in a message like "Events changed: +8 -8 = 10 total". deepSearch accepted any single token match across 100+ messages.

Now deepSearch tracks distinct matched tokens and requires at least 2 (or all for single-token queries). A single common word like "changes", "release", or "update" can no longer qualify an unrelated session.

Tool description updated

Added guidance telling the LLM to use get after search to read the actual conversation content. In v0.58.3, get was updated to return full session_messages but the LLM didn't know to use it.

Full Changelog: v0.58.3...v0.58.4

v0.58.3 — session message content + recursive glob

26 May 05:37

Choose a tag to compare

Fixes

session_search get now returns actual message content

Previously get only returned message count + buffer summaries. The LLM couldn't read what was actually said in past sessions — it only saw 2-line buffer snippets. Now session_messages includes every user and assistant message with role + content, so the bot can directly read and understand past conversations.

glob tool now supports ** recursive patterns

Go's filepath.Match and filepath.Glob don't support ** (globstar) — they treat ** as literal * characters. When the bot called glob {"pattern":"**/*.json","path":"..."}, it got {"matches":null} every time. The pattern **/*.json was silently failing because filepath.Match("**/*.json", path) never matches anything.

Now ** patterns are detected and converted to equivalent regex (e.g. **/*.json -> ^.*/[^/]*\.json$), so recursive globs actually work.

Full Changelog: v0.58.2...v0.58.3

v0.58.2 — stale vector cleanup on prune

26 May 05:24

Choose a tag to compare

Fixes

/prune no longer leaves orphaned vectors

Store.Cleanup() primary path (with index) bypassed Store.Delete() and directly removed session files + index entries — but never called Vec.Remove(). Every /prune command left stale vectors in vectors.gob.

The fallback path (no index) was correct — it used Store.Delete() which includes Vec.Remove().

Now the index-based path also calls Vec.Remove(id) alongside file removal.

Impact: No data corruption (stale vectors are skipped during search since Load() returns nil for deleted files, and the threshold filter vr.Score < 0.40 drops them). But the store accumulated uncompacted garbage that would never be cleaned up.

Full Changelog: v0.58.1...v0.58.2

v0.58.1 — session_search false-positive fix

26 May 05:19

Choose a tag to compare

Fixes

session_search no longer returns garbage results

Problem: Two bugs caused handleSearch to return "say hello" sessions as false positives:

  1. Vector score threshold too low (0.05): Random Projections (bag-of-words) matches generic tech queries against "say hello" sessions at 0.30-0.36. Querying "odek project molty agent skill" returned irrelevant hello sessions with scores above the old threshold.

  2. Deep search pool too narrow (20 sessions): Keyword fallback only searched the 20 most recent sessions. With 115+ "say hello" heartbeat tests occupying the recent list, substantive older sessions were never reached.

Fix:

  • Raised vector score threshold to 0.40 — only strong matches pass Phase 1
  • Changed deep search to List(0) — scans ALL sessions, not just the 20 most recent

Full Changelog: v0.58.0...v0.58.1

v0.58.0

26 May 04:53

Choose a tag to compare

Full Changelog: v0.57.0...v0.58.0