feat(hands): synchronous ephemeral hand-query primitive#1237
Open
pbranchu wants to merge 10 commits into
Open
Conversation
Lays the groundwork for per-user memory by giving every install a stable default-user UUID and tagging every session with an owning user. Sessions are now consistently user-scoped: - `Session::user_id: UserId` (required, not Option) — defaults to the kernel's persistent default user - `Session::parent_session_id: Option<SessionId>` — foundation for future tree-scoped cascade deletion of forked sessions (no producer yet) - `MessageSource` enum + optional `Message::source` — additive type that later PRs (structured extraction filtering) will read; no consumer here - `UserConfig::is_default: bool` — `[[users]]` blocks can attach display name and channel bindings to the persistent default identity Kernel boots the default user once and caches it process-wide: - `bootstrap_default_user` — load-or-generate the UUID from `kv_store[shared, "default_user_uuid"]`, install via `set_default_user_id`, then run a one-shot rewrite of legacy nil-UUID sessions, gated by the `default_user_bootstrap_done` sentinel - `resolve_user_id` (strict, HTTP boundary) — folds the deprecated "test" alias and the nil UUID to the default user with `warn!` logs so reserved-bucket abuse is auditable - `resolve_user_id_internal` (raw mapper) — preserves the pre-fix behaviour for in-process test callers - `AuthManager::new_with_default` — binds the `is_default = true` user (or the first user) to the persistent UUID Storage and migration: - Schema v9 adds `user_id` (NOT NULL, default nil UUID) and `parent_session_id` (nullable) to `sessions`, plus `(agent_id, user_id)` and `parent_session_id` indexes - `MemorySubstrate::rewrite_nil_user_sessions` — atomic transaction wrapping the legacy-bucket UPDATE; the kernel only sets the bootstrap-done sentinel after a clean rewrite, so a failure leaves the retry path intact Single-user installs see no behaviour change: everyone is the default user, one session per agent, same as today. Tests: - `MessageSource` deserialises cleanly from pre-field payloads (JSON + msgpack) and survives full round-trips - `UserConfig::is_default` defaults to `false` for existing configs - `AuthManager::new_with_default` honours `is_default`, falls back to first user, and is a no-op for empty configs - Migration v9 adds the columns and indexes, and a v8-built DB upgrades cleanly to v9 with pre-existing rows preserved - `default_user_id`/`test_user_id` are distinct - `create_session(agent, user)` round-trips through SQLite — both the default and an explicit user - `rewrite_nil_user_sessions` is idempotent, targeted, and atomic - Kernel: default-user UUID persists across kernel restarts; the strict filter folds `"test"` and nil to default while passing other UUIDs through; the internal mapper preserves the raw `"test"` alias Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Cargo update bumps lettre from 0.11.21 to 0.11.22 to clear RUSTSEC-2026-0141. Pulls in transitive dependency updates as a side effect (mostly windows-sys/socket2 version consolidation) — no API surface change in our own code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…trol API
Adds the storage layer, opt-in gate, and management/control HTTP API for
structured memory. No producer (extraction + dreamer) and no UI changes —
those land in follow-up PRs. Default agents see zero behavior change:
storage tables are created on migration but only populated by agents that
opt in via `[memory] system = "structured"` in their manifest.
Per-agent opt-in:
- `MemorySystem` enum (`Summarization` default / `Structured`) on
`AgentManifest`, with `MemoryConfig` wrapper carrying
`skip_serializing_if = is_default` so manifests that don't opt in
round-trip clean TOML.
- Same `[memory]` field on `HandAgentConfig`; `activate_hand` copies
through to the spawned manifest.
- `MemoryConfig::is_structured()` is the single gate consulted by every
call site that touches structured memory.
Schema migration v10 (PR 1 added v9) consolidates the three structured-
memory storage tables with denormalized columns from the start:
- `session_extractions(user_id, agent_id, ...)` — audit attribution
survives session deletes; idx on `(user_id, created_at)` for the
audit endpoint.
- `user_memory_topics` with `expires_at` + `embedding` columns.
- `user_agent_memory_topics` keyed by `(user_id, agent_id, topic)`.
Storage modules:
- `user_memory.rs` / `user_agent_memory.rs` — CRUD, embedding store, prune.
- `SessionExtraction` + `SessionExtractionStore` in `session.rs`.
- `MemorySubstrate::wipe_user(user_id) -> WipeUserCounts` wraps the three
bucket DELETEs in a single SQLite transaction so a partial failure rolls
back rather than leaving a user half-wiped.
- `MemorySubstrate::list_user_extraction_audit` joins with `sessions`
purely to surface the `session_deleted` flag — attribution comes from
the denormalized `user_id` column.
Kernel prompt-build gate:
- `build_user_memory_context(memory, user_id, agent_id, memory_cfg)`
returns `None` immediately when `memory_cfg.is_structured() == false`
(no SQLite roundtrip for default agents). Called from both prompt-build
sites in `kernel.rs`. `PromptContext::user_memory_context` carries the
value through to a new "What I Remember About You" section that is
skipped entirely for subagents and for empty indexes.
Control API (`/api/users/*`):
- `GET /api/users` — list users + default
- `GET /api/users/{user_id}/memory` — list topics
- `GET /api/users/{user_id}/memory/{topic}` — topic content
- `DELETE /api/users/{user_id}/memory/{topic}` — delete one (404 if absent)
- `DELETE /api/users/{user_id}/memory` — atomic wipe; returns per-bucket counts
- `GET /api/users/{user_id}/agents/{agent_id}/memory` — per-agent topics
- `DELETE /api/users/{user_id}/agents/{agent_id}/memory` — delete per-agent
- `GET /api/users/{user_id}/memory/audit` — extraction events with `session_deleted`
- `GET /api/users/{user_id}/memory/export` — JSON dump
- `parse_user_id()` accepts `"default"` or any non-nil UUID; rejects the
nil UUID (legacy anonymous-bucket sentinel) and the deprecated `"test"`
alias with 400.
- Module doc + per-handler `AUTHORIZATION:` comments call out the
single-tenant RBAC limitation (API-key holder == full memory admin).
PATCH /api/agents/:id/config gains `memory_system: Option<String>`,
validated against `MemorySystem`'s serde and persisted to disk via
`Registry::update_memory_config` + `persist_manifest_to_disk`. GET
/api/agents/:id surfaces both a flat `memory_system` string and the
nested `manifest.memory.system` shape so dashboards can read either form.
Tests (98 new):
- v10 migration creates all three tables + indexes and upgrades cleanly
from a v9 baseline without touching pre-existing rows.
- `wipe_user`: per-bucket counts, scope (user A's wipe leaves user B
untouched), idempotent zero-count run.
- `MemorySystem` defaults to `Summarization`; `[memory] system = "..."`
parses both variants; TOML round-trip skips `[memory]` for the default
case and re-emits it when opted in.
- `build_user_memory_context` returns `None` for default agents (even
with seeded topics), returns the formatted block when opted in with
topics, returns `None` when opted in with an empty index.
- `parse_user_id`: accepts `default` and any UUID, rejects nil UUID,
`"test"`, garbage, and empty string.
- Audit endpoint preserves attribution after session delete and flips
the `session_deleted` flag to true.
…amer
Adds the per-user memory producer code that populates the storage layer
introduced in PR 2. Entirely gated behind `manifest.memory.is_structured()`
— default (Summarization) agents see zero behaviour change, while opted-in
agents pay a structured-extraction LLM call when the context overflows and
a dream-consolidation pass when their session goes idle.
Components:
* `compactor::extract_structured()` — LLM-driven extraction producing
`SessionExtraction { facts, preferences, decisions, tasks, open_items }`.
Filters out `MessageSource::ContextInjection` so calendar/email summaries
do not bleed into long-term memory. Falls back to the existing extraction
(or an empty one) on LLM error or persistent parse failure — never
propagates an error to the agent turn.
* `compactor::needs_extraction()` + `count_tool_calls()` — trigger gates
on tokens-since-last or tool-calls-since-last thresholds.
* `compactor::SessionExtraction` re-export — runtime callers reach the
struct without pulling `openfang-memory` directly.
* `context_overflow::overflow_drain_count()` — peek at how many leading
messages overflow recovery would drain, without applying the trim, so
mini-dream can extract from them first.
* `mini_dream` module — in-loop consumer that, immediately before the
overflow recovery trims messages, runs extract + dream and persists the
resulting topics into the user memory store. Non-fatal: errors are
logged, never returned.
* `dreamer` module — session-end consolidation pass: merges all
`SessionExtraction` records accumulated during compaction plus the
recent message tail into 3–7 topic-organised user memory entries, with
conflict resolution (`supersedes` list) and expiry tagging.
* `agent_loop.rs` integration — two `if manifest.memory.is_structured()`
call sites (streaming + non-streaming) wire mini_dream into the iteration
prologue. Default agents skip entirely.
* `OpenFangKernel::trigger_session_dream()` + `run_session_lifecycle_loop()`
— background loop polls `agent_last_active` every 30 s; for each agent
idle longer than `[sessions] gap_secs`, fires the dream pass.
Per-agent gate: structured-memory only. Per-session gate: skip if no
real user activity (pure `[AUTONOMOUS TICK]` / `[SCHEDULED TICK]` /
`ContextInjection` sessions are no-ops — the production thundering-herd
fix from commit `aa4ec5c` on `branchu`).
* `SessionsConfig` (in `KernelConfig`) — `[sessions] gap_secs` knob,
default 300 s. Set to 0 to disable the dreamer loop entirely.
* `Message::context_injection()` helper — small constructor for the new
`ContextInjection`-tagged messages used by extract/dream tests and the
activity-gating predicate.
Tests (12 new, all green):
* `compactor::test_extract_structured_filters_context_injections`
* `compactor::test_extract_structured_fallback_on_empty_response`
* `compactor::test_needs_extraction_token_threshold`
* `compactor::test_needs_extraction_tool_calls_threshold`
* `compactor::test_count_tool_calls`
* `dreamer::test_dream_filters_context_injections`
* `dreamer::test_dream_result_has_topics`
* `dreamer::test_dream_conflict_resolution`
* `dreamer::test_dream_expiry_tagging`
* `dreamer::test_dream_fallback_on_parse_error`
* `mini_dream::test_structured_memory_gate_skips_default_agents`
* `mini_dream::test_structured_memory_gate_fires_for_opted_in_agents`
Built on top of `pr/memory-storage` (PR 2). No UI, route, schema, or
`MemorySystem`/`MemoryConfig` changes — that surface area is PR 2's.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… gate The dream activity gate (skip pure-tick / pure-context-injection sessions) was inlined in trigger_session_dream with zero direct test coverage. Pulled it out into a pub(crate) free function so the predicate can be unit-tested without spinning up a kernel, then added 8 tests covering the scenarios that matter for the thundering-herd fix: real user activity, autonomous ticks, scheduled ticks, mixed tick+real, pure context injections, empty sessions, assistant-only sessions, and ticks + context injections. This is the most critical test gap flagged by review — the predicate is the live protection against every heartbeat firing a dream pass.
… in dream extract_structured was declared Result<_, String> but every code path returned Ok(fallback()) — the Err arm was unreachable. Callers in mini_dream.rs and the kernel dream entry point had dead match/unwrap_or arms that could never trigger. Made the signature infallible and stripped the dead error handling from both call sites. While touching the kernel call site, also fixed the StubDriver fallback that wrapped resolve_driver().ok().unwrap_or_else(...). StubDriver.complete() always errors, so feeding it to extract_structured burned the full retry budget (3 attempts) producing nothing but warn logs before returning the fallback. Resolve the driver up front and return early when missing — matching the pattern that already existed for the later dream() call in the same function. No behavior change for the happy path; eliminates wasted retries and noisy warns when no driver is configured, and removes dead error code.
The lifecycle loop fires a dream task per expired agent every 30s. If a dream takes longer than gap_secs while the same agent keeps receiving activity, the loop could re-enter and spawn a second dream task for the same agent — racing on extractions, embeddings, and user-topic writes. Added agent_dream_locks: DashMap<AgentId, Arc<tokio::sync::Mutex<()>>> on OpenFangKernel and gate every spawned dream task on `try_lock` of the per-agent mutex. `try_lock` (not `lock`) so iterations never queue — if the previous dream is still running, this tick logs a debug and skips, and the next 30s tick will check again. The mutex is separate from agent_msg_locks because that one serializes user turns vs user turns for the same agent; dreams should not block user turns and vice versa — only dream-vs-dream needs serialization. Added test_agent_dream_locks_serialize_per_agent covering: (a) second dream for same agent fails try_lock while first is in flight, (b) a different agent is not blocked, (c) once the first releases the next dispatch can lock again.
… race doc Five smaller cleanups bundled together: - context_overflow: added test_overflow_drain_count_matches_recover_from_overflow (stage 1 + stage 2 + below-threshold paired tests) so the two implementations cannot silently drift apart. overflow_drain_count mirrors stages 1+2 of recover_from_overflow by hand; if either threshold moves, the paired test catches it. - compactor: marked count_tool_calls + needs_extraction with #[allow(dead_code)] and a TODO(PR4-or-later) explaining the intent. Today the extraction path is triggered by the context-overflow signal (overflow_drain_count → mini-dream); these helpers are the alternative cadence-based gate that will land in a follow-up PR once agent_loop tracks per-session counters. Keeping them next to CompactionConfig so the policy stays co-located. - compactor: tightened build_conversation_text visibility from pub to pub(crate). It is only called inside this crate (compactor and dreamer). - kernel: added a NOTE in run_session_lifecycle_loop explaining the benign race between the agent_last_active snapshot and the per-agent remove. The race only delays a dream by one gap_secs window; the dream itself reads the live session, so content correctness is preserved.
Dashboard surfaces for the structured memory feature. Per-agent Memory
System dropdown in the agent Config tab (opt-in). New Users page with
Memory and Extraction Audit tabs for viewing/deleting/exporting per-user
accumulated memory. Pure UI work — all backend routes and the PATCH
field already shipped in the memory storage PR.
- Memory System selector in the agent detail Config tab:
- <select id="memory-system"> in index_body.html with two options
("Summarization (default)" / "Structured (LLM extraction + dreamer)")
plus help text describing both modes.
- Wired to configForm.memory_system via Alpine x-model; saveConfig()
PATCHes the field along with the rest of the form.
- buildConfigForm reads memory_system from the flat field or the
nested manifest.memory.system shape exposed by GET /api/agents/:id.
- New Users page at #users route:
- Top-level sidebar entry under the System group; #users added to
validPages in app.js.
- Left rail: configured users, default user badged as "default".
- Right panel with Memory + Extraction Audit tabs.
- Memory tab: topic list with View / Delete actions, "Delete All
Memory" button, "Export JSON" header button that downloads
memory_<uid>_<yyyymmdd>.json from the export endpoint.
- Audit tab: per-event log (timestamp, agent name, session, per-field
counts) backed by /memory/audit.
- Topic viewer modal fetches full topic content on demand.
- New crates/openfang-api/static/js/pages/users.js; bundled into the
dashboard via webchat.rs alongside the other page scripts.
Styling matches the existing dashboard conventions (form-group,
form-select, text-xs.text-dim, card, tabs).
No backend changes: all consumed routes (GET /api/users,
GET/DELETE /api/users/{id}/memory[/{topic}], /memory/audit, /export)
and the memory_system PATCH field already exist on pr/memory-producer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds `KernelHandle::query_hand_ephemeral(hand_name, prompt, max_output_tokens, timeout) -> Result<String, String>`, a one-shot synchronous query surface that runs a hand-owned agent as an ephemeral subagent and returns its text response wrapped in the existing untrusted-content markers. This is the foundation primitive that issue RightNow-AI#896's maintainer (@jaberjaber23) asked for as a prerequisite for any caller (continuous compaction, conversational queries, etc.) that needs to ask a hand for a paragraph synchronously. What this PR adds ================= * `KernelHandle::query_hand_ephemeral` — new trait method with a default impl that returns `Err("not supported")` so existing test doubles don't break. * `OpenFangKernel::query_hand_ephemeral` — production impl. Resolves the hand-owned agent by name (UUIDs also accepted), clones its manifest, overrides `max_tokens` on the clone, builds a one-shot `CompletionRequest`, calls the driver under `tokio::time::timeout(timeout, ...)`, extracts the text, and wraps it in the existing `wrap_external_content` helper with a synthetic `hand://{hand_name}` source URL. * `OpenFangKernel::set_test_default_driver` — `#[cfg(test)]` helper that swaps the boot-time `default_driver` for a test double. Used by the new tests to inject `RecordingDriver` / `SleepDriver` past `resolve_driver`'s fresh-driver creation, which fails by design when no API key is set for the manifest's provider. What this PR deliberately does NOT add ====================================== This is the foundation layer only — no callers in production code paths. Per the spec, the next PR (continuous compaction) brings the first caller. The primitive is dead-code in this commit; only the tests exercise it. Design decisions worth flagging for review ========================================== 1. **Bypass `run_agent_loop` entirely.** The agent loop has many side effects we MUST NOT trigger for an ephemeral spawn (canonical session append, JSONL mirror, daily memory log, mini-dream, metering / scheduler quota recording, pre-emptive compaction, agent-message audit log). A new "one-shot variant" in `agent_loop.rs` would have meant guarding all of those with an `is_ephemeral` flag, which is invasive and easy to break. Instead `query_hand_ephemeral` calls `driver.complete()` directly with a minimal `CompletionRequest`. Tool calls are intentionally disabled for this primitive — it's summarisation, not a tool-use loop. 2. **Wrapping option 1 — reuse `wrap_external_content`.** The maintainer gave two options for the untrusted-content wrap. We picked option 1 (pass `hand://{hand_name}` as the source URL into the existing helper) because the output is readable, the SHA-boundary syntax is identical to what `web_fetch` already emits (so the downstream LLM needs no new syntax to learn), and there's now exactly one wrapper to maintain. A dedicated `wrap_hand_content` would have duplicated the boundary-derivation logic for cosmetic reasons. See `test_wrap_external_content_handles_hand_scheme` for the output contract. 3. **`max_output_tokens` lands on the request, not as post-hoc truncation.** We clone the manifest, mutate the clone's `model.max_tokens`, build the `CompletionRequest` from the clone, and pass it to the driver. The persisted manifest is byte-for-byte unchanged — verified by `test_query_hand_ephemeral_max_output_tokens_applied`. 4. **`default_driver` wrapped in `RwLock`** so tests can swap it. The non-test fallback path (`resolve_driver`'s "create_driver failed, use default" branch) reads through the lock with a cheap shared acquire; production hot path is unchanged. Tests ===== All in `crates/openfang-kernel/src/kernel.rs` test module: * `test_wrap_external_content_handles_hand_scheme` — wrapper output is reasonable for `hand://` URIs. * `test_query_hand_ephemeral_unknown_hand_returns_err` — 404 path. * `test_query_hand_ephemeral_returns_wrapped_response` — happy path, output carries the external-content boundary + untrusted label + the hand's payload. * `test_query_hand_ephemeral_does_not_persist_session` — verifies the hand's SQLite session count AND its canonical context are byte-for-byte identical before/after the call. This is the security invariant the maintainer flagged on issue RightNow-AI#896. * `test_query_hand_ephemeral_timeout_returns_err` — slow driver triggers the recognisable `hand query timed out after Ns` error. * `test_query_hand_ephemeral_max_output_tokens_applied` — recording driver captures the `CompletionRequest`; asserts `max_tokens=256` override landed, persisted manifest still says 4096, exactly one user message, no tools. * `test_query_hand_ephemeral_default_impl_returns_err` — default trait impl returns `Err("not supported")` so other `KernelHandle` impls (test doubles in particular) don't need to implement the new method. Verification ============ * `cargo fmt --check` clean. * `cargo clippy --workspace --tests --all-targets -- -D warnings` clean. * `cargo build --workspace` clean. * `cargo test -p openfang-runtime --lib` — 954 pass, 0 fail. * `cargo test -p openfang-kernel --lib` — 315 pass; only failures are the pre-existing `test_referenced_providers_only_includes_configured_ones` and `test_1188_referenced_providers_resolves_alias_to_provider`, both reproducible on `pr/memory-ui` without my changes. * Docker image rebuilt and openfang container restarted — boots cleanly, all 8 background hands resume normally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
KernelHandle::query_hand_ephemeral(hand_name, prompt, max_output_tokens, timeout) -> Result<String, String>— a synchronous one-shot ephemeral hand-query primitive.Step 1 of 2 of the #896 work. @jaberjaber23 explicitly recommended this split in his 2026-05-12 comment:
This is PR 1. PR 2 (the consumer + continuous compaction) is the follow-up, depending on this primitive.
What this PR adds
KernelHandle::query_hand_ephemeralwith default impl returningErr(\"not supported\")so test doubles don't break.OpenFangKernel:memory.save_session/memory.save_canonical_session/ JSONL mirror / daily memory logrun_agent_loopentirely — callsdriver.complete(request)directly withtools = vecmodel.max_tokensfor this call only — persisted manifest byte-for-byte unchangedtokio::time::timeout(timeout, ...)wrap_external_content(\"hand://{hand_name}\", body)— the same untrusted-content markerweb_fetchusesWhy the wrap matters
The wrap uses the same
<<<EXTCONTENT_xxx>>>...<<</EXTCONTENT_xxx>>>boundary pattern asweb_fetch(crates/openfang-runtime/src/web_content.rs:49). A test exercises a clearly-malicious payload:Test asserts the wrap is content-agnostic — opening sentinel precedes the payload, closing sentinel follows it, the
treat as untrustedlabel is present, thehand://evil-handsource identifies the origin. The downstream LLM treats the entire block as data, not instructions.Why ephemeral matters
Without isolation, every consumer call would append the query prompt into the hand's canonical session and run its full agent loop with side effects (
scheduler.record_usage,mini_dream,audit_log, JSONL mirror). A test asserts the hand's canonical session message count andcanonical_contextbytes are unchanged before/after a call.Test plan
cargo fmt --check,clippy -D warnings,build --workspaceall cleancargo test -p openfang-types --lib398/398 passcargo test -p openfang-runtime --lib954/954 passcargo test -p openfang-kernel --lib315 pass (2 pre-existingtest_referenced_providers_*failures unrelated)test_query_hand_ephemeral_returns_wrapped_response(happy path + wrap)test_query_hand_ephemeral_does_not_persist_session(SQLite-direct verification)test_query_hand_ephemeral_timeout_returns_errtest_query_hand_ephemeral_unknown_hand_returns_errtest_query_hand_ephemeral_max_output_tokens_applied(recording driver inspectsCompletionRequest.max_tokens)test_query_hand_ephemeral_default_impl_returns_errtest_query_hand_ephemeral_wraps_injection_attempt(security claim explicit)test_wrap_external_content_handles_hand_schemeconfirmshand://URI scheme produces sensible wrapper outputReviewed independently
Carved and reviewed by an independent agent. One issue flagged (no explicit malicious-payload test) and addressed in the same commit before submission.
Standalone
The primitive is dead-code in production paths in this PR — only the tests exercise it. PR 2 brings the first caller. This matches the maintainer's exact PR-split recommendation.
What this is NOT in this PR
[[compaction.context_sources]]config (PR 2)