feat: per-conversation wiki digest #51
Merged
A doubly-linked-list + map ring (O(1) touch/remove/rename) tracking pages the user or agent actively used — Create, Update, Get, Move (both ends), Delete, GetBacklinks — rather than what disk mtime says was last changed. Distinguishes 'intent' from 'sync churn' for the upcoming digest's recents signal. Touches fire only on the success path: a failed CreatePage on an existing page, or a GetPage on a missing path, does not pollute the ring. MovePage renames in place so a move shows up as one continuous use rather than dropping the old name and freshly inserting the new. DeletePage drops the entry — a deleted page in recents would mislead the agent. Capacity defaults to 20 (the plan's recents_size default); a later step will swap this for a config-driven value. Persistence lives in state.go (next steps); recentsLRU itself is storage-agnostic, exposing load/snapshot/takeDirty for a ticker to consume. Step 1 of the digest plan (mind-map/plans/digest).
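A minimal Go sketch of the list + map structure described above, built on the standard library's `container/list`. The names (`recentsLRU`, `touch`, `remove`, `rename`, `snapshot`, `takeDirty`) follow the commit message, but the bodies are assumptions rather than the shipped code. In particular, moving a renamed entry to the front is this sketch's reading of "a move shows up as one continuous use".

```go
package main

import (
	"container/list"
	"sync"
)

// recentsLRU (sketch): a doubly-linked list ordered most-recent-first plus a
// map from page path to list node, so touch, remove and rename are all O(1).
type recentsLRU struct {
	mu    sync.Mutex
	limit int
	order *list.List               // element values are page paths (string)
	index map[string]*list.Element // path -> node in order
	dirty bool                     // set on every mutation, cleared by takeDirty
}

func newRecentsLRU(capacity int) *recentsLRU {
	if capacity <= 0 {
		capacity = 20 // the plan's recents_size default
	}
	return &recentsLRU{limit: capacity, order: list.New(), index: make(map[string]*list.Element)}
}

// touch marks path as just used. Callers invoke it only on success paths.
func (r *recentsLRU) touch(path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.touchLocked(path)
}

func (r *recentsLRU) touchLocked(path string) {
	if e, ok := r.index[path]; ok {
		r.order.MoveToFront(e)
	} else {
		r.index[path] = r.order.PushFront(path)
		if r.order.Len() > r.limit { // evict the least recently used entry
			oldest := r.order.Back()
			r.order.Remove(oldest)
			delete(r.index, oldest.Value.(string))
		}
	}
	r.dirty = true
}

// remove drops path (e.g. after DeletePage); absent paths are a no-op.
func (r *recentsLRU) remove(path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if e, ok := r.index[path]; ok {
		r.order.Remove(e)
		delete(r.index, path)
		r.dirty = true
	}
}

// rename re-keys the existing node in place (MovePage), so the page keeps
// one continuous history instead of a drop plus a fresh insert.
func (r *recentsLRU) rename(from, to string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	e, ok := r.index[from]
	if !ok {
		r.touchLocked(to) // untracked source: the move is still a use of the destination
		return
	}
	if stale, ok := r.index[to]; ok && stale != e {
		r.order.Remove(stale) // never keep two nodes for one path
	}
	delete(r.index, from)
	e.Value = to
	r.index[to] = e
	r.order.MoveToFront(e) // a move is itself a use
	r.dirty = true
}

// snapshot returns paths most-recent-first, e.g. for persistence.
func (r *recentsLRU) snapshot() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make([]string, 0, r.order.Len())
	for e := r.order.Front(); e != nil; e = e.Next() {
		out = append(out, e.Value.(string))
	}
	return out
}

// takeDirty reports whether anything changed since the last call and clears the flag.
func (r *recentsLRU) takeDirty() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	d := r.dirty
	r.dirty = false
	return d
}
```

With a capacity of 3, touching `a`, `b`, `c`, `a`, `d` evicts `b`, leaving `d`, `a`, `c`.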
A deterministic, frequency-based summary of 'what is this wiki about'. One pass over pages.body produces unigram + bigram counts, filters them through a built-in English stopword list (plus user extras), and selects the top K with an alphabetical tie-break for stable output across rebuilds.

Tokenizer design notes:
- Custom Go pass over pages.body rather than reaching into FTS5's C tokenizer. modernc.org/sqlite doesn't cleanly expose token sequences to Go, and reusing FTS5 would lose bigram ordering or drag in CGO-adjacent complexity. The pass is unicode61-equivalent for our purposes: lowercase, split on non-alphanumerics, with hyphens/underscores preserved mid-token so 'mind-map' and 'page_count' survive as one token each.
- Wikilink brackets are stripped so [[projects/mind-map]] contributes its target words to the cloud naturally.
- Code fences and inline code are NOT stripped: identifiers in code are real 'about' signal in a technical wiki, and dropping them would flatten the cloud.
- Bigrams require both endpoints to pass the stopword filter (the plan's chosen lean on open question #2): 'the wiki' must not appear just because 'the' is high-frequency.
- Single-char and all-digit tokens are dropped as a low-information short-circuit before the stopword map lookup.

A single-slot cloudCache exposes Set/Get with defensive copies so the upcoming 5-minute rebuild ticker (Step 6) can swap clouds without readers racing on slice aliasing.

Frequency, not TF-IDF, for v1 (plan open question #1 lean). Easy to swap later if the cloud reads noisy in practice.

Step 2 of the digest plan (mind-map/plans/digest).
Digest() returns a structured Digest{PageCount, Cloud, Recents,
Areas, Markdown}: the typed fields drive the WebUI / HTTP JSON, and
the markdown is what an LLM consumes in a per-conversation orientation
prompt. Shape matches the example in the plan:
    This wiki contains N pages across M areas. About:
    term1, term2, term3, …
    ## Areas
    - foo (45) — foo/index: "Foo Area"
    - bar (12)
    - …
    ## Recently active
    - path/one
    - …
    Full skill: SKILL.md. Use `get_wiki_digest` for the live version.
Trim discipline when over the soft cap (default 4096 bytes): drop
recents from the tail first, then cloud, never areas. Areas are the
smallest section and the most structurally important — losing them
means losing the map of the wiki. Footer hint is also preserved.
Caching is version-keyed: cloudCache and recentsLRU each expose a
monotonic counter; digestCache stores (cloudVer, recentsSeq,
pageCount) alongside the cached *Digest and rebuilds on any
mismatch. pageCount is part of the key because pure content edits
that don't touch the LRU still change the header sentence. CRUD
operations automatically bust the cache through their existing LRU
touches, so callers don't need to invalidate explicitly.
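A sketch of the version-keyed check, assuming the caller reads the three counters and passes them in as one comparable struct. `digestKey`, `digestCache` and the trimmed-down `Digest` are hypothetical. Holding the mutex across `build` is a design choice in this sketch: it serializes rebuilds so concurrent readers don't each rebuild the same digest.

```go
package main

import "sync"

// digestKey bundles the counters the cached digest was built from.
type digestKey struct {
	cloudVer   uint64 // bumped by cloudCache on every Set
	recentsSeq uint64 // bumped by recentsLRU on every mutation
	pageCount  int    // catches pure content edits that don't touch the LRU
}

// Digest is trimmed to the one field this sketch needs.
type Digest struct{ Markdown string }

type digestCache struct {
	mu     sync.Mutex
	key    digestKey
	cached *Digest
}

// get returns the cached digest while key matches; any mismatch rebuilds.
func (c *digestCache) get(key digestKey, build func() (*Digest, error)) (*Digest, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cached != nil && c.key == key {
		return c.cached, nil
	}
	d, err := build()
	if err != nil {
		return nil, err // old entry keeps its old key, so the next call retries
	}
	c.key, c.cached = key, d
	return d, nil
}
```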
Area summaries are driven by the indexed `pages` table, not by
filesystem listing — the source of truth for the digest is what's
queryable, not what's on disk. Flat-rooted pages (no slash) are
ignored: a top-level page is not an area.
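Area grouping reduces to "first path segment; skip paths without a slash". The Go sketch below works on a slice of paths so it stays self-contained; the commented SQLite query shows one way the same grouping could run against `pages`, assuming a `path` column (the schema isn't shown in this PR). Sorting by descending count with a name tie-break is an assumption that matches the example's ordering.

```go
package main

import (
	"sort"
	"strings"
)

// AreaSummary is a hypothetical per-area row: name and page count.
type AreaSummary struct {
	Name  string
	Count int
}

// summarizeAreas groups indexed page paths by their first segment.
// A possible SQL equivalent, assuming pages(path):
//
//	SELECT substr(path, 1, instr(path, '/') - 1) AS area, COUNT(*) AS n
//	FROM pages WHERE instr(path, '/') > 1 GROUP BY area;
func summarizeAreas(paths []string) []AreaSummary {
	counts := make(map[string]int)
	for _, p := range paths {
		area, _, ok := strings.Cut(p, "/")
		if !ok || area == "" {
			continue // flat-rooted page: not an area
		}
		counts[area]++
	}
	out := make([]AreaSummary, 0, len(counts))
	for name, n := range counts {
		out = append(out, AreaSummary{name, n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Name < out[j].Name
	})
	return out
}
```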
Also: hook Reindex Phase 4 into recents.remove() so pages that
vanish via raw-filesystem delete + reindex (common after `git pull`
in sync) don't linger in the LRU as 404 candidates. With this hook,
the renderer can trust the LRU as-is — no filter, no purge — and
the LRU stays consistent with `pages` at all times.
Step 3 of the digest plan (mind-map/plans/digest).
A wiki_state table (key/value/updated) stores the LRU snapshot and the word/phrase cloud in JSON so a freshly-restarted server has a useful digest immediately, not after the first 5-minute ticker fires.

The rendered digest markdown is NOT persisted: it's sub-ms to re-assemble from cloud + LRU, and the in-memory digestCache already handles 'don't re-format on every hit'. Adding a third write path buys nothing measurable.

Load happens at the tail of Open(), after Reindex. Persisted recents are filtered against the current `pages` table so paths that vanished while the server was off (deleted on disk, or sync-pulled away) don't reappear in the LRU as 404 candidates. The cloud loads as-is: global frequency counts remain a reasonable approximation across small content changes, and the next rebuild ticker (Step 6) will refresh it within minutes.

Save points:
- persistRecents(): called by Close() for a clean shutdown flush and (in Step 6) on a 30s dirty-gated ticker.
- persistCloud(): called by Step 6's 5m rebuild ticker. No-ops when the cloud has never been populated so we don't clobber a previously-good copy with an empty placeholder.

Failure modes are deliberately lenient: a corrupt JSON row, a missing table, or an unreachable column logs at WARN and falls back to fresh-wiki state rather than panicking. The digest is an orientation signal, not a correctness boundary; losing it shouldn't take down the server.

Also: made Close() idempotent via sync.Once. testWiki's t.Cleanup plus explicit defer Close in state tests would otherwise run the persistRecents flush against an already-closed DB.

Step 4 of the digest plan (mind-map/plans/digest).
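A sketch of the lenient load path described above. The DDL and upsert statements assume column names from the "(key/value/updated)" description; `loadRecents` and its `exists` callback are hypothetical, with the callback standing in for a lookup against the `pages` table. A corrupt row degrades to an empty LRU with a WARN log via `log/slog`.

```go
package main

import (
	"encoding/json"
	"log/slog"
)

// Assumed schema and save statement for the key/value/updated table.
const (
	wikiStateDDL = `CREATE TABLE IF NOT EXISTS wiki_state (
	key     TEXT PRIMARY KEY,
	value   TEXT NOT NULL,
	updated INTEGER NOT NULL
)`
	wikiStateUpsert = `INSERT INTO wiki_state(key, value, updated) VALUES(?, ?, ?)
ON CONFLICT(key) DO UPDATE SET value = excluded.value, updated = excluded.updated`
)

// loadRecents decodes a persisted recents snapshot (most recent first) and
// drops paths that no longer exist. Any failure yields fresh-wiki state.
func loadRecents(raw []byte, exists func(path string) bool) []string {
	if len(raw) == 0 {
		return nil // no row yet
	}
	var paths []string
	if err := json.Unmarshal(raw, &paths); err != nil {
		slog.Warn("digest: ignoring corrupt recents snapshot", "err", err)
		return nil
	}
	kept := paths[:0] // filter in place, preserving order
	for _, p := range paths {
		if exists(p) {
			kept = append(kept, p)
		}
	}
	return kept
}
```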
Three surfaces, one signal:
- MCP get_wiki_digest — new tool. Returns the structured Digest
(page count, cloud terms, recents LRU, per-area summaries, rendered
markdown). Tool description nudges agents to call it at the start
of every conversation.
- MCP get_wiki_context — the legacy {page_count, recent_pages,
top_level_dirs} shape is preserved verbatim so existing clients
(opencode, Claude Code in the wild, per plan open question #4)
keep working. New fields (cloud_terms, recents, areas, markdown)
are layered on the same response — old clients ignore them; new
clients get the orientation upgrade without a tool-name change.
- HTTP GET /api/digest — returns the full Digest as JSON. Intended
for the WebUI (so it can render its own word-cloud or recents
widgets off the structured fields rather than parsing the markdown)
and for non-MCP scripts/tests.
Implementation: WikiContext gets new optional fields (omitempty so
the JSON shape is additive). Wiki.Context() delegates to Digest()
to populate them; a digest failure logs at WARN but doesn't fail
the Context call — the legacy fields are still valuable on their
own, and the digest is an enhancement, not a contract.
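The additive-JSON approach relies on `omitempty`: unset digest fields are left out of the encoding, so a legacy client sees exactly the old three keys. The field types for the new keys, the `Area` shape, and the `withDigest` / `contextJSON` helpers are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"log/slog"
)

type Area struct {
	Name  string `json:"name"`
	Pages int    `json:"pages"`
}

type Digest struct {
	Cloud    []string
	Recents  []string
	Areas    []Area
	Markdown string
}

type WikiContext struct {
	// Legacy shape, unchanged.
	PageCount    int      `json:"page_count"`
	RecentPages  []string `json:"recent_pages"`
	TopLevelDirs []string `json:"top_level_dirs"`

	// Additive digest fields: omitempty drops them when unset.
	CloudTerms []string `json:"cloud_terms,omitempty"`
	Recents    []string `json:"recents,omitempty"`
	Areas      []Area   `json:"areas,omitempty"`
	Markdown   string   `json:"markdown,omitempty"`
}

// withDigest layers digest fields onto a legacy context. A digest error is
// logged and swallowed: the legacy fields are still returned.
func withDigest(ctx WikiContext, digest func() (*Digest, error)) WikiContext {
	d, err := digest()
	if err != nil {
		slog.Warn("wiki context: digest unavailable, returning legacy fields only", "err", err)
		return ctx
	}
	ctx.CloudTerms, ctx.Recents, ctx.Areas, ctx.Markdown = d.Cloud, d.Recents, d.Areas, d.Markdown
	return ctx
}

// contextJSON is what the MCP layer would hand back to a client.
func contextJSON(c WikiContext) (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}
```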
Step 5 of the digest plan (mind-map/plans/digest).
A new internal/digest.Manager mirrors internal/sync.Manager's shape:
NewManager(*wiki.Wiki, Options) → Start(ctx) / Stop() lifecycle, with
the embedder (cmd/mind-map) supervising. Sync's separation between
storage engine and goroutine-owning supervisor is a good pattern;
the digest follows it so cmd/mind-map sees a uniform 'subsystems are
supervised, not implicit' model.
Two tickers in one goroutine:
- cloud_refresh (5m default): full cloud rebuild via Wiki.BuildCloud,
SetCloud, PersistCloud. Synchronous first build on Start() so the
very first post-open digest read has cloud terms — cold start
over a 1k-page wiki is < 100ms.
- recents_refresh (30s default): gated PersistRecents call. Skips
SQLite writes on idle servers via a non-mutating peekDirty
probe; only takeDirty (which clears the flag) runs after a
successful write.
Shutdown contract: Stop() cancels the loop's context; the loop runs
one final detached-context flushRecents so the last ~30s of touches
land on disk, then closes done. Idempotent via sync.Once on both
Start and Stop. In cmd/mind-map, defer dm.Stop() is registered after
defer w.Close(); because deferred calls run last-in-first-out, the
ticker quiesces before the DB closes (preventing 'sql: database is
closed' races during shutdown).
Exposed helpers on *Wiki:
- BuildCloud / SetCloud / PersistCloud — public entry points
for the supervisor; the lowercase internals stay for tests.
- PersistRecents — clears dirty only on a successful write so
a failed persist retries on the next tick rather than dropping
the diff silently.
- RecentsDirty — read-only peek, used by the manager's gate.
Wiring: both runStdio and runHTTPServer in cmd/mind-map start a
manager after wiki.Open and Stop it before w.Close. The HTTP path
derives the manager's context from stopCh so /api/restart and
ctrl+C take down the tickers cleanly. The service-mode launcher
delegates to runHTTPServer so it picks up the wiring for free.
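The ordering guarantee comes from Go's last-in-first-out defer semantics. In this sketch, with hypothetical stand-in types that only record events, the manager's `Stop` is deferred after the wiki's `Close`, so it runs first when `serve` returns.

```go
package main

// Hypothetical stand-ins that only record when they are shut down.
type fakeWiki struct{ events *[]string }

func (w fakeWiki) Close() error { *w.events = append(*w.events, "wiki closed"); return nil }

type fakeManager struct{ events *[]string }

func (m fakeManager) Stop() { *m.events = append(*m.events, "manager stopped") }

// serve mirrors the shape of the wiring: open, defer Close, then start the
// manager and defer Stop. Deferred calls run last-in-first-out, so Stop
// (registered second) runs before Close (registered first).
func serve(events *[]string) error {
	w := fakeWiki{events} // stands in for wiki.Open(dir, ...)
	defer w.Close()

	dm := fakeManager{events} // stands in for digest.NewManager(w, opts) + dm.Start(ctx)
	defer dm.Stop()

	*events = append(*events, "serving")
	return nil
}
```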
Step 6 of the digest plan (mind-map/plans/digest).
Adds the digest section to config.json with the five knobs called
out in the plan:
    {
      "digest": {
        "cloud_size": 50,              // top-K terms in cloud
        "recents_size": 20,            // active-use LRU capacity
        "cloud_refresh": "5m",         // rebuild interval (>=30s)
        "stopwords_extra": ["TODO"],   // appends to built-in EN list
        "max_render_bytes": 4096       // soft cap on rendered markdown
      }
    }
All fields are optional. A legacy config without a digest section
loads cleanly and yields zero-valued fields that consumers
interpret as 'use built-in defaults' — covered by an explicit
backwards-compat test. ParseCloudRefresh floors at 30 seconds: any
faster is wasted CPU for a signal nobody reads that often.
Wiring:
- wiki.Open(dir, opts ...OpenOption) — added variadic options so
Open(dir) callers (10 in the tree, mostly tests) keep compiling
unchanged. WithOptions(wiki.Options{...}) sets RecentsSize,
MaxRenderBytes, and StopwordsExtra in one call. MaxRenderBytes
semantics: > 0 trims, == 0 uses default, < 0 disables trimming.
- cmd/mind-map: both runStdio and runHTTPServer now load config
before opening the wiki, pass digest tunables through helpers
wikiOptionsFromConfig / digestOptionsFromConfig. Stdio mode
previously bypassed config entirely; now both modes are
consistent and a single config.json controls both.
- digest.Manager: StopwordsExtra is now forwarded into BuildCloud
on every tick rebuild, not just the synchronous first build.
The plumbing existed but was dropped on the floor — fixed.
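The variadic-option pattern keeps `Open(dir)` source-compatible while letting `cmd/mind-map` pass tunables through. Below is a minimal sketch; `Wiki`, `Options` and `renderCap` are simplified, hypothetical stand-ins, and only the MaxRenderBytes resolution rule (> 0 trims, == 0 default, < 0 disables) is taken directly from the text above.

```go
package main

const defaultMaxRenderBytes = 4096

type Options struct {
	RecentsSize    int
	MaxRenderBytes int
	StopwordsExtra []string
}

// OpenOption mutates the options Open starts from.
type OpenOption func(*Options)

// WithOptions sets all tunables in one call.
func WithOptions(o Options) OpenOption {
	return func(dst *Options) { *dst = o }
}

type Wiki struct {
	dir  string
	opts Options
}

// Open accepts zero or more options, so existing Open(dir) callers compile unchanged.
func Open(dir string, opts ...OpenOption) (*Wiki, error) {
	var o Options
	for _, apply := range opts {
		apply(&o)
	}
	return &Wiki{dir: dir, opts: o}, nil
}

// renderCap resolves MaxRenderBytes: > 0 trims at that size, 0 uses the
// default, < 0 disables trimming.
func (w *Wiki) renderCap() (limit int, trim bool) {
	switch m := w.opts.MaxRenderBytes; {
	case m > 0:
		return m, true
	case m == 0:
		return defaultMaxRenderBytes, true
	default:
		return 0, false
	}
}
```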
Docs:
- SKILL.md: rewritten Getting Oriented section to feature
get_wiki_digest as the canonical 'start of conversation' call,
with get_wiki_context retained for backwards compatibility.
Tool list updated.
- README.md: tool count 10 → 11, new get_wiki_digest row, the
legacy get_wiki_context row mentions it now returns digest
fields too, and Wiki Features gets a digest bullet.
Step 7 of the digest plan (mind-map/plans/digest). Plan now fully
implemented end-to-end.
Five new controls in the settings panel, between Sync and Index:
- Extra Stopwords → tag-input (comma / space / Enter to commit a
chip; Backspace on empty input pops the last)
- Cloud Size → number input, blank = server default (50)
- Recents Size → number input, blank = server default (20)
- Cloud Refresh → text input (5m, 10m, etc.), blank = 5m
- Max Render Bytes → number input, 0 disables trim, blank = 4096
A new TagInput component (webui/src/TagInput.tsx) implements the
chips UX: type → commit on separator → click × or Backspace to
remove. Pasted strings with commas or whitespace fan out into
multiple chips in one shot, so an operator can paste
'TODO, FIXME, see also' and get four tags. Duplicate detection is
case-insensitive but display preserves what the user typed; the
case-folding for matching happens server-side in the cloud
builder.
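The paste fan-out and the server-side case folding can be sketched in Go for consistency with the other examples here, although the actual TagInput component is TypeScript. `splitTags` and `stopwordSet` are hypothetical names, and splitting on commas plus Unicode whitespace is this sketch's reading of "commas or whitespace".

```go
package main

import (
	"strings"
	"unicode"
)

// splitTags fans pasted text out into chips, skipping case-insensitive
// duplicates while keeping the casing the user typed first.
func splitTags(existing []string, pasted string) []string {
	seen := make(map[string]bool, len(existing))
	for _, t := range existing {
		seen[strings.ToLower(t)] = true
	}
	out := append([]string(nil), existing...)
	sep := func(r rune) bool { return r == ',' || unicode.IsSpace(r) }
	for _, t := range strings.FieldsFunc(pasted, sep) {
		if k := strings.ToLower(t); !seen[k] {
			seen[k] = true
			out = append(out, t)
		}
	}
	return out
}

// stopwordSet merges built-in and extra stopwords, folding case so "TODO"
// suppresses the lowercased token "todo" produced by the tokenizer.
func stopwordSet(builtin, extras []string) map[string]bool {
	set := make(map[string]bool, len(builtin)+len(extras))
	for _, w := range builtin {
		set[strings.ToLower(w)] = true
	}
	for _, w := range extras {
		if w = strings.TrimSpace(w); w != "" {
			set[strings.ToLower(w)] = true
		}
	}
	return set
}
```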
CSS uses the existing --accent / --border palette so chips match
the rest of the settings UI in both light and dark mode.
No backend changes: putSettings already unmarshals the full
config.Config (which gained the Digest section in step 7 of the
digest plan), so the new fields round-trip transparently. Changes
take effect on next restart — same contract as Sync.Interval; the
existing 'Settings saved. Restart to apply.' banner already says so.
Closes the loop on the digest plan's stopword tuning observation:
operators can now add domain-specific noise words from the UI
without editing config.json by hand.
aniongithub added a commit that referenced this pull request on May 25, 2026:
Resolves a single conflict in internal/mcp/server.go where main's digest PR (#51) and this branch both added new MCP tools. Resolution:
- get_wiki_context: take main's revised description that mentions the new digest fields (auto-merged cleanly outside the conflict region).
- get_wiki_digest: keep main's new tool registration AND handler.
- get_page handler: drop the old main-side getPage that takes pagePathInput. This branch's slice 3 already replaced it with getPageWithFlags (in images.go), which accepts the new IncludeImages / IncludeImageMetadata flags via getPageInput. Keeping both would mean two handlers for the same tool name.
- Placeholder comment in server.go points readers at images.go for the new get_page handler.

Verified: go vet ./... clean, go test ./... passes (8 packages, including the new internal/digest package from main).
Implements the Digest Plan — Persistent Per-Conversation Wiki Context end-to-end.
What
A compact, always-current orientation blob (~4 KB / ~1K tokens) that mind-map-aware agents can consume at the start of every conversation. Three signals, deterministic, no LLM in the regeneration loop:
- Word/phrase cloud: … `mind-map`, `page_count`, `web-ui` survive intact thanks to a custom Go tokenizer (FTS5's tokenizer doesn't cleanly expose token sequences in pure-Go SQLite).
- Recents: … `recent_pages` (mtime-sorted), which surfaces sync churn rather than intent.
- Areas: … `pages` table, with the area's index page title as a one-line description.

How
Seven commits, one per plan step, plus one for the WebUI:
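| Step | Commit | Summary |
|---|---|---|
| 1 | 6c54f8d | recentsLRU active-use ring |
| 2 | f9a7213 | word/phrase cloud builder |
| 3 | 2480dd6 | `Digest()` renderer + version-keyed cache |
| 4 | 5712fe1 | `wiki_state` table for recents/cloud persistence |
| 5 | 8ac8ad0 | `get_wiki_digest` + HTTP `GET /api/digest` + `get_wiki_context` backwards-compat |
| 6 | 389e114 | `internal/digest.Manager` background tickers (cloud 5m, recents 30s) |
| 7 | 97aa7f9 | `config.DigestConfig` + SKILL.md / README updates |
| WebUI | 380df0e | digest settings controls + TagInput |

Surfaces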
- New MCP `get_wiki_digest` tool.
- `get_wiki_context` keeps its legacy `{page_count, recent_pages, top_level_dirs}` shape and gains the new digest fields (plan open question #4: keep old clients working).
- `GET /api/digest` returns the full `Digest` JSON for WebUI / non-MCP callers.
- New `digest` section in `config.json` (`cloud_size`, `recents_size`, `cloud_refresh`, `stopwords_extra`, `max_render_bytes`). Backwards-compatible with pre-digest configs.

Lifecycle
A new `internal/digest.Manager` mirrors `internal/sync.Manager` exactly: `NewManager(*wiki.Wiki, Options) → Start(ctx) / Stop()`. A synchronous first cloud build on `Start` means cold-start digests have an `About:` line immediately. The manager is wired into both stdio and HTTP modes in `cmd/mind-map`; `defer dm.Stop()` is registered after `defer w.Close()`, and because Go runs deferred calls last-in-first-out, the ticker quiesces before the DB closes (no `sql: database is closed` races).

Persisted state survives restarts: a freshly-restarted server has a useful digest immediately, not after the first 5-minute ticker tick.
Live data
In a sample from `GET /api/digest` against the in-container mind-map docs wiki, the cloud surfaces real domain terms (`mind-map`, `mcp`, `wikilinks`, `web-ui`) and bigrams come through (`web ui`, `agents mcp-tools`, `concepts wikilinks`); the hyphen-preservation rule pays off. Noise terms (`see`, `also`, `same`, `yes`) motivated the WebUI stopword tag-input commit.

Tests
- `internal/wiki`
- `internal/digest`
- `internal/config`
- `internal/mcp` (… `get_wiki_digest`)
- `internal/httpapi` (… `GET /api/digest`)

`go test ./...`, `go vet ./...`, `go build ./...` and `npm run build` all clean.

Out of scope (deliberate)
- `mind-map install-context-hooks`: installer subcommand to write per-agent rules files. Plan calls this a separate follow-up.
- … `/api/digest`; rendering is a separate UI task.