🤖 feat: AI-generated sidebar status via small-model loop #3238
ammar-agent wants to merge 10 commits into main from
Conversation
Adds an AgentStatusService that periodically produces a sidebar agent status using the same small model used for workspace title generation (NAME_GEN_PREFERRED_MODELS), replacing the legacy todo-derived status as the primary source while keeping todoStatus as a fallback.

- New propose_status tool + workspaceStatusGenerator (mirrors title gen)
- Trailing transcript window capped at ~8k tokens via TokenizerService
- Focus-aware cadence: 30s focused, 2m unfocused (WindowService now emits focus-change events)
- Idle/frozen-chat dedup: skips regeneration when the trailing-window hash is unchanged; hash persisted across restarts via ExtensionMetadataService
- Sidebar precedence: displayStatus > aiStatus > todoStatus > fallback
@codex review Please review this PR. Key areas:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1c982f6c7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex flagged that during long-running streams, the assistant's current text + tool activity lives in partial.json (via HistoryService.writePartial) before being committed to chat.jsonl. getLastMessages() only reads committed lines, so the trailing-window hash would stay constant for the whole stream and the small model would never see the live activity the status is supposed to surface. buildTrailingTranscript now appends readPartial(workspaceId) to the committed tail before formatting/trimming, so the hash changes — and the status refreshes — as the stream progresses.
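The shape of that fix can be sketched as follows. This is a minimal sketch, not the PR's actual code: `getLastMessages` and `readPartial` are named in the review, but the `HistoryLike` interface, message shape, and signatures here are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape; the real types live in HistoryService.
interface ChatMessage {
  role: string;
  text: string;
}

interface HistoryLike {
  // Committed chat.jsonl lines only.
  getLastMessages(workspaceId: string, limit: number): ChatMessage[];
  // In-flight partial.json content, if a stream is in progress.
  readPartial(workspaceId: string): ChatMessage | null;
}

// Append the live partial message to the committed tail before hashing, so the
// dedup hash changes (and the status refreshes) while a stream is in progress.
export function buildTrailingTranscript(
  history: HistoryLike,
  workspaceId: string,
): { transcript: string; hash: string } {
  const tail = history.getLastMessages(workspaceId, 80);
  const partial = history.readPartial(workspaceId);
  const messages = partial ? [...tail, partial] : tail;
  const transcript = messages.map((m) => `${m.role}: ${m.text}`).join("\n");
  const hash = createHash("sha256").update(transcript).digest("hex");
  return { transcript, hash };
}
```

Without the `readPartial` append, two calls during the same stream would hash identical committed tails and the scheduler would dedup forever.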
@codex review Addressed the partial.json feedback:
💡 Codex Review
Reviewed commit: b52db3d1d4
Codex flagged that with MAX_CONCURRENT=1 and a fixed iteration order, the first workspace in the list would always become re-eligible (focused interval = 30s) before deeper workspaces got their turn. Workspaces 4+ would never produce a status. processEligibleWorkspaces now collects every eligible workspace first, sorts by lastRanAt ascending, and dispatches in that order. Workspaces that have never run sort with lastRanAt=0 so they preempt previously-run ones. Adds a 3-workspace round-robin test that asserts every workspace gets a turn before any repeats.
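The fairness fix described above reduces to a filter-then-sort. A minimal sketch, assuming a hypothetical `WorkspaceState` record in place of the service's real bookkeeping:

```typescript
// Hypothetical per-workspace bookkeeping; the real state lives in AgentStatusService.
interface WorkspaceState {
  id: string;
  lastRanAt: number; // epoch ms of last generation; 0 = never ran
  eligible: boolean; // focused/unfocused cadence gate already applied
}

// Collect every eligible workspace, then dispatch oldest-first so a single
// concurrency slot cannot starve workspaces deeper in the list. Never-run
// workspaces (lastRanAt = 0) sort first and preempt previously-run ones.
export function orderEligible(states: WorkspaceState[]): string[] {
  return states
    .filter((s) => s.eligible)
    .sort((a, b) => a.lastRanAt - b.lastRanAt)
    .map((s) => s.id);
}
```

This gives round-robin behavior without an explicit queue: each completed run pushes that workspace's `lastRanAt` to the back of the ordering.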
@codex review Addressed the fairness feedback:
💡 Codex Review
Reviewed commit: 265f907bf5
Codex flagged that emitWorkspaceActivityUpdate (the historical wrapper) swallows disk errors. The previous code set state.lastInputHash BEFORE that non-throwing write, which meant a transient extensionMetadata.json write failure would leave the in-memory hash advanced even though the status never reached disk or the frontend. The next tick would dedup against the new hash and never retry — bricking the feature for that session. agentStatusService now persists directly via extensionMetadata.setAiStatus (which throws on failure) and only sets state.lastInputHash after the write resolves, then emits via the new public workspaceService.emitWorkspaceActivity. The legacy non-throwing updateAiStatus wrapper is removed since this was its only caller. Adds a regression test that injects a disk-write failure and asserts the next tick retries against the same transcript.
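The ordering invariant is the whole fix: persist first, advance the dedup hash only after the write resolves. A minimal sketch with hypothetical stand-ins for `ExtensionMetadataService` and the workspace emitter (the real signatures are not shown in the PR text):

```typescript
// Hypothetical collaborators; only the write/hash/emit ordering is the point.
interface MetadataLike {
  setAiStatus(workspaceId: string, status: string): Promise<void>; // throws on disk failure
}
interface EmitterLike {
  emitWorkspaceActivity(workspaceId: string): void;
}

const lastInputHash = new Map<string, string>();

// If the write throws, the in-memory hash is untouched, so the next tick
// recomputes the same input hash and retries instead of deduping forever.
export async function persistStatus(
  metadata: MetadataLike,
  emitter: EmitterLike,
  workspaceId: string,
  status: string,
  inputHash: string,
): Promise<void> {
  await metadata.setAiStatus(workspaceId, status); // may throw
  lastInputHash.set(workspaceId, inputHash); // only reached on success
  emitter.emitWorkspaceActivity(workspaceId);
}
```

Setting the hash before the awaited write is the bug Codex flagged: one transient failure would brick the feature for the session.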
@codex review Fixed the persist-vs-hash ordering.
💡 Codex Review
Reviewed commit: 3d2ac73d1c
Codex flagged that save() previously logged-and-resolved on disk-write failures, so the new setAiStatus() would still return a snapshot to AgentStatusService even when nothing was persisted, advancing the dedup hash on a real disk-full / permission error and defeating the retry behavior added in the previous commit. save() now rethrows after logging. Existing callers that historically relied on the swallow behavior all wrap their setX/updateX calls in emitWorkspaceActivityUpdate, which still downgrades the throw to a logged warning, so their semantics are unchanged. AgentStatusService calls setAiStatus directly so it now sees real disk failures and skips the lastInputHash advance, leaving the next tick free to retry.
@codex review Made
💡 Codex Review
Reviewed commit: 3b118b76b2
Two issues from Codex:

1. `ExtensionMetadataService.initialize()` awaits `clearStaleStreaming()`, which calls `save()`. With `save()` now throwing, a transient disk-write failure could abort startup — violating the AGENTS.md rule that startup-time initialization must never crash the app. `initialize()` now wraps both the directory-create and `clearStaleStreaming` in try/catch and logs-and-continues, restoring the historical log-and-swallow boundary at the startup edge while keeping `save()`'s strict semantics for `AgentStatusService`.
2. The active-workspace branch reads the explicit transient status from `aggregator.getAgentStatus()` (hydrated from `muxMetadata.displayStatus` for heartbeat / idle-compaction / background turns). The previous precedence (`displayStatus ?? aiStatus ?? todoStatus ?? fallbackAgentStatus`) buried that aggregator-set status under `aiStatus` for active workspaces. We now collapse `displayStatus` and `fallbackAgentStatus` into a single `transientStatus` tier so explicit system-set messages always outrank an AI summary, regardless of whether the workspace is active or inactive.
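The precedence collapse in point 2 can be sketched as a single selector. The `StatusInputs` shape is hypothetical; the field names follow the PR's description but the real snapshot types live in `WorkspaceStore`:

```typescript
// Hypothetical inputs; the real fields live on WorkspaceStore's snapshot types.
interface StatusInputs {
  active: boolean;
  aggregatorStatus?: string; // aggregator.getAgentStatus() — active workspaces
  displayStatus?: string; // persisted transient status — inactive workspaces
  aiStatus?: string; // small-model summary
  todoStatus?: string; // legacy todo-derived fallback
}

// Collapse the two explicit system-set sources into one transient tier, so they
// always outrank the AI summary regardless of whether the workspace is active.
export function sidebarStatus(s: StatusInputs): string | undefined {
  const transientStatus = s.active ? s.aggregatorStatus : s.displayStatus;
  return transientStatus ?? s.aiStatus ?? s.todoStatus;
}
```

The old four-way chain made the explicit status win only on the inactive branch; routing both branches through one `transientStatus` tier makes the rule uniform.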
@codex review Two follow-ups:
💡 Codex Review
Reviewed commit: 2264113ab8
Codex flagged that generateWorkspaceStatus can take seconds-to-minutes, and stop() only clears timers/bookkeeping — it does not cancel an in-flight runForWorkspace continuation. So a write/emit can still happen after the service has been stopped, leaking metadata mutations past the declared lifecycle. runForWorkspace now checks this.stopped both immediately after the generator returns and after the disk write resolves, returning early in both cases. Adds a regression test that pauses generation, calls stop(), then releases generation and asserts no setAiStatus / emit fires.
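The two-checkpoint guard can be sketched as below. The `generate`/`persist`/`emit` parameters are hypothetical stand-ins for the multi-second small-model call, the metadata write, and the activity emit; the real method takes no such arguments:

```typescript
// Sketch of the lifecycle guard: stop() cannot cancel an in-flight await, so
// the continuation re-checks the flag at each boundary before side effects.
export class StatusRunner {
  private stopped = false;

  stop(): void {
    this.stopped = true;
  }

  async runForWorkspace(
    generate: () => Promise<string>,
    persist: (status: string) => Promise<void>,
    emit: () => void,
  ): Promise<void> {
    const status = await generate();
    if (this.stopped) return; // stop() fired during the long generation
    await persist(status);
    if (this.stopped) return; // stop() fired during the disk write
    emit();
  }
}
```

Each `await` is a point where `stop()` may have run; checking once at entry would not be enough.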
@codex review Addressed the lifecycle hazard:
Codex Review: Didn't find any major issues. Breezy!
- Drop cross-restart dedup: AgentStatusService no longer hydrates the trailing-window hash from disk. In-session dedup still skips identical inputs; cost is one extra LLM call per workspace per app launch.
- Drop ExtensionMetadataService.getAiStatusInputHash + the aiStatusInputHash field on ExtensionMetadata.
- Drop the inputHash arg on setAiStatus; collapse if/else into a direct assignment.
- WindowService: drop EventEmitter inheritance + focus listeners. isFocused() now delegates to BrowserWindow.isFocused() directly. Drops the focused field, setFocused(), focus/blur wiring, and the test-stub defensive .on check.
- AgentStatusService: drop __test__ exports, inFlightCount, markInFlight, drainInFlight, hashesHydrated, hydratePersistedHashes, resolveTokenizerModel; merge tick()/runTick(); use the FALLBACK_TOKENIZER_MODEL constant in buildTrailingTranscript so we don't double-call getWorkspaceTitleModelCandidates per workspace.
- Trim verbose comments throughout; collapse multi-paragraph rationale to one-line summaries where the code is self-explanatory.

Net: -438 lines, no behavior changes for the documented use case (in-session dedup, focus-aware cadence, round-robin fairness, persist-before-hash, lifecycle stop guard, partial-stream inclusion).
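The simplified focus check above is small enough to sketch whole. A minimal sketch, assuming an Electron-style window handle; the `null` branch models the non-desktop contexts (CLI server, tests) where no window exists:

```typescript
// Hypothetical minimal window interface; Electron's BrowserWindow satisfies it.
interface BrowserWindowLike {
  isFocused(): boolean;
}

// No EventEmitter, no cached `focused` field: delegate directly, and treat
// "no window" as focused so backend services never throttle themselves to the
// unfocused cadence in headless contexts.
export function isFocused(win: BrowserWindowLike | null): boolean {
  return win ? win.isFocused() : true;
}
```

This is the whole replacement for the dropped focus/blur event wiring: state is read on demand instead of being mirrored.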
@codex review Refactor pass for simplicity / minimal LoC / maintenance burden. Net delta: -438 lines.
All previously-resolved review concerns (round-robin fairness, persist-then-hash, partial-stream inclusion, lifecycle stop guard, transient status precedence) are preserved and still tested.
💡 Codex Review
Reviewed commit: 5a67d08ddf
AgentStatusService is a background scheduler with no causal connection to user activity, so its writes must not bump 'recency' — that would re-sort idle workspaces every tick and mark them unread. For existing entries this was already a no-op in practice (mutateWorkspaceSnapshot's recency arg only seeds new entries), but the Date.now() seed would fire for the rare case where a workspace has chat history but no metadata entry yet. Inline the create logic so the seed is recency=0 for genuinely new entries and the existing recency is preserved otherwise. Adds a test that pins the contract directly against ExtensionMetadataService.
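The inlined create logic can be sketched as a conditional seed. The `Entry` shape and `setAiStatus` signature here are hypothetical simplifications of the real per-workspace metadata entry:

```typescript
// Hypothetical snapshot shape; the real one is ExtensionMetadata's per-workspace entry.
interface Entry {
  aiStatus?: string;
  recency: number;
}

// A background scheduler write must not bump recency: preserve it for existing
// entries, and seed genuinely new entries with 0 (never Date.now()), so idle
// workspaces are not re-sorted or marked unread by a status tick.
export function setAiStatus(
  entries: Map<string, Entry>,
  workspaceId: string,
  status: string,
): Entry {
  const existing = entries.get(workspaceId);
  const next: Entry = existing
    ? { ...existing, aiStatus: status } // recency preserved
    : { aiStatus: status, recency: 0 }; // new entry: no causal user activity
  entries.set(workspaceId, next);
  return next;
}
```

The rare case this pins down is a workspace with chat history but no metadata entry yet: the old `Date.now()` seed would have made it jump to the top of the sidebar on its first status tick.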
@codex review Addressed P1:
💡 Codex Review
Reviewed commit: 5325acf7a2
Codex review: stop() during the earlier awaits (buildTrailingTranscript and getWorkspaceTitleModelCandidates) wouldn't prevent the multi-second provider call from firing — only the persist after it. Adds a stopped check immediately before generateWorkspaceStatus, and pins the new behavior with a test that gates the candidates fetch on a release signal: stop fires while candidates are pending, and the generator never runs.
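That extra pre-call guard can be sketched as below. The `buildTranscript`/`fetchCandidates`/`generate` parameters are hypothetical stand-ins for the service's real collaborators:

```typescript
// Sketch of the pre-call guard: stop() during the cheap awaits (transcript
// build, candidate fetch) must prevent the expensive provider call entirely,
// not just the persist that follows it.
export class GuardedRunner {
  private stopped = false;

  stop(): void {
    this.stopped = true;
  }

  async run(
    buildTranscript: () => Promise<string>,
    fetchCandidates: () => Promise<string[]>,
    generate: (transcript: string, candidates: string[]) => Promise<string>,
  ): Promise<string | undefined> {
    const transcript = await buildTranscript();
    const candidates = await fetchCandidates();
    if (this.stopped) return undefined; // stop() fired while candidates were pending
    return generate(transcript, candidates);
  }
}
```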
@codex review Addressed P2: added a
Codex Review: Didn't find any major issues. More of your lovely PRs please.
Summary
Adds an `AgentStatusService` that produces the sidebar agent-status text using the same "small model" path as workspace title generation (`NAME_GEN_PREFERRED_MODELS` — Claude Haiku → GPT-mini), instead of inferring it from the workspace's TODO list. The status is regenerated on a focus-aware cadence, skips when the trailing transcript hash is unchanged, and is fed at most ~8k trailing transcript tokens.

Background
Today the sidebar's per-workspace status string is derived from `deriveTodoStatus`, which picks the first in-progress / pending / last-completed TODO. That's a noisy proxy for "what is the agent doing right now": it only changes on `todo_write`, so a long bash run or a multi-step assistant turn shows a stale TODO.

The user asked for the status to come from a small LLM call instead, with sensible cost controls (token cap, cadence, dedup, focus gating).
Implementation
Constants (`src/constants/agentStatus.ts`)

- Focused cadence `30s`, unfocused cadence `120s`, scheduler tick `10s`, startup delay `30s`.
- Trailing transcript window: ≤ `80` messages, ≤ `8000` tokens, with a per-message char cap of `4000`.
- At most `1` global generation in flight.

Small-model path (`src/node/services/workspaceStatusGenerator.ts`)

- Mirrors `workspaceTitleGenerator.ts`.
- Uses `streamText` + a new `propose_status` tool (added to `toolDefinitions.ts`); reuses `mapNameGenerationError`/`mapModelCreationError` so all the existing 401/402/403/429/5xx classifications apply.
- The prompt wraps the transcript in `<transcript>` markers and instructs the model to focus on current activity rather than overall task scope.
- `propose_status` is excluded from default `tools.ts` registration with the same comment as `propose_name`.

Scheduler (`src/node/services/agentStatusService.ts`)

- A `setInterval` ticking every 10s. Per-workspace `lastRanAt` + the current focused/unfocused interval gate the actual generation.
- Trailing transcript: `getLastMessages(80)` + `readPartial` → format → trim oldest entries until under the token budget (using `TokenizerService` with a fallback model constant), then SHA-256 the transcript.
- Eligible workspaces are sorted by `lastRanAt` ascending so `MAX_CONCURRENT=1` doesn't starve workspaces deeper in the list (round-robin without an explicit queue).
- `runTick` awaits all dispatched workspaces (via tracked `inFlightPromises`) instead of fire-and-forget, so production schedulers and tests can observe completion deterministically.
- `if (this.stopped) return` guards (post-generation, post-persist) so a `stop()` mid-flight never leaks writes past the service's lifecycle.

Focus tracking (`src/node/services/windowService.ts`)

- `WindowService.isFocused()` delegates to `BrowserWindow.isFocused()` directly. Falls back to `true` in non-desktop contexts (CLI server, tests) so backend services don't throttle themselves to "unfocused" cadence when there is no window.

Schema + persistence

- New `aiStatus` field on `ExtensionMetadata` + `WorkspaceActivitySnapshotSchema`.
- The service persists via `extensionMetadata.setAiStatus` directly (bypassing the historical `emitWorkspaceActivityUpdate` wrapper that would have swallowed disk errors) and emits the activity snapshot via `workspaceService.emitWorkspaceActivity` only after the disk write resolves successfully.
- `ExtensionMetadataService.save()` re-throws on failure to support this; `initialize()` keeps its startup-safety semantics by wrapping its mutation calls in a logged try/catch per AGENTS.md.

Frontend (`src/browser/stores/WorkspaceStore.ts`)

- Sidebar precedence: `transientStatus ?? aiStatus ?? todoStatus`. `transientStatus` collapses `displayStatus` (inactive workspaces) and `aggregator.getAgentStatus()` (active workspaces) so explicit system-set messages always beat `aiStatus` regardless of which branch produces them.
- `applyWorkspaceActivitySnapshot` diffs `aiStatus` so a fresh small-model status triggers a re-render.

Validation

- `workspaceStatusGenerator.test.ts`: prompt structure (transcript delimiters, empty-transcript sentinel) + empty-candidates short-circuit.
- `agentStatusService.test.ts`: generates on first tick, dedups when the transcript is unchanged, regenerates after history changes, includes the partial assistant message mid-stream, skips empty workspaces, focused-vs-unfocused cadence boundaries, round-robin fairness across multiple workspaces, archived workspaces are skipped, stops cleanly mid-generation, persist failures don't advance the dedup hash.
- `WorkspaceStore.test.ts`: `aiStatus` beats `todoStatus`; `displayStatus` still beats `aiStatus`.
- `make static-check` clean (eslint, typecheck, prettier, doc-link check).
- `make test` shows only pre-existing flaky-timing failures present on `main` (CLI auth + GitStatusStore subset). The new files pass cleanly.

Risks

- `mapNameGenerationError` retry budget capped at 3 candidates.

Codex review history (resolved)
- Partial-stream inclusion: live activity lived in `partial.json`, not `chat.jsonl`. Fixed by appending `readPartial` so the hash refreshes mid-stream.
- Round-robin fairness: with `MAX_CONCURRENT=1` and fixed iteration order, deep workspaces would starve. Fixed with the `lastRanAt` ascending sort.
- Swallowed write errors: `updateAiStatus` went through a wrapper that silently swallowed disk errors. Fixed by calling `setAiStatus` directly and updating the hash only on success.
- `save()` swallowing errors: `ExtensionMetadataService.save()` itself swallowed errors. Fixed by re-throwing; `initialize()` wraps its calls in try/catch to preserve startup-safety.
- Transient status precedence: `aggregator.getAgentStatus()` was being buried under `aiStatus` for active workspaces. Fixed with the `transientStatus` tier.
- Lifecycle stop guard: `runForWorkspace` could persist/emit after `stop()`. Fixed with two `this.stopped` checks at await boundaries.

Simplification refactor
After the original implementation landed and Codex approved, the diff was simplified for minimal LoC / maintenance burden (8 files, +232 / -670 lines):
- Cross-restart dedup: the `aiStatusInputHash` round-trip turned out to add ~50 LoC of complexity (hydration sweep, getter method, conditional-spread coercion) for ~one extra LLM call per workspace per app launch. Removed the field from the type, the coercion, the setter signature, and `getAiStatusInputHash` entirely; in-session dedup still works.
- `WindowService`: dropped `EventEmitter` inheritance, `setFocused`, the `focused` field, focus/blur listeners, and the test-stub defensive check. `isFocused()` delegates to `BrowserWindow.isFocused()` directly.
- `AgentStatusService`: dropped `__test__` exports, `inFlightCount` (use `inFlightPromises.size`), the `markInFlight` helper, the `drainInFlight` helper, `hashesHydrated` + `hydratePersistedHashes`, `resolveTokenizerModel`. Merged `tick()` + `runTick()`. Used the `FALLBACK_TOKENIZER_MODEL` constant in `buildTrailingTranscript` so we don't double-call `getWorkspaceTitleModelCandidates` per workspace.

Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking: `max` • Cost: `$58.06`