Objective
Create a keyboard-first local terminal dashboard for AgentV that lets users browse run history, recent summaries, and per-run details while keeping the user-facing install and launch experience simple.
Architecture Boundary
separate package, unified launcher
Prefer implementing the terminal dashboard as a dedicated app/package (for example `apps/tui`) rather than folding the full runtime into the existing CLI command package.
Reasoning:
- cleaner ownership boundaries for UI-specific code
- easier navigation for AI coding agents and humans
- lower risk of mixing command-runner concerns with long-lived TUI runtime concerns
- easier parallel development as the UI grows beyond a narrow prompt flow
Even with a separate package, the user-facing experience should stay unified:
- `npm install agentv`
- `agentv <command>`
- no extra package install required for dashboard/TUI users
The CLI should remain the launcher surface. The separate package is an internal architecture choice, not a user-facing packaging burden.
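One way to realize this split is a thin CLI command that lazy-loads the TUI runtime from its own workspace package. The sketch below is illustrative only: the `@agentv/tui` package name, the `runDashboard` export, and the `--run` flag are all assumptions, not existing AgentV APIs.

```typescript
// Hypothetical sketch: the `agentv` CLI owns argument parsing; the TUI
// runtime lives in a separate workspace package and is imported lazily,
// so dashboard dependencies never load for ordinary CLI invocations.
export interface DashboardOptions {
  cwd: string;    // where to look for run artifacts
  runId?: string; // optionally open a specific run directly (assumed flag)
}

export function parseDashboardArgs(argv: string[], cwd: string): DashboardOptions {
  const opts: DashboardOptions = { cwd };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--run" && argv[i + 1]) {
      opts.runId = argv[++i];
    }
  }
  return opts;
}

// The command handler would then do something like:
//   const { runDashboard } = await import("@agentv/tui"); // hypothetical package
//   await runDashboard(parseDashboardArgs(argv, process.cwd()));
```

The lazy `import()` keeps the internal package boundary invisible to users: `npm install agentv` still installs everything, and the TUI code only loads when the dashboard command actually runs.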
Current Ground Truth
- Current AgentV `main` has no dedicated TUI app/package yet.
- The existing interactive flow is an Inquirer-based wizard for selecting evals/targets, not a full terminal dashboard.
- Latest `main` now has `agentv results summary`, `agentv results failures`, and `agentv results show`, which are the best near-term artifact/query surface to reuse.
- AgentV already ships `agentv serve` from the CLI package, which establishes the pattern that the CLI can launch richer local UI surfaces.
- The current browser review UI in `agentv serve` already implements a useful content model that the TUI should borrow rather than reinvent.
V1 Content Model
The first TUI version should explicitly mirror the current `agentv serve` results-review surface at a terminal-appropriate level.
View 1: Overview
Show aggregate run stats equivalent to the current browser review UI:
- total tests
- passed
- failed
- execution errors
- pass rate
- total duration
- token usage
- estimated cost
When multiple targets are present, also show a per-target summary table with:
- target name
- pass rate
- passed / failed / errors
- average score
- duration
- tokens
- cost
Include a compact score distribution view if feasible in the terminal.
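The overview aggregation above can be sketched as a pure function over result rows. The `TestResult` shape below is an illustrative assumption, not AgentV's actual result schema.

```typescript
// Illustrative result-row shape; field names are assumptions for this sketch.
export interface TestResult {
  status: "passed" | "failed" | "error";
  score: number;      // overall score in [0, 1]
  durationMs: number;
  tokens: number;
  costUsd: number;
  target?: string;
}

export interface OverviewStats {
  total: number;
  passed: number;
  failed: number;
  errors: number;
  passRate: number;        // passed / total; 0 for an empty run
  totalDurationMs: number;
  totalTokens: number;
  totalCostUsd: number;
}

export function computeOverview(results: TestResult[]): OverviewStats {
  const passed = results.filter(r => r.status === "passed").length;
  const failed = results.filter(r => r.status === "failed").length;
  const errors = results.filter(r => r.status === "error").length;
  return {
    total: results.length,
    passed,
    failed,
    errors,
    passRate: results.length ? passed / results.length : 0,
    totalDurationMs: results.reduce((s, r) => s + r.durationMs, 0),
    totalTokens: results.reduce((s, r) => s + r.tokens, 0),
    totalCostUsd: results.reduce((s, r) => s + r.costUsd, 0),
  };
}
```

Keeping this as a pure function (no terminal or browser concerns) is what lets the same computation back both the `agentv serve` overview and the TUI overview.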
View 2: Test List
Show a filterable/sortable test list equivalent to the current browser review UI table.
Per row, include at minimum:
- status
- test id
- target when relevant
- overall score
- evaluator columns or a compact evaluator summary
- duration
- cost
Support filtering by at least:
- status
- target when relevant
- text search by test id
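The filtering requirements above amount to a pure predicate over row data. The `TestRow` shape and filter fields below are illustrative assumptions for this sketch, not AgentV's actual table model.

```typescript
// Illustrative row shape for the test list; names are assumptions.
export interface TestRow {
  status: "passed" | "failed" | "error";
  testId: string;
  target?: string;
  score: number;
  durationMs: number;
  costUsd: number;
}

export interface RowFilter {
  status?: "passed" | "failed" | "error";
  target?: string;
  idQuery?: string; // case-insensitive substring match on test id
}

export function filterRows(rows: TestRow[], f: RowFilter): TestRow[] {
  return rows.filter(r =>
    (!f.status || r.status === f.status) &&
    (!f.target || r.target === f.target) &&
    (!f.idQuery || r.testId.toLowerCase().includes(f.idQuery.toLowerCase()))
  );
}
```

Because the function is UI-agnostic, the same filter logic could drive both the browser table and the terminal list.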
View 3: Test Detail
Selecting a test should open a detail view or detail pane with the same high-value review content already present in `agentv serve`:
- input preview
- output preview
- evaluator score breakdown
- passed/failed expectations or assertions
- execution error details when relevant
- lightweight metadata such as timing/target identifiers
For v1, this should be read-only review/debug content first.
Design Latitude
A dedicated `apps/tui` package is preferred for the first substantial implementation.
Prefer reusing existing AgentV result/history abstractions and output artifacts over inventing a new plugin system. If shared dashboard data/query logic is needed, keep that layer UI-agnostic so it can support browser and terminal surfaces.
The TUI should reuse the current `agentv serve` content logic where practical:
- aggregate stats computation
- per-target aggregation
- per-test row shaping
- evaluator score extraction
- expectation/assertion rendering data
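One way to keep that shared layer UI-agnostic is a small store interface that both renderers consume. Every name below is an assumption for illustration, not an existing AgentV API; the in-memory implementation only demonstrates the boundary.

```typescript
// Sketch of a UI-agnostic query layer shared by browser and terminal
// surfaces. Renderers depend only on this interface, never on artifact
// file formats. All names here are hypothetical.
export interface RunSummary {
  runId: string;
  startedAt: string; // ISO timestamp
  passRate: number;
}

export interface ResultsStore {
  listRuns(): Promise<RunSummary[]>;
  getOverview(runId: string): Promise<Record<string, unknown>>;
  getTestDetail(runId: string, testId: string): Promise<Record<string, unknown>>;
}

// Minimal in-memory implementation; a real one would read run artifacts
// or the shared history storage.
export class InMemoryResultsStore implements ResultsStore {
  constructor(private runs: RunSummary[]) {}

  async listRuns(): Promise<RunSummary[]> {
    // Most recent run first.
    return [...this.runs].sort((a, b) => b.startedAt.localeCompare(a.startedAt));
  }

  async getOverview(runId: string): Promise<Record<string, unknown>> {
    return { runId }; // placeholder payload for the sketch
  }

  async getTestDetail(runId: string, testId: string): Promise<Record<string, unknown>> {
    return { runId, testId }; // placeholder payload for the sketch
  }
}
```

An Ink-based TUI and the browser dashboard could each wrap a `ResultsStore` without either one owning the data-loading code.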
Renderer choice is intentionally open. If the TUI needs a dedicated renderer/runtime boundary, options such as OpenTUI or a schema-driven Ink renderer like @json-render/ink are both in bounds.
The same architectural direction likely applies to browser UI over time: `agentv serve` may remain the CLI entrypoint, but the browser dashboard/runtime should eventually be able to live in its own package (for example `apps/wui`) rather than being permanently owned by CLI internals.
Acceptance Signals
- `agentv` exposes a keyboard-first terminal dashboard entrypoint.
- The TUI implementation lives in a dedicated package/app or is clearly structured so it can be isolated without major churn.
- The dashboard can read existing AgentV run artifacts or the same history storage used by dashboard/reporting features.
- The TUI provides these core review surfaces:
- overview stats
- test list
- per-test detail
- The per-test detail includes evaluator breakdown plus failed expectations/assertions or execution-error information.
- The UI is usable entirely in the terminal.
- `npm install agentv` users do not need a second manual install step to use the dashboard.
- Shared data-loading logic, if introduced, is reusable by other dashboard surfaces.
Non-Goals
- Replacing or merging with the web dashboard work in feat: self-hosted dashboard — historical trends, dataset management, YAML editor #563.
- Designing a general third-party plugin architecture for the dashboard.
- Moving core evaluation logic into UI code.
- Forcing users to install separate UI packages manually.
- Full parity with the broader web dashboard roadmap on the first pass.
- Historical trends, dataset browser, and live SSE parity in the first TUI version.
- Making model-in-the-loop UI generation a requirement for the first version.
Related
- feat: self-hosted dashboard — historical trends, dataset management, YAML editor (#563): web/local browser dashboard
- Interactive eval TUI should list available models by provider (#520)
- Research: `research/agentv/tui-architecture-json-render.md` in agentevals-research