Objective
Create a keyboard-first local terminal dashboard for AgentV that lets users browse run history, recent summaries, and per-run details while keeping the user-facing install and launch experience simple.
Architecture Boundary
separate package, unified launcher
Prefer implementing the terminal dashboard as a dedicated app/package (for example `apps/tui`) rather than folding the full runtime into the existing CLI command package.
Reasoning:
- cleaner ownership boundaries for UI-specific code
- easier navigation for AI coding agents and humans
- lower risk of mixing command-runner concerns with long-lived TUI runtime concerns
- easier parallel development as the UI grows beyond a narrow prompt flow
Even with a separate package, the user-facing experience should stay unified:
- `npm install agentv`
- `agentv <command>`
- no extra package install required for dashboard/TUI users
The CLI should remain the launcher surface. The separate package is an internal architecture choice, not a user-facing packaging burden.
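One way to realize this split is a thin CLI command that lazy-loads the TUI runtime from its own workspace package. The sketch below is illustrative only: the `@agentv/tui` package name, the `runDashboard` export, and the `--run` flag are all assumptions, not existing AgentV APIs.

```typescript
// Hypothetical sketch: the `agentv` CLI owns argument parsing; the TUI
// runtime lives in a separate workspace package and is imported lazily,
// so dashboard dependencies never load for ordinary CLI invocations.
export interface DashboardOptions {
  cwd: string;    // where to look for run artifacts
  runId?: string; // optionally open a specific run directly (assumed flag)
}

export function parseDashboardArgs(argv: string[], cwd: string): DashboardOptions {
  const opts: DashboardOptions = { cwd };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--run" && argv[i + 1]) {
      opts.runId = argv[++i];
    }
  }
  return opts;
}

// The command handler would then do something like:
//   const { runDashboard } = await import("@agentv/tui"); // hypothetical package
//   await runDashboard(parseDashboardArgs(argv, process.cwd()));
```

The lazy `import()` keeps the internal package boundary invisible to users: `npm install agentv` still installs everything, and the TUI code only loads when the dashboard command actually runs.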
Current Ground Truth
- Current AgentV `main` has no dedicated TUI app/package yet.
- The existing interactive flow is an Inquirer-based wizard for selecting evals/targets, not a full terminal dashboard.
- Latest `main` now has `agentv results summary`, `agentv results failures`, and `agentv results show`, which are the best near-term artifact/query surface to reuse.
- AgentV already ships `agentv serve` from the CLI package, which establishes the pattern that the CLI can launch richer local UI surfaces.
- The current browser review UI in `agentv serve` already implements a useful content model that the TUI should borrow rather than reinvent.
V1 Content Model
The first TUI version should explicitly mirror the current `agentv serve` results-review surface at a terminal-appropriate level.
View 1: Overview
Show aggregate run stats equivalent to the current browser review UI:
- total tests
- passed
- failed
- execution errors
- pass rate
- total duration
- token usage
- estimated cost
When multiple targets are present, also show a per-target summary table with:
- target name
- pass rate
- passed / failed / errors
- average score
- duration
- tokens
- cost
Include a compact score distribution view if feasible in the terminal.
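The overview aggregation above can be sketched as a pure function over result rows. The `TestResult` shape below is an illustrative assumption, not AgentV's actual result schema.

```typescript
// Illustrative result-row shape; field names are assumptions for this sketch.
export interface TestResult {
  status: "passed" | "failed" | "error";
  score: number;      // overall score in [0, 1]
  durationMs: number;
  tokens: number;
  costUsd: number;
  target?: string;
}

export interface OverviewStats {
  total: number;
  passed: number;
  failed: number;
  errors: number;
  passRate: number;        // passed / total; 0 for an empty run
  totalDurationMs: number;
  totalTokens: number;
  totalCostUsd: number;
}

export function computeOverview(results: TestResult[]): OverviewStats {
  const passed = results.filter(r => r.status === "passed").length;
  const failed = results.filter(r => r.status === "failed").length;
  const errors = results.filter(r => r.status === "error").length;
  return {
    total: results.length,
    passed,
    failed,
    errors,
    passRate: results.length ? passed / results.length : 0,
    totalDurationMs: results.reduce((s, r) => s + r.durationMs, 0),
    totalTokens: results.reduce((s, r) => s + r.tokens, 0),
    totalCostUsd: results.reduce((s, r) => s + r.costUsd, 0),
  };
}
```

Keeping this as a pure function (no terminal or browser concerns) is what lets the same computation back both the `agentv serve` overview and the TUI overview.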
View 2: Test List
Show a filterable/sortable test list equivalent to the current browser review UI table.
Per row, include at minimum:
- status
- test id
- target when relevant
- overall score
- evaluator columns or a compact evaluator summary
- duration
- cost
Support filtering by at least:
- status
- target when relevant
- text search by test id
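The filtering requirements above amount to a pure predicate over row data. The `TestRow` shape and filter fields below are illustrative assumptions for this sketch, not AgentV's actual table model.

```typescript
// Illustrative row shape for the test list; names are assumptions.
export interface TestRow {
  status: "passed" | "failed" | "error";
  testId: string;
  target?: string;
  score: number;
  durationMs: number;
  costUsd: number;
}

export interface RowFilter {
  status?: "passed" | "failed" | "error";
  target?: string;
  idQuery?: string; // case-insensitive substring match on test id
}

export function filterRows(rows: TestRow[], f: RowFilter): TestRow[] {
  return rows.filter(r =>
    (!f.status || r.status === f.status) &&
    (!f.target || r.target === f.target) &&
    (!f.idQuery || r.testId.toLowerCase().includes(f.idQuery.toLowerCase()))
  );
}
```

Because the function is UI-agnostic, the same filter logic could drive both the browser table and the terminal list.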
View 3: Test Detail
Selecting a test should open a detail view or detail pane with the same high-value review content already present in `agentv serve`:
- input preview
- output preview
- evaluator score breakdown
- passed/failed expectations or assertions
- execution error details when relevant
- lightweight metadata such as timing/target identifiers
For v1, this should be read-only review/debug content first.
Design Latitude
A dedicated `apps/tui` package is preferred for the first substantial implementation.
Prefer reusing existing AgentV result/history abstractions and output artifacts over inventing a new plugin system. If shared dashboard data/query logic is needed, keep that layer UI-agnostic so it can support browser and terminal surfaces.
The TUI should reuse the current `agentv serve` content logic where practical:
- aggregate stats computation
- per-target aggregation
- per-test row shaping
- evaluator score extraction
- expectation/assertion rendering data
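One way to keep that shared layer UI-agnostic is a small store interface that both renderers consume. Every name below is an assumption for illustration, not an existing AgentV API; the in-memory implementation only demonstrates the boundary.

```typescript
// Sketch of a UI-agnostic query layer shared by browser and terminal
// surfaces. Renderers depend only on this interface, never on artifact
// file formats. All names here are hypothetical.
export interface RunSummary {
  runId: string;
  startedAt: string; // ISO timestamp
  passRate: number;
}

export interface ResultsStore {
  listRuns(): Promise<RunSummary[]>;
  getOverview(runId: string): Promise<Record<string, unknown>>;
  getTestDetail(runId: string, testId: string): Promise<Record<string, unknown>>;
}

// Minimal in-memory implementation; a real one would read run artifacts
// or the shared history storage.
export class InMemoryResultsStore implements ResultsStore {
  constructor(private runs: RunSummary[]) {}

  async listRuns(): Promise<RunSummary[]> {
    // Most recent run first.
    return [...this.runs].sort((a, b) => b.startedAt.localeCompare(a.startedAt));
  }

  async getOverview(runId: string): Promise<Record<string, unknown>> {
    return { runId }; // placeholder payload for the sketch
  }

  async getTestDetail(runId: string, testId: string): Promise<Record<string, unknown>> {
    return { runId, testId }; // placeholder payload for the sketch
  }
}
```

An Ink-based TUI and the browser dashboard could each wrap a `ResultsStore` without either one owning the data-loading code.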
Renderer choice is intentionally open. If the TUI needs a dedicated renderer/runtime boundary, options such as OpenTUI or a schema-driven Ink renderer like @json-render/ink are both in bounds.
The same architectural direction likely applies to browser UI over time: `agentv serve` may remain the CLI entrypoint, but the browser dashboard/runtime should eventually be able to live in its own package (for example `apps/wui`) rather than being permanently owned by CLI internals.
Acceptance Signals
- `agentv` exposes a keyboard-first terminal dashboard entrypoint.
- The TUI implementation lives in a dedicated package/app or is clearly structured so it can be isolated without major churn.
- The dashboard can read existing AgentV run artifacts or the same history storage used by dashboard/reporting features.
- The TUI provides these core review surfaces:
- overview stats
- test list
- per-test detail
- The per-test detail includes evaluator breakdown plus failed expectations/assertions or execution-error information.
- The UI is usable entirely in the terminal.
- `npm install agentv` users do not need a second manual install step to use the dashboard.
- Shared data-loading logic, if introduced, is reusable by other dashboard surfaces.
Non-Goals
- Replacing or merging with the web dashboard work in feat: self-hosted dashboard — historical trends, dataset management, YAML editor #563.
- Designing a general third-party plugin architecture for the dashboard.
- Moving core evaluation logic into UI code.
- Forcing users to install separate UI packages manually.
- Full parity with the broader web dashboard roadmap on the first pass.
- Historical trends, dataset browser, and live SSE parity in the first TUI version.
- Making model-in-the-loop UI generation a requirement for the first version.
Related
- feat: self-hosted dashboard — historical trends, dataset management, YAML editor (#563): web/local browser dashboard
- Interactive eval TUI should list available models by provider (#520)
- Research: `research/agentv/tui-architecture-json-render.md` in agentevals-research