Case

Case is the reliability layer for agent-authored WorkOS OSS pull requests.

Its job is narrow: turn a clearly scoped WorkOS OSS task into a reviewed PR with evidence, and make the next run better when this one fails. Case is not a generic agent platform, a dashboard product, or a place to accumulate every possible workflow idea. Humans steer. Agents execute. The harness keeps the work reviewable.

Why It Exists

Agents are useful when the surrounding system makes good work easier than bad work. Case provides that surrounding system for the WorkOS open source repos:

A shared map of target repos, commands, architecture notes, and conventions.
A task format that separates human intent from machine-updated state.
A small multi-agent pipeline with isolated responsibilities.
Evidence gates for tests, manual verification, review, and PR creation.
Retrospective learning so repeated failures become docs, playbooks, or enforcement.

The north star:

Case exists to make agent-authored WorkOS OSS PRs reliable, reviewable, and self-improving.

Core Loop

From a target repo:

ca 1234

Case detects the repo, fetches the GitHub issue, creates task files, runs a baseline check, and dispatches the pipeline:

implementer -> verifier -> reviewer -> closer -> retrospective

For unclear work, use the human steering path:

ca --agent
ca --agent 1234

ca --agent starts an interactive orchestrator session. It can inspect context, fetch issues, help shape the task, create the task file, and then run the pipeline. It should not implement directly. This is the primary interface for “humans steer.”

For an existing task file:

ca run --task .case/tasks/active/cli-1-issue-53.task.json

To resume an interrupted issue run, re-run the same command:

ca 1234

Case reuses the existing task when it finds one and resumes from stored state.

What Belongs

Case should stay focused on the PR loop. A feature belongs when it does at least one of these:

Makes ca <issue> or ca --agent <issue> more likely to produce a correct PR.
Converts an observed agent failure into a repeatable guardrail.
Preserves context isolation, evidence, or resumability.
Can be tested hermetically without depending on one user's machine.

Current non-goals:

Generic agent platform features.
Local dashboards and webhook services.
Human approval browser UI between pipeline phases.
Specialized reviewer fleets.
Ideation/spec execution as a first-class runtime.

Those ideas may be revisited only after the core PR loop is boringly reliable.

Setup

Requires Bun >= 1.0.

bun install
bun link
ca init

ca init creates ~/.config/case/ and migrates local state from the repo when run from the case checkout. Re-running it is safe.

Build a standalone binary:

bun run build:binary
cp dist/ca /usr/local/bin/ca

build:binary regenerates the embedded package asset manifest before compiling. The resulting dist/ca is portable: agent prompts, docs, playbooks, and AST rules are bundled into the executable. The binary is ca because case is a reserved word in bash and zsh.

CLI

Primary commands:

ca 1234                 # create or resume a GitHub issue run
ca DX-1234              # create or resume a Linear issue run
ca --agent              # interactive steering session
ca --agent 1234         # steering session with issue context
ca run --task <file>    # run an existing task JSON
ca watch <task-slug>    # live-tail the event log

Agent-facing commands:

ca session <repo-path> --task <task.json>
ca status <task.json> [field value...]
ca mark-tested
ca mark-manual-tested
ca mark-reviewed --critical 0
ca upload <file>
ca snapshot <agent-name>
ca create --repo <name> --title <title> --description <text>
ca analyze-failure <task.json> <agent> <error>
ca bootstrap <repo>
ca check [--repo <repo>]

Common flags:

ca --model claude-opus-4-5 1234
ca run --task <file> --mode unattended
ca run --task <file> --dry-run
ca run --fresh 1234

Storage Layout

Package-level config lives under ~/.config/case/. Per-repo runtime state lives under each target repo's ignored .case/ directory:

~/.config/case/
  config.json
  projects.json
  agent-versions/

<target-repo>/.case/
  active
  learnings.md
  amendments/
  run-log.jsonl
  tasks/
    active/
      <task-slug>.md
      <task-slug>.task.json
  <task-slug>/
    events/
    plan.json

Override the config/cache directory with:

CASE_DATA_DIR=/tmp/case-test ca init

Static package assets are versioned with Case and embedded into the standalone binary: agents/, markdown under docs/, and text rules under ast-rules/. When running from a checkout, disk files win so local prompt/doc edits are picked up immediately; set CASE_PACKAGE_ROOT=/path/to/case to force a specific checkout as the disk override.

For portable binary installs, keep projects.json in ~/.config/case/ via ca init --projects <path> or ca init --migrate-from <case-checkout>. Repo paths in a portable projects.json should be absolute or relative to that projects.json file.

Pipeline

The runtime uses a deterministic TypeScript DAG executor for phase transitions. The LLMs do the work inside each phase; TypeScript decides which phase runs next.

Profiles:

standard: implement, verify, review, close, retrospective.
tiny: implement, review, close, retrospective. Use only for docs, typos, and mechanical config changes where independent verification is not useful.

Revision loops are evaluator-driven. A verifier or reviewer rubric failure can send structured feedback back to the implementer. The default revision budget is two cycles.

Every run writes an append-only event log under <target-repo>/.case/<task-slug>/events/. ca watch <task-slug> renders those events while a run is active.

Agent Roles

Agent	Responsibility	Does Not Do
Orchestrator¹	Parses issues, creates tasks, runs baseline, dispatches the pipeline	Implement code
Implementer	Writes the fix, runs automated tests, commits	Manual browser testing, PR creation
Verifier	Tests the specific user-facing scenario and records evidence	Edit code
Reviewer	Reviews the diff against golden principles and conventions	Edit code or create PRs
Closer	Creates the PR after evidence gates pass	Implement or test
Retrospective	Records learnings and proposes harness improvements	Edit target repo code

¹ The orchestrator is TypeScript runtime code (src/agent/orchestrator-session.ts), not an LLM agent prompt like the others.

The key boundary is context isolation. Implementer context includes task details, playbooks, repo learnings, and revision feedback. Verifier context is intentionally fresher. Reviewer context is focused on the diff and principles.

Evidence Gates

Evidence markers live under the target repo's .case/<task-slug>/ directory:

tested: created by ca mark-tested from real test output.
manual-tested: created by ca mark-manual-tested from manual/browser verification evidence.
reviewed: created by ca mark-reviewed --critical 0.

The closer checks these markers before opening a PR. The point is not ceremony; it is making the PR auditable without trusting a chat transcript.

Self-Improvement

After a run, the retrospective agent should leave the harness smarter:

Append tactical repo learnings under <target-repo>/.case/learnings.md.
Propose broader harness changes under <target-repo>/.case/amendments/.
Escalate repeated failures into docs, playbooks, conventions, or enforcement.

Retrospective output is constrained. It should not expand the product surface by default. The fix for repeated agent failure is usually a clearer task, a better playbook, a sharper convention, or a mechanical guardrail.

Model Configuration

Configure models in ~/.config/case/config.json:

{
  "$schema": "https://raw.githubusercontent.com/workos/case/main/config.schema.json",
  "models": {
    "default": { "provider": "anthropic", "model": "claude-sonnet-4-20250514" },
    "reviewer": { "provider": "google", "model": "gemini-2.5-pro" },
    "verifier": null
  }
}

Priority:

--model flag > explicit spawn options > config file > hardcoded default

Repository Map

Target repos are listed in projects.json.

Repo	Path	Purpose
cli	`../cli/main`	WorkOS CLI
skills	`../skills`	WorkOS integration skills
authkit-session	`../authkit-session`	Framework-agnostic session management
authkit-tanstack-start	`../authkit-tanstack-start`	AuthKit TanStack Start SDK
authkit-nextjs	`../authkit-nextjs`	AuthKit Next.js SDK
workos-node	`../workos-node/main`	WorkOS Node.js SDK

Add a repo by updating projects.json, adding any needed architecture notes under docs/architecture/, and verifying with:

ca check --repo <name>

Development Checks

For case itself:

bun run typecheck
bun test ./src/__tests__/
bun run lint
bun run format:check

For target repos:

ca bootstrap <repo>
ca check --repo <repo>

Philosophy

The short version:

Humans steer. Agents execute.
The harness is the product; target repo code is the output.
When agents struggle, fix the harness.
Enforce mechanically, not rhetorically.
Test the specific fix, not the happy path.
Keep the tool small unless reliability demands complexity.

See docs/philosophy.md for the fuller version.

Name		Name	Last commit message	Last commit date
Latest commit History 175 Commits
agents		agents
ast-rules		ast-rules
docs		docs
src		src
tasks		tasks
test/standalone		test/standalone
tests/ast-rules/fixtures		tests/ast-rules/fixtures
.gitignore		.gitignore
.mcp.json		.mcp.json
.oxfmtrc.json		.oxfmtrc.json
.oxlintrc.json		.oxlintrc.json
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
CONTEXT.md		CONTEXT.md
README.md		README.md
bun.lock		bun.lock
bunfig.toml		bunfig.toml
config.schema.json		config.schema.json
improvements.md		improvements.md
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
projects.json		projects.json
projects.schema.json		projects.schema.json
sgconfig.yml		sgconfig.yml
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Case

Why It Exists

Core Loop

What Belongs

Setup

CLI

Storage Layout

Pipeline

Agent Roles

Evidence Gates

Self-Improvement

Model Configuration

Repository Map

Development Checks

Philosophy

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Case

Why It Exists

Core Loop

What Belongs

Setup

CLI

Storage Layout

Pipeline

Agent Roles

Evidence Gates

Self-Improvement

Model Configuration

Repository Map

Development Checks

Philosophy

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages