nicknisi commented Feb 1, 2026

Summary

  • Add comprehensive eval framework with fixtures for all 5 frameworks × 3 states (15 scenarios)
  • Wire AgentExecutor to use Claude Agent SDK directly with direct auth mode
  • Add retry logic, debug tooling, and history tracking for eval runs

Why

We need automated tests that validate the installer agent's behavior across different project configurations before each release.

Notes

  • Run evals with pnpm eval --framework=nextjs --state=fresh
  • Requires ANTHROPIC_API_KEY in .env.local
  • Fixtures cover fresh projects, existing apps, and apps with competing auth (Auth0)

Introduces a structured evaluation system to validate the WorkOS installer
agent against framework fixtures. Phase 1 includes:

- Core types and interfaces for grading results
- File and build graders with pattern matching
- Next.js-specific grader checking AuthKit integration
- Fixture manager for temp dir setup/cleanup
- Eval runner orchestrating fixture → agent → grade flow
- CLI entry point with --framework and --verbose flags
- Minimal Next.js 14 App Router fixture

The agent executor is stubbed to validate framework structure first.
Run with: pnpm eval
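
The fixture → agent → grade flow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's runner: every name here (`EvalDeps`, `Fixture`, `GradeCheck`, `runScenario`) is hypothetical, and the agent step is injected so it can be stubbed the way the Phase 1 executor is.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR.
interface GradeCheck {
  name: string;
  passed: boolean;
  expected?: string;
  actual?: string;
}

interface GradeResult {
  passed: boolean;
  checks: GradeCheck[];
}

interface Fixture {
  workDir: string; // temp copy of the fixture project
  cleanup(): Promise<void>;
}

interface EvalDeps {
  setupFixture(framework: string, state: string): Promise<Fixture>;
  runAgent(workDir: string): Promise<void>; // may be a stub
  grade(workDir: string): Promise<GradeCheck[]>;
}

// Runs one scenario: set up the fixture, run the agent in it, grade the
// result, and clean up the temp dir even if a step throws.
async function runScenario(
  deps: EvalDeps,
  framework: string,
  state: string,
): Promise<GradeResult> {
  const fixture = await deps.setupFixture(framework, state);
  try {
    await deps.runAgent(fixture.workDir);
    const checks = await deps.grade(fixture.workDir);
    // A scenario passes only when every individual check passes.
    return { passed: checks.every((check) => check.passed), checks };
  } finally {
    await fixture.cleanup();
  }
}
```

Injecting the three steps keeps the orchestration testable before a real agent is wired in, which matches the "validate framework structure first" intent.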

Add CLI with filtering (--framework, --state, --json), matrix reporter,
and graders for all 5 frameworks. Create fixtures for fresh, existing,
and existing-auth0 states across Next.js, React SPA, React Router,
TanStack Start, and Vanilla JS.
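
As a rough illustration of how the 5 × 3 matrix and the --framework / --state filters could fit together, here is a minimal sketch. The framework identifiers are taken from the reporter table and the state names from the text above; `buildScenarios`, `Scenario`, and `Filter` are hypothetical names, and the actual CLI may filter differently.

```typescript
const FRAMEWORKS = [
  "nextjs",
  "react",
  "react-router",
  "tanstack-start",
  "vanilla-js",
] as const;
const STATES = ["fresh", "existing", "existing-auth0"] as const;

type Framework = (typeof FRAMEWORKS)[number];
type State = (typeof STATES)[number];

interface Scenario {
  framework: Framework;
  state: State;
}

// Hypothetical filter shape: an omitted field means "no filter on this axis".
interface Filter {
  framework?: string;
  state?: string;
}

// Builds the framework × state matrix, keeping only scenarios that match
// every filter that was provided.
function buildScenarios(filter: Filter = {}): Scenario[] {
  const scenarios: Scenario[] = [];
  for (const framework of FRAMEWORKS) {
    for (const state of STATES) {
      if (filter.framework && filter.framework !== framework) continue;
      if (filter.state && filter.state !== state) continue;
      scenarios.push({ framework, state });
    }
  }
  return scenarios;
}
```

With no filter this yields all 15 scenarios; passing both a framework and a state narrows it to a single run.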

- Add history.ts for results persistence with compare functionality
- Extend CLI with --debug, --keep-on-fail, --retry, --no-retry flags
- Add history and compare subcommands (pnpm eval:history, eval:compare)
- Implement retry loop in runner for handling LLM non-determinism
- Add verbose failure output with expected/actual values
- Create README documentation for eval framework usage
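
The retry loop for LLM non-determinism might look something like the sketch below. It is illustrative only: `runWithRetry` and `AttemptResult` are hypothetical names, and treating --no-retry as `maxAttempts = 1` is an assumption rather than something the PR states.

```typescript
// Hypothetical result shape; the real runner's result type is not shown in the PR.
interface AttemptResult {
  passed: boolean;
}

// Re-runs `attempt` until it passes or `maxAttempts` is reached.
// Returns the last result together with the number of attempts used.
async function runWithRetry<T extends AttemptResult>(
  attempt: (attemptNumber: number) => Promise<T>,
  maxAttempts: number,
): Promise<{ result: T; attempts: number }> {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let attempts = 1;
  let result = await attempt(attempts);
  while (!result.passed && attempts < maxAttempts) {
    attempts += 1;
    result = await attempt(attempts);
  }
  return { result, attempts };
}
```

Returning the attempt count alongside the result lets a reporter distinguish a clean pass from a pass that only succeeded on retry.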

Replace stub implementation with real agent execution:
- Add env-loader for credentials from .env.local
- Configure SDK with direct auth mode (bypasses gateway)
- Capture tool calls and output from message stream
- Add ToolCall interface to types
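
Capturing tool calls and text output from a message stream could be sketched as below. The message and block shapes are simplified assumptions modelled on assistant messages that carry `text` and `tool_use` content blocks; they are not the SDK's exported types, and `ToolCall`, `StreamMessage`, and `collectRun` are hypothetical definitions rather than the ones added in this PR.

```typescript
// Hypothetical ToolCall shape; the PR's actual interface is not shown.
interface ToolCall {
  tool: string;
  input: unknown;
}

// Simplified, assumed content block shapes.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

// Simplified, assumed stream message shape.
interface StreamMessage {
  type: string;
  content?: ContentBlock[];
}

// Walks the stream once, recording each tool invocation and concatenating
// the assistant's text output. Non-assistant messages are ignored.
async function collectRun(
  stream: AsyncIterable<StreamMessage>,
): Promise<{ toolCalls: ToolCall[]; output: string }> {
  const toolCalls: ToolCall[] = [];
  const textParts: string[] = [];
  for await (const message of stream) {
    if (message.type !== "assistant" || !message.content) continue;
    for (const block of message.content) {
      if (block.type === "tool_use") {
        toolCalls.push({ tool: block.name, input: block.input });
      } else {
        textParts.push(block.text);
      }
    }
  }
  return { toolCalls, output: textParts.join("\n") };
}
```

Accepting any `AsyncIterable` keeps the collector independent of the SDK, so it can be exercised with a fake stream in tests.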
- Use glob + content matching for callback route (path is configurable)
- Remove process.env.WORKOS_ check (SDK abstracts env access)
- Add checkFileWithPattern helper for flexible file discovery
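
The idea of finding the callback route by file name and content, rather than by a fixed path, can be sketched as follows. This is not the PR's `checkFileWithPattern`: `listFiles` and `findFileMatching` are hypothetical names, a plain recursive directory walk stands in for a glob library, and the `handleAuth` pattern is only an illustrative search target.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Recursively lists files under `dir`, skipping node_modules.
function listFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === "node_modules") continue;
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFiles(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files;
}

// Returns the first file whose path matches `fileName` and whose contents
// match `content`, or undefined when no file satisfies both.
function findFileMatching(
  root: string,
  fileName: RegExp,
  content: RegExp,
): string | undefined {
  return listFiles(root).find(
    (file) => fileName.test(file) && content.test(readFileSync(file, "utf8")),
  );
}

// Usage: the callback route sits at a non-default path but is still found.
const root = mkdtempSync(join(tmpdir(), "eval-fixture-"));
const routeDir = join(root, "app", "auth", "callback");
mkdirSync(routeDir, { recursive: true });
writeFileSync(join(routeDir, "route.ts"), "export const GET = handleAuth();\n");

const match = findFileMatching(root, /route\.(ts|js)$/, /handleAuth\s*\(/);
console.log(match !== undefined ? `found: ${match}` : "not found");
```

Matching on content means the grader does not fail merely because the agent honored a configured callback path.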

nicknisi commented Feb 1, 2026

Example run:

Framework: Next.js
State: Fresh
> tsx tests/evals/index.ts --framework=nextjs --verbose --keepOnFail --state=fresh


Running: nextjs/fresh
  Installing dependencies...
[dotenv@17.2.3] injecting env (2) from .env.local -- tip: ⚙️  load multiple .env files with { path: ['.env.local', '.env'] }
  Initializing agent for nextjs...
  Tool: Skill
  Agent: I'll start by fetching the SDK documentation from the README, which is the source of truth....
  Tool: WebFetch
  Agent: Now let me validate the project structure and check the existing environment variables....
  Tool: Glob
  Tool: Read
  Tool: Read
  Tool: Glob
  Agent: Let me also check the existing app structure:...
  Tool: Glob
  Tool: Read
  Tool: Read
  Tool: TodoWrite
  Agent: Now let me proceed with the integration. First, I'll add the missing environment variable and instal...
  Tool: Edit
  Tool: Bash
  Agent: Let me check if there's a pnpm or yarn lockfile to determine the correct package manager:...
  Tool: Bash
  Tool: Bash
  Tool: Bash
  Tool: TodoWrite
  Agent: Now let me verify the SDK is installed and create the middleware. Since this is Next.js 14, I'll cre...
  Tool: Bash
  Agent: Now let me create the middleware, callback route, update layout, and update the page:...
  Tool: Write
  Tool: Bash
  Tool: Write
  Tool: TodoWrite
  Agent: Now let me update the layout with AuthKitProvider and the page with auth UI:...
  Tool: Write
  Tool: Write
  Tool: TodoWrite
  Agent: Now let me run the verification checklist and build:...
  Tool: Bash
  Tool: Bash
  Tool: Bash
  Agent: All verification checks pass. Now let's run the build:...
  Tool: Bash
  Agent: Let me check the correct import paths from the SDK:...
  Tool: Bash
  Tool: Bash
  Tool: Bash
  Agent: I see - the SDK doesn't have `getUser` and `AuthKitProvider` in the main exports. Let me check the c...
  Tool: Bash
  Tool: Bash
  Agent: I see - `AuthKitProvider` is in `@workos-inc/authkit-nextjs/components`, and I need to use `withAuth...
  Tool: Bash
  Tool: Bash
  Tool: Bash
  Agent: Now I understand. The SDK uses `withAuth` instead of `getUser`. Let me update the files with the cor...
  Tool: Write
  Tool: Write
  Agent: Now let's run the build again:...
  Tool: Bash
  Agent: The build succeeded. Let me update the todo list to show completion:...
  Tool: TodoWrite
  Agent: I've successfully integrated WorkOS AuthKit into your Next.js application. Here's a summary of what ...
✓ PASSED

Results saved to: /Users/nicknisi/Developer/cli/prompt-improvements/tests/eval-results/2026-02-01T05-39-28-884Z.json

┌─────────────────┬─────────┬──────────┬───────────────┐
│ Framework       │  Fresh  │ Existing │ Existing+Auth │
├─────────────────┼─────────┼──────────┼───────────────┤
│ nextjs          │    ✓    │    -     │       -       │
│ react           │    -    │    -     │       -       │
│ react-router    │    -    │    -     │       -       │
│ tanstack-start  │    -    │    -     │       -       │
│ vanilla-js      │    -    │    -     │       -       │
└─────────────────┴─────────┴──────────┴───────────────┘

Results: 1/1 passed (100.0%)
pnpm eval --framework=nextjs --verbose --keepOnFail --state=fresh  66.79s user 23.09s system 32% cpu 4:39.04 total

- Grader: support src/ directory (v1.132+) in addition to app/
- Grader: check for authkitMiddleware instead of createServerFn
- Grader: fix package name to @workos/authkit-tanstack-react-start
- Grader: remove AuthKitProvider requirement (optional for server-only)
- Grader: support both flat and nested route patterns for callback
- Skill: add directory detection guidance (src/ vs app/)
- Skill: fix handleAuth() → handleCallbackRoute()
- Skill: add SDK exports reference section
- Remove callback component check (SDK handles OAuth internally)
- Use glob pattern to find useAuth anywhere in src/**/*.tsx
- Support both Vite (main.tsx) and CRA (index.tsx) entry points
- Add comprehensive header documenting SDK patterns
- Fix package name: @workos-inc/authkit-react-router (was @workos-inc/authkit)
- Use glob patterns instead of hardcoded file paths
- Check for authLoader in callback routes (flexible location)
- Check for authkitLoader in route files for auth state
- Remove unnecessary ProtectedRoute.tsx/auth.ts checks (SDK has ensureSignedIn)
- Support both app/ and src/ directory structures
- Remove callback.html/callback.js checks (SDK handles OAuth internally)
- Remove auth.js with getAuthorizationUrl (old pattern)
- Check for createClient from @workos-inc/authkit-js or CDN WorkOS.createClient
- Check for auth methods (signIn, signOut, getUser, getAccessToken)
- Support both bundled (ESM import) and CDN (script tag) patterns
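
The bundled-versus-CDN check and the auth-method check for the Vanilla JS grader could be approximated with content matching like the sketch below. It assumes regular-expression matching over a file's text; `detectClientPattern` and `usedAuthMethods` are hypothetical names, and the actual grader may match differently.

```typescript
type ClientPattern = "bundled" | "cdn" | "none";

// Classifies how a source file obtains the AuthKit client:
// - "bundled": ESM import from @workos-inc/authkit-js plus a createClient call
// - "cdn": a WorkOS.createClient call on the global loaded via script tag
function detectClientPattern(source: string): ClientPattern {
  const importsPackage = /from\s+['"]@workos-inc\/authkit-js['"]/.test(source);
  const callsCreateClient = /\bcreateClient\s*\(/.test(source);
  if (importsPackage && callsCreateClient) return "bundled";
  if (/\bWorkOS\.createClient\s*\(/.test(source)) return "cdn";
  return "none";
}

const AUTH_METHODS = ["signIn", "signOut", "getUser", "getAccessToken"];

// Returns the auth methods that appear as method calls in the source.
function usedAuthMethods(source: string): string[] {
  return AUTH_METHODS.filter((method) =>
    new RegExp(`\\.${method}\\s*\\(`).test(source),
  );
}
```

Accepting either pattern keeps the grader from penalizing an agent that picks the script-tag route for a project with no bundler.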