Skip to content

feat(admin): model experiments tRPC router + gateway tab#3302

Open
markijbema wants to merge 3 commits into
mainfrom
mark/experimental-models-admin
Open

feat(admin): model experiments tRPC router + gateway tab#3302
markijbema wants to merge 3 commits into
mainfrom
mark/experimental-models-admin

Conversation

@markijbema
Copy link
Copy Markdown
Contributor

Summary

Phase 5 of .plans/experimental-models-1.md — admin editing surface for model experiments. Stacked on top of #3299 (schema). Lands the tRPC router, hooks, and a fourth tab inside /admin/gateway (no new sidebar entry, mirroring how custom-llms already lives there).

What this PR does not do:

  • No gateway routing yet (Phases 2/3) — admin can create/activate experiments but no traffic is bucketed.
  • No getLiveStats (needs Phase 4 request rows) and no getPromptByHash (needs the R2 prompt-store module + bucket env var).

Backend

  • apps/web/src/lib/ai-gateway/experiments/upstream-schema.tsExperimentUpstreamSchema, the strict subset of CustomLlmDefinitionSchema the plan describes. No api_key (lives in the sibling encrypted_api_key column) and no extra_headers (deliberately deferred — see §"Phase 1 — Schema + Migration" in the plan).
  • apps/web/src/lib/redis-keys.ts — adds EXPERIMENTED_PUBLIC_IDS_REDIS_KEY and modelExperimentRedisKey(publicId). Phase 3 will read these; this PR writes them on every routing-affecting mutation so the cache is already correct when routing lands.
  • apps/web/src/lib/redis.ts — adds redisDel(key) helper (alongside existing redisGet / redisSet).
  • apps/web/src/routers/admin/model-experiments-router.ts — full router:
    • list, get, create, update, delete (draft only), activate, pause, complete, setArchived
    • addVariant, removeVariant, updateVariantLabel, swapVariantVersion, rotateApiKey
    • State-machine guards match the plan: structural edits draft-only, hot-swap any non-terminal state, archive forbidden on active, only one routing-relevant experiment per public_model_id (DB partial unique index plus a friendly pre-check).
    • encrypted_api_key is never selected by any read or write that returns to the client. variantVersionPublicColumns is the explicit non-key projection used everywhere; the column is only written to during swapVariantVersion / rotateApiKey after encryptApiKey().
    • BYOK_ENCRYPTION_KEY missing → INTERNAL_SERVER_ERROR on key-touching ops, matching byok-router.ts:188.
    • Every routing-affecting mutation invalidates the per-public-id cache and recomputes the membership set.

Frontend

  • apps/web/src/app/admin/api/model-experiments/hooks.ts — React Query hooks for every router procedure.
  • apps/web/src/app/admin/model-experiments/ModelExperimentsContent.tsx — list view, inline detail view, create dialog, add-variant dialog, Monaco-based hot-swap dialog (validates ExperimentUpstreamSchema strict before submit), rotate-key dialog. Status badges, share rendered as weight / sum(weights), structural-edit lock for non-draft.
  • apps/web/src/app/admin/gateway/page.tsx — fourth tab "Model Experiments" added.
  • apps/web/src/app/admin/model-experiments/page.tsx — redirects to /admin/gateway?tab=model-experiments (mirrors /admin/custom-llms).

Plan doc

Updated .plans/experimental-models-1.md with concrete Phase 5 status and the deferred items (getLiveStats, getPromptByHash).

Verification

  • pnpm --filter web typecheck passes.
  • pnpm --filter web lint passes (0 warnings, 0 errors).
  • pnpm format clean.
  • pnpm --filter web test -- admin-router passes (24/24 unrelated tests still green; no new tests added in this PR — admin router tests for experiments will land alongside Phase 6).
  • Manually: navigate to /admin/gateway?tab=model-experiments, create a draft experiment, add two variants, hot-swap a version with a JSON body and an api key, confirm the api key is never echoed back, attempt to activate without enough variants and observe the friendly error.

Skipped the full pnpm typecheck because the per-package check covers the changes — flagging per AGENTS.md.

Visual Changes

A new "Model Experiments" tab appears inside /admin/gateway. Reuses existing tokens (text-2xl font-bold heading, text-muted-foreground helpers, shadcn Dialog/Table/Badge/Button primitives, Monaco editor with vs-dark matching the custom-LLM editor). No new visual primitives.

Reviewer Notes

  • Stacked on feat(db): add model experiment schema for preview A/B tests #3299. Cannot run end-to-end until feat(db): add model experiment schema for preview A/B tests #3299 merges and the migration applies in dev/preview.
  • Cache invalidation is best-effort and recomputes the full set. recomputeExperimentedPublicIds runs a small SELECT public_model_id FROM model_experiment WHERE status IN ('active','paused') and writes a JSON array string. At any plausible scale (≪1k experiments) this is fine; if it ever grows, swap to SADD/SREM deltas.
  • update.public_model_id is gated to draft only. Non-draft routes through assertDraft because changing the public id of a routing-relevant experiment would invalidate the partial unique index and existing buckets.
  • No tests for the new router yet. Phase 6 owns the dedicated test file (model-experiments-router.test.ts) per the plan; intentionally deferred to keep this PR reviewable.
  • status is read from the DB as plain text and narrowed via as Status in handful of places where TypeScript can't infer the CHECK-constrained shape. Acceptable in this code per AGENTS.md "boundary cast" guidance — the DB column is constrained by a CHECK to exactly those four values.

Phase 5 of .plans/experimental-models-1.md. Schema-only PR #3299
provides the tables; this PR adds the editing surface.

- ExperimentUpstreamSchema (strict subset of CustomLlmDefinitionSchema;
  no api_key, no extra_headers)
- Redis key helpers: EXPERIMENTED_PUBLIC_IDS_REDIS_KEY and
  modelExperimentRedisKey(publicId), plus redisDel() helper
- adminModelExperimentsRouter with CRUD + state machine
  (activate / pause / complete / setArchived / delete-on-draft) and
  variant ops (addVariant / removeVariant / updateVariantLabel /
  swapVariantVersion / rotateApiKey). encrypted_api_key is never
  returned by list/get/swap/rotate; admin selects enumerate non-key
  columns explicitly. BYOK_ENCRYPTION_KEY missing rejects key-touching
  ops.
- React Query hooks + ModelExperimentsContent (list + inline detail +
  Monaco hot-swap dialog + rotate-key dialog).
- Mounted as a 4th tab inside /admin/gateway with a redirect at
  /admin/model-experiments mirroring /admin/custom-llms.

Cache invalidation runs on every routing-affecting mutation
(per-public-id cache + recomputed membership set). Phase 3 gateway
routing reads these caches; until that lands, invalidation is a no-op
for routing but writes the membership set so it's ready.
26 tests covering:
- CRUD lifecycle (create, list w/ archive filter, delete-on-draft)
- Activation guards: <2 variants, missing variant_version, duplicate
  active|paused per public_model_id, draft-can-coexist
- State machine: draft -> active -> paused -> active -> completed,
  pause-only-active, can't activate completed
- Archive forbidden on active, allowed on paused/completed/draft
- Structural edits (add/remove) draft-only after activation
- updateVariantLabel allowed in non-terminal, rejected on completed
- swapVariantVersion encrypts api key; response never includes
  encrypted_api_key column; DB roundtrip via decryptApiKey
- Hot-swap inserts new version row; old rows remain; allowed on
  active/paused; rejected on completed
- rotateApiKey copies prior upstream; rejects when no prior version
- ExperimentUpstreamSchema strict rejects extra fields (e.g. api_key)
- get(id) never returns encrypted_api_key on any version row
- cascade delete on draft removes variants and versions
- non-admin callers rejected
}
if (existing.status === 'active') return existing;
await assertActivatable(input.id, existing.public_model_id);
const [updated] = await db
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: TOCTOU gap between the conflict pre-check and the DB write

The assertActivatable SELECT and the subsequent UPDATE are not in a transaction. In the (unlikely but possible) case where two admin sessions concurrently activate different experiments for the same public_model_id, both could pass the pre-check, and one will hit the DB partial-unique-index constraint (UQ_model_experiment_public_model_id_routing). That constraint violation surfaces as an unhandled Postgres 23505 error, which tRPC will map to an INTERNAL_SERVER_ERROR rather than the user-friendly CONFLICT message the pre-check was designed to produce.

The safest fix is to catch the unique constraint violation on the UPDATE and re-throw it as a TRPCError({ code: 'CONFLICT', ... }). This way the DB is always the authoritative guard and the pre-check remains just a UX convenience, but both paths produce a friendly message:

try {
  const [updated] = await db
    .update(model_experiment)
    .set({ status: 'active', started_at: existing.started_at ?? sql`now()` })
    .where(eq(model_experiment.id, input.id))
    .returning();
  await invalidateExperimentCaches(updated.public_model_id);
  return updated;
} catch (err) {
  if (err instanceof Error && 'code' in err && (err as { code: string }).code === '23505') {
    throw new TRPCError({
      code: 'CONFLICT',
      message: `Another active or paused experiment exists for ${existing.public_model_id}`,
    });
  }
  throw err;
}

.set(next)
.where(eq(model_experiment.id, input.id))
.returning();
await invalidateExperimentCaches(updated.public_model_id);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: Cache is invalidated even for purely cosmetic updates

invalidateExperimentCaches is called on every update mutation, even when only name or description changes. The per-public-id Redis cache contains variant/version routing data — not experiment metadata — so a name or description change doesn't affect the cached payload. This is harmless but causes unnecessary Redis churn.

Consider gating the invalidation on whether the update affects routing-relevant data:

const routingAffected = 'public_model_id' in next;
if (routingAffected) {
  await invalidateExperimentCaches(updated.public_model_id);
  if (existing.public_model_id !== updated.public_model_id) {
    await invalidateExperimentCaches(existing.public_model_id);
  }
}

const rows = await db
.select({ public_model_id: model_experiment.public_model_id })
.from(model_experiment)
.where(inArray(model_experiment.status, RoutingStatuses as readonly Status[] as Status[]));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: Double cast as readonly Status[] as Status[] is noise

The same double cast appears in two places (recomputeExperimentedPublicIds and assertActivatable). RoutingStatuses is ['active', 'paused'] as const, which is readonly ['active', 'paused']. Drizzle's inArray accepts readonly arrays directly in recent versions — if TypeScript is accepting the double cast here it's working around a real or apparent type mismatch.

If the project's Drizzle version does require the cast, consider extracting a helper or a typed constant so the cast is in one place:

// Once at the top of the file
const routingStatusList = RoutingStatuses as unknown as Status[];
// Then use routingStatusList in inArray() calls

At minimum, consider adding a brief comment explaining why the cast is needed so the next reader doesn't wonder if it's a mistake.

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 18, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Executive Summary

A TOCTOU race in the activate mutation means a concurrent DB unique-constraint violation surfaces as an unhandled INTERNAL_SERVER_ERROR rather than a friendly conflict message. The latest commit is a clean refactor (extracts trpcThrow/notFound/badRequest/loadVariantOrThrow helpers) with no new issues introduced.

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 2
Issue Details (click to expand)

WARNING

File Line Issue
apps/web/src/routers/admin/model-experiments-router.ts 284 TOCTOU gap: concurrent activations hitting the DB partial unique index produce an unhandled 23505 error (INTERNAL_SERVER_ERROR) rather than the friendly CONFLICT message

SUGGESTION

File Line Issue
apps/web/src/routers/admin/model-experiments-router.ts 265 Cache is invalidated on every update mutation even for name/description-only changes that don't affect routing data
apps/web/src/routers/admin/model-experiments-router.ts 63 Double cast as readonly Status[] as Status[] appears in two places; worth consolidating or commenting
Incremental Review Note

The latest commit (6ccdb05) is a pure refactor that extracts trpcThrow, notFound, badRequest, and loadVariantOrThrow helpers, replacing 11 verbose throw new TRPCError(...) blocks with one-liners. No new issues were introduced. The never return types on the helpers are correctly propagated through TypeScript's control flow. Previous findings carry forward unchanged.

Files Reviewed (9 files)
  • apps/web/src/routers/admin/model-experiments-router.ts — 3 issues
  • apps/web/src/routers/admin/model-experiments-router.test.ts
  • apps/web/src/app/admin/model-experiments/ModelExperimentsContent.tsx
  • apps/web/src/app/admin/model-experiments/page.tsx
  • apps/web/src/app/admin/api/model-experiments/hooks.ts
  • apps/web/src/lib/ai-gateway/experiments/upstream-schema.ts
  • apps/web/src/lib/redis.ts
  • apps/web/src/lib/redis-keys.ts
  • .plans/experimental-models-1.md

Fix these issues in Kilo Cloud


Review guidance: REVIEW.md from base branch main

Replaces 11 verbose 'throw new TRPCError({ code, message })' blocks with
oneliner calls. The router now reads 60 lines shorter at the same
behavior; tests are unchanged and all 26 still pass.

- trpcThrow(code, message) -> never (single point that constructs+throws)
- notFound(entity), badRequest(message) -> oneliner shortcuts
- loadVariantOrThrow consolidates the four duplicated 'load variant or
  return NOT_FOUND' blocks
return { success: true };
}),

activate: adminProcedure.input(idSchema).mutation(async ({ input }) => {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these state transitions seem to contain a lot of duplication

@markijbema markijbema force-pushed the mark/experimental-models-admin branch from 42e9fb2 to 6ccdb05 Compare May 18, 2026 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant