feat(admin): model experiments tRPC router + gateway tab#3302
feat(admin): model experiments tRPC router + gateway tab#3302markijbema wants to merge 3 commits into
Conversation
Phase 5 of .plans/experimental-models-1.md. Schema-only PR #3299 provides the tables; this PR adds the editing surface. - ExperimentUpstreamSchema (strict subset of CustomLlmDefinitionSchema; no api_key, no extra_headers) - Redis key helpers: EXPERIMENTED_PUBLIC_IDS_REDIS_KEY and modelExperimentRedisKey(publicId), plus redisDel() helper - adminModelExperimentsRouter with CRUD + state machine (activate / pause / complete / setArchived / delete-on-draft) and variant ops (addVariant / removeVariant / updateVariantLabel / swapVariantVersion / rotateApiKey). encrypted_api_key is never returned by list/get/swap/rotate; admin selects enumerate non-key columns explicitly. BYOK_ENCRYPTION_KEY missing rejects key-touching ops. - React Query hooks + ModelExperimentsContent (list + inline detail + Monaco hot-swap dialog + rotate-key dialog). - Mounted as a 4th tab inside /admin/gateway with a redirect at /admin/model-experiments mirroring /admin/custom-llms. Cache invalidation runs on every routing-affecting mutation (per-public-id cache + recomputed membership set). Phase 3 gateway routing reads these caches; until that lands, invalidation is a no-op for routing but writes the membership set so it's ready.
26 tests covering: - CRUD lifecycle (create, list w/ archive filter, delete-on-draft) - Activation guards: <2 variants, missing variant_version, duplicate active|paused per public_model_id, draft-can-coexist - State machine: draft -> active -> paused -> active -> completed, pause-only-active, can't activate completed - Archive forbidden on active, allowed on paused/completed/draft - Structural edits (add/remove) draft-only after activation - updateVariantLabel allowed in non-terminal, rejected on completed - swapVariantVersion encrypts api key; response never includes encrypted_api_key column; DB roundtrip via decryptApiKey - Hot-swap inserts new version row; old rows remain; allowed on active/paused; rejected on completed - rotateApiKey copies prior upstream; rejects when no prior version - ExperimentUpstreamSchema strict rejects extra fields (e.g. api_key) - get(id) never returns encrypted_api_key on any version row - cascade delete on draft removes variants and versions - non-admin callers rejected
| } | ||
| if (existing.status === 'active') return existing; | ||
| await assertActivatable(input.id, existing.public_model_id); | ||
| const [updated] = await db |
There was a problem hiding this comment.
WARNING: TOCTOU gap between the conflict pre-check and the DB write
The assertActivatable SELECT and the subsequent UPDATE are not in a transaction. In the (unlikely but possible) case where two admin sessions concurrently activate different experiments for the same public_model_id, both could pass the pre-check, and one will hit the DB partial-unique-index constraint (UQ_model_experiment_public_model_id_routing). That constraint violation surfaces as an unhandled Postgres 23505 error, which tRPC will map to an INTERNAL_SERVER_ERROR rather than the user-friendly CONFLICT message the pre-check was designed to produce.
The safest fix is to catch the unique constraint violation on the UPDATE and re-throw it as a TRPCError({ code: 'CONFLICT', ... }). This way the DB is always the authoritative guard and the pre-check remains just a UX convenience, but both paths produce a friendly message:
try {
const [updated] = await db
.update(model_experiment)
.set({ status: 'active', started_at: existing.started_at ?? sql`now()` })
.where(eq(model_experiment.id, input.id))
.returning();
await invalidateExperimentCaches(updated.public_model_id);
return updated;
} catch (err) {
if (err instanceof Error && 'code' in err && (err as { code: string }).code === '23505') {
throw new TRPCError({
code: 'CONFLICT',
message: `Another active or paused experiment exists for ${existing.public_model_id}`,
});
}
throw err;
}| .set(next) | ||
| .where(eq(model_experiment.id, input.id)) | ||
| .returning(); | ||
| await invalidateExperimentCaches(updated.public_model_id); |
There was a problem hiding this comment.
SUGGESTION: Cache is invalidated even for purely cosmetic updates
invalidateExperimentCaches is called on every update mutation, even when only name or description changes. The per-public-id Redis cache contains variant/version routing data — not experiment metadata — so a name or description change doesn't affect the cached payload. This is harmless but causes unnecessary Redis churn.
Consider gating the invalidation on whether the update affects routing-relevant data:
const routingAffected = 'public_model_id' in next;
if (routingAffected) {
await invalidateExperimentCaches(updated.public_model_id);
if (existing.public_model_id !== updated.public_model_id) {
await invalidateExperimentCaches(existing.public_model_id);
}
}| const rows = await db | ||
| .select({ public_model_id: model_experiment.public_model_id }) | ||
| .from(model_experiment) | ||
| .where(inArray(model_experiment.status, RoutingStatuses as readonly Status[] as Status[])); |
There was a problem hiding this comment.
SUGGESTION: Double cast as readonly Status[] as Status[] is noise
The same double cast appears in two places (recomputeExperimentedPublicIds and assertActivatable). RoutingStatuses is ['active', 'paused'] as const, which is readonly ['active', 'paused']. Drizzle's inArray accepts readonly arrays directly in recent versions — if TypeScript is accepting the double cast here it's working around a real or apparent type mismatch.
If the project's Drizzle version does require the cast, consider extracting a helper or a typed constant so the cast is in one place:
// Once at the top of the file
const routingStatusList = RoutingStatuses as unknown as Status[];
// Then use routingStatusList in inArray() callsAt minimum, consider adding a brief comment explaining why the cast is needed so the next reader doesn't wonder if it's a mistake.
Code Review SummaryStatus: 3 Issues Found | Recommendation: Address before merge Executive SummaryA TOCTOU race in the Overview
Issue Details (click to expand)WARNING
SUGGESTION
Incremental Review NoteThe latest commit ( Files Reviewed (9 files)
Fix these issues in Kilo Cloud Review guidance: REVIEW.md from base branch |
Replaces 11 verbose 'throw new TRPCError({ code, message })' blocks with
oneliner calls. The router now reads 60 lines shorter at the same
behavior; tests are unchanged and all 26 still pass.
- trpcThrow(code, message) -> never (single point that constructs+throws)
- notFound(entity), badRequest(message) -> oneliner shortcuts
- loadVariantOrThrow consolidates the four duplicated 'load variant or
return NOT_FOUND' blocks
| return { success: true }; | ||
| }), | ||
|
|
||
| activate: adminProcedure.input(idSchema).mutation(async ({ input }) => { |
There was a problem hiding this comment.
these state transitions seem to contain a lot of duplication
42e9fb2 to
6ccdb05
Compare
Summary
Phase 5 of
.plans/experimental-models-1.md— admin editing surface for model experiments. Stacked on top of #3299 (schema). Lands the tRPC router, hooks, and a fourth tab inside/admin/gateway(no new sidebar entry, mirroring howcustom-llmsalready lives there).What this PR does not do:
getLiveStats(needs Phase 4 request rows) and nogetPromptByHash(needs the R2 prompt-store module + bucket env var).Backend
apps/web/src/lib/ai-gateway/experiments/upstream-schema.ts—ExperimentUpstreamSchema, the strict subset ofCustomLlmDefinitionSchemathe plan describes. Noapi_key(lives in the siblingencrypted_api_keycolumn) and noextra_headers(deliberately deferred — see §"Phase 1 — Schema + Migration" in the plan).apps/web/src/lib/redis-keys.ts— addsEXPERIMENTED_PUBLIC_IDS_REDIS_KEYandmodelExperimentRedisKey(publicId). Phase 3 will read these; this PR writes them on every routing-affecting mutation so the cache is already correct when routing lands.apps/web/src/lib/redis.ts— addsredisDel(key)helper (alongside existingredisGet/redisSet).apps/web/src/routers/admin/model-experiments-router.ts— full router:list,get,create,update,delete(draft only),activate,pause,complete,setArchivedaddVariant,removeVariant,updateVariantLabel,swapVariantVersion,rotateApiKeyactive, only one routing-relevant experiment perpublic_model_id(DB partial unique index plus a friendly pre-check).encrypted_api_keyis never selected by any read or write that returns to the client.variantVersionPublicColumnsis the explicit non-key projection used everywhere; the column is only written to duringswapVariantVersion/rotateApiKeyafterencryptApiKey().BYOK_ENCRYPTION_KEYmissing →INTERNAL_SERVER_ERRORon key-touching ops, matchingbyok-router.ts:188.Frontend
apps/web/src/app/admin/api/model-experiments/hooks.ts— React Query hooks for every router procedure.apps/web/src/app/admin/model-experiments/ModelExperimentsContent.tsx— list view, inline detail view, create dialog, add-variant dialog, Monaco-based hot-swap dialog (validatesExperimentUpstreamSchemastrict before submit), rotate-key dialog. Status badges, share rendered asweight / sum(weights), structural-edit lock for non-draft.apps/web/src/app/admin/gateway/page.tsx— fourth tab "Model Experiments" added.apps/web/src/app/admin/model-experiments/page.tsx— redirects to/admin/gateway?tab=model-experiments(mirrors/admin/custom-llms).Plan doc
Updated
.plans/experimental-models-1.mdwith concrete Phase 5 status and the deferred items (getLiveStats,getPromptByHash).Verification
pnpm --filter web typecheckpasses.pnpm --filter web lintpasses (0 warnings, 0 errors).pnpm formatclean.pnpm --filter web test -- admin-routerpasses (24/24 unrelated tests still green; no new tests added in this PR — admin router tests for experiments will land alongside Phase 6)./admin/gateway?tab=model-experiments, create a draft experiment, add two variants, hot-swap a version with a JSON body and an api key, confirm the api key is never echoed back, attempt to activate without enough variants and observe the friendly error.Skipped the full
pnpm typecheckbecause the per-package check covers the changes — flagging per AGENTS.md.Visual Changes
A new "Model Experiments" tab appears inside
/admin/gateway. Reuses existing tokens (text-2xl font-boldheading,text-muted-foregroundhelpers, shadcn Dialog/Table/Badge/Button primitives, Monaco editor withvs-darkmatching the custom-LLM editor). No new visual primitives.Reviewer Notes
recomputeExperimentedPublicIdsruns a smallSELECT public_model_id FROM model_experiment WHERE status IN ('active','paused')and writes a JSON array string. At any plausible scale (≪1k experiments) this is fine; if it ever grows, swap toSADD/SREMdeltas.update.public_model_idis gated to draft only. Non-draft routes throughassertDraftbecause changing the public id of a routing-relevant experiment would invalidate the partial unique index and existing buckets.model-experiments-router.test.ts) per the plan; intentionally deferred to keep this PR reviewable.statusis read from the DB as plain text and narrowed viaas Statusin handful of places where TypeScript can't infer the CHECK-constrained shape. Acceptable in this code per AGENTS.md "boundary cast" guidance — the DB column is constrained by a CHECK to exactly those four values.