feat(llm-gateway): complete Cloudflare Worker implementation (Phases 1–7)#752
Add llm-gateway Cloudflare Worker package with:
- pnpm workspace registration
- package.json, tsconfig.json, wrangler.jsonc, eslint.config.mjs
- vitest.config.ts / vitest.workers.config.ts (unit + integration)
- src/index.ts: Hono app stub returning 501 on POST /chat/completions
- src/env.ts: Cloudflare.Env type alias
- src/types.ts: HonoContext, Variables, OpenRouterChatCompletionRequest
- src/logger.ts: workers-tagged-logger setup
- worker-configuration.d.ts: type stub (replace with wrangler types output)
- Smoke test confirming the worker module loads cleanly

pnpm typecheck passes, pnpm test passes.
Remove all bindings (Hyperdrive, KV, services, secrets store, vars) — these will be added incrementally as each phase needs them. Simplify worker-configuration.d.ts stub and env.ts to match.
- Use shared Hyperdrive id (624ec80...dd10) with localConnectionString inline, no dev env override
- Create and use real KV namespace id for USER_CACHE_KV (c92d83fa...)
- Secret binding name matches secret name: NEXTAUTH_SECRET_PROD (mirrors session-ingest pattern)
- Drop env.dev block entirely
- Middleware chain: request-timing, parse-body, extract-ip, resolve-auto-model, auth, anonymous-gate
- Routes /api/gateway/chat/completions and /api/openrouter/chat/completions
- lib/jwt.ts: verifyGatewayJwt + isPepperValid (jose ERR_JWT_EXPIRED fix)
- worker-utils: add userExistsWithCache (single source of truth for the KV-backed user existence check); consumed by both session-ingest and llm-gateway
- llm-gateway uses shared USER_EXISTS_CACHE KV (ab836697) from session-ingest
- Unit tests: jwt, parse-body, models, kilo-auto-model (39 tests)
- lib/rate-limit.ts: KV sliding window (1h/200 free models; 24h/10k promotions)
- middleware/free-model-rate-limit.ts: 429 for Kilo free models over limit
- middleware/promotion-limit.ts: 401 for anonymous users over promotion limit
- middleware/log-free-model-usage.ts: waitUntil DB insert + KV increment
- lib/byok.ts: BYOK lookup with Web Crypto AES-256-GCM decryption
- lib/providers.ts: Provider type, PROVIDERS map, getProvider() with BYOK/custom LLM/free model routing
- lib/provider-specific.ts: applyProviderSpecificLogic() porting all provider sub-modules
- lib/tool-calling.ts: normalizeToolCallIds/repairTools/hasAttemptCompletionTool (async Web Crypto SHA-256)
- middleware/provider-resolution.ts: pre-fetches secrets in parallel, wires provider result to context
- types/hono.ts: added provider, userByok, customLlm, secrets variables
- wrangler.jsonc + worker-configuration.d.ts: RATE_LIMIT_KV + provider secrets bindings
- 54 tests passing (7 new rate-limit + 8 providers)
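The sliding-window idea behind lib/rate-limit.ts can be sketched as a pure function (a simplification: the worker persists request timestamps in Workers KV, and the names here are illustrative, not the real exports):

```typescript
// Hedged sketch of a sliding-window rate-limit check. Given the prior request
// times for a key, a request is limited when the count inside the trailing
// window has reached the limit (e.g. 1h/200 for free models, 24h/10k for promotions).
function isRateLimited(
  requestTimestampsMs: number[], // prior request times for this key
  nowMs: number,
  windowMs: number,              // e.g. 1 hour or 24 hours in ms
  limit: number,                 // e.g. 200 or 10_000
): boolean {
  const windowStart = nowMs - windowMs;
  // entries older than the window no longer count
  const inWindow = requestTimestampsMs.filter((t) => t > windowStart);
  return inWindow.length >= limit;
}
```

In the worker the timestamp list would be read from and written back to KV per key (IP or user id); the windowing logic itself stays the same.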
- lib/provider-hash.ts: async Web Crypto SHA-256 (generateProviderSpecificHash)
- lib/promotions.ts: isActiveReviewPromo, isActiveCloudAgentPromo
- lib/extract-headers.ts: getFraudDetectionHeaders, extractProjectHeaders, normalizeProjectId
- lib/prompt-info.ts: extractPromptInfo, estimateChatTokens
- lib/org-restrictions.ts: checkOrganizationModelRestrictions, getBalanceAndOrgSettings (Drizzle query, no credit expiration)
- middleware/request-validation.ts: max_tokens cap, dead model 404, rate-limited-to-death 404
- middleware/balance-and-org.ts: balance check, org model/provider restrictions, data collection gate
- middleware/request-transform.ts: safety_identifier, prompt_cache_key, tool repair, provider-specific logic, extraHeaders on context
- lib/tool-calling.ts: export ENABLE_TOOL_REPAIR
- types/hono.ts: add fraudHeaders, projectId, taskId, editorName, machineId, xKiloCodeVersion, numericKiloCodeVersion, extraHeaders
- src/index.ts: wire requestValidationMiddleware, balanceAndOrgCheckMiddleware, requestTransformMiddleware
- test/unit/provider-hash.test.ts: 6 tests
- test/unit/org-restrictions.test.ts: 11 tests
- 71 tests passing, typecheck clean
- handler/proxy.ts: upstream fetch (openRouterRequest + customLlmRequest), abuse classification (non-blocking, 2s timeout), 402→503 conversion, error logging, makeErrorReadable, free model SSE rewrite, pass-through
- lib/response-helpers.ts: getOutputHeaders, wrapResponse, makeErrorReadable
- lib/rewrite-free-model-response.ts: SSE stream transformer (eventsource-parser), model name rewrite, reasoning_details normalisation, cost field stripping
- lib/abuse-service.ts: classifyAbuse + reportAbuseCost via CF Access headers
- lib/custom-llm/: full port of customLlmRequest (Vercel AI SDK, Anthropic + OpenAI-compatible, streaming + non-streaming, reasoning, tool calling, temp_phase DB tracking with Web Crypto SHA-256 instead of Node.js crypto.hash)
- index.ts: replace notImplemented stub with proxyHandler
- package.json: add ai, @ai-sdk/anthropic, @ai-sdk/openai, eventsource-parser
- wrangler.jsonc + worker-configuration.d.ts: ABUSE_SERVICE_URL, ABUSE_CF_ACCESS_CLIENT_ID, ABUSE_CF_ACCESS_CLIENT_SECRET bindings
- All 71 existing unit tests pass, typecheck clean
…metrics, request logging, abuse cost)
- background/usage-accounting.ts: parse SSE/non-streaming response, compute cost, insert microdollar_usage + metadata via CTE, update kilocode_users.microdollars_used, ingest org token usage. Fixes duplicate parseMicrodollarUsageFromString that was left in the stub.
- background/api-metrics.ts: drain background stream to extract inferenceProvider, emit ApiMetricsParams to O11Y service binding via POST /ingest/api-metrics
- background/request-logging.ts: insert api_request_log for Kilo employees (@kilo.ai / @kilocode.ai emails or KILO_ORGANIZATION_ID)
- background/abuse-cost.ts: report upstream cost to abuse service after usage is computed
- handler/proxy.ts: rewritten to tee response.body into 4 streams (client + accounting + metrics + logging) using ReadableStream.tee() chains; all background tasks scheduled via ctx.waitUntil() with 25s internal timeout; free model path only tees for metrics
- wrangler.jsonc + worker-configuration.d.ts: add O11Y_KILO_GATEWAY_CLIENT_SECRET Secrets Store binding needed for O11Y service auth
… passing)
Add comprehensive unit tests covering every middleware and library module:
- middleware-chain: full e2e flow with mocked DB/KV/Secrets, health check, 404, body validation, anonymous gate, route parity, missing IP
- anonymous-gate: free model anonymous access, paid model 401, authUser passthrough
- free-model-rate-limit: KV sliding window 200/hr, per-IP isolation, non-Kilo skip
- request-validation: max_tokens overflow, dead free models, rate-limited-to-death
- response-helpers: header whitelisting, wrapResponse, BYOK error messages
- rewrite-free-model-response: SSE cost stripping, model replacement, reasoning_content→reasoning conversion, JSON non-streaming path
- tool-calling: repairTools deduplication/insertion/orphan removal, dropToolStrictProperties, normalizeToolCallIds, hasAttemptCompletionTool
- extract-headers: fraud detection headers, project ID normalization (HTTPS/SSH git URLs)
- prompt-info: extractPromptInfo with multipart content, estimateChatTokens
- feature-detection: validateFeatureHeader with valid/invalid/null inputs
- anonymous: createAnonymousContext, isAnonymousContext type guards
- promotions: isActiveReviewPromo/isActiveCloudAgentPromo with time mocking
- abuse-service: classifyRequest/classifyAbuse/reportCost/reportAbuseCost with mocked fetch, error handling, CF Access headers
- request-logging: isKiloEmployee guard (@kilo.ai, @kilocode.ai, org ID), DB insert, error handling
- shared test helpers (helpers.ts): JWT signing, mock Env, mock KV/Secrets, SSE stream builders
Use the Workers-native scheduler.wait() API instead of setTimeout for all timer-based patterns. scheduler.wait() is an awaitable alternative that integrates properly with the Workers I/O scheduler. Replaced in:
- handler/proxy.ts: withTimeout helper and abuse classification 2s timeout
- background/api-metrics.ts: stream read timeout during inference provider extraction
- background/usage-accounting.ts: retry backoff delay for DB concurrency failures
- worker-configuration.d.ts: added scheduler global type declaration
…etch
Add ingestApiMetrics RPC method to the O11Y worker's WorkerEntrypoint, matching the existing ingestSessionMetrics pattern used by session-ingest. This eliminates HTTP routing overhead, JSON serialization, and the X-O11Y-ADMIN-TOKEN auth header for internal service-to-service calls.
Changes:
- cloudflare-o11y/src/index.ts: add ingestApiMetrics RPC method that validates with ApiMetricsParamsSchema and writes to Analytics Engine + Pipeline
- llm-gateway/src/o11y-binding.d.ts: declare O11YBinding type extending Fetcher with the ingestApiMetrics RPC method signature
- llm-gateway/src/env.ts: override Cloudflare.Env O11Y type with O11YBinding
- llm-gateway/src/background/api-metrics.ts: replace o11y.fetch() HTTP call with o11y.ingestApiMetrics() RPC call, remove clientSecret parameter
- llm-gateway/src/handler/proxy.ts: remove o11yClientSecretPromise pre-fetch and clientSecret plumbing, simplify BackgroundTaskParams o11y type
Code Review Summary

Status: 2 New Issues Found (101 total) | Recommendation: Address before merge

Overview
New Issues in This Review Pass

Issue Details
Prior Review Issues (99 comments)

Many issues from prior review passes have been fixed by the author, including:
Several issues were acknowledged as matching the reference implementation (not regressions):
Outstanding issues from prior passes that remain open:
Files Reviewed (110+ files)

New llm-gateway worker (core):
Middleware:
Background tasks:
Libraries:
Durable Objects:
Shared packages:
Other workers (refactored):
Config & tests: 40+ test files, config files (eslint, vitest, wrangler, tsconfig, package.json)
GIGAPOTATO_API_URL, OPENROUTER_ORG_ID, and ABUSE_SERVICE_URL were plain env vars but should be secrets. Move them to secrets_store_secrets and call .get() at the callsites so the Secrets Store Fetcher objects are resolved to strings before use.
Replace .tee()-based response body splitting with a buffer-and-replay approach for background tasks. With .tee(), all stream branches share the same source — if any consumer (usage accounting DB writes, request logging) stalls, backpressure propagates to the client stream, causing periodic 5s freezes during streaming responses. Now the pass-through path pipes upstream chunks through a TransformStream directly to the client while accumulating a copy. Background tasks (accounting, metrics, logging) replay the buffered data after the stream completes, fully decoupled from client delivery. Also: restore O11Y service binding (service name: o11y), cancel unconsumed tee branches in free-model and fallback paths.
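A minimal sketch of the buffer-and-replay idea described above (assuming a TransformStream in the pass-through path; the real code also handles cancellation and the free-model branch):

```typescript
// Passes upstream chunks straight through to the client while accumulating a
// copy. Background tasks read `buffered` only after the stream completes, so a
// slow background consumer can never backpressure the client branch — unlike
// .tee(), where all branches share one source.
function bufferingPassThrough(buffered: Uint8Array[]): TransformStream<Uint8Array, Uint8Array> {
  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      buffered.push(chunk.slice()); // copy so later use can't alias the client's view
      controller.enqueue(chunk);
    },
  });
}
```

Usage would look like `upstream.body.pipeThrough(bufferingPassThrough(buffered))` for the client response, with accounting/metrics/logging replaying `buffered` inside ctx.waitUntil() once the stream ends.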
- Type-safe ReadableStream readers (cast to ReadableStream<Uint8Array>)
- Remove unnecessary type assertions (as string, as number, as unknown[])
- Replace non-null assertions with local variables or typeof narrowing
- Use import type for type-only imports
- Await async applyProviderSpecificLogic (was a floating promise)
- Remove redundant union type ('vercel' | string → string)
Separate RATE_LIMIT_KV from USER_EXISTS_CACHE into its own KV namespace (llm-gateway-rate-limit, b22ee150a8fb4f63970bd3ff69f23e4d).
Move ApiMetricsParamsSchema and SessionMetricsParamsSchema from local definitions in cloudflare-o11y and hand-maintained .d.ts files into @kilocode/worker-utils/src/o11y-schemas.ts as the single source of truth. Consumers updated:
- cloudflare-o11y: imports schema + inferred types from worker-utils
- llm-gateway: o11y-binding.d.ts uses ApiMetricsParams from worker-utils
- cloudflare-session-ingest: o11y-binding.d.ts uses SessionMetricsParams
- Deleted cloudflare-o11y/src/session-metrics-schema.ts (moved)
…mers
Export ApiMetricsParams/SessionMetricsParams as z.input (accepts undefined for fields with .default()) so callers don't need ?? '' workarounds. Add ApiMetricsParamsParsed/SessionMetricsParamsParsed (z.infer) for post-.parse() consumers in o11y analytics.
Reference calls usageLimitExceededResponse(), which queries credit_transactions to distinguish first-time users from returning payers:
- first-time: "Paid Model - Credits Required" + credits-required message
- returning: "Low Credit Warning!" + add-credits message
The worker was always returning the first-time variant, misidentifying returning users. Add hasUserMadePaidTopup() (mirrors summarizeUserPayments) and branch on the result.
Reference uses Sentry for: invalid JSON parse errors, upstream 402 (converted to 503), and upstream 5xx responses. The worker was using console.error/console.warn only, creating a blind spot in production.
- Add @sentry/cloudflare dependency
- Create src/lib/sentry.ts with captureException helper
- Wrap Hono app with withSentry() in index.ts (uses same DSN as reference)
- Capture invalid JSON parse exceptions in parse-body middleware
- Capture 402-to-503 conversion and upstream 5xx errors in proxy handler
- Capture unhandled errors in app.onError
// Port of src/lib/processUsage.ts — simplified:
// - No Sentry spans/captures (use console.error/warn)
// - No PostHog first-usage events
// - No KiloPass threshold check
[WARNING]: Stale comments — the file header claims "No PostHog first-usage events" and "No KiloPass threshold check", but the code below actually implements both features (PostHog events around line 729+, KiloPass threshold check around line 545+). These misleading comments should be removed or updated to reflect the current implementation.
The reference route.ts calls modelDoesNotExistResponse() which returns
HTTP 404 with { error: 'Model not found', message: 'The requested model
could not be found.' }. The worker was returning HTTP 400 with
{ error: 'model is required' } — wrong status code, wrong error text,
and missing message field.
- Add @sentry/cloudflare stub (no-op withSentry + captureException) to
vitest alias config, matching the existing cloudflare:workers stub
pattern. Fixes 4 suites that couldn't load due to transitive import.
- Fix free-model-rate-limit test: expected body.error.code but the
middleware (correctly) returns { error: string, message: string },
not a nested object.
- Update middleware-chain test for the 404 model-not-found change.
…ateway and /api/openrouter
The reference validates that the [...path] portion is exactly
/chat/completions and returns invalidPathResponse() (HTTP 400) for
anything else. The worker was falling through to the generic notFound
handler returning HTTP 404 { error: 'Not found' }.
Now requests to e.g. /api/gateway/other or /api/openrouter/v1/models
return HTTP 400 with the exact reference error shape. Truly unknown
paths (outside /api/gateway/ and /api/openrouter/) still return 404.
…ponse
The reference modelNotAllowedResponse() returns:
  error: 'Model not allowed for your team.'
  message: 'The requested model is not allowed for your team.'
The worker was using restrictionError.message for both fields, producing identical values. Now the message field matches the reference.
… new users
The reference usageLimitExceededResponse() appends 'Get $20 free on your first topup!' for users with no payment history. The worker was omitting this, showing only the generic 'you need to add credits' text.
…ilo free models
The reference makeErrorReadable() checks whether the estimated token count (JSON.stringify(request).length/4 + max output tokens) exceeds the model's context_length for Kilo free models and returns a clear user-facing message. The worker had a placeholder comment instead.
- Add context_length to KiloFreeModel type and all model entries
- Add getKiloFreeModelContextLength() lookup
- Add estimateTokenCount() matching the reference algorithm
- Implement the context-length check in makeErrorReadable()
- Destructure requestedModel (was in type but not used)
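The estimation algorithm quoted above is roughly the following (rounding via Math.ceil is an assumption; the reference may truncate instead):

```typescript
// Sketch of the reference's estimate: about 4 characters per token for the
// serialized request, plus the requested output budget. Compared against the
// model's context_length to produce the user-facing error message.
function estimateTokenCount(request: unknown, maxOutputTokens: number): number {
  return Math.ceil(JSON.stringify(request).length / 4) + maxOutputTokens;
}
```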
The reference returns stealthModelError() for Kilo stealth models
(inference_providers includes 'stealth'), producing
{ error: 'Stealth model unable to process request', message: same }
with the upstream status code preserved. The worker had no stealth
model handling at all.
- Add isKiloStealthModel() to models.ts
- Add stealth check as the final fallback in makeErrorReadable(),
matching the reference ordering (BYOK → context-length → stealth)
Port the reference shouldRouteToVercel() logic that routes ~10% of eligible non-BYOK requests to Vercel AI Gateway, increasing to ~90% when OpenRouter error rate exceeds 50% (automatic failover).
- Add vercel-routing.ts with shouldRouteToVercel(), getGatewayErrorRate(), and deterministic SHA-256 hash-based routing using taskId or userId
- Add preferredModels list to models.ts (tested/recommended models)
- Wire into getProvider() step 3 (after BYOK + custom LLM, before free model gateway routing)
- Pass randomSeed (x-kilocode-taskid header or user.id) from provider-resolution middleware
Exclusions match the reference: Anthropic models, data_collection=deny, non-preferred models, openrouter/* prefix, Kilo free models with non-openrouter gateways.
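The deterministic hash-based routing can be sketched as follows. The worker uses async Web Crypto SHA-256; node:crypto is used here so the example stays synchronous, and the function names are illustrative, not the real exports:

```typescript
import { createHash } from 'node:crypto';

// Maps a seed (taskId or userId) deterministically to [0, 1): the same seed
// always lands in the same bucket, so a task is routed consistently.
function routingFraction(seed: string): number {
  const digest = createHash('sha256').update(seed).digest();
  return digest.readUInt32BE(0) / 0x1_0000_0000; // first 4 bytes -> [0, 1)
}

// vercelShare would be ~0.1 normally and ~0.9 during OpenRouter failover.
function routeToVercel(seed: string, vercelShare: number): boolean {
  return routingFraction(seed) < vercelShare;
}
```

Hashing the seed rather than calling Math.random() keeps a multi-request task pinned to one gateway for its whole lifetime.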
id: 'custom',
apiUrl: customLlmRow.base_url,
apiKey: customLlmRow.api_key,
hasGenerationEndpoint: true,
[WARNING]: hasGenerationEndpoint: true is incorrect for custom LLMs.
Custom LLMs don't expose an OpenRouter-style /generation?id= endpoint. Setting this to true causes fetchGeneration in usage-accounting.ts (line 815) to make a spurious HTTP request to customLlmRow.base_url/generation?id=<messageId> after every successful custom LLM request.
While the call is wrapped in a try/catch (line 835) so it won't crash, it:
- Adds unnecessary latency to the background accounting task
- Sends unexpected requests to the customer's custom LLM endpoint
- Logs a warning on every custom LLM request, creating noise
Suggested change:
- hasGenerationEndpoint: true,
+ apiKey: customLlmRow.api_key,
+ hasGenerationEndpoint: false,
/** RPC method called by llm-gateway via service binding. */
async ingestApiMetrics(params: ApiMetricsParams): Promise<void> {
  const parsed = ApiMetricsParamsSchema.parse(params);
  writeApiMetricsDataPoint(parsed, 'kilo-gateway', this.env, (p) => this.ctx.waitUntil(p));
[WARNING]: writeApiMetricsDataPoint is not awaited, unlike the sibling ingestSessionMetrics method (line 30) which awaits writeSessionMetricsDataPoint.
writeApiMetricsDataPoint is synchronous for the Analytics Engine write but calls waitUntil for the Stream send. Since this is an RPC method called via service binding, the caller (sendApiMetrics in api-metrics.ts) awaits the result. The method will return before the Stream send completes, which is likely fine since waitUntil extends the execution context. However, the inconsistency with ingestSessionMetrics is worth noting — if writeApiMetricsDataPoint ever becomes async (e.g., for error handling), the missing await would silently swallow errors.
const app = new Hono<HonoContext>();

app.use('*', useWorkersLogger('llm-gateway') as Parameters<typeof app.use>[1]);
[WARNING]: as Parameters<typeof app.use>[1] cast violates the project coding rule to "STRONGLY AVOID typescript's as operator".
Consider using a type-safe wrapper or satisfies instead. If the type mismatch is due to workers-tagged-logger returning a handler typed against an older Hono version, a thin adapter function would be safer:
const logger = useWorkersLogger('llm-gateway');
app.use('*', (c, next) => logger(c, next));

Move the 402→503 conversion after scheduleBackgroundTasks() so that API metrics, usage accounting, and request logging are emitted even when the upstream provider returns 402 Payment Required. Previously the 402 check returned early before any background tasks were scheduled, causing metrics gaps. The reference implementation (route.ts) always calls emitApiMetricsForResponse() before the 402 check — this change matches that behavior.
Add proxy-402.test.ts with tests confirming:
- Background tasks are scheduled before the 503 response is returned
- BYOK 402 responses are NOT converted to 503 (pass through)
- Non-402 errors also schedule background tasks
The free model response path in proxy.ts was explicitly setting accountingStream: null and loggingStream: null, skipping both usage accounting and request logging. The reference implementation (route.ts) calls accountForMicrodollarUsage() and handleRequestLogging() for ALL non-error responses, including free models. Now the free model path provides accounting and logging streams gated by !isAnon, matching the paid path behavior. Anonymous free model requests still correctly skip accounting and logging (only metrics).
Add tests confirming:
- Authenticated free model requests get all three streams
- Anonymous free model requests get only metricsStream
…uto-models
When the client requests kilo/auto*, resolveAutoModelMiddleware mutates requestBody.model to the resolved model (e.g., anthropic/claude-sonnet-4.6). The API metrics code was using requestBody.model as requestedModel, which lost the original kilo/auto* identifier. The reference (route.ts:399) captures requestedModelLowerCased BEFORE auto-model resolution and passes that to emitApiMetricsForResponse.
Fix: use autoModel ?? resolvedModel instead of requestBody.model. When autoModel is set (kilo/auto*), it preserves the original identifier; when null (non-auto requests), resolvedModel IS the requested model.
Add background-tasks.test.ts verifying both paths.
…o suffixes
The reference (route.ts:400) passes normalizeModelId(originalModelIdLowerCased), which strips :free, :exacto, etc. suffixes from the resolvedModel in API metrics. The worker was passing resolvedModel as-is, preserving the :free suffix.
- Export normalizeModelId() from models.ts (port of src/lib/model-utils.ts)
- Replace local copy in org-restrictions.ts with the shared export
- Apply normalizeModelId() to resolvedModel in background-tasks.ts metrics
Add tests verifying :free and :exacto suffixes are stripped, and models without colon suffixes are unchanged.
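A minimal sketch of the suffix stripping described above (the real normalizeModelId lives in src/lib/model-utils.ts and may handle more cases):

```typescript
// Strips variant suffixes such as ":free" or ":optimized:free" — everything
// from the first colon after the provider/model name. A simplification of the
// real helper, shown only to illustrate the behavior under test.
function normalizeModelId(modelId: string): string {
  const colon = modelId.indexOf(':', modelId.indexOf('/') + 1);
  return colon === -1 ? modelId : modelId.slice(0, colon);
}
```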
The reference (route.ts:220) awaits logFreeModelRequest() synchronously before the upstream request, ensuring the rate-limit entry is counted even if the upstream fails. The worker was firing the DB insert via waitUntil() non-blocking. Now the DB insert is awaited before next(), matching the reference. DO increments remain non-blocking (worker-specific optimization). DB insert failure is caught and logged but does not block the request.
Add log-free-model-usage.test.ts verifying:
- DB insert happens before the handler (next())
- Handler still runs if DB insert fails
- Non-free models skip the insert entirely
The PostHog API key was fetched via fire-and-forget .then() without being awaited. Whether posthogApiKey was populated when background tasks captured it into bgCommon depended on microtask ordering — first-usage PostHog events could silently drop if the Secrets Store round-trip hadn't completed. Now POSTHOG_API_KEY.get() runs in parallel with ABUSE_SERVICE_URL.get() inside a single Promise.all(), guaranteeing both are resolved before the upstream request and bgCommon construction. Still fail-open on error.
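The parallel, fail-open fetch described above can be sketched as follows. The binding shape ({ get(): Promise<string> }) mirrors Cloudflare's Secrets Store Fetcher; the env property names come from the commit message, and the helper name is an assumption:

```typescript
interface SecretBinding {
  get(): Promise<string>;
}

// Resolves both secrets in one Promise.all so neither fetch depends on
// microtask ordering. Each branch fails open to null: a missing secret must
// not block the upstream request.
async function fetchGatewaySecrets(env: {
  POSTHOG_API_KEY: SecretBinding;
  ABUSE_SERVICE_URL: SecretBinding;
}) {
  const [posthogApiKey, abuseServiceUrl] = await Promise.all([
    env.POSTHOG_API_KEY.get().catch(() => null), // fail-open
    env.ABUSE_SERVICE_URL.get().catch(() => null),
  ]);
  return { posthogApiKey, abuseServiceUrl };
}
```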
Reference (route.ts:276) defaults to false when transforms array is absent. Worker was defaulting to null, which could violate a NOT NULL constraint on the DB column.
The reference (creditTransactions.ts:58) uses notExists(kilo_pass_issuance_items) to exclude KiloPass bonus credits from the payment count. Without this, a user who received only KiloPass bonus credits would be classified as a 'returning user' (no first-topup bonus shown) in the worker, but as a 'new user' (first-topup bonus shown) in the reference. Add the notExists subquery to hasUserMadePaidTopup, matching the reference's summarizeUserPayments behavior.
Use the same pattern as vercel-routing.test.ts: cast globalThis to Record<string, unknown> once and check via the alias, avoiding the TS2339 'scheduler does not exist on typeof globalThis' error.
const db = getWorkerDb(c.env.HYPERDRIVE.connectionString);

const rows = await db
  .select()
[WARNING]: select() with no column arguments fetches ALL columns from kilocode_users. This includes potentially large/sensitive fields that are never used by the gateway (e.g., profile data). Consider selecting only the columns actually needed downstream:
.select({
id: kilocode_users.id,
api_token_pepper: kilocode_users.api_token_pepper,
google_user_email: kilocode_users.google_user_email,
microdollars_used: kilocode_users.microdollars_used,
total_microdollars_acquired: kilocode_users.total_microdollars_acquired,
is_admin: kilocode_users.is_admin,
})This reduces data transfer from Postgres and avoids leaking unnecessary PII into the request context.
let response: Response;
if (customLlm) {
  const db = getWorkerDb(c.env.HYPERDRIVE.connectionString);
  const isLegacyExtension = !!fraudHeaders.http_user_agent?.startsWith('Kilo-Code/');
[WARNING]: fraudHeaders is read from context here (line 76), but it was set by requestTransformMiddleware which runs before proxyHandler. However, isLegacyExtension checks fraudHeaders.http_user_agent — if requestTransformMiddleware ever fails to set fraudHeaders, this would throw a runtime error accessing .http_user_agent on undefined. The middleware chain ordering is correct today, but this implicit dependency is fragile. Consider adding a null check:
const isLegacyExtension = !!fraudHeaders?.http_user_agent?.startsWith('Kilo-Code/');

requestedModel: resolvedModel,
request: requestBody,
response: new Response(errorBodyBytes, response),
isUserByok: !!userByok,
[WARNING]: new Response(errorBodyBytes, response) passes the original response as the ResponseInit. The Response constructor accepts { status, statusText, headers } from the second argument. Since response is a full Response object (not a plain ResponseInit), this relies on the runtime extracting only the relevant fields. While this works in practice on Cloudflare Workers, it's an implicit behavior. Consider being explicit:
new Response(errorBodyBytes, { status: response.status, statusText: response.statusText, headers: response.headers })| function extractMessageTextParts(content: unknown): string[] { | ||
| if (typeof content === 'string') return [content]; | ||
| if (!Array.isArray(content)) return []; | ||
| return (content as Array<Record<string, unknown>>) |
[WARNING]: as Array<Record<string, unknown>> cast violates the project coding rule to "STRONGLY AVOID typescript's as operator". Since content is already narrowed to Array by the Array.isArray(content) check, you can use a type guard in the filter predicate without the cast:
return content
.filter(
(part): part is { type: string; text: string } =>
part !== null &&
typeof part === 'object' &&
('type' in part) &&
(part.type === 'input_text' || part.type === 'output_text') &&
('text' in part) &&
typeof part.text === 'string'
)
.map(p => p.text);| console.debug('[getGatewayErrorRate] query timeout'); | ||
| return fallback; | ||
| } | ||
| const rows = result.rows as unknown as Array<{ gateway: string; errorRate: number }>; |
[WARNING]: as unknown as Array<{ gateway: string; errorRate: number }> is a double as cast that violates the project coding rules. The db.execute generic parameter already types the rows, so this cast is unnecessary. Consider:
const rows = result.rows;
return {
openrouter: rows.find(r => r.gateway === 'openrouter')?.errorRate ?? 0,
vercel: rows.find(r => r.gateway === 'vercel')?.errorRate ?? 0,
};

If the types don't align, use satisfies or a runtime check instead of as unknown as.
const providerConfig: OpenRouterProviderConfig = {};

if (params.organizationPlan === 'enterprise' && providerAllowList.length > 0) {
  const requiredProviders = extraRequiredProviders(normalizedModelId);
[WARNING]: extraRequiredProviders is called with normalizedModelId (which strips the colon suffix via normalizeModelId), but the kiloFreeModelProviders map keys include colons (e.g., corethink:free, x-ai/grok-code-fast-1:optimized:free). This means the lookup will NEVER match for models like corethink:free (normalized to corethink) or x-ai/grok-code-fast-1:optimized:free (normalized to x-ai/grok-code-fast-1). The provider restriction check for enterprise orgs will silently skip these models.
While the previous reviewer noted this is "safe because the default behavior is permissive", this means enterprise orgs with a provider_allow_list that excludes corethink or stealth will NOT be blocked from using those free models — the restriction is silently bypassed. This could be a compliance issue for enterprise customers who expect their allow list to be enforced.
): Promise<Response> {
  const maxElapsedMs = 20_000;
  const startedAt = Date.now();
  let nextDelayMs = 200 * (1 + (Math.random() - 0.5) / 10);
[WARNING]: fetchWithBackoff retries on ALL 4xx responses (r.status >= 400), including 401 (auth failure) and 403 (forbidden). Auth failures will never succeed on retry. Consider only retrying on 404 and 5xx:
r => r.status === 404 || r.status >= 500

requested_model: usageContext.requested_model,
cache_discount: usageStats.cacheDiscount_mUsd ?? null,
has_error: usageStats.hasError,
abuse_classification: 0,
[WARNING]: abuse_classification is hardcoded to 0 here, even though usageContext.abuse_request_id is available. The reference implementation stores the abuse classification request ID in this field. This means the abuse_classification column in microdollar_usage will always be 0, making it impossible to correlate usage records with abuse classifications in the database.
Suggested change:
- abuse_classification: 0,
+ abuse_classification: usageContext.abuse_request_id ?? 0,
}

// 3. Vercel AI Gateway A/B routing (non-BYOK, non-custom-LLM)
if (await shouldRouteToVercel(db, requestedModel, request, randomSeed)) {
[WARNING]: shouldRouteToVercel is called with requestedModel (already lowercased by parseBodyMiddleware), but preferredModels in models.ts contains mixed-case entries like anthropic/claude-opus-4.6. The preferredModels.includes(requestedModel) check in vercel-routing.ts:109 will fail for any model whose preferredModels entry uses different casing than the lowercased requestedModel. Currently all entries happen to be lowercase, but this is fragile — adding a model with uppercase characters to preferredModels would silently break Vercel routing for it.
}

function isToolMessage(msg: ChatMessage): msg is ToolMessage {
  return msg.role === 'tool' && typeof (msg as Record<string, unknown>).tool_call_id === 'string';
[WARNING]: as Record<string, unknown> cast violates the project coding rules. This can be replaced with an in check:
Suggested change:
- return msg.role === 'tool' && typeof (msg as Record<string, unknown>).tool_call_id === 'string';
+ return msg.role === 'tool' && 'tool_call_id' in msg && typeof msg.tool_call_id === 'string';
Switch from the shared Next.js Sentry DSN to a dedicated llm-gateway project. Enable sendDefaultPii for automatic IP collection on events.
},
{
  public_id: 'minimax/minimax-m2.5:free',
  context_length: 204_800,
[WARNING]: context_length mismatch — minimax/minimax-m2.5:free is 204_800 here but 1_000_000 in providers.ts. The value here feeds getKiloFreeModelContextLength() → makeErrorReadable(), which will incorrectly report context-length errors at ~205K tokens when the model actually supports 1M tokens. This will cause false "context length exceeded" error messages for users with large prompts between 205K and 1M tokens.
```diff
- context_length: 204_800,
+ context_length: 1_000_000,
```
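This class of drift between two model tables is easy to guard against with a small consistency check. A hedged sketch, where the record shapes are simplified stand-ins for the real models.ts and providers.ts exports:

```typescript
// Detects context_length drift between two model tables keyed by model id.
// Only models present in both tables are compared.
function findContextLengthMismatches(
  kiloModels: Record<string, number>,
  providerModels: Record<string, number>,
): string[] {
  return Object.keys(kiloModels).filter(
    (id) => id in providerModels && providerModels[id] !== kiloModels[id],
  );
}
```

Run as a unit test over the real exports, this would have flagged minimax/minimax-m2.5:free the moment the two files diverged.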
```typescript
} catch (err) {
  if (err instanceof Error) {
    // jose uses error.code for JWT-specific errors
    if ((err as { code?: string }).code === 'ERR_JWT_EXPIRED') {
```
[SUGGESTION]: The `as` cast violates the project coding rule to "STRONGLY AVOID typescript's `as` operator". Since `err` is already narrowed to `Error` by the `instanceof` check, you can use an `in` guard instead:
```diff
- if ((err as { code?: string }).code === 'ERR_JWT_EXPIRED') {
+ if ('code' in err && err.code === 'ERR_JWT_EXPIRED') {
```
```typescript
export const extractIpMiddleware = createMiddleware<HonoContext>(async (c, next) => {
  // CF-Connecting-IP is the authoritative source on Cloudflare Workers
  const cfIp = c.req.header('CF-Connecting-IP');
  const xffIp = c.req.header('x-forwarded-for')?.split(',')[0]?.trim();
```
[WARNING]: x-forwarded-for is trivially spoofable by clients. On Cloudflare Workers, CF-Connecting-IP is the only trustworthy source — it's set by Cloudflare's edge and cannot be forged. If CF-Connecting-IP is absent, the request likely didn't traverse Cloudflare's edge (e.g., direct-to-origin), so trusting x-forwarded-for means an attacker could spoof their IP to bypass rate limiting (freeModelRateLimitMiddleware, promotionLimitMiddleware) and abuse detection.
Consider either:
- Rejecting requests without `CF-Connecting-IP` (they shouldn't reach a Workers deployment normally)
- Or at minimum, logging a warning when falling back to `x-forwarded-for` so it's visible in monitoring
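One hedged way to make the fallback visible: a plain-function sketch rather than the actual Hono middleware, with header access simplified to a Map:

```typescript
// Illustrative sketch (not the project's extractIpMiddleware): prefer the
// Cloudflare-set CF-Connecting-IP, and mark any x-forwarded-for fallback as
// untrusted so callers can log a warning or reject the request outright.
function resolveClientIp(
  headers: ReadonlyMap<string, string>,
): { ip: string | null; trusted: boolean } {
  const cfIp = headers.get('cf-connecting-ip');
  if (cfIp) return { ip: cfIp, trusted: true };
  // Client-controlled header: usable as a hint at best, never for rate limits.
  const xff = headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? null;
  return { ip: xff, trusted: false };
}
```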
- Include test/integration/**/*.test.ts in Vitest config and remove the previous exclusion to enable running integration tests - Add POSTHOG_API_KEY to test makeEnv helper for analytics-dependent tests
```typescript
const effectiveThreshold = getEffectiveKiloPassThreshold(user.kiloPassThreshold ?? null);
if (effectiveThreshold === null || user.microdollarsUsed < effectiveThreshold) return;

const subscriptionState = await getKiloPassStateForUser(tx as unknown as Tx, kiloUserId);
```
[WARNING]: The triple `as unknown as Tx` cast violates the project coding rule to "STRONGLY AVOID typescript's `as` operator".
This pattern appears three times (lines 510, 512, 516). The `Tx` type is extracted via a conditional type from `WorkerDb['transaction']`, but the Drizzle transaction callback parameter should already be compatible. Consider defining `Tx` directly from the transaction callback's parameter type, or using a wrapper function that accepts the transaction and calls the inner functions, avoiding the cast entirely:

```typescript
type Tx = Parameters<Parameters<WorkerDb['transaction']>[0]>[0];
```

If the types genuinely don't align, a single cast at the `db.transaction` call site (wrapping the callback) would be cleaner than casting on every usage.
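The Parameters-based extraction can be verified in isolation. A sketch where `WorkerDb` is a minimal stand-in, not the project's real Drizzle type:

```typescript
// FakeTx and WorkerDb are hypothetical stand-ins; the point is that Tx is
// recovered from the transaction callback's parameter with no `as` cast.
interface FakeTx { rollback(): void }
interface WorkerDb {
  transaction<T>(fn: (tx: FakeTx) => Promise<T>): Promise<T>;
}
type Tx = Parameters<Parameters<WorkerDb['transaction']>[0]>[0];

// Helper signatures can now accept Tx directly:
function describeTx(tx: Tx): string {
  return typeof tx.rollback === 'function' ? 'tx ok' : 'tx broken';
}
```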
```typescript
if (!anchor) return null;
// currentPeriodStart = nextYearlyIssueAt - 1 month (or startedAt)
const currentPeriodStart = nextYearlyIssueAtIso
  ? addMonths(new Date(nextYearlyIssueAtIso), -1)
```
[WARNING]: new Date(nextYearlyIssueAtIso) bypasses the parseIso validation used on line 308.
If nextYearlyIssueAtIso is a truthy but malformed string (e.g., "not-a-date"), parseIso on line 308 returns null and anchor falls through to startedAtIso. But on line 311, the truthy check passes and new Date(nextYearlyIssueAtIso) produces Invalid Date, which propagates through addMonths → computeIssueMonth producing "NaN-NaN-01" as the issue month.
Use the already-validated result instead:
```diff
- ? addMonths(new Date(nextYearlyIssueAtIso), -1)
+ ? addMonths(parseIso(nextYearlyIssueAtIso) ?? anchor, -1)
```
Or reuse the parseIso result from line 308 by extracting it into a local variable.
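The failure mode is easy to reproduce. A self-contained sketch in which `parseIso` and `addMonths` are simplified stand-ins for the project's helpers:

```typescript
// A truthy but malformed ISO string yields an Invalid Date that propagates
// silently through date arithmetic, while the `parseIso(...) ?? anchor` form
// stays valid by falling back to the validated anchor.
function parseIso(s: string): Date | null {
  const d = new Date(s);
  return Number.isNaN(d.getTime()) ? null : d;
}

function addMonths(d: Date, n: number): Date {
  const out = new Date(d);
  out.setMonth(out.getMonth() + n);
  return out;
}

const anchor = new Date('2025-06-15T00:00:00Z');
// Unvalidated path: Invalid Date flows straight through addMonths.
const unsafe = addMonths(new Date('not-a-date'), -1);
// Validated path: malformed input falls back to the anchor.
const safe = addMonths(parseIso('not-a-date') ?? anchor, -1);
```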
Summary
`llm-gateway` Cloudflare Worker — a transparent drop-in replacement for the Next.js `/api/openrouter/[...path]/route.ts` proxy

Phases
5a6b119, 775f840, 7ce0844, faeacf1, 84202a8, 6121e81, b31334a

Follow-up improvements (also in this PR)
- b07e629 — Replace all `setTimeout` with Workers-native `scheduler.wait()`
- b5eaf47 — Convert O11Y service binding from `.fetch()` HTTP to RPC (`ingestApiMetrics` method on `WorkerEntrypoint`), eliminating HTTP routing overhead and admin token auth for internal calls

Architecture
- `c.set()`/`c.get()` on Hono context
- `after()` → `ctx.waitUntil()`, `response.clone()` → `ReadableStream.tee()`, `setTimeout` → `scheduler.wait()`, Sentry → structured logging

Key files