feat(llm-gateway): complete Cloudflare Worker implementation (Phases 1–7) #752
Open
iscekic wants to merge 82 commits into main from feat/llm-gateway
Commits
5a6b119 feat(llm-gateway): phase 1 scaffolding (iscekic)
d11d11e chore(llm-gateway): strip wrangler.jsonc to bare minimum (iscekic)
55ad075 chore(llm-gateway): tidy wrangler.jsonc bindings (iscekic)
775f840 feat(llm-gateway): phase 2 — request parsing, auth, anonymous gate (iscekic)
7ce0844 feat(llm-gateway): Phase 3 — rate limiting + provider resolution (iscekic)
faeacf1 Phase 4: balance/org checks, request validation, request transform (iscekic)
84202a8 feat(llm-gateway): Phase 5 — upstream proxy + response handling (iscekic)
6121e81 feat(llm-gateway): Phase 6 — background tasks (usage accounting, api … (iscekic)
b31334a feat(llm-gateway): Phase 7 — testing + parity verification (168 tests… (iscekic)
b07e629 refactor(llm-gateway): replace setTimeout with scheduler.wait (iscekic)
b5eaf47 refactor(llm-gateway): use O11Y service binding RPC instead of HTTP f… (iscekic)
c10a8a0 chore(llm-gateway): configure custom domain and remove dev settings (iscekic)
1e14ac4 refactor(llm-gateway): move vars to Secrets Store bindings (iscekic)
ada05a2 fix(llm-gateway): eliminate .tee() backpressure stalling client stream (iscekic)
75a1a56 chore(llm-gateway): fix all lint errors across source files (iscekic)
fec8eb9 chore(llm-gateway): use dedicated KV namespace for RATE_LIMIT_KV (iscekic)
d226c64 refactor: extract O11Y schemas to @kilocode/worker-utils (iscekic)
6a88d63 fix: use z.input for O11Y schema types, add Parsed variants for consu… (iscekic)
0af8fac Merge branch 'main' into feat/llm-gateway (iscekic)
8108f0a Revert unnecessary changes to SessionIngestDO.ts (iscekic)
f197802 Regenerate llm-gateway/worker-configuration.d.ts via wrangler types (iscekic)
d632dad Remove redundant comment and empty vars from llm-gateway wrangler.jsonc (iscekic)
9a54652 Remove dead llm-gateway/src/types.ts (superseded by src/types/) (iscekic)
7daef78 Stop leaking err.message to clients in onError handler (iscekic)
cd488ee Remove unused logger.ts singleton (console.* is intercepted by worker… (iscekic)
b3ab5b8 Remove unused /health endpoint from llm-gateway (iscekic)
76592e9 Tighten auth middleware: reject invalid tokens, remove redundant user… (iscekic)
a4ecbc2 Remove USER_EXISTS_CACHE KV binding from llm-gateway (no longer used … (iscekic)
8e735ff Remove outdated promotion-limit comment from anonymous-gate (iscekic)
6c547ea Remove 'as' cast in balance-and-org by narrowing status type at the s… (iscekic)
275ad5b Remove phase references from request-transform comments (iscekic)
e20d58c Refactor proxy.ts: extract background tasks, fix error logging, remov… (iscekic)
2c45058 Remove dead abuse-cost.ts (logic is inline in background-tasks.ts) (iscekic)
ab0449d Clean up api-metrics.ts: reuse getToolsAvailable, remove casts, widen… (iscekic)
18af05d Remove stale cross-project path reference from request-logging comment (iscekic)
893e936 Remove redundant casts from usage-accounting.ts (iscekic)
f8b3f17 Replace Vercel platform headers with Cloudflare request.cf geo data (iscekic)
f456dec Address remaining bot review comments (iscekic)
939711a Remove stale @ts-expect-error directives for workers-tagged-logger (iscekic)
c367f99 Use constant-time comparison for JWT pepper validation (iscekic)
212b13b fix(llm-gateway): fix zai double push and wrap parseAwsCredentials in… (iscekic)
176291e fix(llm-gateway): remove as cast in isAnonymousContext (iscekic)
8bb7bd3 fix(llm-gateway): validate max_completion_tokens in addition to max_t… (iscekic)
ff7bacd fix(llm-gateway): verify org membership before granting custom LLM ac… (iscekic)
a1b7336 fix(llm-gateway): replace KV rate limiter with Durable Object (iscekic)
4911edc fix(llm-gateway): fix eslint errors (iscekic)
f05c021 fix(llm-gateway): fix rate limit double-counting in Durable Object (iscekic)
71862d2 fix(llm-gateway): background tasks, TTFB, toolsUsed, query params, cl… (iscekic)
e5228c5 fix(llm-gateway): match rate limit error codes and messages to reference (iscekic)
ef728fe fix(llm-gateway): use app.kilo.ai/profile for buyCreditsUrl in 402 re… (iscekic)
72f8e57 fix(llm-gateway): scope free_model_usage logging to Kilo-hosted model… (iscekic)
bd94d1d fix: move freeModelRateLimitMiddleware before authMiddleware (iscekic)
f9ff2a1 feat: add generation endpoint refetch for accurate cost/token data (iscekic)
b0d82a2 feat: add KiloPass threshold check and bonus credit issuance (iscekic)
8879ea1 feat: add PostHog first_usage and first_microdollar_usage event tracking (iscekic)
0ab25db fix(llm-gateway): match invalid JSON error shape to reference (iscekic)
9363967 fix(llm-gateway): match 402 balance error title/message to reference (iscekic)
3ef5600 fix(llm-gateway): add Sentry error observability (iscekic)
2964a89 fix(llm-gateway): return 404 for missing/empty model to match reference (iscekic)
cce57ac fix(llm-gateway): fix all pre-existing test failures (169/169 passing) (iscekic)
8b2a8bf fix(llm-gateway): return 400 invalid-path for sub-routes under /api/g… (iscekic)
5bc3dec fix(llm-gateway): use distinct error/message in model-not-allowed res… (iscekic)
a440305 fix(llm-gateway): include first-topup bonus amount in 402 message for… (iscekic)
c4ff5be fix(llm-gateway): add context-length exceeded error translation for K… (iscekic)
4ce8d26 fix(llm-gateway): add stealth model error handling in makeErrorReadable (iscekic)
8897f4c feat(llm-gateway): add Vercel AI Gateway A/B routing (iscekic)
0ec7518 fix(B1): emit background tasks for 402 upstream responses (iscekic)
dd56377 fix(B2): emit accounting and logging for free model responses (iscekic)
e1d1fc3 fix(B3): use original model id as requestedModel in API metrics for a… (iscekic)
5480ac6 fix(B4): normalize resolvedModel in API metrics to strip :free/:exact… (iscekic)
e9d7c37 fix(B5): await free_model_usage DB insert before upstream request (iscekic)
397173e fix(B8): await POSTHOG_API_KEY fetch to eliminate race condition (iscekic)
6b54ee4 fix(B9): default has_middle_out_transform to false instead of null (iscekic)
da262c6 fix(B10): exclude KiloPass credits from paid top-up check (iscekic)
8592163 fix: resolve typecheck errors for scheduler stub in test files (iscekic)
cddfbae chore: use dedicated Sentry project for llm-gateway worker (iscekic)
614af83 chore: remove tracesSampleRate from Sentry config (iscekic)
4086045 chore: simplify deploy script to single env-less command (iscekic)
faf9a50 test(vitest): run integration tests and add PostHog test key (iscekic)
406ce37 fix: resolve eslint errors in llm-gateway (iscekic)
80fd849 test(llm-gateway): add request integration tests (iscekic)
f633a6b test(llm-gateway): tighten typings and mocks in integration tests (iscekic)
Submodule llm-gateway-fixes added at d6dc4f
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,3 @@ | ||
| import type { O11YBinding } from './o11y-binding'; | ||
| | ||
| export type Env = Omit<Cloudflare.Env, 'O11Y'> & { O11Y: O11YBinding }; |
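The `Omit` plus intersection pattern in this hunk can be tried in isolation. The snippet below is a minimal sketch, not the project's code: `FetcherLike`, `GeneratedEnv`, `RpcBinding`, `OTHER_VAR`, and the `ping` method are hypothetical stand-ins for the wrangler-generated types and the real RPC surface.

```typescript
// Hypothetical stand-in for the `Fetcher` type that generated bindings receive.
type FetcherLike = { fetch(url: string): Promise<string> };

// Hypothetical stand-in for the generated `Cloudflare.Env`.
type GeneratedEnv = { O11Y: FetcherLike; OTHER_VAR: string };

// Hypothetical RPC-aware binding: the fetcher plus one typed method.
type RpcBinding = FetcherLike & { ping(n: number): Promise<number> };

// Same shape as the `Env` alias in the diff: drop the generated property,
// then re-add it under the same name with the richer type.
type Env = Omit<GeneratedEnv, "O11Y"> & { O11Y: RpcBinding };

// A value of type `Env` must supply the RPC method; other properties are untouched.
const env: Env = {
  OTHER_VAR: "unchanged",
  O11Y: {
    fetch: async (url: string) => `fetched ${url}`,
    ping: async (n: number) => n + 1,
  },
};
```

Because the override lives in a separate alias, the generated declaration can presumably be regenerated without losing the method types, as long as the property name stays `O11Y`.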
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,41 +1,5 @@ | ||
| /** | ||
| * Augment the wrangler-generated Env to give the O11Y service binding its RPC | ||
| * method types. `wrangler types` only sees `Fetcher` for service bindings; | ||
| * the actual RPC shape comes from the o11y worker's WorkerEntrypoint and is | ||
| * declared here so the generated file can be freely regenerated. | ||
| * | ||
| * Keep in sync with: cloudflare-o11y/src/session-metrics-schema.ts | ||
| */ | ||
| import type { SessionMetricsParams } from '@kilocode/worker-utils'; | ||
| | ||
| type O11YSessionMetricsParams = { | ||
| kiloUserId: string; | ||
| organizationId?: string; | ||
| sessionId: string; | ||
| platform: string; | ||
| sessionDurationMs: number; | ||
| timeToFirstResponseMs?: number; | ||
| totalTurns: number; | ||
| totalSteps: number; | ||
| toolCallsByType: Record<string, number>; | ||
| toolErrorsByType: Record<string, number>; | ||
| totalErrors: number; | ||
| errorsByType: Record<string, number>; | ||
| stuckToolCallCount: number; | ||
| totalTokens: { | ||
| input: number; | ||
| output: number; | ||
| reasoning: number; | ||
| cacheRead: number; | ||
| cacheWrite: number; | ||
| }; | ||
| totalCost: number; | ||
| compactionCount: number; | ||
| autoCompactionCount: number; | ||
| terminationReason: 'completed' | 'error' | 'interrupted' | 'abandoned' | 'unknown'; | ||
| model?: string; | ||
| ingestVersion: number; | ||
| }; | ||
| | ||
| type O11YBinding = Fetcher & { | ||
| ingestSessionMetrics(params: O11YSessionMetricsParams): Promise<void>; | ||
| export type O11YBinding = Fetcher & { | ||
| ingestSessionMetrics(params: SessionMetricsParams): Promise<void>; | ||
| }; |
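One consequence of typing the binding as `Fetcher & { ingestSessionMetrics(...) }` is that callers and tests can work against a plain object with the same shape. The sketch below illustrates that under assumptions: `FetcherLike`, `SessionMetricsParamsLike` (a heavily trimmed version of the real params), `makeStubBinding`, and `reportSession` are hypothetical names, not part of the PR.

```typescript
// Hypothetical, trimmed stand-ins; the real types come from wrangler and @kilocode/worker-utils.
type FetcherLike = { fetch(url: string): Promise<unknown> };
type SessionMetricsParamsLike = { sessionId: string; totalTurns: number };

type O11YBindingLike = FetcherLike & {
  ingestSessionMetrics(params: SessionMetricsParamsLike): Promise<void>;
};

// Test double: records every payload instead of performing an RPC call.
function makeStubBinding(received: SessionMetricsParamsLike[]): O11YBindingLike {
  return {
    fetch: async (_url: string) => undefined,
    ingestSessionMetrics: async (params: SessionMetricsParamsLike) => {
      received.push(params);
    },
  };
}

// Hypothetical caller: depends only on the typed method, not on HTTP details.
async function reportSession(o11y: O11YBindingLike, sessionId: string): Promise<void> {
  await o11y.ingestSessionMetrics({ sessionId, totalTurns: 1 });
}
```

In a test, `reportSession(makeStubBinding(received), "s-1")` would leave one entry in `received`, and the compiler rejects payloads that drift from the shared params type.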
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| import { dirname } from 'path'; | ||
| import { fileURLToPath } from 'url'; | ||
| import { defineConfig } from 'eslint/config'; | ||
| import baseConfig from '@kilocode/eslint-config'; | ||
| | ||
| const __dirname = dirname(fileURLToPath(import.meta.url)); | ||
| | ||
| export default defineConfig([ | ||
| ...baseConfig(__dirname), | ||
| { | ||
| files: ['**/*.ts'], | ||
| rules: { | ||
| '@typescript-eslint/restrict-template-expressions': 'off', | ||
| }, | ||
| }, | ||
| ]); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| { | ||
| "name": "llm-gateway", | ||
| "version": "1.0.0", | ||
| "type": "module", | ||
| "private": true, | ||
| "description": "LLM Gateway Cloudflare Worker — transparent drop-in replacement for /api/openrouter", | ||
| "scripts": { | ||
| "preinstall": "npx only-allow pnpm", | ||
| "deploy": "wrangler deploy", | ||
| "dev": "wrangler dev", | ||
| "start": "wrangler dev", | ||
| "types": "wrangler types", | ||
| "lint": "eslint --config eslint.config.mjs --cache 'src/**/*.ts'", | ||
| "lint:fix": "eslint --config eslint.config.mjs --cache --fix 'src/**/*.ts'", | ||
| "format": "prettier --write 'src/**/*.ts'", | ||
| "format:check": "prettier --check 'src/**/*.ts'", | ||
| "test": "vitest run", | ||
| "test:watch": "vitest", | ||
| "test:integration": "vitest run --config vitest.workers.config.ts", | ||
| "test:integration:watch": "vitest --config vitest.workers.config.ts", | ||
| "typecheck": "tsgo --noEmit --incremental false" | ||
| }, | ||
| "dependencies": { | ||
| "@sentry/cloudflare": "^10.25.0", | ||
| "@ai-sdk/anthropic": "^3.0.41", | ||
| "@ai-sdk/openai": "^3.0.27", | ||
| "@kilocode/db": "workspace:*", | ||
| "@kilocode/encryption": "workspace:*", | ||
| "@kilocode/worker-utils": "workspace:*", | ||
| "ai": "^6.0.78", | ||
| "drizzle-orm": "catalog:", | ||
| "eventsource-parser": "^3.0.6", | ||
| "hono": "catalog:", | ||
| "workers-tagged-logger": "catalog:", | ||
| "zod": "catalog:" | ||
| }, | ||
| "devDependencies": { | ||
| "@cloudflare/vitest-pool-workers": "^0.12.8", | ||
| "jose": "catalog:", | ||
| "@kilocode/eslint-config": "workspace:*", | ||
| "@types/node": "^22", | ||
| "@typescript/native-preview": "7.0.0-dev.20251019.1", | ||
| "@vitest/ui": "^3.2.4", | ||
| "drizzle-kit": "catalog:", | ||
| "eslint": "catalog:", | ||
| "prettier": "catalog:", | ||
| "typescript": "catalog:", | ||
| "vitest": "^3.2.4", | ||
| "wrangler": "catalog:" | ||
| } | ||
| } |
[WARNING]: `writeApiMetricsDataPoint` is not awaited, unlike the sibling `ingestSessionMetrics` method (line 30), which `await`s `writeSessionMetricsDataPoint`. `writeApiMetricsDataPoint` is synchronous for the Analytics Engine write but calls `waitUntil` for the Stream send. Since this is an RPC method called via service binding, the caller (`sendApiMetrics` in `api-metrics.ts`) `await`s the result. The method will return before the Stream send completes, which is likely fine since `waitUntil` extends the execution context. However, the inconsistency with `ingestSessionMetrics` is worth noting: if `writeApiMetricsDataPoint` ever becomes async (e.g., for error handling), the missing `await` would silently swallow errors.
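The failure mode described in this comment can be reproduced outside the Worker. The following is a minimal sketch under assumptions: `failingWrite` is a hypothetical async stand-in for a future async `writeApiMetricsDataPoint`, and `ingestAwaited` / `ingestNotAwaited` are illustrative names rather than the actual RPC methods.

```typescript
// Hypothetical stand-in for an async write that fails.
async function failingWrite(): Promise<void> {
  throw new Error("write failed");
}

// Rejections that never reached the caller.
const unobserved: string[] = [];

async function ingestAwaited(): Promise<void> {
  await failingWrite(); // the rejection propagates to the RPC caller
}

async function ingestNotAwaited(): Promise<void> {
  // Floating promise: the caller never sees this rejection. The handler only
  // records it so the demo does not end in an unhandled rejection.
  failingWrite().catch((err: unknown) => {
    unobserved.push(String(err));
  });
}

// Reports whether the caller of `call` observed a failure.
async function outcome(call: () => Promise<void>): Promise<"resolved" | "rejected"> {
  try {
    await call();
    return "resolved";
  } catch {
    return "rejected";
  }
}

async function demo(): Promise<{ awaited: string; notAwaited: string }> {
  return {
    awaited: await outcome(ingestAwaited),
    notAwaited: await outcome(ingestNotAwaited),
  };
}
```

From the caller's side, only the awaited variant reports a rejection; the other resolves normally even though the write failed, which is the silent swallowing the comment warns about.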