feat(ai-gateway): expose Terminal Bench model stats#3703

Open
lambertjosh wants to merge 3 commits into main from feat/terminal-bench-model-details

@lambertjosh
Contributor

Summary

  • Enrich the existing OpenRouter model catalog with optional Terminal Bench 2.0 completion and per-attempt cost metadata.
  • Publish only active, non-stealth results with at least five attempts and a complete-attempt cost.
  • Cache the compact benchmark map for five minutes and fail open to stale or empty data so stats lookup failures never block model selection.
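The publication filter and the fail-open cache described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the `BenchRow` and `BenchSummary` shapes, their field names, and the `toPublishableMap` and `createBenchCache` helpers are all hypothetical, and only the thresholds (five attempts, five minutes) come from the summary.

```typescript
// Hypothetical row shape; the real field names in the PR may differ.
type BenchRow = {
  modelId: string;
  active: boolean;
  stealth: boolean;
  attempts: number;
  completionRate: number;
  avgAttemptCostUsd: number | null | undefined;
};

// Hypothetical compact summary stored per model.
type BenchSummary = { completionRate: number; avgAttemptCostUsd: number };

const MIN_ATTEMPTS = 5; // "at least five attempts"
const TTL_MS = 5 * 60 * 1000; // "cache ... for five minutes"

// Keep only active, non-stealth rows with enough attempts and a known cost.
function toPublishableMap(rows: BenchRow[]): Map<string, BenchSummary> {
  const out = new Map<string, BenchSummary>();
  for (const row of rows) {
    if (!row.active || row.stealth) continue;
    if (row.attempts < MIN_ATTEMPTS) continue;
    if (row.avgAttemptCostUsd == null) continue; // null or undefined
    out.set(row.modelId, {
      completionRate: row.completionRate,
      avgAttemptCostUsd: row.avgAttemptCostUsd,
    });
  }
  return out;
}

// Fail-open cache: serve fresh data inside the TTL, otherwise try to refresh.
// If the refresh throws, return the stale map (or the initial empty map).
function createBenchCache(
  fetchRows: () => Promise<BenchRow[]>,
  now: () => number = Date.now,
) {
  let cached = new Map<string, BenchSummary>();
  let fetchedAt = Number.NEGATIVE_INFINITY;

  return async function getBenchMap(): Promise<Map<string, BenchSummary>> {
    if (now() - fetchedAt < TTL_MS) return cached;
    try {
      cached = toPublishableMap(await fetchRows());
      fetchedAt = now();
    } catch {
      // Swallow the error: a stats outage must not block model selection.
    }
    return cached;
  };
}
```

In this sketch a failed refresh does not update the timestamp, so the next call retries the fetch. Whether the PR retries immediately or backs off is not stated in the summary.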

Verification

  • Not manually verified locally.

Visual Changes

N/A

Reviewer Notes

  • Prefer deploying this catalog enrichment before the client PR, although either order is safe because the new field is optional.
  • Metrics attach only to canonical OpenRouter catalog entries, with the existing safe outer kilo/ prefix fallback.
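One way to picture the lookup with the outer `kilo/` prefix fallback is the sketch below. It is an assumption-laden illustration: `lookupBench`, `enrich`, `CatalogModel`, and the `vendor/model-x` ID are hypothetical names, and the PR's own `terminalBenchFor` helper may behave differently. The sketch only shows a direct ID match followed by a single strip of one leading `kilo/` prefix, with the optional field left off when nothing matches.

```typescript
// Hypothetical types; field names are assumptions for illustration only.
type BenchSummary = { completionRate: number; avgAttemptCostUsd: number };
type CatalogModel = { id: string; name: string; terminalBench?: BenchSummary };

const OUTER_PREFIX = "kilo/";

// Try the ID as-is first, then retry once with a single outer prefix removed.
function lookupBench(
  benchMap: ReadonlyMap<string, BenchSummary>,
  modelId: string,
): BenchSummary | undefined {
  const direct = benchMap.get(modelId);
  if (direct !== undefined) return direct;
  if (modelId.startsWith(OUTER_PREFIX)) {
    return benchMap.get(modelId.slice(OUTER_PREFIX.length));
  }
  return undefined;
}

// Attach the optional field only when a summary exists; other entries pass
// through unchanged, so clients that ignore the field are unaffected.
function enrich(
  models: CatalogModel[],
  benchMap: ReadonlyMap<string, BenchSummary>,
): CatalogModel[] {
  return models.map((model) => {
    const bench = lookupBench(benchMap, model.id);
    return bench === undefined ? model : { ...model, terminalBench: bench };
  });
}
```

Because the field is optional and absent by default, either deployment order stays safe: an older client simply never reads `terminalBench`.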

@lambertjosh lambertjosh marked this pull request as ready for review June 5, 2026 20:49
@kilo-code-bot
Contributor

kilo-code-bot Bot commented Jun 5, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Executive Summary

This PR cleanly enriches the OpenRouter model catalog with optional Terminal Bench summary data, using a well-structured 5-minute in-process cache with fail-open semantics. No security, correctness, or performance issues were found.

Files Reviewed (5 files)

  • apps/web/src/lib/model-stats/terminal-bench.ts
  • apps/web/src/lib/model-stats/terminal-bench.test.ts
  • apps/web/src/lib/ai-gateway/providers/openrouter/index.ts
  • apps/web/src/lib/organizations/organization-types.ts
  • apps/web/src/tests/openrouter-models.test.ts

Notes (non-blocking)

  • terminal-bench.ts:34-35: bench.avgAttemptCostUsd === null || bench.avgAttemptCostUsd === undefined is equivalent to bench.avgAttemptCostUsd == null — minor style nit, not worth changing.
  • openrouter-models.test.ts:112-115: model could be undefined if 'some-other-model' is absent from mockOpenRouterModels, which would cause an unhelpful TypeError rather than a meaningful assertion failure. Consider adding expect(model).toBeDefined() before the terminalBench assertion for better failure messages.
  • The terminalBenchFor mock in the integration test bypasses prefix-stripping logic, which is acceptable for the scope being tested (direct ID match), but the unit tests in terminal-bench.test.ts cover that path correctly.
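The first two notes can be illustrated with a small, framework-free sketch. The helper names and the `Model` shape here are hypothetical. In a Jest test the guard would more likely be the `expect(model).toBeDefined()` call the note suggests; the plain function below just shows the same idea of failing with a clear message instead of a later TypeError.

```typescript
// The two spellings from the first note. Loose equality against null matches
// exactly null and undefined, so they agree for every input, including 0.
function isMissingStrict(value: number | null | undefined): boolean {
  return value === null || value === undefined;
}

function isMissingLoose(value: number | null | undefined): boolean {
  return value == null;
}

// Hypothetical fixture shape for the second note.
type Model = { id: string; terminalBench?: { completionRate: number } };

// Fail early with a descriptive message when the fixture lacks the model,
// rather than dereferencing undefined in a later assertion.
function findModelOrThrow(models: Model[], id: string): Model {
  const model = models.find((m) => m.id === id);
  if (model === undefined) {
    throw new Error(`model "${id}" not found in fixture`);
  }
  return model;
}
```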


Reviewed by claude-4.6-sonnet-20260217 · 553,311 tokens

Review guidance: REVIEW.md from base branch main

…h-cloud

# Conflicts:
#	apps/web/src/lib/ai-gateway/providers/openrouter/index.ts