API Reference

Kolya BR Proxy exposes four API groups:

| Group | Prefix | Auth | Purpose |
|---|---|---|---|
| Gateway API (OpenAI) | /v1 | Authorization: Bearer | OpenAI-compatible chat completions |
| Gateway API (Anthropic) | /v1 | x-api-key header | Anthropic Messages API compatible |
| Admin API | /admin | Bearer JWT | User management, tokens, usage, audit |
| Health API | /health | None | Load balancer probes |

Base URL examples:

  • Local: http://localhost:8000
  • Production: https://api.kbp.kolya.fun

1. Gateway API (OpenAI-compatible)

All Gateway endpoints require an API token in the Authorization header:

Authorization: Bearer kbr_<your_token>

POST /v1/chat/completions

Create a chat completion. Accepts OpenAI-format requests and proxies them to AWS Bedrock. Anthropic (Claude) models are invoked through the Bedrock InvokeModel API; non-Anthropic models (Amazon Nova, DeepSeek, Mistral, Llama, etc.) go through the Converse API.

Request body (ChatCompletionRequest):

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | required | Bedrock model ID (e.g. global.anthropic.claude-sonnet-4-5-20250929-v1:0, us.amazon.nova-pro-v1:0, deepseek.r1-v1:0) |
| messages | array | required | Array of ChatMessage objects |
| stream | boolean | false | Enable SSE streaming |
| temperature | float | 1.0 | Sampling temperature (0.0 - 2.0) |
| top_p | float | 1.0 | Nucleus sampling (0.0 - 1.0) |
| max_tokens | integer | null | Maximum tokens to generate |
| stop | string or array | null | Stop sequence(s) |
| n | integer | 1 | Number of choices |
| presence_penalty | float | 0.0 | Presence penalty (-2.0 to 2.0) |
| frequency_penalty | float | 0.0 | Frequency penalty (-2.0 to 2.0) |
| user | string | null | End-user identifier |
| tools | array | null | Tool/function definitions |
| tool_choice | string or object | null | Tool selection strategy |

Bedrock extension fields (set via body or X-Bedrock-* headers):

| Field | Header | Description |
|---|---|---|
| bedrock_guardrail_config | X-Bedrock-Guardrail-Id + X-Bedrock-Guardrail-Version | Guardrail configuration |
| bedrock_additional_model_request_fields | X-Bedrock-Additional-Fields (JSON) | Extra model request fields |
| bedrock_trace | X-Bedrock-Trace | Trace mode (ENABLED / DISABLED) |
| bedrock_performance_config | X-Bedrock-Performance-Config (JSON) | Performance tuning |
| bedrock_prompt_caching | X-Bedrock-Prompt-Caching (JSON) | Prompt caching config |
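
As a client-side illustration of the header form, the helper below assembles a few of the X-Bedrock-* headers listed above. This is a sketch only: the function name bedrock_headers and the sample guardrail ID are hypothetical, and only the header names and the ENABLED / DISABLED trace values come from this reference.

```python
import json


def bedrock_headers(guardrail_id=None, guardrail_version=None,
                    additional_fields=None, trace=None):
    """Build optional X-Bedrock-* request headers (hypothetical helper)."""
    headers = {}
    if guardrail_id is not None:
        headers["X-Bedrock-Guardrail-Id"] = guardrail_id
    if guardrail_version is not None:
        headers["X-Bedrock-Guardrail-Version"] = str(guardrail_version)
    if additional_fields is not None:
        # JSON-valued headers carry a compact JSON document.
        headers["X-Bedrock-Additional-Fields"] = json.dumps(
            additional_fields, separators=(",", ":"))
    if trace is not None:
        if trace not in ("ENABLED", "DISABLED"):
            raise ValueError("trace must be ENABLED or DISABLED")
        headers["X-Bedrock-Trace"] = trace
    return headers


headers = bedrock_headers(
    guardrail_id="gr-example",  # placeholder, not a real guardrail ID
    guardrail_version=1,
    additional_fields={"effort": "medium"},
    trace="ENABLED",
)
```

The resulting dictionary can be merged into the headers of any HTTP client call to /v1/chat/completions.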

Thinking and effort (via bedrock_additional_model_request_fields):

The gateway supports Anthropic's extended thinking and effort parameters. Pass them through bedrock_additional_model_request_fields:

{
  "bedrock_additional_model_request_fields": {
    "thinking": {"type": "enabled", "budget_tokens": 5000},
    "effort": "medium"
  }
}

The effort parameter (low / medium / high) controls how much thinking effort the model uses. The gateway automatically wraps it into output_config.effort and injects the required anthropic_beta flag. When thinking.budget_tokens is set, max_tokens is automatically adjusted to satisfy the max_tokens > budget_tokens constraint.
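
The request shape described above can be assembled on the client as follows. This is a minimal sketch: thinking_request is a hypothetical helper, and raising max_tokens locally is optional, since the gateway performs its own adjustment (the exact value it chooses is not documented here).

```python
VALID_EFFORT = {"low", "medium", "high"}


def thinking_request(model, prompt, budget_tokens, effort=None, max_tokens=None):
    """Build a /v1/chat/completions body with thinking enabled (hypothetical helper)."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    extra = {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if effort is not None:
        if effort not in VALID_EFFORT:
            raise ValueError(f"effort must be one of {sorted(VALID_EFFORT)}")
        extra["effort"] = effort
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "bedrock_additional_model_request_fields": extra,
    }
    if max_tokens is not None:
        # Keep the request self-consistent: max_tokens must exceed budget_tokens.
        body["max_tokens"] = max(max_tokens, budget_tokens + 1)
    return body


body = thinking_request(
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "Solve this step by step: 15 * 27 + 33",
    budget_tokens=5000,
    effort="medium",
    max_tokens=2048,
)
```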

Headers override body fields when both are present.

ChatMessage schema:

| Field | Type | Description |
|---|---|---|
| role | string | system, user, assistant, or tool |
| content | string or array | Text string or array of ContentPart (multimodal) |
| name | string | Optional participant name |
| tool_calls | array | Tool calls (assistant messages) |
| tool_call_id | string | Tool call reference (tool messages) |

ContentPart (for multimodal):

{ "type": "text", "text": "describe this image" }
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
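
To show how the image_url part is typically produced, the sketch below base64-encodes raw image bytes into a data URL and wraps it in a multimodal user message. The helper names image_part and multimodal_message are illustrative, not part of the gateway.

```python
import base64


def image_part(data: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url ContentPart using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{encoded}"},
    }


def multimodal_message(text: str, image_bytes: bytes,
                       media_type: str = "image/png") -> dict:
    """Build a user ChatMessage with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            image_part(image_bytes, media_type),
        ],
    }


# b"abc" stands in for real image bytes, e.g. open("photo.png", "rb").read()
message = multimodal_message("describe this image", b"abc")
```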

Non-streaming example

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'

Response (ChatCompletionResponse):

{
  "id": "chatcmpl-abc123...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}

Streaming example

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Streaming response (SSE text/event-stream):

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,"model":"...","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,"model":"...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Heartbeat comments (: heartbeat) are sent every 15 seconds to keep the connection alive.
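
A client that reads the stream without an SDK needs to skip heartbeat comments and stop at the [DONE] sentinel. The following is a minimal sketch of that loop over already-decoded lines; collect_stream_text is an illustrative name, and a real client would feed it lines from its HTTP library's streaming response.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        # Blank lines separate events; lines starting with ":" are SSE
        # comments such as ": heartbeat".
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


SAMPLE = [
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,'
    '"model":"...","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    ": heartbeat",
    "",
    'data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,'
    '"model":"...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text = collect_stream_text(SAMPLE)
```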

Tool use (function calling)

The gateway supports OpenAI-compatible tool use. Tool call deltas stream incrementally in the same format as OpenAI. The finish_reason is tool_calls when the model invokes a tool.
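
Because tool call arguments arrive as string fragments, a client has to stitch them together before parsing the JSON. The sketch below assumes the standard OpenAI chunk layout (delta.tool_calls entries keyed by index, with function.arguments sent piecewise); merge_tool_call_deltas is a hypothetical helper name.

```python
import json


def merge_tool_call_deltas(deltas):
    """Merge streamed delta objects into complete tool calls, ordered by index."""
    calls = {}
    for delta in deltas:
        for tc in delta.get("tool_calls") or []:
            slot = calls.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""})
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            if fn.get("arguments"):
                # Argument JSON is split across chunks; concatenate in order.
                slot["arguments"] += fn["arguments"]
    return [calls[i] for i in sorted(calls)]


deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "type": "function",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"London\"}"}}]},
    {},  # final chunk carrying only finish_reason "tool_calls"
]

merged = merge_tool_call_deltas(deltas)
arguments = json.loads(merged[0]["arguments"])
```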

Error responses

{
  "error": {
    "message": "Token quota exceeded. Used: $5.00, Quota: $5.00",
    "type": "server_error",
    "code": "internal_error",
    "param": null
  }
}

| Status | Meaning |
|---|---|
| 400 | Invalid request (bad model name, malformed body) |
| 401 | Missing or invalid API token |
| 403 | Token lacks access to requested model |
| 429 | Token quota exceeded |
| 500 | Internal server error |

OpenAI SDK usage

from openai import OpenAI

client = OpenAI(
    api_key="kbr_your_token_here",  # pragma: allowlist secret
    base_url="http://localhost:8000/v1",
)

stream = client.chat.completions.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

1b. Gateway API (Anthropic Messages API Compatible)

All Anthropic-compatible endpoints require an API key in the x-api-key header:

x-api-key: kbr_<your_token>

The same kbr_ API key works for both OpenAI and Anthropic endpoints. The proxy routes based on the endpoint path and authentication header format.

POST /v1/messages

Create a message. Accepts Anthropic Messages API format requests and proxies them to AWS Bedrock. Supports thinking, tool use, prompt caching, and all Anthropic-native features.

Request body (AnthropicMessagesRequest):

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | required | Bedrock model ID or Anthropic short name (e.g. global.anthropic.claude-sonnet-4-5-20250929-v1:0 or claude-sonnet-4-5-20250929). The proxy normalizes both formats for access control and routes using the Bedrock ID. |
| messages | array | required | Array of message objects (role: user or assistant) |
| max_tokens | integer | required | Maximum tokens to generate |
| system | string or array | null | System prompt (string or array of content blocks with optional cache_control) |
| temperature | float | null | Sampling temperature (0.0 - 1.0) |
| top_p | float | null | Nucleus sampling (0.0 - 1.0) |
| top_k | integer | null | Top-K sampling |
| stop_sequences | array | null | Stop sequences |
| stream | boolean | false | Enable SSE streaming |
| tools | array | null | Tool definitions (Anthropic format) |
| tool_choice | object | null | Tool selection strategy |
| metadata | object | null | Request metadata |
| thinking | object | null | Extended thinking config ({"type": "enabled"/"adaptive"/"disabled", "budget_tokens": N}) |

Message content blocks:

{"type": "text", "text": "Hello!"}
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}}
{"type": "tool_use", "id": "toolu_...", "name": "get_weather", "input": {"city": "London"}}
{"type": "tool_result", "tool_use_id": "toolu_...", "content": "Sunny, 22C"}
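
The tool_use and tool_result blocks pair up through the tool_use_id field. As a small illustration, the hypothetical helper below turns a tool_use block from an assistant response into the follow-up user message that carries the result.

```python
def tool_result_message(tool_use_block: dict, result_text: str) -> dict:
    """Build the user message that answers a tool_use block."""
    if tool_use_block.get("type") != "tool_use":
        raise ValueError("expected a tool_use content block")
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                # Must echo the id of the tool_use block being answered.
                "tool_use_id": tool_use_block["id"],
                "content": result_text,
            }
        ],
    }


tool_use = {"type": "tool_use", "id": "toolu_example",  # placeholder id
            "name": "get_weather", "input": {"city": "London"}}
follow_up = tool_result_message(tool_use, "Sunny, 22C")
```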

Non-streaming example

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Response (AnthropicMessagesResponse):

{
  "id": "msg_abc123...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hello! How can I help you?"}
  ],
  "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 10,
    "output_tokens": 8
  }
}

Streaming example

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Streaming response (SSE text/event-stream, Anthropic format):

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"...","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":10,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":8}}

event: message_stop
data: {"type":"message_stop"}
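
For clients not using the Anthropic SDK, the event sequence above can be consumed with a small state machine that tracks the current event name and reads the matching data line. The sketch below works on already-decoded lines; parse_anthropic_stream is an illustrative name.

```python
import json


def parse_anthropic_stream(lines):
    """Return (text, stop_reason) from Anthropic-format SSE lines."""
    text_parts = []
    stop_reason = None
    event = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            if event == "content_block_delta":
                delta = data.get("delta", {})
                if delta.get("type") == "text_delta":
                    text_parts.append(delta["text"])
            elif event == "message_delta":
                stop_reason = data.get("delta", {}).get("stop_reason")
            elif event == "message_stop":
                break
    return "".join(text_parts), stop_reason


SAMPLE = [
    "event: content_block_start",
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "",
    "event: content_block_stop",
    'data: {"type":"content_block_stop","index":0}',
    "",
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":8}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]

text, stop_reason = parse_anthropic_stream(SAMPLE)
```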

Extended thinking example

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
    "messages": [{"role": "user", "content": "Solve this step by step: 15 * 27 + 33"}]
  }'

The proxy supports three thinking types:

  • "enabled" - Full extended thinking with visible thinking blocks
  • "adaptive" - Adaptive thinking (signature-only mode)
  • "disabled" - No extended thinking

Note: When sending conversation history that includes thinking or redacted_thinking content blocks, the proxy automatically strips these blocks before forwarding to Bedrock, since Bedrock doesn't support adaptive signature-only thinking blocks in conversation history.
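
To make the stripping behaviour concrete, the sketch below removes thinking and redacted_thinking blocks from a message history. It illustrates the effect described in the note and is not the proxy's actual implementation; the function name is hypothetical, and how the proxy treats a message left with no content is not covered here.

```python
THINKING_TYPES = {"thinking", "redacted_thinking"}


def strip_thinking_blocks(messages):
    """Return a copy of messages without thinking/redacted_thinking blocks."""
    cleaned = []
    for message in messages:
        content = message["content"]
        # String content has no blocks to filter; only lists are inspected.
        if isinstance(content, list):
            content = [
                block for block in content
                if block.get("type") not in THINKING_TYPES
            ]
        cleaned.append({**message, "content": content})
    return cleaned


history = [
    {"role": "user", "content": "Solve this step by step: 15 * 27 + 33"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "15 * 27 = 405 ...", "signature": "..."},
        {"type": "text", "text": "The answer is 438."},
    ]},
]

cleaned = strip_thinking_blocks(history)
```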

Claude Code CLI compatibility

All tokens are created with the sk-ant-api03 prefix by default, making them compatible with Claude Code CLI and the Anthropic SDK:

curl -X POST http://localhost:8000/admin/tokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <jwt_token>" \
  -d '{
    "name": "Claude Code Token",
    "quota_usd": 50.00
  }'

Then configure your environment:

export ANTHROPIC_BASE_URL="https://your-api-domain"
export ANTHROPIC_API_KEY="sk-ant-api03_xxxxxxx"  # pragma: allowlist secret
export CLAUDE_MODEL="us.anthropic.claude-sonnet-4-5-20250929-v1:0"

For persistent configuration, add to ~/.claude/settings.json:

{
  "model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
  "smallModel": "your-haiku-model-id",
  "largeModel": "your-opus-model-id"
}

Error responses (Anthropic format)

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid or expired API key"
  }
}

| Status | Error Type | Meaning |
|---|---|---|
| 400 | invalid_request_error | Invalid request (bad model name, malformed body) |
| 401 | authentication_error | Missing or invalid API key |
| 403 | permission_error | Token lacks access to requested model |
| 429 | rate_limit_error | Token quota exceeded |
| 500 | api_error | Internal server error |

Anthropic SDK usage

import anthropic

client = anthropic.Anthropic(
    api_key="kbr_your_token_here",  # pragma: allowlist secret
    base_url="http://localhost:8000",  # no /v1 suffix: the Anthropic SDK appends /v1/messages itself
)

message = client.messages.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

GET /v1/models

List the models the current token can access. Returns an OpenAI-compatible model list.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
      "object": "model",
      "created": 1700000000,
      "owned_by": "anthropic"
    }
  ]
}

2. Admin API

All Admin endpoints (except OAuth login URLs) require a JWT access token:

Authorization: Bearer <jwt_access_token>

Authorization (RBAC)

The Admin API uses role-based access control with two roles:

| Role | Access |
|---|---|
| super_admin | Full access to all endpoints and resources |
| admin | Scoped access controlled by permissions object |

Admin users have a permissions JSON object that controls what they can manage:

| Permission | Values | Controls |
|---|---|---|
| manage_api_keys | true/"all", [id, ...], false | API token CRUD |
| manage_teams | true/"all", [id, ...], false | Team CRUD |
| manage_models | true/"all", [id, ...], false | Model configuration |
| view_usage | true/false | Usage statistics |
| view_monitor | true/false | Request monitor |

  • true or "all": full access to all resources of that type
  • Array of UUIDs: access only to those specific resources
  • false or missing: no access (403)
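
The three cases above translate directly into a small check. The sketch below is an illustration of the documented semantics, not the server's code; has_access is a hypothetical name.

```python
def has_access(permission_value, resource_id=None):
    """Evaluate one permission entry against an optional resource UUID."""
    # true or "all": full access to every resource of that type.
    if permission_value is True or permission_value == "all":
        return True
    # Array of UUIDs: access only to the listed resources.
    if isinstance(permission_value, list):
        return resource_id is not None and resource_id in permission_value
    # false or missing (None): no access, which the API reports as 403.
    return False


permissions = {
    "manage_api_keys": "all",
    "manage_teams": ["uuid1", "uuid2"],
    "view_usage": True,
}

can_edit_team = has_access(permissions.get("manage_teams"), "uuid1")
can_edit_models = has_access(permissions.get("manage_models"), "some-model")
```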

Endpoints are guarded by permission requirements:

| Endpoint Group | Required Permission |
|---|---|
| /admin/tokens/* | manage_api_keys |
| /admin/teams/* | manage_teams |
| /admin/models/* | manage_models |
| /admin/usage/* | view_usage |
| /admin/monitor/* | view_monitor |
| /admin/users/* | super_admin role only |
| /admin/audit-logs | super_admin role only |
| /admin/audit-logs/activity | Any admin |

2.1 Authentication

GET /admin/auth/microsoft/login

Get Microsoft OAuth authorization URL.

| Parameter | In | Type | Description |
|---|---|---|---|
| redirect_uri | query | string | Redirect URI after authorization |

Response:

{
  "authorization_url": "https://login.microsoftonline.com/.../authorize?...",
  "state": "random_csrf_state"
}

POST /admin/auth/microsoft/callback

Handle Microsoft OAuth callback. Creates or links user account.

| Parameter | In | Type | Description |
|---|---|---|---|
| code | query | string | Authorization code from Microsoft |
| redirect_uri | query | string | Redirect URI used in authorization |
| state | query | string | State parameter for CSRF protection |
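
On the client side, the state value returned by the login endpoint should be compared with the one that comes back on the redirect before the callback is invoked. The following sketch shows one way to do that; the helper name and the example redirect URL are assumptions for illustration.

```python
import hmac
from urllib.parse import parse_qs, urlparse


def extract_callback_params(redirect_url: str, expected_state: str) -> dict:
    """Pull code and state from the redirect URL and verify the state."""
    query = parse_qs(urlparse(redirect_url).query)
    code = query.get("code", [None])[0]
    state = query.get("state", [None])[0]
    if code is None or state is None:
        raise ValueError("redirect is missing code or state")
    # Constant-time comparison of the returned state with the stored one.
    if not hmac.compare_digest(state.encode(), expected_state.encode()):
        raise ValueError("state mismatch, possible CSRF")
    return {"code": code, "state": state}


params = extract_callback_params(
    "https://app.example.com/callback?code=abc123&state=random_csrf_state",
    expected_state="random_csrf_state",
)
```

The returned code and state, together with the original redirect_uri, are then sent as query parameters to the callback endpoint.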

Response (LoginResponse):

{
  "access_token": "eyJ...",
  "refresh_token": "eyJ...",
  "token_type": "bearer",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "is_active": true,
    "is_admin": true,
    "role": "admin",
    "permissions": {
      "manage_api_keys": "all", // pragma: allowlist secret
      "manage_teams": ["uuid1", "uuid2"],
      "manage_models": true,
      "view_usage": true,
      "view_monitor": true
    },
    "email_verified": true,
    "current_balance": "5.00"
  }
}

GET /admin/auth/cognito/login

Get AWS Cognito OAuth authorization URL.

| Parameter | In | Type | Description |
|---|---|---|---|
| redirect_uri | query | string | Redirect URI after authorization |

Response: Same structure as Microsoft login.

POST /admin/auth/cognito/callback

Handle AWS Cognito OAuth callback. Creates or links user account.

| Parameter | In | Type | Description |
|---|---|---|---|
| code | query | string | Authorization code from Cognito |
| redirect_uri | query | string | Redirect URI used in authorization |
| state | query | string | State parameter for CSRF protection |

Response: Same LoginResponse structure as Microsoft callback.

POST /admin/auth/refresh

Refresh access token using refresh token (with automatic rotation).

Request body:

{ "refresh_token": "eyJ..." }

Response: LoginResponse with new access_token and refresh_token.

The old refresh token is invalidated after rotation. Reusing an invalidated token triggers theft detection.
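
Rotation with reuse detection can be pictured with a small in-memory model. This is a sketch of one common approach (revoking the whole token family when an old token is replayed); the reference does not specify what the server does on detection, so the class and its behaviour are assumptions.

```python
import secrets


class RefreshTokenFamily:
    """Toy model of refresh token rotation with reuse detection."""

    def __init__(self):
        self.current = secrets.token_urlsafe(16)
        self.used = set()
        self.revoked = False

    def rotate(self, presented: str) -> str:
        if self.revoked:
            raise PermissionError("token family revoked")
        if presented in self.used:
            # An already-rotated token was replayed: assume theft and
            # revoke the whole family (assumed policy).
            self.revoked = True
            raise PermissionError("refresh token reuse detected")
        if presented != self.current:
            raise PermissionError("unknown refresh token")
        self.used.add(presented)
        self.current = secrets.token_urlsafe(16)
        return self.current


family = RefreshTokenFamily()
first = family.current
second = family.rotate(first)  # normal refresh: first is now invalid
```

For a client, the practical consequence is to always store the refresh_token from the latest response and never retry with an older one.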

POST /admin/auth/revoke

Revoke a specific refresh token.

Request body:

{ "refresh_token": "eyJ..." }

Response: { "message": "Refresh token revoked successfully" }

POST /admin/auth/revoke-all

Revoke all refresh tokens for the current user (logout from all devices). Requires JWT.

Response: { "message": "Revoked N refresh tokens successfully" }

GET /admin/auth/me

Get current user information.

Response: UserResponse object (see Microsoft callback response).

PUT /admin/auth/me

Update current user profile.

Request body:

{ "first_name": "Jane", "last_name": "Smith" }

Response: Updated UserResponse.

2.2 Token Management

POST /admin/tokens

Create a new API token. The token prefix is hardcoded as sk-ant-api03 and cannot be configured.

Request body:

{
  "name": "My API Key",
  "description": "Development token for team Alpha",
  "expires_at": "2026-12-31T23:59:59",
  "quota_usd": 100.00,
  "monthly_quota_usd": 20.00,
  "monthly_reset_policy": "reset",
  "allowed_ips": ["192.168.1.0/24"],
  "token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "5m"}
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Token display name |
| description | string | no | Token description |
| expires_at | datetime | no | Expiration timestamp |
| quota_usd | decimal | no | Total usage quota in USD |
| monthly_quota_usd | decimal | no | Monthly usage quota in USD |
| monthly_reset_policy | string | no | "reset" (default) or "rollover" |
| allowed_ips | array | no | IP allowlist (CIDR) |
| token_metadata | object | no | Additional config (e.g. prompt_cache_enabled, prompt_cache_ttl) |
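
Before sending the request, a client may want to validate these fields locally. The sketch below builds the request body and checks the reset policy and CIDR entries with the standard library; token_create_body is a hypothetical helper, and the server may apply stricter or different validation.

```python
import ipaddress
from datetime import datetime

RESET_POLICIES = {"reset", "rollover"}


def token_create_body(name, quota_usd=None, monthly_quota_usd=None,
                      monthly_reset_policy="reset", allowed_ips=None,
                      expires_at=None):
    """Build a POST /admin/tokens body with basic local validation."""
    if not name:
        raise ValueError("name is required")
    if monthly_reset_policy not in RESET_POLICIES:
        raise ValueError("monthly_reset_policy must be 'reset' or 'rollover'")
    body = {"name": name, "monthly_reset_policy": monthly_reset_policy}
    if quota_usd is not None:
        body["quota_usd"] = quota_usd
    if monthly_quota_usd is not None:
        body["monthly_quota_usd"] = monthly_quota_usd
    if allowed_ips:
        for cidr in allowed_ips:
            # Raises ValueError for anything that is not an IP or CIDR range.
            ipaddress.ip_network(cidr, strict=False)
        body["allowed_ips"] = list(allowed_ips)
    if expires_at is not None:
        body["expires_at"] = expires_at.isoformat(timespec="seconds")
    return body


body = token_create_body(
    "My API Key",
    quota_usd=100.00,
    monthly_quota_usd=20.00,
    allowed_ips=["192.168.1.0/24"],
    expires_at=datetime(2026, 12, 31, 23, 59, 59),
)
```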

Response (201): TokenWithKeyResponse -- includes the plain token value. This is the only time the plain token is returned.

{
  "id": "uuid",
  "name": "My API Key",
  "description": "Development token for team Alpha",
  "token": "sk-ant-api03_abc123...",
  "key_prefix": "sk-ant-api03",
  "expires_at": "2026-12-31T23:59:59",
  "quota_usd": "100.00",
  "monthly_quota_usd": "20.00",
  "monthly_reset_policy": "reset",
  "used_usd": "0.00",
  "monthly_used_usd": "0.00",
  "daily_used_usd": "0.00",
  "remaining_quota": "100.00",
  "allowed_ips": ["192.168.1.0/24"],
  "allowed_models": [],
  "is_active": true,
  "is_expired": false,
  "is_quota_exceeded": false,
  "created_at": "2026-01-01T00:00:00",
  "last_used_at": null,
  "token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "5m"}
}

GET /admin/tokens

List all tokens for the current user.

| Parameter | In | Type | Default | Description |
|---|---|---|---|---|
| include_inactive | query | boolean | false | Include inactive/revoked tokens |

Response: Array of TokenResponse.

GET /admin/tokens/{token_id}

Get token details by UUID.

PUT /admin/tokens/{token_id}

Update token settings.

Request body (all fields optional):

{
  "name": "Renamed Key",
  "description": "Updated description",
  "expires_at": "2027-06-30T00:00:00",
  "quota_usd": 200.00,
  "monthly_quota_usd": 50.00,
  "monthly_reset_policy": "rollover",
  "allowed_ips": ["10.0.0.0/8"],
  "is_active": true,
  "token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "1h"}
}

DELETE /admin/tokens/{token_id}

Soft-delete a token. Sets is_deleted = true, is_active = false, and invalidates the Redis cache. The token is immediately blocked from API access. Historical usage data is preserved. Returns 204 No Content.

POST /admin/tokens/{token_id}/revoke

Deactivate a token (can be reactivated via PUT).

Response: Updated TokenResponse with is_active: false.

GET /admin/tokens/{token_id}/plain

Retrieve the decrypted plain token value.

Response: { "token": "sk-ant-api03_..." }

2.3 Model Management

GET /admin/models/aws-available

List available models that the proxy can actually invoke from the deployment region. The response is built from the inference profile cache (_ProfileCache), which is populated at startup and refreshed daily at 03:00 UTC via AWS APIs.

The list includes:

  • Inference profiles available in the deployment region (e.g. us.anthropic.claude-sonnet-4-6, global.anthropic.claude-sonnet-4-5-20250929-v1:0)
  • Foundation models available only in the fallback region (e.g. zai.glm-5, deepseek.v3.2) — marked with is_fallback: true
  • Gemini models (if GEMINI_API_KEY is configured) — appended dynamically

Only models actually callable from the deployment region are shown. For example, if deployed in us-west-1 where global.anthropic.claude-sonnet-4-20250514-v1:0 is not available, it will not appear in the list.

Results are cached in-memory for 12 hours.

Response:

{
  "models": [
    {
      "model_id": "us.anthropic.claude-sonnet-4-6",
      "model_name": "anthropic.claude-sonnet-4-6",
      "friendly_name": "anthropic.claude-sonnet-4-6",
      "provider": "bedrock-converse",
      "is_cross_region": true,
      "cross_region_type": "us",
      "streaming_supported": true,
      "is_fallback": false
    },
    {
      "model_id": "zai.glm-5",
      "model_name": "zai.glm-5",
      "friendly_name": "zai.glm-5",
      "provider": "bedrock-converse",
      "is_cross_region": false,
      "cross_region_type": null,
      "streaming_supported": true,
      "is_fallback": true
    }
  ]
}

GET /admin/models

List enabled models from the database.

| Parameter | In | Type | Description |
|---|---|---|---|
| token_id | query | string | Optional. Filter by token UUID |

Response:

{
  "models": [
    {
      "id": "uuid",
      "model_name": "claude-sonnet-4-5",
      "model_id": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
      "friendly_name": "claude-sonnet-4-5",
      "provider": "bedrock",
      "streaming_supported": true,
      "is_active": true
    }
  ]
}

POST /admin/models

Add a model to a token's allowed list.

Request body:

{
  "token_id": "uuid",
  "model_name": "claude-sonnet-4-5"
}

Response: { "message": "Model claude-sonnet-4-5 added successfully", "id": "uuid" }

DELETE /admin/models/{model_id}

Soft-delete a model configuration.

Response: { "message": "Model claude-sonnet-4-5 deleted successfully" }

2.4 Usage Statistics

GET /admin/usage/stats

Get aggregated usage statistics for the current user.

| Parameter | In | Type | Description |
|---|---|---|---|
| start_date | query | datetime | Optional custom range start |
| end_date | query | datetime | Optional custom range end |

Response:

{
  "current_month_cost": "12.34",
  "current_month_requests": 150,
  "current_month_tokens": 50000,
  "last_30_days_cost": "45.67",
  "last_30_days_requests": 500,
  "total_cost": "123.45",
  "total_requests": 2000
}

GET /admin/usage/by-token

Usage statistics grouped by API token.

| Parameter | In | Type | Description |
|---|---|---|---|
| token_id | query | string | Optional filter |
| start_date | query | datetime | Optional start |
| end_date | query | datetime | Optional end |

Response: Array of UsageByTokenResponse.

GET /admin/usage/by-model

Usage statistics grouped by model.

| Parameter | In | Type | Description |
|---|---|---|---|
| model | query | string | Optional model filter |
| start_date | query | datetime | Optional start |
| end_date | query | datetime | Optional end |

Response: Array of UsageByModelResponse.

GET /admin/usage/aggregated-stats

Time-series usage data.

| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| start_date | query | datetime | yes | Range start |
| end_date | query | datetime | yes | Range end |
| granularity | query | string | no | hourly, daily (default), weekly, monthly |
| token_id | query | string | no | Filter by token |

Response:

{
  "granularity": "daily",
  "start_date": "2026-01-01T00:00:00",
  "end_date": "2026-01-31T23:59:59",
  "data": [
    {
      "time_bucket": "2026-01-15",
      "call_count": 42,
      "total_prompt_tokens": 10000,
      "total_completion_tokens": 5000,
      "total_tokens": 15000,
      "total_cost": "3.21"
    }
  ]
}
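
Note that total_cost is serialized as a string. When totalling buckets on the client, parsing it with Decimal avoids binary floating-point rounding. A minimal sketch, with summarize_usage as an illustrative name and a second bucket invented for the example:

```python
from decimal import Decimal


def summarize_usage(response: dict) -> dict:
    """Total the buckets of an aggregated-stats response."""
    buckets = response["data"]
    return {
        "calls": sum(b["call_count"] for b in buckets),
        "tokens": sum(b["total_tokens"] for b in buckets),
        # Costs are decimal strings; keep them exact while summing.
        "cost": sum((Decimal(b["total_cost"]) for b in buckets), Decimal("0")),
    }


response = {
    "granularity": "daily",
    "data": [
        {"time_bucket": "2026-01-15", "call_count": 42,
         "total_tokens": 15000, "total_cost": "3.21"},
        # Second bucket is made up for illustration.
        {"time_bucket": "2026-01-16", "call_count": 8,
         "total_tokens": 1000, "total_cost": "0.10"},
    ],
}

summary = summarize_usage(response)
```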

GET /admin/usage/token-summary

Per-token usage summary for a time period.

| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| start_date | query | datetime | yes | Range start |
| end_date | query | datetime | yes | Range end |

Response: Array of TokenUsageSummary:

[
  {
    "token_id": "uuid",
    "token_name": "My Key",
    "call_count": 100,
    "total_tokens": 25000,
    "total_cost": "8.50"
  }
]

GET /admin/usage/tokens-timeseries

Multi-token time-series data for chart overlays.

| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| start_date | query | datetime | yes | Range start |
| end_date | query | datetime | yes | Range end |
| token_ids | query | string | yes | Comma-separated token UUIDs |
| granularity | query | string | no | hourly, daily, weekly, monthly |
| metric | query | string | no | calls (default), tokens, cost |

Response:

{
  "granularity": "daily",
  "metric": "calls",
  "series": [
    {
      "token_id": "uuid",
      "token_name": "Key A",
      "data": [
        { "time_bucket": "2026-01-15", "value": 42 }
      ]
    }
  ]
}

2.5 Audit Logs

GET /admin/audit-logs

List audit logs with pagination and filters.

| Parameter | In | Type | Default | Description |
|---|---|---|---|---|
| page | query | integer | 1 | Page number (1-based) |
| page_size | query | integer | 50 | Items per page (max 200) |
| user_id | query | uuid | null | Filter by user |
| action | query | string | null | Filter by action type |
| success | query | boolean | null | Filter by outcome |
| start_date | query | datetime | null | From date |
| end_date | query | datetime | null | To date |

Response:

{
  "items": [
    {
      "id": "uuid",
      "user_id": "uuid",
      "action": "login_success",
      "success": true,
      "details": null,
      "error_message": null,
      "ip_address": "1.2.3.4",
      "user_agent": "Mozilla/5.0...",
      "resource_type": null,
      "resource_id": null,
      "created_at": "2026-01-15T10:30:00"
    }
  ],
  "total": 150,
  "page": 1,
  "page_size": 50,
  "total_pages": 3
}
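
The pagination fields make it straightforward to walk the full result set. The sketch below iterates pages until total_pages is reached; fetch_page is a caller-supplied function (an assumption of this example) that performs the HTTP request and returns the parsed response body.

```python
def iter_audit_logs(fetch_page, page_size=50):
    """Yield every audit log item, requesting pages one at a time."""
    if not 1 <= page_size <= 200:
        raise ValueError("page_size must be between 1 and 200")
    page = 1  # pages are 1-based
    while True:
        body = fetch_page(page, page_size)
        yield from body["items"]
        if page >= body["total_pages"]:
            break
        page += 1


def demo_fetch(page, page_size):
    """Stand-in for an HTTP call, serving five fake items."""
    items = ["log-%d" % i for i in range(5)]
    start = (page - 1) * page_size
    total_pages = -(-len(items) // page_size)  # ceiling division
    return {
        "items": items[start:start + page_size],
        "total": len(items),
        "page": page,
        "page_size": page_size,
        "total_pages": total_pages,
    }


all_logs = list(iter_audit_logs(demo_fetch, page_size=2))
```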

GET /admin/audit-logs/summary

Audit activity summary with counts by action type.

| Parameter | In | Type | Description |
|---|---|---|---|
| start_date | query | datetime | Optional from date |
| end_date | query | datetime | Optional to date |

Response:

{
  "total": 500,
  "success_count": 480,
  "failure_count": 20,
  "action_counts": {
    "login_success": 200,
    "token_refresh_success": 150,
    "login_failed": 20,
    "logout_all_devices": 5
  }
}

GET /admin/audit-logs/activity

Activity feed visible to all admins. Shows only management operations (token/team/model/admin CRUD) from the last N days. Non-super_admin users cannot see actions performed by super_admins.

| Parameter | In | Type | Default | Description |
|---|---|---|---|---|
| page | query | integer | 1 | Page number (1-based) |
| page_size | query | integer | 50 | Items per page (max 100) |
| days | query | integer | 7 | Lookback window (1-30 days) |

Response:

{
  "items": [
    {
      "id": "uuid",
      "user_id": "uuid",
      "user_email": "admin@example.com",
      "action": "token_created",
      "resource_type": "token",
      "resource_id": "uuid",
      "details": "{\"name\": \"New Token\"}",
      "created_at": "2026-01-15T10:30:00"
    }
  ],
  "total": 25,
  "page": 1,
  "page_size": 50,
  "total_pages": 1
}

2.6 Admin User Management

Super admin only. Manage admin user accounts.

GET /admin/users

List all active admin users (super_admin and admin roles).

Response: Array of AdminUserResponse:

[
  {
    "id": "uuid",
    "email": "admin@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "role": "admin",
    "permissions": {
      "manage_api_keys": "all", // pragma: allowlist secret
      "manage_teams": "all",
      "manage_models": "all",
      "view_usage": true,
      "view_monitor": true
    },
    "is_active": true,
    "created_at": "2026-01-01T00:00:00",
    "last_login_at": "2026-01-15T10:30:00"
  }
]

POST /admin/users

Invite a new admin. Creates a Cognito user with a temporary password and a local user record.

Request body:

{
  "email": "newadmin@example.com",
  "username": "newadmin",
  "temp_password": "TempPass123!", // pragma: allowlist secret
  "role": "admin",
  "permissions": {
    "manage_api_keys": "all", // pragma: allowlist secret
    "manage_teams": ["team-uuid-1"],
    "manage_models": true,
    "view_usage": true,
    "view_monitor": true
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| email | string | yes | Admin email address |
| username | string | yes | Cognito login username |
| temp_password | string | yes | Temporary password (must change on first login) |
| role | string | no | "super_admin" or "admin" (default: "admin") |
| permissions | object | no | Permission scope (only for admin role) |

Response (201): AdminUserResponse

PUT /admin/users/{user_id}

Update admin user role or permissions.

Request body (all fields optional):

{
  "role": "admin",
  "permissions": { "manage_api_keys": "all", "view_usage": true }, // pragma: allowlist secret
  "is_active": true
}

Response: Updated AdminUserResponse.

DELETE /admin/users/{user_id}

Deactivate an admin user. Cannot deactivate yourself. Returns 204 No Content.

GET /admin/users/resources

List all assignable resources (tokens, teams, models) for the permission editor UI.

Response:

{
  "api_keys": [{ "id": "uuid", "name": "Token Name" }],
  "teams": [{ "id": "uuid", "name": "Team Name" }],
  "models": [{ "id": "model-id", "name": "model-id" }]
}

2.7 Teams Management

Manage team budgets and member allocations. Requires manage_teams permission.

POST /admin/teams

Create a new team.

Request body:

{
  "name": "Engineering Team",
  "monthly_budget_usd": 500.00,
  "monthly_reset_policy": "reset",
  "daily_limit_enabled": true
}

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Team name |
| monthly_budget_usd | decimal | yes | Monthly budget in USD |
| monthly_reset_policy | string | no | "reset" (default) or "rollover" |
| daily_limit_enabled | boolean | no | Enable daily spending limit per member (default: true) |

Response (201):

{
  "id": "uuid",
  "name": "Engineering Team",
  "monthly_budget_usd": "500.00",
  "monthly_reset_policy": "reset",
  "daily_limit_enabled": true,
  "unallocated_pool_usd": "500.00"
}

GET /admin/teams

List all teams. Returns TeamListItem array with member counts and usage summaries.

GET /admin/teams/{team_id}

Get team dashboard with detailed member usage.

Response (TeamDashboardResponse):

{
  "id": "uuid",
  "name": "Engineering Team",
  "monthly_budget_usd": "500.00",
  "monthly_reset_policy": "reset",
  "daily_limit_enabled": true,
  "total_allocated_usd": "300.00",
  "total_used_usd": "45.67",
  "unallocated_pool_usd": "200.00",
  "members": [
    {
      "token_id": "uuid",
      "token_name": "Alice",
      "allocated_usd": "150.00",
      "used_usd": "23.45",
      "remaining_usd": "126.55",
      "daily_limit_usd": "4.84",
      "daily_used_usd": "1.20",
      "is_active": true,
      "last_used_at": "2026-01-15T10:30:00"
    }
  ]
}
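
The sample numbers are consistent with two simple relationships: unallocated_pool_usd equals monthly_budget_usd minus total_allocated_usd, and each member's remaining_usd equals allocated_usd minus used_usd. Treating those as assumptions inferred from the example rather than documented guarantees, a client-side sanity check could look like this (budget_inconsistencies is a hypothetical helper):

```python
from decimal import Decimal


def budget_inconsistencies(dashboard: dict) -> list:
    """Return descriptions of budget figures that do not add up."""
    problems = []
    budget = Decimal(dashboard["monthly_budget_usd"])
    allocated = Decimal(dashboard["total_allocated_usd"])
    pool = Decimal(dashboard["unallocated_pool_usd"])
    # Assumed: pool = budget - total allocated.
    if budget - allocated != pool:
        problems.append("unallocated pool does not match budget - allocated")
    for member in dashboard["members"]:
        expected = Decimal(member["allocated_usd"]) - Decimal(member["used_usd"])
        # Assumed: remaining = allocated - used.
        if expected != Decimal(member["remaining_usd"]):
            problems.append("remaining_usd mismatch for " + member["token_name"])
    return problems


dashboard = {
    "monthly_budget_usd": "500.00",
    "total_allocated_usd": "300.00",
    "unallocated_pool_usd": "200.00",
    "members": [
        {"token_name": "Alice", "allocated_usd": "150.00",
         "used_usd": "23.45", "remaining_usd": "126.55"},
    ],
}

problems = budget_inconsistencies(dashboard)
```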

PUT /admin/teams/{team_id}

Update team settings (name, budget, reset policy, daily limit).

DELETE /admin/teams/{team_id}

Delete a team. Returns 204 No Content.

POST /admin/teams/{team_id}/members

Add a member (token) to the team with a budget allocation.

Request body:

{
  "token_id": "uuid",
  "allocated_usd": 100.00
}

DELETE /admin/teams/{team_id}/members/{token_id}

Remove a member from the team.

PUT /admin/teams/{team_id}/members/{token_id}

Adjust a member's allocated budget.

Request body:

{
  "allocated_usd": 150.00
}

POST /admin/teams/{team_id}/transfer

Transfer allocation between members.

Request body:

{
  "from_token_id": "uuid",
  "to_token_id": "uuid",
  "amount": 25.00
}

POST /admin/teams/{team_id}/members/batch

Batch create members with new tokens and equal allocations.

Request body:

{
  "names": "alice, bob, charlie",
  "per_member_allocation": 50.00,
  "expires_at": "2026-12-31T23:59:59",
  "quota_usd": 100.00,
  "allowed_ips": null,
  "token_metadata": null,
  "model_names": ["claude-sonnet-4-5"]
}

3. Health API

No authentication required.

GET /health/

Basic health check for load balancer.

{ "status": "healthy", "timestamp": "2026-01-15T10:00:00", "service": "kolya-br-proxy" }

GET /health/ready

Readiness probe. Verifies database connectivity. Returns 503 if unhealthy.

{
  "status": "ready",
  "timestamp": "2026-01-15T10:00:00",
  "service": "kolya-br-proxy",
  "components": {
    "database": { "status": "healthy", "message": "Connected" }
  }
}

GET /health/live

Liveness probe for Kubernetes.

{ "status": "alive", "timestamp": "2026-01-15T10:00:00", "service": "kolya-br-proxy" }

GET /health/metrics

Basic metrics endpoint.

{
  "service": "kolya-br-proxy",
  "timestamp": "2026-01-15T10:00:00",
  "version": "1.0.0",
  "debug_mode": false,
  "metrics": {
    "health_checks_total": "counter",
    "requests_total": "counter",
    "request_duration_seconds": "histogram"
  }
}