API Reference

Kolya BR Proxy exposes four API groups:

Group	Prefix	Auth	Purpose
Gateway API (OpenAI)	`/v1`	`Authorization: Bearer`	OpenAI-compatible chat completions
Gateway API (Anthropic)	`/v1`	`x-api-key` header	Anthropic Messages API compatible
Admin API	`/admin`	Bearer JWT	User management, tokens, usage, audit
Health API	`/health`	None	Load balancer probes

Base URL examples:

Local: http://localhost:8000
Production: https://api.kbp.kolya.fun

1. Gateway API (OpenAI-compatible)

All Gateway endpoints require an API token in the Authorization header:

Authorization: Bearer kbr_<your_token>

POST /v1/chat/completions

Create a chat completion. Accepts OpenAI-format requests and proxies them to AWS Bedrock. Anthropic models (Claude) are invoked through the Bedrock InvokeModel API; non-Anthropic models (Amazon Nova, DeepSeek, Mistral, Llama, etc.) are invoked through the Converse API.

Request body (ChatCompletionRequest):

Field	Type	Default	Description
`model`	string	required	Bedrock model ID (e.g. `global.anthropic.claude-sonnet-4-5-20250929-v1:0`, `us.amazon.nova-pro-v1:0`, `deepseek.r1-v1:0`)
`messages`	array	required	Array of `ChatMessage` objects
`stream`	boolean	`false`	Enable SSE streaming
`temperature`	float	`1.0`	Sampling temperature (0.0 - 2.0)
`top_p`	float	`1.0`	Nucleus sampling (0.0 - 1.0)
`max_tokens`	integer	null	Maximum tokens to generate
`stop`	string \| array	null	Stop sequence(s)
`n`	integer	`1`	Number of choices
`presence_penalty`	float	`0.0`	Presence penalty (-2.0 to 2.0)
`frequency_penalty`	float	`0.0`	Frequency penalty (-2.0 to 2.0)
`user`	string	null	End-user identifier
`tools`	array	null	Tool/function definitions
`tool_choice`	string \| object	null	Tool selection strategy

Bedrock extension fields (set via body or X-Bedrock-* headers):

Field	Header	Description
`bedrock_guardrail_config`	`X-Bedrock-Guardrail-Id` + `X-Bedrock-Guardrail-Version`	Guardrail configuration
`bedrock_additional_model_request_fields`	`X-Bedrock-Additional-Fields` (JSON)	Extra model request fields
`bedrock_trace`	`X-Bedrock-Trace`	Trace mode (`ENABLED` / `DISABLED`)
`bedrock_performance_config`	`X-Bedrock-Performance-Config` (JSON)	Performance tuning
`bedrock_prompt_caching`	`X-Bedrock-Prompt-Caching` (JSON)	Prompt caching config

Thinking and effort (via bedrock_additional_model_request_fields):

The gateway supports Anthropic's extended thinking and effort parameters. Pass them through bedrock_additional_model_request_fields:

{
  "bedrock_additional_model_request_fields": {
    "thinking": {"type": "enabled", "budget_tokens": 5000},
    "effort": "medium"
  }
}

The effort parameter (low / medium / high) controls how much thinking effort the model uses. The gateway automatically wraps it into output_config.effort and injects the required anthropic_beta flag. When thinking.budget_tokens is set, max_tokens is automatically adjusted to satisfy the max_tokens > budget_tokens constraint.
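As a rough client-side illustration, the request body described above could be assembled as follows. The helper name `build_thinking_request` and its validation rules are assumptions made for this sketch; only the field names come from this reference, and the gateway performs its own `max_tokens` adjustment regardless.

```python
import json


def build_thinking_request(model, prompt, budget_tokens, max_tokens, effort="medium"):
    """Build a /v1/chat/completions body that enables thinking and effort.

    Hypothetical helper for illustration; not part of the gateway.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError("effort must be low, medium, or high")
    if max_tokens <= budget_tokens:
        # Keep max_tokens > budget_tokens explicit instead of relying on
        # the gateway's automatic adjustment.
        raise ValueError("max_tokens must be greater than budget_tokens")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "bedrock_additional_model_request_fields": {
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
            "effort": effort,
        },
    }


body = build_thinking_request(
    "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "Solve step by step: 15 * 27 + 33",
    budget_tokens=5000,
    max_tokens=8192,
)
print(json.dumps(body, indent=2))
```

Rejecting `max_tokens <= budget_tokens` locally is a design choice of the sketch: it surfaces a misconfigured budget before the request is sent.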

When a Bedrock extension field is set both in the request body and in its X-Bedrock-* header, the header value takes precedence.

ChatMessage schema:

Field	Type	Description
`role`	string	`system`, `user`, `assistant`, or `tool`
`content`	string \| array	Text string or array of `ContentPart` (multimodal)
`name`	string	Optional participant name
`tool_calls`	array	Tool calls (assistant messages)
`tool_call_id`	string	Tool call reference (tool messages)

ContentPart (for multimodal):

{ "type": "text", "text": "describe this image" }
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
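A multimodal message can be put together from the two part shapes above. The sketch below is a minimal example; the helper names `text_part` and `image_part` are hypothetical, and the eight bytes used as image data are only the PNG file signature, standing in for a real image.

```python
import base64


def text_part(text: str) -> dict:
    """Return a text ContentPart."""
    return {"type": "text", "text": text}


def image_part(data: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an image_url ContentPart using a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{encoded}"},
    }


# Placeholder bytes (the PNG signature); read a real file in practice.
png_bytes = b"\x89PNG\r\n\x1a\n"

message = {
    "role": "user",
    "content": [text_part("describe this image"), image_part(png_bytes)],
}
```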

Non-streaming example

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "temperature": 0.7,
    "max_tokens": 2048
  }'

Response (ChatCompletionResponse):

{
  "id": "chatcmpl-abc123...",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}

Streaming example

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Streaming response (SSE text/event-stream):

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,"model":"...","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,"model":"...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Heartbeat comments (: heartbeat) are sent every 15 seconds to keep the connection alive.
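A client that reads the stream without an SDK needs to skip the heartbeat comments and stop at the `[DONE]` sentinel. The following is a minimal sketch of that loop; `iter_chunks` is a hypothetical helper and the sample lines are made up to mirror the format shown above.

```python
import json


def iter_chunks(lines):
    """Yield parsed chunk dicts from an iterable of SSE lines.

    Blank lines and comment lines (such as ": heartbeat") are skipped,
    and iteration ends at the "[DONE]" sentinel.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}',
    "",
    ": heartbeat",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

text = "".join(
    chunk["choices"][0]["delta"].get("content", "") for chunk in iter_chunks(sample)
)
print(text)
```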

Tool use (function calling)

The gateway supports OpenAI-compatible tool use. Tool call deltas stream incrementally in the same format as OpenAI. The finish_reason is tool_calls when the model invokes a tool.
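Because tool call deltas arrive in fragments, a client has to stitch them together before it can parse the arguments. The sketch below assumes the usual OpenAI chunk layout (`delta.tool_calls[]` entries keyed by `index`, with `function.arguments` arriving as string fragments); `collect_tool_calls` and the sample chunks are illustrative, not taken from the gateway.

```python
import json


def collect_tool_calls(chunks):
    """Merge streamed tool call deltas into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        for part in delta.get("tool_calls") or []:
            call = calls.setdefault(
                part["index"], {"id": None, "name": "", "arguments": ""}
            )
            if part.get("id"):
                call["id"] = part["id"]
            fn = part.get("function") or {}
            call["name"] += fn.get("name") or ""
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


sample_chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": "}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "\"London\"}"}}]},
        "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

tool_calls = collect_tool_calls(sample_chunks)
arguments = json.loads(tool_calls[0]["arguments"])
```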

Error responses

{
  "error": {
    "message": "Token quota exceeded. Used: $5.00, Quota: $5.00",
    "type": "server_error",
    "code": "internal_error",
    "param": null
  }
}

Status	Meaning
400	Invalid request (bad model name, malformed body)
401	Missing or invalid API token
403	Token lacks access to requested model
429	Token quota exceeded
500	Internal server error

OpenAI SDK usage

from openai import OpenAI

client = OpenAI(
    api_key="kbr_your_token_here",  # pragma: allowlist secret
    base_url="http://localhost:8000/v1",
)

stream = client.chat.completions.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

1b. Gateway API (Anthropic Messages API Compatible)

All Anthropic-compatible endpoints require an API key in the x-api-key header:

x-api-key: kbr_<your_token>

The same kbr_ API key works for both OpenAI and Anthropic endpoints. The proxy routes based on the endpoint path and authentication header format.

POST /v1/messages

Create a message. Accepts Anthropic Messages API format requests and proxies them to AWS Bedrock. Supports thinking, tool use, prompt caching, and all Anthropic-native features.

Request body (AnthropicMessagesRequest):

Field	Type	Default	Description
`model`	string	required	Bedrock model ID or Anthropic short name (e.g. `global.anthropic.claude-sonnet-4-5-20250929-v1:0` or `claude-sonnet-4-5-20250929`). The proxy normalizes both formats for access control and routes using the Bedrock ID.
`messages`	array	required	Array of message objects (role: `user` or `assistant`)
`max_tokens`	integer	required	Maximum tokens to generate
`system`	string \| array	null	System prompt (string or array of content blocks with optional `cache_control`)
`temperature`	float	null	Sampling temperature (0.0 - 1.0)
`top_p`	float	null	Nucleus sampling (0.0 - 1.0)
`top_k`	integer	null	Top-K sampling
`stop_sequences`	array	null	Stop sequences
`stream`	boolean	`false`	Enable SSE streaming
`tools`	array	null	Tool definitions (Anthropic format)
`tool_choice`	object	null	Tool selection strategy
`metadata`	object	null	Request metadata
`thinking`	object	null	Extended thinking config (`{"type": "enabled"/"adaptive"/"disabled", "budget_tokens": N}`)

Message content blocks:

{"type": "text", "text": "Hello!"}
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}}
{"type": "tool_use", "id": "toolu_...", "name": "get_weather", "input": {"city": "London"}}
{"type": "tool_result", "tool_use_id": "toolu_...", "content": "Sunny, 22C"}

Non-streaming example

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Response (AnthropicMessagesResponse):

{
  "id": "msg_abc123...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hello! How can I help you?"}
  ],
  "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 10,
    "output_tokens": 8
  }
}

Streaming example

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Streaming response (SSE text/event-stream, Anthropic format):

event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"...","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":10,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":8}}

event: message_stop
data: {"type":"message_stop"}
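For clients that consume the event stream directly, the text can be rebuilt from the `content_block_delta` events and the stop reason read from `message_delta`. This is a minimal sketch; `collect_text` is a hypothetical helper and the sample lines are abbreviated versions of the events above.

```python
import json


def collect_text(lines):
    """Concatenate text_delta fragments and return (text, stop_reason)."""
    parts, stop_reason = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, blank lines, and comments
        event = json.loads(line[len("data:"):])
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif event["type"] == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
        elif event["type"] == "message_stop":
            break
    return "".join(parts), stop_reason


sample = [
    "event: content_block_start",
    'data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"lo"}}',
    "",
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":8}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]

text, stop_reason = collect_text(sample)
```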

Extended thinking example

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: kbr_your_token_here" \
  -d '{
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "max_tokens": 8192,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
    "messages": [{"role": "user", "content": "Solve this step by step: 15 * 27 + 33"}]
  }'

The proxy supports three thinking types:

"enabled" - Full extended thinking with visible thinking blocks
"adaptive" - Adaptive thinking (signature-only mode)
"disabled" - No extended thinking

Note: When a request's conversation history includes thinking or redacted_thinking content blocks, the proxy strips those blocks before forwarding the request to Bedrock, because Bedrock does not accept adaptive signature-only thinking blocks in conversation history.
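The effect of that stripping step can be pictured with a few lines of Python. This is only an illustration of the transformation described in the note; the function name `strip_thinking` is hypothetical and the proxy's actual implementation is not shown in this reference.

```python
THINKING_TYPES = ("thinking", "redacted_thinking")


def strip_thinking(messages):
    """Return a copy of the history without thinking/redacted_thinking blocks.

    String content is left untouched; only block arrays are filtered.
    """
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, list):
            content = [b for b in content if b.get("type") not in THINKING_TYPES]
        cleaned.append({**msg, "content": content})
    return cleaned


history = [
    {"role": "user", "content": "What is 15 * 27 + 33?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "15 * 27 = 405 ...", "signature": "..."},
        {"type": "text", "text": "438"},
    ]},
]

cleaned = strip_thinking(history)
```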

Claude Code CLI compatibility

All tokens are created with the sk-ant-api03 prefix, which makes them compatible with Claude Code CLI and the Anthropic SDK:

curl -X POST http://localhost:8000/admin/tokens \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <jwt_token>" \
  -d '{
    "name": "Claude Code Token",
    "quota_usd": 50.00
  }'

Then configure your environment:

export ANTHROPIC_BASE_URL="https://your-api-domain"
export ANTHROPIC_API_KEY="sk-ant-api03_xxxxxxx"  # pragma: allowlist secret
export CLAUDE_MODEL="us.anthropic.claude-sonnet-4-5-20250929-v1:0"

For persistent configuration, add to ~/.claude/settings.json:

{
  "model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
  "smallModel": "your-haiku-model-id",
  "largeModel": "your-opus-model-id"
}

Error responses (Anthropic format)

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid or expired API key"
  }
}

Status	Error Type	Meaning
400	`invalid_request_error`	Invalid request (bad model name, malformed body)
401	`authentication_error`	Missing or invalid API key
403	`permission_error`	Token lacks access to requested model
429	`rate_limit_error`	Token quota exceeded
500	`api_error`	Internal server error
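Both error envelopes used by the gateway nest their details under an `error` object, so one small helper can report either. The sketch below is illustrative; `describe_error` and the `STATUS_MEANING` table are names invented here, with the meanings copied from the status table above.

```python
STATUS_MEANING = {
    400: "invalid request",
    401: "missing or invalid API key",
    403: "token lacks access to requested model",
    429: "token quota exceeded",
    500: "internal server error",
}


def describe_error(status, body):
    """Return a one-line description for an OpenAI- or Anthropic-format error body."""
    error = body.get("error") or {}
    meaning = STATUS_MEANING.get(status, "unexpected status")
    error_type = error.get("type", "unknown")
    message = error.get("message", "")
    return f"{status} ({meaning}): {error_type}: {message}"


anthropic_body = {
    "type": "error",
    "error": {"type": "authentication_error", "message": "Invalid or expired API key"},
}
summary = describe_error(401, anthropic_body)
print(summary)
```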

Anthropic SDK usage

import anthropic

client = anthropic.Anthropic(
    api_key="kbr_your_token_here",  # pragma: allowlist secret
    base_url="http://localhost:8000",  # no /v1 suffix: the SDK appends /v1/messages itself
)

message = client.messages.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

GET /v1/models

List the models the current token has access to. Returns an OpenAI-compatible model list.

Response:

{
  "object": "list",
  "data": [
    {
      "id": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
      "object": "model",
      "created": 1700000000,
      "owned_by": "anthropic"
    }
  ]
}

2. Admin API

All Admin endpoints (except OAuth login URLs) require a JWT access token:

Authorization: Bearer <jwt_access_token>

Authorization (RBAC)

The Admin API uses role-based access control with two roles:

Role	Access
`super_admin`	Full access to all endpoints and resources
`admin`	Scoped access controlled by `permissions` object

Admin users have a permissions JSON object that controls what they can manage:

Permission	Values	Controls
`manage_api_keys`	`true`/`"all"`, `[id, ...]`, `false`	API token CRUD
`manage_teams`	`true`/`"all"`, `[id, ...]`, `false`	Team CRUD
`manage_models`	`true`/`"all"`, `[id, ...]`, `false`	Model configuration
`view_usage`	`true`/`false`	Usage statistics
`view_monitor`	`true`/`false`	Request monitor

true or "all": full access to all resources of that type
Array of UUIDs: access only to those specific resources
false or missing: no access (403)
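The three value forms above amount to a small decision rule. The sketch below expresses it in Python for the `admin` role; `can_access` is a hypothetical function, and how the server treats an array-valued permission when no specific resource is named is an assumption (treated here as no access).

```python
def can_access(permissions, key, resource_id=None):
    """Evaluate one permission entry for an admin-role user.

    True or "all" grants everything, a list grants only the listed IDs,
    and False or a missing key grants nothing.
    """
    value = permissions.get(key, False)
    if value is True or value == "all":
        return True
    if isinstance(value, list):
        # Assumption: without a concrete resource ID, a scoped list grants nothing.
        return resource_id is not None and resource_id in value
    return False


perms = {
    "manage_api_keys": "all",
    "manage_teams": ["uuid1", "uuid2"],
    "manage_models": True,
    "view_usage": False,
}
```

For example, `can_access(perms, "manage_teams", "uuid1")` is allowed, while `can_access(perms, "manage_teams", "uuid3")` and `can_access(perms, "view_monitor")` are not.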

Endpoints are guarded by permission requirements:

Endpoint Group	Required Permission
`/admin/tokens/*`	`manage_api_keys`
`/admin/teams/*`	`manage_teams`
`/admin/models/*`	`manage_models`
`/admin/usage/*`	`view_usage`
`/admin/monitor/*`	`view_monitor`
`/admin/users/*`	`super_admin` role only
`/admin/audit-logs`	`super_admin` role only
`/admin/audit-logs/activity`	Any admin

2.1 Authentication

GET /admin/auth/microsoft/login

Get Microsoft OAuth authorization URL.

Parameter	In	Type	Description
`redirect_uri`	query	string	Redirect URI after authorization

Response:

{
  "authorization_url": "https://login.microsoftonline.com/.../authorize?...",
  "state": "random_csrf_state"
}

POST /admin/auth/microsoft/callback

Handle Microsoft OAuth callback. Creates or links user account.

Parameter	In	Type	Description
`code`	query	string	Authorization code from Microsoft
`redirect_uri`	query	string	Redirect URI used in authorization
`state`	query	string	State parameter for CSRF protection

Response (LoginResponse):

{
  "access_token": "eyJ...",
  "refresh_token": "eyJ...",
  "token_type": "bearer",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "is_active": true,
    "is_admin": true,
    "role": "admin",
    "permissions": {
      "manage_api_keys": "all",
      "manage_teams": ["uuid1", "uuid2"],
      "manage_models": true,
      "view_usage": true,
      "view_monitor": true
    },
    "email_verified": true,
    "current_balance": "5.00"
  }
}

GET /admin/auth/cognito/login

Get AWS Cognito OAuth authorization URL.

Parameter	In	Type	Description
`redirect_uri`	query	string	Redirect URI after authorization

Response: Same structure as Microsoft login.

POST /admin/auth/cognito/callback

Handle AWS Cognito OAuth callback. Creates or links user account.

Parameter	In	Type	Description
`code`	query	string	Authorization code from Cognito
`redirect_uri`	query	string	Redirect URI used in authorization
`state`	query	string	State parameter for CSRF protection

Response: Same LoginResponse structure as Microsoft callback.

POST /admin/auth/refresh

Refresh access token using refresh token (with automatic rotation).

Request body:

{ "refresh_token": "eyJ..." }

Response: LoginResponse with new access_token and refresh_token.

The old refresh token is invalidated after rotation. Presenting an already-rotated refresh token again triggers theft detection.
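Rotation means a client must replace its stored refresh token on every refresh, never reuse the previous one. The following sketch shows that bookkeeping; the class `AdminSession` and the injected `post_refresh` callable are hypothetical, and `fake_post_refresh` stands in for a real `POST /admin/auth/refresh` call so the example runs offline.

```python
class AdminSession:
    """Holds the current token pair and replaces both on every refresh."""

    def __init__(self, access_token, refresh_token, post_refresh):
        self.access_token = access_token
        self.refresh_token = refresh_token
        # Callable taking a refresh token and returning a LoginResponse-like dict.
        self._post_refresh = post_refresh

    def refresh(self):
        response = self._post_refresh(self.refresh_token)
        # Store the rotated refresh token right away; the old one is now invalid.
        self.access_token = response["access_token"]
        self.refresh_token = response["refresh_token"]
        return self.access_token


def fake_post_refresh(refresh_token):
    """Offline stand-in for POST /admin/auth/refresh."""
    n = int(refresh_token.split("-")[1]) + 1
    return {
        "access_token": f"access-{n}",
        "refresh_token": f"refresh-{n}",
        "token_type": "bearer",
    }


session = AdminSession("access-1", "refresh-1", fake_post_refresh)
session.refresh()
```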

POST /admin/auth/revoke

Revoke a specific refresh token.

Request body:

{ "refresh_token": "eyJ..." }

Response: { "message": "Refresh token revoked successfully" }

POST /admin/auth/revoke-all

Revoke all refresh tokens for the current user (logout from all devices). Requires JWT.

Response: { "message": "Revoked N refresh tokens successfully" }

GET /admin/auth/me

Get current user information.

Response: UserResponse object (see Microsoft callback response).

PUT /admin/auth/me

Update current user profile.

Request body:

{ "first_name": "Jane", "last_name": "Smith" }

Response: Updated UserResponse.

2.2 Token Management

POST /admin/tokens

Create a new API token. The token prefix is hardcoded as sk-ant-api03 and cannot be configured.

Request body:

{
  "name": "My API Key",
  "description": "Development token for team Alpha",
  "expires_at": "2026-12-31T23:59:59",
  "quota_usd": 100.00,
  "monthly_quota_usd": 20.00,
  "monthly_reset_policy": "reset",
  "allowed_ips": ["192.168.1.0/24"],
  "token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "5m"}
}

Field	Type	Required	Description
`name`	string	yes	Token display name
`description`	string	no	Token description
`expires_at`	datetime	no	Expiration timestamp
`quota_usd`	decimal	no	Total usage quota in USD
`monthly_quota_usd`	decimal	no	Monthly usage quota in USD
`monthly_reset_policy`	string	no	`"reset"` (default) or `"rollover"`
`allowed_ips`	array	no	IP allowlist (CIDR)
`token_metadata`	object	no	Additional config (e.g. `prompt_cache_enabled`, `prompt_cache_ttl`)
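A client may want to validate the request locally before calling the endpoint. The sketch below builds the body and checks the CIDR entries with the standard `ipaddress` module; `build_token_request` is a hypothetical helper, and the server's own validation rules may be stricter or looser than what is shown.

```python
import ipaddress


def build_token_request(name, quota_usd=None, monthly_quota_usd=None,
                        monthly_reset_policy="reset", allowed_ips=()):
    """Build a POST /admin/tokens body, omitting optional fields left unset."""
    if monthly_reset_policy not in ("reset", "rollover"):
        raise ValueError("monthly_reset_policy must be 'reset' or 'rollover'")
    for entry in allowed_ips:
        # Raises ValueError for anything that is not an IP address or CIDR range.
        ipaddress.ip_network(entry, strict=False)
    payload = {"name": name, "monthly_reset_policy": monthly_reset_policy}
    if quota_usd is not None:
        payload["quota_usd"] = quota_usd
    if monthly_quota_usd is not None:
        payload["monthly_quota_usd"] = monthly_quota_usd
    if allowed_ips:
        payload["allowed_ips"] = list(allowed_ips)
    return payload


payload = build_token_request(
    "My API Key",
    quota_usd=100.00,
    monthly_quota_usd=20.00,
    allowed_ips=["192.168.1.0/24"],
)
```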

Response (201): TokenWithKeyResponse -- includes the plain token value. This is the only response that returns the plain token inline; afterwards it can be retrieved only through GET /admin/tokens/{token_id}/plain.

{
  "id": "uuid",
  "name": "My API Key",
  "description": "Development token for team Alpha",
  "token": "sk-ant-api03_abc123...",
  "key_prefix": "sk-ant-api03",
  "expires_at": "2026-12-31T23:59:59",
  "quota_usd": "100.00",
  "monthly_quota_usd": "20.00",
  "monthly_reset_policy": "reset",
  "used_usd": "0.00",
  "monthly_used_usd": "0.00",
  "daily_used_usd": "0.00",
  "remaining_quota": "100.00",
  "allowed_ips": ["192.168.1.0/24"],
  "allowed_models": [],
  "is_active": true,
  "is_expired": false,
  "is_quota_exceeded": false,
  "created_at": "2026-01-01T00:00:00",
  "last_used_at": null,
  "token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "5m"}
}

GET /admin/tokens

List all tokens for the current user.

Parameter	In	Type	Default	Description
`include_inactive`	query	boolean	`false`	Include inactive/revoked tokens

Response: Array of TokenResponse.

GET /admin/tokens/{token_id}

Get token details by UUID.

PUT /admin/tokens/{token_id}

Update token settings.

Request body (all fields optional):

{
  "name": "Renamed Key",
  "description": "Updated description",
  "expires_at": "2027-06-30T00:00:00",
  "quota_usd": 200.00,
  "monthly_quota_usd": 50.00,
  "monthly_reset_policy": "rollover",
  "allowed_ips": ["10.0.0.0/8"],
  "is_active": true,
  "token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "1h"}
}

DELETE /admin/tokens/{token_id}

Soft-delete a token. Sets is_deleted = true, is_active = false, and invalidates the Redis cache. The token is immediately blocked from API access. Historical usage data is preserved. Returns 204 No Content.

POST /admin/tokens/{token_id}/revoke

Deactivate a token (can be reactivated via PUT).

Response: Updated TokenResponse with is_active: false.

GET /admin/tokens/{token_id}/plain

Retrieve the decrypted plain token value.

Response: { "token": "sk-ant-api03_..." }

2.3 Model Management

GET /admin/models/aws-available

List available models that the proxy can actually invoke from the deployment region. The response is built from the inference profile cache (_ProfileCache), which is populated at startup and refreshed daily at 03:00 UTC via AWS APIs.

The list includes:

Inference profiles available in the deployment region (e.g. us.anthropic.claude-sonnet-4-6, global.anthropic.claude-sonnet-4-5-20250929-v1:0)
Foundation models available only in the fallback region (e.g. zai.glm-5, deepseek.v3.2) — marked with is_fallback: true
Gemini models (if GEMINI_API_KEY is configured) — appended dynamically

Only models actually callable from the deployment region are shown. For example, if deployed in us-west-1 where global.anthropic.claude-sonnet-4-20250514-v1:0 is not available, it will not appear in the list.

Results are cached in-memory for 12 hours.

Response:

{
  "models": [
    {
      "model_id": "us.anthropic.claude-sonnet-4-6",
      "model_name": "anthropic.claude-sonnet-4-6",
      "friendly_name": "anthropic.claude-sonnet-4-6",
      "provider": "bedrock-converse",
      "is_cross_region": true,
      "cross_region_type": "us",
      "streaming_supported": true,
      "is_fallback": false
    },
    {
      "model_id": "zai.glm-5",
      "model_name": "zai.glm-5",
      "friendly_name": "zai.glm-5",
      "provider": "bedrock-converse",
      "is_cross_region": false,
      "cross_region_type": null,
      "streaming_supported": true,
      "is_fallback": true
    }
  ]
}

GET /admin/models

List enabled models from the database.

Parameter	In	Type	Description
`token_id`	query	string	Optional -- filter by token UUID

Response:

{
  "models": [
    {
      "id": "uuid",
      "model_name": "claude-sonnet-4-5",
      "model_id": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
      "friendly_name": "claude-sonnet-4-5",
      "provider": "bedrock",
      "streaming_supported": true,
      "is_active": true
    }
  ]
}

POST /admin/models

Add a model to a token's allowed list.

Request body:

{
  "token_id": "uuid",
  "model_name": "claude-sonnet-4-5"
}

Response: { "message": "Model claude-sonnet-4-5 added successfully", "id": "uuid" }

DELETE /admin/models/{model_id}

Soft-delete a model configuration.

Response: { "message": "Model claude-sonnet-4-5 deleted successfully" }

2.4 Usage Statistics

GET /admin/usage/stats

Get aggregated usage statistics for the current user.

Parameter	In	Type	Description
`start_date`	query	datetime	Optional custom range start
`end_date`	query	datetime	Optional custom range end

Response:

{
  "current_month_cost": "12.34",
  "current_month_requests": 150,
  "current_month_tokens": 50000,
  "last_30_days_cost": "45.67",
  "last_30_days_requests": 500,
  "total_cost": "123.45",
  "total_requests": 2000
}

GET /admin/usage/by-token

Usage statistics grouped by API token.

Parameter	In	Type	Description
`token_id`	query	string	Optional filter
`start_date`	query	datetime	Optional start
`end_date`	query	datetime	Optional end

Response: Array of UsageByTokenResponse.

GET /admin/usage/by-model

Usage statistics grouped by model.

Parameter	In	Type	Description
`model`	query	string	Optional model filter
`start_date`	query	datetime	Optional start
`end_date`	query	datetime	Optional end

Response: Array of UsageByModelResponse.

GET /admin/usage/aggregated-stats

Time-series usage data.

Parameter	In	Type	Required	Description
`start_date`	query	datetime	yes	Range start
`end_date`	query	datetime	yes	Range end
`granularity`	query	string	no	`hourly`, `daily` (default), `weekly`, `monthly`
`token_id`	query	string	no	Filter by token

Response:

{
  "granularity": "daily",
  "start_date": "2026-01-01T00:00:00",
  "end_date": "2026-01-31T23:59:59",
  "data": [
    {
      "time_bucket": "2026-01-15",
      "call_count": 42,
      "total_prompt_tokens": 10000,
      "total_completion_tokens": 5000,
      "total_tokens": 15000,
      "total_cost": "3.21"
    }
  ]
}
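The query string for this endpoint can be built with the standard library. The sketch below is a minimal example; `aggregated_stats_url` is a hypothetical helper, and it assumes the `datetime` parameters are passed in ISO 8601 form as in the response example.

```python
from datetime import datetime
from urllib.parse import urlencode

GRANULARITIES = ("hourly", "daily", "weekly", "monthly")


def aggregated_stats_url(base_url, start, end, granularity="daily", token_id=None):
    """Build the GET /admin/usage/aggregated-stats URL with encoded parameters."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {GRANULARITIES}")
    params = {
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "granularity": granularity,
    }
    if token_id is not None:
        params["token_id"] = token_id
    return f"{base_url}/admin/usage/aggregated-stats?{urlencode(params)}"


url = aggregated_stats_url(
    "http://localhost:8000",
    datetime(2026, 1, 1, 0, 0, 0),
    datetime(2026, 1, 31, 23, 59, 59),
)
print(url)
```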

GET /admin/usage/token-summary

Per-token usage summary for a time period.

Parameter	In	Type	Required	Description
`start_date`	query	datetime	yes	Range start
`end_date`	query	datetime	yes	Range end

Response: Array of TokenUsageSummary:

[
  {
    "token_id": "uuid",
    "token_name": "My Key",
    "call_count": 100,
    "total_tokens": 25000,
    "total_cost": "8.50"
  }
]

GET /admin/usage/tokens-timeseries

Multi-token time-series data for chart overlays.

Parameter	In	Type	Required	Description
`start_date`	query	datetime	yes	Range start
`end_date`	query	datetime	yes	Range end
`token_ids`	query	string	yes	Comma-separated token UUIDs
`granularity`	query	string	no	`hourly`, `daily`, `weekly`, `monthly`
`metric`	query	string	no	`calls` (default), `tokens`, `cost`

Response:

{
  "granularity": "daily",
  "metric": "calls",
  "series": [
    {
      "token_id": "uuid",
      "token_name": "Key A",
      "data": [
        { "time_bucket": "2026-01-15", "value": 42 }
      ]
    }
  ]
}

2.5 Audit Logs

GET /admin/audit-logs

List audit logs with pagination and filters.

Parameter	In	Type	Default	Description
`page`	query	integer	`1`	Page number (1-based)
`page_size`	query	integer	`50`	Items per page (max 200)
`user_id`	query	uuid	null	Filter by user
`action`	query	string	null	Filter by action type
`success`	query	boolean	null	Filter by outcome
`start_date`	query	datetime	null	From date
`end_date`	query	datetime	null	To date

Response:

{
  "items": [
    {
      "id": "uuid",
      "user_id": "uuid",
      "action": "login_success",
      "success": true,
      "details": null,
      "error_message": null,
      "ip_address": "1.2.3.4",
      "user_agent": "Mozilla/5.0...",
      "resource_type": null,
      "resource_id": null,
      "created_at": "2026-01-15T10:30:00"
    }
  ],
  "total": 150,
  "page": 1,
  "page_size": 50,
  "total_pages": 3
}
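The `page` and `total_pages` fields are enough to walk the full result set. A minimal sketch follows; `iter_audit_logs` and the injected `fetch_page` callable are hypothetical, and `fake_fetch` replaces the real HTTP call so the example runs offline.

```python
def iter_audit_logs(fetch_page, page_size=50):
    """Yield every item across pages until total_pages is reached."""
    page = 1
    while True:
        result = fetch_page(page=page, page_size=page_size)
        yield from result["items"]
        if page >= result["total_pages"]:
            break
        page += 1


def fake_fetch(page, page_size):
    """Offline stand-in for GET /admin/audit-logs returning five fake items."""
    data = list(range(5))
    start = (page - 1) * page_size
    return {
        "items": data[start:start + page_size],
        "total": len(data),
        "page": page,
        "page_size": page_size,
        "total_pages": -(-len(data) // page_size),  # ceiling division
    }


all_items = list(iter_audit_logs(fake_fetch, page_size=2))
```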

GET /admin/audit-logs/summary

Audit activity summary with counts by action type.

Parameter	In	Type	Description
`start_date`	query	datetime	Optional from date
`end_date`	query	datetime	Optional to date

Response:

{
  "total": 500,
  "success_count": 480,
  "failure_count": 20,
  "action_counts": {
    "login_success": 200,
    "token_refresh_success": 150,
    "login_failed": 20,
    "logout_all_devices": 5
  }
}

GET /admin/audit-logs/activity

Activity feed visible to all admins. Shows only management operations (token/team/model/admin CRUD) within the lookback window set by the days parameter. Non-super_admin users cannot see actions performed by super_admins.

Parameter	In	Type	Default	Description
`page`	query	integer	`1`	Page number (1-based)
`page_size`	query	integer	`50`	Items per page (max 100)
`days`	query	integer	`7`	Lookback window (1-30 days)

Response:

{
  "items": [
    {
      "id": "uuid",
      "user_id": "uuid",
      "user_email": "admin@example.com",
      "action": "token_created",
      "resource_type": "token",
      "resource_id": "uuid",
      "details": "{\"name\": \"New Token\"}",
      "created_at": "2026-01-15T10:30:00"
    }
  ],
  "total": 25,
  "page": 1,
  "page_size": 50,
  "total_pages": 1
}

2.6 Admin User Management

Super admin only. Manage admin user accounts.

GET /admin/users

List all active admin users (super_admin and admin roles).

Response: Array of AdminUserResponse:

[
  {
    "id": "uuid",
    "email": "admin@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "role": "admin",
    "permissions": {
      "manage_api_keys": "all",
      "manage_teams": "all",
      "manage_models": "all",
      "view_usage": true,
      "view_monitor": true
    },
    "is_active": true,
    "created_at": "2026-01-01T00:00:00",
    "last_login_at": "2026-01-15T10:30:00"
  }
]

POST /admin/users

Invite a new admin. Creates a Cognito user with a temporary password and a local user record.

Request body:

{
  "email": "newadmin@example.com",
  "username": "newadmin",
  "temp_password": "TempPass123!",
  "role": "admin",
  "permissions": {
    "manage_api_keys": "all",
    "manage_teams": ["team-uuid-1"],
    "manage_models": true,
    "view_usage": true,
    "view_monitor": true
  }
}

Field	Type	Required	Description
`email`	string	yes	Admin email address
`username`	string	yes	Cognito login username
`temp_password`	string	yes	Temporary password (must change on first login)
`role`	string	no	`"super_admin"` or `"admin"` (default: `"admin"`)
`permissions`	object	no	Permission scope (only for `admin` role)

Response (201): AdminUserResponse

PUT /admin/users/{user_id}

Update admin user role or permissions.

Request body (all fields optional):

{
  "role": "admin",
  "permissions": { "manage_api_keys": "all", "view_usage": true },
  "is_active": true
}

Response: Updated AdminUserResponse.

DELETE /admin/users/{user_id}

Deactivate an admin user. Cannot deactivate yourself. Returns 204 No Content.

GET /admin/users/resources

List all assignable resources (tokens, teams, models) for the permission editor UI.

Response:

{
  "api_keys": [{ "id": "uuid", "name": "Token Name" }],
  "teams": [{ "id": "uuid", "name": "Team Name" }],
  "models": [{ "id": "model-id", "name": "model-id" }]
}

2.7 Teams Management

Manage team budgets and member allocations. Requires manage_teams permission.

POST /admin/teams

Create a new team.

Request body:

{
  "name": "Engineering Team",
  "monthly_budget_usd": 500.00,
  "monthly_reset_policy": "reset",
  "daily_limit_enabled": true
}

Field	Type	Required	Description
`name`	string	yes	Team name
`monthly_budget_usd`	decimal	yes	Monthly budget in USD
`monthly_reset_policy`	string	no	`"reset"` (default) or `"rollover"`
`daily_limit_enabled`	boolean	no	Enable daily spending limit per member (default: `true`)

Response (201):

{
  "id": "uuid",
  "name": "Engineering Team",
  "monthly_budget_usd": "500.00",
  "monthly_reset_policy": "reset",
  "daily_limit_enabled": true,
  "unallocated_pool_usd": "500.00"
}

GET /admin/teams

List all teams. Returns TeamListItem array with member counts and usage summaries.

GET /admin/teams/{team_id}

Get team dashboard with detailed member usage.

Response (TeamDashboardResponse):

{
  "id": "uuid",
  "name": "Engineering Team",
  "monthly_budget_usd": "500.00",
  "monthly_reset_policy": "reset",
  "daily_limit_enabled": true,
  "total_allocated_usd": "300.00",
  "total_used_usd": "45.67",
  "unallocated_pool_usd": "200.00",
  "members": [
    {
      "token_id": "uuid",
      "token_name": "Alice",
      "allocated_usd": "150.00",
      "used_usd": "23.45",
      "remaining_usd": "126.55",
      "daily_limit_usd": "4.84",
      "daily_used_usd": "1.20",
      "is_active": true,
      "last_used_at": "2026-01-15T10:30:00"
    }
  ]
}
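Monetary fields in this response are strings, so they are best parsed with `Decimal` rather than `float`. The sketch below recomputes two figures from the example values; the relationships it checks (unallocated = budget - allocated, remaining = allocated - used) are inferred from the example above and are not stated as guarantees by this reference. `summarize_team` is a hypothetical helper.

```python
from decimal import Decimal


def summarize_team(dashboard):
    """Recompute pool and per-member remaining amounts from string fields."""
    budget = Decimal(dashboard["monthly_budget_usd"])
    allocated = Decimal(dashboard["total_allocated_usd"])
    return {
        "unallocated": budget - allocated,
        "member_remaining": {
            m["token_name"]: Decimal(m["allocated_usd"]) - Decimal(m["used_usd"])
            for m in dashboard["members"]
        },
    }


dashboard = {
    "monthly_budget_usd": "500.00",
    "total_allocated_usd": "300.00",
    "members": [
        {"token_name": "Alice", "allocated_usd": "150.00", "used_usd": "23.45"},
    ],
}

summary = summarize_team(dashboard)
```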

PUT /admin/teams/{team_id}

Update team settings (name, budget, reset policy, daily limit).

DELETE /admin/teams/{team_id}

Delete a team. Returns 204 No Content.

POST /admin/teams/{team_id}/members

Add a member (token) to the team with a budget allocation.

Request body:

{
  "token_id": "uuid",
  "allocated_usd": 100.00
}

DELETE /admin/teams/{team_id}/members/{token_id}

Remove a member from the team.

PUT /admin/teams/{team_id}/members/{token_id}

Adjust a member's allocated budget.

Request body:

{
  "allocated_usd": 150.00
}

POST /admin/teams/{team_id}/transfer

Transfer allocation between members.

Request body:

{
  "from_token_id": "uuid",
  "to_token_id": "uuid",
  "amount": 25.00
}

POST /admin/teams/{team_id}/members/batch

Batch create members with new tokens and equal allocations.

Request body:

{
  "names": "alice, bob, charlie",
  "per_member_allocation": 50.00,
  "expires_at": "2026-12-31T23:59:59",
  "quota_usd": 100.00,
  "allowed_ips": null,
  "token_metadata": null,
  "model_names": ["claude-sonnet-4-5"]
}

3. Health API

No authentication required.

GET /health/

Basic health check for load balancer.

{ "status": "healthy", "timestamp": "2026-01-15T10:00:00", "service": "kolya-br-proxy" }

GET /health/ready

Readiness probe. Verifies database connectivity. Returns 503 if unhealthy.

{
  "status": "ready",
  "timestamp": "2026-01-15T10:00:00",
  "service": "kolya-br-proxy",
  "components": {
    "database": { "status": "healthy", "message": "Connected" }
  }
}

GET /health/live

Liveness probe for Kubernetes.

{ "status": "alive", "timestamp": "2026-01-15T10:00:00", "service": "kolya-br-proxy" }

GET /health/metrics

Basic metrics endpoint.

{
  "service": "kolya-br-proxy",
  "timestamp": "2026-01-15T10:00:00",
  "version": "1.0.0",
  "debug_mode": false,
  "metrics": {
    "health_checks_total": "counter",
    "requests_total": "counter",
    "request_duration_seconds": "histogram"
  }
}
