Kolya BR Proxy exposes four API groups:

| Group | Prefix | Auth | Purpose |
|---|---|---|---|
| Gateway API (OpenAI) | `/v1` | `Authorization: Bearer` | OpenAI-compatible chat completions |
| Gateway API (Anthropic) | `/v1` | `x-api-key` header | Anthropic Messages API compatible |
| Admin API | `/admin` | Bearer JWT | User management, tokens, usage, audit |
| Health API | `/health` | None | Load balancer probes |

Base URL examples:
- Local: `http://localhost:8000`
- Production: `https://api.kbp.kolya.fun`

All Gateway endpoints require an API token in the Authorization header:
Authorization: Bearer kbr_<your_token>
Create a chat completion. Accepts OpenAI-format requests and proxies them to AWS Bedrock. Anthropic models (Claude) are invoked via the InvokeModel API; non-Anthropic models (Amazon Nova, DeepSeek, Mistral, Llama, etc.) are invoked via the Converse API.
Request body (ChatCompletionRequest):

| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Bedrock model ID (e.g. `global.anthropic.claude-sonnet-4-5-20250929-v1:0`, `us.amazon.nova-pro-v1:0`, `deepseek.r1-v1:0`) |
| `messages` | array | required | Array of `ChatMessage` objects |
| `stream` | boolean | `false` | Enable SSE streaming |
| `temperature` | float | `1.0` | Sampling temperature (0.0 - 2.0) |
| `top_p` | float | `1.0` | Nucleus sampling (0.0 - 1.0) |
| `max_tokens` | integer | null | Maximum tokens to generate |
| `stop` | string or array | null | Stop sequence(s) |
| `n` | integer | `1` | Number of choices |
| `presence_penalty` | float | `0.0` | Presence penalty (-2.0 to 2.0) |
| `frequency_penalty` | float | `0.0` | Frequency penalty (-2.0 to 2.0) |
| `user` | string | null | End-user identifier |
| `tools` | array | null | Tool/function definitions |
| `tool_choice` | string or object | null | Tool selection strategy |

Bedrock extension fields (set via body or X-Bedrock-* headers):

| Field | Header | Description |
|---|---|---|
| `bedrock_guardrail_config` | `X-Bedrock-Guardrail-Id` + `X-Bedrock-Guardrail-Version` | Guardrail configuration |
| `bedrock_additional_model_request_fields` | `X-Bedrock-Additional-Fields` (JSON) | Extra model request fields |
| `bedrock_trace` | `X-Bedrock-Trace` | Trace mode (`ENABLED` / `DISABLED`) |
| `bedrock_performance_config` | `X-Bedrock-Performance-Config` (JSON) | Performance tuning |
| `bedrock_prompt_caching` | `X-Bedrock-Prompt-Caching` (JSON) | Prompt caching config |

Thinking and effort (via bedrock_additional_model_request_fields):
The gateway supports Anthropic's extended thinking and effort parameters. Pass them through bedrock_additional_model_request_fields:
{
"bedrock_additional_model_request_fields": {
"thinking": {"type": "enabled", "budget_tokens": 5000},
"effort": "medium"
}
}

The `effort` parameter (`low` / `medium` / `high`) controls how much thinking effort the model uses. The gateway automatically wraps it into `output_config.effort` and injects the required `anthropic_beta` flag. When `thinking.budget_tokens` is set, `max_tokens` is automatically adjusted to satisfy the `max_tokens > budget_tokens` constraint.
Headers override body fields when both are present.
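
The header-over-body precedence can be pictured with a small sketch. This is not the gateway's code: `HEADER_TO_FIELD` and `merge_bedrock_extensions` are hypothetical names, and the guardrail headers are left out because the body shape of `bedrock_guardrail_config` is not specified here.

```python
import json

# Header name (lowercased) -> (body field, whether the header value is JSON).
# The pairs come from the extension-field table; the dict itself is illustrative.
HEADER_TO_FIELD = {
    "x-bedrock-additional-fields": ("bedrock_additional_model_request_fields", True),
    "x-bedrock-trace": ("bedrock_trace", False),
    "x-bedrock-performance-config": ("bedrock_performance_config", True),
    "x-bedrock-prompt-caching": ("bedrock_prompt_caching", True),
}


def merge_bedrock_extensions(body: dict, headers: dict) -> dict:
    """Return a copy of `body` in which X-Bedrock-* header values win over body fields."""
    merged = dict(body)
    lowered = {name.lower(): value for name, value in headers.items()}
    for header, (field, is_json) in HEADER_TO_FIELD.items():
        if header in lowered:
            raw = lowered[header]
            # JSON-typed headers are decoded; plain headers are used as-is.
            merged[field] = json.loads(raw) if is_json else raw
    return merged
```

With a body that sets `bedrock_trace` to `DISABLED` and a request header `X-Bedrock-Trace: ENABLED`, the merged request would carry `ENABLED`.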
ChatMessage schema:

| Field | Type | Description |
|---|---|---|
| `role` | string | `system`, `user`, `assistant`, or `tool` |
| `content` | string or array | Text string or array of `ContentPart` (multimodal) |
| `name` | string | Optional participant name |
| `tool_calls` | array | Tool calls (assistant messages) |
| `tool_call_id` | string | Tool call reference (tool messages) |

ContentPart (for multimodal):
{ "type": "text", "text": "describe this image" }
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }

curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer kbr_your_token_here" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [
{"role": "user", "content": "Hello!"}
],
"temperature": 0.7,
"max_tokens": 2048
}'

Response (ChatCompletionResponse):
{
"id": "chatcmpl-abc123...",
"object": "chat.completion",
"created": 1700000000,
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
}
}

curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer kbr_your_token_here" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'

Streaming response (SSE text/event-stream):
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,"model":"...","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1700000000,"model":"...","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Heartbeat comments (: heartbeat) are sent every 15 seconds to keep the connection alive.
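
For clients that read the stream without an SDK, the following is a minimal sketch of consuming the lines shown above. It assumes the stream has already been split into text lines; `iter_chunks` and `collect_text` are illustrative names, not part of the gateway.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chat.completion.chunk objects from SSE lines."""
    for line in lines:
        line = line.strip()
        # Blank lines separate events; lines starting with ":" are SSE
        # comments, which is how the heartbeats arrive.
        if not line or line.startswith(":"):
            continue
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of the first choice."""
    parts = []
    for chunk in iter_chunks(lines):
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Skipping comment lines rather than treating them as errors keeps the reader tolerant of the 15-second heartbeats.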
The gateway supports OpenAI-compatible tool use. Tool call deltas stream incrementally in the same format as OpenAI. The finish_reason is tool_calls when the model invokes a tool.
{
"error": {
"message": "Token quota exceeded. Used: $5.00, Quota: $5.00",
"type": "server_error",
"code": "internal_error",
"param": null
}
}

| Status | Meaning |
|---|---|
| 400 | Invalid request (bad model name, malformed body) |
| 401 | Missing or invalid API token |
| 403 | Token lacks access to requested model |
| 429 | Token quota exceeded |
| 500 | Internal server error |

from openai import OpenAI

client = OpenAI(
    api_key="kbr_your_token_here",  # pragma: allowlist secret
    base_url="http://localhost:8000/v1",
)

stream = client.chat.completions.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print each content delta as it arrives.
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

All Anthropic-compatible endpoints require an API key in the x-api-key header:
x-api-key: kbr_<your_token>
The same `kbr_` API key works for both OpenAI and Anthropic endpoints. The proxy routes based on the endpoint path and authentication header format.
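
As a rough illustration of accepting one key through either header style, the sketch below extracts the key from a header mapping. The function name and the choice to check `x-api-key` first are assumptions for the example; the proxy's actual precedence is not documented here.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from `x-api-key` or `Authorization: Bearer`, else None."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # Anthropic-style header (checked first in this sketch; an assumption).
    api_key = lowered.get("x-api-key", "").strip()
    if api_key:
        return api_key

    # OpenAI-style header: "Bearer <token>".
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None
```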
Create a message. Accepts Anthropic Messages API format requests and proxies them to AWS Bedrock. Supports thinking, tool use, prompt caching, and all Anthropic-native features.
Request body (AnthropicMessagesRequest):

| Field | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Bedrock model ID or Anthropic short name (e.g. `global.anthropic.claude-sonnet-4-5-20250929-v1:0` or `claude-sonnet-4-5-20250929`). The proxy normalizes both formats for access control and routes using the Bedrock ID. |
| `messages` | array | required | Array of message objects (role: `user` or `assistant`) |
| `max_tokens` | integer | required | Maximum tokens to generate |
| `system` | string or array | null | System prompt (string or array of content blocks with optional `cache_control`) |
| `temperature` | float | null | Sampling temperature (0.0 - 1.0) |
| `top_p` | float | null | Nucleus sampling (0.0 - 1.0) |
| `top_k` | integer | null | Top-K sampling |
| `stop_sequences` | array | null | Stop sequences |
| `stream` | boolean | `false` | Enable SSE streaming |
| `tools` | array | null | Tool definitions (Anthropic format) |
| `tool_choice` | object | null | Tool selection strategy |
| `metadata` | object | null | Request metadata |
| `thinking` | object | null | Extended thinking config (`{"type": "enabled"/"adaptive"/"disabled", "budget_tokens": N}`) |

Message content blocks:
{"type": "text", "text": "Hello!"}
{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}}
{"type": "tool_use", "id": "toolu_...", "name": "get_weather", "input": {"city": "London"}}
{"type": "tool_result", "tool_use_id": "toolu_...", "content": "Sunny, 22C"}

curl -X POST http://localhost:8000/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: kbr_your_token_here" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello!"}
]
}'

Response (AnthropicMessagesResponse):
{
"id": "msg_abc123...",
"type": "message",
"role": "assistant",
"content": [
{"type": "text", "text": "Hello! How can I help you?"}
],
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 10,
"output_tokens": 8
}
}

curl -X POST http://localhost:8000/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: kbr_your_token_here" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'

Streaming response (SSE text/event-stream, Anthropic format):
event: message_start
data: {"type":"message_start","message":{"id":"msg_...","type":"message","role":"assistant","content":[],"model":"...","stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":10,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":8}}
event: message_stop
data: {"type":"message_stop"}
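
A client that handles the event stream by hand mostly needs the `text_delta` payloads and the final `stop_reason`. The sketch below shows one way to fold the events above into a string; `collect_message_text` is an illustrative name, and the input is assumed to be the stream already split into lines.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_message_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate text deltas until message_stop; return (text, stop_reason)."""
    parts = []
    stop_reason = None
    for line in lines:
        line = line.strip()
        # The event type is repeated inside the JSON payload, so the
        # "event:" lines can be skipped along with blanks and comments.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif kind == "message_delta":
            stop_reason = event["delta"].get("stop_reason")
        elif kind == "message_stop":
            break
    return "".join(parts), stop_reason
```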
curl -X POST http://localhost:8000/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: kbr_your_token_here" \
-d '{
"model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"max_tokens": 8192,
"thinking": {"type": "enabled", "budget_tokens": 4096},
"messages": [{"role": "user", "content": "Solve this step by step: 15 * 27 + 33"}]
}'

The proxy supports three thinking types:
- `"enabled"` - Full extended thinking with visible thinking blocks
- `"adaptive"` - Adaptive thinking (signature-only mode)
- `"disabled"` - No extended thinking

Note: When sending conversation history that includes `thinking` or `redacted_thinking` content blocks, the proxy automatically strips these blocks before forwarding to Bedrock, since Bedrock doesn't support adaptive signature-only thinking blocks in conversation history.
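
The stripping step described in the note could look roughly like the sketch below. It is an illustration of the behaviour, not the proxy's source; `strip_thinking_blocks` and `STRIPPED_TYPES` are hypothetical names.

```python
STRIPPED_TYPES = {"thinking", "redacted_thinking"}


def strip_thinking_blocks(messages: list) -> list:
    """Return a copy of the history without thinking / redacted_thinking blocks."""
    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            # Block-style content: keep everything except the thinking blocks.
            kept = [block for block in content if block.get("type") not in STRIPPED_TYPES]
            cleaned.append({**message, "content": kept})
        else:
            # Plain string content has no blocks to strip.
            cleaned.append(dict(message))
    return cleaned
```

The input list is left untouched, so a caller can still keep the full history locally while sending the cleaned copy.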
All tokens are created with the sk-ant-api03 prefix by default, making them compatible with Claude Code CLI and the Anthropic SDK:
curl -X POST http://localhost:8000/admin/tokens \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <jwt_token>" \
-d '{
"name": "Claude Code Token",
"quota_usd": 50.00
}'

Then configure your environment:
export ANTHROPIC_BASE_URL="https://your-api-domain"
export ANTHROPIC_API_KEY="sk-ant-api03_xxxxxxx" # pragma: allowlist secret
export CLAUDE_MODEL="us.anthropic.claude-sonnet-4-5-20250929-v1:0"

For persistent configuration, add to ~/.claude/settings.json:
{
"model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"smallModel": "your-haiku-model-id",
"largeModel": "your-opus-model-id"
}

{
"type": "error",
"error": {
"type": "authentication_error",
"message": "Invalid or expired API key"
}
}

| Status | Error Type | Meaning |
|---|---|---|
| 400 | `invalid_request_error` | Invalid request (bad model name, malformed body) |
| 401 | `authentication_error` | Missing or invalid API key |
| 403 | `permission_error` | Token lacks access to requested model |
| 429 | `rate_limit_error` | Token quota exceeded |
| 500 | `api_error` | Internal server error |

import anthropic

client = anthropic.Anthropic(
    api_key="kbr_your_token_here",  # pragma: allowlist secret
    # The Anthropic SDK appends /v1/messages itself, so the base URL has no /v1 suffix.
    base_url="http://localhost:8000",
)

message = client.messages.create(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

# Streaming
with client.messages.stream(
    model="global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

List models the current token has access to. Returns an OpenAI-compatible model list.
Response:
{
"object": "list",
"data": [
{
"id": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"object": "model",
"created": 1700000000,
"owned_by": "anthropic"
}
]
}

All Admin endpoints (except OAuth login URLs) require a JWT access token:
Authorization: Bearer <jwt_access_token>
The Admin API uses role-based access control with two roles:

| Role | Access |
|---|---|
| `super_admin` | Full access to all endpoints and resources |
| `admin` | Scoped access controlled by `permissions` object |

Admin users have a permissions JSON object that controls what they can manage:

| Permission | Values | Controls |
|---|---|---|
| `manage_api_keys` | `true`/`"all"`, `[id, ...]`, `false` | API token CRUD |
| `manage_teams` | `true`/`"all"`, `[id, ...]`, `false` | Team CRUD |
| `manage_models` | `true`/`"all"`, `[id, ...]`, `false` | Model configuration |
| `view_usage` | `true`/`false` | Usage statistics |
| `view_monitor` | `true`/`false` | Request monitor |

- `true` or `"all"`: full access to all resources of that type
- Array of UUIDs: access only to those specific resources
- `false` or missing: no access (403)
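
The three value shapes can be resolved with one small check. The sketch below is an illustration of the rules listed above, not the proxy's implementation; `can_access` and its parameters are hypothetical names.

```python
from typing import Optional


def can_access(role: str, permissions: Optional[dict], permission: str,
               resource_id: Optional[str] = None) -> bool:
    """Apply the permission rules: super_admin, true/"all", ID list, false/missing."""
    if role == "super_admin":
        return True  # full access to all endpoints and resources
    value = (permissions or {}).get(permission, False)
    if value is True or value == "all":
        return True
    if isinstance(value, list):
        # Scoped access: only the listed resource IDs are allowed.
        return resource_id is not None and resource_id in value
    return False  # false or missing maps to a 403
```

For example, an admin with `"manage_teams": ["uuid1", "uuid2"]` would pass the check for `uuid1` and fail it for any other team.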
Endpoints are guarded by permission requirements:

| Endpoint Group | Required Permission |
|---|---|
| `/admin/tokens/*` | `manage_api_keys` |
| `/admin/teams/*` | `manage_teams` |
| `/admin/models/*` | `manage_models` |
| `/admin/usage/*` | `view_usage` |
| `/admin/monitor/*` | `view_monitor` |
| `/admin/users/*` | `super_admin` role only |
| `/admin/audit-logs` | `super_admin` role only |
| `/admin/audit-logs/activity` | Any admin |

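
One detail of the guard table is that `/admin/audit-logs/activity` is more permissive than its parent path, so a prefix lookup has to test the more specific entry first. A minimal sketch, with `GUARDS` and `required_guard` as hypothetical names and the sentinel strings chosen only for the example:

```python
from typing import Optional

SUPER_ADMIN = "super_admin"  # sentinel: role check instead of a permission key
ANY_ADMIN = "any_admin"      # sentinel: any authenticated admin

# Ordered so that more specific prefixes come before their parents.
GUARDS = [
    ("/admin/audit-logs/activity", ANY_ADMIN),
    ("/admin/audit-logs", SUPER_ADMIN),
    ("/admin/users", SUPER_ADMIN),
    ("/admin/tokens", "manage_api_keys"),
    ("/admin/teams", "manage_teams"),
    ("/admin/models", "manage_models"),
    ("/admin/usage", "view_usage"),
    ("/admin/monitor", "view_monitor"),
]


def required_guard(path: str) -> Optional[str]:
    """Return the guard for a request path, or None if the path is not listed."""
    for prefix, guard in GUARDS:
        if path == prefix or path.startswith(prefix + "/"):
            return guard
    return None
```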
Get Microsoft OAuth authorization URL.

| Parameter | In | Type | Description |
|---|---|---|---|
| `redirect_uri` | query | string | Redirect URI after authorization |

Response:
{
"authorization_url": "https://login.microsoftonline.com/.../authorize?...",
"state": "random_csrf_state"
}

Handle Microsoft OAuth callback. Creates or links user account.

| Parameter | In | Type | Description |
|---|---|---|---|
| `code` | query | string | Authorization code from Microsoft |
| `redirect_uri` | query | string | Redirect URI used in authorization |
| `state` | query | string | State parameter for CSRF protection |

Response (LoginResponse):
{
  "access_token": "eyJ...",
  "refresh_token": "eyJ...",
  "token_type": "bearer",
  "user": {
    "id": "uuid",
    "email": "user@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "is_active": true,
    "is_admin": true,
    "role": "admin",
    "permissions": {
      "manage_api_keys": "all",
      "manage_teams": ["uuid1", "uuid2"],
      "manage_models": true,
      "view_usage": true,
      "view_monitor": true
    },
    "email_verified": true,
    "current_balance": "5.00"
  }
}

Get AWS Cognito OAuth authorization URL.

| Parameter | In | Type | Description |
|---|---|---|---|
| `redirect_uri` | query | string | Redirect URI after authorization |

Response: Same structure as Microsoft login.
Handle AWS Cognito OAuth callback. Creates or links user account.

| Parameter | In | Type | Description |
|---|---|---|---|
| `code` | query | string | Authorization code from Cognito |
| `redirect_uri` | query | string | Redirect URI used in authorization |
| `state` | query | string | State parameter for CSRF protection |

Response: Same LoginResponse structure as Microsoft callback.
Refresh access token using refresh token (with automatic rotation).
Request body:
{ "refresh_token": "eyJ..." }

Response: LoginResponse with new access_token and refresh_token.
Old refresh token is invalidated after rotation. Token reuse triggers theft detection.
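
Rotation with reuse detection is a common pattern, and a toy in-memory model makes the two rules concrete. The sketch below is a generic illustration and not the proxy's implementation: the `RefreshTokenStore` class is hypothetical, and revoking every active token for the user on reuse is an assumed response to "theft detection".

```python
import secrets


class RefreshTokenStore:
    """Toy model of refresh-token rotation with reuse detection."""

    def __init__(self) -> None:
        self._active = {}   # token -> user_id, currently valid
        self._rotated = {}  # token -> user_id, already exchanged once

    def issue(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)
        self._active[token] = user_id
        return token

    def rotate(self, token: str) -> str:
        """Exchange a valid token for a new one; the old token stops working."""
        if token in self._rotated:
            # A rotated token came back: treat it as stolen and revoke everything.
            self.revoke_all(self._rotated[token])
            raise PermissionError("refresh token reuse detected")
        if token not in self._active:
            raise PermissionError("unknown refresh token")
        user_id = self._active.pop(token)
        self._rotated[token] = user_id
        return self.issue(user_id)

    def revoke_all(self, user_id: str) -> int:
        """Revoke every active token for the user; return how many were revoked."""
        stale = [t for t, owner in self._active.items() if owner == user_id]
        for t in stale:
            del self._active[t]
        return len(stale)
```

In this model, replaying an old token after rotation also invalidates the newest token, which forces a fresh login on every device.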
Revoke a specific refresh token.
Request body:
{ "refresh_token": "eyJ..." }

Response: { "message": "Refresh token revoked successfully" }
Revoke all refresh tokens for the current user (logout from all devices). Requires JWT.
Response: { "message": "Revoked N refresh tokens successfully" }
Get current user information.
Response: UserResponse object (see Microsoft callback response).
Update current user profile.
Request body:
{ "first_name": "Jane", "last_name": "Smith" }

Response: Updated UserResponse.
Create a new API token. The token prefix is hardcoded as sk-ant-api03 and cannot be configured.
Request body:
{
"name": "My API Key",
"description": "Development token for team Alpha",
"expires_at": "2026-12-31T23:59:59",
"quota_usd": 100.00,
"monthly_quota_usd": 20.00,
"monthly_reset_policy": "reset",
"allowed_ips": ["192.168.1.0/24"],
"token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "5m"}
}

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Token display name |
| `description` | string | no | Token description |
| `expires_at` | datetime | no | Expiration timestamp |
| `quota_usd` | decimal | no | Total usage quota in USD |
| `monthly_quota_usd` | decimal | no | Monthly usage quota in USD |
| `monthly_reset_policy` | string | no | `"reset"` (default) or `"rollover"` |
| `allowed_ips` | array | no | IP allowlist (CIDR) |
| `token_metadata` | object | no | Additional config (e.g. `prompt_cache_enabled`, `prompt_cache_ttl`) |

Response (201): TokenWithKeyResponse -- includes the plain token value. This is the only time the plain token is returned.
{
"id": "uuid",
"name": "My API Key",
"description": "Development token for team Alpha",
"token": "sk-ant-api03_abc123...",
"key_prefix": "sk-ant-api03",
"expires_at": "2026-12-31T23:59:59",
"quota_usd": "100.00",
"monthly_quota_usd": "20.00",
"monthly_reset_policy": "reset",
"used_usd": "0.00",
"monthly_used_usd": "0.00",
"daily_used_usd": "0.00",
"remaining_quota": "100.00",
"allowed_ips": ["192.168.1.0/24"],
"allowed_models": [],
"is_active": true,
"is_expired": false,
"is_quota_exceeded": false,
"created_at": "2026-01-01T00:00:00",
"last_used_at": null,
"token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "5m"}
}

List all tokens for the current user.

| Parameter | In | Type | Default | Description |
|---|---|---|---|---|
| `include_inactive` | query | boolean | `false` | Include inactive/revoked tokens |

Response: Array of TokenResponse.
Get token details by UUID.
Update token settings.
Request body (all fields optional):
{
"name": "Renamed Key",
"description": "Updated description",
"expires_at": "2027-06-30T00:00:00",
"quota_usd": 200.00,
"monthly_quota_usd": 50.00,
"monthly_reset_policy": "rollover",
"allowed_ips": ["10.0.0.0/8"],
"is_active": true,
"token_metadata": {"prompt_cache_enabled": true, "prompt_cache_ttl": "1h"}
}

Soft-delete a token. Sets `is_deleted = true` and `is_active = false`, and invalidates the Redis cache. The token is immediately blocked from API access. Historical usage data is preserved. Returns 204 No Content.
Deactivate a token (can be reactivated via PUT).
Response: Updated TokenResponse with is_active: false.
Retrieve the decrypted plain token value.
Response: { "token": "kbr_..." }
List available models that the proxy can actually invoke from the deployment region. The response is built from the inference profile cache (_ProfileCache), which is populated at startup and refreshed daily at 03:00 UTC via AWS APIs.
The list includes:
- Inference profiles available in the deployment region (e.g. `us.anthropic.claude-sonnet-4-6`, `global.anthropic.claude-sonnet-4-5-20250929-v1:0`)
- Foundation models available only in the fallback region (e.g. `zai.glm-5`, `deepseek.v3.2`), marked with `is_fallback: true`
- Gemini models (if `GEMINI_API_KEY` is configured), appended dynamically

Only models actually callable from the deployment region are shown. For example, if deployed in us-west-1 where global.anthropic.claude-sonnet-4-20250514-v1:0 is not available, it will not appear in the list.
Results are cached in-memory for 12 hours.
Response:
{
"models": [
{
"model_id": "us.anthropic.claude-sonnet-4-6",
"model_name": "anthropic.claude-sonnet-4-6",
"friendly_name": "anthropic.claude-sonnet-4-6",
"provider": "bedrock-converse",
"is_cross_region": true,
"cross_region_type": "us",
"streaming_supported": true,
"is_fallback": false
},
{
"model_id": "zai.glm-5",
"model_name": "zai.glm-5",
"friendly_name": "zai.glm-5",
"provider": "bedrock-converse",
"is_cross_region": false,
"cross_region_type": null,
"streaming_supported": true,
"is_fallback": true
}
]
}

List enabled models from the database.

| Parameter | In | Type | Description |
|---|---|---|---|
| `token_id` | query | string | Optional. Filter by token UUID |

Response:
{
"models": [
{
"id": "uuid",
"model_name": "claude-sonnet-4-5",
"model_id": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
"friendly_name": "claude-sonnet-4-5",
"provider": "bedrock",
"streaming_supported": true,
"is_active": true
}
]
}

Add a model to a token's allowed list.
Request body:
{
"token_id": "uuid",
"model_name": "claude-sonnet-4-5"
}

Response: { "message": "Model claude-sonnet-4-5 added successfully", "id": "uuid" }
Soft-delete a model configuration.
Response: { "message": "Model claude-sonnet-4-5 deleted successfully" }
Get aggregated usage statistics for the current user.

| Parameter | In | Type | Description |
|---|---|---|---|
| `start_date` | query | datetime | Optional custom range start |
| `end_date` | query | datetime | Optional custom range end |

Response:
{
"current_month_cost": "12.34",
"current_month_requests": 150,
"current_month_tokens": 50000,
"last_30_days_cost": "45.67",
"last_30_days_requests": 500,
"total_cost": "123.45",
"total_requests": 2000
}

Usage statistics grouped by API token.

| Parameter | In | Type | Description |
|---|---|---|---|
| `token_id` | query | string | Optional filter |
| `start_date` | query | datetime | Optional start |
| `end_date` | query | datetime | Optional end |

Response: Array of UsageByTokenResponse.
Usage statistics grouped by model.

| Parameter | In | Type | Description |
|---|---|---|---|
| `model` | query | string | Optional model filter |
| `start_date` | query | datetime | Optional start |
| `end_date` | query | datetime | Optional end |

Response: Array of UsageByModelResponse.
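
The usage endpoints all take optional `start_date` / `end_date` query parameters plus an optional filter, so a single URL builder covers them. The sketch below is a convenience helper under stated assumptions: `usage_url` is a hypothetical name, and the path in the usage note is a placeholder because the exact route names under `/admin/usage/` are not listed here.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import urlencode


def usage_url(base_url: str, path: str,
              start_date: Optional[datetime] = None,
              end_date: Optional[datetime] = None,
              **filters: Optional[str]) -> str:
    """Build a usage query URL, omitting parameters that were not provided."""
    params = {}
    if start_date is not None:
        params["start_date"] = start_date.isoformat()
    if end_date is not None:
        params["end_date"] = end_date.isoformat()
    # Extra keyword arguments (e.g. model=..., token_id=...) become filters.
    params.update({key: value for key, value in filters.items() if value is not None})
    url = base_url.rstrip("/") + path
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

A call such as `usage_url("http://localhost:8000", "/admin/usage/<route>", start_date=datetime(2026, 1, 1), model="claude-sonnet-4-5")` would produce a query string with ISO 8601 timestamps, matching the datetime format used in the response examples.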
Time-series usage data.

| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| `start_date` | query | datetime | yes | Range start |
| `end_date` | query | datetime | yes | Range end |
| `granularity` | query | string | no | `hourly`, `daily` (default), `weekly`, `monthly` |
| `token_id` | query | string | no | Filter by token |

Response:
{
"granularity": "daily",
"start_date": "2026-01-01T00:00:00",
"end_date": "2026-01-31T23:59:59",
"data": [
{
"time_bucket": "2026-01-15",
"call_count": 42,
"total_prompt_tokens": 10000,
"total_completion_tokens": 5000,
"total_tokens": 15000,
"total_cost": "3.21"
}
]
}

Per-token usage summary for a time period.

| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| `start_date` | query | datetime | yes | Range start |
| `end_date` | query | datetime | yes | Range end |

Response: Array of TokenUsageSummary:
[
{
"token_id": "uuid",
"token_name": "My Key",
"call_count": 100,
"total_tokens": 25000,
"total_cost": "8.50"
}
]

Multi-token time-series data for chart overlays.

| Parameter | In | Type | Required | Description |
|---|---|---|---|---|
| `start_date` | query | datetime | yes | Range start |
| `end_date` | query | datetime | yes | Range end |
| `token_ids` | query | string | yes | Comma-separated token UUIDs |
| `granularity` | query | string | no | `hourly`, `daily`, `weekly`, `monthly` |
| `metric` | query | string | no | `calls` (default), `tokens`, `cost` |

Response:
{
"granularity": "daily",
"metric": "calls",
"series": [
{
"token_id": "uuid",
"token_name": "Key A",
"data": [
{ "time_bucket": "2026-01-15", "value": 42 }
]
}
]
}

List audit logs with pagination and filters.

| Parameter | In | Type | Default | Description |
|---|---|---|---|---|
| `page` | query | integer | `1` | Page number (1-based) |
| `page_size` | query | integer | `50` | Items per page (max 200) |
| `user_id` | query | uuid | null | Filter by user |
| `action` | query | string | null | Filter by action type |
| `success` | query | boolean | null | Filter by outcome |
| `start_date` | query | datetime | null | From date |
| `end_date` | query | datetime | null | To date |

Response:
{
"items": [
{
"id": "uuid",
"user_id": "uuid",
"action": "login_success",
"success": true,
"details": null,
"error_message": null,
"ip_address": "1.2.3.4",
"user_agent": "Mozilla/5.0...",
"resource_type": null,
"resource_id": null,
"created_at": "2026-01-15T10:30:00"
}
],
"total": 150,
"page": 1,
"page_size": 50,
"total_pages": 3
}

Audit activity summary with counts by action type.

| Parameter | In | Type | Description |
|---|---|---|---|
| `start_date` | query | datetime | Optional from date |
| `end_date` | query | datetime | Optional to date |

Response:
{
"total": 500,
"success_count": 480,
"failure_count": 20,
"action_counts": {
"login_success": 200,
"token_refresh_success": 150,
"login_failed": 20,
"logout_all_devices": 5
}
}

Activity feed visible to all admins. Shows only management operations (token/team/model/admin CRUD) from the last N days. Non-super_admin users cannot see actions performed by super_admins.

| Parameter | In | Type | Default | Description |
|---|---|---|---|---|
| `page` | query | integer | `1` | Page number (1-based) |
| `page_size` | query | integer | `50` | Items per page (max 100) |
| `days` | query | integer | `7` | Lookback window (1-30 days) |

Response:
{
"items": [
{
"id": "uuid",
"user_id": "uuid",
"user_email": "admin@example.com",
"action": "token_created",
"resource_type": "token",
"resource_id": "uuid",
"details": "{\"name\": \"New Token\"}",
"created_at": "2026-01-15T10:30:00"
}
],
"total": 25,
"page": 1,
"page_size": 50,
"total_pages": 1
}

Super admin only. Manage admin user accounts.
List all active admin users (super_admin and admin roles).
Response: Array of AdminUserResponse:
[
  {
    "id": "uuid",
    "email": "admin@example.com",
    "first_name": "John",
    "last_name": "Doe",
    "role": "admin",
    "permissions": {
      "manage_api_keys": "all",
      "manage_teams": "all",
      "manage_models": "all",
      "view_usage": true,
      "view_monitor": true
    },
    "is_active": true,
    "created_at": "2026-01-01T00:00:00",
    "last_login_at": "2026-01-15T10:30:00"
  }
]

Invite a new admin. Creates a Cognito user with a temporary password and a local user record.
Request body:
{
  "email": "newadmin@example.com",
  "username": "newadmin",
  "temp_password": "TempPass123!",
  "role": "admin",
  "permissions": {
    "manage_api_keys": "all",
    "manage_teams": ["team-uuid-1"],
    "manage_models": true,
    "view_usage": true,
    "view_monitor": true
  }
}

| Field | Type | Required | Description |
|---|---|---|---|
| `email` | string | yes | Admin email address |
| `username` | string | yes | Cognito login username |
| `temp_password` | string | yes | Temporary password (must change on first login) |
| `role` | string | no | `"super_admin"` or `"admin"` (default: `"admin"`) |
| `permissions` | object | no | Permission scope (only for `admin` role) |

Response (201): AdminUserResponse
Update admin user role or permissions.
Request body (all fields optional):
{
  "role": "admin",
  "permissions": { "manage_api_keys": "all", "view_usage": true },
  "is_active": true
}

Response: Updated AdminUserResponse.
Deactivate an admin user. Cannot deactivate yourself. Returns 204 No Content.
List all assignable resources (tokens, teams, models) for the permission editor UI.
Response:
{
"api_keys": [{ "id": "uuid", "name": "Token Name" }],
"teams": [{ "id": "uuid", "name": "Team Name" }],
"models": [{ "id": "model-id", "name": "model-id" }]
}

Manage team budgets and member allocations. Requires the `manage_teams` permission.
Create a new team.
Request body:
{
"name": "Engineering Team",
"monthly_budget_usd": 500.00,
"monthly_reset_policy": "reset",
"daily_limit_enabled": true
}

| Field | Type | Required | Description |
|---|---|---|---|
| `name` | string | yes | Team name |
| `monthly_budget_usd` | decimal | yes | Monthly budget in USD |
| `monthly_reset_policy` | string | no | `"reset"` (default) or `"rollover"` |
| `daily_limit_enabled` | boolean | no | Enable daily spending limit per member (default: `true`) |

Response (201):
{
"id": "uuid",
"name": "Engineering Team",
"monthly_budget_usd": "500.00",
"monthly_reset_policy": "reset",
"daily_limit_enabled": true,
"unallocated_pool_usd": "500.00"
}

List all teams. Returns a TeamListItem array with member counts and usage summaries.
Get team dashboard with detailed member usage.
Response (TeamDashboardResponse):
{
"id": "uuid",
"name": "Engineering Team",
"monthly_budget_usd": "500.00",
"monthly_reset_policy": "reset",
"daily_limit_enabled": true,
"total_allocated_usd": "300.00",
"total_used_usd": "45.67",
"unallocated_pool_usd": "200.00",
"members": [
{
"token_id": "uuid",
"token_name": "Alice",
"allocated_usd": "150.00",
"used_usd": "23.45",
"remaining_usd": "126.55",
"daily_limit_usd": "4.84",
"daily_used_usd": "1.20",
"is_active": true,
"last_used_at": "2026-01-15T10:30:00"
}
]
}

Update team settings (name, budget, reset policy, daily limit).
Delete a team. Returns 204 No Content.
Add a member (token) to the team with a budget allocation.
Request body:
{
"token_id": "uuid",
"allocated_usd": 100.00
}

Remove a member from the team.
Adjust a member's allocated budget.
Request body:
{
"allocated_usd": 150.00
}

Transfer allocation between members.
Request body:
{
"from_token_id": "uuid",
"to_token_id": "uuid",
"amount": 25.00
}

Batch create members with new tokens and equal allocations.
Request body:
{
"names": "alice, bob, charlie",
"per_member_allocation": 50.00,
"expires_at": "2026-12-31T23:59:59",
"quota_usd": 100.00,
"allowed_ips": null,
"token_metadata": null,
"model_names": ["claude-sonnet-4-5"]
}

No authentication required.
Basic health check for load balancer.
{ "status": "healthy", "timestamp": "2026-01-15T10:00:00", "service": "kolya-br-proxy" }

Readiness probe. Verifies database connectivity. Returns 503 if unhealthy.
{
"status": "ready",
"timestamp": "2026-01-15T10:00:00",
"service": "kolya-br-proxy",
"components": {
"database": { "status": "healthy", "message": "Connected" }
}
}

Liveness probe for Kubernetes.
{ "status": "alive", "timestamp": "2026-01-15T10:00:00", "service": "kolya-br-proxy" }

Basic metrics endpoint.
{
"service": "kolya-br-proxy",
"timestamp": "2026-01-15T10:00:00",
"version": "1.0.0",
"debug_mode": false,
"metrics": {
"health_checks_total": "counter",
"requests_total": "counter",
"request_duration_seconds": "histogram"
}
}