This document covers the security protection design in Kolya BR Proxy, including AWS WAF (rate limiting and managed rule sets at the ALB layer), Cross-Origin Resource Sharing (CORS), Cross-Site Request Forgery (CSRF) protection, API token hashing, refresh token hardening, and JWT library choices — how these attacks work, why an API gateway must defend against them, and the specific implementation in this project.
Kolya BR Proxy serves two types of clients:
- API clients (Cline, Cursor, OpenAI SDK) — Call
/v1/*endpoints with Bearer Tokens - Browser clients (Vue admin dashboard) — Call
/admin/*endpoints with JWT (Bearer Token)
This project uses a hybrid authentication approach: access tokens are stored in localStorage and sent via Authorization: Bearer <token> headers, while refresh tokens are stored in HttpOnly cookies (kbr_refresh_token, Path=/admin/auth, SameSite=None; Secure) to prevent XSS theft of long-lived credentials. The OAuth login flow also uses PKCE (Proof Key for Code Exchange, S256) to prevent authorization code interception attacks.
This project implements full CORS and CSRF protection for the following reasons:
- Defense in Depth — Security should never rely on a single mechanism. The HttpOnly cookie for refresh tokens requires proper CSRF protection
- Origin validation blocks cross-origin requests from unauthorized domains regardless of the authentication method
- Security response headers prevent clickjacking, MIME sniffing, XSS, and other browser-side attacks
- Industry best practice — OWASP recommends these protections for all web APIs regardless of the current authentication scheme
The browser's Same-Origin Policy blocks web pages from making requests to a different origin (protocol + domain + port) by default. CORS is a set of HTTP headers that allow servers to declare which external origins may access their resources.
The frontend and backend in this project run on different origins, a typical cross-origin architecture:
| Environment | Frontend (Admin Dashboard) | Backend (API) | Cross-Origin? |
|---|---|---|---|
| Local dev | http://localhost:9000 |
http://localhost:8000 |
Yes (different port) |
| Production | https://kbp.kolya.fun |
https://api.kbp.kolya.fun |
Yes (different domain) |
Per the browser's Same-Origin Policy, if any of protocol, domain, or port differs, the request is cross-origin. Every request from the admin dashboard to the backend API is a cross-origin request. If the backend does not configure a CORS allowlist, the browser blocks all API calls from the frontend, and the admin dashboard cannot function at all.
CORS configuration is therefore a fundamental requirement for this project — not an optional security hardening measure, but a prerequisite for the separated frontend/backend architecture to work.
However, misconfigured CORS is equally dangerous:
Access-Control-Allow-Origin: *— Any website can read API responses, potentially leaking sensitive data- Overly broad allowlists — Attackers can leverage permitted domains to launch attacks
The correct approach is to only allow your own frontend domain and reject all other origins.
1. Admin logs into the Kolya BR Proxy dashboard (browser holds JWT)
2. Admin opens attacker's malicious site evil.com in a new tab
3. evil.com's JavaScript sends a GET request to https://api.kbp.kolya.fun/admin/tokens
4. If CORS is configured as *, the browser allows evil.com to read the response
5. Attacker obtains the full API Token list
Configuration entry: backend/main.py
app.add_middleware(
CORSMiddleware,
allow_origins=settings.get_allowed_origins(), # Strict allowlist
allow_credentials=True, # Allow credentials
allow_methods=["*"],
allow_headers=["*"],
)Preflight reliability: SecurityMiddleware handles OPTIONS requests directly — returning CORS headers without calling call_next(). This prevents intermittent header loss that can occur when BaseHTTPMiddleware passes preflight requests through the full middleware chain. The CORS headers on preflight responses are thus guaranteed regardless of inner middleware behavior.
Per-environment policy:
| Environment | KBR_ALLOWED_ORIGINS |
Security Level |
|---|---|---|
| Local development | http://localhost:3000,http://localhost:9000 or * |
Relaxed (only env that allows *) |
| Non-production | Non-prod domains only (no wildcards) | Moderate |
| Production | Production domains only (e.g., https://kbp.kolya.fun) |
Strict |
Wildcard validation (backend/app/core/config.py):
@validator("ALLOWED_ORIGINS")
def validate_allowed_origins(cls, v, values):
env = os.getenv("KBR_ENV", "non-prod")
if v == "*" and env != "local":
raise ValueError(
f"Wildcard CORS origins ('*') not allowed in {env} environment. "
"Only KBR_ENV=local supports '*'."
)
return vCSRF attacks exploit the browser's automatic cookie-sending behavior to make authenticated users unknowingly submit requests to a target site. Unlike CORS, CSRF does not need to read the response — the damage is done simply by the request being executed.
This is a common misconception. CORS only controls whether the browser allows reading the response, not whether the request is sent:
| Scenario | CORS Behavior | CSRF Risk |
|---|---|---|
Simple POST (Content-Type: application/x-www-form-urlencoded) |
Browser sends directly, no preflight | High — request reaches server and executes |
JSON POST (Content-Type: application/json) |
Triggers preflight (OPTIONS); actual request blocked if preflight fails | Low — but depends on correct CORS config |
| GET request | No preflight, sent directly | Depends on whether GET has side effects |
For this project's JSON API, the CORS preflight mechanism provides some protection. But relying on CORS alone is insufficient:
- CORS policies can be misconfigured
- Browser behavior varies in edge cases
- Defense-in-depth requires multiple layers
This project stores refresh tokens in HttpOnly cookies (kbr_refresh_token, Path=/admin/auth), which means the browser automatically attaches them to matching requests. This makes CSRF protection directly relevant:
1. Admin is logged into the Kolya BR Proxy dashboard (browser holds HttpOnly refresh token cookie)
2. Admin opens a malicious page evil.com in another tab
3. evil.com submits a hidden form POST to https://api.kbp.kolya.fun/admin/auth/refresh
(Content-Type: application/x-www-form-urlencoded, bypasses CORS preflight)
4. Browser automatically attaches the kbr_refresh_token cookie
5. Without CSRF protection, the server would issue a new access token to the attacker
The three-layer CSRF defense (Origin validation + Referer check + custom header requirement) prevents this attack. Additionally, the cookie is scoped to Path=/admin/auth, limiting the attack surface to auth endpoints only. Access tokens are stored in localStorage and sent via Authorization: Bearer headers, which browsers never attach automatically.
Implementation file: backend/app/middleware/security.py — SecurityMiddleware
This project uses a three-layer defense strategy:
For all state-changing operations (POST, PUT, DELETE, PATCH), verify the Origin header against the allowlist:
if origin and not self._is_origin_allowed(origin):
# 403 Forbidden - Origin not allowed- Browsers automatically include the
Originheader on cross-origin requests, and JavaScript cannot forge it - Requests from unlisted origins are immediately rejected
Verify the origin in the Referer header is legitimate:
if self.enforce_referer and not self._validate_referer(request):
# 403 Forbidden - Invalid referer- Disabled by default (
enforce_referer=False) as some clients do not send Referer - Can be enabled as an additional security layer
Require browser requests to include either X-Requested-With or Authorization:
if self.require_custom_header and origin:
has_auth_header = request.headers.get("authorization")
has_custom_header = request.headers.get("x-requested-with")
if not has_auth_header and not has_custom_header:
# 403 Forbidden - Missing CSRF protection headerWhy does this prevent CSRF?
- HTML forms and
<img>tags cannot set custom HTTP headers - Only JavaScript's
XMLHttpRequestorfetchcan add custom headers - Cross-origin JavaScript requests trigger CORS preflight, which is blocked by the origin allowlist
Frontend counterpart (frontend/src/boot/axios.ts):
const api = axios.create({
headers: {
'X-Requested-With': 'XMLHttpRequest', // CSRF protection
},
});The following requests bypass CSRF checks:
| Condition | Reason |
|---|---|
GET, HEAD, OPTIONS methods |
Safe methods that do not change server state |
/health/* paths |
Public health-check endpoints |
Requests without Origin header |
Non-browser clients (curl, SDKs) authenticated via Bearer Token |
The SecurityMiddleware described above protects general API requests from CSRF. However, the OAuth login flow has a unique attack surface — the OAuth callback endpoint.
In the OAuth Authorization Code flow, after the user completes authentication at a third party (Microsoft / Cognito), they are redirected back to this project's callback endpoint with a code parameter. If an attacker can craft a malicious callback URL and trick a user into clicking it, they can:
- Login CSRF: Bind the victim's session to the attacker's account
- Authorization code injection: Replace the victim's code with one obtained by the attacker, hijacking the user's account
1. Attacker initiates OAuth login and obtains a valid authorization code
2. Attacker crafts a callback URL:
https://api.kbp.kolya.fun/admin/auth/cognito/callback?code=ATTACKER_CODE&state=...
3. Attacker tricks the victim (a logged-in admin) into clicking the link
4. Without state validation, the server accepts the attacker's code
5. The victim's browser session is bound to the attacker's OAuth identity
Key files:
backend/app/services/oauth.py—OAuthService(state generation and validation)backend/app/models/oauth_state.py—OAuthState(database model)backend/app/api/admin/endpoints/auth.py— Login and callback endpoints
1. User clicks login → /admin/auth/cognito/login
│
├── Generate state = secrets.token_urlsafe(32) (cryptographically secure random)
├── Store in database oauth_states table (with provider, expires_at)
└── Return authorization_url (includes state parameter)
2. User completes auth at Cognito/Microsoft → redirected to callback
│
├── /admin/auth/cognito/callback?code=xxx&state=yyy
│
├── Validate state:
│ ├── Database lookup: state value + provider must match
│ ├── Expiry check: valid for 10 minutes after creation
│ ├── One-time use: deleted from database immediately after validation
│ └── Any check fails → 403 Forbidden
│
└── State validated → exchange code for token → login success
| Property | Implementation | Purpose |
|---|---|---|
| Cryptographically secure random | secrets.token_urlsafe(32) (256 bits of entropy) |
Prevent state guessing or brute force |
| Server-side storage | PostgreSQL oauth_states table |
Prevent client-side tampering |
| Provider binding | state + provider joint validation |
Prevent cross-provider replay |
| 10-minute expiry | expires_at = created_at + 10min |
Minimize attack time window |
| One-time use | Deleted immediately after successful validation | Prevent replay attacks |
| Expired state cleanup | cleanup_expired_states() periodic cleanup |
Prevent database bloat |
Generate state (backend/app/services/oauth.py):
async def generate_state(self, provider: str) -> tuple[str, str]:
state = secrets.token_urlsafe(32)
code_verifier = secrets.token_urlsafe(96) # PKCE
digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
oauth_state = OAuthState(state=state, provider=provider, code_verifier=code_verifier)
self.db.add(oauth_state)
await self.db.commit()
return state, code_challengeValidate state (backend/app/services/oauth.py):
async def validate_state(self, state: str, provider: str) -> tuple[bool, str | None]:
# Query database
query = select(OAuthState).where(
OAuthState.state == state, OAuthState.provider == provider
)
oauth_state = result.scalar_one_or_none()
if not oauth_state or oauth_state.is_expired():
return False, None
code_verifier = oauth_state.code_verifier # PKCE
# One-time use: delete after validation
await self.db.delete(oauth_state)
await self.db.commit()
return True, code_verifierCallback endpoint validation (backend/app/api/admin/endpoints/auth.py):
# Both Microsoft and Cognito callbacks perform the same validation
is_valid, code_verifier = await oauth_state_service.validate_state(state, "cognito")
if not is_valid:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="Invalid or expired state parameter",
)
# code_verifier is passed to the token exchange requestPKCE is an OAuth 2.1 best practice that prevents authorization code interception attacks. Even if an attacker intercepts the authorization code (e.g., via a malicious browser extension or redirect), they cannot exchange it for tokens without the code_verifier.
This project uses the S256 challenge method. The entire PKCE flow is handled server-side — the frontend does not need to participate.
| Step | Action | Location |
|---|---|---|
| 1. Generate | code_verifier = secrets.token_urlsafe(96) (~128 chars) |
OAuthService.generate_state() |
| 2. Challenge | code_challenge = BASE64URL(SHA256(code_verifier)) |
OAuthService.generate_state() |
| 3. Store | code_verifier saved in oauth_states DB table |
OAuthState.code_verifier column |
| 4. Authorize | code_challenge + code_challenge_method=S256 sent in auth URL |
get_authorization_url() |
| 5. Exchange | code_verifier sent in token exchange request |
exchange_code_for_token() |
| 6. Verify | IdP verifies SHA256(code_verifier) == code_challenge |
Microsoft / Cognito server |
Since the backend handles both the authorization URL construction and the token exchange, the code_verifier is stored server-side in the oauth_states database table (alongside the CSRF state parameter). This means the frontend requires zero changes for PKCE support.
Refresh tokens are long-lived (7 days) credentials. Storing them in localStorage exposes them to XSS attacks — any injected JavaScript can read localStorage and exfiltrate the token. HttpOnly cookies cannot be accessed by JavaScript (document.cookie returns nothing), providing a strong defense against XSS theft.
| Attribute | Value | Purpose |
|---|---|---|
key |
kbr_refresh_token |
Cookie name |
httponly |
true |
Prevents JavaScript access |
secure |
true (non-local) |
Only sent over HTTPS |
samesite |
none (non-local) / lax (local) |
Cross-origin required for api.kbp.kolya.fun ↔ kbp.kolya.fun |
path |
/admin/auth |
Only sent to auth endpoints, not /v1/* |
max_age |
7 days | Matches refresh token lifetime |
The frontend (kbp.kolya.fun) and backend (api.kbp.kolya.fun) are on different subdomains. This requires:
SameSite=None; Secureon the cookie (browsers requireSecurewhenSameSite=None)withCredentials: trueon the Axios client (to send cookies cross-origin)allow_credentials=Truein CORS configuration (already in place)
For local development over HTTP, the cookie uses SameSite=Lax since SameSite=None requires HTTPS.
- Set cookie:
backend/app/core/cookies.py—set_refresh_token_cookie() - Read cookie:
backend/app/core/cookies.py—get_refresh_token_from_cookie() - Clear cookie:
backend/app/core/cookies.py—clear_refresh_token_cookie() - Backward compatibility: The
/admin/auth/refreshand/admin/auth/revokeendpoints accept refresh tokens from both cookies (preferred) and request body (legacy)
In addition to CORS and CSRF protection, SecurityMiddleware adds security headers to every response:
| Header | Value | Purpose |
|---|---|---|
X-Content-Type-Options |
nosniff |
Prevent browser MIME-type sniffing |
X-Frame-Options |
DENY |
Prevent clickjacking |
X-XSS-Protection |
1; mode=block |
Enable browser XSS filter |
Referrer-Policy |
strict-origin-when-cross-origin |
Control Referer information leakage |
Content-Security-Policy |
default-src 'none'; frame-ancestors 'none' |
Restrict resource loading, prevent injection |
Cache-Control |
no-cache, no-store, must-revalidate |
Prevent caching of sensitive data |
The application-layer SecurityMiddleware protects against CORS/CSRF attacks, but it cannot defend against volumetric abuse — high-frequency requests that overwhelm backend resources. Rate limiting at the application layer requires additional infrastructure (e.g., Redis for distributed counters) and still consumes Pod resources for every request.
AWS WAF operates at the ALB layer, intercepting malicious traffic before it reaches EKS Pods:
Client → (Optional Global Accelerator) → ALB + WAF → EKS Pod
│
├── Rate limit exceeded? → 403 (blocked at ALB)
├── Known bad input? → 403 (blocked at ALB)
└── Passed all rules → Forward to Pod
Advantages over application-layer rate limiting:
| Aspect | Application Layer | AWS WAF (ALB Layer) |
|---|---|---|
| Traffic reaches Pod | Yes — consumes CPU/memory | No — blocked before Pod |
| Distributed counting | Requires Redis or similar | Built-in (AWS-managed) |
| Managed rule sets | Manual implementation | AWS-maintained (SQLi, XSS, known bad inputs) |
| Scalability | Limited by Pod resources | Scales with ALB |
The WAF WebACL contains five rules, evaluated in priority order:
| Priority | Rule | Type | Threshold / Action | Purpose |
|---|---|---|---|---|
| 1 | rate-limit-auth |
Rate-based (scoped) | 50 req / 5 min per IP on /admin/auth/* (excludes OPTIONS) |
Prevent OAuth brute force |
| 2 | rate-limit-inference |
Rate-based (scoped) | 300 req / 5 min per IP on /v1/chat/completions, /v1/messages, /v1beta/models/ |
Prevent API abuse |
| 3 | aws-managed-common |
AWS Managed Rule Group | AWSManagedRulesCommonRuleSet (some rules overridden to Count) | SQLi, XSS, and other common exploits |
| 4 | aws-managed-known-bad-inputs |
AWS Managed Rule Group | AWSManagedRulesKnownBadInputsRuleSet | Known malicious payloads (Log4j, etc.) |
| 5 | rate-limit-global |
Rate-based (global) | 2000 req / 5 min per IP | Global abuse prevention |
Rule evaluation logic:
- Rules are evaluated from lowest to highest priority number
- Path-scoped rate limits (auth, chat) are checked first so they can enforce tighter limits on sensitive endpoints
- AWS Managed Rules catch known attack patterns (SQLi, XSS, Log4j, etc.) regardless of rate
- The global rate limit acts as a catch-all for any IP exceeding overall thresholds
- Default action is Allow — only requests matching a rule's condition are blocked
Certain rules in AWSManagedRulesCommonRuleSet produce false positives on legitimate LLM API traffic. The following rules are overridden to Count (log only, no block) via rule_action_override:
| Excluded Rule | False Positive Scenario | Rationale |
|---|---|---|
SizeRestrictions_BODY |
Agent loop request bodies (system prompt + tool definitions + multi-turn conversation history) exceed the 8KB limit | LLM request bodies are inherently large; the 8KB threshold is unsuitable for this API |
CrossSiteScripting_BODY |
Code snippets in user messages contain <script> tags, triggering XSS false positives |
Chat content containing code is a normal use case, and the API does not render HTML |
NoUserAgent_HEADER |
Some SDK clients (OpenCode, etc.) do not send a User-Agent header |
API clients may not send User-Agent; requests are already authenticated via Bearer Token |
Security assurance: The /v1/chat/completions path is protected by Bearer Token authentication + IP rate limiting (300 req / 5 min). Downgrading these rules does not reduce actual security but prevents false-positive blocking of normal LLM traffic.
Excluded rules still operate in Count mode, so their trigger frequency can be monitored in CloudWatch metrics. If anomalous patterns are detected, they can be restored to Block at any time.
| Endpoint | Limit | Rationale |
|---|---|---|
/admin/auth/* |
50 / 5 min (excludes OPTIONS) | OAuth login involves redirects and token exchange. OPTIONS preflight requests are excluded from counting because browsers send preflight before every cross-origin request, and counting them inflates the rate unfairly — this previously caused intermittent CORS failures. |
/v1/chat/completions, /v1/messages, /v1beta/models/ |
300 / 5 min | Each inference request invokes a Bedrock model call (costly). 300 per 5 minutes (1 req/sec average) is sufficient for normal usage while preventing runaway scripts. |
| Global | 2000 / 5 min | Covers all endpoints including static assets, health checks, and API calls. Generous enough for legitimate browsing but blocks automated scanners and DDoS. |
The WAF WebACL is associated with two ALBs (created by Kubernetes ALB Controller, discovered by name via Terraform data "aws_lb"):
| ALB | Default Name | Protects |
|---|---|---|
| Frontend ALB | kolya-br-proxy-frontend-alb |
Admin dashboard (/admin/*) |
| API ALB | kolya-br-proxy-api-alb |
API endpoints (/v1/*, /health/*) |
Terraform module: iac/modules/waf/
| File | Contents |
|---|---|
main.tf |
aws_wafv2_web_acl (WebACL + 5 rules) and aws_wafv2_web_acl_association × 2 |
data.tf |
ALB auto-discovery by name |
variables.tf |
ALB names, rate limit thresholds, project metadata |
outputs.tf |
WebACL ARN, ID |
Root module variables (iac/variables.tf):
| Variable | Type | Default | Description |
|---|---|---|---|
enable_waf |
bool | true |
Enable/disable the entire WAF module |
waf_frontend_alb_name |
string | kolya-br-proxy-frontend-alb |
Frontend ALB name for auto-discovery |
waf_api_alb_name |
string | kolya-br-proxy-api-alb |
API ALB name for auto-discovery |
waf_rate_limit_global |
number | 2000 |
Global rate limit (req / 5 min) |
waf_rate_limit_auth |
number | 50 |
Auth endpoint rate limit (req / 5 min, excludes OPTIONS) |
waf_rate_limit_chat |
number | 300 |
Chat endpoint rate limit (req / 5 min) |
All rules have CloudWatch metrics enabled (cloudwatch_metrics_enabled = true) and request sampling (sampled_requests_enabled = true). Metrics are available in the AWS WAF Console and CloudWatch under the following metric names:
kbr-proxy-rate-limit-authkbr-proxy-rate-limit-chatkbr-proxy-aws-managed-commonkbr-proxy-aws-managed-bad-inputskbr-proxy-rate-limit-globalkbr-proxy-waf-<workspace>(WebACL-level default metric)
Browser Request → ALB + WAF → SecurityMiddleware → CORS Middleware → Route Handler
│ │
│ ├── /health/* → Skip checks, pass through
│ ├── OPTIONS → Skip checks (CORS preflight)
│ ├── GET/HEAD → Skip CSRF checks, add security headers
│ └── POST/PUT/DELETE/PATCH
│ │
│ ├── 1. Origin not in allowlist? → 403
│ ├── 2. Invalid Referer? (if enabled) → 403
│ ├── 3. Browser request missing Authorization and X-Requested-With? → 403
│ └── All passed → Add security headers → Continue processing
│
├── Rate limit exceeded? → 403 (blocked at ALB, never reaches Pod)
├── Matches AWS Managed Rule? → 403 (blocked at ALB)
└── Passed all WAF rules → Forward to Pod
API Client Request (no Origin header) → ALB + WAF → SecurityMiddleware → Route Handler
│ │
│ └── No Origin → Skip CSRF checks
│ → Bearer Token validated at route layer
│
└── WAF rules apply equally to API clients
| Protection | Local Dev | Non-Production | Production |
|---|---|---|---|
| AWS WAF (rate limiting + managed rules) | N/A (no ALB) | Enabled (if ALB exists) | Enabled |
| CORS Origin allowlist | localhost | Non-prod domains | Strict domain allowlist |
| CSRF Origin validation | Enabled | Enabled | Enabled |
| X-Requested-With check | Enabled | Enabled | Enabled |
| Referer validation | Disabled | Disabled | Optionally enabled |
| Security response headers | Enabled | Enabled | Enabled |
| Swagger UI | Enabled (DEBUG) | Enabled (DEBUG) | Disabled |
Secrets are managed through AWS Secrets Manager with External Secrets Operator (ESO) syncing them into Kubernetes. This replaces local secrets.yaml files and ensures secrets never exist in version control.
deploy-all.sh
├── aws secretsmanager put-secret-value → AWS Secrets Manager
│ (single source of truth)
│
└── kubectl apply ExternalSecret CRDs
↓
External Secrets Operator (ESO)
├── Authenticates via Pod Identity (no static AWS creds)
├── Reads secrets from AWS Secrets Manager
├── Creates/updates Kubernetes Secrets (refreshInterval: 1h)
└── Backend Pods consume secrets via envFrom / env references
| Property | Implementation |
|---|---|
| No secrets in Git | Secrets stored in AWS Secrets Manager, not in secrets.yaml or Helm values |
| Automatic sync | ESO refreshes every 1 hour (refreshInterval: 1h), picking up rotations automatically |
| IAM-based access | ESO authenticates via EKS Pod Identity -- no static AWS_ACCESS_KEY_ID needed |
| Deployment push | deploy-all.sh pushes secrets via aws secretsmanager put-secret-value |
| Least privilege | ESO IAM role only has secretsmanager:GetSecretValue for specific secret ARNs |
| Secret | Contents |
|---|---|
| Database credentials | DATABASE_URL (Aurora PostgreSQL connection string) |
| JWT signing key | KBR_JWT_SECRET_KEY (Fernet encryption + JWT signing) |
| OAuth client secrets | Cognito and Microsoft OAuth client secrets |
| Token encryption key | Used for API token Fernet encryption/decryption |
API tokens are hashed before storage using plain SHA256 (hashlib.sha256). The hash is deterministic (same input always produces the same output), so it is indexed in the database for O(1) lookup by hash.
The hashing scheme serves two purposes:
- Secure storage — Even if the database is compromised, the original tokens cannot be recovered from the hashes
- Fast lookup — The deterministic hash is indexed in the database for O(1) lookup
Additionally, tokens are encrypted with Fernet (AES-128-CBC) using a key derived from JWT_SECRET_KEY. This allows administrators to retrieve the original token value when needed (e.g., for display in the admin dashboard), while the hash is used for authentication lookups.
| Field | Algorithm | Purpose |
|---|---|---|
token_hash |
SHA256 (plain, no salt/key) | Authentication lookup (indexed) |
encrypted_token |
Fernet (AES-128-CBC) | Admin retrieval of original token |
Plain SHA256 is fast, which means an attacker with a leaked database could brute-force token hashes quickly. However, API tokens are generated with high entropy (secrets.token_urlsafe(32), 256 bits), making brute-force infeasible regardless of hash speed. The Fernet-encrypted token provides a second layer of protection requiring JWT_SECRET_KEY to decrypt.
Note: This is distinct from refresh token hashing, which uses PBKDF2-HMAC-SHA256 with 100,000 iterations (see Refresh Token Security Hardening below).
backend/app/core/security.py—hash_token(),verify_token(),encrypt_token(),decrypt_token()backend/app/core/redis.py—RedisCachewrapper with JSON serialization and graceful degradationbackend/app/services/token.py— Token creation and validation servicebackend/app/services/token_cache.py—CachedTokenServicewith Redis cache (TTL 300s) and DB fallback
When a token is deleted via DELETE /admin/tokens/{token_id}, three layers ensure it is immediately blocked from API access:
| Layer | Mechanism | Protection |
|---|---|---|
| 1 | is_active = false |
validate_token() queries WHERE is_active = TRUE — DB returns no match |
| 2 | Redis cache invalidation | _invalidate_token_cache() deletes the cached entry on deletion |
| 3 | Cache-hit guard | Even if a stale cache is hit (race condition), is_active is checked before returning the token |
The token is soft-deleted (is_deleted = true) to preserve historical usage data for reporting, but is_active = false is the authoritative gate for API access.
Super admins need to export all API keys (including plaintext values) for audit or migration purposes. However, plaintext keys must never appear in any XHR/JSON response — even if an attacker monitors DevTools Network tab, browser extensions intercept responses, or XSS scripts read JavaScript variables, they cannot obtain key values.
The export uses browser-native navigation (window.open()) instead of XHR/fetch:
┌─────────────┐ XHR/fetch ┌──────────┐
│ Frontend │ ◄──────────────────────► │ Backend │
│ │ key_prefix only (no key) │ │
└─────────────┘ └──────────┘
┌─────────────┐ window.open (HTTPS) ┌──────────┐
│ Browser │ ─────── download CSV ────► │ Backend │
│ Download │ ◄─── Content-Disposition ── │ decrypt │
│ Manager │ file saved to disk, └──────────┘
└─────────────┘ never enters JS runtime
| Attack Vector | XHR/fetch Response | window.open() Download |
|---|---|---|
| DevTools Network tab (XHR/Fetch panel) | Visible — response body readable | Not shown — browser navigation responses bypass the XHR panel |
| JavaScript variables | response.data stored in memory, accessible to XSS |
No JS variable holds the data — response goes directly to download manager |
| Browser extensions / interceptors | Can hook XMLHttpRequest and fetch |
Cannot intercept browser-native navigation responses |
| Axios interceptors / Pinia devtools | Captured and logged | Not triggered — request is outside the app's HTTP client |
-
Normal API interaction (XHR/fetch) —
GET /admin/tokensreturnsTokenResponsewith onlykey_prefix: "sk-ant-api03". The full key value is never included in any JSON response. -
CSV export (
window.open()) — OnlyGET /admin/tokens/export/keyscallsdecrypt_token()to retrieve plaintext values. The response is aContent-Disposition: attachmentfile download that the browser saves directly to disk without passing through the JavaScript runtime.
| Measure | Implementation |
|---|---|
| Authentication | JWT from ?token= query param or Authorization header |
| Authorization | super_admin role required (403 for all other roles) |
| Transport | HTTPS encrypts the URL (including the JWT query param) in transit |
| Caching | Cache-Control: no-store prevents browser/proxy caching |
| Token expiry | JWT has a short lifetime; even if URL is captured from browser history, it expires quickly |
| No CORS | File download is same-origin navigation; no cross-origin access possible |
backend/app/api/admin/endpoints/tokens.py—export_keys_csv()endpointfrontend/src/pages/DashboardPage.vue—exportKeys()function usingwindow.open()
Refresh tokens (JWT strings) are hashed with PBKDF2-HMAC-SHA256 using 100,000 iterations before storage. This was increased from 1 iteration to 100,000 to provide meaningful resistance against offline brute-force attacks if the database is compromised.
The hash is deterministic and keyed with JWT_SECRET_KEY, allowing efficient lookup while preventing offline attacks without the server secret.
Refresh token rotation (issuing a new refresh token and invalidating the old one) uses row-level locking (SELECT ... FOR UPDATE) to prevent race conditions. Without this, concurrent requests using the same refresh token could both succeed, creating duplicate active tokens and breaking the rotation chain.
The locking ensures that only one concurrent request can rotate a given refresh token — subsequent requests using the same token see it as already consumed and are rejected.
backend/app/core/security.py—hash_refresh_token()backend/app/api/admin/endpoints/auth.py— Refresh endpoint with row-level locking
The project migrated from python-jose to PyJWT for JWT encoding and decoding. The primary reasons:
| Aspect | python-jose | PyJWT |
|---|---|---|
| Maintenance | Largely unmaintained | Actively maintained |
| Security | Known ecdsa timing attack vulnerability | No known vulnerabilities |
| Compatibility | Lagging behind Python versions | Up-to-date |
The migration was a drop-in replacement at the API level. The project enforces an algorithm whitelist (HS256, HS384, HS512) to prevent algorithm confusion attacks, regardless of library.
backend/app/core/security.py— JWT encoding/decoding with PyJWT (import jwt)
The system implements two-tier RBAC via the role and permissions fields on the User model:
| Role | Access |
|---|---|
super_admin |
Full access to all endpoints and resources, bypasses all permission checks |
admin |
Scoped access controlled by the permissions JSON object |
Admin users have a permissions JSON field that controls access to specific resource types:
| Permission | Values | Controls |
|---|---|---|
manage_api_keys |
true/"all", [id, ...], false |
API token CRUD, alert rules/notifications |
manage_teams |
true/"all", [id, ...], false |
Team CRUD, member management |
manage_models |
true/"all", [id, ...], false |
Model configuration |
view_usage |
true/false |
Usage statistics |
view_monitor |
true/false |
Pricing table |
Permission value rules:
trueor"all": full access to all resources of that type- Array of UUIDs: access only to those specific resources (only
manage_api_keys,manage_teams,manage_modelssupport resource-level scoping) falseor missing: no access (403)- Special case: when
permissionsisnullor empty{}, the admin has full access to all business operations (no restrictions applied). Once any key is set inpermissions, unset keys are treated as denied
| Feature | Endpoint | Description |
|---|---|---|
| User management | /admin/users/* |
Invite, edit, deactivate admin accounts |
| Entra group mappings | /admin/entra-groups/* |
Configure group-to-role mappings |
| Audit logs | /admin/audit |
View all operation audit records and summaries |
| Pricing management | /admin/pricing/* |
Manually update model pricing, query pricing details |
| Observability | /admin/observability |
View and modify OpenTelemetry configuration |
| Key export | GET /admin/tokens/export/keys |
Export all API keys as CSV in plaintext |
| Feature | Endpoint | Permission Key | Resource Scoping |
|---|---|---|---|
| API Key CRUD | /admin/tokens/* |
manage_api_keys |
Yes (scoped to specific keys) |
| Key balance/revoke | /admin/tokens/{id}/adjust, /revoke |
manage_api_keys |
Yes |
| Alert rules | /admin/alerts/rules/* |
manage_api_keys |
— |
| Alert notifications | /admin/alerts/notifications/* |
manage_api_keys |
— |
| Team CRUD | /admin/teams/* |
manage_teams |
Yes (scoped to specific teams) |
| Team members | /admin/teams/{id}/members/* |
manage_teams |
Yes |
| Model config | /admin/models/* |
manage_models |
Yes |
| Usage stats | /admin/usage/* |
view_usage |
— |
| Pricing table | /admin/monitor/pricing-table |
view_monitor |
— |
| Feature | Endpoint | Description |
|---|---|---|
| Dashboard | / |
Overview page |
| Activity | /admin/audit-logs/activity |
Recent admin operations |
| Playground | /playground |
Conversation testing |
| Settings | /settings |
Personal account settings |
When KBR_MICROSOFT_ENABLE_GROUP_SYNC=true, user roles and permissions are resolved from Azure AD security group membership instead of being set manually. See OAuth Setup — Entra ID Group Sync for configuration details.
| Aspect | Cognito Mode | Entra ID Group Sync Mode |
|---|---|---|
| User provisioning | Manual invite by super_admin | Automatic via group membership |
| Permission assignment | Per-user in Admin Users page | Per-group in Entra Groups page |
| Permission updates | Manual edit | Automatic on next login (overwrites) |
| Role upgrade/downgrade | Manual edit | Automatic — determined by highest-priority group mapping |
| Deprovisioning | Manual deactivation | Remove from all mapped Azure groups (denied on next login) |
| UI role editing | Allowed | Disabled for Microsoft users (tooltip explains) |
Microsoft OAuth callback
│
├─ No group mappings exist AND no Microsoft users in DB?
│ → Bootstrap: grant super_admin to this first user
│
├─ No group mappings exist BUT Microsoft users already exist?
│ → Reject with 403 "Group mappings not configured"
│
├─ Group mappings exist → call Graph API /me/memberOf
│ │
│ ├─ Graph API error (network, 401, 429, 500)?
│ │ → Fail closed: reject with 503 "Unable to verify group membership"
│ │
│ ├─ User groups match a mapping?
│ │ → Use the highest-priority mapping's role + permissions
│ │
│ └─ User groups don't match any mapping?
│ → Reject with 403 "Not authorized"
│
└─ Apply resolved role/permissions to user record (overwrite on every login)
-
Fail closed on Graph API errors — If the Graph API call to
/me/memberOffails for any reason, login is rejected with HTTP 503. This prevents stale elevated permissions from persisting when group membership cannot be verified. -
Bootstrap is single-use — Only the very first Microsoft user (when no Microsoft users exist in the DB AND no group mappings are configured) receives automatic
super_admin. All subsequent users are rejected until the first admin creates group mappings via the Entra Groups page. -
Every login syncs permissions — The role and permissions are overwritten on every successful login based on the current group mapping. If a user is moved between Entra groups, their KBP permissions update on next login. There is no way to manually override a Microsoft user's role when group sync is active.
-
Cognito users are unaffected — Group sync only applies to Microsoft OAuth users. Cognito users continue to be managed manually via the Admin Users page. The two auth methods coexist independently.
-
Priority-based resolution — When a user belongs to multiple mapped groups, the mapping with the highest
priorityvalue wins. This allows fine-grained control (e.g., "Engineering" group = admin with limited permissions at priority 5, "Platform Team" = super_admin at priority 10).
The sidebar dynamically filters menu items based on permissions. Unauthorized features are hidden:
| Menu Item | Visible When |
|---|---|
| Dashboard | All admins |
| Teams | manage_teams permission |
| API Keys | manage_api_keys permission |
| Models | manage_models permission |
| Playground | All admins |
| Monitor | view_monitor permission |
| Activity | All admins |
| Admin Users | Super Admin only |
| Entra Groups | Super Admin only (group sync mode) |
| Settings | All admins |
| Endpoint Group | Required Permission |
|---|---|
/admin/tokens/* |
manage_api_keys |
/admin/teams/* |
manage_teams |
/admin/models/* |
manage_models |
/admin/alerts/* |
manage_api_keys |
/admin/usage/* |
view_usage |
/admin/monitor/* |
view_monitor |
/admin/users/* |
super_admin role only |
/admin/entra-groups/* |
super_admin role only |
/admin/pricing/* |
super_admin role only |
/admin/observability |
super_admin role only |
/admin/audit |
super_admin role only |
/admin/audit-logs/activity |
Any admin |
backend/app/models/user.py—UserRoleenum,roleandpermissionscolumnsbackend/app/api/deps.py—require_permission(),check_resource_scope(),get_allowed_resource_ids()
| File | Responsibility |
|---|---|
iac/modules/waf/main.tf |
AWS WAF WebACL definition (rate limiting + managed rules) and ALB association |
iac/modules/waf/data.tf |
ALB auto-discovery by name |
iac/modules/waf/variables.tf |
WAF configuration variables (rate limits, ALB names) |
backend/app/middleware/security.py |
SecurityMiddleware (Origin/Referer/custom header validation + security response headers) |
backend/main.py |
CORS middleware configuration, SecurityMiddleware registration |
backend/app/core/config.py |
ALLOWED_ORIGINS configuration and production validation |
backend/app/core/security.py |
JWT/API Token generation, SHA256 hashing (API tokens), PBKDF2 hashing (refresh tokens), verification, Fernet encryption |
backend/app/services/token.py |
API Token CRUD service (hash + encrypt on create, hash lookup on validate) |
backend/app/services/token_cache.py |
Cached token validation by hash |
backend/app/services/oauth.py |
OAuth State generation and validation (CSRF + PKCE) |
backend/app/models/oauth_state.py |
OAuth State database model (10-minute expiry, one-time use, PKCE code_verifier) |
backend/app/core/cookies.py |
HttpOnly cookie utilities for refresh token storage |
backend/app/api/admin/endpoints/auth.py |
OAuth login and callback endpoints (PKCE + cookie handling) |
frontend/src/boot/axios.ts |
Axios global config (X-Requested-With header, withCredentials: true) |