Skip to content

Prism Gateway docs rewrite - Phase 1 & 2#553

Open
NVJKKartik wants to merge 26 commits intoastrofrom
gateway-docs
Open

Prism Gateway docs rewrite - Phase 1 & 2#553
NVJKKartik wants to merge 26 commits intoastrofrom
gateway-docs

Conversation

@NVJKKartik
Copy link
Copy Markdown
Contributor

Summary

First batch of the Prism Gateway docs rewrite (6 pages out of 41 planned). Covers foundation pages and core API.

Pages rewritten/added

  1. How it works (rewrite) - full plugin pipeline with exact priority numbers, cache short-circuit mechanics, multi-tenancy, config hierarchy, hot-reload
  2. Supported providers (update) - 19 cloud + 5 self-hosted providers, 4-tab strategy (Prism SDK | OpenAI SDK | LiteLLM | cURL)
  3. Endpoints overview (new) - all 97 API endpoints across 20+ categories
  4. Chat completions (new) - primary endpoint with streaming, function calling (full 2-turn), vision, structured outputs. Absorbs streaming.mdx content.
  5. Quickstart (rewrite) - OpenAI SDK first ("change 2 lines"), all 4 tabs, tested against live gateway
  6. Virtual keys & access control (new) - key properties, BYOK vs managed, admin API, RBAC, IP ACL (3 layers), access groups

Structural changes

  • Sidebar restructured: Concepts, Providers, API Reference, Routing, Safety & Policy, Performance, Cost & Observability, Agentic, Deployment
  • All code examples tested against live gateway (gateway.futureagi.com with Gemini models)
  • LiteLLM requires /v1 in base_url (discovered during testing, fixed everywhere)
  • Every page passed docs-architect review + humanizer review

What's next

35 more pages across Phases 3-6. Tracked in Notion.

Test plan

  • Verify all 6 pages render correctly
  • Verify new sidebar structure shows correct groups
  • Click all Card links to confirm no 404s
  • Spot-check code examples (basic completion, streaming, function calling)

- Complete pipeline with exact priority numbers (10-999)
- Pre-request vs post-response split with sequential/parallel distinction
- Cache short-circuit mechanics (exact vs semantic hit behavior)
- Virtual key properties (RBAC, BYOK, credits, IP restrictions, etc.)
- Multi-tenancy isolation model
- Config hierarchy: request headers > key > org > global
- Hot-reload with SHA-256 change detection and Redis pub/sub
- Streaming behavior (pre-plugins → stream → post-plugins after final chunk)
- Add Hugging Face, Anyscale, Replicate to cloud providers table
- Add LocalAI to self-hosted providers table
- New tab strategy: Prism SDK | OpenAI SDK | LiteLLM | cURL for inference
- Dashboard | Python | TypeScript for config/management
- Provider health section with circuit breaker flow
- Fix model name to real Anthropic model ID
- New page: /docs/prism/api/endpoints with all 97 endpoints across 20+ categories
- Restructure sidebar: Concepts, Providers, API Reference, Routing, Safety & Policy,
  Performance, Cost & Observability, Agentic, Deployment
- Add Quickstart to top-level nav
- Rename "Core Concepts" to "How it works", "Manage Providers" to "Supported providers"
- Nav entries only added for pages that exist (incremental approach)
- New page: /docs/prism/api/chat with 4-tab examples (Prism SDK, OpenAI SDK,
  LiteLLM, cURL) for basic, streaming, function calling (full 2-turn), and vision
- Request/response body schemas, SSE streaming format, response headers table
- Fix LiteLLM base_url: needs /v1 suffix (tested against live gateway)
- All code examples tested against gateway.futureagi.com with Gemini models
- Add Chat completions to nav under API Reference
- Lead with OpenAI SDK base_url swap (2-line change)
- Add LiteLLM tab to all examples
- Response headers example using with_raw_response (tested against live gateway)
- Remove error responses section (moves to error handling guide)
- Remove Prism SDK install as step 1 - framework note at bottom instead
- Provider switching example with OpenAI/Anthropic/Gemini
- Key properties table with all fields from actual APIKey struct
- BYOK vs managed key types with credit balance
- Admin API examples (create, list, revoke, add credits) matching registered routes
- Per-key guardrail overrides with YAML config example
- RBAC: roles, teams, wildcard permissions, resolution order with concrete metadata example
- IP ACL: 3 layers (global, per-org, per-key) with config/API examples
- Access groups for logical model grouping
- All architect review fixes applied
@NVJKKartik NVJKKartik requested a review from hadarishav April 2, 2026 12:29
…pattern

Move topic-domain groups (Providers, API Reference, Routing, Safety, Performance,
Cost & Observability, Agentic) into sub-groups under Features. Keeps standard
Overview → Quickstart → Concepts → Features → Deployment structure matching
all other product sections in the docs.
…aching

Routing:
- Add complexity-based routing (8 scoring signals, tier mapping)
- Add provider lock (sticky routing via header)
- Add adaptive strategy details (learning phase, weight smoothing)
- Add race/fastest strategy config (max_concurrent, cancel_delay, billing warning)
- Fix model names to latest (claude-sonnet-4-6)
- Update tabs to Dashboard | Python (Prism SDK) | TypeScript (Prism SDK)

Caching:
- Fix duplicate About section and duplicate cache modes section
- Clean structure: About, When to use, Config, Namespaces, Per-request control, Backends
- Add namespace header example and per-request tabs (Prism SDK, OpenAI SDK, cURL)
- Clarify exact vs semantic hit cost behavior
…credits

- Fix wrong claim "per-key not supported" - per-key RPM/TPM is fully supported
- Add 3-level rate limiting (global, per-org, per-key)
- Add budgets section (daily/weekly/monthly/total, hard/soft limits)
- Add managed key credits (USD balance, auto-deduction, add credits API)
- 4-tab examples for retry logic
- Update nav title to "Rate limiting & budgets"
…rategy

- Routing: remove em dash, drop "maximize", deduplicate circuit breaker prose
- Caching: replace "cross-contamination" with plainer phrasing
- Rate limiting: replace config.yaml tab with TypeScript (Prism SDK) per tab strategy,
  move YAML to separate block below tabs
…ross-links

- Update all tab labels to match strategy (Dashboard | Python (Prism SDK) | TypeScript (Prism SDK))
- Add fail-open vs fail-closed explanation after enforcement modes
- Update Next steps cards with relevant cross-links
- Remove screenshot references to non-existent dashboard images
New page: api/headers.mdx - complete reference for all x-prism-* request
and response headers, response.prism SDK accessors, create_headers() usage.

Updated: concepts/configuration.mdx - added config hierarchy explanation,
model mapping section, GatewayConfig reference table, standard tabs
(Dashboard/Python/TypeScript), self-hosted YAML examples.

Nav: added headers page under API Reference.
New page: api/embeddings.mdx - embeddings endpoint with 4-tab examples
(Prism SDK, OpenAI SDK, LiteLLM, cURL), batch embeddings, reduced
dimensions, encoding format. Reranking endpoint with Prism SDK and
cURL examples, parameters table, response format. RAG pipeline example
showing embed → search → rerank → generate flow. Caching section.

Nav: added under API Reference.
New page: api/media.mdx - TTS, speech-to-text, audio translation, and
image generation. All sections have 3-tab examples (Prism SDK, OpenAI
SDK, cURL). Parameter tables, supported models, response formats.

LiteLLM tabs intentionally omitted - audio/image support is inconsistent.
New page: api/assistants.mdx - full OpenAI Assistants API proxy docs.
Covers assistants, threads, messages, runs with endpoint tables.
Examples: quick start flow, tool use with submit_tool_outputs, file
search with vector stores, streaming runs. Notes on what Prism adds
(cost tracking, rate limiting, logging) and limitations (no routing/
failover since threads are stored on OpenAI).
New page: api/files.mdx - file upload/list/delete, vector store CRUD,
batch file uploads, vector store search, file type reference. All
examples use OpenAI SDK since files are stored on OpenAI's servers.
New page: api/async-batch.mdx - async inference with polling, scheduled
completions, OpenAI Batch API with JSONL input/output. Decision table
for sync vs async vs scheduled vs batch.
Fixed metadata format (JSON, not key=value). Standardized tabs to
Prism SDK | OpenAI SDK | cURL for inference, Dashboard | Python | TS
for config. Added response.prism.cost accessor, client.current_cost,
SDK analytics methods. Fixed heading casing. Cross-linked to rate
limiting page for budget enforcement.
Fixed model names (claude-sonnet-4-6), lowercase headings, removed
card icons, tightened limitations section, restructured config sections.
Removed card icons, updated cross-links to include virtual keys and
endpoints pages. Content was already comprehensive.
New: guides/errors.mdx - error format, HTTP status codes, common errors
with fixes, retry strategies (Prism SDK, OpenAI SDK, manual), SDK
exception hierarchy, retry decision table.

New: guides/troubleshooting.mdx - debug checklist using x-prism-*
headers, step-by-step fixes for model not found, provider 404, slow
responses, cache misses, guardrail blocks, rate limits, cost issues,
failover problems.

Nav: added Guides section with both pages.
New: features/observability.mdx - request logging, distributed tracing,
Prometheus metrics table, OpenTelemetry config, session tracking.

New: features/self-hosted-models.mdx - Ollama, vLLM, LM Studio config,
hybrid routing patterns (cost-based, failover, complexity-based).

New: admin/organizations.mdx - org settings, member roles, API key
management, multi-tenancy patterns.

Updated: deployment/self-hosted.mdx - fixed title case, model names,
api_key required note for self-hosted providers, private repo note.

Nav: added Self-hosted models, Observability, Admin section.
…e_url

observability.mdx: Added Prism SDK | OpenAI SDK | cURL tabs for tracing,
added user_id and response.prism accessors, concrete session example.

assistants.mdx: Moved routing/failover caveat to top About section,
added variable context comment in streaming snippet.

organizations.mdx: Added control_plane_url explanation comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant