50 commits
021990a
DOC-1867 AI Gateway
micheleRP Jan 6, 2026
5a9cc4b
add questions for reviewers
micheleRP Jan 6, 2026
8188598
incorporate review comments
micheleRP Jan 6, 2026
c072cc0
configure AI Gateway LLM & MCP endpoints in Claude Code & similar tools
micheleRP Jan 6, 2026
1588b36
update Claude Code example + index page
micheleRP Jan 6, 2026
9c75742
revert nav title
micheleRP Jan 9, 2026
fce266b
edits
micheleRP Jan 11, 2026
85257c9
Add comprehensive AI Gateway documentation partials
micheleRP Jan 11, 2026
f2c87a3
clean up cc drafts
micheleRP Jan 12, 2026
f359f7b
cleanup
micheleRP Jan 12, 2026
8786f1a
cleanup
micheleRP Jan 12, 2026
49a573f
Apply documentation style guide to AI Gateway pages
micheleRP Jan 14, 2026
d3c6c23
Add AI Gateway client integration documentation
micheleRP Jan 15, 2026
07628a0
clean up
micheleRP Jan 16, 2026
2746f6b
cleanup overview and quickstarts
micheleRP Jan 20, 2026
8f264a9
cc audit
micheleRP Jan 20, 2026
77429a3
edit for user journey
micheleRP Jan 22, 2026
01d61b5
update nav
micheleRP Jan 22, 2026
acbbde8
update nav
micheleRP Jan 22, 2026
13e403a
Merge branch 'main' into DOC-1867-Document-feature-AI-Gateway-help-cl…
micheleRP Jan 22, 2026
492a50f
Refactor AI Gateway documentation for clarity and consistency
micheleRP Jan 23, 2026
02bb472
style edits
micheleRP Jan 23, 2026
21601b1
Convert learning objectives to standard format across all AI Gateway …
micheleRP Jan 23, 2026
84e5708
Fix persona metadata attribute to use correct standard
micheleRP Jan 23, 2026
2ab1f1c
Fix Anthropic model identifiers in gateway quickstart examples
micheleRP Jan 23, 2026
c1d3a5e
Clarify VS Code settings.json does not support native env var substit…
micheleRP Jan 23, 2026
ad4c882
Change Cursor MCP endpoint to use HTTPS instead of HTTP
micheleRP Jan 23, 2026
acc0c23
Fix inconsistent list markers in GitHub Copilot security section
micheleRP Jan 23, 2026
8742a94
Fix SDK examples in migration guide and remove placeholder notes
micheleRP Jan 24, 2026
6457cf4
Soften hard numeric claims with qualifiers across AI Gateway docs
micheleRP Jan 24, 2026
174199b
Convert persona restructuring plan from Markdown to AsciiDoc
micheleRP Jan 24, 2026
7621fdb
Remove unused ai-gateway.png image file
micheleRP Jan 24, 2026
273efa9
Remove persona restructuring planning document
micheleRP Jan 24, 2026
3de2abf
Update Claude Code config to match Anthropic MCP schema
micheleRP Jan 24, 2026
2d3c018
Add note about environment variable interpolation in mcpServers
micheleRP Jan 24, 2026
6c2b01c
Update Cline config to use official extension settings
micheleRP Jan 24, 2026
d73e572
Replace hardcoded token with env var in curl example
micheleRP Jan 24, 2026
1b32ae0
Update Continue.dev config to current YAML standard
micheleRP Jan 24, 2026
c34b122
Fix Continue.dev environment variable interpolation syntax
micheleRP Jan 24, 2026
48a108b
Fix Cursor settings.json environment variable handling
micheleRP Jan 24, 2026
56782ec
Fix GitHub Copilot configuration settings and flow
micheleRP Jan 24, 2026
a034c9c
Remove duplicate rp-aigw-id from LlamaIndex example
micheleRP Jan 24, 2026
930b261
Fix Continue.dev config file references to use YAML
micheleRP Jan 24, 2026
3a4d026
Fix Continue.dev MCP server schema in config.yaml examples
micheleRP Jan 24, 2026
d0fdb50
Fix unsupported ${VAR} interpolation in Continue.dev project config
micheleRP Jan 24, 2026
ac4d51d
Improve Continue.dev documentation clarity and compliance
micheleRP Jan 24, 2026
f754065
Remove checkboxes and periods from learning objectives across all int…
micheleRP Jan 24, 2026
8ce3de8
Revert "Remove checkboxes and periods from learning objectives across…
micheleRP Jan 24, 2026
44c4528
update nav
micheleRP Jan 24, 2026
000b819
AWS only
micheleRP Jan 29, 2026
32 changes: 32 additions & 0 deletions modules/ROOT/nav.adoc
@@ -24,6 +24,37 @@
** xref:get-started:cluster-types/create-dedicated-cloud-cluster.adoc[]

* xref:ai-agents:index.adoc[Agentic AI]
** xref:ai-agents:ai-gateway/index.adoc[AI Gateway]
*** xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[Overview]
*** xref:ai-agents:ai-gateway/gateway-quickstart.adoc[Quickstart]
*** xref:ai-agents:ai-gateway/gateway-architecture.adoc[Architecture]
*** For Administrators
**** xref:ai-agents:ai-gateway/admin/setup-guide.adoc[Setup Guide]
*** For Builders
**** xref:ai-agents:ai-gateway/builders/discover-gateways.adoc[Discover Gateways]
**** xref:ai-agents:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent]
**** xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Patterns]
**** xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[MCP Aggregation]
*** Observability
**** xref:ai-agents:ai-gateway/observability-logs.adoc[Request Logs]
**** xref:ai-agents:ai-gateway/observability-metrics.adoc[Metrics and Analytics]
*** xref:ai-agents:ai-gateway/migration-guide.adoc[Migrate]
*** xref:ai-agents:ai-gateway/integrations/index.adoc[Integrations]
**** Claude Code
***** xref:ai-agents:ai-gateway/integrations/claude-code-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/claude-code-user.adoc[User Guide]
**** Cline
***** xref:ai-agents:ai-gateway/integrations/cline-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/cline-user.adoc[User Guide]
**** Continue.dev
***** xref:ai-agents:ai-gateway/integrations/continue-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/continue-user.adoc[User Guide]
**** Cursor IDE
***** xref:ai-agents:ai-gateway/integrations/cursor-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/cursor-user.adoc[User Guide]
**** GitHub Copilot
***** xref:ai-agents:ai-gateway/integrations/github-copilot-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/github-copilot-user.adoc[User Guide]
** xref:ai-agents:mcp/index.adoc[MCP]
*** xref:ai-agents:mcp/overview.adoc[MCP Overview]
*** xref:ai-agents:mcp/remote/index.adoc[Remote MCP]
@@ -38,6 +69,7 @@
***** xref:ai-agents:mcp/remote/manage-servers.adoc[Manage Servers]
***** xref:ai-agents:mcp/remote/scale-resources.adoc[Scale Resources]
***** xref:ai-agents:mcp/remote/monitor-activity.adoc[Monitor Activity]
**** xref:ai-agents:mcp/remote/pipeline-patterns.adoc[MCP Server Patterns]
*** xref:ai-agents:mcp/local/index.adoc[Redpanda Cloud Management MCP Server]
**** xref:ai-agents:mcp/local/overview.adoc[Overview]
**** xref:ai-agents:mcp/local/quickstart.adoc[Quickstart]
326 changes: 326 additions & 0 deletions modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc
@@ -0,0 +1,326 @@
= AI Gateway Setup Guide
:description: Complete setup guide for administrators to enable providers, configure models, create gateways, and set up routing policies.
:page-topic-type: how-to
:personas: platform_admin
:learning-objective-1: Enable LLM providers and models in the catalog
:learning-objective-2: Create and configure gateways with routing policies, rate limits, and spend limits
:learning-objective-3: Set up MCP tool aggregation for AI agents

include::ai-agents:partial$ai-gateway-byoc-note.adoc[]

This guide walks administrators through the complete setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.

After completing this guide, you will be able to:

* [ ] Enable LLM providers and models in the catalog
* [ ] Create and configure gateways with routing policies, rate limits, and spend limits
* [ ] Set up MCP tool aggregation for AI agents

== Prerequisites

* Access to the Redpanda Cloud Console with administrator privileges
* API keys for at least one LLM provider (OpenAI or Anthropic)
* (Optional) MCP server endpoints if you plan to use tool aggregation

== Enable a provider

Providers represent upstream services (Anthropic, OpenAI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator.

. In the Redpanda Cloud Console, navigate to *AI Gateway* → *Providers*.
. Select a provider (for example, Anthropic or OpenAI).
. On the *Configuration* tab for the provider, click *Add configuration*.
. Enter your API Key for the provider.
+
TIP: Store provider API keys securely. Each provider configuration can have multiple API keys for rotation and redundancy.

. Click *Save* to enable the provider.

Repeat this process for each LLM provider you want to make available through AI Gateway.

== Enable models

The model catalog is the set of models made available through the gateway. Models are disabled by default. After enabling a provider, you can enable its models.

The infrastructure that serves a model depends on its provider. For example, OpenAI and Anthropic have different reliability and availability characteristics, so you can design your gateway to use different providers for different use cases.

. Navigate to *AI Gateway* → *Models*.
. Review the list of available models from enabled providers.
. For each model you want to expose through gateways, toggle it to *Enabled*.
+
Common models to enable:
+
--
* `openai/gpt-4o` - OpenAI's most capable model
* `openai/gpt-4o-mini` - Cost-effective OpenAI model
* `anthropic/claude-sonnet-3.5` - Balanced Anthropic model
* `anthropic/claude-opus-4` - Anthropic's most capable model
--

. Click *Save changes*.

Only enabled models will be accessible through gateways. You can enable or disable models at any time without affecting existing gateways.

=== Model naming convention

Model requests must use the `vendor/model_id` format in the `model` property of the request body. This format allows AI Gateway to route requests to the appropriate provider.

Examples:

* `openai/gpt-4o`
* `anthropic/claude-sonnet-3.5`
* `openai/gpt-4o-mini`
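
The gateway uses the vendor prefix to select the upstream provider. The following sketch illustrates the split (illustrative only; the actual resolution happens inside AI Gateway):

[source,python]
----
def parse_model(model: str) -> tuple[str, str]:
    """Split a catalog model name into (vendor, model_id).

    Illustrative only: AI Gateway performs this resolution internally.
    """
    vendor, _, model_id = model.partition("/")
    if not model_id:
        raise ValueError(f"expected 'vendor/model_id', got {model!r}")
    return vendor, model_id

print(parse_model("anthropic/claude-sonnet-3.5"))
# → ('anthropic', 'claude-sonnet-3.5')
----

A `model` value without the vendor prefix cannot be routed, so always send the full catalog name.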

== Create a gateway

A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It's a "virtual gateway" that you can create per team, environment (staging/production), product, or customer.

. Navigate to *AI Gateway* → *Gateways*.
. Click *Create Gateway*.
. Configure the gateway:
+
--
* *Name*: Choose a descriptive name (for example, `production-gateway`, `team-ml-gateway`, `staging-gateway`)
* *Workspace*: Select the workspace this gateway belongs to
+
TIP: A workspace is conceptually similar to a resource group in Redpanda streaming.
+
* *Description* (optional): Add context about this gateway's purpose
* *Tags* (optional): Add metadata for organization and filtering
--

. Click *Create*.

. After creation, note the following information:
+
--
* *Gateway ID*: Unique identifier (for example, `gw_abc123`). Users include this value in the `rp-aigw-id` header.
* *Gateway Endpoint*: Base URL for API requests (for example, `https://gw.ai.panda.com`)
--

You'll share the Gateway ID and Endpoint with users who need to access this gateway.

== Configure LLM routing

On the gateway details page, select the *LLM* tab to configure rate limits, spend limits, routing, and provider pools with fallback options.

The LLM routing pipeline visually represents the request lifecycle:

. *Rate Limit*: Global rate limit (for example, 100 requests/second)
. *Spend Limit / Monthly Budget*: Monthly budget with blocking enforcement (for example, $15K/month)
. *Routing*: Primary provider pool with optional fallback provider pools

=== Configure rate limits

Rate limits control how many requests can be processed within a time window.

. In the *LLM* tab, locate the *Rate Limit* section.
. Click *Add rate limit*.
. Configure the limit:
+
--
* *Requests per second*: Maximum requests per second (for example, `100`)
* *Burst allowance* (optional): Allow temporary bursts above the limit
--

. Click *Save*.

Rate limits apply to all requests through this gateway, regardless of model or provider.
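
Requests beyond the limit are rejected (typically with HTTP 429), so clients should retry with backoff. A minimal exponential-backoff schedule, with illustrative base and cap values:

[source,python]
----
def backoff_schedule(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff delays in seconds: base * 2^n, capped at `cap`."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]

print(backoff_schedule(5))
# → [0.5, 1.0, 2.0, 4.0, 8.0]
----

Tune the base delay and cap to your own traffic; the doubling pattern, not the exact numbers, is the point.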

=== Configure spend limits and budgets

Spend limits prevent runaway costs by blocking requests after a monthly budget is exceeded.

. In the *LLM* tab, locate the *Spend Limit* section.
. Click *Configure budget*.
. Set the budget:
+
--
* *Monthly budget*: Maximum spend per month (for example, `$15000`)
* *Enforcement*: Choose *Block* to reject requests after the budget is exceeded, or *Alert* to notify but allow requests
* *Notification threshold* (optional): Alert when X% of budget is consumed (for example, `80%`)
--

. Click *Save*.

Budget tracking uses estimated costs based on token usage and public provider pricing.
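
In other words, the estimate is token counts multiplied by per-token prices. A back-of-the-envelope sketch with hypothetical per-million-token prices (check each provider's pricing page for current rates):

[source,python]
----
# Hypothetical per-million-token prices in USD; real prices vary by
# provider and model.
PRICES = {"openai/gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

print(round(estimate_cost("openai/gpt-4o-mini", 10_000, 2_000), 4))
# → 0.0027
----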

=== Configure routing and provider pools

Provider pools define which LLM providers handle requests, with support for primary and fallback configurations.

. In the *LLM* tab, locate the *Routing* section.
. Click *Add provider pool*.
. Configure the primary pool:
+
--
* *Name*: For example, `primary-anthropic`
* *Providers*: Select one or more providers (for example, Anthropic)
* *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-3.5`)
* *Load balancing*: If multiple providers are selected, choose a distribution strategy (for example, round-robin or weighted)
--

. (Optional) Click *Add fallback pool* to configure automatic failover:
+
--
* *Name*: For example, `fallback-openai`
* *Providers*: Select fallback provider (for example, OpenAI)
* *Models*: Choose fallback models (for example, `openai/gpt-4o`)
* *Trigger conditions*: When to activate fallback:
** Rate limit exceeded (429 from primary)
** Timeout (primary provider slow)
** Server errors (5xx from primary)
--

. Configure routing rules using CEL expressions (optional):
+
For simple routing, select *Route all requests to primary pool*.
+
For advanced routing based on request properties, use CEL expressions. See xref:ai-gateway/cel-routing-cookbook.adoc[] for examples.
+
Example CEL expression for tier-based routing:
+
[source,cel]
----
request.headers["x-user-tier"] == "premium"
? "anthropic/claude-opus-4"
: "anthropic/claude-sonnet-3.5"
----

. Click *Save routing configuration*.

TIP: A provider pool in the UI corresponds to a backend pool in the API.

=== Load balancing and multi-provider distribution

If a provider pool contains multiple providers, you can distribute traffic across them to balance load or to optimize for cost and performance:

* *Round-robin*: Distribute evenly across all providers
* *Weighted*: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)
* *Least latency*: Route to fastest provider based on recent performance
* *Cost-optimized*: Route to cheapest provider for each model
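
A weighted split can be sketched as cumulative-weight selection. This is illustrative only; AI Gateway's internal algorithm may differ:

[source,python]
----
import itertools
import random

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Choose a provider in proportion to its weight."""
    names = list(weights)
    cumulative = list(itertools.accumulate(weights.values()))
    point = rng.uniform(0, cumulative[-1])
    for name, bound in zip(names, cumulative):
        if point <= bound:
            return name
    return names[-1]

rng = random.Random(0)
picks = [pick_provider({"anthropic": 80, "openai": 20}, rng) for _ in range(1000)]
print(picks.count("anthropic"))  # roughly 800 of the 1,000 picks
----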

== Configure MCP tools (optional)

If your users will build AI agents that need access to tools via MCP (Model Context Protocol), configure MCP tool aggregation.

On the gateway details page, select the *MCP* tab to configure tool discovery and execution. The MCP proxy aggregates multiple MCP servers, allowing agents to find and call tools through a single endpoint.

=== Add MCP servers

. In the *MCP* tab, click *Add MCP server*.
. Configure the server:
+
--
* *Server name*: Human-readable identifier (for example, `database-server`, `slack-server`)
* *Server URL*: Endpoint for the MCP server (for example, `https://mcp-database.example.com`)
* *Authentication*: Configure authentication if required (bearer token, API key, mTLS)
* *Enabled tools*: Select which tools from this server to expose (or *All tools*)
--

. Click *Test connection* to verify connectivity.
. Click *Save* to add the server to this gateway.

Repeat for each MCP server you want to aggregate.

=== Configure deferred tool loading

Deferred tool loading can significantly reduce token costs by initially exposing only a search tool and orchestrator, rather than listing all available tools.

. In the *MCP* tab, locate *Deferred Loading*.
. Toggle *Enable deferred tool loading* to *On*.
. Configure behavior:
+
--
* *Initially expose*: Search tool + orchestrator only
* *Load on demand*: Tools are retrieved when agents query for them
* *Token savings*: Typically an 80-90% reduction in token usage for tool definitions
--

. Click *Save*.

See xref:ai-gateway/mcp-aggregation-guide.adoc[] for detailed information about MCP aggregation.
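
To see where the savings come from, compare the context cost of listing every tool up front with the deferred approach. All token counts below are assumptions for illustration:

[source,python]
----
# Assumed sizes, for illustration only: 50 tools at ~300 tokens per
# definition, a ~150-token search/orchestrator stub, and 5 tools
# actually loaded on demand during the session.
full_listing = 50 * 300            # every tool definition up front
deferred = 150 + 5 * 300           # stub plus on-demand definitions
savings = 1 - deferred / full_listing
print(f"{savings:.0%}")
# → 89%
----

With these assumptions the reduction lands in the 80-90% range; the exact figure depends on how many tools your agents actually use per session.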

=== Configure the MCP orchestrator

The MCP orchestrator is a built-in MCP server that enables programmatic tool calling. Agents can generate JavaScript code to call multiple tools in a single orchestrated step, reducing the number of round trips.

Example: A workflow requiring 47 file reads can be reduced from 49 round trips to just 1 round trip using the orchestrator.

The orchestrator is enabled by default when you enable MCP tools. You can configure:

* *Execution timeout*: Maximum time for orchestrator workflows (for example, 30 seconds)
* *Memory limit*: Maximum memory for JavaScript execution (for example, 128MB)
* *Allowed operations*: Restrict which MCP tools can be called from orchestrator workflows
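
The round-trip arithmetic can be sketched with a stub: without the orchestrator, each tool call is its own request; with it, one generated script runs every step server-side. The classes and counts below are a toy model, not the orchestrator's real interface:

[source,python]
----
class GatewayStub:
    """Toy stand-in that counts round trips to a pretend MCP endpoint."""

    def __init__(self) -> None:
        self.round_trips = 0

    def call_tool(self, name: str, **args) -> str:
        self.round_trips += 1      # one request per tool call
        return f"result of {name}"

    def run_script(self, steps) -> list[str]:
        self.round_trips += 1      # one request runs all steps server-side
        return [f"result of {name}" for name, _ in steps]

files = [f"file-{i}" for i in range(47)]

naive = GatewayStub()
for path in files:
    naive.call_tool("read_file", path=path)

batched = GatewayStub()
batched.run_script([("read_file", {"path": path}) for path in files])

print(naive.round_trips, batched.round_trips)
# → 47 1
----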

== Verify your setup

After completing the setup, verify that the gateway is working correctly:

=== Test the gateway endpoint

[source,bash]
----
curl "https://${GATEWAY_ENDPOINT}/v1/models" \
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
-H "rp-aigw-id: ${GATEWAY_ID}"
----

Expected result: List of enabled models.

=== Send a test request

[source,bash]
----
curl "https://${GATEWAY_ENDPOINT}/v1/chat/completions" \
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
-H "rp-aigw-id: ${GATEWAY_ID}" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
"max_tokens": 50
}'
----

Expected result: Successful completion response.
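
The same test request can be built from Python using only the standard library. The endpoint, token, and gateway ID come from environment variables you must set yourself; the send line is commented out so the sketch runs without a live gateway:

[source,python]
----
import json
import os
import urllib.request

endpoint = os.environ.get("GATEWAY_ENDPOINT", "gw.ai.panda.com")
body = {
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
    "max_tokens": 50,
}
req = urllib.request.Request(
    f"https://{endpoint}/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('REDPANDA_CLOUD_TOKEN', '')}",
        "rp-aigw-id": os.environ.get("GATEWAY_ID", ""),
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request.
print(req.get_method(), req.full_url)
----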

=== Check observability

. Navigate to *AI Gateway* → *Gateways* → Select your gateway → *Analytics*.
. Verify that your test request appears in the request logs.
. Check metrics:
+
--
* Request count: Should show your test request
* Token usage: Should show tokens consumed
* Estimated cost: Should show calculated cost
--

== Share access with users

Now that your gateway is configured, share access with users (builders):

. Provide the *Gateway ID* (for example, `gw_abc123`)
. Provide the *Gateway Endpoint* (for example, `https://gw.ai.panda.com`)
. Share API credentials (Redpanda Cloud tokens with appropriate permissions)
. (Optional) Document available models and any routing policies
. (Optional) Share rate limits and budget information

Users can then discover and connect to the gateway using the information provided. See xref:ai-gateway/builders/discover-gateways.adoc[] for user documentation.

== Next steps

*Configure and optimize:*

// * xref:ai-gateway/admin/manage-gateways.adoc[Manage Gateways] - List, edit, and delete gateways
* xref:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Cookbook] - Advanced routing patterns
// * xref:ai-gateway/admin/networking-configuration.adoc[Networking Configuration] - Configure private endpoints and connectivity

*Monitor and observe:*

* xref:ai-gateway/observability-metrics.adoc[Monitor Usage] - Track costs and usage across all gateways
* xref:ai-gateway/observability-logs.adoc[Request Logs] - View and filter request logs

*Integrate tools:*

* xref:ai-gateway/integrations/index.adoc[Integrations] - Admin guides for Claude Code, Cursor, and other tools