50 commits
021990a
DOC-1867 AI Gateway
micheleRP Jan 6, 2026
5a9cc4b
add questions for reviewers
micheleRP Jan 6, 2026
8188598
incorporate review comments
micheleRP Jan 6, 2026
c072cc0
configure AI Gateway LLM & MCP endpoints in Claude Code & similar tools
micheleRP Jan 6, 2026
1588b36
update Claude Code example + index page
micheleRP Jan 6, 2026
9c75742
revert nav title
micheleRP Jan 9, 2026
fce266b
edits
micheleRP Jan 11, 2026
85257c9
Add comprehensive AI Gateway documentation partials
micheleRP Jan 11, 2026
f2c87a3
clean up cc drafts
micheleRP Jan 12, 2026
f359f7b
cleanup
micheleRP Jan 12, 2026
8786f1a
cleanup
micheleRP Jan 12, 2026
49a573f
Apply documentation style guide to AI Gateway pages
micheleRP Jan 14, 2026
d3c6c23
Add AI Gateway client integration documentation
micheleRP Jan 15, 2026
07628a0
clean up
micheleRP Jan 16, 2026
2746f6b
cleanup overview and quickstarts
micheleRP Jan 20, 2026
8f264a9
cc audit
micheleRP Jan 20, 2026
77429a3
edit for user journey
micheleRP Jan 22, 2026
01d61b5
update nav
micheleRP Jan 22, 2026
acbbde8
update nav
micheleRP Jan 22, 2026
13e403a
Merge branch 'main' into DOC-1867-Document-feature-AI-Gateway-help-cl…
micheleRP Jan 22, 2026
492a50f
Refactor AI Gateway documentation for clarity and consistency
micheleRP Jan 23, 2026
02bb472
style edits
micheleRP Jan 23, 2026
21601b1
Convert learning objectives to standard format across all AI Gateway …
micheleRP Jan 23, 2026
84e5708
Fix persona metadata attribute to use correct standard
micheleRP Jan 23, 2026
2ab1f1c
Fix Anthropic model identifiers in gateway quickstart examples
micheleRP Jan 23, 2026
c1d3a5e
Clarify VS Code settings.json does not support native env var substit…
micheleRP Jan 23, 2026
ad4c882
Change Cursor MCP endpoint to use HTTPS instead of HTTP
micheleRP Jan 23, 2026
acc0c23
Fix inconsistent list markers in GitHub Copilot security section
micheleRP Jan 23, 2026
8742a94
Fix SDK examples in migration guide and remove placeholder notes
micheleRP Jan 24, 2026
6457cf4
Soften hard numeric claims with qualifiers across AI Gateway docs
micheleRP Jan 24, 2026
174199b
Convert persona restructuring plan from Markdown to AsciiDoc
micheleRP Jan 24, 2026
7621fdb
Remove unused ai-gateway.png image file
micheleRP Jan 24, 2026
273efa9
Remove persona restructuring planning document
micheleRP Jan 24, 2026
3de2abf
Update Claude Code config to match Anthropic MCP schema
micheleRP Jan 24, 2026
2d3c018
Add note about environment variable interpolation in mcpServers
micheleRP Jan 24, 2026
6c2b01c
Update Cline config to use official extension settings
micheleRP Jan 24, 2026
d73e572
Replace hardcoded token with env var in curl example
micheleRP Jan 24, 2026
1b32ae0
Update Continue.dev config to current YAML standard
micheleRP Jan 24, 2026
c34b122
Fix Continue.dev environment variable interpolation syntax
micheleRP Jan 24, 2026
48a108b
Fix Cursor settings.json environment variable handling
micheleRP Jan 24, 2026
56782ec
Fix GitHub Copilot configuration settings and flow
micheleRP Jan 24, 2026
a034c9c
Remove duplicate rp-aigw-id from LlamaIndex example
micheleRP Jan 24, 2026
930b261
Fix Continue.dev config file references to use YAML
micheleRP Jan 24, 2026
3a4d026
Fix Continue.dev MCP server schema in config.yaml examples
micheleRP Jan 24, 2026
d0fdb50
Fix unsupported ${VAR} interpolation in Continue.dev project config
micheleRP Jan 24, 2026
ac4d51d
Improve Continue.dev documentation clarity and compliance
micheleRP Jan 24, 2026
f754065
Remove checkboxes and periods from learning objectives across all int…
micheleRP Jan 24, 2026
8ce3de8
Revert "Remove checkboxes and periods from learning objectives across…
micheleRP Jan 24, 2026
44c4528
update nav
micheleRP Jan 24, 2026
000b819
AWS only
micheleRP Jan 29, 2026
32 changes: 32 additions & 0 deletions modules/ROOT/nav.adoc
@@ -24,6 +24,37 @@
** xref:get-started:cluster-types/create-dedicated-cloud-cluster.adoc[]

* xref:ai-agents:index.adoc[Agentic AI]
** xref:ai-agents:ai-gateway/index.adoc[AI Gateway]
*** xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[Overview]
*** xref:ai-agents:ai-gateway/gateway-quickstart.adoc[Quickstart]
*** xref:ai-agents:ai-gateway/gateway-architecture.adoc[Architecture]
*** For Administrators
**** xref:ai-agents:ai-gateway/admin/setup-guide.adoc[Setup Guide]
*** For Builders
**** xref:ai-agents:ai-gateway/builders/discover-gateways.adoc[Discover Gateways]
**** xref:ai-agents:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent]
**** xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Patterns]
**** xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[MCP Aggregation]
*** Observability
**** xref:ai-agents:ai-gateway/observability-logs.adoc[Request Logs]
**** xref:ai-agents:ai-gateway/observability-metrics.adoc[Metrics and Analytics]
*** xref:ai-agents:ai-gateway/migration-guide.adoc[Migrate]
*** xref:ai-agents:ai-gateway/integrations/index.adoc[Integrations]
**** Claude Code
***** xref:ai-agents:ai-gateway/integrations/claude-code-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/claude-code-user.adoc[User Guide]
**** Cline
***** xref:ai-agents:ai-gateway/integrations/cline-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/cline-user.adoc[User Guide]
**** Continue.dev
***** xref:ai-agents:ai-gateway/integrations/continue-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/continue-user.adoc[User Guide]
**** Cursor IDE
***** xref:ai-agents:ai-gateway/integrations/cursor-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/cursor-user.adoc[User Guide]
**** GitHub Copilot
***** xref:ai-agents:ai-gateway/integrations/github-copilot-admin.adoc[Admin Guide]
***** xref:ai-agents:ai-gateway/integrations/github-copilot-user.adoc[User Guide]
** xref:ai-agents:mcp/index.adoc[MCP]
*** xref:ai-agents:mcp/overview.adoc[MCP Overview]
*** xref:ai-agents:mcp/remote/index.adoc[Remote MCP]
@@ -38,6 +69,7 @@
***** xref:ai-agents:mcp/remote/manage-servers.adoc[Manage Servers]
***** xref:ai-agents:mcp/remote/scale-resources.adoc[Scale Resources]
***** xref:ai-agents:mcp/remote/monitor-activity.adoc[Monitor Activity]
**** xref:ai-agents:mcp/remote/pipeline-patterns.adoc[MCP Server Patterns]
*** xref:ai-agents:mcp/local/index.adoc[Redpanda Cloud Management MCP Server]
**** xref:ai-agents:mcp/local/overview.adoc[Overview]
**** xref:ai-agents:mcp/local/quickstart.adoc[Quickstart]
326 changes: 326 additions & 0 deletions modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc
@@ -0,0 +1,326 @@
= AI Gateway Setup Guide
:description: Complete setup guide for administrators to enable providers, configure models, create gateways, and set up routing policies.
:page-topic-type: how-to
:personas: platform_admin
:learning-objective-1: Enable LLM providers and models in the catalog
:learning-objective-2: Create and configure gateways with routing policies, rate limits, and spend limits
:learning-objective-3: Set up MCP tool aggregation for AI agents

include::ai-agents:partial$ai-gateway-byoc-note.adoc[]

This guide walks administrators through the complete setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.

After completing this guide, you will be able to:

* [ ] Enable LLM providers and models in the catalog
* [ ] Create and configure gateways with routing policies, rate limits, and spend limits
* [ ] Set up MCP tool aggregation for AI agents

== Prerequisites

* Access to the Redpanda Cloud Console with administrator privileges
* API keys for at least one LLM provider (OpenAI or Anthropic)
* (Optional) MCP server endpoints if you plan to use tool aggregation

== Enable a provider

Providers represent upstream services (Anthropic, OpenAI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator.

. In the Redpanda Cloud Console, navigate to *AI Gateway* → *Providers*.
. Select a provider (for example, Anthropic or OpenAI).
. On the *Configuration* tab for the provider, click *Add configuration*.
. Enter your API Key for the provider.
+
TIP: Store provider API keys securely. Each provider configuration can have multiple API keys for rotation and redundancy.

. Click *Save* to enable the provider.

Repeat this process for each LLM provider you want to make available through AI Gateway.

== Enable models

The model catalog is the set of models made available through the gateway. Models are disabled by default. After enabling a provider, you can enable its models.

The infrastructure that serves a model depends on its provider. For example, OpenAI and Anthropic have different reliability and availability characteristics, so you can design your gateway to use different providers for different use cases.

. Navigate to *AI Gateway* → *Models*.
. Review the list of available models from enabled providers.
. For each model you want to expose through gateways, toggle it to *Enabled*.
+
Common models to enable:
+
--
* `openai/gpt-4o` - OpenAI's most capable model
* `openai/gpt-4o-mini` - Cost-effective OpenAI model
* `anthropic/claude-sonnet-3.5` - Balanced Anthropic model
* `anthropic/claude-opus-4` - Anthropic's most capable model
--

. Click *Save changes*.

Only enabled models will be accessible through gateways. You can enable or disable models at any time without affecting existing gateways.

=== Model naming convention

Model requests must use the `vendor/model_id` format in the `model` property of the request body. This format allows AI Gateway to route requests to the appropriate provider.

Examples:

* `openai/gpt-4o`
* `anthropic/claude-sonnet-3.5`
* `openai/gpt-4o-mini`
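
The gateway uses the vendor prefix to select the upstream provider. The following sketch illustrates the split (illustrative only; the actual resolution happens inside AI Gateway):

[source,python]
----
def parse_model(model: str) -> tuple[str, str]:
    """Split a catalog model name into (vendor, model_id).

    Illustrative only: AI Gateway performs this resolution internally.
    """
    vendor, _, model_id = model.partition("/")
    if not model_id:
        raise ValueError(f"expected 'vendor/model_id', got {model!r}")
    return vendor, model_id

print(parse_model("anthropic/claude-sonnet-3.5"))
# → ('anthropic', 'claude-sonnet-3.5')
----

A `model` value without the vendor prefix cannot be routed, so always send the full catalog name.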

== Create a gateway

A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It's a "virtual gateway" that you can create per team, environment (staging/production), product, or customer.

. Navigate to *AI Gateway* → *Gateways*.
. Click *Create Gateway*.
. Configure the gateway:
+
--
* *Name*: Choose a descriptive name (for example, `production-gateway`, `team-ml-gateway`, `staging-gateway`)
* *Workspace*: Select the workspace this gateway belongs to
+
TIP: A workspace is conceptually similar to a resource group in Redpanda streaming.
+
* *Description* (optional): Add context about this gateway's purpose
* *Tags* (optional): Add metadata for organization and filtering
--

. Click *Create*.

. After creation, note the following information:
+
--
* *Gateway ID*: Unique identifier (for example, `gw_abc123`). Users include this value in the `rp-aigw-id` header.
* *Gateway Endpoint*: Base URL for API requests (for example, `https://gw.ai.panda.com`)
--

You'll share the Gateway ID and Endpoint with users who need to access this gateway.

== Configure LLM routing

On the gateway details page, select the *LLM* tab to configure rate limits, spend limits, routing, and provider pools with fallback options.

The LLM routing pipeline visually represents the request lifecycle:

. *Rate Limit*: Global rate limit (for example, 100 requests/second)
. *Spend Limit / Monthly Budget*: Monthly budget with blocking enforcement (for example, $15K/month)
. *Routing*: Primary provider pool with optional fallback provider pools

=== Configure rate limits

Rate limits control how many requests can be processed within a time window.

. In the *LLM* tab, locate the *Rate Limit* section.
. Click *Add rate limit*.
. Configure the limit:
+
--
* *Requests per second*: Maximum requests per second (for example, `100`)
* *Burst allowance* (optional): Allow temporary bursts above the limit
--

. Click *Save*.

Rate limits apply to all requests through this gateway, regardless of model or provider.
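
Requests beyond the limit are rejected (typically with HTTP 429), so clients should retry with backoff. A minimal exponential-backoff schedule, with illustrative base and cap values:

[source,python]
----
def backoff_schedule(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff delays in seconds: base * 2^n, capped at `cap`."""
    return [min(base * (2 ** n), cap) for n in range(attempts)]

print(backoff_schedule(5))
# → [0.5, 1.0, 2.0, 4.0, 8.0]
----

Tune the base delay and cap to your own traffic; the doubling pattern, not the exact numbers, is the point.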

=== Configure spend limits and budgets

Spend limits prevent runaway costs by blocking requests after a monthly budget is exceeded.

. In the *LLM* tab, locate the *Spend Limit* section.
. Click *Configure budget*.
. Set the budget:
+
--
* *Monthly budget*: Maximum spend per month (for example, `$15000`)
* *Enforcement*: Choose *Block* to reject requests after the budget is exceeded, or *Alert* to notify but allow requests
* *Notification threshold* (optional): Alert when X% of budget is consumed (for example, `80%`)
--

. Click *Save*.

Budget tracking uses estimated costs based on token usage and public provider pricing.
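
In other words, the estimate is token counts multiplied by per-token prices. A back-of-the-envelope sketch with hypothetical per-million-token prices (check each provider's pricing page for current rates):

[source,python]
----
# Hypothetical per-million-token prices in USD; real prices vary by
# provider and model.
PRICES = {"openai/gpt-4o-mini": {"input": 0.15, "output": 0.60}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

print(round(estimate_cost("openai/gpt-4o-mini", 10_000, 2_000), 4))
# → 0.0027
----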

=== Configure routing and provider pools

Provider pools define which LLM providers handle requests, with support for primary and fallback configurations.

. In the *LLM* tab, locate the *Routing* section.
. Click *Add provider pool*.
. Configure the primary pool:
+
--
* *Name*: For example, `primary-anthropic`
* *Providers*: Select one or more providers (for example, Anthropic)
* *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-3.5`)
* *Load balancing*: If multiple providers are selected, choose a distribution strategy (for example, round-robin or weighted)
--

. (Optional) Click *Add fallback pool* to configure automatic failover:
+
--
* *Name*: For example, `fallback-openai`
* *Providers*: Select fallback provider (for example, OpenAI)
* *Models*: Choose fallback models (for example, `openai/gpt-4o`)
* *Trigger conditions*: When to activate fallback:
** Rate limit exceeded (429 from primary)
** Timeout (primary provider slow)
** Server errors (5xx from primary)
--

. Configure routing rules using CEL expressions (optional):
+
For simple routing, select *Route all requests to primary pool*.
+
For advanced routing based on request properties, use CEL expressions. See xref:ai-gateway/cel-routing-cookbook.adoc[] for examples.
+
Example CEL expression for tier-based routing:
+
[source,cel]
----
request.headers["x-user-tier"] == "premium"
? "anthropic/claude-opus-4"
: "anthropic/claude-sonnet-3.5"
----

. Click *Save routing configuration*.

TIP: A provider pool in the UI corresponds to a backend pool in the API.

=== Load balancing and multi-provider distribution

If a provider pool contains multiple providers, you can distribute traffic across them to balance load or to optimize for cost and performance:

* *Round-robin*: Distribute evenly across all providers
* *Weighted*: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)
* *Least latency*: Route to fastest provider based on recent performance
* *Cost-optimized*: Route to cheapest provider for each model
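
A weighted split can be sketched as cumulative-weight selection. This is illustrative only; AI Gateway's internal algorithm may differ:

[source,python]
----
import itertools
import random

def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Choose a provider in proportion to its weight."""
    names = list(weights)
    cumulative = list(itertools.accumulate(weights.values()))
    point = rng.uniform(0, cumulative[-1])
    for name, bound in zip(names, cumulative):
        if point <= bound:
            return name
    return names[-1]

rng = random.Random(0)
picks = [pick_provider({"anthropic": 80, "openai": 20}, rng) for _ in range(1000)]
print(picks.count("anthropic"))  # roughly 800 of the 1,000 picks
----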

== Configure MCP tools (optional)

If your users will build AI agents that need access to tools via MCP (Model Context Protocol), configure MCP tool aggregation.

On the gateway details page, select the *MCP* tab to configure tool discovery and execution. The MCP proxy aggregates multiple MCP servers, allowing agents to find and call tools through a single endpoint.

=== Add MCP servers

. In the *MCP* tab, click *Add MCP server*.
. Configure the server:
+
--
* *Server name*: Human-readable identifier (for example, `database-server`, `slack-server`)
* *Server URL*: Endpoint for the MCP server (for example, `https://mcp-database.example.com`)
* *Authentication*: Configure authentication if required (bearer token, API key, mTLS)
* *Enabled tools*: Select which tools from this server to expose (or *All tools*)
--

. Click *Test connection* to verify connectivity.
. Click *Save* to add the server to this gateway.

Repeat for each MCP server you want to aggregate.

=== Configure deferred tool loading

Deferred tool loading can significantly reduce token costs by initially exposing only a search tool and orchestrator, rather than listing all available tools.

. In the *MCP* tab, locate *Deferred Loading*.
. Toggle *Enable deferred tool loading* to *On*.
. Configure behavior:
+
--
* *Initially expose*: Search tool + orchestrator only
* *Load on demand*: Tools are retrieved when agents query for them
* *Token savings*: Typically an 80-90% reduction in token usage for tool definitions
--

. Click *Save*.

See xref:ai-gateway/mcp-aggregation-guide.adoc[] for detailed information about MCP aggregation.
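
To see where the savings come from, compare the context cost of listing every tool up front with the deferred approach. All token counts below are assumptions for illustration:

[source,python]
----
# Assumed sizes, for illustration only: 50 tools at ~300 tokens per
# definition, a ~150-token search/orchestrator stub, and 5 tools
# actually loaded on demand during the session.
full_listing = 50 * 300            # every tool definition up front
deferred = 150 + 5 * 300           # stub plus on-demand definitions
savings = 1 - deferred / full_listing
print(f"{savings:.0%}")
# → 89%
----

With these assumptions the reduction lands in the 80-90% range; the exact figure depends on how many tools your agents actually use per session.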

=== Configure the MCP orchestrator

The MCP orchestrator is a built-in MCP server that enables programmatic tool calling. Agents can generate JavaScript code to call multiple tools in a single orchestrated step, reducing the number of round trips.

Example: A workflow requiring 47 file reads can be reduced from 49 round trips to just 1 round trip using the orchestrator.

The orchestrator is enabled by default when you enable MCP tools. You can configure:

* *Execution timeout*: Maximum time for orchestrator workflows (for example, 30 seconds)
* *Memory limit*: Maximum memory for JavaScript execution (for example, 128MB)
* *Allowed operations*: Restrict which MCP tools can be called from orchestrator workflows
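
The round-trip arithmetic can be sketched with a stub: without the orchestrator, each tool call is its own request; with it, one generated script runs every step server-side. The classes and counts below are a toy model, not the orchestrator's real interface:

[source,python]
----
class GatewayStub:
    """Toy stand-in that counts round trips to a pretend MCP endpoint."""

    def __init__(self) -> None:
        self.round_trips = 0

    def call_tool(self, name: str, **args) -> str:
        self.round_trips += 1      # one request per tool call
        return f"result of {name}"

    def run_script(self, steps) -> list[str]:
        self.round_trips += 1      # one request runs all steps server-side
        return [f"result of {name}" for name, _ in steps]

files = [f"file-{i}" for i in range(47)]

naive = GatewayStub()
for path in files:
    naive.call_tool("read_file", path=path)

batched = GatewayStub()
batched.run_script([("read_file", {"path": path}) for path in files])

print(naive.round_trips, batched.round_trips)
# → 47 1
----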

== Verify your setup

After completing the setup, verify that the gateway is working correctly:

=== Test the gateway endpoint

[source,bash]
----
curl "https://${GATEWAY_ENDPOINT}/v1/models" \
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
-H "rp-aigw-id: ${GATEWAY_ID}"
----

Expected result: List of enabled models.

=== Send a test request

[source,bash]
----
curl "https://${GATEWAY_ENDPOINT}/v1/chat/completions" \
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
-H "rp-aigw-id: ${GATEWAY_ID}" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
"max_tokens": 50
}'
----

Expected result: Successful completion response.
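
The same test request can be built from Python using only the standard library. The endpoint, token, and gateway ID come from environment variables you must set yourself; the send line is commented out so the sketch runs without a live gateway:

[source,python]
----
import json
import os
import urllib.request

endpoint = os.environ.get("GATEWAY_ENDPOINT", "gw.ai.panda.com")
body = {
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
    "max_tokens": 50,
}
req = urllib.request.Request(
    f"https://{endpoint}/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('REDPANDA_CLOUD_TOKEN', '')}",
        "rp-aigw-id": os.environ.get("GATEWAY_ID", ""),
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request.
print(req.get_method(), req.full_url)
----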

=== Check observability

. Navigate to *AI Gateway* → *Gateways* → Select your gateway → *Analytics*.
. Verify that your test request appears in the request logs.
. Check metrics:
+
--
* Request count: Should show your test request
* Token usage: Should show tokens consumed
* Estimated cost: Should show calculated cost
--

== Share access with users

Now that your gateway is configured, share access with users (builders):

. Provide the *Gateway ID* (for example, `gw_abc123`)
. Provide the *Gateway Endpoint* (for example, `https://gw.ai.panda.com`)
. Share API credentials (Redpanda Cloud tokens with appropriate permissions)
. (Optional) Document available models and any routing policies
. (Optional) Share rate limits and budget information

Users can then discover and connect to the gateway using the information provided. See xref:ai-gateway/builders/discover-gateways.adoc[] for user documentation.

== Next steps

*Configure and optimize:*

// * xref:ai-gateway/admin/manage-gateways.adoc[Manage Gateways] - List, edit, and delete gateways
* xref:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Cookbook] - Advanced routing patterns
// * xref:ai-gateway/admin/networking-configuration.adoc[Networking Configuration] - Configure private endpoints and connectivity

*Monitor and observe:*

* xref:ai-gateway/observability-metrics.adoc[Monitor Usage] - Track costs and usage across all gateways
* xref:ai-gateway/observability-logs.adoc[Request Logs] - View and filter request logs

*Integrate tools:*

* xref:ai-gateway/integrations/index.adoc[Integrations] - Admin guides for Claude Code, Cursor, and other tools