27 changes: 16 additions & 11 deletions docs/toolhive/concepts/backend-auth.mdx
@@ -211,18 +211,23 @@ deployments using the ToolHive Operator.
- **Direct upstream redirect:** The embedded authorization server redirects
clients directly to the upstream provider for authentication (for example,
GitHub or Atlassian).
- **Single upstream provider:** Currently supports one upstream identity
provider per configuration.

:::info[Chained authentication not yet supported]

The embedded authorization server redirects clients directly to the upstream
provider. This means the upstream provider must be the service whose API the MCP
server calls. Chained authentication—where a client authenticates with a
- **Multiple upstream providers (VirtualMCPServer):** VirtualMCPServer supports
configuring multiple upstream identity providers with sequential
authentication. When multiple providers are configured, the authorization
server chains the authentication flow through each provider in sequence,
collecting tokens from all of them. This enables scenarios where backend tools
require tokens from different providers (such as a corporate IdP and GitHub).

:::info[Chained authentication for MCPServer]

MCPServer and MCPRemoteProxy support only one upstream provider. The embedded
authorization server redirects clients directly to that provider, so the
provider must be the service whose API the MCP server calls. If your MCPServer
deployment requires chained authentication—where a client authenticates with a
corporate IdP like Okta, which then federates to an external provider like
GitHub—is not yet supported. If your deployment requires this pattern, consider
using [token exchange](#same-idp-with-token-exchange) with a federated identity
provider instead.
GitHub—consider using [token exchange](#same-idp-with-token-exchange) with a
federated identity provider instead, or use a VirtualMCPServer with multiple
upstream providers.

:::
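
The multi-provider flow described above could be configured along these lines.
This is a hedged sketch: the `upstreamProviders` field name appears in the
configuration reference, but the per-provider fields shown here are
illustrative assumptions, not confirmed CRD schema.

```yaml title="VirtualMCPServer resource (sketch)"
spec:
  # Assumed shape: field names under each provider entry are illustrative.
  upstreamProviders:
    - name: corporate-idp # authenticated first in the sequential flow
      issuer: https://okta.example.com
      clientId: vmcp-client
      clientSecretRef:
        name: okta-secret
        key: client-secret
    - name: github # authenticated second; its token is also collected
      issuer: https://github.com/login/oauth
      clientId: vmcp-github-client
      clientSecretRef:
        name: github-secret
        key: client-secret
```

Each provider in the list is visited in order, and the authorization server
collects a token from every one before completing the client's login.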

@@ -599,13 +599,13 @@ kubectl apply -f embedded-auth-config.yaml

**Configuration reference:**

| Field | Description |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
| `upstreamProviders` | Configuration for the upstream identity provider. Currently supports one provider. |
| Field | Description |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
| `upstreamProviders` | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
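
Putting the reference table together, an `embedded-auth-config.yaml` might look
roughly like this. Illustrative only: the field names come from the table
above, but the nesting, Secret names, and values are assumptions.

```yaml title="embedded-auth-config.yaml (sketch)"
# Field names from the configuration reference; structure is assumed.
issuer: https://auth.example.com
signingKeySecretRefs:
  - name: jwt-signing-key # first key is active; extra keys support rotation
    key: private.pem
hmacSecretRefs:
  - name: hmac-key # symmetric key for auth codes and refresh tokens
    key: secret
tokenLifespans:
  accessToken: 1h # default
  refreshToken: 168h # default
  authCode: 10m # default
upstreamProviders:
  - name: okta # one provider for MCPServer/MCPRemoteProxy
    issuer: https://okta.example.com
```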

**Step 5: Create the MCPServer resource**

@@ -2,7 +2,7 @@
title: Redis Sentinel session storage
description:
How to deploy Redis Sentinel and configure persistent session storage for the
ToolHive embedded authorization server.
ToolHive embedded authorization server and horizontal scaling.
---

Deploy Redis Sentinel and configure it as the session storage backend for the
@@ -12,6 +12,11 @@ re-authenticate. Redis Sentinel provides persistent storage with automatic
master discovery, ACL-based access control, and optional failover when replicas
are configured.

Redis session storage is also required for horizontal scaling when running
multiple [MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
[VirtualMCPServer](../guides-vmcp/scaling-and-performance.mdx#session-storage-for-multi-replica-deployments)
replicas, so that sessions are shared across pods.

:::info[Prerequisites]

Before you begin, ensure you have:
@@ -439,6 +439,86 @@ For more details about a specific MCP server:
kubectl -n <NAMESPACE> describe mcpserver <NAME>
```

## Horizontal scaling

MCPServer creates two separate workloads: a proxy runner Deployment and a
backend MCP server StatefulSet. You can scale each independently:

- `spec.replicas` controls the proxy runner pod count
- `spec.backendReplicas` controls the backend MCP server pod count

The proxy runner handles authentication, MCP protocol framing, and session
management; it is stateless with respect to tool execution. The backend runs the
actual MCP server and executes tools.

Common configurations:

- **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful when
auth and connection overhead is the bottleneck with a single backend.
- **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful
when tool execution is CPU/memory-bound and the proxy is not a bottleneck. The
backend StatefulSet uses client-IP session affinity to route repeated
connections to the same pod — subject to the same NAT limitations as
proxy-level affinity.
- **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
Redis session storage is required when `replicas > 1`.

```yaml title="MCPServer resource"
spec:
replicas: 2
backendReplicas: 3
sessionStorage:
provider: redis
address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
db: 0
keyPrefix: mcp-sessions
passwordRef:
name: redis-secret
key: password
```

When running multiple replicas, configure
[Redis session storage](./redis-session-storage.mdx) so that sessions are shared
across pods. If you omit `replicas` or `backendReplicas`, the operator defers
replica management to an HPA or other external controller.
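
For example, omitting `replicas` lets an HPA own the proxy Deployment's scale.
A sketch, with one loud assumption: the target Deployment name here is a
guess, so verify the name the operator actually creates (for example with
`kubectl get deploy`) before applying.

```yaml title="HPA for the proxy runner (sketch)"
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-mcp-server # assumed proxy Deployment name; check your cluster
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Remember that any `minReplicas` greater than 1 still requires Redis session
storage, for the same session-sharing reason described above.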

:::note

The `SessionStorageWarning` condition fires only when `spec.replicas > 1`.
Scaling only the backend (`backendReplicas > 1`) does not trigger a warning,
but backend client-IP affinity is still unreliable behind NAT or shared egress
IPs.

:::

:::note[Connection draining on scale-down]

When a proxy runner pod is terminated (scale-in, rolling update, or node
eviction), Kubernetes sends SIGTERM and the proxy drains in-flight requests for
up to 30 seconds before force-closing connections. The grace period and drain
timeout are both 30 seconds with no headroom, so long-lived SSE or streaming
connections may be dropped if they exceed the drain window.

No preStop hook is injected by the operator. If your workload requires
additional time — for example, to let kube-proxy propagate endpoint removal
before the pod stops accepting traffic — override
`terminationGracePeriodSeconds` via `podTemplateSpec`:

```yaml
spec:
podTemplateSpec:
spec:
terminationGracePeriodSeconds: 60
```

The same 30-second default applies to the backend StatefulSet.

:::

:::warning[Stdio transport limitation]

Backends using the `stdio` transport are limited to a single replica. The
operator rejects configurations with `backendReplicas` greater than 1 for stdio
backends.

:::

## Next steps

- [Connect clients to your MCP servers](./connect-clients.mdx) from outside the
@@ -455,6 +535,8 @@ kubectl -n <NAMESPACE> describe mcpserver <NAME>

- [Kubernetes CRD reference](../reference/crd-spec.md#apiv1alpha1mcpserver) -
Reference for the `MCPServer` Custom Resource Definition (CRD)
- [vMCP scaling and performance](../guides-vmcp/scaling-and-performance.mdx) -
Scale Virtual MCP Server deployments
- [Deploy the operator](./deploy-operator.mdx) - Install the ToolHive operator
- [Build MCP containers](../guides-cli/build-containers.mdx) - Create custom MCP
server container images
@@ -19,6 +19,7 @@ backend MCP servers, handling dependencies and collecting results.
wait for their prerequisites
- **Template expansion**: Dynamic arguments using step outputs
- **Elicitation**: Request user input mid-workflow (approval gates, choices)
- **Iteration**: Loop over collections with forEach steps
- **Error handling**: Configurable abort, continue, or retry behavior
- **Timeouts**: Workflow and per-step timeout configuration

@@ -290,7 +291,7 @@ spec:

### Steps

Each step can be a tool call or an elicitation:
Each step can be a tool call, an elicitation, or a forEach loop:

```yaml title="VirtualMCPServer resource"
spec:
@@ -344,6 +345,89 @@ spec:
timeout: '5m'
```

### forEach steps

Iterate over a collection from a previous step's output and execute a tool call
for each item:

```yaml title="VirtualMCPServer resource"
spec:
config:
compositeTools:
- name: scan_repositories
description: Check each repository for security advisories
parameters:
type: object
properties:
org:
type: string
required:
- org
steps:
- id: list_repos
tool: github_list_repos
arguments:
org: '{{.params.org}}'
# highlight-start
- id: check_advisories
type: forEach
collection: '{{json .steps.list_repos.output.repositories}}'
itemVar: repo
maxParallel: 5
step:
type: tool
tool: github_list_security_advisories
arguments:
repo: '{{.forEach.repo.name}}'
onError:
action: continue
dependsOn: [list_repos]
# highlight-end
```

**forEach fields:**

| Field | Description | Default |
| --------------- | ----------------------------------------------------- | ------- |
| `collection` | Template expression that produces an array | — |
| `itemVar` | Variable name for the current item | item |
| `maxParallel` | Maximum concurrent iterations (max 50) | 10 |
| `maxIterations` | Maximum total iterations (max 1000) | 100 |
| `step` | Inner step definition (tool call to execute per item) | — |
| `onError` | Error handling: `abort` (stop) or `continue` (skip) | abort |

:::note

`forEach` does not support `onError.action: retry`. Use `retry` on regular tool
steps. The `maxParallel` cap of 50 is enforced at runtime regardless of the
configured value.

:::

Access the current item inside the inner step using
`{{.forEach.<itemVar>.<field>}}`. In the example above, `{{.forEach.repo.name}}`
accesses the `name` field of the current repository. You can also use
`{{.forEach.index}}` to access the zero-based iteration index.
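
As a short fragment showing both templates together (the tool name is an
illustrative assumption, not a real backend tool):

```yaml title="VirtualMCPServer resource (fragment)"
steps:
  - id: tag_items
    type: forEach
    collection: '{{json .steps.list_repos.output.repositories}}'
    itemVar: repo
    step:
      type: tool
      tool: example_tag_repo # illustrative tool name
      arguments:
        repo: '{{.forEach.repo.name}}' # field on the current item
        position: '{{.forEach.index}}' # zero-based iteration index
```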

`maxParallel` controls how many iterations run concurrently **on the pod that
received the composite tool request**. Iterations are not distributed across
vMCP replicas — all parallel backend calls originate from a single pod
regardless of `spec.replicas`. When sizing your deployment, account for the
per-pod fan-out: a `maxParallel: 50` forEach step can open up to 50 simultaneous
connections to backend MCP servers from one pod. Ensure both the vMCP pod
resources and the backend MCP servers can handle that per-pod concurrency.

:::tip[Plan your workflow timeouts]

With `maxIterations: 1000` and `maxParallel: 10` (the defaults), a forEach loop
runs up to 100 serial batches. If each backend call takes a few seconds, the
total duration can easily exceed a workflow-level timeout. Set the workflow
`timeout` to at least
`ceil(maxIterations / maxParallel) × expected step duration` to avoid silent
truncation.

:::
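
Applying that formula to the defaults: `ceil(1000 / 10) = 100` serial batches,
so at roughly 5 seconds per batch the loop alone needs about 500 seconds. A
hedged fragment reusing the example above (the 5-second step duration is an
assumption; measure your own backend latency):

```yaml title="VirtualMCPServer resource (fragment)"
compositeTools:
  - name: scan_repositories
    # ceil(1000 / 10) = 100 batches × ~5 s ≈ 500 s, plus headroom
    timeout: '10m'
```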

### Error handling

Configure behavior when steps fail:
@@ -507,13 +591,16 @@ without defaultResults defined

Access workflow context in arguments:

| Template | Description |
| --------------------------- | ------------------------------------------ |
| `{{.params.name}}` | Input parameter |
| `{{.steps.id.output}}` | Step output (map) |
| `{{.steps.id.output.text}}` | Text content from step output |
| `{{.steps.id.content}}` | Elicitation response content |
| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) |
| Template | Description |
| -------------------------------- | ------------------------------------------ |
| `{{.params.name}}` | Input parameter |
| `{{.steps.id.output}}` | Step output (map) |
| `{{.steps.id.output.text}}` | Text content from step output |
| `{{.steps.id.content}}` | Elicitation response content |
| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) |
| `{{.forEach.<itemVar>}}` | Current forEach item |
| `{{.forEach.<itemVar>.<field>}}` | Field on current forEach item |
| `{{.forEach.index}}` | Zero-based iteration index |

### Template functions
