diff --git a/docs/toolhive/concepts/backend-auth.mdx b/docs/toolhive/concepts/backend-auth.mdx
index 04efb279..14aa7bcd 100644
--- a/docs/toolhive/concepts/backend-auth.mdx
+++ b/docs/toolhive/concepts/backend-auth.mdx
@@ -211,18 +211,23 @@ deployments using the ToolHive Operator.
 - **Direct upstream redirect:** The embedded authorization server redirects
   clients directly to the upstream provider for authentication (for example,
   GitHub or Atlassian).
-- **Single upstream provider:** Currently supports one upstream identity
-  provider per configuration.
-
-:::info[Chained authentication not yet supported]
-
-The embedded authorization server redirects clients directly to the upstream
-provider. This means the upstream provider must be the service whose API the MCP
-server calls. Chained authentication—where a client authenticates with a
+- **Multiple upstream providers (VirtualMCPServer):** VirtualMCPServer supports
+  configuring multiple upstream identity providers with sequential
+  authentication. When multiple providers are configured, the authorization
+  server chains the authentication flow through each provider in sequence,
+  collecting tokens from all of them. This enables scenarios where backend tools
+  require tokens from different providers (such as a corporate IdP and GitHub).
+
+:::info[Chained authentication for MCPServer]
+
+MCPServer and MCPRemoteProxy support only one upstream provider. The embedded
+authorization server redirects clients directly to that provider, so the
+provider must be the service whose API the MCP server calls. If your MCPServer
+deployment requires chained authentication—where a client authenticates with a
 corporate IdP like Okta, which then federates to an external provider like
-GitHub—is not yet supported. If your deployment requires this pattern, consider
-using [token exchange](#same-idp-with-token-exchange) with a federated identity
-provider instead.
+GitHub—consider using [token exchange](#same-idp-with-token-exchange) with a
+federated identity provider instead, or use a VirtualMCPServer with multiple
+upstream providers.
 
 :::
 
diff --git a/docs/toolhive/guides-k8s/auth-k8s.mdx b/docs/toolhive/guides-k8s/auth-k8s.mdx
index d836d94f..a569534e 100644
--- a/docs/toolhive/guides-k8s/auth-k8s.mdx
+++ b/docs/toolhive/guides-k8s/auth-k8s.mdx
@@ -599,13 +599,13 @@ kubectl apply -f embedded-auth-config.yaml
 
 **Configuration reference:**
 
-| Field                  | Description                                                                                                            |
-| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |
-| `issuer`               | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs.                            |
-| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation.              |
-| `hmacSecretRefs`       | References to Secrets with symmetric keys for signing authorization codes and refresh tokens.                          |
-| `tokenLifespans`       | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
-| `upstreamProviders`    | Configuration for the upstream identity provider. Currently supports one provider.                                     |
+| Field                  | Description                                                                                                                                                                   |
+| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `issuer`               | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs.                                                                                   |
+| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation.                                                                     |
+| `hmacSecretRefs`       | References to Secrets with symmetric keys for signing authorization codes and refresh tokens.                                                                                 |
+| `tokenLifespans`       | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m).                                                        |
+| `upstreamProviders`    | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
 
 **Step 5: Create the MCPServer resource**
 
diff --git a/docs/toolhive/guides-k8s/redis-session-storage.mdx b/docs/toolhive/guides-k8s/redis-session-storage.mdx
index 35ac2909..d47f5b85 100644
--- a/docs/toolhive/guides-k8s/redis-session-storage.mdx
+++ b/docs/toolhive/guides-k8s/redis-session-storage.mdx
@@ -2,7 +2,7 @@
 title: Redis Sentinel session storage
 description:
   How to deploy Redis Sentinel and configure persistent session storage for the
-  ToolHive embedded authorization server.
+  ToolHive embedded authorization server and for horizontal scaling.
 ---
 
 Deploy Redis Sentinel and configure it as the session storage backend for the
@@ -12,6 +12,11 @@ re-authenticate.
 Redis Sentinel provides persistent storage with automatic master discovery,
 ACL-based access control, and optional failover when replicas are configured.
 
+Redis session storage is also required for horizontal scaling when running
+multiple [MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
+[VirtualMCPServer](../guides-vmcp/scaling-and-performance.mdx#session-storage-for-multi-replica-deployments)
+replicas, so that sessions are shared across pods.
+
 :::info[Prerequisites]
 
 Before you begin, ensure you have:
 
diff --git a/docs/toolhive/guides-k8s/run-mcp-k8s.mdx b/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
index 2d04e672..d54855e7 100644
--- a/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
+++ b/docs/toolhive/guides-k8s/run-mcp-k8s.mdx
@@ -439,6 +439,86 @@ For more details about a specific MCP server:
 
 kubectl -n <namespace> describe mcpserver <name>
 ```
 
+## Horizontal scaling
+
+MCPServer creates two separate workloads: a proxy runner Deployment and a
+backend MCP server StatefulSet.
+You can scale each independently:
+
+- `spec.replicas` controls the proxy runner pod count
+- `spec.backendReplicas` controls the backend MCP server pod count
+
+The proxy runner handles authentication, MCP protocol framing, and session
+management; it is stateless with respect to tool execution. The backend runs
+the actual MCP server and executes tools.
+
+Common configurations:
+
+- **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful when
+  auth and connection overhead is the bottleneck with a single backend.
+- **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful
+  when tool execution is CPU/memory-bound and the proxy is not a bottleneck.
+  The backend StatefulSet uses client-IP session affinity to route repeated
+  connections to the same pod — subject to the same NAT limitations as
+  proxy-level affinity.
+- **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
+  Redis session storage is required when `replicas > 1`.
+
+```yaml title="MCPServer resource"
+spec:
+  replicas: 2
+  backendReplicas: 3
+  sessionStorage:
+    provider: redis
+    address: redis-master.toolhive-system.svc.cluster.local:6379 # Update to match your Redis Service location
+    db: 0
+    keyPrefix: mcp-sessions
+    passwordRef:
+      name: redis-secret
+      key: password
+```
+
+When running multiple replicas, configure
+[Redis session storage](./redis-session-storage.mdx) so that sessions are
+shared across pods. If you omit `replicas` or `backendReplicas`, the operator
+defers replica management to an HPA or other external controller.
+
+:::note
+
+The `SessionStorageWarning` condition fires only when `spec.replicas > 1`.
+Scaling only the backend (`backendReplicas > 1`) does not trigger a warning,
+but backend client-IP affinity is still unreliable behind NAT or shared egress
+IPs.
+
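+
+When the warning fires, it appears as a status condition on the MCPServer
+resource. A sketch of the condition's shape (the `type` is documented above;
+the status and message values here are illustrative):
+
+```yaml
+status:
+  conditions:
+    - type: SessionStorageWarning
+      status: 'True'
+      message: spec.replicas > 1 but no shared session storage is configured
+```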
+:::
+
+:::note[Connection draining on scale-down]
+
+When a proxy runner pod is terminated (scale-in, rolling update, or node
+eviction), Kubernetes sends SIGTERM and the proxy drains in-flight requests for
+up to 30 seconds before force-closing connections. The grace period and drain
+timeout are both 30 seconds with no headroom, so long-lived SSE or streaming
+connections may be dropped if they exceed the drain window.
+
+No preStop hook is injected by the operator. If your workload requires
+additional time — for example, to let kube-proxy propagate endpoint removal
+before the pod stops accepting traffic — override
+`terminationGracePeriodSeconds` via `podTemplateSpec`:
+
+```yaml
+spec:
+  podTemplateSpec:
+    spec:
+      terminationGracePeriodSeconds: 60
+```
+
+The same 30-second default applies to the backend StatefulSet.
+
+:::
+
+:::warning[Stdio transport limitation]
+
+Backends using the `stdio` transport are limited to a single replica. The
+operator rejects configurations with `backendReplicas` greater than 1 for stdio
+backends.
+
+:::
+
 ## Next steps
 
 - [Connect clients to your MCP servers](./connect-clients.mdx) from outside the
@@ -455,6 +535,8 @@ kubectl -n <namespace> describe mcpserver <name>
 
 - [Kubernetes CRD reference](../reference/crd-spec.md#apiv1alpha1mcpserver) -
   Reference for the `MCPServer` Custom Resource Definition (CRD)
+- [vMCP scaling and performance](../guides-vmcp/scaling-and-performance.mdx) -
+  Scale Virtual MCP Server deployments
 - [Deploy the operator](./deploy-operator.mdx) - Install the ToolHive operator
 - [Build MCP containers](../guides-cli/build-containers.mdx) - Create custom
   MCP server container images
diff --git a/docs/toolhive/guides-vmcp/composite-tools.mdx b/docs/toolhive/guides-vmcp/composite-tools.mdx
index 41f178fc..86dc3553 100644
--- a/docs/toolhive/guides-vmcp/composite-tools.mdx
+++ b/docs/toolhive/guides-vmcp/composite-tools.mdx
@@ -19,6 +19,7 @@ backend MCP servers, handling dependencies and collecting results.
   wait for their prerequisites
 - **Template expansion**: Dynamic arguments using step outputs
 - **Elicitation**: Request user input mid-workflow (approval gates, choices)
+- **Iteration**: Loop over collections with forEach steps
 - **Error handling**: Configurable abort, continue, or retry behavior
 - **Timeouts**: Workflow and per-step timeout configuration
 
@@ -290,7 +291,7 @@ spec:
 
 ### Steps
 
-Each step can be a tool call or an elicitation:
+Each step can be a tool call, an elicitation, or a forEach loop:
 
 ```yaml title="VirtualMCPServer resource"
 spec:
@@ -344,6 +345,89 @@ spec:
       timeout: '5m'
 ```
 
+### forEach steps
+
+Iterate over a collection from a previous step's output and execute a tool call
+for each item:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  config:
+    compositeTools:
+      - name: scan_repositories
+        description: Check each repository for security advisories
+        parameters:
+          type: object
+          properties:
+            org:
+              type: string
+          required:
+            - org
+        steps:
+          - id: list_repos
+            tool: github_list_repos
+            arguments:
+              org: '{{.params.org}}'
+          # highlight-start
+          - id: check_advisories
+            type: forEach
+            collection: '{{json .steps.list_repos.output.repositories}}'
+            itemVar: repo
+            maxParallel: 5
+            step:
+              type: tool
+              tool: github_list_security_advisories
+              arguments:
+                repo: '{{.forEach.repo.name}}'
+            onError:
+              action: continue
+            dependsOn: [list_repos]
+          # highlight-end
+```
+
+**forEach fields:**
+
+| Field           | Description                                           | Default |
+| --------------- | ----------------------------------------------------- | ------- |
+| `collection`    | Template expression that produces an array            | —       |
+| `itemVar`       | Variable name for the current item                    | item    |
+| `maxParallel`   | Maximum concurrent iterations (max 50)                | 10      |
+| `maxIterations` | Maximum total iterations (max 1000)                   | 100     |
+| `step`          | Inner step definition (tool call to execute per item) | —       |
+| `onError`       | Error handling: `abort` (stop) or `continue` (skip)   | abort   |
+
+:::note
+
+`forEach` does not
+support `onError.action: retry`. Use `retry` on regular tool steps. The
+`maxParallel` cap of 50 is enforced at runtime regardless of the configured
+value.
+
+:::
+
+Access the current item inside the inner step using
+`{{.forEach.<itemVar>.<field>}}`. In the example above,
+`{{.forEach.repo.name}}` accesses the `name` field of the current repository.
+You can also use `{{.forEach.index}}` to access the zero-based iteration index.
+
+`maxParallel` controls how many iterations run concurrently **on the pod that
+received the composite tool request**. Iterations are not distributed across
+vMCP replicas — all parallel backend calls originate from a single pod
+regardless of `spec.replicas`. When sizing your deployment, account for the
+per-pod fan-out: a `maxParallel: 50` forEach step can open up to 50
+simultaneous connections to backend MCP servers from one pod. Ensure both the
+vMCP pod resources and the backend MCP servers can handle that per-pod
+concurrency.
+
+:::tip[Plan your workflow timeouts]
+
+With `maxIterations: 1000` (the cap) and `maxParallel: 10` (the default), a
+forEach loop runs up to 100 serial batches. If each backend call takes a few
+seconds, the total duration can easily exceed a workflow-level timeout. Set the
+workflow `timeout` to at least
+`ceil(maxIterations / maxParallel) × expected step duration` to avoid silent
+truncation.
+
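+
+For example, with a hypothetical run of up to 200 items, `maxParallel: 10`, and
+backend calls of roughly 5 seconds each, budget ceil(200 / 10) × 5 s = 100 s
+(all values here are illustrative):
+
+```yaml
+timeout: '3m' # ceil(200 / 10) × 5 s ≈ 100 s of expected work, plus headroom
+```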
+ +::: + ### Error handling Configure behavior when steps fail: @@ -507,13 +591,16 @@ without defaultResults defined Access workflow context in arguments: -| Template | Description | -| --------------------------- | ------------------------------------------ | -| `{{.params.name}}` | Input parameter | -| `{{.steps.id.output}}` | Step output (map) | -| `{{.steps.id.output.text}}` | Text content from step output | -| `{{.steps.id.content}}` | Elicitation response content | -| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) | +| Template | Description | +| -------------------------------- | ------------------------------------------ | +| `{{.params.name}}` | Input parameter | +| `{{.steps.id.output}}` | Step output (map) | +| `{{.steps.id.output.text}}` | Text content from step output | +| `{{.steps.id.content}}` | Elicitation response content | +| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) | +| `{{.forEach.}}` | Current forEach item | +| `{{.forEach..}}` | Field on current forEach item | +| `{{.forEach.index}}` | Zero-based iteration index | ### Template functions diff --git a/docs/toolhive/guides-vmcp/scaling-and-performance.mdx b/docs/toolhive/guides-vmcp/scaling-and-performance.mdx index 18e5029f..897d5a05 100644 --- a/docs/toolhive/guides-vmcp/scaling-and-performance.mdx +++ b/docs/toolhive/guides-vmcp/scaling-and-performance.mdx @@ -4,7 +4,10 @@ description: How to scale Virtual MCP Server deployments vertically and horizontally. --- -This guide explains how to scale Virtual MCP Server (vMCP) deployments. +This guide explains how to scale Virtual MCP Server (vMCP) deployments. For +MCPServer scaling, see +[Horizontal scaling](../guides-k8s/run-mcp-k8s.mdx#horizontal-scaling) in the +Kubernetes operator guide. ## Vertical scaling @@ -37,24 +40,62 @@ higher request volumes. ### How to scale horizontally -The VirtualMCPServer CRD does not have a `replicas` field. 
-The operator creates
-a Deployment named `vmcp-<name>` (where `<name>` is your VirtualMCPServer
-name) with 1 replica and preserves the replicas count, allowing you to manage
-scaling separately.
+Set the `replicas` field in your VirtualMCPServer spec to control the number
+of vMCP pods:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  replicas: 3
+```
+
+If you omit `replicas`, the operator defers replica management to an HPA or
+other external controller. You can also scale manually or with an HPA:
 
 **Option 1: Manual scaling**
 
 ```bash
-kubectl scale deployment vmcp-<name> -n <namespace> --replicas=3
+kubectl scale deployment vmcp-<name> -n <namespace> --replicas=3
 ```
 
 **Option 2: Autoscaling with HPA**
 
 ```bash
-kubectl autoscale deployment vmcp-<name> -n <namespace> \
+kubectl autoscale deployment vmcp-<name> -n <namespace> \
   --min=2 --max=5 --cpu-percent=70
 ```
 
+### Session storage for multi-replica deployments
+
+When running multiple replicas, configure Redis session storage so that
+sessions are shared across pods. Without session storage, a request routed to
+a different replica than the one that established the session will fail.
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  replicas: 3
+  sessionStorage:
+    provider: redis
+    address: redis-master.toolhive-system.svc.cluster.local:6379
+    db: 0
+    keyPrefix: vmcp-sessions
+    passwordRef:
+      name: redis-secret
+      key: password
+```
+
+See [Redis Sentinel session storage](../guides-k8s/redis-session-storage.mdx)
+for a complete Redis deployment guide.
+
+:::warning
+
+If you configure multiple replicas without session storage, the operator sets
+a `SessionStorageWarning` status condition on the resource but **still applies
+the replica count**. Pods will start, but requests routed to a replica that
+did not establish the session will fail. Ensure Redis is available before
+scaling beyond a single replica.
+
+ +::: + ### When horizontal scaling is challenging Horizontal scaling works well for **stateless backends** (fetch, search, @@ -63,22 +104,38 @@ read-only operations) where sessions can be resumed on any instance. However, **stateful backends** make horizontal scaling difficult: - **Stateful backends** (Playwright browser sessions, database connections, file - system operations) require requests to be routed to the same vMCP instance - that established the session + system operations) require requests to be routed to the same instance that + established the session - Session resumption may not work reliably for stateful backends The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how the Kubernetes Service routes repeated client connections. By default, it uses `ClientIP` affinity, which routes connections from the same client IP to the -same pod. You can configure this using the `sessionAffinity` field: +same pod: ```yaml spec: sessionAffinity: ClientIP # default ``` -For stateful backends, vertical scaling or dedicated vMCP instances per team/use -case are recommended instead of horizontal scaling. +:::warning[ClientIP affinity is unreliable behind NAT or shared egress IPs] + +`ClientIP` affinity relies on the source IP reaching kube-proxy. When clients +sit behind a NAT gateway, corporate proxy, or cloud load balancer (common in +EKS, GKE, and AKS), all traffic appears to originate from the same IP — routing +every client to the same pod and eliminating the benefit of horizontal scaling. +This fails silently: the deployment appears healthy but only one pod handles all +load. + +For stateless backends, set `sessionAffinity: None` so the Service load-balances +freely. For stateful backends where true per-session routing is required, +`ClientIP` affinity is a best-effort mechanism only. Prefer vertical scaling or +a dedicated vMCP instance per team instead. 
+ +::: + +For stateful backends, vertical scaling or dedicated instances per team/use case +are recommended instead of horizontal scaling. ## Next steps diff --git a/docs/toolhive/guides-vmcp/tool-aggregation.mdx b/docs/toolhive/guides-vmcp/tool-aggregation.mdx index 4044a2d4..f4657dce 100644 --- a/docs/toolhive/guides-vmcp/tool-aggregation.mdx +++ b/docs/toolhive/guides-vmcp/tool-aggregation.mdx @@ -146,6 +146,42 @@ spec: description: 'Create a new GitHub issue in the repository' ``` +### Annotation overrides + +Override MCP tool annotations to provide hints to LLM clients about tool +behavior. Annotations are optional—only set the fields you want to override: + +```yaml title="VirtualMCPServer resource" +spec: + config: + aggregation: + tools: + - workload: github + overrides: + delete_repository: + annotations: + destructiveHint: true + readOnlyHint: false + list_issues: + annotations: + title: 'List GitHub Issues' + readOnlyHint: true + idempotentHint: true +``` + +**Available annotation fields:** + +| Field | Type | Description | +| ----------------- | ------- | -------------------------------------------------- | +| `title` | string | Display title for the tool in MCP clients | +| `readOnlyHint` | boolean | Indicates the tool does not modify data | +| `destructiveHint` | boolean | Indicates the tool may delete or overwrite data | +| `idempotentHint` | boolean | Indicates repeated calls produce the same result | +| `openWorldHint` | boolean | Indicates the tool interacts with external systems | + +Annotation overrides can be combined with name and description overrides on the +same tool. + :::info You can also reference an `MCPToolConfig` resource using `toolConfigRef` instead