
RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends#55

Open

JAORMX wants to merge 9 commits into main from jaosorior/mcpserverentry-direct-remote-backends

Conversation

@JAORMX
Contributor

@JAORMX JAORMX commented Mar 12, 2026

Summary

Introduces MCPServerEntry, a new lightweight CRD (short name: mcpentry) that allows VirtualMCPServer to connect directly to remote MCP servers without deploying MCPRemoteProxy infrastructure. MCPServerEntry is a pod-less configuration resource that declares a remote MCP endpoint and belongs to an MCPGroup.
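As a minimal sketch, an MCPServerEntry manifest could look like the following. The field names come from this RFC's discussion; the apiVersion and concrete values are illustrative assumptions, not settled design:

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1  # assumed API group/version
kind: MCPServerEntry
metadata:
  name: context7
spec:
  # Remote MCP endpoint; no pods, services, or deployments are created.
  remoteURL: https://mcp.context7.com/mcp
  # Membership in an MCPGroup, as a plain string (per review feedback below).
  groupRef: production-group
  # Optional: auth to the remote; omit entirely for unauthenticated endpoints.
  externalAuthConfigRef:
    name: remote-auth
```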

Motivation

vMCP currently requires MCPRemoteProxy (which spawns thv-proxyrunner pods) to reach remote MCP servers. This creates three problems:

  1. Forced auth on public remotes (#3104): MCPRemoteProxy requires OIDC even when vMCP already handles client auth
  2. Dual auth boundary confusion (#4109): externalAuthConfigRef serves two conflicting purposes (vMCP-to-proxy AND proxy-to-remote)
  3. Resource waste: Every remote backend needs a Deployment + Service + Pod just to make an HTTP call

Design Highlights

  • Zero infrastructure: MCPServerEntry deploys no pods, services, or deployments — pure configuration
  • Single auth boundary: externalAuthConfigRef has one unambiguous purpose (auth to the remote)
  • Validation-only controller: Validates references (MCPGroup, MCPExternalAuthConfig) but never probes remote URLs
  • Both discovery modes: Works with static (operator-generated ConfigMap) and dynamic (vMCP runtime discovery) from THV-0014
  • Header forwarding: Reuses existing headerForward pattern from THV-0026
  • Custom CA support: caBundleRef for private remote servers with internal CAs
  • Planned supersession: Will be superseded by MCPRemoteEndpoint (THV-0067), which unifies direct and proxy modes into a single CRD

Related RFCs

  • THV-0008 (Virtual MCP Server)
  • THV-0009 (Remote MCP Proxy)
  • THV-0010 (MCPGroup CRD)
  • THV-0014 (K8s-Aware vMCP)
  • THV-0026 (Header Passthrough)
  • THV-0067 (MCPRemoteEndpoint — future unified CRD)

Test plan

  • Review RFC structure and completeness
  • Validate CRD design against existing patterns
  • Verify security considerations (auth boundaries, SSRF mitigations)
  • Confirm MCPRemoteEndpoint supersession path is documented
  • Review comparison guide (MCPRemoteProxy vs MCPServerEntry)

JAORMX and others added 4 commits March 12, 2026 09:03
Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect
directly to remote MCP servers without MCPRemoteProxy infrastructure,
resolving the forced-auth (#3104) and dual-boundary confusion (#4109)
issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RFC should focus on design intent, not implementation code.
Keep YAML/Mermaid examples, replace Go blocks with prose
describing controller behavior, discovery logic, and TLS
handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation details like specific file paths belong in
the implementation, not the RFC design document.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@yrobla yrobla left a comment

This is a well-structured RFC with clear motivation, thorough security considerations, and good alternatives analysis. The naming convention (following Istio ServiceEntry) is a nice touch. A few issues worth addressing before implementation:

  • groupRef type inconsistency (string vs object pattern) needs resolution
  • caBundleRef resource type (Secret vs ConfigMap) is ambiguous across sections
  • SSRF mitigation has a gap: HTTPS-only validation doesn't block private IP ranges
  • Auth resolution timing in ListWorkloadsInGroup() description is unclear
  • Static mode CA bundle propagation is unspecified
  • toolConfigRef deferral creates an unacknowledged migration regression for MCPRemoteProxy users

- Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy
- Fix Alt 1 YAML example to use string form for groupRef
- Change caBundleRef to reference ConfigMap (CA certs are public data)
- Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate
- Clarify auth resolution loads config only, token exchange deferred to request time
- Specify CA bundle volume mount for static mode (PEM files, not env vars)
- Document toolConfigRef migration path via aggregation.tools[].workload

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
yrobla previously approved these changes Mar 12, 2026
Contributor

@yrobla yrobla left a comment

Looks like a very good improvement to me

amirejaz
amirejaz previously approved these changes Mar 12, 2026
Contributor

@amirejaz amirejaz left a comment

This looks good to me. Added one comment

metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.com/mcp
Contributor

Nit: could we use a real public MCP server here, such as mcp-spec? context7 is currently being used as the example for both private and public MCP servers, which feels a bit confusing.

@amirejaz
Contributor

Question: Would it make sense to rename MCPServerEntry to MCPRemoteServerEntry to improve clarity?

jerm-dro
jerm-dro previously approved these changes Mar 13, 2026
Contributor

@jerm-dro jerm-dro left a comment

Looks great! I appreciate the alternatives section.

@lorr1

lorr1 commented Mar 17, 2026

We need this!! I love it.

@@ -0,0 +1,904 @@
# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends

(naming): if it's for remote MCPs only, why not "MCPRemoteServer"? Just from the name "MCPServerEntry" it wasn't clear to me that it was remote-only.

Contributor

@ChrisJBurns ChrisJBurns left a comment

Thanks for the detailed RFC — the three problems are real and worth solving, and the security thinking (no operator-side probing, HTTPS enforcement, credential separation) is solid throughout.

The core question this review raises is: does solving these three problems require a new CRD, or can they be addressed by extending what already exists?

After reading the RFC against the actual codebase, the case for a new CRD is weaker than it appears:

  • Problem #1 (forced OIDC on public remotes) can be fixed by making oidcConfig optional on MCPRemoteProxy — a one-field change (see comment 2).
  • Problem #2 (dual auth boundary confusion) turns out not to exist in the code — externalAuthConfigRef and oidcConfig already serve separate purposes in pkg/vmcp/workloads/k8s.go (see comment 1).
  • Problem #3 (pod waste) can be solved by adding a direct: true flag to MCPRemoteProxy that skips ensureDeployment() and ensureService() and uses remoteURL as the backend URL directly — touching ~4 files vs. the RFC's ~20 (see comments 3, 4).

The RFC does consider alternatives but doesn't seriously evaluate the most natural one: extending MCPRemoteProxy itself. Its alternatives section argues against extending MCPServer (fair) and against config-only approaches (fair), but never evaluates a lightweight mode on the resource that already exists for this exact purpose (comment 3).

Beyond the alternatives analysis, there are several concrete concerns: the operational cost of a new CRD is significantly underestimated (comment 4); the new CRD creates naming confusion alongside MCPServer and MCPRemoteProxy (comment 9); the MCPGroup status model would need restructuring for a third backend type (comment 5); the CA bundle volume-mounting introduces new operator complexity (comment 8); and the SSRF trade-off of moving outbound calls into the vMCP pod deserves explicit acknowledgement (comment 7).

Suggested path forward: Before investing in a new CRD, implement the simpler approach — make oidcConfig optional and add direct: true to MCPRemoteProxy. If real-world usage reveals that the RBAC separation story (platform teams vs. product teams managing backends independently) is a genuine need that can't be satisfied by MCPRemoteProxy's existing RBAC surface, revisit the CRD proposal with that as the primary motivation rather than the pod-waste argument.

If the team decides a new CRD is still the right call after considering the above, comments 5–12 cover the design-level issues that should be resolved before implementation begins.

🤖 Review assisted by Claude Code

@jhrozek
Contributor

jhrozek commented Mar 17, 2026

+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit about why that's not an option. There's also the cognitive load on the users, the documentation load, etc.

@JAORMX
Contributor Author

JAORMX commented Mar 18, 2026

+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit in why it's not an option. There's also the cognitive load on the users, the documentation load etc..

I actually disagree with this. I think re-using existing CRDs would do us a disservice and further increase technical debt by using APIs that are not fit for purpose because they had very different intentions.

@ChrisJBurns ChrisJBurns dismissed stale reviews from jerm-dro, amirejaz, and yrobla via c2965b7 March 18, 2026 17:09
@ChrisJBurns ChrisJBurns changed the title from "RFC: MCPServerEntry CRD for direct remote MCP server backends" to "RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity" Mar 19, 2026
MCPServerEntry ships now to unblock near-term use cases. It will be
superseded by MCPRemoteEndpoint, a unified CRD that combines direct
and proxy remote connectivity under a single resource.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the jaosorior/mcpserverentry-direct-remote-backends branch from dcc74ba to 397ccb4 on April 7, 2026 11:16
JAORMX and others added 2 commits April 7, 2026 14:21
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX changed the title from "RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity" to "RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends" Apr 7, 2026
name: remote-api-credentials
key: api-key

# OPTIONAL: Custom CA bundle for private remote servers using
Contributor

Bug: Field name doesn't match existing type

addHeadersFromSecrets (plural) doesn't match the existing HeaderForwardConfig type which uses addHeadersFromSecret (singular). See cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.go line 20:

type HeaderForwardConfig struct {
    AddPlaintextHeaders  map[string]string    `json:"addPlaintextHeaders,omitempty"`
    AddHeadersFromSecret []HeaderFromSecret   `json:"addHeadersFromSecret,omitempty"`
}

This appears in multiple YAML examples throughout the RFC (lines 240, 310, 531). Should be addHeadersFromSecret everywhere to match the existing type.


**Example: Unauthenticated public remote (resolves #3104):**

Contributor

Bug: caBundleRef shape doesn't match existing CABundleSource type

The RFC shows a flat {name, key} struct:

caBundleRef:
  name: internal-ca-bundle
  key: ca.crt

But the existing CABundleSource type in mcpserver_types.go (line 659) wraps a corev1.ConfigMapKeySelector:

type CABundleSource struct {
    ConfigMapRef *corev1.ConfigMapKeySelector `json:"configMapRef,omitempty"`
}

So the actual YAML shape should be:

caBundleRef:
  configMapRef:
    name: internal-ca-bundle
    key: ca.crt

Recommend reusing the existing CABundleSource type for consistency across CRDs. This affects all YAML examples with caBundleRef (lines 249, 315).


There is intentionally **no `RemoteReachable` condition**. The controller
should NOT probe remote URLs because:

Contributor

Naming: Condition types don't match codebase convention

The RFC uses -Valid suffix (GroupRefValid, AuthConfigValid, CABundleValid), but the codebase consistently uses -Validated:

  • MCPServer: GroupRefValidated, ExternalAuthConfigValidated, CABundleRefValidated
  • MCPRemoteProxy: GroupRefValidated, ExternalAuthConfigValidated, ToolConfigValidated

For consistency, these should be:

  • GroupRefValidated (not GroupRefValid)
  • ExternalAuthConfigValidated (not AuthConfigValid)
  • CABundleRefValidated (not CABundleValid)

Also, the Go constant names should follow the existing prefix pattern: ConditionTypeMCPServerEntryGroupRefValidated, etc.

|-------|------|-------------|
| `conditions` | []Condition | Standard Kubernetes conditions (see table below). |
| `observedGeneration` | int64 | Most recent generation observed by the controller. |

Contributor

Missing: .status.phase field

Every existing CRD in the codebase has a .status.phase field with a typed enum:

  • MCPServer: Pending, Running, Failed, Terminating, Stopped
  • MCPRemoteProxy: Pending, Ready, Failed, Terminating
  • VirtualMCPServer: Pending, Ready, Degraded, Failed
  • MCPGroup: Ready, Pending, Failed

The RFC status only lists conditions and observedGeneration but no phase. For a validation-only resource, sensible phases would be: Pending, Ready, Failed.

Also missing: kubebuilder print column markers. Every existing CRD specifies +kubebuilder:printcolumn for Phase/Ready, URL, Transport, Age, etc. The RFC should specify these for kubectl get mcpentry output.
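A hedged sketch of what the missing phase enum and print columns could look like, mirroring the MCPRemoteProxy pattern minus Terminating (a validation-only resource owns no child objects). The constant and marker names here are assumptions, not existing toolhive code:

```go
package main

import "fmt"

// MCPServerEntryPhase sketches the proposed .status.phase enum.
// +kubebuilder:validation:Enum=Pending;Ready;Failed
type MCPServerEntryPhase string

const (
	MCPServerEntryPhasePending MCPServerEntryPhase = "Pending"
	MCPServerEntryPhaseReady   MCPServerEntryPhase = "Ready"
	MCPServerEntryPhaseFailed  MCPServerEntryPhase = "Failed"
)

// Print columns would follow the existing CRDs, e.g. on the type:
// +kubebuilder:printcolumn:name="Phase",type=string,JSONPath=`.status.phase`
// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.spec.remoteURL`
// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`

func main() {
	fmt.Println(MCPServerEntryPhasePending, MCPServerEntryPhaseReady, MCPServerEntryPhaseFailed)
}
```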

| `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. |
| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. |
| `caBundleRef` | object | No | Reference to a ConfigMap containing a custom CA certificate bundle for TLS verification. ConfigMap is used rather than Secret because CA certificates are public data, consistent with the `kube-root-ca.crt` pattern. |

Contributor

Design: Use a spec field instead of annotation for HTTPS override

The existing codebase uses insecureAllowHTTP: bool spec fields (e.g., InlineOIDCConfig.InsecureAllowHTTP in mcpserver_types.go line 719, and in oidc_validation.go line 59). The annotation-based approach (toolhive.stacklok.dev/allow-insecure) is a divergence with security implications:

  1. RBAC escalation: Annotations can be set by anyone with patch rights on metadata, even without update on spec
  2. No CEL validation: Annotations bypass kubebuilder validation markers — a typo like "tru" silently enforces HTTPS
  3. Visibility: Annotations don't show up in kubectl get print columns

Recommend using a spec field (e.g., insecureAllowHTTP: bool with +kubebuilder:default=false) consistent with the existing pattern.
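With the spec-field approach, an opt-out would look like the following sketch (the field placement is an assumption modeled on InlineOIDCConfig.InsecureAllowHTTP):

```yaml
spec:
  remoteURL: http://mcp.internal.example/mcp  # would otherwise be rejected
  # Explicit, schema-validated, RBAC-gated opt-out; defaults to false.
  insecureAllowHTTP: true
```

Unlike the annotation, a typo here (e.g. `insecureAllowHTTP: "tru"`) is rejected at admission time by the CRD schema rather than silently ignored.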

# OPTIONAL: Header forwarding configuration.
# Reuses existing pattern from MCPRemoteProxy (THV-0026).
headerForward:
  addPlaintextHeaders:
Contributor

Missing: Multi-upstream ExternalAuthConfig rejection + header forwarding interaction

Multi-upstream: Both MCPServer and MCPRemoteProxy explicitly reject multi-upstream ExternalAuthConfig (see ConditionReasonExternalAuthConfigMultiUpstream). The RFC should specify that MCPServerEntry does the same.

Header forwarding + token exchange interaction: What happens when both externalAuthConfigRef (token exchange → sets Authorization) and headerForward.addHeadersFromSecrets (also sets Authorization) are configured? The existing types have no CEL validation preventing this conflict.

Recommend either:

  • Define clear precedence rules (e.g., externalAuthConfigRef wins for any header it sets)
  • Or add CEL/webhook validation rejecting headerForward entries that set Authorization when externalAuthConfigRef is also present
  • Document that headerForward is for non-auth headers (tenant IDs, correlation IDs) while externalAuthConfigRef handles all authentication
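The first option (precedence) can be sketched as a small merge helper; the function and variable names are illustrative, not existing toolhive code:

```go
package main

import (
	"fmt"
	"net/http"
)

// mergeBackendHeaders applies forwarded headers first, then overwrites
// with auth-derived headers, so externalAuthConfigRef always wins for
// any header it sets (notably Authorization).
func mergeBackendHeaders(forwarded, fromAuth map[string]string) http.Header {
	h := http.Header{}
	for k, v := range forwarded {
		h.Set(k, v)
	}
	for k, v := range fromAuth {
		h.Set(k, v) // token-exchange output takes precedence
	}
	return h
}

func main() {
	merged := mergeBackendHeaders(
		map[string]string{"Authorization": "Bearer from-secret", "X-Tenant-Id": "acme"},
		map[string]string{"Authorization": "Bearer from-token-exchange"},
	)
	fmt.Println(merged.Get("Authorization"), merged.Get("X-Tenant-Id"))
	// → Bearer from-token-exchange acme
}
```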

`status.serverCount` reflect both types of backends in the group.

##### Operator: VirtualMCPServer Controller Update

Contributor

Incomplete: MCPGroup controller changes are substantial

The MCPGroup controller update is described in one paragraph but requires significant work:

  1. New status fields: ServerEntries []string + ServerEntryCount int (CRD schema change)
  2. New findReferencingMCPServerEntries() method
  3. New populateServerEntryStatus() method
  4. Update handleDeletion() to iterate MCPServerEntry resources (without this, deleting an MCPGroup leaves MCPServerEntries with stale GroupRefValid=True conditions)
  5. New watch in SetupWithManager() for MCPServerEntry
  6. New field index for spec.groupRef on MCPServerEntry
  7. Update ConditionTypeMCPServersChecked message (or rename to MembersChecked)

The RFC should enumerate these explicitly so implementation scope is clear.

HTTP client in `pkg/vmcp/client/client.go` must be updated to:

1. Use the system CA certificate pool by default (for public CAs).
2. Optionally append a custom CA bundle from `caBundleRef` (for private
Contributor

Implementation gap: No existing per-backend TLS configuration code path

The current newBackendTransport() in pkg/vmcp/client/client.go (line 90) clones http.DefaultTransport with no per-backend TLS customization. For caBundleRef to work, a new mechanism is needed to inject custom CA bundles per backend into the HTTP client factory.

This needs to flow through BackendTarget (adding a CABundle []byte field or similar) or be injected at the transport factory level. The RFC should call out this gap since it's a meaningful implementation change to the vMCP client infrastructure.

Also worth noting: for static mode, the CA bundle ConfigMap must be mounted as a volume into the VirtualMCPServer pod (not the nonexistent MCPServerEntry pod). The mount path convention (e.g., /etc/toolhive/ca-bundles/<entry-name>/ca.crt) and how it's referenced in the generated backend ConfigMap should be specified.
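For concreteness, the static-mode mount might look like the following sketch inside the VirtualMCPServer pod spec; the volume names are hypothetical and the path follows the convention suggested above:

```yaml
volumes:
  - name: ca-bundle-context7
    configMap:
      name: internal-ca-bundle
volumeMounts:
  - name: ca-bundle-context7
    mountPath: /etc/toolhive/ca-bundles/context7
    readOnly: true
# The generated backend ConfigMap would then reference
# /etc/toolhive/ca-bundles/context7/ca.crt for this entry.
```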

infrastructure from THV-0014 must be extended to watch MCPServerEntry
resources.

The `MCPServerEntryWatcher` follows the same reconciler pattern as the
Contributor

Note: Session recovery is more important for entry backends

The existing mcpSession explicitly documents a "Phase 1 limitation — no reconnection: if the underlying transport drops... all subsequent operations on this backend will fail." For in-cluster backends this is rarely hit (pods are colocated), but MCPServerEntry backends over the internet will experience session drops much more frequently due to:

  • Network interruptions
  • Remote server restarts/redeployments
  • Remote server session TTL expiration
  • TLS renegotiation failures

The MCP spec states servers may terminate sessions at any time and clients should re-initialize when receiving HTTP 404 for a known session ID. Consider documenting this as a known limitation with a follow-up plan, or addressing session re-establishment as part of this RFC's implementation phases.


The `ListWorkloadsInGroup()` function in `pkg/vmcp/workloads/k8s.go` is
extended to discover MCPServerEntry resources in addition to MCPServer
resources. For each MCPServerEntry in the group, vMCP:
Contributor

Missing: WorkloadKindMCPServerEntry and WorkloadType constants

The existing WorkloadReference enum in mcpoidcconfig_types.go (line 148-158) defines:

WorkloadKindMCPServer        = "MCPServer"
WorkloadKindVirtualMCPServer = "VirtualMCPServer"
WorkloadKindMCPRemoteProxy   = "MCPRemoteProxy"

And pkg/vmcp/workloads/discoverer.go defines WorkloadTypeMCPServer and WorkloadTypeMCPRemoteProxy.

Both need new constants for MCPServerEntry, plus the +kubebuilder:validation:Enum on WorkloadReference.Kind must be updated. The MCPExternalAuthConfig controller's ReferencingWorkloads tracking also needs to recognize MCPServerEntry references.

Also: BackendReconciler.fetchBackendResource() currently tries MCPServer then MCPRemoteProxy sequentially. A third try-fetch is fine for three types but the RFC should acknowledge this pattern doesn't scale well beyond that.

…le complexity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


8 participants