RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends#55
Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect directly to remote MCP servers without MCPRemoteProxy infrastructure, resolving the forced-auth (#3104) and dual-boundary confusion (#4109) issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RFC should focus on design intent, not implementation code. Keep YAML/Mermaid examples, replace Go blocks with prose describing controller behavior, discovery logic, and TLS handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation details like specific file paths belong in the implementation, not the RFC design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
left a comment
This is a well-structured RFC with clear motivation, thorough security considerations, and good alternatives analysis. The naming convention (following Istio ServiceEntry) is a nice touch. A few issues worth addressing before implementation:
- `groupRef` type inconsistency (string vs object pattern) needs resolution
- `caBundleRef` resource type (Secret vs ConfigMap) is ambiguous across sections
- SSRF mitigation has a gap: HTTPS-only validation doesn't block private IP ranges
- Auth resolution timing in `ListWorkloadsInGroup()` description is unclear
- Static mode CA bundle propagation is unspecified
- `toolConfigRef` deferral creates an unacknowledged migration regression for MCPRemoteProxy users
- Clarify `groupRef` is plain string for consistency with MCPServer/MCPRemoteProxy
- Fix Alt 1 YAML example to use string form for `groupRef`
- Change `caBundleRef` to reference ConfigMap (CA certs are public data)
- Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate
- Clarify auth resolution loads config only, token exchange deferred to request time
- Specify CA bundle volume mount for static mode (PEM files, not env vars)
- Document `toolConfigRef` migration path via `aggregation.tools[].workload`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
left a comment
Looks like a very good improvement to me.
amirejaz
left a comment
This looks good to me. Added one comment
```yaml
metadata:
  name: context7
spec:
  remoteURL: https://mcp.context7.com/mcp
```
Nit: could we use a real public MCP server here, such as mcp-spec? context7 is currently being used as the example for both private and public MCP servers, which feels a bit confusing.
Question: Would it make sense to rename …
jerm-dro
left a comment
Looks great! I appreciate the alternatives section.
We need this!! I love it.
> `@@ -0,0 +1,904 @@`
> # RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends
(naming): if it's for remote MCPs only, why not "MCPRemoteServer"? just by the "MCPServerEntry" it wasn't clear to me it was for remote only
Thanks for the detailed RFC — the three problems are real and worth solving, and the security thinking (no operator-side probing, HTTPS enforcement, credential separation) is solid throughout.
The core question this review raises is: does solving these three problems require a new CRD, or can they be addressed by extending what already exists?
After reading the RFC against the actual codebase, the case for a new CRD is weaker than it appears:
- Problem #1 (forced OIDC on public remotes) can be fixed by making `oidcConfig` optional on `MCPRemoteProxy` — a one-field change (see comment 2).
- Problem #2 (dual auth boundary confusion) turns out not to exist in the code — `externalAuthConfigRef` and `oidcConfig` already serve separate purposes in `pkg/vmcp/workloads/k8s.go` (see comment 1).
- Problem #3 (pod waste) can be solved by adding a `direct: true` flag to `MCPRemoteProxy` that skips `ensureDeployment()` and `ensureService()` and uses `remoteURL` as the backend URL directly — touching ~4 files vs. the RFC's ~20 (see comments 3, 4).
The RFC does consider alternatives but doesn't seriously evaluate the most natural one: extending MCPRemoteProxy itself. Its alternatives section argues against extending MCPServer (fair) and against config-only approaches (fair), but never evaluates a lightweight mode on the resource that already exists for this exact purpose (comment 3).
Beyond the alternatives analysis, there are several concrete concerns: the operational cost of a new CRD is significantly underestimated (comment 4); the new CRD creates naming confusion alongside MCPServer and MCPRemoteProxy (comment 9); the MCPGroup status model would need restructuring for a third backend type (comment 5); the CA bundle volume-mounting introduces new operator complexity (comment 8); and the SSRF trade-off of moving outbound calls into the vMCP pod deserves explicit acknowledgement (comment 7).
Suggested path forward: Before investing in a new CRD, implement the simpler approach — make oidcConfig optional and add direct: true to MCPRemoteProxy. If real-world usage reveals that the RBAC separation story (platform teams vs. product teams managing backends independently) is a genuine need that can't be satisfied by MCPRemoteProxy's existing RBAC surface, revisit the CRD proposal with that as the primary motivation rather than the pod-waste argument.
If the team decides a new CRD is still the right call after considering the above, comments 5–12 cover the design-level issues that should be resolved before implementation begins.
🤖 Review assisted by Claude Code
+1 to what @ChrisJBurns said. I think we should consider reusing the existing CRDs and be more explicit about why that's not an option. There's also the cognitive load on users, the documentation load, etc.
I actually disagree with this. I think re-using existing CRDs would do us a disservice and further increase technical debt by using APIs that are not fit for purpose because they had very different intentions.
c2965b7
MCPServerEntry ships now to unblock near-term use cases. It will be superseded by MCPRemoteEndpoint, a unified CRD that combines direct and proxy remote connectivity under a single resource. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```yaml
name: remote-api-credentials
key: api-key

# OPTIONAL: Custom CA bundle for private remote servers using
```
Bug: Field name doesn't match existing type
addHeadersFromSecrets (plural) doesn't match the existing HeaderForwardConfig type which uses addHeadersFromSecret (singular). See cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.go line 20:
```go
type HeaderForwardConfig struct {
	AddPlaintextHeaders  map[string]string  `json:"addPlaintextHeaders,omitempty"`
	AddHeadersFromSecret []HeaderFromSecret `json:"addHeadersFromSecret,omitempty"`
}
```

This appears in multiple YAML examples throughout the RFC (lines 240, 310, 531). Should be `addHeadersFromSecret` everywhere to match the existing type.
> **Example: Unauthenticated public remote (resolves #3104):**
Bug: caBundleRef shape doesn't match existing CABundleSource type
The RFC shows a flat `{name, key}` struct:

```yaml
caBundleRef:
  name: internal-ca-bundle
  key: ca.crt
```

But the existing `CABundleSource` type in `mcpserver_types.go` (line 659) wraps a `corev1.ConfigMapKeySelector`:

```go
type CABundleSource struct {
	ConfigMapRef *corev1.ConfigMapKeySelector `json:"configMapRef,omitempty"`
}
```

So the actual YAML shape should be:

```yaml
caBundleRef:
  configMapRef:
    name: internal-ca-bundle
    key: ca.crt
```

Recommend reusing the existing `CABundleSource` type for consistency across CRDs. This affects all YAML examples with `caBundleRef` (lines 249, 315).
> There is intentionally **no `RemoteReachable` condition**. The controller
> should NOT probe remote URLs because:
Naming: Condition types don't match codebase convention
The RFC uses a `-Valid` suffix (`GroupRefValid`, `AuthConfigValid`, `CABundleValid`), but the codebase consistently uses `-Validated`:

- MCPServer: `GroupRefValidated`, `ExternalAuthConfigValidated`, `CABundleRefValidated`
- MCPRemoteProxy: `GroupRefValidated`, `ExternalAuthConfigValidated`, `ToolConfigValidated`

For consistency, these should be:

- `GroupRefValidated` (not `GroupRefValid`)
- `ExternalAuthConfigValidated` (not `AuthConfigValid`)
- `CABundleRefValidated` (not `CABundleValid`)
Also, the Go constant names should follow the existing prefix pattern: ConditionTypeMCPServerEntryGroupRefValidated, etc.
| Field | Type | Description |
|-------|------|-------------|
| `conditions` | []Condition | Standard Kubernetes conditions (see table below). |
| `observedGeneration` | int64 | Most recent generation observed by the controller. |
Missing: .status.phase field
Every existing CRD in the codebase has a `.status.phase` field with a typed enum:

- MCPServer: `Pending`, `Running`, `Failed`, `Terminating`, `Stopped`
- MCPRemoteProxy: `Pending`, `Ready`, `Failed`, `Terminating`
- VirtualMCPServer: `Pending`, `Ready`, `Degraded`, `Failed`
- MCPGroup: `Ready`, `Pending`, `Failed`

The RFC status only lists `conditions` and `observedGeneration` but no `phase`. For a validation-only resource, sensible phases would be: `Pending`, `Ready`, `Failed`.
Also missing: kubebuilder print column markers. Every existing CRD specifies +kubebuilder:printcolumn for Phase/Ready, URL, Transport, Age, etc. The RFC should specify these for kubectl get mcpentry output.
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `externalAuthConfigRef` | object | No | Reference to an MCPExternalAuthConfig in the same namespace. Omit for unauthenticated endpoints. |
| `headerForward` | object | No | Header forwarding configuration. Reuses existing `HeaderForwardConfig` type from MCPRemoteProxy. |
| `caBundleRef` | object | No | Reference to a ConfigMap containing a custom CA certificate bundle for TLS verification. ConfigMap is used rather than Secret because CA certificates are public data, consistent with the `kube-root-ca.crt` pattern. |
Design: Use a spec field instead of annotation for HTTPS override
The existing codebase uses insecureAllowHTTP: bool spec fields (e.g., InlineOIDCConfig.InsecureAllowHTTP in mcpserver_types.go line 719, and in oidc_validation.go line 59). The annotation-based approach (toolhive.stacklok.dev/allow-insecure) is a divergence with security implications:
- RBAC escalation: annotations can be set by anyone with `patch` rights on metadata, even without `update` on spec
- No CEL validation: annotations bypass kubebuilder validation markers — a typo like `"tru"` silently enforces HTTPS
- Visibility: annotations don't show up in `kubectl get` print columns
Recommend using a spec field (e.g., insecureAllowHTTP: bool with +kubebuilder:default=false) consistent with the existing pattern.
```yaml
# OPTIONAL: Header forwarding configuration.
# Reuses existing pattern from MCPRemoteProxy (THV-0026).
headerForward:
  addPlaintextHeaders:
```
Missing: Multi-upstream ExternalAuthConfig rejection + header forwarding interaction
Multi-upstream: Both MCPServer and MCPRemoteProxy explicitly reject multi-upstream ExternalAuthConfig (see ConditionReasonExternalAuthConfigMultiUpstream). The RFC should specify that MCPServerEntry does the same.
Header forwarding + token exchange interaction: What happens when both externalAuthConfigRef (token exchange → sets Authorization) and headerForward.addHeadersFromSecrets (also sets Authorization) are configured? The existing types have no CEL validation preventing this conflict.
Recommend either:

- Define clear precedence rules (e.g., `externalAuthConfigRef` wins for any header it sets)
- Or add CEL/webhook validation rejecting `headerForward` entries that set `Authorization` when `externalAuthConfigRef` is also present
- Document that `headerForward` is for non-auth headers (tenant IDs, correlation IDs) while `externalAuthConfigRef` handles all authentication
> `status.serverCount` reflect both types of backends in the group.

> ##### Operator: VirtualMCPServer Controller Update
Incomplete: MCPGroup controller changes are substantial
The MCPGroup controller update is described in one paragraph but requires significant work:
- New status fields: `ServerEntries []string` + `ServerEntryCount int` (CRD schema change)
- New `findReferencingMCPServerEntries()` method
- New `populateServerEntryStatus()` method
- Update `handleDeletion()` to iterate MCPServerEntry resources (without this, deleting an MCPGroup leaves MCPServerEntries with stale `GroupRefValid=True` conditions)
- New watch in `SetupWithManager()` for MCPServerEntry
- New field index for `spec.groupRef` on MCPServerEntry
- Update the `ConditionTypeMCPServersChecked` message (or rename it to `MembersChecked`)
The RFC should enumerate these explicitly so implementation scope is clear.
> HTTP client in `pkg/vmcp/client/client.go` must be updated to:
>
> 1. Use the system CA certificate pool by default (for public CAs).
> 2. Optionally append a custom CA bundle from `caBundleRef` (for private
Implementation gap: No existing per-backend TLS configuration code path
The current newBackendTransport() in pkg/vmcp/client/client.go (line 90) clones http.DefaultTransport with no per-backend TLS customization. For caBundleRef to work, a new mechanism is needed to inject custom CA bundles per backend into the HTTP client factory.
This needs to flow through BackendTarget (adding a CABundle []byte field or similar) or be injected at the transport factory level. The RFC should call out this gap since it's a meaningful implementation change to the vMCP client infrastructure.
Also worth noting: for static mode, the CA bundle ConfigMap must be mounted as a volume into the VirtualMCPServer pod (not the nonexistent MCPServerEntry pod). The mount path convention (e.g., /etc/toolhive/ca-bundles/<entry-name>/ca.crt) and how it's referenced in the generated backend ConfigMap should be specified.
> infrastructure from THV-0014 must be extended to watch MCPServerEntry
> resources.

> The `MCPServerEntryWatcher` follows the same reconciler pattern as the
Note: Session recovery is more important for entry backends
The existing mcpSession explicitly documents a "Phase 1 limitation — no reconnection: if the underlying transport drops... all subsequent operations on this backend will fail." For in-cluster backends this is rarely hit (pods are colocated), but MCPServerEntry backends over the internet will experience session drops much more frequently due to:
- Network interruptions
- Remote server restarts/redeployments
- Remote server session TTL expiration
- TLS renegotiation failures
The MCP spec states servers may terminate sessions at any time and clients should re-initialize when receiving HTTP 404 for a known session ID. Consider documenting this as a known limitation with a follow-up plan, or addressing session re-establishment as part of this RFC's implementation phases.
> The `ListWorkloadsInGroup()` function in `pkg/vmcp/workloads/k8s.go` is
> extended to discover MCPServerEntry resources in addition to MCPServer
> resources. For each MCPServerEntry in the group, vMCP:
Missing: WorkloadKindMCPServerEntry and WorkloadType constants
The existing WorkloadReference enum in mcpoidcconfig_types.go (line 148-158) defines:
```go
WorkloadKindMCPServer        = "MCPServer"
WorkloadKindVirtualMCPServer = "VirtualMCPServer"
WorkloadKindMCPRemoteProxy   = "MCPRemoteProxy"
```

And `pkg/vmcp/workloads/discoverer.go` defines `WorkloadTypeMCPServer` and `WorkloadTypeMCPRemoteProxy`.
Both need new constants for MCPServerEntry, plus the +kubebuilder:validation:Enum on WorkloadReference.Kind must be updated. The MCPExternalAuthConfig controller's ReferencingWorkloads tracking also needs to recognize MCPServerEntry references.
Also: BackendReconciler.fetchBackendResource() currently tries MCPServer then MCPRemoteProxy sequentially. A third try-fetch is fine for three types but the RFC should acknowledge this pattern doesn't scale well beyond that.
…le complexity Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Introduces `MCPServerEntry`, a new lightweight CRD (short name: `mcpentry`) that allows VirtualMCPServer to connect directly to remote MCP servers without deploying MCPRemoteProxy infrastructure. MCPServerEntry is a pod-less configuration resource that declares a remote MCP endpoint and belongs to an MCPGroup.

Motivation

vMCP currently requires MCPRemoteProxy (which spawns `thv-proxyrunner` pods) to reach remote MCP servers. This creates three problems:

- Forced OIDC configuration even for public, unauthenticated remote servers (#3104)
- `externalAuthConfigRef` serves two conflicting purposes (vMCP-to-proxy AND proxy-to-remote) (#4109)
- A proxy pod per remote server, wasting cluster resources

Design Highlights

- `externalAuthConfigRef` has one unambiguous purpose (auth to the remote)
- Reuses the `headerForward` pattern from THV-0026
- `caBundleRef` for private remote servers with internal CAs
- Will be superseded by `MCPRemoteEndpoint` (THV-0067), which unifies direct and proxy modes into a single CRD

Related RFCs

Test plan