Add E2E acceptance test for backend routing after proxy restart#4574
Open
Add E2E acceptance test for backend routing after proxy restart#4574
Conversation
MCPServer supports horizontal scaling with Redis session storage, but there was no E2E test verifying that a session established on one pod is accessible from a different pod. This test deploys an MCPServer with replicas=2 and Redis session storage, initializes an MCP session, then sends raw JSON-RPC requests directly to each pod IP using the same Mcp-Session-Id header to prove sessions are shared via Redis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pod IPs are not reachable from the CI runner host in Kind clusters. Replace direct pod IP HTTP calls with kubectl port-forward to each pod, which tunnels through the Kind node's network. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The MCPServer CRD's sessionStorage config was populated by the operator into RunConfig but the proxy runner never read it — sessions always used in-memory LocalStorage, making cross-replica routing non-functional. Add WithSessionStorage transport option and wire ScalingConfig.SessionRedis from RunConfig into the transport layer so both StdioTransport and HTTPTransport (transparent proxy) use Redis-backed session storage when configured. Rewrite the E2E test to use mcp-go clients throughout, including transport.WithSession to create a client on pod B that reuses the session established on pod A. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass SessionStorage through types.Config instead of a factory option with interface assertion. The factory now sets the field directly on each transport type during construction. Add clientA to codespell ignore list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When backendReplicas > 1 and the proxy runner restarts, the recovered session's backend_url points to the ClusterIP service. Kube-proxy may route to a backend pod that never handled the session's initialize, causing HTTP 404 / JSON-RPC -32001 "session not found" errors. This test creates an MCPServer with backendReplicas=2, establishes a session, deletes the proxy runner pod, and sends 5 requests with the same session ID. With 2 backends and random routing, P(all 5 hit the correct pod) ≈ 3%, making the failure reliably detectable. The test is expected to fail until pod-specific headless DNS routing is implemented in the proxy runner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4574 +/- ##
==========================================
- Coverage 69.06% 68.84% -0.23%
==========================================
Files 502 505 +3
Lines 51997 52408 +411
==========================================
+ Hits 35913 36078 +165
- Misses 13300 13542 +242
- Partials 2784 2788 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The namespace parameter is always defaultNamespace in current tests but is kept for reusability across future test contexts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
backendReplicas > 1and the proxy runner restarts, the recovered session'sbackend_urlpoints to the ClusterIP service. Kube-proxy may route to a backend pod that never handled the session'sinitialize, causing HTTP 404 / JSON-RPC-32001"session not found" errors.backendReplicas=2, establishes a session, deletes the proxy runner pod, and sends 5 requests with the same session ID. With 2 backends and random routing, P(all 5 hit correct pod) ≈ 3%, making the failure reliably detectable.Type of change
Test plan
task test-e2e) — this PR adds the test; it is expected to fail on CI, proving the bug existstask test) — no production code changed, existing tests unaffectedDoes this introduce a user-facing change?
No
Special notes for reviewers
This test is intentionally expected to fail on CI. It demonstrates the backend routing bug that will be fixed in a follow-up PR implementing pod-specific headless DNS routing. The test context
"backendReplicas=2 and proxy runner restarts"will pass once the fix lands.Generated with Claude Code