Add E2E coverage for newly added RPC methods across all SDKs #1610

Merged
stephentoub merged 2 commits into main from stephentoub/csharp-rpc-e2e-coverage on Jun 9, 2026
@stephentoub

Collaborator

Why

We aim for 100% of the JSON-RPC surface to be exercised by meaningful E2E tests, so the SDKs stay provably wired up to the runtime and the tests act as a behavioral backstop. A batch of RPC methods was recently added without E2E coverage. This fills that gap.

What

Adds real-ish E2E tests (asserting meaningful values, not just "no error") for the previously uncovered surface:

  • Server plugins / marketplaces - install/list/uninstall, enable/disable, update single/all, local marketplace browse/refresh/remove, MCP config cache reload, direct local install with deprecation warning
  • Server remote-control - status reporting, compare-and-swap transfer rejection, no-op steering when off, reaching the runtime for an unknown session
  • MCP server lifecycle - start/restart/stop, list tools + running status, register/unregister external client, reload with config, GitHub MCP config, OAuth-without-pending-request, error when listing tools for an unconnected server
  • Session-state extras - allowAll permissions get/set, current tool metadata, telemetry engagement id, list models, empty SQL todos, reload session plugins, idle activity
  • User-requested shell - execute and cancel
  • UI ephemeral query - single model-backed answer
  • Server misc - user-settings reload, attachment gating from non-extension connections, agent-registry spawn-gate, sessions.open without context, runtime.shutdown

Approach

Authored first in C# (35 tests / 7 files) so the assertions and isolation patterns could be reviewed, then ported faithfully to Python, Go, Rust, and TypeScript. All five suites share the same 35 recorded snapshots under test/snapshots/, so every language replays identical runtime exchanges. 35 tests per language, 175 total.

Designed against CI flakiness

These were written and reviewed specifically to avoid nondeterminism:

  • Shell commands pass the bare script body. The runtime wraps it in the platform shell (pwsh -Command / sh -c) itself, so there is no nested shell that could be orphaned on cancel and keep the work dir locked (this was a real Windows fixture-cleanup failure that the fix eliminates).
  • Synchronization is condition-based polling, never fixed sleeps or timeouts-as-sync.
  • Sessions are disposed deterministically; any session that could otherwise leak is wrapped in scoped disposal.
  • Error-path tests assert on the specific domain error, not only a generic "unhandled method", so a regression that changes the failure reason is caught.
  • Plugin/marketplace and session-spawning tests isolate a per-test home directory.
  • TypeScript sets explicit generous per-test timeouts because vitest's 30s global would otherwise preempt the 60s internal condition budgets.
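
For readers unfamiliar with the pattern, condition-based polling can be sketched roughly as below. This is a minimal illustration, not the harness code from this PR: the signature, the defaults, and the synchronous style are all assumptions (the real wait_for_condition helper may be async and take different parameters). The short sleep is only the poll interval; the wait ends as soon as the predicate holds or the budget runs out.

```python
import time
from typing import Callable


def wait_for_condition(
    predicate: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 0.05,
    description: str = "condition",
) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Hypothetical sketch: parameter names and defaults are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Check first so an already-true condition returns without sleeping.
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"timed out after {timeout}s waiting for {description}"
            )
        # The sleep only paces the polling; it is not the synchronization.
        time.sleep(interval)
```

A test would pass a predicate such as "the shell command reports completed" instead of sleeping for a fixed duration, so a slow CI machine only makes the test slower rather than making it fail.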

Notes for reviewers

  • The Python harness gains a small additive wait_for_condition helper and skips snapshot writes under GITHUB_ACTIONS, matching the other suites' replay behavior.
  • No snapshots were edited or added beyond the 35 shared recordings; all five languages reuse them.
  • Format/lint verified clean in every language: dotnet format, ruff, gofmt + go vet, cargo fmt, prettier.
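
As a rough illustration of the snapshot-write guard described above, the decision could look like the sketch below. The function name and its parameters are hypothetical; only the GITHUB_ACTIONS variable (which GitHub Actions sets to "true" on its runners) and the "skip on CI and on failures" behavior come from this PR.

```python
import os
from typing import Mapping, Optional


def should_write_snapshots(
    tests_failed: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only for a clean local run.

    Hypothetical helper: the real conftest.py logic may be shaped differently.
    """
    env = os.environ if env is None else env
    # GitHub Actions sets GITHUB_ACTIONS=true for every workflow step.
    running_in_ci = env.get("GITHUB_ACTIONS", "").lower() == "true"
    # Never persist recordings from CI or from a run that had failures.
    return not running_in_ci and not tests_failed
```

Taking the environment as a parameter keeps the check testable without mutating os.environ.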

Adds meaningful end-to-end tests for RPC surface area that was previously
uncovered: server plugins/marketplaces, server remote-control, MCP server
lifecycle, session-state extras, user-requested shell exec/cancel, the UI
ephemeral query, and miscellaneous server methods (settings reload, attachments
gating, agent-registry spawn gate, sessions.open, runtime.shutdown).

Authored first in C# (35 tests / 7 files) so the assertions and isolation
patterns could be reviewed, then ported faithfully to Python, Go, Rust, and
TypeScript. All five suites share the same 35 recorded snapshots under
test/snapshots so they exercise identical runtime exchanges.

The tests are written to be deterministic in CI: shell commands pass the bare
script body (the runtime wraps it in the platform shell itself, so no nested
shell can be orphaned on cancel and lock the work dir), synchronization is
condition-based polling rather than fixed sleeps, sessions are disposed
deterministically, and error-path tests assert on the specific domain error
rather than only a generic "unhandled method". Plugin/marketplace and
session-spawning tests isolate per-test home directories. The Python harness
gains a small wait_for_condition helper and skips snapshot writes under
GITHUB_ACTIONS to match the other suites.
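
A minimal sketch of per-test home-directory isolation, assuming the process under test locates its home through the HOME / USERPROFILE environment variables (that lookup, and the helper name isolated_home, are assumptions and not taken from the test suites):

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def isolated_home(prefix: str = "e2e-home-") -> Iterator[Dict[str, str]]:
    """Yield an environment dict whose home points at a fresh temp directory.

    Hypothetical helper: the variables the runtime actually reads are assumed.
    """
    with tempfile.TemporaryDirectory(prefix=prefix) as home:
        # Copy the current environment rather than mutating os.environ,
        # so tests running in parallel cannot observe each other's home.
        env = dict(os.environ)
        env["HOME"] = home          # POSIX convention
        env["USERPROFILE"] = home   # Windows convention
        yield env
    # The directory and everything installed into it is removed on exit.
```

The yielded dict would be passed as the environment of the spawned runtime, so plugins or marketplaces installed by one test never appear in another.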

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 9, 2026 17:07
@stephentoub stephentoub requested a review from a team as a code owner June 9, 2026 17:07

Copilot AI left a comment

Contributor

Pull request overview

Adds cross-SDK end-to-end coverage for recently added JSON-RPC methods, using the shared replaying proxy/snapshot harness so each language suite exercises the same recorded runtime exchanges.

Changes:

  • Added new E2E test suites across Rust, Python, Go, Node.js, and .NET for UI ephemeral query, user-requested shell execute/cancel, session-state “extras”, server remote-control, server plugins/marketplaces/misc, and MCP lifecycle.
  • Added/updated shared replay snapshots under test/snapshots/ for the new RPC coverage categories.
  • Extended the Python E2E harness with a polling helper (wait_for_condition) and adjusted teardown to skip writing snapshot cache in GitHub Actions.
Show a summary per file
| File | Description |
| --- | --- |
| `test/snapshots/rpc_ui_ephemeral_query/should_answer_ephemeral_query.yaml` | Snapshot for UI ephemeral query replay. |
| `test/snapshots/rpc_shell_user_requested/should_execute_user_requested_shell_command.yaml` | Snapshot for user-requested shell execute replay (no model conversation). |
| `test/snapshots/rpc_shell_user_requested/should_cancel_user_requested_shell_command.yaml` | Snapshot for user-requested shell cancel replay (no model conversation). |
| `test/snapshots/rpc_session_state_extras/should_report_session_activity_when_idle.yaml` | Snapshot for session activity RPC replay (no model conversation). |
| `test/snapshots/rpc_session_state_extras/should_reload_session_plugins.yaml` | Snapshot for session plugin reload replay (no model conversation). |
| `test/snapshots/rpc_session_state_extras/should_read_empty_sql_todos_for_fresh_session.yaml` | Snapshot for SQL todos RPC replay (no model conversation). |
| `test/snapshots/rpc_session_state_extras/should_list_models_for_session.yaml` | Snapshot for session model list replay (no model conversation). |
| `test/snapshots/rpc_session_state_extras/should_get_telemetry_engagement_id.yaml` | Snapshot for telemetry engagement id replay (no model conversation). |
| `test/snapshots/rpc_session_state_extras/should_get_current_tool_metadata_after_initialization.yaml` | Snapshot for initializing a turn prior to tool metadata query. |
| `test/snapshots/rpc_session_state_extras/should_get_and_set_allowall_permissions.yaml` | Snapshot for allow-all permissions replay (no model conversation). |
| `test/snapshots/rpc_server_remote_control/should_treat_set_steering_as_no_op_when_off.yaml` | Snapshot for remote-control steering replay (no model conversation). |
| `test/snapshots/rpc_server_remote_control/should_report_remote_control_status_as_off.yaml` | Snapshot for remote-control status replay (no model conversation). |
| `test/snapshots/rpc_server_remote_control/should_report_not_stopped_when_remote_control_is_off.yaml` | Snapshot for remote-control stop replay (no model conversation). |
| `test/snapshots/rpc_server_remote_control/should_reject_transfer_when_off_with_compare_and_swap.yaml` | Snapshot for remote-control transfer replay (no model conversation). |
| `test/snapshots/rpc_server_remote_control/should_reach_runtime_when_starting_remote_control_for_unknown_session.yaml` | Snapshot for remote-control start unknown-session failure-path replay. |
| `test/snapshots/rpc_server_plugins/should_update_single_marketplace_plugin.yaml` | Snapshot for plugin update replay (no model conversation). |
| `test/snapshots/rpc_server_plugins/should_update_all_installed_plugins.yaml` | Snapshot for plugin update-all replay (no model conversation). |
| `test/snapshots/rpc_server_plugins/should_reload_mcp_config_cache.yaml` | Snapshot for MCP config cache reload replay (no model conversation). |
| `test/snapshots/rpc_server_plugins/should_list_browse_refresh_and_remove_local_marketplace.yaml` | Snapshot for marketplace list/browse/refresh/remove replay (no model conversation). |
| `test/snapshots/rpc_server_plugins/should_install_list_and_uninstall_plugin_from_local_marketplace.yaml` | Snapshot for marketplace plugin install/list/uninstall replay (no model conversation). |
| `test/snapshots/rpc_server_plugins/should_install_direct_local_plugin_with_deprecation_warning.yaml` | Snapshot for direct local plugin install replay (no model conversation). |
| `test/snapshots/rpc_server_plugins/should_enable_and_disable_marketplace_plugin.yaml` | Snapshot for plugin enable/disable replay (no model conversation). |
| `test/snapshots/rpc_server_misc/should_shut_down_owned_runtime.yaml` | Snapshot for runtime shutdown replay (no model conversation). |
| `test/snapshots/rpc_server_misc/should_report_not_found_when_opening_session_without_context.yaml` | Snapshot for sessions.open not_found replay (no model conversation). |
| `test/snapshots/rpc_server_misc/should_report_agent_registry_spawn_gate_closed.yaml` | Snapshot for agent registry spawn gate replay (no model conversation). |
| `test/snapshots/rpc_server_misc/should_reload_user_settings.yaml` | Snapshot for user settings reload replay (no model conversation). |
| `test/snapshots/rpc_server_misc/should_reject_send_attachments_from_non_extension_connection.yaml` | Snapshot for extensions attachment guard replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_throw_when_listing_tools_for_unconnected_server.yaml` | Snapshot for MCP list-tools error-path replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_stop_running_mcp_server.yaml` | Snapshot for MCP stop-server replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_start_and_restart_mcp_server.yaml` | Snapshot for MCP start/restart replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_respond_to_mcp_oauth_request_without_pending_request.yaml` | Snapshot for MCP oauth.respond no-op replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_reload_mcp_servers_with_config.yaml` | Snapshot for MCP reload-with-config replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_register_and_unregister_external_mcp_client.yaml` | Snapshot for MCP external client register/unregister replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_list_tools_and_report_running_status_for_connected_server.yaml` | Snapshot for MCP list-tools + isServerRunning replay (no model conversation). |
| `test/snapshots/rpc_mcp_lifecycle/should_configure_github_mcp_server.yaml` | Snapshot for MCP configureGitHub replay (no model conversation). |
| `rust/tests/e2e/rpc_ui_ephemeral_query.rs` | Rust E2E for session UI ephemeral query. |
| `rust/tests/e2e/rpc_shell_user_requested.rs` | Rust E2E for shell execute/cancel user-requested flows. |
| `rust/tests/e2e/rpc_session_state_extras.rs` | Rust E2E for additional session-scoped RPCs (models/activity/permissions/etc). |
| `rust/tests/e2e/rpc_server_remote_control.rs` | Rust E2E for server-scoped remote-control RPCs. |
| `rust/tests/e2e/rpc_server_plugins.rs` | Rust E2E for plugin + marketplace server RPCs and MCP config reload. |
| `rust/tests/e2e/rpc_server_misc.rs` | Rust E2E for misc server RPCs (settings reload, shutdown, sessions.open, attachments guard, agent registry gate). |
| `rust/tests/e2e/rpc_mcp_lifecycle.rs` | Rust E2E for MCP lifecycle RPCs (list tools, running, stop/start/restart, reloadWithConfig, configureGitHub, oauth.respond). |
| `rust/tests/e2e.rs` | Wires new Rust E2E modules into the Rust test suite. |
| `python/e2e/testharness/helper.py` | Adds wait_for_condition polling helper for Python E2E tests. |
| `python/e2e/testharness/__init__.py` | Exports wait_for_condition from the Python E2E harness package. |
| `python/e2e/test_rpc_ui_ephemeral_query_e2e.py` | Python E2E for session UI ephemeral query. |
| `python/e2e/test_rpc_shell_user_requested_e2e.py` | Python E2E for shell execute/cancel user-requested flows. |
| `python/e2e/test_rpc_session_state_extras_e2e.py` | Python E2E for additional session-scoped RPCs. |
| `python/e2e/test_rpc_server_remote_control_e2e.py` | Python E2E for server-scoped remote-control RPCs. |
| `python/e2e/test_rpc_server_plugins_e2e.py` | Python E2E for plugin + marketplace server RPCs and MCP config reload. |
| `python/e2e/test_rpc_server_misc_e2e.py` | Python E2E for misc server RPCs. |
| `python/e2e/test_rpc_mcp_lifecycle_e2e.py` | Python E2E for MCP lifecycle RPCs. |
| `python/e2e/conftest.py` | Skips writing snapshot cache on CI (and when failures occur) to avoid corruption. |
| `nodejs/test/e2e/rpc_ui_ephemeral_query.e2e.test.ts` | Node E2E for session UI ephemeral query. |
| `nodejs/test/e2e/rpc_shell_user_requested.e2e.test.ts` | Node E2E for user-requested shell execute/cancel with polling/cleanup. |
| `nodejs/test/e2e/rpc_session_state_extras.e2e.test.ts` | Node E2E for additional session-scoped RPCs. |
| `nodejs/test/e2e/rpc_server_remote_control.e2e.test.ts` | Node E2E for server-scoped remote-control RPCs. |
| `nodejs/test/e2e/rpc_server_plugins.e2e.test.ts` | Node E2E for plugin + marketplace server RPCs and MCP config reload. |
| `nodejs/test/e2e/rpc_server_misc.e2e.test.ts` | Node E2E for misc server RPCs. |
| `nodejs/test/e2e/rpc_mcp_lifecycle.e2e.test.ts` | Node E2E for MCP lifecycle RPCs. |
| `go/internal/e2e/rpc_ui_ephemeral_query_e2e_test.go` | Go E2E for session UI ephemeral query. |
| `go/internal/e2e/rpc_shell_user_requested_e2e_test.go` | Go E2E for user-requested shell execute/cancel with polling/cleanup. |
| `go/internal/e2e/rpc_session_state_extras_e2e_test.go` | Go E2E for additional session-scoped RPCs. |
| `go/internal/e2e/rpc_server_remote_control_e2e_test.go` | Go E2E for server-scoped remote-control RPCs. |
| `go/internal/e2e/rpc_server_plugins_e2e_test.go` | Go E2E for plugin + marketplace server RPCs and MCP config reload. |
| `go/internal/e2e/rpc_server_misc_e2e_test.go` | Go E2E for misc server RPCs. |
| `go/internal/e2e/rpc_mcp_lifecycle_e2e_test.go` | Go E2E for MCP lifecycle RPCs. |
| `dotnet/test/E2E/RpcUiEphemeralQueryE2ETests.cs` | .NET E2E for session UI ephemeral query. |
| `dotnet/test/E2E/RpcShellUserRequestedE2ETests.cs` | .NET E2E for user-requested shell execute/cancel. |
| `dotnet/test/E2E/RpcSessionStateExtrasE2ETests.cs` | .NET E2E for additional session-scoped RPCs. |
| `dotnet/test/E2E/RpcServerRemoteControlE2ETests.cs` | .NET E2E for server-scoped remote-control RPCs. |
| `dotnet/test/E2E/RpcServerPluginsE2ETests.cs` | .NET E2E for plugin + marketplace server RPCs and MCP config reload. |
| `dotnet/test/E2E/RpcServerMiscE2ETests.cs` | .NET E2E for misc server RPCs. |
| `dotnet/test/E2E/RpcMcpLifecycleE2ETests.cs` | .NET E2E for MCP lifecycle RPCs. |

Copilot's findings

  • Files reviewed: 74/74 changed files
  • Comments generated: 3

Comment thread rust/tests/e2e/rpc_shell_user_requested.rs Outdated
Comment thread rust/tests/e2e/rpc_mcp_lifecycle.rs
Comment thread rust/tests/e2e/rpc_shell_user_requested.rs Outdated
Comment thread dotnet/test/E2E/RpcServerMiscE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerMiscE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerMiscE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerMiscE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerMiscE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerPluginsE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerPluginsE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerPluginsE2ETests.cs
Comment thread dotnet/test/E2E/RpcServerRemoteControlE2ETests.cs
Comment thread dotnet/test/E2E/RpcShellUserRequestedE2ETests.cs
In the user-requested shell cancel test, the spawned execute_user_requested
JoinHandle was moved into tokio::time::timeout and dropped on the timeout
path. Dropping a JoinHandle detaches the task rather than cancelling it, so a
timed-out shell command would keep running in the background and could hold
file handles open, destabilizing later tests. Await the handle by mutable
reference and abort() it before panicking so the failure path cleans up after
itself.
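
The same failure-path principle can be sketched in Python, as an analogy only: this is not the Rust change in this commit, and wait_or_kill is a hypothetical helper. Popen.wait(timeout=...) raising TimeoutExpired does not stop the child, much like dropping a JoinHandle does not cancel the task, so the cleanup has to be explicit before the failure propagates.

```python
import subprocess


def wait_or_kill(proc: subprocess.Popen, timeout: float) -> int:
    """Wait for `proc`; on timeout, kill and reap it before re-raising.

    Hypothetical helper illustrating cleanup on the failure path.
    """
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The timeout alone leaves the child running in the background,
        # where it could keep file handles open and break later tests.
        proc.kill()
        # Reap the process so it does not linger as a zombie.
        proc.wait()
        raise
```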

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

Cross-SDK Consistency Review

This PR adds 35 E2E tests per language (175 total) across .NET, Go, Node.js/TypeScript, Python, and Rust — great coverage breadth! However, Java is the one SDK that was not ported.

What's missing

Java is the 6th SDK in this repo and already has a complete E2E test infrastructure (E2ETestContext, SessionConfigE2ETest, SessionEventsE2ETest). All 7 test groups added in this PR map directly to Java-generated RPC API classes that already exist:

| Test group | Java API class(es) |
| --- | --- |
| `rpc_mcp_lifecycle` | SessionMcpApi (listTools, isServerRunning, startServer, restartServer, stopServer, registerExternalClient, unregisterExternalClient, reloadWithConfig, configureGitHub, SessionMcpOauthApi.respond) |
| `rpc_server_misc` | ServerUserSettingsApi.reload, ServerAgentRegistryApi.spawn, ServerRuntimeApi.shutdown, ServerSessionsApi.open, SessionExtensionsApi.sendAttachmentsToMessage |
| `rpc_server_plugins` | ServerPluginsApi + ServerPluginsMarketplacesApi |
| `rpc_server_remote_control` | ServerSessionsApi (getRemoteControlStatus, setRemoteControlSteering, stopRemoteControl, transferRemoteControl, startRemoteControl) |
| `rpc_session_state_extras` | SessionModelApi.list, SessionMetadataApi.activity, SessionPermissionsApi (getAllowAll, setAllowAll), SessionPlanApi.readSqlTodos, SessionTelemetryApi.getEngagementId, SessionToolsApi.getCurrentMetadata, SessionPluginsApi (reload, list) |
| `rpc_shell_user_requested` | SessionShellApi (executeUserRequested, cancelUserRequested) |
| `rpc_ui_ephemeral_query` | SessionUiApi.ephemeralQuery |

Since all five other languages share the same 35 recorded snapshots under test/snapshots/, the Java port would reuse the exact same YAML fixtures — no new snapshots needed.

Suggestion

Consider adding the 7 missing Java test files in java/src/test/java/com/github/copilot/ (e.g., RpcMcpLifecycleE2ETest.java, RpcServerMiscE2ETest.java, etc.) following the same pattern as the existing SessionConfigE2ETest.java. The Java test harness and all underlying RPC API methods are already in place.

Generated by SDK Consistency Review Agent for issue #1610 · sonnet46 2.8M

@stephentoub stephentoub merged commit 3cbeae5 into main Jun 9, 2026
40 checks passed
@stephentoub stephentoub deleted the stephentoub/csharp-rpc-e2e-coverage branch June 9, 2026 19:06

3 participants