Skip to content

test(smoke-tests): add MCP smoke-test infrastructure and e2e test suites#205

Merged
cameroncooke merged 5 commits intomainfrom
nostalgic-wilbur
Feb 7, 2026
Merged

test(smoke-tests): add MCP smoke-test infrastructure and e2e test suites#205
cameroncooke merged 5 commits intomainfrom
nostalgic-wilbur

Conversation

@cameroncooke
Copy link
Owner

@cameroncooke cameroncooke commented Feb 7, 2026

Adds comprehensive smoke-test and e2e test infrastructure for the MCP server. The test harness boots a real MCP server with in-memory transport and mocked command executors, enabling tests to verify tool registration, invocation, session management, error paths, and command capture without executing real system commands.

What's included

Test harness (mcp-test-harness.ts):

  • In-memory MCP client/server via InMemoryTransport
  • Capturing command executor that records all tool-issued commands
  • Dual-module patching (vitest source modules + built JS modules) for correct singleton isolation
  • Build-directory guard with actionable error message
  • resetCapturedCommands() API for clean test isolation
  • Configurable commandResponses with longest-match-first pattern resolution

Shared test helpers (test-helpers.ts):

  • extractText, isErrorResponse, getContent, expectContent

12 test suites / 145 tests:

  • cli-surface -- CLI help, version, workflow subcommands, tool-specific help, invocation
  • e2e-mcp-discovery -- tool registration, annotations, manifest completeness
  • e2e-mcp-invocation -- representative tools with valid args, error handling
  • e2e-mcp-sessions -- set/show/clear defaults, defaults flowing into tools
  • e2e-mcp-error-paths -- missing defaults, command failures, invalid params
  • e2e-mcp-simulator -- build, test, launch, install, management tools
  • e2e-mcp-device-macos -- device and macOS tool invocation and error paths
  • e2e-mcp-ui-automation -- tap, swipe, button, gesture, key, touch, screenshot
  • e2e-mcp-swift-package -- build, test, run, stop, list
  • e2e-mcp-logging -- sim/device log capture start/stop
  • e2e-mcp-scaffolding -- iOS/macOS project scaffolding
  • e2e-mcp-doctor -- diagnostic output

Production changes (test hooks only):

  • command.ts -- __setTestCommandExecutorOverride, __setTestFileSystemExecutorOverride, __clearTestExecutorOverrides
  • tool-context.ts -- __setTestDebuggerToolContextOverride, __clearTestDebuggerToolContextOverride
  • server-state.ts -- __resetServerStateForTests
  • tool-registry.ts -- __resetToolRegistryForTests (with try-catch guard for post-close safety)

Also includes unit tests for all 8 debugging tools (debugging-tools.test.ts).


Note

Medium Risk
Mostly test-only additions, but it introduces new override/reset hooks in core execution/debugger/server/tool-registry singletons that could affect runtime behavior if misused or accidentally invoked outside tests.

Overview
Adds a dedicated MCP smoke/e2e testing harness (src/smoke-tests/mcp-test-harness.ts) that boots a real server over InMemoryTransport, captures commands via a mocked executor, and resets singletons/registries between runs (including patching both source and built build/ module instances).

Introduces a new test:smoke script and vitest.smoke.config.ts, and adds extensive smoke/e2e suites validating CLI help/version/tool help, MCP tool discovery/registration, representative tool invocation across workflows (simulator/device/macOS/swift-package/logging/ui/scaffolding/doctor), and key error paths.

To support the harness, adds test-only hooks to production code: executor override setters in utils/command.ts (re-exported from utils/execution), debugger tool-context override setters in utils/debugger/tool-context.ts, plus reset helpers in server-state.ts and tool-registry.ts.

Written by Cursor Bugbot for commit 5092733. This will update automatically on new commits. Configure here.

Introduce an in-process MCP test harness that boots a real server with
InMemoryTransport, injects a capturing command executor, and exposes
an MCP client for end-to-end assertions.

Includes four smoke-test suites: CLI surface (help/version/tools),
MCP discovery (listTools coverage), MCP invocation (callTool with
mocked commands), and session management (defaults round-trip).

Adds test-only reset/override hooks in command.ts, server-state.ts,
tool-registry.ts, and debugger/tool-context.ts to support singleton
teardown between test runs. Separates smoke tests from unit tests via
a dedicated vitest.smoke.config.ts and a new test:smoke npm script.
…e types

- Extract shared extractText, isErrorResponse, getContent into test-helpers.ts
- Add resetCapturedCommands() to McpTestHarness interface replacing direct array mutation
- Add build-directory guard with actionable error message in createMcpTestHarness
- Document dynamic import necessity for built module patching
- Replace inline error-checking blocks in device-macos tests with shared isErrorResponse
- Narrow cli-surface.test.ts catch type to NodeJS.ErrnoException
- Remove false-positive "provide" match from isErrorResponse (keep "must provide")
- Set exitCode dynamically based on success in command response mock
- Sort commandResponses by pattern length to prevent shorter patterns shadowing longer ones
- Consolidate createMockFileSystemExecutor() calls to single shared instance
- Add expectContent() helper and replace ~40 inline content-extraction patterns
- Replace remaining inline error-checking blocks with shared isErrorResponse
- Guard tool.remove() with try-catch in __resetToolRegistryForTests for safe cleanup
- Build before smoke tests in test:smoke npm script to prevent stale-build issues
The spread of process.env narrows the type so VITEST and NODE_ENV are
not known properties. Use Record<string, string | undefined> to allow
delete on arbitrary keys.
@pkg-pr-new
Copy link

pkg-pr-new bot commented Feb 7, 2026

Open in StackBlitz

npm i https://pkg.pr.new/cameroncooke/XcodeBuildMCP/xcodebuildmcp@205

commit: 5092733

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 7, 2026

Walkthrough

This pull request adds comprehensive smoke and end-to-end test infrastructure. It introduces a new MCP test harness framework with mock command execution capabilities, adds extensive test suites covering CLI surface validation, device tooling, simulator workflows, session management, debugging tools, logging, UI automation, Swift packages, and discovery tools. Supporting changes include test helper utilities for result validation, test override mechanisms for command and filesystem executors, debugger context overrides, server state reset functionality, tool registry cleanup, and two Vitest configuration files: one excluding smoke tests from standard test runs and another dedicated to smoke test execution with forked pooling and extended timeouts.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and accurately summarises the main change: adding comprehensive MCP smoke-test infrastructure and e2e test suites.
Description check ✅ Passed The PR description comprehensively describes the changeset, detailing the test harness implementation, shared helpers, 12 test suites with 145 tests, production test hooks, and debugging tool unit tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@src/smoke-tests/__tests__/cli-surface.test.ts`:
- Around line 107-117: The test 'tool with --output json returns valid JSON'
currently ignores failures when runMayFail('simulator list-sims --output json')
returns non-zero; change the test to still validate output on failure by
asserting that result.stdout is present and non-empty (and optionally matches a
JSON-like start such as '{' or '[') so the test fails if no output is produced,
or alternatively log/mark the test as skipped with a clear reason when running
on non-macOS; locate the assertion in this test and update it to check
result.stdout (and/or attempt a JSON.parse guarded by try/catch asserting at
least that stdout exists) rather than doing nothing when result.status !== 0.
- Around line 6-12: The type error occurs in the cliEnv IIFE where env is
created from process.env and then properties are deleted; TypeScript narrows the
spread so delete env.VITEST and delete env.NODE_ENV fail. Fix by asserting env
to a compatible indexable type (e.g., cast the result of { ...process.env,
NO_COLOR: '1' } to NodeJS.ProcessEnv or Record<string, string|undefined>) so the
subsequent delete operations on env.VITEST and env.NODE_ENV compile; update the
env declaration inside the cliEnv IIFE (the env variable used by delete)
accordingly.

In `@src/smoke-tests/__tests__/e2e-mcp-doctor.test.ts`:
- Around line 15-17: The afterAll hook currently calls harness.cleanup()
unguarded, which will throw a TypeError if createMcpTestHarness failed and left
harness undefined; update the afterAll callback to check that the harness
variable is defined (and optionally has a cleanup method) before awaiting
harness.cleanup(), e.g. guard with if (harness) await harness.cleanup();
reference the harness variable and the afterAll block (and createMcpTestHarness
in beforeAll) so the cleanup only runs when setup actually succeeded.

In `@src/smoke-tests/__tests__/e2e-mcp-scaffolding.test.ts`:
- Around line 13-15: The afterAll teardown calls await harness.cleanup() without
guarding against harness being undefined, which can throw if beforeAll failed;
update the afterAll block in e2e-mcp-scaffolding.test.ts to check the harness
before calling cleanup (e.g., if (harness) await harness.cleanup(); or if
(typeof harness?.cleanup === 'function') await harness.cleanup()), referencing
the afterAll callback and the harness variable so the cleanup only runs when
harness is defined.

In `@src/smoke-tests/__tests__/e2e-mcp-sessions.test.ts`:
- Around line 72-95: The negative assertion is fragile because it checks for the
short substring 'App' and '/proj'; update the test "session_clear_defaults
clears all defaults" to use a distinctive scheme and projectPath when setting
defaults (e.g., replace the arguments in the session_set_defaults call from {
scheme: 'App', projectPath: '/proj' } to something unique like { scheme:
'ClearMeScheme', projectPath: '/clear-me-proj' }) and update the expectations to
assert that the result text does not contain 'ClearMeScheme' and does not
contain '/clear-me-proj' when calling session_show_defaults; this makes the
not.toContain checks robust against incidental matches.
🧹 Nitpick comments (9)
src/smoke-tests/__tests__/e2e-mcp-ui-automation.test.ts (2)

66-103: Consider using expectContent for consistency with other test files.

Other test suites (e.g., e2e-mcp-swift-package.test.ts, e2e-mcp-sessions.test.ts) use expectContent(result) instead of the getContent(result) + expect(content.length).toBeGreaterThan(0) two-step. Since expectContent is already available in test-helpers.ts and performs the same check, switching to it would reduce verbosity and align with the convention used elsewhere.


66-77: Repetitive setSimulatorDefaults() + resetCapturedCommands() could live in a beforeEach.

Every happy-path test calls await setSimulatorDefaults(); harness.resetCapturedCommands();. Moving this into a beforeEach within the relevant describe blocks (or a nested describe wrapping the happy-path tests) would reduce boilerplate. The error-paths section already deviates (e.g., clearDefaults), so a nested structure would keep things clean.

src/smoke-tests/__tests__/e2e-mcp-logging.test.ts (1)

22-84: Consider adding a beforeEach to clear session defaults between tests.

Unlike e2e-mcp-sessions.test.ts, this file does not clear session defaults between tests. The start_device_log_cap test (line 54) inherits the simulatorId set by start_sim_log_cap (line 27). While this likely doesn't affect correctness for these particular tools, adding a beforeEach that clears defaults would prevent subtle cross-test contamination as the suite grows.

src/smoke-tests/__tests__/cli-surface.test.ts (1)

13-19: CodeQL: shell command built from environment values.

Static analysis flags line 14 because CLI is interpolated into execSync. Since CLI is derived from __dirname (a controlled path within the repo), this is a false positive in this test context. However, you could avoid the interpolation entirely by using execFileSync with an array, which is also slightly safer against argument-injection edge cases.

Proposed fix using execFileSync
-import { execSync } from 'child_process';
+import { execFileSync } from 'child_process';
 import { resolve } from 'path';
 
 const CLI = resolve(__dirname, '../../../build/cli.js');
@@ -13,8 +13,9 @@
 })();
-const run = (args: string): string => {
-  return execSync(`node ${CLI} ${args}`, {
+const run = (args: string): string => {
+  return execFileSync('node', [CLI, ...args.split(' ')], {
     encoding: 'utf8',
     timeout: 15_000,
     env: cliEnv,
   });
 };

Note: args.split(' ') is sufficient here since all test invocations use simple space-separated arguments without quoting.

src/smoke-tests/__tests__/e2e-mcp-discovery.test.ts (2)

22-40: Magic-number tolerance of 10 in the tool-count assertion may silently mask regressions.

Line 39 allows up to 10 predicate-gated tools to be absent without failing, but this tolerance is fixed regardless of how many predicate-gated tools actually exist. If predicate-gated tools are removed over time, a real registration regression could be hidden. Consider deriving this tolerance from the manifest (e.g. count tools that have predicates) instead of a hardcoded literal.


16-168: Repeated harness.client.listTools() call in every test — consider hoisting.

Each test individually calls listTools(). Since this suite's harness is read-only (no mutations between tests), the tool list could be fetched once in a beforeAll or a shared variable to reduce redundant round-trips. This is purely a minor ergonomic/performance observation for ~18 tests.

src/smoke-tests/test-helpers.ts (1)

7-20: isErrorResponse keyword scanning may produce false positives on legitimate success responses.

The function matches on broad terms like 'required' and 'missing', which could appear in non-error tool output (e.g. "no missing dependencies" or "all required files present"). In practice this is mitigated by the controlled smoke-test environment, but it's worth noting for future test authors.

Consider lowering .toLowerCase() once per item instead of calling it five times:

♻️ Minor optimisation
   return r.content.some(
-    (c) =>
-      typeof c.text === 'string' &&
-      (c.text.toLowerCase().includes('error') ||
-        c.text.toLowerCase().includes('required') ||
-        c.text.toLowerCase().includes('missing') ||
-        c.text.toLowerCase().includes('must provide') ||
-        c.text.toLowerCase().includes('fail')),
+    (c) => {
+      if (typeof c.text !== 'string') return false;
+      const lower = c.text.toLowerCase();
+      return (
+        lower.includes('error') ||
+        lower.includes('required') ||
+        lower.includes('missing') ||
+        lower.includes('must provide') ||
+        lower.includes('fail')
+      );
+    },
   );
src/smoke-tests/__tests__/e2e-mcp-invocation.test.ts (1)

42-69: Broad invocation smoke test — consider logging tool names on failure for easier debugging.

The test calls every registered tool with empty args and records results. If a tool unexpectedly fails (returns ok: false), the assertion at line 67 (expect(r.ok).toBe(true)) won't indicate which tool failed. Since ok is always set to true (both in the try and catch branches), the assertion as written can never fail — which means this test only verifies that calling tools doesn't crash the server, not that they return valid responses.

If the intent is only to verify no crashes, the code is correct but the assertion is vacuous. If you want to catch unexpected protocol errors, you'd need to differentiate between expected schema-validation rejections and unexpected failures.

src/smoke-tests/__tests__/e2e-mcp-simulator.test.ts (1)

75-85: Consider asserting session_set_defaults succeeds.

If setSimulatorSessionDefaults() fails silently, every subsequent test will fail with misleading errors. Adding expectContent(result) or an assertion on the returned value would give a clearer diagnostic when session setup goes wrong.

Proposed improvement
 function setSimulatorSessionDefaults(): Promise<unknown> {
-  return harness.client.callTool({
+  const result = await harness.client.callTool({
     name: 'session_set_defaults',
     arguments: {
       scheme: 'MyApp',
       projectPath: '/path/to/MyApp.xcodeproj',
       simulatorId: SIM_UUID,
       bundleId: 'com.test.MyApp',
     },
   });
+  expectContent(result);
+  return result;
 }

Note: the function signature would need to change to async function setSimulatorSessionDefaults().

Use execFileSync for CLI smoke invocations to avoid shell-string execution and satisfy code scanning feedback.

Require non-empty output in the --output json smoke test even when the tool exits non-zero on non-macOS.

Guard harness cleanup in doctor and scaffolding suites so setup failures do not mask root causes.

Use distinctive session defaults in clear-defaults assertions to avoid fragile substring matches.

Co-Authored-By: Claude <noreply@anthropic.com>
@cameroncooke cameroncooke merged commit a28a57d into main Feb 7, 2026
8 checks passed
@cameroncooke cameroncooke deleted the nostalgic-wilbur branch February 7, 2026 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant