test(smoke-tests): add MCP smoke-test infrastructure and e2e test suites#205
test(smoke-tests): add MCP smoke-test infrastructure and e2e test suites#205cameroncooke merged 5 commits intomainfrom
Conversation
Introduce an in-process MCP test harness that boots a real server with InMemoryTransport, injects a capturing command executor, and exposes an MCP client for end-to-end assertions. Includes four smoke-test suites: CLI surface (help/version/tools), MCP discovery (listTools coverage), MCP invocation (callTool with mocked commands), and session management (defaults round-trip). Adds test-only reset/override hooks in command.ts, server-state.ts, tool-registry.ts, and debugger/tool-context.ts to support singleton teardown between test runs. Separates smoke tests from unit tests via a dedicated vitest.smoke.config.ts and a new test:smoke npm script.
…e types - Extract shared extractText, isErrorResponse, getContent into test-helpers.ts - Add resetCapturedCommands() to McpTestHarness interface replacing direct array mutation - Add build-directory guard with actionable error message in createMcpTestHarness - Document dynamic import necessity for built module patching - Replace inline error-checking blocks in device-macos tests with shared isErrorResponse - Narrow cli-surface.test.ts catch type to NodeJS.ErrnoException
- Remove false-positive "provide" match from isErrorResponse (keep "must provide") - Set exitCode dynamically based on success in command response mock - Sort commandResponses by pattern length to prevent shorter patterns shadowing longer ones - Consolidate createMockFileSystemExecutor() calls to single shared instance - Add expectContent() helper and replace ~40 inline content-extraction patterns - Replace remaining inline error-checking blocks with shared isErrorResponse - Guard tool.remove() with try-catch in __resetToolRegistryForTests for safe cleanup - Build before smoke tests in test:smoke npm script to prevent stale-build issues
The spread of process.env narrows the type so VITEST and NODE_ENV are not known properties. Use Record<string, string | undefined> to allow delete on arbitrary keys.
commit: |
WalkthroughThis pull request adds comprehensive smoke and end-to-end test infrastructure. It introduces a new MCP test harness framework with mock command execution capabilities, adds extensive test suites covering CLI surface validation, device tooling, simulator workflows, session management, debugging tools, logging, UI automation, Swift packages, and discovery tools. Supporting changes include test helper utilities for result validation, test override mechanisms for command and filesystem executors, debugger context overrides, server state reset functionality, tool registry cleanup, and two Vitest configuration files: one excluding smoke tests from standard test runs and another dedicated to smoke test execution with forked pooling and extended timeouts. 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@src/smoke-tests/__tests__/cli-surface.test.ts`:
- Around line 107-117: The test 'tool with --output json returns valid JSON'
currently ignores failures when runMayFail('simulator list-sims --output json')
returns non-zero; change the test to still validate output on failure by
asserting that result.stdout is present and non-empty (and optionally matches a
JSON-like start such as '{' or '[') so the test fails if no output is produced,
or alternatively log/mark the test as skipped with a clear reason when running
on non-macOS; locate the assertion in this test and update it to check
result.stdout (and/or attempt a JSON.parse guarded by try/catch asserting at
least that stdout exists) rather than doing nothing when result.status !== 0.
- Around line 6-12: The type error occurs in the cliEnv IIFE where env is
created from process.env and then properties are deleted; TypeScript narrows the
spread so delete env.VITEST and delete env.NODE_ENV fail. Fix by asserting env
to a compatible indexable type (e.g., cast the result of { ...process.env,
NO_COLOR: '1' } to NodeJS.ProcessEnv or Record<string, string|undefined>) so the
subsequent delete operations on env.VITEST and env.NODE_ENV compile; update the
env declaration inside the cliEnv IIFE (the env variable used by delete)
accordingly.
In `@src/smoke-tests/__tests__/e2e-mcp-doctor.test.ts`:
- Around line 15-17: The afterAll hook currently calls harness.cleanup()
unguarded, which will throw a TypeError if createMcpTestHarness failed and left
harness undefined; update the afterAll callback to check that the harness
variable is defined (and optionally has a cleanup method) before awaiting
harness.cleanup(), e.g. guard with if (harness) await harness.cleanup();
reference the harness variable and the afterAll block (and createMcpTestHarness
in beforeAll) so the cleanup only runs when setup actually succeeded.
In `@src/smoke-tests/__tests__/e2e-mcp-scaffolding.test.ts`:
- Around line 13-15: The afterAll teardown calls await harness.cleanup() without
guarding against harness being undefined, which can throw if beforeAll failed;
update the afterAll block in e2e-mcp-scaffolding.test.ts to check the harness
before calling cleanup (e.g., if (harness) await harness.cleanup(); or if
(typeof harness?.cleanup === 'function') await harness.cleanup()), referencing
the afterAll callback and the harness variable so the cleanup only runs when
harness is defined.
In `@src/smoke-tests/__tests__/e2e-mcp-sessions.test.ts`:
- Around line 72-95: The negative assertion is fragile because it checks for the
short substring 'App' and '/proj'; update the test "session_clear_defaults
clears all defaults" to use a distinctive scheme and projectPath when setting
defaults (e.g., replace the arguments in the session_set_defaults call from {
scheme: 'App', projectPath: '/proj' } to something unique like { scheme:
'ClearMeScheme', projectPath: '/clear-me-proj' }) and update the expectations to
assert that the result text does not contain 'ClearMeScheme' and does not
contain '/clear-me-proj' when calling session_show_defaults; this makes the
not.toContain checks robust against incidental matches.
🧹 Nitpick comments (9)
src/smoke-tests/__tests__/e2e-mcp-ui-automation.test.ts (2)
66-103: Consider usingexpectContentfor consistency with other test files.Other test suites (e.g.,
e2e-mcp-swift-package.test.ts,e2e-mcp-sessions.test.ts) useexpectContent(result)instead of thegetContent(result)+expect(content.length).toBeGreaterThan(0)two-step. SinceexpectContentis already available intest-helpers.tsand performs the same check, switching to it would reduce verbosity and align with the convention used elsewhere.
66-77: RepetitivesetSimulatorDefaults()+resetCapturedCommands()could live in abeforeEach.Every happy-path test calls
await setSimulatorDefaults(); harness.resetCapturedCommands();. Moving this into abeforeEachwithin the relevantdescribeblocks (or a nested describe wrapping the happy-path tests) would reduce boilerplate. The error-paths section already deviates (e.g.,clearDefaults), so a nested structure would keep things clean.src/smoke-tests/__tests__/e2e-mcp-logging.test.ts (1)
22-84: Consider adding abeforeEachto clear session defaults between tests.Unlike
e2e-mcp-sessions.test.ts, this file does not clear session defaults between tests. Thestart_device_log_captest (line 54) inherits thesimulatorIdset bystart_sim_log_cap(line 27). While this likely doesn't affect correctness for these particular tools, adding abeforeEachthat clears defaults would prevent subtle cross-test contamination as the suite grows.src/smoke-tests/__tests__/cli-surface.test.ts (1)
13-19: CodeQL: shell command built from environment values.Static analysis flags line 14 because
CLIis interpolated intoexecSync. SinceCLIis derived from__dirname(a controlled path within the repo), this is a false positive in this test context. However, you could avoid the interpolation entirely by usingexecFileSyncwith an array, which is also slightly safer against argument-injection edge cases.Proposed fix using execFileSync
-import { execSync } from 'child_process'; +import { execFileSync } from 'child_process'; import { resolve } from 'path'; const CLI = resolve(__dirname, '../../../build/cli.js'); @@ -13,8 +13,9 @@ })(); -const run = (args: string): string => { - return execSync(`node ${CLI} ${args}`, { +const run = (args: string): string => { + return execFileSync('node', [CLI, ...args.split(' ')], { encoding: 'utf8', timeout: 15_000, env: cliEnv, }); };Note:
args.split(' ')is sufficient here since all test invocations use simple space-separated arguments without quoting.src/smoke-tests/__tests__/e2e-mcp-discovery.test.ts (2)
22-40: Magic-number tolerance of 10 in the tool-count assertion may silently mask regressions.Line 39 allows up to 10 predicate-gated tools to be absent without failing, but this tolerance is fixed regardless of how many predicate-gated tools actually exist. If predicate-gated tools are removed over time, a real registration regression could be hidden. Consider deriving this tolerance from the manifest (e.g. count tools that have predicates) instead of a hardcoded literal.
16-168: Repeatedharness.client.listTools()call in every test — consider hoisting.Each test individually calls
listTools(). Since this suite's harness is read-only (no mutations between tests), the tool list could be fetched once in abeforeAllor a shared variable to reduce redundant round-trips. This is purely a minor ergonomic/performance observation for ~18 tests.src/smoke-tests/test-helpers.ts (1)
7-20:isErrorResponsekeyword scanning may produce false positives on legitimate success responses.The function matches on broad terms like
'required'and'missing', which could appear in non-error tool output (e.g. "no missing dependencies" or "all required files present"). In practice this is mitigated by the controlled smoke-test environment, but it's worth noting for future test authors.Consider lowering
.toLowerCase()once per item instead of calling it five times:♻️ Minor optimisation
return r.content.some( - (c) => - typeof c.text === 'string' && - (c.text.toLowerCase().includes('error') || - c.text.toLowerCase().includes('required') || - c.text.toLowerCase().includes('missing') || - c.text.toLowerCase().includes('must provide') || - c.text.toLowerCase().includes('fail')), + (c) => { + if (typeof c.text !== 'string') return false; + const lower = c.text.toLowerCase(); + return ( + lower.includes('error') || + lower.includes('required') || + lower.includes('missing') || + lower.includes('must provide') || + lower.includes('fail') + ); + }, );src/smoke-tests/__tests__/e2e-mcp-invocation.test.ts (1)
42-69: Broad invocation smoke test — consider logging tool names on failure for easier debugging.The test calls every registered tool with empty args and records results. If a tool unexpectedly fails (returns
ok: false), the assertion at line 67 (expect(r.ok).toBe(true)) won't indicate which tool failed. Sinceokis always set totrue(both in the try and catch branches), the assertion as written can never fail — which means this test only verifies that calling tools doesn't crash the server, not that they return valid responses.If the intent is only to verify no crashes, the code is correct but the assertion is vacuous. If you want to catch unexpected protocol errors, you'd need to differentiate between expected schema-validation rejections and unexpected failures.
src/smoke-tests/__tests__/e2e-mcp-simulator.test.ts (1)
75-85: Consider assertingsession_set_defaultssucceeds.If
setSimulatorSessionDefaults()fails silently, every subsequent test will fail with misleading errors. AddingexpectContent(result)or an assertion on the returned value would give a clearer diagnostic when session setup goes wrong.Proposed improvement
function setSimulatorSessionDefaults(): Promise<unknown> { - return harness.client.callTool({ + const result = await harness.client.callTool({ name: 'session_set_defaults', arguments: { scheme: 'MyApp', projectPath: '/path/to/MyApp.xcodeproj', simulatorId: SIM_UUID, bundleId: 'com.test.MyApp', }, }); + expectContent(result); + return result; }Note: the function signature would need to change to
async function setSimulatorSessionDefaults().
Use execFileSync for CLI smoke invocations to avoid shell-string execution and satisfy code scanning feedback. Require non-empty output in the --output json smoke test even when the tool exits non-zero on non-macOS. Guard harness cleanup in doctor and scaffolding suites so setup failures do not mask root causes. Use distinctive session defaults in clear-defaults assertions to avoid fragile substring matches. Co-Authored-By: Claude <noreply@anthropic.com>
Adds comprehensive smoke-test and e2e test infrastructure for the MCP server. The test harness boots a real MCP server with in-memory transport and mocked command executors, enabling tests to verify tool registration, invocation, session management, error paths, and command capture without executing real system commands.
What's included
Test harness (
mcp-test-harness.ts):InMemoryTransportresetCapturedCommands()API for clean test isolationcommandResponseswith longest-match-first pattern resolutionShared test helpers (
test-helpers.ts):extractText,isErrorResponse,getContent,expectContent12 test suites / 145 tests:
cli-surface-- CLI help, version, workflow subcommands, tool-specific help, invocatione2e-mcp-discovery-- tool registration, annotations, manifest completenesse2e-mcp-invocation-- representative tools with valid args, error handlinge2e-mcp-sessions-- set/show/clear defaults, defaults flowing into toolse2e-mcp-error-paths-- missing defaults, command failures, invalid paramse2e-mcp-simulator-- build, test, launch, install, management toolse2e-mcp-device-macos-- device and macOS tool invocation and error pathse2e-mcp-ui-automation-- tap, swipe, button, gesture, key, touch, screenshote2e-mcp-swift-package-- build, test, run, stop, liste2e-mcp-logging-- sim/device log capture start/stope2e-mcp-scaffolding-- iOS/macOS project scaffoldinge2e-mcp-doctor-- diagnostic outputProduction changes (test hooks only):
command.ts--__setTestCommandExecutorOverride,__setTestFileSystemExecutorOverride,__clearTestExecutorOverridestool-context.ts--__setTestDebuggerToolContextOverride,__clearTestDebuggerToolContextOverrideserver-state.ts--__resetServerStateForTeststool-registry.ts--__resetToolRegistryForTests(with try-catch guard for post-close safety)Also includes unit tests for all 8 debugging tools (
debugging-tools.test.ts).Note
Medium Risk
Mostly test-only additions, but it introduces new override/reset hooks in core execution/debugger/server/tool-registry singletons that could affect runtime behavior if misused or accidentally invoked outside tests.
Overview
Adds a dedicated MCP smoke/e2e testing harness (
src/smoke-tests/mcp-test-harness.ts) that boots a real server overInMemoryTransport, captures commands via a mocked executor, and resets singletons/registries between runs (including patching both source and builtbuild/module instances).Introduces a new
test:smokescript andvitest.smoke.config.ts, and adds extensive smoke/e2e suites validating CLI help/version/tool help, MCP tool discovery/registration, representative tool invocation across workflows (simulator/device/macOS/swift-package/logging/ui/scaffolding/doctor), and key error paths.To support the harness, adds test-only hooks to production code: executor override setters in
utils/command.ts(re-exported fromutils/execution), debugger tool-context override setters inutils/debugger/tool-context.ts, plus reset helpers inserver-state.tsandtool-registry.ts.Written by Cursor Bugbot for commit 5092733. This will update automatically on new commits. Configure here.