Skip to content

refactor: add provider-first device lab tests#542

Merged
thymikee merged 84 commits into
mainfrom
codex/provider-device-lab-refactor
May 18, 2026
Merged

refactor: add provider-first device lab tests#542
thymikee merged 84 commits into
mainfrom
codex/provider-device-lab-refactor

Conversation

@thymikee
Copy link
Copy Markdown
Member

@thymikee thymikee commented May 15, 2026

Summary

This PR implements the accepted Provider-First Integration Scenarios architecture from docs/adr/0001-provider-first-integration-scenarios.md.

The goal is to make agent-device easier to extend across mobile, TV, and desktop without relying on brittle handler-level mocks. We keep Interactor as the high-level semantic API, but introduce request-scoped provider seams below the platform modules so integration tests can run the real daemon request path while replacing only external device/tool boundaries.

Architecture: Before

The old test and runtime shape had three problems:

  • Most command behavior was proven by mocked handler unit tests. These tests often mocked dispatch, platform modules, device readiness, and session state at the same time. They were cheap to write, but they mostly verified local branching and mock wiring.
  • Platform seams were uneven. Android already had an ADB provider seam, while Apple and Linux code paths still reached directly into xcrun, runner transport, helper tools, xdotool, clipboard, screenshots, and logs.
  • Adding command flags/defaults had high blast radius. Defaults and flag plumbing could spread through CLI, client, daemon context, handlers, dispatch, and platform code, with weak integration coverage proving the whole path.

In practice, this made the codebase harder to change safely as support grew across Android, iOS, tvOS, macOS, and Linux.

Architecture: After

This PR moves the system toward one coherent integration suite under test/integration:

  • Provider-backed scenarios live at test/integration/provider-scenarios. They are integration tests, not a separate test layer. The subfolder only names the implementation style: real daemon request path, scripted provider boundaries, no real devices.
  • Real daemon path in tests. Provider-backed scenarios run through request admission, request-scoped providers, handler routing, session state, dispatch, platform translation, and result normalization.
  • External boundaries are provider seams. Tests replace Android ADB, Apple tool/runner/macOS host/helper, Linux desktop/a11y/clipboard/screenshot/input, app-log, recording, and inventory providers instead of mocking handlers.
  • Semantic providers where pressure exists. Android, Apple, Linux, recording, and app-log contracts now expose intent directly enough for future local, remote, or cloud-backed adapters.
  • HTTP is intentionally narrow. Broad provider-backed scenarios use in-process request handling for speed and determinism. HTTP tests remain focused on transport/auth/upload/download behavior.
  • Command coverage is measurable. pnpm test:integration:progress tracks command coverage, flag coverage, mock-heavy unit-test pressure, and generic provider usage.
  • Integration responsibilities are explicit. Root test/integration/*.test.ts keeps process/package/CLI smoke boundaries. test/integration/provider-scenarios owns broader daemon-path scenarios and detailed provider/transport contracts. Pure logic, such as retry behavior, stays in unit tests.
  • Fallow duplication signal is clearer. Duplication is now checked in production mode instead of hiding behind a large duplicate baseline, making real production clone groups visible without flooding CI with legacy test noise.

Benefits

  • Higher confidence per test. Integration tests now cover full workflows instead of only mocked handler branches.
  • Lower long-term test cost. Redundant handler-level success tests were removed where provider-backed integration scenarios now prove the behavior through the real request path.
  • Better extension point for remote/cloud/device-farm backends. New backends can implement provider contracts rather than reverse-engineering command strings.
  • Less platform branching leakage. Provider seams keep Android/Apple/Linux specifics behind named contracts.
  • Clearer command/flag regression coverage. Provider-backed scenarios cover all public commands and all device-observable workflow flags.
  • Improved operational diagnostics. Provider and diagnostic paths are more consistent, including Apple tool diagnostics and Android IME capture reporting.

Trade-offs Accepted

  • Large PR / high review load. This is a broad architecture migration rather than a small feature patch.
  • Coarser failure localization. A provider-backed scenario failure may require reading across request routing, handler, and provider transcript layers. That is the price of testing realistic behavior.
  • Coverage target changed. The original 90% goal was not the right forcing function for entrypoints, subprocess workers, and transport glue. Current coverage is above the enforced floor at about 82.08% statements / 84.20% lines, with provider-backed integration progress tracked separately.
  • Some generic Apple host-tool calls remain local-only. The ADR intentionally leaves mdfind, plutil, otool, sync helpers, and similar low-pressure calls as local until a remote adapter proves the need for semantic promotion.
  • Unit tests still matter for edge cases. The new strategy reduces mock-heavy happy-path tests, but keeps focused unit tests for parsing, error normalization, retries, contracts, and difficult edge cases.

Review Notes

  • Codex P2 about iOS logs bypassing provider seams was a misread: app logs are scoped one layer up through AppLogProvider; local runCmdBackground is only the local provider implementation.
  • Codex P2 about Android fill verification and unknown active IMEs was real and is fixed in this PR. Fill verification now carries the active IME package into hierarchy ownership classification, with regression coverage for vendor IME capture.
  • Final integration cleanup keeps smoke-daemon-http.test.ts as a single real-process HTTP smoke. Detailed HTTP method/auth/upload/lease behavior lives in provider-scenarios/daemon-http-server.test.ts, and pure retry coverage moved from integration to src/utils/__tests__/retry.test.ts.
  • Merge readiness still needs human approval because this PR changes a large amount of architecture and test strategy.

Validation

  • pnpm format
  • pnpm check:quick
  • pnpm check:unit
  • pnpm build
  • pnpm test:integration
  • pnpm test:integration:provider
  • pnpm test:integration:node
  • pnpm test:coverage
  • pnpm check:fallow --base origin/main
  • pnpm test:integration:progress
  • targeted provider-backed integration and platform tests for Android fill/IME, Android lifecycle, iOS runner/session, macOS helper/desktop, install-source, screenshot, and remote daemon scenarios

Current provider-backed integration progress:

  • Public commands covered by provider-backed integration: 43/43
  • Device-observable workflow flags covered by provider-backed integration: 55/55
  • Public CLI flags intentionally outside provider-backed integration: 46
  • Public CLI flags unclassified by progress script: 0

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-542/

Built to branch gh-pages at 2026-05-18 12:19 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ca1f0c2f2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +258 to +261
inputMethodOwned: isAndroidInputMethodOwnedNode({
packageName: node.packageName,
resourceId: node.resourceId,
}),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Thread active IME package into fill ownership

When a non-hard-coded keyboard package captures the text, this verification path still calls isAndroidInputMethodOwnedNode without the active input method package that getAndroidKeyboardStatusWithAdb now parses. In that scenario actualInput.inputMethodOwned remains false for packages like com.vendor.keyboard, so isAndroidImeCapture does not fire and fill retries/fails as a generic text mismatch instead of reporting the IME-capture condition this change is trying to detect.

Useful? React with 👍 / 👎.

Comment thread src/daemon/app-log-ios.ts
Comment on lines +3 to +4
import { runCmdBackground } from '../utils/exec.ts';
import { runXcrun } from '../platforms/ios/tool-provider.ts';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Route iOS log streaming through the provider seam

With a request-scoped Apple tool provider, logs start and logs clear --restart for iOS/macOS sessions still hit the local host because this file keeps runCmdBackground and startAppleAppLogStream uses it directly for xcrun/log streams. In Device Lab or remote-provider contexts running off macOS, those commands bypass the scripted provider and either fail with a missing local tool or attach to the wrong host, even though other Apple tool calls in the same request are scoped.

Useful? React with 👍 / 👎.

@thymikee thymikee force-pushed the codex/provider-device-lab-refactor branch 4 times, most recently from af4a319 to b600a3f Compare May 15, 2026 17:40
@thymikee thymikee merged commit 59d28e8 into main May 18, 2026
19 checks passed
@thymikee thymikee deleted the codex/provider-device-lab-refactor branch May 18, 2026 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant