|
17 | 17 |
|
18 | 18 | - NEVER mock external services in tests — use real implementations (Docker containers for databases/services, real HTTP servers for network tests, real binaries for CLI tool tests) |
19 | 19 | - tests that validate sandbox behavior MUST run code through the secure-exec sandbox (NodeRuntime/proc.exec()), never directly on the host |
20 | | -- NOTHING runs on the host except Docker containers for e2e-docker tests — all CLI tool code executes inside the sandbox VM |
21 | | -- Pi, Claude Code, and OpenCode are ALL pure JavaScript/TypeScript — they ALL run in-VM via `import()` through `kernel.openShell()` or `kernel.spawn()` |
22 | | -- Claude Code is a bundled ESM Node.js script (`@anthropic-ai/claude-code/cli.js`), not a native binary — its `.node` addons (tree-sitter, audio-capture) are optional and gracefully degrade |
23 | | -- OpenCode is TypeScript (https://github.com/anomalyco/opencode) — the distributed `opencode` command is a Bun-compiled binary but the source is vanilla TS/JS; build the JS bundle from source and run it in-VM |
24 | | -- if the sandbox can't run something, that is a secure-exec bug to fix, not a reason to spawn on the host |
25 | | -- NEVER work around sandbox limitations with host-side execution — this is the #1 rule for CLI tool tests: |
26 | | - - do NOT use `child_process.spawn` or `child_process.spawnSync` from INSIDE sandbox code to run a tool binary on the host (e.g. `spawnSync('claude', [...args])` routing through the child_process bridge) — this is still host execution, the tool's JS runs on the host not in the V8 isolate |
27 | | - - do NOT use `node:child_process.spawn` from TEST code to run tools on the host |
28 | | - - do NOT create `HostBinaryDriver` classes that spawn binaries on the host |
29 | | - - do NOT use `script -qefc` or `python3 pty.spawn` to give host processes a PTY |
30 | | - - do NOT add `sandboxSkip` / probe-based skip logic that silently skips when the sandbox can't do something |
31 | | - - do NOT mark a story as passing if the tool runs on the host instead of in the V8 isolate |
32 | | - - the ONLY correct pattern is: `kernel.spawn('node', ['-e', 'import("tool-entry.js")'])` or equivalent — the tool's JavaScript executes inside the V8 sandbox isolate |
33 | | - - if `import()` hangs, if ESM loading fails, if the TUI crashes — those are secure-exec bugs to fix in packages/nodejs/src/, packages/core/src/, or native/v8-runtime/src/ |
| 20 | +- CLI tool tests (Pi, Claude Code, OpenCode) must execute inside the sandbox: Pi runs as JS in the VM, Claude Code and OpenCode spawn their binaries via the sandbox's child_process.spawn bridge |
34 | 21 | - e2e-docker fixtures connect to real Docker containers (Postgres, MySQL, Redis, SSH/SFTP) — skip gracefully via `skipUnlessDocker()` when Docker is unavailable |
35 | 22 | - interactive/PTY tests must use `kernel.openShell()` with `@xterm/headless`, not host PTY via `script -qefc` |
36 | | -- CLI tool tests (Pi, Claude Code, OpenCode) must support both mock and real LLM API tokens: |
37 | | - - check `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` env vars at test startup |
38 | | - - if a real token is present, use it instead of the mock LLM server — this validates true e2e behavior |
39 | | - - Pi supports both Anthropic and OpenAI tokens; OpenCode uses OpenAI; Claude Code uses Anthropic |
40 | | - - log which mode each test suite is using at startup: `"Using real ANTHROPIC_API_KEY"`, `"Using real OPENAI_API_KEY"`, or `"Using mock LLM server"` |
41 | | - - tests must pass with both mock and real tokens — mock is the fallback, real is preferred |
42 | | - - to run with real tokens locally: `source ~/misc/env.txt` before running tests |
43 | | - - real-token tests may use longer timeouts (up to 60s) since they hit external APIs |
| 23 | + |
| 24 | +### POSIX Conformance Test Integrity |
| 25 | + |
| 26 | +- **no test-only workarounds** — if a C override fixes broken libc behavior (fcntl, realloc, strfmon, etc.), it MUST go in the patched sysroot (`native/wasmvm/patches/wasi-libc/`) so all WASM programs get the fix; never link overrides only into test binaries — that inflates conformance numbers while real users still hit the bug |
| 27 | +- **never replace upstream test source files** — if an os-test `.c` file fails due to a platform difference (e.g. `sizeof(long)`), exclude it via `posix-exclusions.json` with the real reason; do not swap in a rewritten version that changes what the test validates |
| 28 | +- **kernel behavior belongs in the kernel, not the test runner** — if a test requires runtime state (POSIX directories like `/tmp`, `/usr`, device nodes, etc.), implement it in the kernel/device-layer so all users get it; the test runner should not create kernel state that real users won't have |
| 29 | +- **no suite-specific VFS special-casing** — the test runner must not branch on suite name to inject different filesystem state; if a test needs files to exist, either the kernel should provide them or the test should be excluded |
| 30 | +- **categorize exclusions honestly** — if a failure is fixable with a patch or build flag, it's `implementation-gap`, not `wasm-limitation`; reserve `wasm-limitation` for things genuinely impossible in wasm32-wasip1 (no 80-bit long double, no fork, no mmap) |
44 | 31 |
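The "categorize exclusions honestly" rule could be checked mechanically. The sketch below is only an illustration: the `ExclusionEntry` shape, the `checkExclusion` helper, and the keyword heuristic are all hypothetical, and the real `posix-exclusions.json` schema may differ. Only the two category names and the three example limitations (80-bit long double, fork, mmap) come from the guideline itself.

```typescript
// Hypothetical entry shape; the real posix-exclusions.json schema may differ.
type ExclusionCategory = "implementation-gap" | "wasm-limitation";

interface ExclusionEntry {
  test: string;
  category: ExclusionCategory;
  reason: string;
}

// Limitations the guideline names as genuinely impossible in wasm32-wasip1.
// Matching on these keywords is a rough heuristic, not a complete rule.
const WASM_LIMITATION_HINTS = ["long double", "fork", "mmap"];

// Returns a list of problems; an empty list means nothing was flagged.
function checkExclusion(entry: ExclusionEntry): string[] {
  const problems: string[] = [];
  const reason = entry.reason.trim().toLowerCase();

  if (reason.length === 0) {
    problems.push(`${entry.test}: missing reason`);
  }

  if (
    entry.category === "wasm-limitation" &&
    !WASM_LIMITATION_HINTS.some((hint) => reason.includes(hint))
  ) {
    problems.push(
      `${entry.test}: wasm-limitation without a named platform limit; consider implementation-gap`
    );
  }

  return problems;
}
```

A flagged entry is a prompt for human review rather than proof of miscategorization, since a legitimate `wasm-limitation` may cite a limit the keyword list does not cover.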
|
45 | 32 | ## Tooling |
46 | 33 |
|
|
54 | 41 | - check GitHub Actions test/typecheck status per commit to identify when a failure first appeared |
55 | 42 | - do not use `contract` in test filenames; use names like `suite`, `behavior`, `parity`, `integration`, or `policy` instead |
56 | 43 |
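The filename rule above is easy to enforce in a lint step. A minimal sketch, assuming test files are passed in as `/`-separated paths; the helper names `usesBannedTestName` and `findBannedTestNames` are hypothetical and not part of the repository.

```typescript
// True when the file's base name (not its directories) contains `contract`.
function usesBannedTestName(filePath: string): boolean {
  const baseName = filePath.split("/").pop() ?? filePath;
  return baseName.toLowerCase().includes("contract");
}

// Returns the subset of paths whose base name violates the naming rule.
function findBannedTestNames(filePaths: string[]): string[] {
  return filePaths.filter(usesBannedTestName);
}
```

Only the base name is inspected, so a directory that happens to be called `contract` is not flagged; whether that is the desired scope is a judgment call for the project.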
|
| 44 | +## GitHub Issues |
| 45 | + |
| 46 | +- when fixing a bug or implementation gap tracked by a GitHub issue, close the issue in the same PR using `gh issue close <number> --comment "Fixed in <commit-hash>"` |
| 47 | +- when removing a test from `posix-exclusions.json` because the fix landed, close the linked issue |
| 48 | +- do not leave resolved issues open — verify with `gh issue view <number>` if unsure |
| 49 | + |
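For a maintenance script that closes issues, the `gh issue close <number> --comment "Fixed in <commit-hash>"` command from this section can be assembled as an argument array, which avoids shell quoting of the comment. This is a sketch only: `buildIssueCloseArgs` is a hypothetical helper, and the 7 to 40 hex-character check on the hash is an assumption rather than a project rule.

```typescript
// Builds the argv for: gh issue close <number> --comment "Fixed in <commit-hash>"
function buildIssueCloseArgs(issueNumber: number, commitHash: string): string[] {
  if (!Number.isInteger(issueNumber) || issueNumber <= 0) {
    throw new Error(`invalid issue number: ${issueNumber}`);
  }
  // Assumption: accept abbreviated or full lowercase hex commit hashes.
  if (!/^[0-9a-f]{7,40}$/.test(commitHash)) {
    throw new Error(`invalid commit hash: ${commitHash}`);
  }
  return ["issue", "close", String(issueNumber), "--comment", `Fixed in ${commitHash}`];
}
```

The returned array can be handed to `execFile("gh", args)` from `node:child_process` in host-side repository tooling; this is unrelated to CLI tool tests, which must stay inside the sandbox.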
57 | 50 | ## Tool Integration Policy |
58 | 51 |
|
59 | 52 | - NEVER implement a from-scratch reimplementation of a tool when the PRD specifies using an existing upstream project (e.g., codex, curl, git, make) |
|