docs: add TEST-NEEDS.md and PROOF-NEEDS.md from audit

hyperpolymath · claude · hyperpolymath · commit 892f2c272c6b · 2026-03-30T13:23:00.000+01:00
Documents testing and proof gaps identified during batch audit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/PROOF-NEEDS.md b/PROOF-NEEDS.md
@@ -0,0 +1,22 @@
+# Proof Requirements
+
+## Current state
+- `src/abi/Types.idr` (194 lines) — System operations types
+- `src/abi/Layout.idr` (177 lines) — Memory layout
+- `src/abi/Foreign.idr` (217 lines) — FFI declarations
+- No dangerous patterns in ABI layer
+- Repository totals 109K lines and includes the emergency-room, session-sentinel, and system management tools
+- Claims: "panic-safe intake", safety and trust principles
+
+## What needs proving
+- **Emergency room idempotency**: Prove that emergency stabilization operations are idempotent (running twice does not cause harm)
+- **Session sentinel state machine**: Prove the session lifecycle (start -> active -> suspended -> terminated) has no invalid transitions or resource leaks
+- **Service restart safety**: Prove restart/recovery operations do not corrupt persistent state
+- **Privilege escalation prevention**: Prove system operations respect the principle of least privilege (no operation escalates beyond its declared scope)
+- **Rollback atomicity**: Prove that failed operations roll back completely (no partial state)
+
+## Recommended prover
+- **Idris2** — State machines and idempotency properties are natural fits for dependent types
+
+## Priority
+- **MEDIUM** — AmbientOps manages system operations where incorrect behavior can destabilize the host. The emergency-room and session-sentinel components have the highest proof priority within the monorepo.
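As a rough illustration of the session sentinel property listed in PROOF-NEEDS.md, the lifecycle (start -> active -> suspended -> terminated) can be written as a total transition function that rejects every move not explicitly allowed. This is only a sketch in Rust, one of the monorepo's languages, and it is a runtime check rather than a proof. The names `SessionState`, `SessionEvent`, and `transition` are hypothetical and do not come from session-sentinel. The rules that a suspended session may resume and that any live session may terminate are assumptions as well.

```rust
// Hypothetical sketch: these types are illustrative, not taken from session-sentinel.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Start,
    Active,
    Suspended,
    Terminated,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEvent {
    Activate,
    Suspend,
    Resume,
    Terminate,
}

/// Returns the next state, or `None` when the transition is not allowed.
/// Assumed rules: start -> active -> suspended -> terminated, a suspended
/// session may resume, and any non-terminated session may be terminated.
pub fn transition(state: SessionState, event: SessionEvent) -> Option<SessionState> {
    use SessionEvent::*;
    use SessionState::*;
    match (state, event) {
        (Start, Activate) => Some(Active),
        (Active, Suspend) => Some(Suspended),
        (Suspended, Resume) => Some(Active),
        (Start | Active | Suspended, Terminate) => Some(Terminated),
        // Everything else, including any event on a terminated session, is invalid.
        _ => None,
    }
}
```

Under these assumptions, `transition(SessionState::Terminated, SessionEvent::Resume)` yields `None`, which is the kind of "no invalid transitions" guarantee an Idris2 encoding would enforce at the type level instead of at runtime.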
diff --git a/TEST-NEEDS.md b/TEST-NEEDS.md
@@ -0,0 +1,90 @@
+# Test & Benchmark Requirements
+
+## Current State
+- Unit tests: ~69 Elixir test files + 2 Gleam test files + ~17 Zig integration tests; individual test counts unknown (`mix test` and `gleam test` cannot be run without the correct tool versions)
+- Integration tests: partial (Zig FFI integration tests exist)
+- E2E tests: NONE
+- Benchmarks: 2 files (czech_file_knife_bench.rs, benchmark_database.jl)
+- panic-attack scan: NEVER RUN
+
+## What's Missing
+### Point-to-Point (P2P)
+This is a monorepo with 20+ components. Coverage is extremely uneven:
+
+#### Tested (Elixir: ~69 test files; Gleam: 2 test files)
+- observatory/ — has tests
+- network-dashboard/ — has tests
+- composer/ — has Gleam tests (2 files)
+
+#### UNTESTED Components
+- **clinician/** (Rust) — Cargo.toml exists, 0 test files
+- **hardware-crash-team/** (Rust) — Cargo.toml exists, 0 test files
+- **contracts-rust/** (Rust) — Cargo.toml exists, 0 test files
+- **czech-file-knife/** (Rust) — bench file exists but 0 test files
+- **displace/** — no tests
+- **emergency-button/** — no tests
+- **emergency-room/** — no tests
+- **nano-aider/** — no tests
+- **nerdsafe-restart/** — no tests
+- **network-orchestrator/** — no tests
+- **nick-shells/** — no tests
+- **panoptes/** — no tests
+- **session-sentinel/** — no tests (Ephapax rewrite WIP)
+- **broad-spectrum/** — no tests
+- **cicada/** — no tests
+- **ambulances/** — no tests
+- **immutable-linux-auditor/** — no tests
+- **hybrid-automation-router/** — no tests
+- **ffi/fuse/** (Zig, 7+ files): only a template integration test
+- **ffi/systemd/** (Zig): only a template integration test
+- **monitoring/systems-observatory/** (Julia) — no tests
+- **contracts/** (Deno) — no tests
+
+Total: 163 Rust + 121 Elixir + 73 Zig + 46 Julia + 79 ReScript + 44 V source files.
+Test coverage is concentrated almost entirely in the Elixir components.
+
+### End-to-End (E2E)
+- Full system health monitoring pipeline (observatory -> alerts -> emergency-room)
+- Network dashboard monitoring cycle
+- Hardware crash detection and recovery workflow
+- Immutable Linux audit cycle
+- Session sentinel lifecycle
+- FUSE filesystem mount/unmount/operations cycle
+- Systemd unit management workflow
+- Composer plan execution
+
+### Aspect Tests
+- [ ] Security (FUSE filesystem privilege escalation, network dashboard auth, systemd unit injection)
+- [ ] Performance (monitoring overhead, FUSE latency, systemd watcher CPU usage)
+- [ ] Concurrency (multiple monitoring agents, concurrent FUSE operations, race conditions)
+- [ ] Error handling (hardware failures, network timeouts, service crashes)
+- [ ] Accessibility (N/A — infrastructure tools)
+
+### Build & Execution
+- [ ] cargo build for all Rust components — not verified
+- [ ] mix compile for Elixir components — not verified (version mismatch)
+- [ ] gleam build for composer — not verified
+- [ ] zig build for FFI — not verified
+- [ ] Self-diagnostic — none
+
+### Benchmarks Needed
+- FUSE filesystem throughput (read/write/metadata)
+- Monitoring agent resource overhead (CPU, memory)
+- Czech file knife benchmarks (file exists — verify it runs)
+- Systems observatory database benchmarks (file exists — verify it runs)
+- Network orchestration latency
+- Alert propagation time
+
+### Self-Tests
+- [ ] panic-attack assail on own repo
+- [ ] Built-in health check for each component
+- [ ] Systemd unit file validation
+
+## Priority
+- **HIGH** — Massive monorepo (163 Rust + 121 Elixir + 73 Zig + 46 Julia files across 20+ components) with tests concentrated only in the Elixir components. The Rust, Zig, Julia, and ReScript components are essentially untested. Infrastructure tools need especially high reliability.
+
+## FAKE-FUZZ ALERT
+
+- `tests/fuzz/placeholder.txt` is a scorecard placeholder inherited from rsr-template-repo — it does NOT provide real fuzz testing
+- Replace with an actual fuzz harness (see rsr-template-repo/tests/fuzz/README.adoc) or remove the file
+- Priority: P2 — creates false impression of fuzz coverage