Commit 892f2c2

hyperpolymath and claude committed
docs: add TEST-NEEDS.md and/or PROOF-NEEDS.md from audit
Documents testing and proof gaps identified during batch audit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent c4974f2 commit 892f2c2

2 files changed

Lines changed: 112 additions & 0 deletions

PROOF-NEEDS.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
# Proof Requirements
2+
3+
## Current state
4+
- `src/abi/Types.idr` (194 lines) — System operations types
5+
- `src/abi/Layout.idr` (177 lines) — Memory layout
6+
- `src/abi/Foreign.idr` (217 lines) — FFI declarations
7+
- No dangerous patterns in ABI layer
8+
- 109K lines; includes emergency-room, session-sentinel, and system management tools
9+
- Claims: "panic-safe intake", safety and trust principles
10+
11+
## What needs proving
12+
- **Emergency room idempotency**: Prove that emergency stabilization operations are idempotent (running twice does not cause harm)
13+
- **Session sentinel state machine**: Prove the session lifecycle (start -> active -> suspended -> terminated) has no invalid transitions or resource leaks
14+
- **Service restart safety**: Prove restart/recovery operations do not corrupt persistent state
15+
- **Privilege escalation prevention**: Prove system operations respect the principle of least privilege (no operation escalates beyond its declared scope)
16+
- **Rollback atomicity**: Prove that failed operations roll back completely (no partial state)
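
The session lifecycle requirement above can be made concrete before any formal proof is attempted. The following is a minimal sketch in Python, not the project's code: the four state names come from the list above, but the exact set of allowed edges (`ALLOWED`) is an assumption made for illustration, as are the helper names `step`, `rejects`, and `invalid_pairs`.

```python
from enum import Enum
from itertools import product


class State(Enum):
    START = "start"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# Hypothetical transition table: the source names the four states but not
# the exact edges, so this set is an assumption for illustration only.
ALLOWED = {
    (State.START, State.ACTIVE),
    (State.ACTIVE, State.SUSPENDED),
    (State.SUSPENDED, State.ACTIVE),
    (State.ACTIVE, State.TERMINATED),
    (State.SUSPENDED, State.TERMINATED),
}


def step(current: State, target: State) -> State:
    """Return the target state, or raise if the transition is not allowed."""
    if (current, target) not in ALLOWED:
        raise ValueError(f"invalid transition: {current.value} -> {target.value}")
    return target


def rejects(current: State, target: State) -> bool:
    """True if step() refuses the given transition."""
    try:
        step(current, target)
    except ValueError:
        return True
    return False


def invalid_pairs() -> list:
    """Enumerate every (current, target) pair that step() must reject."""
    return [pair for pair in product(State, State) if pair not in ALLOWED]
```

Exhaustively checking `rejects` over `invalid_pairs()` is a test, not a proof. A dependently typed encoding would instead make the invalid pairs unrepresentable.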
17+
18+
## Recommended prover
19+
- **Idris2** — State machines and idempotency properties are natural fits for dependent types
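
For reference, the idempotency property mentioned above is the statement that `op(op(s)) == op(s)` for every state `s`. The sketch below checks it on a sample state in Python. `HostState`, `stabilize`, and the service name are hypothetical stand-ins, not names taken from the repository.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HostState:
    """Hypothetical model of the state an emergency operation touches."""
    swap_enabled: bool
    stopped_services: frozenset


def stabilize(state: HostState) -> HostState:
    """Hypothetical stabilization step: sets target values instead of toggling."""
    return replace(
        state,
        swap_enabled=True,
        stopped_services=state.stopped_services | {"noncritical-indexer"},
    )


def is_idempotent_on(op, state) -> bool:
    """Check op(op(s)) == op(s) for one sample state."""
    once = op(state)
    return op(once) == once
```

An operation that sets a value passes this check, while one that toggles a value fails it. A proof would establish the same equality for all states, not only for sampled ones.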
20+
21+
## Priority
22+
- **MEDIUM** — AmbientOps manages system operations where incorrect behavior can destabilize the host. The emergency-room and session-sentinel components have the highest proof priority within the monorepo.

TEST-NEEDS.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
# Test & Benchmark Requirements
2+
3+
## Current State
4+
- Unit tests: ~69 Elixir test files + 2 Gleam test files + ~17 Zig integration tests. Test-case counts are unknown because `mix test` and `gleam test` could not be run without the correct toolchain versions.
5+
- Integration tests: partial (Zig FFI integration tests exist)
6+
- E2E tests: NONE
7+
- Benchmarks: 2 files (czech_file_knife_bench.rs, benchmark_database.jl)
8+
- panic-attack scan: NEVER RUN
9+
10+
## What's Missing
11+
### Point-to-Point (P2P)
12+
This is a monorepo with 20+ components. Coverage is extremely uneven:
13+
14+
#### Tested (69 Elixir test files, 2 Gleam test files)
15+
- observatory/ — has tests
16+
- network-dashboard/ — has tests
17+
- composer/ — has Gleam tests (2 files)
18+
19+
#### UNTESTED Components
20+
- **clinician/** (Rust) — Cargo.toml exists, 0 test files
21+
- **hardware-crash-team/** (Rust) — Cargo.toml exists, 0 test files
22+
- **contracts-rust/** (Rust) — Cargo.toml exists, 0 test files
23+
- **czech-file-knife/** (Rust) — bench file exists but 0 test files
24+
- **displace/** — no tests
25+
- **emergency-button/** — no tests
26+
- **emergency-room/** — no tests
27+
- **nano-aider/** — no tests
28+
- **nerdsafe-restart/** — no tests
29+
- **network-orchestrator/** — no tests
30+
- **nick-shells/** — no tests
31+
- **panoptes/** — no tests
32+
- **session-sentinel/** — no tests (Ephapax rewrite WIP)
33+
- **broad-spectrum/** — no tests
34+
- **cicada/** — no tests
35+
- **ambulances/** — no tests
36+
- **immutable-linux-auditor/** — no tests
37+
- **hybrid-automation-router/** — no tests
38+
- **ffi/fuse/** (Zig — 7+ files) — only template integration test
39+
- **ffi/systemd/** (Zig) — only template integration test
40+
- **monitoring/systems-observatory/** (Julia) — no tests
41+
- **contracts/** (Deno) — no tests
42+
43+
Total: 163 Rust + 121 Elixir + 73 Zig + 46 Julia + 79 ReScript + 44 V source files.
44+
Test coverage is concentrated in the Elixir components, plus the two Gleam test files in composer.
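
An inventory like the one above can be regenerated mechanically. The following is a rough sketch, assuming each component is a top-level directory and that test files follow the naming patterns in `TEST_PATTERNS`. Both are assumptions: Rust tests in particular often live inline or under `tests/`, so a zero here means "no file matched", not "no tests exist".

```python
from pathlib import Path

# Assumed naming conventions for test files; adjust per language and repo.
TEST_PATTERNS = ("*_test.exs", "*_test.gleam", "*_test.rs", "test_*.jl")


def untested_components(repo_root: Path) -> list:
    """Return top-level directories containing no file matching TEST_PATTERNS."""
    missing = []
    components = sorted(
        p for p in repo_root.iterdir()
        if p.is_dir() and not p.name.startswith(".")  # skip .git and similar
    )
    for component in components:
        has_tests = any(
            next(component.rglob(pattern), None) is not None
            for pattern in TEST_PATTERNS
        )
        if not has_tests:
            missing.append(component.name)
    return missing
```

Calling `untested_components(Path("."))` from the repository root would list the directories with no matching test file.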
45+
46+
### End-to-End (E2E)
47+
- Full system health monitoring pipeline (observatory -> alerts -> emergency-room)
48+
- Network dashboard monitoring cycle
49+
- Hardware crash detection and recovery workflow
50+
- Immutable Linux audit cycle
51+
- Session sentinel lifecycle
52+
- FUSE filesystem mount/unmount/operations cycle
53+
- Systemd unit management workflow
54+
- Composer plan execution
55+
56+
### Aspect Tests
57+
- [ ] Security (FUSE filesystem privilege escalation, network dashboard auth, systemd unit injection)
58+
- [ ] Performance (monitoring overhead, FUSE latency, systemd watcher CPU usage)
59+
- [ ] Concurrency (multiple monitoring agents, concurrent FUSE operations, race conditions)
60+
- [ ] Error handling (hardware failures, network timeouts, service crashes)
61+
- [ ] Accessibility (N/A — infrastructure tools)
62+
63+
### Build & Execution
64+
- [ ] cargo build for all Rust components — not verified
65+
- [ ] mix compile for Elixir components — not verified (version mismatch)
66+
- [ ] gleam build for composer — not verified
67+
- [ ] zig build for FFI — not verified
68+
- [ ] Self-diagnostic — none
69+
70+
### Benchmarks Needed
71+
- FUSE filesystem throughput (read/write/metadata)
72+
- Monitoring agent resource overhead (CPU, memory)
73+
- Czech file knife benchmarks (file exists — verify it runs)
74+
- Systems observatory database benchmarks (file exists — verify it runs)
75+
- Network orchestration latency
76+
- Alert propagation time
77+
78+
### Self-Tests
79+
- [ ] panic-attack assail on own repo
80+
- [ ] Built-in health check for each component
81+
- [ ] Systemd unit file validation
82+
83+
## Priority
84+
- **HIGH** — Massive monorepo (163 Rust + 121 Elixir + 73 Zig + 46 Julia files across 20+ components) with tests concentrated only in the Elixir components. The Rust, Zig, Julia, and ReScript components are essentially untested. Infrastructure tools need especially high reliability.
85+
86+
## FAKE-FUZZ ALERT
87+
88+
- `tests/fuzz/placeholder.txt` is a scorecard placeholder inherited from rsr-template-repo — it does NOT provide real fuzz testing
89+
- Replace with an actual fuzz harness (see rsr-template-repo/tests/fuzz/README.adoc) or remove the file
90+
- Priority: P2 — creates false impression of fuzz coverage
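
To illustrate what separates a harness from a placeholder, here is a minimal random-input sketch in Python. It is not coverage-guided and is not the harness described in the template README. The target `parse_unit_name` and its validation rules are hypothetical, chosen only to give the loop something to exercise.

```python
import random


def parse_unit_name(raw: bytes) -> str:
    """Hypothetical target: decode and validate a unit-style name."""
    text = raw.decode("utf-8")  # UnicodeDecodeError is a subclass of ValueError
    if not text or "/" in text or "\x00" in text:
        raise ValueError("invalid unit name")
    return text


def fuzz(target, iterations: int = 1000, seed: int = 0) -> list:
    """Feed random byte strings to target and collect the inputs that raise
    anything other than ValueError, which is treated as an expected rejection."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        length = rng.randrange(0, 32)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass  # malformed input rejected cleanly
        except Exception:
            crashes.append(data)
    return crashes
```

The essential difference from `placeholder.txt` is that this loop executes the target and records unexpected failures. A production harness would add coverage feedback and a persisted corpus.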
