Skip to content

Harden test suite against silent-skip patterns#38

Merged
jserv merged 1 commit into
mainfrom
test-suite
May 17, 2026
Merged

Harden test suite against silent-skip patterns#38
jserv merged 1 commit into
mainfrom
test-suite

Conversation

@jserv
Copy link
Copy Markdown
Contributor

@jserv jserv commented May 17, 2026

In tests/test-proctitle-low-stack.sh: the -gt 8192 heuristic was a no-op on macOS Apple Silicon hosts whose default RLIMIT_STACK is 8176 KiB, so the regression test never actually capped the host stack. A wider audit surfaced the same shape across the suite: regression checks that quietly stopped checking on common configurations.

Proctitle coverage:

  • tests/test-proctitle-low-stack.sh: drop broken -gt heuristic. Apply an unconditional 1024 KiB soft-stack cap, env-overridable via PROCTITLE_LOW_STACK_KIB. The new cap sits well below every observed macOS default and well above the ~560 KiB floor where elfuse cannot start. Distinct exit codes 98/99 separate ulimit-setup failure from elfuse failure.
  • tests/test-proctitle-host.c (new, host-side native test): synthesizes a contiguous argv block with its NUL terminator at the last writable byte of a page, with the next page mapped PROT_NONE and a sentinel environ string mmapped above it. The fixed runtime_set_process_title walks only the contiguous argv block and stores byte-by-byte through a volatile pointer; any overshoot or a reverted argv+envp upper-bound walk SIGSEGVs against the guard. Verified by restoring the pre-fix proctitle.c and observing rc=139. Makefile gains a build rule that links against the project's proctitle.o so the exact in-tree code is exercised, no HVF entitlement required.

Driver correctness (tests/driver.sh):

  • evaluate_result no longer reports OK for rc=0 when expected_rc=N is set. The previous rc==0 OR (expected AND rc==expected) accepted a buggy exit of 0 against an explicit non-zero expectation; test-complex declares expected_rc=42 and was the existing silent-pass victim.
  • ALLOW_MISSING_BINARIES default flips from auto (skip-on-missing for any non-canonical TESTDIR) to 0 (strict). Callers that want permissive-skip-mode now set ALLOW_MISSING_BINARIES=1 explicitly.

Recipe-level exit propagation (mk/tests.mk):

  • test-sysroot-rename and test-sysroot-create-paths gain set -e. The earlier semicolon-chained recipes ran post-conditions after the elfuse invocation, so a non-zero elfuse exit was swallowed whenever the residual filesystem state satisfied the checks.

Test-runner timeout discipline (tests/lib/test-runner.sh):

  • run() and run_check() wrap every invocation in timeout $TEST_TIMEOUT. The asymmetry with run_pipe / run_timeout let a deadlocked elfuse hang make check forever.
  • A new _test_runner_epoch_us helper (bash 5.0 EPOCHREALTIME) disambiguates a real harness timeout from the guest's own timeout(1) returning rc=124. The two share an exit code; comparing microsecond elapsed against TEST_TIMEOUT * 1_000_000 is the only reliable distinguisher, and the seconds-resolution SECONDS alternative could undercount by almost a full second at small timeouts.

Coreutils optional-binary accounting (tests/lib/coreutils-suite.sh):

  • Raw if [ -e "$BIN/X" ]; then run_check ...; fi guards around base32, basenc, sha224sum, sha384sum, b2sum, sum, and numfmt are removed. Missing tools now route through test_skip_missing_tool inside run_check, which reports SKIP with accounting under TEST_SKIP_MISSING_TOOLS=1 (smoke profile) and a hard FAIL under the full profile, instead of silently erasing the assertion.

Matrix runner accounting (tests/test-matrix.sh):

  • New require_binary() and skip_suite() helpers always increment the skip counter and emit a visible skip line. 14 raw existence guards plus 3 silent suite-level drops (static-bins, dynamic-coreutils musl/glibc) now go through them, so missing fixtures surface in the per-mode summary instead of looking like a full pass.
  • Calls use the if require_binary X Y; then ... fi form, not require_binary X Y && test_check .... Under set -e the chain form propagates the helper's return-1 as the calling function's exit status, aborting the script when the last optional binary in a function happened to be missing.
  • test_check and test_pipe now require rc==0 before trusting the regex. A crashing tool that printed the expected substring before dying used to be reported OK; the precondition matches the corrected driver.sh evaluate_result.

Perf benchmark integrity (tests/test-perf.sh):

  • benchmark() captures each sample's exit status. PERF_FAILED accumulates and the script exits 1 if any sample failed. The previous ... || true swallowed every failure, so a missing native binary, an elfuse crash, or a host SIP block degraded into median 0 ms PASS.
  • The cat|wc pipelines now run under bash -c "set -o pipefail; ..." so a failing producer surfaces through the rc capture. The outer script's pipefail does not propagate into sh -c children.

Additional /proc test hardening (in test-proc-fidelity.c):

  • test_proc_oom_score_write_fails opens O_WRONLY, the path the test name claims to cover. Non-root EACCES becomes an explicit PROCFS_SKIP with accounting instead of a silent PASS that hid the actual write-rejection coverage.
  • test_proc_oom_adj_reread_tracks_score_adj_updates, test_proc_oom_adj_scaling, and test_proc_oom_adj_same_fd_roundtrip fail hard when /proc/self/oom_adj is missing. The earlier silent PASS turned each regression into a no-op on any host that did not ship the legacy compat node.

Close #37


Summary by cubic

Hardens the test suite to remove silent skips, enforce strict exit/timeout handling, and split coverage into deterministic suites for proctitle, syscalls, /proc, and fd-family. make check now fails on hangs and missing fixtures with clear PASS/FAIL/SKIP accounting.

  • New Features

    • Added native macOS argv-tail guard test-proctitle-host linked against in-tree proctitle.o; runs in make check.
    • Reworked low-stack proctitle regression: unconditional 1024 KiB cap via PROCTITLE_LOW_STACK_KIB, with distinct exit codes for ulimit setup vs guest failure.
    • Introduced test-syscall-fidelity (fchmodat2, getcpu, openat2 RESOLVE_*, O_PATH, madvise, low-hint mmap) and test-fd-family (signalfd EFAULT-preserves-pending); split /proc into test-proc-fidelity with explicit SKIPs and fail-hard on missing oom_adj. Manifest reorganized into /proc, syscall fidelity, and fd-family sections.
    • FUSE validation no longer requires Alpine fixtures; falls back to a minimal scratch sysroot containing /mnt/fuse when SYSROOT_DIR is absent.
  • Refactors

    • Driver: evaluate_result now requires exact expected_rc; default ALLOW_MISSING_BINARIES=0 (strict).
    • Runner: wrap run/run_check in timeout; use bash EPOCHREALTIME to detect harness timeouts; timeouts are FAILs.
    • Coreutils and matrix: route optional tools through visible SKIPs (test_skip_missing_tool, require_binary, skip_suite); require rc==0 before trusting regex.
    • Make/test recipes: set -e for sysroot rename/create-paths.
    • Perf: capture per-sample rc and exit non-zero on any failure; run pipelines under bash -c "set -o pipefail; ..." to surface producer failures.

Written for commit d3b4800. Summary will update on new commits. Review in cubic

cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv jserv force-pushed the test-suite branch 2 times, most recently from 7222d49 to 51d6f35 Compare May 17, 2026 12:55
In tests/test-proctitle-low-stack.sh: the -gt 8192 heuristic was a no-op
on macOS Apple Silicon hosts whose default RLIMIT_STACK is 8176 KiB, so
the regression test never actually capped the host stack. A wider audit
surfaced the same shape across the suite: regression checks that quietly
stopped checking on common configurations.

Proctitle coverage:
- tests/test-proctitle-low-stack.sh: drop broken -gt heuristic. Apply
  an unconditional 1024 KiB soft-stack cap, env-overridable via
  PROCTITLE_LOW_STACK_KIB. The new cap sits well below every observed
  macOS default and well above the ~560 KiB floor where elfuse cannot
  start. Distinct exit codes 98/99 separate ulimit-setup failure from
  elfuse failure.
- tests/test-proctitle-host.c (new, host-side native test):
  synthesizes a contiguous argv block with its NUL terminator at the
  last writable byte of a page, with the next page mapped PROT_NONE
  and a sentinel environ string mmapped above it. The fixed
  runtime_set_process_title walks only the contiguous argv block and
  stores byte-by-byte through a volatile pointer; any overshoot or a
  reverted argv+envp upper-bound walk SIGSEGVs against the guard.
  Verified by restoring the pre-fix proctitle.c and observing rc=139.
  Makefile gains a build rule that links against the project's
  proctitle.o so the exact in-tree code is exercised, no HVF
  entitlement required.

Driver correctness (tests/driver.sh):
- evaluate_result no longer reports OK for rc=0 when expected_rc=N
  is set. The previous rc==0 OR (expected AND rc==expected) accepted
  a buggy exit of 0 against an explicit non-zero expectation;
  test-complex declares expected_rc=42 and was the existing
  silent-pass victim.
- ALLOW_MISSING_BINARIES default flips from auto (skip-on-missing
  for any non-canonical TESTDIR) to 0 (strict). Callers that want
  permissive-skip-mode now set ALLOW_MISSING_BINARIES=1 explicitly.

Recipe-level exit propagation (mk/tests.mk):
- test-sysroot-rename and test-sysroot-create-paths gain set -e.
  The earlier semicolon-chained recipes ran post-conditions after
  the elfuse invocation, so a non-zero elfuse exit was swallowed
  whenever the residual filesystem state satisfied the checks.

Test-runner timeout discipline (tests/lib/test-runner.sh):
- run() and run_check() wrap every invocation in
  timeout \$TEST_TIMEOUT. The asymmetry with run_pipe / run_timeout
  let a deadlocked elfuse hang make check forever.
- A new _test_runner_epoch_us helper (bash 5.0 EPOCHREALTIME)
  disambiguates a real harness timeout from the guest's own
  timeout(1) returning rc=124. The two share an exit code; comparing
  microsecond elapsed against TEST_TIMEOUT * 1_000_000 is the only
  reliable distinguisher, and the seconds-resolution SECONDS
  alternative could undercount by almost a full second at small
  timeouts.

Coreutils optional-binary accounting (tests/lib/coreutils-suite.sh):
- Raw if [ -e "\$BIN/X" ]; then run_check ...; fi guards around
  base32, basenc, sha224sum, sha384sum, b2sum, sum, and numfmt are
  removed. Missing tools now route through test_skip_missing_tool
  inside run_check, which reports SKIP with accounting under
  TEST_SKIP_MISSING_TOOLS=1 (smoke profile) and a hard FAIL under
  the full profile, instead of silently erasing the assertion.

Matrix runner accounting (tests/test-matrix.sh):
- New require_binary() and skip_suite() helpers always increment
  the skip counter and emit a visible skip line. 14 raw existence
  guards plus 3 silent suite-level drops (static-bins,
  dynamic-coreutils musl/glibc) now go through them, so missing
  fixtures surface in the per-mode summary instead of looking like
  a full pass.
- Calls use the if require_binary X Y; then ... fi form, not
  require_binary X Y && test_check .... Under set -e the chain form
  propagates the helper's return-1 as the calling function's exit
  status, aborting the script when the last optional binary in a
  function happened to be missing.
- test_check and test_pipe now require rc==0 before trusting the
  regex. A crashing tool that printed the expected substring before
  dying used to be reported OK; the precondition matches the
  corrected driver.sh evaluate_result.

Perf benchmark integrity (tests/test-perf.sh):
- benchmark() captures each sample's exit status. PERF_FAILED
  accumulates and the script exits 1 if any sample failed. The
  previous ... || true swallowed every failure, so a missing native
  binary, an elfuse crash, or a host SIP block degraded into
  median 0 ms PASS.
- The cat|wc pipelines now run under
  bash -c "set -o pipefail; ..." so a failing producer surfaces
  through the rc capture. The outer script's pipefail does not
  propagate into sh -c children.

Additional /proc test hardening (in test-proc-fidelity.c):
- test_proc_oom_score_write_fails opens O_WRONLY, the path the
  test name claims to cover. Non-root EACCES becomes an explicit
  PROCFS_SKIP with accounting instead of a silent PASS that hid the
  actual write-rejection coverage.
- test_proc_oom_adj_reread_tracks_score_adj_updates,
  test_proc_oom_adj_scaling, and test_proc_oom_adj_same_fd_roundtrip
  fail hard when /proc/self/oom_adj is missing. The earlier silent
  PASS turned each regression into a no-op on any host that did not
  ship the legacy compat node.

Close #37
@jserv jserv merged commit 1760a99 into main May 17, 2026
4 checks passed
@jserv jserv deleted the test-suite branch May 17, 2026 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stack cap is unreachable on default macOS shells (8176 KiB), so the regression branch might never executes

1 participant