From d3b48008f796038d47269c879fdcb61716692a0c Mon Sep 17 00:00:00 2001
From: Jim Huang
Date: Sun, 17 May 2026 07:17:54 -0500
Subject: [PATCH] Harden test suite against silent-skip patterns

In tests/test-proctitle-low-stack.sh: the -gt 8192 heuristic was a no-op
on macOS Apple Silicon hosts whose default RLIMIT_STACK is 8176 KiB, so
the regression test never actually capped the host stack. A wider audit
surfaced the same shape across the suite: regression checks that quietly
stopped checking on common configurations.

Proctitle coverage:
- tests/test-proctitle-low-stack.sh: drop broken -gt heuristic. Apply an
  unconditional 1024 KiB soft-stack cap, env-overridable via
  PROCTITLE_LOW_STACK_KIB. The new cap sits well below every observed
  macOS default and well above the ~560 KiB floor where elfuse cannot
  start. Distinct exit codes 98/99 separate ulimit-setup failure from
  elfuse failure.
- tests/test-proctitle-host.c (new, host-side native test): synthesizes
  a contiguous argv block with its NUL terminator at the last writable
  byte of a page, with the next page mapped PROT_NONE and a sentinel
  environ string mmapped above it. The fixed runtime_set_process_title
  walks only the contiguous argv block and stores byte-by-byte through a
  volatile pointer; any overshoot or a reverted argv+envp upper-bound
  walk SIGSEGVs against the guard. Verified by restoring the pre-fix
  proctitle.c and observing rc=139. Makefile gains a build rule that
  links against the project's proctitle.o so the exact in-tree code is
  exercised, no HVF entitlement required.

Driver correctness (tests/driver.sh):
- evaluate_result no longer reports OK for rc=0 when expected_rc=N is
  set. The previous rc==0 OR (expected AND rc==expected) accepted a
  buggy exit of 0 against an explicit non-zero expectation; test-complex
  declares expected_rc=42 and was the existing silent-pass victim.
- ALLOW_MISSING_BINARIES default flips from auto (skip-on-missing for
  any non-canonical TESTDIR) to 0 (strict). Callers that want
  permissive-skip-mode now set ALLOW_MISSING_BINARIES=1 explicitly.

Recipe-level exit propagation (mk/tests.mk):
- test-sysroot-rename and test-sysroot-create-paths gain set -e. The
  earlier semicolon-chained recipes ran post-conditions after the elfuse
  invocation, so a non-zero elfuse exit was swallowed whenever the
  residual filesystem state satisfied the checks.

Test-runner timeout discipline (tests/lib/test-runner.sh):
- run() and run_check() wrap every invocation in timeout $TEST_TIMEOUT.
  The asymmetry with run_pipe / run_timeout let a deadlocked elfuse hang
  make check forever.
- A new _test_runner_epoch_us helper (bash 5.0 EPOCHREALTIME)
  disambiguates a real harness timeout from the guest's own timeout(1)
  returning rc=124. The two share an exit code; comparing microsecond
  elapsed against TEST_TIMEOUT * 1_000_000 is the only reliable
  distinguisher, and the seconds-resolution SECONDS alternative could
  undercount by almost a full second at small timeouts.

Coreutils optional-binary accounting (tests/lib/coreutils-suite.sh):
- Raw if [ -e "$BIN/X" ]; then run_check ...; fi guards around base32,
  basenc, sha224sum, sha384sum, b2sum, sum, and numfmt are removed.
  Missing tools now route through test_skip_missing_tool inside
  run_check, which reports SKIP with accounting under
  TEST_SKIP_MISSING_TOOLS=1 (smoke profile) and a hard FAIL under the
  full profile, instead of silently erasing the assertion.

Matrix runner accounting (tests/test-matrix.sh):
- New require_binary() and skip_suite() helpers always increment the
  skip counter and emit a visible skip line. 14 raw existence guards
  plus 3 silent suite-level drops (static-bins, dynamic-coreutils
  musl/glibc) now go through them, so missing fixtures surface in the
  per-mode summary instead of looking like a full pass.
- Calls use the if require_binary X Y; then ... fi form, not
  require_binary X Y && test_check .... Under set -e the chain form
  propagates the helper's return-1 as the calling function's exit
  status, aborting the script when the last optional binary in a
  function happened to be missing.
- test_check and test_pipe now require rc==0 before trusting the regex.
  A crashing tool that printed the expected substring before dying used
  to be reported OK; the precondition matches the corrected driver.sh
  evaluate_result.

Perf benchmark integrity (tests/test-perf.sh):
- benchmark() captures each sample's exit status. PERF_FAILED
  accumulates and the script exits 1 if any sample failed. The previous
  ... || true swallowed every failure, so a missing native binary, an
  elfuse crash, or a host SIP block degraded into median 0 ms PASS.
- The cat|wc pipelines now run under bash -c "set -o pipefail; ..." so a
  failing producer surfaces through the rc capture. The outer script's
  pipefail does not propagate into sh -c children.

Additional /proc test hardening (in test-proc-fidelity.c):
- test_proc_oom_score_write_fails opens O_WRONLY, the path the test name
  claims to cover. Non-root EACCES becomes an explicit PROCFS_SKIP with
  accounting instead of a silent PASS that hid the actual
  write-rejection coverage.
- test_proc_oom_adj_reread_tracks_score_adj_updates,
  test_proc_oom_adj_scaling, and test_proc_oom_adj_same_fd_roundtrip
  fail hard when /proc/self/oom_adj is missing. The earlier silent PASS
  turned each regression into a no-op on any host that did not ship the
  legacy compat node.

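The chain-form pitfall noted under the matrix runner changes can be
reproduced in isolation. The sketch below is not code from this patch:
require_binary is reduced to a bare existence test, suite() is a
hypothetical caller, and /nonexistent/optional-tool is a placeholder
path assumed to be absent. Each variant runs in its own bash -c child so
that the child's set -e is the only errexit setting in play.

```shell
#!/usr/bin/env bash
# Sketch only: simplified stand-ins, not the helpers from tests/test-matrix.sh.

# Chain form: the && list is the last command in suite(), so suite()
# returns the helper's status (1) and set -e aborts the caller.
chain_script='
set -e
require_binary() { [ -e "$1" ]; }
suite() { require_binary /nonexistent/optional-tool && echo "check ran"; }
suite
echo "reached end"
'

# if/then form: an if whose condition fails and which has no else branch
# completes with status 0, so suite() returns 0 and the script continues.
if_script='
set -e
require_binary() { [ -e "$1" ]; }
suite() { if require_binary /nonexistent/optional-tool; then echo "check ran"; fi; }
suite
echo "reached end"
'

chain_rc=0
chain_out=$(bash -c "$chain_script") || chain_rc=$?
if_rc=0
if_out=$(bash -c "$if_script") || if_rc=$?

printf 'chain form: rc=%s out=%s\n' "$chain_rc" "$chain_out"
printf 'if form:    rc=%s out=%s\n' "$if_rc" "$if_out"
```

The chain-form child exits with status 1 before reaching its final echo,
while the if-form child exits 0 after printing "reached end". The
failure only bites when the missing binary is the last command in the
function, which is why it went unnoticed on hosts with complete fixture
sets.
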
Close #37
---
 Makefile                                      |  19 +-
 mk/tests.mk                                   |  67 +-
 src/syscall/fd.c                              |   2 +-
 tests/driver.sh                               |  27 +-
 tests/lib/coreutils-suite.sh                  |  37 +-
 tests/lib/test-runner.sh                      |  64 +-
 tests/manifest.txt                            |  12 +-
 tests/test-fd-family.c                        |  75 ++
 tests/test-matrix.sh                          | 109 ++-
 tests/test-perf.sh                            |  46 +-
 tests/{test-tier-b.c => test-proc-fidelity.c} | 718 +++---------------
 tests/test-proctitle-host.c                   | 153 ++++
 tests/test-proctitle-low-stack.sh             |  80 +-
 tests/test-syscall-fidelity.c                 | 614 +++++++++++++++
 14 files changed, 1270 insertions(+), 753 deletions(-)
 create mode 100644 tests/test-fd-family.c
 rename tests/{test-tier-b.c => test-proc-fidelity.c} (61%)
 create mode 100644 tests/test-proctitle-host.c
 create mode 100644 tests/test-syscall-fidelity.c

diff --git a/Makefile b/Makefile
index 72297d0..405651b 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,4 @@
-# elfuse — aarch64-linux ELF executor on macOS Apple Silicon
+# elfuse -- aarch64-linux ELF executor on macOS Apple Silicon
 #
 # Copyright 2026 elfuse contributors
 # SPDX-License-Identifier: Apache-2.0
@@ -8,7 +8,7 @@
 #
 # Example: make elfuse
 # make test-hello
-# make V=1 elfuse (verbose — show full commands)
+# make V=1 elfuse (verbose -- show full commands)
 
 .DEFAULT_GOAL := help
 .DELETE_ON_ERROR:
@@ -90,7 +90,7 @@ define link-and-sign
 	mv "$$tmp" "$1"
 endef
 
-# ── Main executable ──────────────────────────────────────────────
+# Main executable
 
 .PHONY: all elfuse
 .PHONY: gen-syscall-dispatch check-syscall-dispatch
@@ -119,7 +119,7 @@ elfuse: $(ELFUSE_BIN)
 $(ELFUSE_BIN): $(OBJS) | $(BUILD_DIR)
 	$(call link-and-sign,$@,$(OBJS))
 
-# ── Native test binaries (macOS, Hypervisor.framework) ───────────
+# Native test binaries (macOS, Hypervisor.framework)
 
 ## Build the multi-vCPU HVF validation test (native macOS binary)
 $(BUILD_DIR)/test-multi-vcpu: $(BUILD_DIR)/test-multi-vcpu.o | $(BUILD_DIR)
@@ -129,7 +129,16 @@ $(BUILD_DIR)/test-multi-vcpu: $(BUILD_DIR)/test-multi-vcpu.o | $(BUILD_DIR)
 $(BUILD_DIR)/test-rwx: $(BUILD_DIR)/test-rwx.o | $(BUILD_DIR)
 	$(call link-and-sign,$@,$<)
 
-# ── Guest test binaries (cross-compiled, aarch64-linux) ──────────
+## Build the proctitle argv-tail regression test (native macOS binary)
+# Links against the project-built proctitle.o so the exact in-tree code is
+# exercised; no HVF entitlement is needed because the test only manipulates
+# mmap and PROT_NONE. The codesign step is skipped for the same reason.
+$(BUILD_DIR)/test-proctitle-host: $(BUILD_DIR)/test-proctitle-host.o \
+		$(BUILD_DIR)/runtime/proctitle.o | $(BUILD_DIR)
+	@echo " LD $@"
+	$(Q)$(CC) $(CFLAGS) -o $@ $^
+
+# Guest test binaries (cross-compiled, aarch64-linux)
 
 # Only used when GUEST_TEST_BINARIES is not set.
 ifndef GUEST_TEST_BINARIES
diff --git a/mk/tests.mk b/mk/tests.mk
index 35f48aa..155025f 100644
--- a/mk/tests.mk
+++ b/mk/tests.mk
@@ -7,7 +7,7 @@
 	test-matrix test-matrix-elfuse-aarch64 test-matrix-qemu-aarch64 \
 	test-full test-multi-vcpu test-rwx test-sysroot-rename \
 	test-case-collision test-case-collision-fallback test-sysroot-create-paths \
-	test-proctitle-low-stack \
+	test-proctitle-host test-proctitle-low-stack \
 	test-sysroot-procfs-exec test-timeout-disable test-fuse-alpine \
 	test-sysroot-nofollow test-sysroot-chdir perf
 
@@ -23,6 +23,8 @@ check-syscall-coverage:
 ## Run the unit test suite plus busybox applet validation
 check: $(ELFUSE_BIN) $(TEST_DEPS) check-syscall-coverage
 	@bash tests/driver.sh -e $(ELFUSE_BIN) -d $(TEST_DIR) -v
+	@printf "\n$(BLUE)━━━ proctitle argv-tail regression ━━━$(RESET)\n"
+	@$(MAKE) --no-print-directory test-proctitle-host
 	@printf "\n$(BLUE)━━━ proctitle low-stack regression ━━━$(RESET)\n"
 	@$(MAKE) --no-print-directory test-proctitle-low-stack
 	@printf "\n$(BLUE)━━━ busybox applet validation ━━━$(RESET)\n"
@@ -35,7 +37,8 @@ check: $(ELFUSE_BIN) $(TEST_DEPS) check-syscall-coverage
 	@$(MAKE) --no-print-directory test-timeout-disable
 
 test-sysroot-rename: $(ELFUSE_BIN) $(BUILD_DIR)/test-sysroot-rename
-	@tmpdir=$$(mktemp -d); \
+	@set -e; \
+	tmpdir=$$(mktemp -d); \
 	trap 'rm -rf "$$tmpdir"; rm -f /tmp/elfuse-sysroot-rename-dst.txt' EXIT; \
 	mkdir -p "$$tmpdir/tmp"; \
 	printf 'inside-sysroot\n' > "$$tmpdir/tmp/elfuse-sysroot-rename-src.txt"; \
@@ -74,7 +77,8 @@ test-case-collision-fallback: $(ELFUSE_BIN) $(BUILD_DIR)/test-case-collision
 	$(ELFUSE_BIN) --sysroot "$$tmpdir" $(BUILD_DIR)/test-case-collision
 
 test-sysroot-create-paths: $(ELFUSE_BIN) $(BUILD_DIR)/test-sysroot-create-paths
-	@tmpdir=$$(mktemp -d); \
+	@set -e; \
+	tmpdir=$$(mktemp -d); \
 	guest_tmp="/tmp/elfuse-sysroot-create-paths/file.txt"; \
 	mounted_tmp="$$tmpdir/case-sysroot/tmp/elfuse-sysroot-create-paths/file.txt"; \
 	host_out_dir="$$tmpdir/host-out"; \
@@ -113,8 +117,7 @@ test-gdbstub: $(ELFUSE_BIN) $(TEST_DIR)/test-hello
 ## Alias for check (backward compat)
 test-all: check
 
-# ── Coreutils integration test ───────────────────────────────────
-
+# Coreutils integration test
 FIXTURES_DIR ?= $(CURDIR)/externals/test-fixtures
 
 ifeq ($(origin GUEST_COREUTILS), undefined)
@@ -171,8 +174,7 @@ test-coreutils: $(ELFUSE_BIN)
 		bash tests/test-coreutils.sh $(ELFUSE_BIN) $(COREUTILS_BIN); \
 	fi
 
-# ── Busybox integration test ─────────────────────────────────────
-
+# Busybox integration test
 ifneq ($(wildcard $(BUILD_DIR)/busybox),)
 BUSYBOX_BIN ?= $(BUILD_DIR)/busybox
 else ifdef GUEST_BUSYBOX
@@ -256,8 +258,7 @@ test-proctitle-low-stack: $(ELFUSE_BIN) $(BUSYBOX_DEPS)
 	fi
 	@bash tests/test-proctitle-low-stack.sh $(ELFUSE_BIN) $(BUSYBOX_BIN)
 
-# ── Static binary integration tests ──────────────────────────────
-
+# Static binary integration tests
 ifdef GUEST_STATIC_BINS
 ifneq ($(wildcard $(GUEST_STATIC_BINS)/bin),)
 STATIC_BINS_DIR ?= $(GUEST_STATIC_BINS)/bin
@@ -278,8 +279,7 @@ test-static-bins: $(ELFUSE_BIN)
 		bash tests/test-static-bins.sh $(ELFUSE_BIN) $(STATIC_BINS_DIR); \
 	fi
 
-# ── Dynamic linking tests ────────────────────────────────────────
-
+# Dynamic linking tests
 # Musl sysroot with dynamic linker + libc.so.
 SYSROOT_DIR ?= $(GUEST_SYSROOT)
 ifdef GUEST_DYNAMIC_COREUTILS
@@ -299,13 +299,24 @@ test-dynamic: $(ELFUSE_BIN)
 	@printf "$(BLUE)▸ Running$(RESET) dynamic hello-dynamic (--sysroot)\n"
 	$(ELFUSE_BIN) --sysroot $(SYSROOT_DIR) $(GUEST_DYNAMIC_TESTS)/bin/hello-dynamic
 
-## Run guest FUSE validation against the Alpine musl sysroot
+## Run guest FUSE validation
+# test-fuse-basic is statically linked and accesses exactly one host path:
+# /mnt/fuse (open + access). /dev/fuse is intercepted by elfuse internally.
+# A minimal sysroot under build/ that contains only /mnt/fuse is therefore
+# sufficient coverage; the earlier dependency on the full Alpine fixture
+# tree was incidental and broke `make distclean && make check` whenever
+# the Alpine CDN pruned a pinned package version.
+#
+# An explicit SYSROOT_DIR override is still honored for users who want
+# the test to run against their own sysroot (e.g. the Alpine fixtures
+# fetched separately for the broader matrix runner).
 test-fuse-alpine: $(ELFUSE_BIN) $(BUILD_DIR)/test-fuse-basic
-	@if [ -z "$(SYSROOT_DIR)" ] || [ ! -d "$(SYSROOT_DIR)" ]; then \
-		printf "$(YELLOW)SKIP$(RESET) Alpine sysroot not found. Set SYSROOT_DIR=/path/to/sysroot or run tests/fetch-fixtures.sh.\n"; \
-		exit 0; \
-	fi
-	@bash tests/test-fuse-alpine.sh $(ELFUSE_BIN) $(SYSROOT_DIR) $(BUILD_DIR)/test-fuse-basic
+	@sysroot="$(SYSROOT_DIR)"; \
+	if [ -z "$$sysroot" ] || [ ! -d "$$sysroot" ]; then \
+		sysroot="$(BUILD_DIR)/fuse-scratch-sysroot"; \
+		mkdir -p "$$sysroot/mnt/fuse"; \
+	fi; \
+	bash tests/test-fuse-alpine.sh $(ELFUSE_BIN) "$$sysroot" $(BUILD_DIR)/test-fuse-basic
 
 ## Run dynamically-linked coreutils tests (--sysroot)
 test-dynamic-coreutils: $(ELFUSE_BIN)
@@ -323,8 +334,7 @@ test-dynamic-coreutils: $(ELFUSE_BIN)
 		bash tests/test-dynamic-coreutils.sh $(ELFUSE_BIN) $(SYSROOT_DIR) $(DYNAMIC_COREUTILS_BIN); \
 	fi
 
-# ── glibc dynamic linking tests ───────────────────────────────────
-
+# glibc dynamic linking tests
 # glibc sysroot with dynamic linker + libc.so.
 GLIBC_SYSROOT_DIR ?= $(GUEST_GLIBC_SYSROOT)
 ifdef GUEST_GLIBC_DYNAMIC_COREUTILS
@@ -358,8 +368,7 @@ test-glibc-coreutils: $(ELFUSE_BIN)
 		SUITE_SUMMARY="glibc results" \
 		bash tests/test-dynamic-coreutils.sh $(ELFUSE_BIN) $(GLIBC_SYSROOT_DIR) $(GLIBC_DYNAMIC_COREUTILS_BIN)
 
-# ── Performance benchmark ─────────────────────────────────────────
-
+# Performance benchmark
 ifneq ($(wildcard $(BUILD_DIR)/busybox),)
 PERF_BIN ?= $(BUILD_DIR)/perf-bin
 PERF_DEPS := $(addprefix $(PERF_BIN)/,grep wc cat sort)
@@ -385,8 +394,7 @@ test-perf: $(ELFUSE_BIN) $(PERF_DEPS)
 ## Alias for test-perf
 perf: test-perf
 
-# ── Test matrix (elfuse + qemu, aarch64) ────────────────────────────────
-
+# Test matrix (elfuse + qemu, aarch64)
 ## Run full test matrix (all modes: elfuse + qemu, aarch64)
 test-matrix: $(ELFUSE_BIN) $(TEST_DEPS)
 	@bash tests/test-matrix.sh all
@@ -399,8 +407,7 @@ test-matrix-elfuse-aarch64: $(ELFUSE_BIN) $(TEST_DEPS)
 test-matrix-qemu-aarch64: $(ELFUSE_BIN) $(TEST_DEPS)
 	@bash tests/test-matrix.sh qemu-aarch64
 
-# ── Full test suite ──────────────────────────────────────────────────
-
+# Full test suite
 ## Run the complete test suite (aarch64: unit + busybox + gdbstub + coreutils + static + dynamic)
 test-full: $(ELFUSE_BIN)
 	@printf "\n$(CYAN)╔══════════════════════════════════════════════════════╗$(RESET)\n"
@@ -438,15 +445,19 @@ test-full: $(ELFUSE_BIN)
 	printf "$(CYAN)╚══════════════════════════════════════════════════════╝$(RESET)\n"; \
 	[ "$$fail" -eq 0 ]
 
-# ── Multi-vCPU validation test ─────────────────────────────────────
+# Multi-vCPU validation test
 # Build rules in top-level Makefile; these are just run targets.
 
 ## Run multi-vCPU validation tests (5 tests)
 test-multi-vcpu: $(BUILD_DIR)/test-multi-vcpu
 	$(BUILD_DIR)/test-multi-vcpu
 
-# ── RWX page table entry test ───────────────────────────────────
-
+# RWX page table entry test
 ## Run RWX page table entry test (does HVF allow W+X?)
 test-rwx: $(BUILD_DIR)/test-rwx
 	$(BUILD_DIR)/test-rwx
+
+# Proctitle argv-tail regression
+## Run the deterministic argv-tail overshoot guard test
+test-proctitle-host: $(BUILD_DIR)/test-proctitle-host
+	$(BUILD_DIR)/test-proctitle-host
diff --git a/src/syscall/fd.c b/src/syscall/fd.c
index 3903e9b..6e6992b 100644
--- a/src/syscall/fd.c
+++ b/src/syscall/fd.c
@@ -1028,7 +1028,7 @@ int64_t signalfd_read(int guest_fd,
     if (written == 0) {
         /* No bytes transferred: surface EFAULT, leave the queue
          * untouched so the signal is not lost. Matches the elfuse
-         * promise locked in by tests/test-tier-b's
+         * promise locked in by tests/test-fd-family's
          * test_signalfd_efault_preserves_pending.
          */
         if (pending != pending_stack)
diff --git a/tests/driver.sh b/tests/driver.sh
index 6d3a0b4..164cf70 100755
--- a/tests/driver.sh
+++ b/tests/driver.sh
@@ -27,7 +27,14 @@ FILTER=""
 LIST_ONLY=0
 VERBOSE=0
 TAP=0
-ALLOW_MISSING_BINARIES="${ALLOW_MISSING_BINARIES:-auto}"
+# Three values: 0 (strict, default), 1 (skip missing), auto (legacy).
+# In strict mode any missing test binary is a FAIL. The legacy "auto"
+# value flips to skip when TESTDIR is not the canonical build/ or
+# build/bin tree, which used to silently turn a partial out-of-tree
+# fixture set into a wall of green skips. Callers that genuinely want
+# permissive-skip-mode behavior should set ALLOW_MISSING_BINARIES=1
+# explicitly.
+ALLOW_MISSING_BINARIES="${ALLOW_MISSING_BINARIES:-0}"
 
 usage()
 {
@@ -230,15 +237,21 @@ evaluate_result()
     if [ "$rc" -eq 124 ]; then
         return 1
     fi
-    if [ "$rc" -eq 0 ] || {
-        [ -n "$expected" ] && [ "$rc" -eq "$expected" ]
-    }; then
-        if [ -n "$stdout_pat" ] && ! grep -qE "$stdout_pat" <<< "$output"; then
+    # When the manifest declares expected_rc=N, only that exact rc passes.
+    # Without this guard, a test that mistakenly exits 0 instead of its
+    # declared non-zero code (e.g. test-complex with expected_rc=42)
+    # would be reported PASS because rc=0 short-circuited the OR clause.
+    if [ -n "$expected" ]; then
+        if [ "$rc" -ne "$expected" ]; then
             return 1
         fi
-        return 0
+    elif [ "$rc" -ne 0 ]; then
+        return 1
+    fi
+    if [ -n "$stdout_pat" ] && ! grep -qE "$stdout_pat" <<< "$output"; then
+        return 1
     fi
-    return 1
+    return 0
 }
 
 report_case()
diff --git a/tests/lib/coreutils-suite.sh b/tests/lib/coreutils-suite.sh
index 6d68384..1a9c84c 100644
--- a/tests/lib/coreutils-suite.sh
+++ b/tests/lib/coreutils-suite.sh
@@ -47,30 +47,24 @@ coreutils_suite_extended_text()
 coreutils_suite_basic_encoding()
 {
     coreutils_print_section "Encoding / hashing"
-    if [ -e "$BIN/base32" ]; then
-        run_check base32 "NBSWY" "$TMPDIR/hello.txt"
-    fi
+    # Optional binaries (base32/basenc/sha224sum/sha384sum/b2sum/sum) are
+    # gated by test_skip_missing_tool inside run_check: when the wrapper
+    # sets TEST_SKIP_MISSING_TOOLS=1 they report SKIP with accounting,
+    # otherwise the missing binary surfaces as a hard FAIL. The previous
+    # raw "if [ -e ... ]; then" blocks bypassed both paths, silently
+    # erasing assertions whenever the binary was absent.
+    run_check base32 "NBSWY" "$TMPDIR/hello.txt"
     run_check base64 "aGVsbG8" "$TMPDIR/hello.txt"
-    if [ -e "$BIN/basenc" ]; then
-        run_check basenc "aGVsbG8" "--base64" "$TMPDIR/hello.txt"
-    fi
+    run_check basenc "aGVsbG8" "--base64" "$TMPDIR/hello.txt"
     run_check md5sum "hello.txt" "$TMPDIR/hello.txt"
     run_check sha1sum "hello.txt" "$TMPDIR/hello.txt"
-    if [ -e "$BIN/sha224sum" ]; then
-        run_check sha224sum "95041d" "$TMPDIR/hello.txt"
-    fi
+    run_check sha224sum "95041d" "$TMPDIR/hello.txt"
     run_check sha256sum "hello.txt" "$TMPDIR/hello.txt"
-    if [ -e "$BIN/sha384sum" ]; then
-        run_check sha384sum "6b3b69" "$TMPDIR/hello.txt"
-    fi
+    run_check sha384sum "6b3b69" "$TMPDIR/hello.txt"
     run_check sha512sum "hello.txt" "$TMPDIR/hello.txt"
-    if [ -e "$BIN/b2sum" ]; then
-        run_check b2sum "hello.txt" "$TMPDIR/hello.txt"
-    fi
+    run_check b2sum "hello.txt" "$TMPDIR/hello.txt"
     run_check cksum "hello.txt" "$TMPDIR/hello.txt"
-    if [ -e "$BIN/sum" ]; then
-        run_check sum "[0-9]" "$TMPDIR/hello.txt"
-    fi
+    run_check sum "[0-9]" "$TMPDIR/hello.txt"
 }
 
 coreutils_suite_basic_files()
@@ -164,9 +158,10 @@ coreutils_suite_basic_math()
     run_check seq "5" "1" "5"
     run_check expr "3" "1" "+" "2"
     run_check factor "2 2 3" "12"
-    if [ -e "$BIN/numfmt" ]; then
-        run_check numfmt "1\\.0[kK]" "--to=si" "1000"
-    fi
+    # numfmt is optional in some packages; rely on test_skip_missing_tool
+    # so absence becomes a SKIP under TEST_SKIP_MISSING_TOOLS=1 and a FAIL
+    # otherwise, rather than a silent omission.
+    run_check numfmt "1\\.0[kK]" "--to=si" "1000"
 }
 
 coreutils_suite_basic_sysinfo()
diff --git a/tests/lib/test-runner.sh b/tests/lib/test-runner.sh
index b01d8f1..a0892a9 100644
--- a/tests/lib/test-runner.sh
+++ b/tests/lib/test-runner.sh
@@ -45,6 +45,21 @@ elif ! command -v timeout > /dev/null 2>&1; then
     unset _timeout_bin _candidate
 fi
 
+# Convert bash $EPOCHREALTIME (seconds.microseconds) to integer microseconds.
+# run() uses this to disambiguate guest timeout(1) returning rc=124 from a
+# harness watchdog firing at TEST_TIMEOUT; SECONDS resolution would mistake
+# either case at short caps. Requires bash 5.0+, already assumed elsewhere
+# (e.g. tests/test-perf.sh epoch_us).
+_test_runner_epoch_us()
+{
+    local t="$EPOCHREALTIME"
+    local sec="${t%%.*}"
+    local frac="${t##*.}"
+    frac="${frac}000000"
+    frac="${frac:0:6}"
+    printf '%s' "$((sec * 1000000 + 10#$frac))"
+}
+
 if [ -t 1 ]; then
     # Use ANSI-C quoting so the variables hold real ESC bytes, not the literal
     # 4-char "\033" sequence. Without this, callers that pass colors as printf
@@ -122,17 +137,50 @@ run()
         return
     fi
 
-    if output=$("${TEST_RUNNER[@]}" "$(test_tool_path "$tool")" "$@" 2>&1); then
+    # Wrap every invocation in `timeout` so a hanging guest tool cannot
+    # freeze the entire suite. run_pipe and run_timeout already do this;
+    # the omission here used to let a deadlocked elfuse syscall path
+    # hang make check forever.
+    #
+    # GNU timeout reports rc=124 on its own timeout, but coreutils-suite
+    # also runs the guest's own timeout(1) with expect_rc=124. Exit code
+    # alone cannot tell the two apart, so wall-clock elapsed time is
+    # used as an out-of-band marker: a harness firing means elapsed is
+    # at or above TEST_TIMEOUT, while the guest case completes well
+    # under it. EPOCHREALTIME (bash 5.0+, already required elsewhere in
+    # this suite) is microsecond-resolution; comparing seconds alone
+    # via SECONDS could undercount by almost a full second and let a
+    # real harness timeout slip through as a guest-OK at small
+    # TEST_TIMEOUT values.
+    local start_us end_us elapsed_us limit_us
+    start_us=$(_test_runner_epoch_us)
+    if output=$(timeout "$TEST_TIMEOUT" "${TEST_RUNNER[@]}" \
+        "$(test_tool_path "$tool")" "$@" 2>&1); then
         rc=0
     else
         rc=$?
     fi
+    end_us=$(_test_runner_epoch_us)
+    elapsed_us=$((end_us - start_us))
+    limit_us=$((TEST_TIMEOUT * 1000000))
+    local harness_timed_out=0
+    if [ "$rc" -eq 124 ] && [ "$elapsed_us" -ge "$limit_us" ]; then
+        harness_timed_out=1
+    fi
 
-    if [ "$rc" = "$expect_rc" ]; then
+    if [ "$harness_timed_out" -eq 1 ]; then
+        test_report fail "$tool" " (timeout after ${TEST_TIMEOUT}s)"
+        test_excerpt "$output"
+        fail=$((fail + 1))
+    elif [ "$rc" = "$expect_rc" ]; then
         detail=""
         [ "$expect_rc" -ne 0 ] && detail=" (exit $rc)"
         test_report ok "$tool" "$detail"
         pass=$((pass + 1))
+    elif [ "$rc" -eq 124 ]; then
+        test_report fail "$tool" " (timeout after ${TEST_TIMEOUT}s)"
+        test_excerpt "$output"
+        fail=$((fail + 1))
     else
         test_report fail "$tool" " (got $rc, expected $expect_rc)"
         test_excerpt "$output"
@@ -152,13 +200,21 @@ run_check()
         return
     fi
 
-    if output=$("${TEST_RUNNER[@]}" "$(test_tool_path "$tool")" "$@" 2>&1); then
+    # See run() for the timeout-vs-expected ordering rationale. run_check
+    # has no explicit expect_rc parameter (zero is implied), so any rc=124
+    # here is treated as a harness timeout.
+    if output=$(timeout "$TEST_TIMEOUT" "${TEST_RUNNER[@]}" \
+        "$(test_tool_path "$tool")" "$@" 2>&1); then
         rc=0
     else
         rc=$?
     fi
 
-    if [ "$rc" -ne 0 ]; then
+    if [ "$rc" -eq 124 ]; then
+        test_report fail "$tool" " (timeout after ${TEST_TIMEOUT}s)"
+        test_excerpt "$output"
+        fail=$((fail + 1))
+    elif [ "$rc" -ne 0 ]; then
         test_report fail "$tool" " (exit rc=$rc)"
         test_excerpt "$output"
         fail=$((fail + 1))
diff --git a/tests/manifest.txt b/tests/manifest.txt
index 564a4c6..ff9631b 100644
--- a/tests/manifest.txt
+++ b/tests/manifest.txt
@@ -1,4 +1,4 @@
-# manifest.txt — Declarative test list for elfuse test driver
+# manifest.txt -- Declarative test list for elfuse test driver
 #
 # Copyright 2026 elfuse contributors
 # Copyright 2025 Moritz Angermann, zw3rk pte. ltd.
@@ -171,8 +171,14 @@ test-ancillary
 [section] Tier A compatibility tests
 test-tier-a # diff=skip
 
-[section] Tier B correctness tests
-test-tier-b # diff=skip
+[section] /proc fidelity tests
+test-proc-fidelity # diff=skip
+
+[section] Linux syscall fidelity tests
+test-syscall-fidelity # diff=skip
+
+[section] fd-family tests
+test-fd-family # diff=skip
 
 [section] SCM_CREDENTIALS tests
 test-scm-creds # diff=skip
diff --git a/tests/test-fd-family.c b/tests/test-fd-family.c
new file mode 100644
index 0000000..24d80ab
--- /dev/null
+++ b/tests/test-fd-family.c
@@ -0,0 +1,75 @@
+/* fd-family tests
+ *
+ * Copyright 2026 elfuse contributors
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Verifies semantics of the eventfd/timerfd/signalfd family of
+ * descriptors. The current coverage focuses on signalfd's promise
+ * that an EFAULT during read leaves the pending signal queue intact
+ * so a subsequent good-pointer read still observes the signal.
+ * Future eventfd and timerfd primitive tests should land here.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "test-harness.h"
+
+int passes = 0, fails = 0;
+
+static void test_signalfd_efault_preserves_pending(void)
+{
+    TEST("signalfd EFAULT preserves pending signal");
+
+    sigset_t mask;
+    sigemptyset(&mask);
+    sigaddset(&mask, SIGUSR1);
+    sigprocmask(SIG_BLOCK, &mask, NULL);
+
+    int fd = signalfd(-1, &mask, SFD_NONBLOCK);
+    if (fd < 0) {
+        FAIL("signalfd");
+        return;
+    }
+
+    kill(getpid(), SIGUSR1);
+    errno = 0;
+    /* Deliberately bad pointer to verify the kernel reports EFAULT. */
+    ssize_t bad = syscall(SYS_read, fd,
+                          /* cppcheck-suppress intToPointerCast */
+                          (void *) 1, sizeof(struct signalfd_siginfo));
+    if (bad != -1 || errno != EFAULT) {
+        close(fd);
+        FAIL("expected EFAULT");
+        return;
+    }
+
+    struct signalfd_siginfo info;
+    memset(&info, 0, sizeof(info));
+    ssize_t good = read(fd, &info, sizeof(info));
+    close(fd);
+    if (good == (ssize_t) sizeof(info) &&
+        info.ssi_signo == (uint32_t) SIGUSR1) {
+        PASS();
+    } else {
+        FAIL("signal was lost after EFAULT");
+    }
+}
+
+int main(void)
+{
+    printf("fd-family tests:\n");
+
+    test_signalfd_efault_preserves_pending();
+
+    printf("\ntest-fd-family: %d passed, %d failed%s\n", passes, fails,
+           fails == 0 ? " - PASS" : " - FAIL");
+    return fails ? 1 : 0;
+}
diff --git a/tests/test-matrix.sh b/tests/test-matrix.sh
index ec6e929..d383fc0 100755
--- a/tests/test-matrix.sh
+++ b/tests/test-matrix.sh
@@ -1,13 +1,13 @@
 #!/usr/bin/env bash
-# test-matrix.sh — Run aarch64 test suites under both elfuse and self-contained
+# test-matrix.sh -- Run aarch64 test suites under both elfuse and self-contained
 # qemu-system-aarch64 reference VM.
 #
 # Modes:
-# elfuse-aarch64 — run binaries on macOS via build/elfuse
-# qemu-aarch64 — run binaries natively inside qemu-system-aarch64
+# elfuse-aarch64 -- run binaries on macOS via build/elfuse
+# qemu-aarch64 -- run binaries natively inside qemu-system-aarch64
 # (boots an Alpine minirootfs initramfs that the
 # fixture script downloads on demand)
-# all — run both modes back-to-back
+# all -- run both modes back-to-back
 #
 # Environment overrides (defaults point at externals/test-fixtures/):
 # GUEST_TEST_BINARIES dir of internal test binaries (build/ by default)
@@ -217,6 +217,33 @@ report_timeout()
     fail=$((fail + 1))
 }
 
+# Account for an optional binary or fixture being absent. The previous
+# pattern (`if [ -e "$bin/X" ]; then test_check ... fi`) silently erased
+# the assertion when X was missing, so the suite summary could report
+# "all passed" while major coverage blocks never ran. require_binary
+# always increments skip and emits a skip line, so absences are visible
+# in the summary line at the bottom of each mode.
+require_binary()
+{
+    local label="$1" path="$2"
+    if [ -e "$path" ]; then
+        return 0
+    fi
+    test_report skip "$label" " (missing $path)"
+    skip=$((skip + 1))
+    return 1
+}
+
+# Suite-level analog of require_binary for whole fixture directories.
+# The label names the suite that is being skipped. Use this in place of
+# bare `printf "SKIP\n"` lines so the skip counter reflects reality.
+skip_suite()
+{
+    local label="$1" reason="$2"
+    test_report skip "$label" " ($reason)"
+    skip=$((skip + 1))
+}
+
 test_check()
 {
     local runner="$1"
@@ -235,11 +262,20 @@ test_check()
         report_timeout "$label"
         return
     fi
-    if echo "$output" | grep -qE "$pattern"; then
+    # Require a clean exit before trusting the regex. A crashing tool can
+    # still emit the expected substring on stdout before dying, and the
+    # earlier "regex match alone passes" behavior would have reported
+    # that as OK -- the same silent-skip shape that motivated the
+    # tightening of tests/driver.sh evaluate_result.
+    if [ "$rc" -ne 0 ]; then
+        test_report fail "$label" " (exit $rc)"
+        test_excerpt "$output"
+        fail=$((fail + 1))
+    elif echo "$output" | grep -qE "$pattern"; then
         test_report ok "$label"
         pass=$((pass + 1))
     else
-        test_report fail "$label" " (exit $rc)"
+        test_report fail "$label" " (pattern '$pattern' not found, rc=$rc)"
         test_excerpt "$output"
         fail=$((fail + 1))
     fi
@@ -295,11 +331,19 @@ test_pipe()
         report_timeout "$label"
         return
     fi
-    if echo "$output" | grep -qE "$pattern"; then
+    # See test_check for the rc=0 precondition rationale: a non-zero
+    # exit must surface as FAIL even when the regex matches, otherwise
+    # a crashing pipeline that happens to print the expected substring
+    # would be reported OK.
+    if [ "$rc" -ne 0 ]; then
+        test_report fail "$label" " (exit $rc)"
+        test_excerpt "$output"
+        fail=$((fail + 1))
+    elif echo "$output" | grep -qE "$pattern"; then
         test_report ok "$label"
         pass=$((pass + 1))
     else
-        test_report fail "$label" " (exit $rc)"
+        test_report fail "$label" " (pattern '$pattern' not found, rc=$rc)"
         test_excerpt "$output"
         fail=$((fail + 1))
     fi
@@ -438,16 +482,20 @@ run_coreutils_tests()
     test_rc "$runner" "timeout" 0 "$bindir/timeout" 5 "$bindir/true"
 
     printf "\nCoreutils encoding%s\n" "$_COREUTILS_SUFFIX"
-    if [ -e "$bindir/base32" ]; then
+    # The if/then form contains require_binary's exit status so missing
+    # binaries do not propagate as a function-exit-1 under `set -e`. The
+    # earlier `&& test_check` chain failed the matrix script outright
+    # whenever the LAST optional binary in a function was absent.
+    if require_binary "base32" "$bindir/base32"; then
         test_check "$runner" "base32" "NBSWY" "$bindir/base32" "$TEST_TMPDIR/hello.txt"
     fi
     test_check "$runner" "sha1sum" "hello.txt" "$bindir/sha1sum" "$TEST_TMPDIR/hello.txt"
     test_check "$runner" "sha512sum" "hello.txt" "$bindir/sha512sum" "$TEST_TMPDIR/hello.txt"
-    if [ -e "$bindir/b2sum" ]; then
+    if require_binary "b2sum" "$bindir/b2sum"; then
         test_check "$runner" "b2sum" "hello.txt" "$bindir/b2sum" "$TEST_TMPDIR/hello.txt"
     fi
     test_check "$runner" "cksum" "hello.txt" "$bindir/cksum" "$TEST_TMPDIR/hello.txt"
-    if [ -e "$bindir/numfmt" ]; then
+    if require_binary "numfmt" "$bindir/numfmt"; then
         test_check "$runner" "numfmt" "1\\.0[kK]" "$bindir/numfmt" --to=si 1000
     fi
 }
@@ -500,45 +548,49 @@ run_static_tests()
 
     printf "Static bins\n"
 
-    if [ -e "$bindir/dash" ]; then
+    if require_binary "dash" "$bindir/dash"; then
         test_check "$runner" "dash echo" "hello" "$bindir/dash" -c "echo hello"
         test_check "$runner" "dash arithmetic" "2\\+3=5" "$bindir/dash" -c 'echo "2+3=$((2+3))"'
     fi
-    if [ -e "$bindir/bash" ]; then
+    if require_binary "bash" "$bindir/bash"; then
         test_check "$runner" "bash echo" "hello" "$bindir/bash" -c "echo hello"
         test_pipe "$runner" "bash subshell" "sub=25" "" "$bindir/bash" -c 'echo "sub=$(echo $((5*5)))"'
     fi
+    # lua has two acceptable names; prefer 5.4, then fall back to plain lua,
+    # and skip with accounting if neither is present.
     if [ -e "$bindir/lua5.4" ]; then
         test_check "$runner" "lua hello" "Hello" "$bindir/lua5.4" -e 'print("Hello from " .. _VERSION)'
         test_check "$runner" "lua fib(30)" "832040" "$bindir/lua5.4" -e 'local function f(n) if n<2 then return n end; return f(n-1)+f(n-2) end; print(f(30))'
     elif [ -e "$bindir/lua" ]; then
         test_check "$runner" "lua hello" "Hello" "$bindir/lua" -e 'print("Hello from " .. _VERSION)'
         test_check "$runner" "lua fib(30)" "832040" "$bindir/lua" -e 'local function f(n) if n<2 then return n end; return f(n-1)+f(n-2) end; print(f(30))'
+    else
+        skip_suite "lua" "neither lua5.4 nor lua under $bindir"
     fi
-    if [ -e "$bindir/gawk" ]; then
+    if require_binary "gawk" "$bindir/gawk"; then
         test_pipe "$runner" "gawk field" "world" "hello world" "$bindir/gawk" '{print $2}'
     fi
-    if [ -e "$bindir/grep" ]; then
+    if require_binary "grep" "$bindir/grep"; then
         test_pipe "$runner" "grep basic" "hello" "hello world" "$bindir/grep" hello
     fi
-    if [ -e "$bindir/sed" ]; then
+    if require_binary "sed" "$bindir/sed"; then
         test_pipe "$runner" "sed subst" "HELLO" "hello" "$bindir/sed" 's/hello/HELLO/'
     fi
-    if [ -e "$bindir/jq" ]; then
+    if require_binary "jq" "$bindir/jq"; then
         test_pipe "$runner" "jq simple" "^1$" '{"a":1}' "$bindir/jq" '.a'
         test_pipe "$runner" "jq filter" "Alice" '{"users":[{"name":"Alice","age":30},{"name":"Bob","age":25}]}' "$bindir/jq" '.users[] | select(.age > 28) | .name'
     fi
-    if [ -e "$bindir/sqlite3" ]; then
+    if require_binary "sqlite3" "$bindir/sqlite3"; then
         test_check "$runner" "sqlite version" "^3\\." "$bindir/sqlite3" ":memory:" "SELECT sqlite_version();"
         test_check "$runner" "sqlite arith" "^42$" "$bindir/sqlite3" ":memory:" "SELECT 6 * 7;"
     fi
-    if [ -e "$bindir/tree" ]; then
+    if require_binary "tree" "$bindir/tree"; then
         test_check "$runner" "tree" "director" "$bindir/tree" "$TEST_TMPDIR"
     fi
-    if [ -e "$bindir/find" ]; then
+    if require_binary "find" "$bindir/find"; then
         test_check "$runner" "find" "hello.txt" "$bindir/find" "$TEST_TMPDIR" -name "hello.txt"
     fi
-    if [ -e "$bindir/diff" ]; then
+    if require_binary "diff" "$bindir/diff"; then
         test_rc "$runner" "diff identical" 0 "$bindir/diff" "$TEST_TMPDIR/hello.txt" "$TEST_TMPDIR/hello.txt"
     fi
 }
@@ -587,29 +639,38 @@ run_suite()
 
     if [ -d "$GUEST_STATIC_BINS" ]; then
         run_static_tests "$runner" "$GUEST_STATIC_BINS"
+    else
+        skip_suite "static-bins" "no $GUEST_STATIC_BINS"
     fi
 
-    # Dynamic-musl coreutils — elfuse needs --sysroot, qemu just runs natively.
+    # Dynamic-musl coreutils: elfuse needs --sysroot, qemu runs natively.
+    # The skip line always increments the counter so a partial fixture
+    # set surfaces in the per-mode summary instead of looking like a
+    # full pass.
     if [ -d "$GUEST_DYNAMIC_COREUTILS" ]; then
         if [ "$mode" = "elfuse-aarch64" ] && [ -z "$GUEST_SYSROOT" ]; then
-            printf "\nDynamic coreutils (musl) — SKIP (no GUEST_SYSROOT)\n"
+            skip_suite "dyn-coreutils (musl)" "no GUEST_SYSROOT"
         else
             _COREUTILS_SUFFIX=" (musl dyn)"
             _SYSROOT="$GUEST_SYSROOT"
             run_coreutils_tests "$dyn_runner" "$GUEST_DYNAMIC_COREUTILS"
             _COREUTILS_SUFFIX=""
         fi
+    else
+        skip_suite "dyn-coreutils (musl)" "no $GUEST_DYNAMIC_COREUTILS"
     fi
 
     if [ -n "$GUEST_GLIBC_DYNAMIC_COREUTILS" ] && [ -d "$GUEST_GLIBC_DYNAMIC_COREUTILS" ]; then
         if [ "$mode" = "elfuse-aarch64" ] && [ -z "$GUEST_GLIBC_SYSROOT" ]; then
-            printf "\nDynamic coreutils (glibc) — SKIP (no GUEST_GLIBC_SYSROOT)\n"
+            skip_suite "dyn-coreutils (glibc)" "no GUEST_GLIBC_SYSROOT"
         else
             _COREUTILS_SUFFIX=" (glibc dyn)"
             _SYSROOT="$GUEST_GLIBC_SYSROOT"
             run_coreutils_tests "$dyn_runner" "$GUEST_GLIBC_DYNAMIC_COREUTILS"
             _COREUTILS_SUFFIX=""
         fi
+    else
+        skip_suite "dyn-coreutils (glibc)" "no GUEST_GLIBC_DYNAMIC_COREUTILS"
     fi
 
     _SYSROOT=""
diff --git a/tests/test-perf.sh b/tests/test-perf.sh
index 8175409..81879af 100755
--- a/tests/test-perf.sh
+++ b/tests/test-perf.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# test-perf.sh — Performance comparison: native vs elfuse for grep/wc/cat
+# test-perf.sh -- Performance comparison: native vs elfuse for grep/wc/cat
 #
 # Copyright 2026 elfuse contributors
 # Copyright 2025 Moritz Angermann, zw3rk pte. ltd.
@@ -16,6 +16,10 @@
 # Example: tests/test-perf.sh build/elfuse /path/to/tool/bin
 
 set -euo pipefail
+# pipefail in particular matters here: several benchmarks pipe an
+# elfuse-hosted producer (e.g. cat) into a native consumer (e.g. wc).
+# Without pipefail, a producer crash returns rc=0 from the pipeline,
+# so the elfuse-side failure was silently smoothed into a "fast" sample.
 
 ELFUSE="${1:?Usage: $0 }"
 TOOL_BIN="${2:?Usage: $0 }"
@@ -45,8 +49,15 @@ epoch_us()
     echo $((sec * 1000000 + 10#$frac))
 }
 
+PERF_FAILED=0
+
 # Collect $RUNS timing samples for a command, print median and stats.
 # Args: label command...
+# Earlier revisions swallowed every sample's exit status with `|| true`,
+# which made a missing native binary, an elfuse crash, or a host SIP
+# block silently degrade into "median 0 ms PASS". Now any non-zero
+# sample aborts the timing for that label and flips PERF_FAILED so the
+# script exits non-zero after running every other benchmark.
 benchmark()
 {
     local label="$1"
@@ -54,10 +65,17 @@ benchmark()
     local times=()
 
     for _ in $(seq 1 $RUNS); do
-        local start end us
+        local start end us rc
         start=$(epoch_us)
-        "$@" > /dev/null 2>&1 || true
+        rc=0
+        "$@" > /dev/null 2>&1 || rc=$?
         end=$(epoch_us)
+        if [ "$rc" -ne 0 ]; then
+            printf " %-22s ${YELLOW}FAIL${RESET} sample exited rc=%d\n" \
+                "$label" "$rc"
+            PERF_FAILED=1
+            return
+        fi
         us=$((end - start))
         # Store as fractional ms string (1 decimal place)
         local ms_int=$((us / 1000))
@@ -85,25 +103,25 @@ for tool in grep wc cat sort; do
     fi
 done
 
-# --- Test 1: Recursive grep across elfuse source ---
+# Test 1: Recursive grep across elfuse source
 printf "${YELLOW}▸ grep -r '%s' (recursive, many file opens)${RESET}\n" "$PATTERN"
 benchmark "native /usr/bin/grep" /usr/bin/grep -r "$PATTERN" "$SRC_SUBDIR"
 benchmark "elfuse guest grep" "$ELFUSE" "$TOOL_BIN/grep" -r "$PATTERN" "$SRC_SUBDIR"
 echo
 
-# --- Test 2: Single-file grep (measures startup overhead) ---
+# Test 2: Single-file grep (measures startup overhead)
 printf "${YELLOW}▸ grep -c 'case' syscall.c (single file, startup-dominated)${RESET}\n"
 benchmark "native /usr/bin/grep" /usr/bin/grep -c "case" "$SYSCALL_C"
 benchmark "elfuse guest grep" "$ELFUSE" "$TOOL_BIN/grep" -c "case" "$SYSCALL_C"
 echo
 
-# --- Test 3: wc -l on all source files ---
+# Test 3: wc -l on all source files
 printf "${YELLOW}▸ wc -l *.c *.h (many small files)${RESET}\n"
 benchmark "native /usr/bin/wc" sh -c "/usr/bin/wc -l '$SRC_SUBDIR'/*.c '$SRC_SUBDIR'/*.h"
 benchmark "elfuse guest wc" sh -c "'$ELFUSE' '$TOOL_BIN/wc' -l '$SRC_SUBDIR'/*.c '$SRC_SUBDIR'/*.h"
 echo
 
-# --- Test 4: I/O throughput — cat large file through wc ---
+# Test 4: I/O throughput, cat large file through wc
 printf "${YELLOW}▸ cat ~10MiB | wc -l (I/O throughput)${RESET}\n"
 TMPFILE=$(mktemp)
 trap 'rm -f "$TMPFILE"' EXIT
@@ -111,11 +129,14 @@ trap 'rm -f "$TMPFILE"' EXIT
 for _ in $(seq 1 100); do cat "$SYSCALL_C" >> "$TMPFILE"; done
 TMPSIZE=$(wc -c < "$TMPFILE" | tr -d ' ')
 printf " ${CYAN}(test file: %s bytes)${RESET}\n" "$TMPSIZE"
-benchmark "native cat|wc" sh -c "cat '$TMPFILE' | wc -l"
-benchmark "elfuse cat|wc" sh -c "'$ELFUSE' '$TOOL_BIN/cat' '$TMPFILE' | wc -l"
+# sh -c spawns a child shell that does not inherit the outer pipefail
+# from the script's `set -o pipefail`. Run the pipeline under bash so
+# pipefail is available on systems whose /bin/sh is not bash-compatible.
+benchmark "native cat|wc" bash -c "set -o pipefail; cat '$TMPFILE' | wc -l"
+benchmark "elfuse cat|wc" bash -c "set -o pipefail; '$ELFUSE' '$TOOL_BIN/cat' '$TMPFILE' | wc -l"
 echo
 
-# --- Test 5: sort (CPU + I/O mix) ---
+# Test 5: sort (CPU + I/O mix)
 printf "${YELLOW}▸ sort syscall.c (CPU-bound sorting + I/O)${RESET}\n"
 benchmark "native /usr/bin/sort" /usr/bin/sort "$SYSCALL_C"
 benchmark "elfuse guest sort" "$ELFUSE" "$TOOL_BIN/sort" "$SYSCALL_C"
@@ -124,3 +145,8 @@ echo
 printf "${BLUE}━━━ Done ━━━${RESET}\n"
 printf "${CYAN}Overhead is dominated by: VM startup (~1-3ms), per-syscall vmexit (~1-5us),\n"
 printf "and macOS VFS translation. Pure computation runs at native speed.${RESET}\n"
+
+if [ "$PERF_FAILED" -ne 0 ]; then
+    printf "\n${YELLOW}One or more benchmark samples failed (see FAIL lines).${RESET}\n" >&2
+    exit 1
+fi
diff --git a/tests/test-tier-b.c b/tests/test-proc-fidelity.c
similarity index 61%
rename from tests/test-tier-b.c
rename to tests/test-proc-fidelity.c
index 90668b5..43d2490 100644
--- a/tests/test-tier-b.c
+++ b/tests/test-proc-fidelity.c
@@ -1,29 +1,31 @@
-/* Tier B correctness and fidelity tests
+/* /proc fidelity tests
  *
  * Copyright 2026 elfuse contributors
  * SPDX-License-Identifier: Apache-2.0
  *
- * Tests: fchmodat2, openat2 RESOLVE_*, O_PATH enforcement, madvise
- * parity, /proc/self/oom_score_adj, /proc/self/fdinfo, cpuinfo scaling.
+ * Exercises the /proc nodes that elfuse synthesizes through procemu:
+ * /proc/self/oom_score_adj (read/write/persist/range/zero-length writev),
+ * the legacy /proc/self/oom_adj scaling alias, /proc/self/oom_score
+ * (open-time and write-time enforcement), /proc/self/fdinfo entries for
+ * generic fds plus eventfd/signalfd/timerfd, /proc/net/tcp serial-number
+ * density across mixed socket types, and /proc/cpuinfo CPU enumeration.
+ * These tests pin the host-visible byte format so a regression to the
+ * wrong separator, scaling factor, or sparse layout fails loudly.
  */
 
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
-#include
 #include
 #include
 #include
+#include
 #include
-#include
-#include
 #include
-#include
-#include
 #include
 #include
 #include
@@ -31,493 +33,24 @@
 
 #include "test-harness.h"
 
-#ifndef MAP_FIXED_NOREPLACE
-#define MAP_FIXED_NOREPLACE 0x100000
-#endif
-
 int passes = 0, fails = 0;
 
-/* fchmodat2 (SYS 452).
*/ - -#ifndef SYS_fchmodat2 -#define SYS_fchmodat2 452 -#endif - -#ifndef SYS_getcpu -#define SYS_getcpu 168 -#endif - -static void test_fchmodat2_basic(void) -{ - TEST("fchmodat2 basic"); - char path[] = "/tmp/elfuse-test-fchmodat2-XXXXXX"; - int fd = mkstemp(path); - if (fd < 0) { - FAIL("mkstemp"); - return; - } - close(fd); - /* fchmodat2(AT_FDCWD, path, 0644, 0) should work like fchmodat */ - long rc = syscall(SYS_fchmodat2, AT_FDCWD, path, 0644, 0); - if (rc < 0) { - FAIL("fchmodat2"); - unlink(path); - return; - } - struct stat st; - stat(path, &st); - unlink(path); - EXPECT_TRUE((st.st_mode & 0777) == 0644, "mode mismatch"); -} - -static void test_getcpu_basic(void) -{ - TEST("getcpu basic"); - unsigned cpu = 99, node = 99; - long rc = syscall(SYS_getcpu, &cpu, &node, 0); - if (rc < 0) { - FAIL("getcpu"); - return; - } - EXPECT_TRUE(cpu == 0, "cpu should be 0"); - EXPECT_TRUE(node == 0, "node should be 0"); -} - -static void test_fchmodat2_symlink_nofollow(void) -{ - TEST("fchmodat2 AT_SYMLINK_NOFOLLOW"); - char target[] = "/tmp/elfuse-test-fchmodat2-target-XXXXXX"; - char linkpath[64]; - - int fd = mkstemp(target); - if (fd < 0) { - FAIL("mkstemp target"); - return; - } - close(fd); - /* Derive the symlink name from the unique target so we never call mktemp, - * which is racy and triggers a linker warning on glibc. - */ - snprintf(linkpath, sizeof(linkpath), "%s.lnk", target); - - if (symlink(target, linkpath) < 0) { - FAIL("symlink"); - unlink(target); - return; - } - - /* AT_SYMLINK_NOFOLLOW must change the symlink's mode, not the target's. 
*/ - long rc = - syscall(SYS_fchmodat2, AT_FDCWD, linkpath, 0700, AT_SYMLINK_NOFOLLOW); - if (rc < 0) { - FAIL("fchmodat2 nofollow"); - goto out; - } - - struct stat st_link, st_target; - if (lstat(linkpath, &st_link) < 0) { - FAIL("lstat link"); - goto out; - } - if (stat(target, &st_target) < 0) { - FAIL("stat target"); - goto out; - } - - EXPECT_TRUE((st_link.st_mode & 0777) == 0700, "link mode mismatch"); - EXPECT_TRUE((st_target.st_mode & 0777) == 0600, "target mode changed"); - -out: - unlink(linkpath); - unlink(target); -} - -/* openat2 (SYS 437). */ - -#ifndef SYS_openat2 -#define SYS_openat2 437 -#endif - -struct open_how { - unsigned long long flags, mode, resolve; -}; - -#define RESOLVE_BENEATH 0x08 -#define RESOLVE_IN_ROOT 0x10 -#define RESOLVE_NO_MAGICLINKS 0x02 -#define RESOLVE_NO_SYMLINKS 0x04 - -static void test_openat2_basic(void) -{ - TEST("openat2 basic open"); - struct open_how how = {.flags = O_RDONLY, .mode = 0, .resolve = 0}; - long fd = syscall(SYS_openat2, AT_FDCWD, "/dev/null", &how, sizeof(how)); - if (fd < 0) { - FAIL("openat2"); - return; - } - close(fd); - PASS(); -} - -static void test_openat2_resolve_beneath(void) -{ - TEST("openat2 RESOLVE_BENEATH rejects .."); - /* Open a directory first */ - int dirfd = open("/tmp", O_RDONLY | O_DIRECTORY); - if (dirfd < 0) { - FAIL("open /tmp"); - return; - } - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_BENEATH}; - long fd = syscall(SYS_openat2, dirfd, "../etc/passwd", &how, sizeof(how)); - close(dirfd); - if (fd >= 0) { - close(fd); - FAIL("should have rejected .. 
traversal"); - return; - } - EXPECT_TRUE(errno == EXDEV, "wrong errno"); -} - -static void test_openat2_resolve_beneath_allows_internal_dotdot(void) -{ - TEST("openat2 RESOLVE_BENEATH allows in-root .."); - - char dir_template[] = "/tmp/elfuse-openat2-beneath-XXXXXX"; - char subdir[PATH_MAX], target[PATH_MAX]; - int dirfd = -1, filefd = -1; - - if (!mkdtemp(dir_template)) { - FAIL("mkdtemp"); - return; - } - - snprintf(subdir, sizeof(subdir), "%s/subdir", dir_template); - snprintf(target, sizeof(target), "%s/file", dir_template); - if (mkdir(subdir, 0700) < 0) { - FAIL("mkdir"); - goto out; - } - - filefd = open(target, O_CREAT | O_RDONLY, 0600); - if (filefd < 0) { - FAIL("open file"); - goto out; - } - close(filefd); - filefd = -1; - - dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); - if (dirfd < 0) { - FAIL("open dir"); - goto out; - } - - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_BENEATH}; - long fd = syscall(SYS_openat2, dirfd, "subdir/../file", &how, sizeof(how)); - if (fd < 0) { - FAIL("openat2"); - goto out; - } - close((int) fd); - PASS(); - -out: - if (dirfd >= 0) - close(dirfd); - if (filefd >= 0) - close(filefd); - unlink(target); - rmdir(subdir); - rmdir(dir_template); -} - -static void test_openat2_resolve_in_root_clamps_dotdot(void) -{ - TEST("openat2 RESOLVE_IN_ROOT clamps .. 
at root"); - - char dir_template[] = "/tmp/elfuse-openat2-inroot-XXXXXX"; - char target[PATH_MAX]; - int dirfd = -1, filefd = -1; - - if (!mkdtemp(dir_template)) { - FAIL("mkdtemp"); - return; - } - - snprintf(target, sizeof(target), "%s/file", dir_template); - filefd = open(target, O_CREAT | O_RDONLY, 0600); - if (filefd < 0) { - FAIL("open file"); - goto out; - } - close(filefd); - filefd = -1; - - dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); - if (dirfd < 0) { - FAIL("open dir"); - goto out; - } - - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_IN_ROOT}; - long fd = syscall(SYS_openat2, dirfd, "/../file", &how, sizeof(how)); - if (fd < 0) { - FAIL("openat2"); - goto out; - } - close((int) fd); - PASS(); - -out: - if (dirfd >= 0) - close(dirfd); - if (filefd >= 0) - close(filefd); - unlink(target); - rmdir(dir_template); -} - -static void test_openat2_resolve_no_symlinks_intermediate(void) -{ - TEST("openat2 RESOLVE_NO_SYMLINKS rejects intermediate symlink"); - - char dir_template[] = "/tmp/elfuse-openat2-XXXXXX"; - char target_dir[PATH_MAX], subfile[PATH_MAX]; - char link_path[PATH_MAX]; - int dirfd = -1, filefd = -1; - - if (!mkdtemp(dir_template)) { - FAIL("mkdtemp"); - return; - } - - snprintf(target_dir, sizeof(target_dir), "%s/real", dir_template); - snprintf(subfile, sizeof(subfile), "%s/subfile", target_dir); - snprintf(link_path, sizeof(link_path), "%s/link", dir_template); - - if (mkdir(target_dir, 0700) < 0) { - FAIL("mkdir"); - goto out; - } - filefd = open(subfile, O_CREAT | O_RDWR, 0600); - if (filefd < 0) { - FAIL("open subfile"); - goto out; - } - close(filefd); - filefd = -1; - if (symlink("real", link_path) < 0) { - FAIL("symlink"); - goto out; - } - - dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); - if (dirfd < 0) { - FAIL("open dir"); - goto out; - } - - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_NO_SYMLINKS}; - long fd = syscall(SYS_openat2, dirfd, "link/subfile", &how, 
sizeof(how)); - if (fd >= 0) { - close((int) fd); - FAIL("expected ELOOP"); - goto out; - } - EXPECT_TRUE(errno == ELOOP, "wrong errno"); - -out: - if (dirfd >= 0) - close(dirfd); - if (filefd >= 0) - close(filefd); - unlink(link_path); - unlink(subfile); - rmdir(target_dir); - rmdir(dir_template); -} - -static void test_openat2_resolve_beneath_rejects_symlink_escape(void) -{ - TEST("openat2 RESOLVE_BENEATH rejects symlink escape"); - - char dir_template[] = "/tmp/elfuse-openat2-escape-XXXXXX"; - char link_path[PATH_MAX]; - int dirfd = -1; - - if (!mkdtemp(dir_template)) { - FAIL("mkdtemp"); - return; - } - - snprintf(link_path, sizeof(link_path), "%s/link", dir_template); - if (symlink("/etc", link_path) < 0) { - FAIL("symlink"); - goto out; - } - - dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); - if (dirfd < 0) { - FAIL("open dir"); - goto out; - } - - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_BENEATH}; - long fd = syscall(SYS_openat2, dirfd, "link/passwd", &how, sizeof(how)); - if (fd >= 0) { - close((int) fd); - FAIL("expected EXDEV"); - goto out; - } - EXPECT_TRUE(errno == EXDEV, "wrong errno"); - -out: - if (dirfd >= 0) - close(dirfd); - unlink(link_path); - rmdir(dir_template); -} - -static void test_openat2_resolve_no_magiclinks_proc_fd(void) -{ - TEST("openat2 RESOLVE_NO_MAGICLINKS rejects /proc/self/fd"); - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_NO_MAGICLINKS}; - long fd = - syscall(SYS_openat2, AT_FDCWD, "/proc/self/fd/0", &how, sizeof(how)); - if (fd >= 0) { - close((int) fd); - FAIL("expected ELOOP"); - return; - } - EXPECT_TRUE(errno == ELOOP, "wrong errno"); -} - -static void test_openat2_resolve_no_magiclinks_proc_cwd(void) -{ - TEST("openat2 RESOLVE_NO_MAGICLINKS rejects proc cwd magiclinks"); - struct open_how how = { - .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_NO_MAGICLINKS}; - char cwd[256]; - - if (!getcwd(cwd, sizeof(cwd))) { - FAIL("getcwd"); - return; - } - if 
(chdir("/proc") < 0) { - FAIL("chdir"); - return; - } - - errno = 0; - long fd = syscall(SYS_openat2, AT_FDCWD, "self/fd/0", &how, sizeof(how)); - int saved_errno = errno; - if (chdir(cwd) < 0) { - FAIL("restore cwd"); - return; - } - - errno = saved_errno; - if (fd >= 0) { - close((int) fd); - FAIL("expected ELOOP"); - return; - } - EXPECT_TRUE(errno == ELOOP, "wrong errno"); -} - -/* O_PATH enforcement. */ - -#ifndef O_PATH -#define O_PATH 010000000 -#endif - -#ifndef MADV_COLD -#define MADV_COLD 20 -#endif - -static void test_opath_read_fails(void) -{ - TEST("O_PATH fd rejects read"); - int fd = open("/dev/null", O_PATH); - if (fd < 0) { - FAIL("open O_PATH"); - return; - } - char buf[1]; - ssize_t n = read(fd, buf, 1); - close(fd); - EXPECT_TRUE(n < 0 && errno == EBADF, "read should return EBADF"); -} - -static void test_opath_write_fails(void) -{ - TEST("O_PATH fd rejects write"); - int fd = open("/dev/null", O_PATH); - if (fd < 0) { - FAIL("open O_PATH"); - return; - } - ssize_t n = write(fd, "x", 1); - close(fd); - EXPECT_TRUE(n < 0 && errno == EBADF, "write should return EBADF"); -} - -static void test_opath_fstat_works(void) -{ - TEST("O_PATH fd allows fstat"); - int fd = open("/dev/null", O_PATH); - if (fd < 0) { - FAIL("open O_PATH"); - return; - } - struct stat st; - int rc = fstat(fd, &st); - close(fd); - EXPECT_TRUE(rc == 0, "fstat should work on O_PATH"); -} - -/* madvise parity. 
*/ - -static void test_madvise_cold(void) -{ - TEST("madvise MADV_COLD accepted"); - void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - if (p == MAP_FAILED) { - FAIL("mmap"); - return; - } - int rc = madvise(p, 4096, MADV_COLD); - munmap(p, 4096); - EXPECT_TRUE(rc == 0, "madvise MADV_COLD"); -} - -static void test_madvise_dontneed_unmapped(void) -{ - TEST("madvise DONTNEED on unmapped returns ENOMEM"); - /* Map a page, then unmap the second half to create a hole */ - void *p = mmap(NULL, 8192, PROT_READ | PROT_WRITE, - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); - if (p == MAP_FAILED) { - FAIL("mmap"); - return; - } - munmap((char *) p + 4096, 4096); - /* MADV_DONTNEED across the boundary should fail */ - int rc = madvise(p, 8192, MADV_DONTNEED); - munmap(p, 4096); - EXPECT_TRUE(rc < 0 && errno == ENOMEM, "expected ENOMEM for unmapped hole"); -} +/* Some procfs write-rejection tests can only be exercised as root (root + * bypasses the 0444 open-time gate, so the kernel actually invokes the + * proc node's write handler). On non-root the test cannot prove the + * lower layer is correct without also reproducing a kernel-internal + * regression; counting these as skips keeps the summary honest instead + * of either masking real regressions behind a bogus PASS or pretending + * the path was exercised. + */ +static int procfs_skips = 0; +#define PROCFS_SKIP(reason) \ + do { \ + printf("SKIP: %s\n", reason); \ + procfs_skips++; \ + } while (0) -/* /proc paths. 
 */
+/* /proc/self/oom_* */
 
 static void test_proc_oom_score_adj(void)
 {
@@ -574,65 +107,6 @@ static void test_proc_oom_score_adj_persists_write(void)
     EXPECT_TRUE(atoi(buf) == 123, "value did not persist");
 }
 
-static void test_signalfd_efault_preserves_pending(void)
-{
-    TEST("signalfd EFAULT preserves pending signal");
-
-    sigset_t mask;
-    sigemptyset(&mask);
-    sigaddset(&mask, SIGUSR1);
-    sigprocmask(SIG_BLOCK, &mask, NULL);
-
-    int fd = signalfd(-1, &mask, SFD_NONBLOCK);
-    if (fd < 0) {
-        FAIL("signalfd");
-        return;
-    }
-
-    kill(getpid(), SIGUSR1);
-    errno = 0;
-    /* Deliberately bad pointer to verify the kernel reports EFAULT. */
-    ssize_t bad = syscall(SYS_read, fd,
-                          /* cppcheck-suppress intToPointerCast */
-                          (void *) 1, sizeof(struct signalfd_siginfo));
-    if (bad != -1 || errno != EFAULT) {
-        close(fd);
-        FAIL("expected EFAULT");
-        return;
-    }
-
-    struct signalfd_siginfo info;
-    memset(&info, 0, sizeof(info));
-    ssize_t good = read(fd, &info, sizeof(info));
-    close(fd);
-    if (good == (ssize_t) sizeof(info) &&
-        info.ssi_signo == (uint32_t) SIGUSR1) {
-        PASS();
-    } else {
-        FAIL("signal was lost after EFAULT");
-    }
-}
-
-static void test_proc_fdinfo(void)
-{
-    TEST("/proc/self/fdinfo/0 readable");
-    int fd = open("/proc/self/fdinfo/0", O_RDONLY);
-    if (fd < 0) {
-        FAIL("open");
-        return;
-    }
-    char buf[256];
-    ssize_t n = read(fd, buf, sizeof(buf) - 1);
-    close(fd);
-    if (n > 0) {
-        buf[n] = '\0';
-        EXPECT_TRUE(strstr(buf, "pos:") && strstr(buf, "flags:"),
-                    "missing pos/flags fields");
-    } else {
-        FAIL("read");
-    }
-}
-
 static void test_proc_oom_score_adj_rejects_out_of_range(void)
 {
     TEST("/proc/self/oom_score_adj rejects out-of-range writes");
@@ -665,8 +139,13 @@ static void test_proc_oom_adj_scaling(void)
 
     int fd = open("/proc/self/oom_adj", O_RDWR);
     if (fd < 0) {
-        /* Some Linux configs deprecate oom_adj; treat absence as OK. */
-        PASS();
+        /* Older silent-PASS treated absence as acceptable, which turned
+         * the scaling regression into a no-op on any host that did not
+         * ship the legacy compat node. elfuse must expose it via the
+         * procemu layer, and current Linux kernels still keep the alias,
+         * so absence is a real regression.
+         */
+        FAIL("open /proc/self/oom_adj");
         return;
     }
     /* Linux fs/proc/base.c oom_adj_write special-cases OOM_ADJUST_MAX so
@@ -706,7 +185,11 @@ static void test_proc_oom_adj_same_fd_roundtrip(void)
 
     int fd = open("/proc/self/oom_adj", O_RDWR);
     if (fd < 0) {
-        PASS();
+        /* See test_proc_oom_adj_scaling for rationale: silent PASS on
+         * absent oom_adj turned the same-fd readback regression into a
+         * no-op. Fail hard so a missing compat alias surfaces.
+         */
+        FAIL("open /proc/self/oom_adj");
         return;
     }
     if (write(fd, "15\n", 3) != 3) {
@@ -764,13 +247,20 @@ static void test_proc_oom_score_no_write(void)
 static void test_proc_oom_score_write_fails(void)
 {
     TEST("/proc/self/oom_score write is rejected");
+    /* The intended coverage is the proc node's write handler returning
+     * EIO. That handler is only reached when open succeeds with write
+     * access. Only root can open the 0444 file O_WRONLY; non-root sees
+     * EACCES at open and exits the write-rejection path entirely. The
+     * sibling test_proc_oom_score_open_enforces_read_only covers the
+     * open-time EACCES branch separately, so explicitly skip here when
+     * the write path cannot be reached.
+     */
     int fd = open("/proc/self/oom_score", O_WRONLY);
     if (fd < 0) {
-        /* Non-root environments cannot open read-only file for write;
-         * that is also acceptable proof the file is not writable.
-         */
         if (errno == EACCES) {
-            PASS();
+            PROCFS_SKIP(
+                "non-root cannot open oom_score O_WRONLY; write path "
+                "covered only when run as root");
             return;
         }
         FAIL("open WRONLY");
@@ -821,7 +311,13 @@ static void test_proc_oom_adj_reread_tracks_score_adj_updates(void)
 
     int fd = open("/proc/self/oom_adj", O_RDONLY);
     if (fd < 0) {
-        PASS();
+        /* The legacy oom_adj compat node must exist whenever the test
+         * runs under elfuse (procemu emits it) or under a current Linux
+         * kernel that still ships the compat alias. The previous version
+         * silently PASSed on open failure, which turned this regression
+         * into a no-op on any host where the file was absent.
+         */
+        FAIL("open /proc/self/oom_adj");
         return;
     }
 
@@ -972,6 +468,28 @@ static void test_proc_oom_stat_size_zero(void)
     EXPECT_TRUE(st.st_size == 0, "st_size should be 0");
 }
 
+/* /proc/self/fdinfo */
+
+static void test_proc_fdinfo(void)
+{
+    TEST("/proc/self/fdinfo/0 readable");
+    int fd = open("/proc/self/fdinfo/0", O_RDONLY);
+    if (fd < 0) {
+        FAIL("open");
+        return;
+    }
+    char buf[256];
+    ssize_t n = read(fd, buf, sizeof(buf) - 1);
+    close(fd);
+    if (n > 0) {
+        buf[n] = '\0';
+        EXPECT_TRUE(strstr(buf, "pos:") && strstr(buf, "flags:"),
+                    "missing pos/flags fields");
+    } else {
+        FAIL("read");
+    }
+}
+
 static void test_proc_fdinfo_eventfd_count(void)
 {
     TEST("/proc/self/fdinfo/ exposes eventfd-count");
@@ -1217,6 +735,8 @@ static void test_proc_fdinfo_dirfd_openat_uses_virtual_entries(void)
                 "fdinfo openat should yield synthetic payload");
 }
 
+/* /proc/net */
+
 static int bind_listen_loopback_tcp(void)
 {
     int s = socket(AF_INET, SOCK_STREAM, 0);
@@ -1365,6 +885,8 @@ static void test_proc_net_dirfd_openat_uses_virtual_entries(void)
                 "proc net dirfd should preserve synthetic tcp table");
 }
 
+/* /proc/cpuinfo */
+
 static void test_proc_cpuinfo_all_cpus(void)
 {
     TEST("/proc/cpuinfo lists all CPUs");
@@ -1399,70 +921,10 @@ static void test_proc_cpuinfo_all_cpus(void)
     }
 }
 
-static void test_mmap_low_hint_exact(void)
-{
-    TEST("mmap low hint preserves ET_EXEC-style address");
-    size_t len = 0x21000;
-    static const uintptr_t candidates[] = {
-        0x00400000ULL, 0x00800000ULL, 0x01000000ULL,
-        0x02000000ULL, 0x04000000ULL, 0x06000000ULL,
-    };
-    void *hint = MAP_FAILED;
-    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
-        hint = mmap((void *) candidates[i], len, PROT_NONE,
-                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
-        if (hint != MAP_FAILED)
-            break;
-        if (errno != EEXIST && errno != EINVAL) {
-            FAIL("probe mmap");
-            return;
-        }
-    }
-    if (hint == MAP_FAILED) {
-        FAIL("no free low hint candidate");
-        return;
-    }
-    munmap(hint, len);
-
-    void *p = mmap(hint, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
-    if (p == MAP_FAILED) {
-        FAIL("mmap");
-        return;
-    }
-    EXPECT_TRUE((uintptr_t) p == (uintptr_t) hint,
-                "low mmap hint should be honored when range is free");
-    munmap(p, len);
-}
-
 int main(void)
 {
-    printf("Tier B correctness tests:\n");
-
-    /* fchmodat2 */
-    test_fchmodat2_basic();
-    test_fchmodat2_symlink_nofollow();
-    test_getcpu_basic();
-
-    /* openat2 RESOLVE_* */
-    test_openat2_basic();
-    test_openat2_resolve_beneath();
-    test_openat2_resolve_beneath_allows_internal_dotdot();
-    test_openat2_resolve_in_root_clamps_dotdot();
-    test_openat2_resolve_no_symlinks_intermediate();
-    test_openat2_resolve_beneath_rejects_symlink_escape();
-    test_openat2_resolve_no_magiclinks_proc_fd();
-    test_openat2_resolve_no_magiclinks_proc_cwd();
-
-    /* O_PATH */
-    test_opath_read_fails();
-    test_opath_write_fails();
-    test_opath_fstat_works();
-
-    /* madvise */
-    test_madvise_cold();
-    test_madvise_dontneed_unmapped();
-
-    /* /proc */
+    printf("/proc fidelity tests:\n");
+
     test_proc_oom_score_adj();
     test_proc_oom_score_adj_persists_write();
     test_proc_oom_score_adj_rejects_out_of_range();
@@ -1485,11 +947,13 @@ int main(void)
     test_proc_net_tcp_sl_dense();
     test_proc_net_dirfd_openat_uses_virtual_entries();
     test_proc_cpuinfo_all_cpus();
-    test_mmap_low_hint_exact();
 
-    /* signalfd */
-    test_signalfd_efault_preserves_pending();
-
-    SUMMARY("test-tier-b");
+    /* Local summary includes the skip count so missed coverage (e.g.
+     * non-root oom_score write path) is visible alongside passes and
+     * fails. Cannot reuse SUMMARY() from test-harness.h because it has
+     * no skip accounting.
+     */
+    printf("\ntest-proc-fidelity: %d passed, %d failed, %d skipped%s\n", passes,
+           fails, procfs_skips, fails == 0 ? " - PASS" : " - FAIL");
     return fails ? 1 : 0;
 }
diff --git a/tests/test-proctitle-host.c b/tests/test-proctitle-host.c
new file mode 100644
index 0000000..704bd34
--- /dev/null
+++ b/tests/test-proctitle-host.c
@@ -0,0 +1,153 @@
+/* Host-side regression for runtime_set_process_title.
+ *
+ * Copyright 2026 elfuse contributors
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * This is a native macOS test (not a guest ELF). It deterministically
+ * catches two distinct regression classes against the proctitle fix:
+ *
+ *   (a) any overshoot past the contiguous argv block in the current
+ *       implementation. The argv strings are laid out so their NUL
+ *       terminator sits at the last writable byte of a page, with the
+ *       next page mapped PROT_NONE. The volatile bytewise loop cannot
+ *       step into the guard; an optimizing compiler that folds it into
+ *       a libc memset emitting cache-line-aligned stp/DC ZVA overshoot
+ *       on Apple Silicon would trip the guard.
+ *
+ *   (b) reverts to the pre-fix "argv+envp" upper-bound walk. The old
+ *       code computed avail = max_end(argv, environ) - argv[0] and
+ *       memset across that span. The test overrides the process-global
+ *       environ to point at a sentinel string mmapped immediately above
+ *       the PROT_NONE guard page; any reverted walk that consults
+ *       environ produces an avail spanning the guard and SIGSEGVs.
+ *       Without this override the test's catch on reverted code is
+ *       non-deterministic because environ's address vs the test's
+ *       anonymous mmap is unconstrained.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "runtime/proctitle.h"
+
+static void on_sigsegv(int sig)
+{
+    (void) sig;
+    /* Cannot rely on stdio inside an async signal handler, but a single
+     * _exit with a recognizable code is sufficient: the run target prints
+     * a meaningful failure when the child exits with 139.
+     */
+    _exit(139);
+}
+
+int main(void)
+{
+    struct sigaction sa = {0};
+    sa.sa_handler = on_sigsegv;
+    sigemptyset(&sa.sa_mask);
+    sa.sa_flags = SA_RESETHAND;
+    sigaction(SIGSEGV, &sa, NULL);
+    sigaction(SIGBUS, &sa, NULL);
+
+    long pgsz = sysconf(_SC_PAGESIZE);
+    if (pgsz <= 0) {
+        fprintf(stderr, "test-proctitle-host: sysconf(_SC_PAGESIZE) failed\n");
+        return 1;
+    }
+
+    size_t page = (size_t) pgsz;
+    /* Layout: [page 0: writable argv] [page 1: PROT_NONE guard]
+     *         [page 2: writable envp sentinel]
+     * A reverted argv+envp walk that consults environ computes an
+     * avail spanning all three pages and memsets through the guard.
+     */
+    size_t map_size = page * 3;
+    char *base = (char *) mmap(NULL, map_size, PROT_READ | PROT_WRITE,
+                               MAP_PRIVATE | MAP_ANON, -1, 0);
+    if (base == MAP_FAILED) {
+        perror("test-proctitle-host: mmap");
+        return 1;
+    }
+    if (mprotect(base + page, page, PROT_NONE) < 0) {
+        perror("test-proctitle-host: mprotect");
+        return 1;
+    }
+
+    /* Synthesize a contiguous argv block whose tail aligns with the
+     * boundary, mimicking the host kernel placing argv at the top of the
+     * initial stack. Total length is chosen so the bin field after the
+     * last "/" character ("busybox") drives the rewritten title past the
+     * pre-existing argv[0] string length, exercising the truncation
+     * branch as well as the simple-copy branch on the next argv entry.
+     */
+    static const char *parts[] = {
+        "elfuse",
+        "/path/to/busybox",
+        "echo",
+        "hello",
+    };
+    const int nparts = (int) (sizeof(parts) / sizeof(parts[0]));
+
+    size_t total = 0;
+    for (int i = 0; i < nparts; i++)
+        total += strlen(parts[i]) + 1;
+    if (total > page) {
+        fprintf(stderr,
+                "test-proctitle-host: synthetic argv exceeds one page\n");
+        return 1;
+    }
+
+    char *start = base + page - total;
+    char *cursor = start;
+    char *argv[5];
+    for (int i = 0; i < nparts; i++) {
+        argv[i] = cursor;
+        size_t n = strlen(parts[i]) + 1;
+        memcpy(cursor, parts[i], n);
+        cursor += n;
+    }
+    argv[nparts] = NULL;
+
+    if (cursor != base + page) {
+        fprintf(stderr,
+                "test-proctitle-host: layout did not reach the guard\n");
+        return 1;
+    }
+
+    /* Plant a sentinel envp string above the guard so the reverted
+     * argv+envp upper-bound walk computes avail spanning the guard.
+     */
+    char *envp_str = base + page * 2;
+    static const char sentinel[] = "ELFUSE_PROCTITLE_TEST_SENTINEL=1";
+    memcpy(envp_str, sentinel, sizeof(sentinel));
+    char *synthetic_environ[] = {envp_str, NULL};
+    extern char **environ;
+    char **saved_environ = environ;
+    environ = synthetic_environ;
+
+    /* Any write past argv[0]+avail-1 (page boundary) trips the guard. */
+    runtime_set_process_title(nparts, argv, "/path/to/busybox");
+
+    environ = saved_environ;
+
+    /* Tail byte must be NUL, the prefix must form a non-empty C string,
+     * and the rewritten title must not have escaped the argv span (the
+     * byte after the block tail is unreadable, so verifying the tail
+     * byte alone is the strongest check available).
+ */ + if (start[total - 1] != '\0') { + fprintf(stderr, "test-proctitle-host: tail byte was not zeroed\n"); + return 1; + } + if (strnlen(start, total) == 0) { + fprintf(stderr, "test-proctitle-host: argv[0] left empty\n"); + return 1; + } + + printf("test-proctitle-host: PASS\n"); + return 0; +} diff --git a/tests/test-proctitle-low-stack.sh b/tests/test-proctitle-low-stack.sh index 276dc70..1f9262a 100644 --- a/tests/test-proctitle-low-stack.sh +++ b/tests/test-proctitle-low-stack.sh @@ -1,10 +1,23 @@ #!/usr/bin/env bash -# test-proctitle-low-stack.sh — Regress Apple Silicon argv/env stack overwrite +# test-proctitle-low-stack.sh -- Regress Apple Silicon argv/env stack overwrite # # Copyright 2026 elfuse contributors # SPDX-License-Identifier: Apache-2.0 # # Usage: tests/test-proctitle-low-stack.sh +# +# The original failure (see git log for runtime/proctitle.c) was an argv +# tail overshoot by Apple libc memset stp/DC ZVA ladders writing past the +# explicit byte count when the destination touched the stack ceiling. The +# fix walks only the contiguous argv block and stores byte-by-byte through +# a volatile pointer. +# +# This regression launches the standard busybox echo path under a host +# RLIMIT_STACK far below every macOS default (8176 KiB on Apple Silicon, +# 8192 KiB on Intel), and verifies the rewrite still terminates cleanly. +# Earlier revisions of this script gated the cap behind a -gt comparison +# that became a no-op on hosts whose default soft cap already met or +# beat the gate value; the cap is now applied unconditionally. set -euo pipefail @@ -16,40 +29,61 @@ TEST_TIMEOUT="${TEST_TIMEOUT:-10}" # shellcheck source=tests/lib/test-runner.sh source "$SCRIPT_DIR/lib/test-runner.sh" +# Override via env for local experiments. 
The default sits an order of +# magnitude below every observed macOS shell default and well above the +# floor where elfuse's own host runtime needs more stack than is provided +# (empirically ~560 KiB on macOS 26 / Apple M-series). +PROCTITLE_LOW_STACK_KIB="${PROCTITLE_LOW_STACK_KIB:-1024}" + +# Distinct exit codes from the wrapped child shell let the parent +# distinguish "rlimit setup failed" from "elfuse crashed". +ULIMIT_SETUP_FAIL=98 +ULIMIT_VERIFY_FAIL=99 + output= if output="$( # shellcheck disable=SC2016 # Positional params are expanded by the child shell. timeout "$TEST_TIMEOUT" sh -c ' - current_stack=$(ulimit -S -s) - case "$current_stack" in - unlimited) ulimit -S -s 8192 ;; - "" | *[!0-9]*) ;; - *) - if [ "$current_stack" -gt 8192 ]; then - ulimit -S -s 8192 - fi - ;; - esac + cap=$3 + if ! ulimit -S -s "$cap" 2>/dev/null; then + printf "test-proctitle-low-stack: ulimit -S -s %s rejected by shell\n" \ + "$cap" >&2 + exit 98 + fi + applied=$(ulimit -S -s) + if [ "$applied" != "$cap" ]; then + printf "test-proctitle-low-stack: requested %s KiB, got %s\n" \ + "$cap" "$applied" >&2 + exit 99 + fi exec "$1" "$2" echo hello - ' sh "$ELFUSE" "$BB" + ' sh "$ELFUSE" "$BB" "$PROCTITLE_LOW_STACK_KIB" )"; then : else rc=$? - if [ "$rc" -eq 124 ]; then - printf "test-proctitle-low-stack: elfuse hung under low stack (timeout after %ss)\n" \ - "$TEST_TIMEOUT" >&2 - exit 1 - fi - printf "test-proctitle-low-stack: elfuse failed under low stack (rc=%d)\n" \ - "$rc" >&2 - exit "$rc" + case $rc in + 124) + printf "test-proctitle-low-stack: elfuse hung at %s KiB stack (timeout %ss)\n" \ + "$PROCTITLE_LOW_STACK_KIB" "$TEST_TIMEOUT" >&2 + exit 1 + ;; + "$ULIMIT_SETUP_FAIL" | "$ULIMIT_VERIFY_FAIL") + # The wrapper already explained the failure. 
+ exit 1 + ;; + *) + printf "test-proctitle-low-stack: elfuse failed at %s KiB stack (rc=%d)\n" \ + "$PROCTITLE_LOW_STACK_KIB" "$rc" >&2 + exit "$rc" + ;; + esac fi if [ "$output" != "hello" ]; then - printf "test-proctitle-low-stack: unexpected output under low stack: %s\n" \ - "$output" >&2 + printf "test-proctitle-low-stack: unexpected output at %s KiB stack: %s\n" \ + "$PROCTITLE_LOW_STACK_KIB" "$output" >&2 exit 1 fi -printf "test-proctitle-low-stack: PASS\n" +printf "test-proctitle-low-stack: PASS (stack=%s KiB)\n" "$PROCTITLE_LOW_STACK_KIB" diff --git a/tests/test-syscall-fidelity.c b/tests/test-syscall-fidelity.c new file mode 100644 index 0000000..a4e1001 --- /dev/null +++ b/tests/test-syscall-fidelity.c @@ -0,0 +1,614 @@ +/* Linux syscall fidelity tests + * + * Copyright 2026 elfuse contributors + * SPDX-License-Identifier: Apache-2.0 + * + * Covers Linux syscalls whose semantics elfuse must emulate exactly: + * fchmodat2 (SYS 452) including AT_SYMLINK_NOFOLLOW, getcpu (SYS 168), + * openat2 (SYS 437) with each RESOLVE_* flag variant (BENEATH, + * IN_ROOT, NO_SYMLINKS, NO_MAGICLINKS), O_PATH descriptor enforcement + * for read/write/fstat, madvise corner cases (MADV_COLD acceptance and + * MADV_DONTNEED across an unmapped hole), and the low-address mmap + * hint preservation that ET_EXEC layout depends on. + */ + +#include <errno.h> +#include <fcntl.h> +#include <limits.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mman.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <unistd.h> + +#include "test-harness.h" + +#ifndef MAP_FIXED_NOREPLACE +#define MAP_FIXED_NOREPLACE 0x100000 +#endif + +int passes = 0, fails = 0; + +/* Some Linux fidelity tests probe semantics that depend on filesystem + * support (e.g. changing a symlink's own mode via AT_SYMLINK_NOFOLLOW + * fails with EOPNOTSUPP on most Linux filesystems). Counting those as + * skips keeps the summary honest: the syscall path was reached and + * answered correctly, but the kernel declined the specific request.
+ * Hard-failing on EOPNOTSUPP would turn the regression into a spurious + * failure on perfectly conforming kernels. + */ +static int syscall_skips = 0; +#define SYSCALL_SKIP(reason) \ + do { \ + printf("SKIP: %s\n", reason); \ + syscall_skips++; \ + } while (0) + +/* fchmodat2 (SYS 452). */ + +#ifndef SYS_fchmodat2 +#define SYS_fchmodat2 452 +#endif + +#ifndef SYS_getcpu +#define SYS_getcpu 168 +#endif + +static void test_fchmodat2_basic(void) +{ + TEST("fchmodat2 basic"); + char path[] = "/tmp/elfuse-test-fchmodat2-XXXXXX"; + int fd = mkstemp(path); + if (fd < 0) { + FAIL("mkstemp"); + return; + } + close(fd); + /* fchmodat2(AT_FDCWD, path, 0644, 0) should work like fchmodat */ + long rc = syscall(SYS_fchmodat2, AT_FDCWD, path, 0644, 0); + if (rc < 0) { + FAIL("fchmodat2"); + unlink(path); + return; + } + struct stat st; + int st_rc = stat(path, &st); + unlink(path); + EXPECT_TRUE(st_rc == 0 && (st.st_mode & 0777) == 0644, "mode mismatch"); +} + +static void test_getcpu_basic(void) +{ + TEST("getcpu basic"); + unsigned cpu = 99, node = 99; + long rc = syscall(SYS_getcpu, &cpu, &node, 0); + if (rc < 0) { + FAIL("getcpu"); + return; + } + EXPECT_TRUE(cpu == 0, "cpu should be 0"); + EXPECT_TRUE(node == 0, "node should be 0"); +} + +static void test_fchmodat2_symlink_nofollow(void) +{ + TEST("fchmodat2 AT_SYMLINK_NOFOLLOW"); + char target[] = "/tmp/elfuse-test-fchmodat2-target-XXXXXX"; + char linkpath[64]; + + int fd = mkstemp(target); + if (fd < 0) { + FAIL("mkstemp target"); + return; + } + close(fd); + /* Derive the symlink name from the unique target so we never call mktemp, + * which is racy and triggers a linker warning on glibc. + */ + snprintf(linkpath, sizeof(linkpath), "%s.lnk", target); + + if (symlink(target, linkpath) < 0) { + FAIL("symlink"); + unlink(target); + return; + } + + /* AT_SYMLINK_NOFOLLOW must change the symlink's mode, not the target's.
+ * Most Linux filesystems (including tmpfs, ext4, btrfs without the + * symlink-mode opt-in) reject this with EOPNOTSUPP because the on-disk + * inode for a symlink has no separately writable mode bit. Treat that + * answer as an honest skip: the kernel reached fchmodat2_write and + * declined the specific request. Any other negative return is a real + * failure that the test should surface. + */ + long rc = + syscall(SYS_fchmodat2, AT_FDCWD, linkpath, 0700, AT_SYMLINK_NOFOLLOW); + if (rc < 0) { + if (errno == EOPNOTSUPP) { + SYSCALL_SKIP( + "fchmodat2 AT_SYMLINK_NOFOLLOW unsupported by host fs"); + goto out; + } + FAIL("fchmodat2 nofollow"); + goto out; + } + + struct stat st_link, st_target; + if (lstat(linkpath, &st_link) < 0) { + FAIL("lstat link"); + goto out; + } + if (stat(target, &st_target) < 0) { + FAIL("stat target"); + goto out; + } + + EXPECT_TRUE((st_link.st_mode & 0777) == 0700, "link mode mismatch"); + EXPECT_TRUE((st_target.st_mode & 0777) == 0600, "target mode changed"); + +out: + unlink(linkpath); + unlink(target); +} + +/* openat2 (SYS 437). 
*/ + +#ifndef SYS_openat2 +#define SYS_openat2 437 +#endif + +struct open_how { + unsigned long long flags, mode, resolve; +}; + +#define RESOLVE_BENEATH 0x08 +#define RESOLVE_IN_ROOT 0x10 +#define RESOLVE_NO_MAGICLINKS 0x02 +#define RESOLVE_NO_SYMLINKS 0x04 + +static void test_openat2_basic(void) +{ + TEST("openat2 basic open"); + struct open_how how = {.flags = O_RDONLY, .mode = 0, .resolve = 0}; + long fd = syscall(SYS_openat2, AT_FDCWD, "/dev/null", &how, sizeof(how)); + if (fd < 0) { + FAIL("openat2"); + return; + } + close(fd); + PASS(); +} + +static void test_openat2_resolve_beneath(void) +{ + TEST("openat2 RESOLVE_BENEATH rejects .."); + /* Open a directory first */ + int dirfd = open("/tmp", O_RDONLY | O_DIRECTORY); + if (dirfd < 0) { + FAIL("open /tmp"); + return; + } + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_BENEATH}; + long fd = syscall(SYS_openat2, dirfd, "../etc/passwd", &how, sizeof(how)); + close(dirfd); + if (fd >= 0) { + close(fd); + FAIL("should have rejected .. 
traversal"); + return; + } + EXPECT_TRUE(errno == EXDEV, "wrong errno"); +} + +static void test_openat2_resolve_beneath_allows_internal_dotdot(void) +{ + TEST("openat2 RESOLVE_BENEATH allows in-root .."); + + char dir_template[] = "/tmp/elfuse-openat2-beneath-XXXXXX"; + char subdir[PATH_MAX], target[PATH_MAX]; + int dirfd = -1, filefd = -1; + + if (!mkdtemp(dir_template)) { + FAIL("mkdtemp"); + return; + } + + snprintf(subdir, sizeof(subdir), "%s/subdir", dir_template); + snprintf(target, sizeof(target), "%s/file", dir_template); + if (mkdir(subdir, 0700) < 0) { + FAIL("mkdir"); + goto out; + } + + filefd = open(target, O_CREAT | O_RDONLY, 0600); + if (filefd < 0) { + FAIL("open file"); + goto out; + } + close(filefd); + filefd = -1; + + dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); + if (dirfd < 0) { + FAIL("open dir"); + goto out; + } + + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_BENEATH}; + long fd = syscall(SYS_openat2, dirfd, "subdir/../file", &how, sizeof(how)); + if (fd < 0) { + FAIL("openat2"); + goto out; + } + close((int) fd); + PASS(); + +out: + if (dirfd >= 0) + close(dirfd); + if (filefd >= 0) + close(filefd); + unlink(target); + rmdir(subdir); + rmdir(dir_template); +} + +static void test_openat2_resolve_in_root_clamps_dotdot(void) +{ + TEST("openat2 RESOLVE_IN_ROOT clamps .. 
at root"); + + char dir_template[] = "/tmp/elfuse-openat2-inroot-XXXXXX"; + char target[PATH_MAX]; + int dirfd = -1, filefd = -1; + + if (!mkdtemp(dir_template)) { + FAIL("mkdtemp"); + return; + } + + snprintf(target, sizeof(target), "%s/file", dir_template); + filefd = open(target, O_CREAT | O_RDONLY, 0600); + if (filefd < 0) { + FAIL("open file"); + goto out; + } + close(filefd); + filefd = -1; + + dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); + if (dirfd < 0) { + FAIL("open dir"); + goto out; + } + + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_IN_ROOT}; + long fd = syscall(SYS_openat2, dirfd, "/../file", &how, sizeof(how)); + if (fd < 0) { + FAIL("openat2"); + goto out; + } + close((int) fd); + PASS(); + +out: + if (dirfd >= 0) + close(dirfd); + if (filefd >= 0) + close(filefd); + unlink(target); + rmdir(dir_template); +} + +static void test_openat2_resolve_no_symlinks_intermediate(void) +{ + TEST("openat2 RESOLVE_NO_SYMLINKS rejects intermediate symlink"); + + char dir_template[] = "/tmp/elfuse-openat2-XXXXXX"; + char target_dir[PATH_MAX], subfile[PATH_MAX]; + char link_path[PATH_MAX]; + int dirfd = -1, filefd = -1; + + if (!mkdtemp(dir_template)) { + FAIL("mkdtemp"); + return; + } + + snprintf(target_dir, sizeof(target_dir), "%s/real", dir_template); + snprintf(subfile, sizeof(subfile), "%s/subfile", target_dir); + snprintf(link_path, sizeof(link_path), "%s/link", dir_template); + + if (mkdir(target_dir, 0700) < 0) { + FAIL("mkdir"); + goto out; + } + filefd = open(subfile, O_CREAT | O_RDWR, 0600); + if (filefd < 0) { + FAIL("open subfile"); + goto out; + } + close(filefd); + filefd = -1; + if (symlink("real", link_path) < 0) { + FAIL("symlink"); + goto out; + } + + dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); + if (dirfd < 0) { + FAIL("open dir"); + goto out; + } + + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_NO_SYMLINKS}; + long fd = syscall(SYS_openat2, dirfd, "link/subfile", &how, 
sizeof(how)); + if (fd >= 0) { + close((int) fd); + FAIL("expected ELOOP"); + goto out; + } + EXPECT_TRUE(errno == ELOOP, "wrong errno"); + +out: + if (dirfd >= 0) + close(dirfd); + if (filefd >= 0) + close(filefd); + unlink(link_path); + unlink(subfile); + rmdir(target_dir); + rmdir(dir_template); +} + +static void test_openat2_resolve_beneath_rejects_symlink_escape(void) +{ + TEST("openat2 RESOLVE_BENEATH rejects symlink escape"); + + char dir_template[] = "/tmp/elfuse-openat2-escape-XXXXXX"; + char link_path[PATH_MAX]; + int dirfd = -1; + + if (!mkdtemp(dir_template)) { + FAIL("mkdtemp"); + return; + } + + snprintf(link_path, sizeof(link_path), "%s/link", dir_template); + if (symlink("/etc", link_path) < 0) { + FAIL("symlink"); + goto out; + } + + dirfd = open(dir_template, O_RDONLY | O_DIRECTORY); + if (dirfd < 0) { + FAIL("open dir"); + goto out; + } + + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_BENEATH}; + long fd = syscall(SYS_openat2, dirfd, "link/passwd", &how, sizeof(how)); + if (fd >= 0) { + close((int) fd); + FAIL("expected EXDEV"); + goto out; + } + EXPECT_TRUE(errno == EXDEV, "wrong errno"); + +out: + if (dirfd >= 0) + close(dirfd); + unlink(link_path); + rmdir(dir_template); +} + +static void test_openat2_resolve_no_magiclinks_proc_fd(void) +{ + TEST("openat2 RESOLVE_NO_MAGICLINKS rejects /proc/self/fd"); + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_NO_MAGICLINKS}; + long fd = + syscall(SYS_openat2, AT_FDCWD, "/proc/self/fd/0", &how, sizeof(how)); + if (fd >= 0) { + close((int) fd); + FAIL("expected ELOOP"); + return; + } + EXPECT_TRUE(errno == ELOOP, "wrong errno"); +} + +static void test_openat2_resolve_no_magiclinks_proc_cwd(void) +{ + TEST("openat2 RESOLVE_NO_MAGICLINKS rejects proc cwd magiclinks"); + struct open_how how = { + .flags = O_RDONLY, .mode = 0, .resolve = RESOLVE_NO_MAGICLINKS}; + char cwd[256]; + + if (!getcwd(cwd, sizeof(cwd))) { + FAIL("getcwd"); + return; + } + if 
(chdir("/proc") < 0) { + FAIL("chdir"); + return; + } + + errno = 0; + long fd = syscall(SYS_openat2, AT_FDCWD, "self/fd/0", &how, sizeof(how)); + int saved_errno = errno; + if (chdir(cwd) < 0) { + FAIL("restore cwd"); + return; + } + + errno = saved_errno; + if (fd >= 0) { + close((int) fd); + FAIL("expected ELOOP"); + return; + } + EXPECT_TRUE(errno == ELOOP, "wrong errno"); +} + +/* O_PATH enforcement. */ + +#ifndef O_PATH +#define O_PATH 010000000 +#endif + +#ifndef MADV_COLD +#define MADV_COLD 20 +#endif + +static void test_opath_read_fails(void) +{ + TEST("O_PATH fd rejects read"); + int fd = open("/dev/null", O_PATH); + if (fd < 0) { + FAIL("open O_PATH"); + return; + } + char buf[1]; + ssize_t n = read(fd, buf, 1); + close(fd); + EXPECT_TRUE(n < 0 && errno == EBADF, "read should return EBADF"); +} + +static void test_opath_write_fails(void) +{ + TEST("O_PATH fd rejects write"); + int fd = open("/dev/null", O_PATH); + if (fd < 0) { + FAIL("open O_PATH"); + return; + } + ssize_t n = write(fd, "x", 1); + close(fd); + EXPECT_TRUE(n < 0 && errno == EBADF, "write should return EBADF"); +} + +static void test_opath_fstat_works(void) +{ + TEST("O_PATH fd allows fstat"); + int fd = open("/dev/null", O_PATH); + if (fd < 0) { + FAIL("open O_PATH"); + return; + } + struct stat st; + int rc = fstat(fd, &st); + close(fd); + EXPECT_TRUE(rc == 0, "fstat should work on O_PATH"); +} + +/* madvise parity. 
*/ + +static void test_madvise_cold(void) +{ + TEST("madvise MADV_COLD accepted"); + void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (p == MAP_FAILED) { + FAIL("mmap"); + return; + } + int rc = madvise(p, 4096, MADV_COLD); + munmap(p, 4096); + EXPECT_TRUE(rc == 0, "madvise MADV_COLD"); +} + +static void test_madvise_dontneed_unmapped(void) +{ + TEST("madvise DONTNEED on unmapped returns ENOMEM"); + /* Map two 4 KiB pages, then unmap the second to create a hole */ + void *p = mmap(NULL, 8192, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (p == MAP_FAILED) { + FAIL("mmap"); + return; + } + munmap((char *) p + 4096, 4096); + /* MADV_DONTNEED spanning the mapped page and the hole should fail */ + int rc = madvise(p, 8192, MADV_DONTNEED); + munmap(p, 4096); + EXPECT_TRUE(rc < 0 && errno == ENOMEM, "expected ENOMEM for unmapped hole"); +} + +/* mmap low-hint preservation. */ + +static void test_mmap_low_hint_exact(void) +{ + TEST("mmap low hint preserves ET_EXEC-style address"); + size_t len = 0x21000; + static const uintptr_t candidates[] = { + 0x00400000ULL, 0x00800000ULL, 0x01000000ULL, + 0x02000000ULL, 0x04000000ULL, 0x06000000ULL, + }; + void *hint = MAP_FAILED; + for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) { + hint = mmap((void *) candidates[i], len, PROT_NONE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0); + if (hint != MAP_FAILED) + break; + if (errno != EEXIST && errno != EINVAL) { + FAIL("probe mmap"); + return; + } + } + if (hint == MAP_FAILED) { + FAIL("no free low hint candidate"); + return; + } + munmap(hint, len); + + void *p = mmap(hint, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (p == MAP_FAILED) { + FAIL("mmap"); + return; + } + EXPECT_TRUE((uintptr_t) p == (uintptr_t) hint, + "low mmap hint should be honored when range is free"); + munmap(p, len); +} + +int main(void) +{ + printf("Linux syscall fidelity tests:\n"); + + /* fchmodat2 / getcpu */ +
test_fchmodat2_basic(); + test_fchmodat2_symlink_nofollow(); + test_getcpu_basic(); + + /* openat2 RESOLVE_* */ + test_openat2_basic(); + test_openat2_resolve_beneath(); + test_openat2_resolve_beneath_allows_internal_dotdot(); + test_openat2_resolve_in_root_clamps_dotdot(); + test_openat2_resolve_no_symlinks_intermediate(); + test_openat2_resolve_beneath_rejects_symlink_escape(); + test_openat2_resolve_no_magiclinks_proc_fd(); + test_openat2_resolve_no_magiclinks_proc_cwd(); + + /* O_PATH */ + test_opath_read_fails(); + test_opath_write_fails(); + test_opath_fstat_works(); + + /* madvise */ + test_madvise_cold(); + test_madvise_dontneed_unmapped(); + + /* mmap low-hint */ + test_mmap_low_hint_exact(); + + printf("\ntest-syscall-fidelity: %d passed, %d failed, %d skipped%s\n", + passes, fails, syscall_skips, fails == 0 ? " - PASS" : " - FAIL"); + return fails ? 1 : 0; +}