Tighten FUSE wait semantics #36

Merged: jserv merged 1 commit into main from fuse on May 15, 2026

@jserv jserv commented May 15, 2026

Three correctness gaps remained in the guest-internal FUSE transport and the CLONE_THREAD path after the initial landing. Multi-model review flagged them; this change closes them.

FUSE wait-path EINTR (src/syscall/fuse.c, src/syscall/signal.{c,h}):

  • A new helper signal_pending_interruption(restart_out) inspects every unblocked pending bit and reports whether the effective delivery is non-disruptive for every signal in the set. A signal is non-disruptive when its handler is SIG_IGN, when SIG_DFL resolves to default-ignore (SIGCHLD, SIGURG, SIGWINCH), or when a user handler has SA_RESTART set. Any other signal forces the caller to treat the wait as interrupted, so a SIGTERM hiding behind an ignored SIGCHLD cannot stay invisible to the caller.
  • fuse_request_locked detaches the request and returns -EINTR only when the deliverable set contains a disruptive signal. SA_RESTART and ignored signals let the wait continue until the daemon replies, matching the application-visible contract of those handlers and avoiding a useless FUSE_INTERRUPT for work the guest still wants.

FUSE_FORGET reference-count integrity (src/syscall/fuse.c):

  • fuse_walk_path_locked drops the previous component's lookup hold on any error return so partial-walk failures (e.g. ENOENT on a deep component) no longer leak a reference per surviving prefix.
  • fuse_release_common_locked emits a compensating FUSE_FORGET on the O_PATH path. O_PATH opens skip FUSE_OPEN but still consume an nlookup during the walk; the prior early-return left that ref hanging on the daemon.
  • fuse_lookup_locked issues a single compensating FUSE_FORGET when the per-session ref table is full so the daemon's nlookup view stays balanced even when elfuse runs out of local capacity.
  • The per-session ref table cap rises from 256 to 4096 so realistic recursive walks no longer hit the compensating-FORGET path.
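
The balancing rule for the ref table can be illustrated with a small model. This is a sketch under assumptions, not the code in src/syscall/fuse.c: the `sketch_*` names, the linear-scan table, and the boolean return value are hypothetical, and a counter stands in for actually sending FUSE_FORGET to the daemon. Only the 4096 cap comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REF_CAP 4096 /* per-session cap quoted in the description */

typedef struct {
    uint64_t nodeid;  /* FUSE node id; 0 is never a valid node */
    uint64_t nlookup; /* local hold count; 0 marks a free slot */
} sketch_ref_t;

typedef struct {
    sketch_ref_t refs[SKETCH_REF_CAP];
    uint64_t forgets_sent; /* stand-in for FUSE_FORGET messages to the daemon */
} sketch_session_t;

/* Stand-in for sending FUSE_FORGET(nodeid, nlookup = 1). */
static void sketch_send_forget(sketch_session_t *s, uint64_t nodeid) {
    (void)nodeid;
    s->forgets_sent++;
}

/* Record one successful LOOKUP. Returns true when the hold is tracked locally.
 * When the table is full, the daemon has already counted the lookup, so one
 * compensating FORGET is sent immediately to keep its nlookup balanced. */
bool sketch_ref_hold(sketch_session_t *s, uint64_t nodeid) {
    sketch_ref_t *free_slot = NULL;
    for (size_t i = 0; i < SKETCH_REF_CAP; i++) {
        sketch_ref_t *r = &s->refs[i];
        if (r->nlookup != 0 && r->nodeid == nodeid) {
            r->nlookup++;
            return true;
        }
        if (r->nlookup == 0 && free_slot == NULL)
            free_slot = r;
    }
    if (free_slot == NULL) {
        sketch_send_forget(s, nodeid); /* compensating FORGET */
        return false;
    }
    free_slot->nodeid = nodeid;
    free_slot->nlookup = 1;
    return true;
}

/* Drop one tracked hold and tell the daemon about it. */
void sketch_ref_drop(sketch_session_t *s, uint64_t nodeid) {
    for (size_t i = 0; i < SKETCH_REF_CAP; i++) {
        sketch_ref_t *r = &s->refs[i];
        if (r->nlookup != 0 && r->nodeid == nodeid) {
            r->nlookup--;
            sketch_send_forget(s, nodeid);
            return;
        }
    }
}
```

The invariant the sketch tries to show is that every LOOKUP the daemon counted is eventually matched by exactly one forgotten lookup, whether or not the local table had room for it. A path walk that fails partway would call `sketch_ref_drop` on the previous component before returning the error.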

CLONE_THREAD startup-readiness (src/runtime/forkipc.c):

  • sys_clone_thread waits on a thread_startup_t condvar until the worker reports current_thread publication or an explicit -EIO failure. The worker's HVF bring-up (hv_vcpu_create plus every sysreg, GPR, SIMD, and PC write) goes through a checked WORKER_HV macro, so a transient HVF error rolls back instead of aborting the process via HV_CHECK.
  • The startup_failed cleanup path drops the thread slot before destroying the vCPU, so a concurrent thread_interrupt_all cannot observe a slot whose t->vcpu has just been cleared.
  • Both failure paths (pthread_create EAGAIN and post-handshake -EIO) roll back PARENT_SETTID and CHILD_SETTID guest writes so the caller never observes a live-looking TID for a thread that never started.
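
The startup handshake and TID rollback can be sketched with POSIX threads. This is an illustrative model, not the code in src/runtime/forkipc.c: the `sketch_*` names, the `fail_bringup` knob that simulates an HVF error, and the choice to restore the previous TID values on rollback are all assumptions. The worker here exits right after reporting, so the parent joins it on both paths; a real worker would keep running guest code after a successful report.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical handshake record shared by parent and worker. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;         /* set once the worker has reported */
    int status;        /* 0 on success, -EIO on bring-up failure */
    bool fail_bringup; /* knob simulating a failed vCPU bring-up */
} sketch_startup_t;

static void *sketch_worker(void *arg) {
    sketch_startup_t *st = arg;
    /* Stand-in for checked vCPU creation and register setup. */
    int status = st->fail_bringup ? -EIO : 0;

    pthread_mutex_lock(&st->lock);
    st->status = status;
    st->done = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/* Returns new_tid on success or a negative errno. The TID slots model the
 * PARENT_SETTID and CHILD_SETTID guest writes and are restored on failure. */
int sketch_clone_thread(bool fail_bringup, int32_t new_tid,
                        int32_t *parent_tid, int32_t *child_tid) {
    sketch_startup_t st = { .done = false, .status = 0,
                            .fail_bringup = fail_bringup };
    pthread_mutex_init(&st.lock, NULL);
    pthread_cond_init(&st.cond, NULL);

    int32_t old_parent = *parent_tid, old_child = *child_tid;
    *parent_tid = new_tid; /* published before the worker starts */
    *child_tid = new_tid;

    pthread_t th;
    int result;
    int rc = pthread_create(&th, NULL, sketch_worker, &st);
    if (rc != 0) {
        result = -rc; /* e.g. -EAGAIN */
    } else {
        pthread_mutex_lock(&st.lock);
        while (!st.done) /* loop guards against spurious wakeups */
            pthread_cond_wait(&st.cond, &st.lock);
        result = st.status;
        pthread_mutex_unlock(&st.lock);
        pthread_join(th, NULL); /* sketch only: worker has already exited */
    }

    if (result < 0) {
        *parent_tid = old_parent; /* roll back both TID writes */
        *child_tid = old_child;
    } else {
        result = new_tid;
    }
    pthread_cond_destroy(&st.cond);
    pthread_mutex_destroy(&st.lock);
    return result;
}
```

Both failure paths funnel into the same rollback branch, which is the property the description calls out: the caller never sees a TID for a thread that did not start.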

Summary by cubic

Tightens FUSE wait semantics and lookup ref accounting, and adds a CLONE_THREAD startup handshake to avoid stale TIDs and races. Improves correctness for interrupted FUSE I/O, path walks, and thread bring-up.

  • Bug Fixes
    • FUSE waits now return EINTR only for disruptive signals; ignored and SA_RESTART signals keep waiting. When detaching, send a single FUSE_INTERRUPT and free detached/no-reply requests on close; others fail with ENOTCONN.
    • Prevents nlookup leaks: drop the prior component on walk errors; send FORGET on O_PATH close; also drop the final lookup on RELEASE/RELEASEDIR. Path resolution can retain or drop the final LOOKUP; callers were updated to drop lookup refs on early exits. Adds batch FORGET and a compensating FORGET when the local table is full; table cap raised to 4096.
    • CLONE_THREAD handshakes startup via a condvar; worker reports success or -EIO. On failure, deactivate the thread slot before vCPU destroy and roll back PARENT/CHILD_SETTID writes so no fake TIDs remain. vCPU register setup uses checked calls to allow clean rollback.
    • Tests exercise EINTR via SIGUSR1, verify FUSE_INTERRUPT handling, and assert FORGET traffic is observed.

Written for commit ad15853.

@jserv jserv merged commit 382db3c into main May 15, 2026
4 checks passed
@jserv jserv deleted the fuse branch May 15, 2026 22:08