
ENT-13720: Fixed daemon hang on SIGTERM during child process wait (3.27.x)#6130

Draft: larsewi wants to merge 2 commits into cfengine:3.27.x from larsewi:cf-serverd-sigterm-3.27.x

larsewi commented May 18, 2026

ShellCommandReturnsZero retried waitpid() unconditionally on EINTR, so daemons blocked waiting for a child (e.g. cf-promises during policy validation) stayed unresponsive to SIGTERM until the child finished. HandleSignalsForDaemon sets PENDING_TERMINATION, but the main loop never regained control to check it.

We now poll the child with waitpid(WNOHANG) and check IsPendingTermination() between polls. If it is set, we stop the child via ProcessSignalTerminate and reap it, so the daemon can exit.

Observed in the valgrind-checks CI job, where pkill -f cf-serverd failed to kill the bootstrap daemon and the valgrind-wrapped replacement could not bind to the listening port.

Backported from #6129

ShellCommandReturnsZero retried waitpid() unconditionally on EINTR, so
daemons (cf-serverd, cf-execd, cf-monitord) blocked waiting for a child
process -- such as cf-promises during policy validation -- stayed
unresponsive to SIGTERM until the child finished. The signal handler
set PENDING_TERMINATION but the main loop never got control back to
check it.

Now, when waitpid is interrupted and termination is pending, the child
is stopped via ProcessSignalTerminate (SIGINT -> SIGTERM -> SIGKILL)
and reaped, so the daemon's main loop can exit promptly.
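
The escalation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not CFEngine's ProcessSignalTerminate. The helper names terminate_and_reap() and reap_within(), the per-signal grace period and the 10 ms polling tick are all hypothetical. Each signal gets a grace period before the next, stronger one is sent. The child is always reaped with waitpid(), so no zombie is left behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll for roughly grace_ms and return true once
 * the child has exited and been reaped. */
static bool reap_within(pid_t pid, int grace_ms, int *status)
{
    const struct timespec tick = { 0, 10L * 1000 * 1000 }; /* 10 ms */
    for (int waited = 0; waited <= grace_ms; waited += 10)
    {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
        {
            return true;               /* exited and reaped */
        }
        if (r < 0 && errno != EINTR)
        {
            return false;              /* e.g. ECHILD: not our child */
        }
        nanosleep(&tick, NULL);        /* EINTR only shortens one tick */
    }
    return false;
}

/* Hypothetical sketch: SIGINT -> SIGTERM -> SIGKILL, each followed by
 * a grace period during which we try to reap the child. */
static bool terminate_and_reap(pid_t pid, int grace_ms, int *status)
{
    static const int escalation[] = { SIGINT, SIGTERM, SIGKILL };
    assert(pid > 0); /* pid <= 0 would signal a process group (or all) */
    for (size_t i = 0; i < sizeof(escalation) / sizeof(escalation[0]); i++)
    {
        if (kill(pid, escalation[i]) < 0)
        {
            return false;              /* ESRCH or EPERM */
        }
        if (reap_within(pid, grace_ms, status))
        {
            return true;
        }
    }
    return false;
}
```

Design note: a child that ignores SIGINT and SIGTERM costs two grace periods before SIGKILL takes it down. A child with default dispositions dies on the first signal.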

Ticket: ENT-13720
Changelog: Title
Signed-off-by: Lars Erik Wik <lars.erik.wik@northern.tech>
(cherry picked from commit 243a10f)

The previous attempt checked IsPendingTermination() in the EINTR
branch of the blocking waitpid() loop, but that branch is never
reached: signal() on Linux/glibc installs handlers with SA_RESTART,
so the kernel transparently restarts waitpid() after the handler
runs and the userspace EINTR check never fires. The daemon stays
blocked in waitpid() until the child exits on its own, which is the
exact symptom we set out to fix.
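
The SA_RESTART behaviour described above, which the signal(7) excerpts in the references also cover, can be checked with a small standalone program. This sketch is an illustration, not part of the patch. waitpid_sees_eintr() is a hypothetical helper that works in three steps:

- It installs a SIGUSR1 handler with the given flags.
- It forks a child that signals the parent after about 200 ms and exits about 300 ms later.
- It reports whether the parent's blocking waitpid() returned EINTR.

The timing margins are assumptions. The sketch uses sigaction() with explicit flags rather than signal(). Per signal(2), glibc's signal() only has BSD (SA_RESTART) semantics while _DEFAULT_SOURCE is in effect, which is not the case once _POSIX_C_SOURCE is defined explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void on_usr1(int sig)
{
    (void) sig; /* the handler only needs to exist; it does nothing */
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) < 0 && errno == EINTR)
    {
        /* resume sleeping for the remaining time */
    }
}

/* Hypothetical helper: does a blocking waitpid() surface EINTR when a
 * handler installed with these sa_flags runs while the child lives? */
static bool waitpid_sees_eintr(int sa_flags)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;              /* SA_RESTART or 0 */
    sigaction(SIGUSR1, &sa, &old);       /* install before fork(): no race */

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0)
    {
        sleep_ms(200);                   /* parent should be in waitpid() */
        kill(getppid(), SIGUSR1);
        sleep_ms(300);                   /* still alive after the signal */
        _exit(0);
    }

    int status;
    bool eintr = (waitpid(pid, &status, 0) < 0 && errno == EINTR);
    if (eintr)
    {
        waitpid(pid, &status, 0);        /* reap the child anyway */
    }
    sigaction(SIGUSR1, &old, NULL);      /* restore the previous handler */
    return eintr;
}
```

On Linux one would expect waitpid_sees_eintr(SA_RESTART) to return false, because the kernel restarts the call. With flags of 0 it should return true, since the EINTR reaches userspace.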

Poll the child with waitpid(WNOHANG) instead, so we get control back
between iterations and can react to PENDING_TERMINATION regardless of
whether the signal interrupts the syscall. nanosleep() between polls
keeps the loop from busy-spinning; since it is never restarted across
signals, SIGTERM wakes us up promptly and the 100 ms interval is only
an upper bound on idle wakeup latency.
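
A minimal sketch of the polling approach follows. It rests on these assumptions:

- pending_termination stands in for the daemon's PENDING_TERMINATION flag.
- wait_child_or_terminate(), install_term_handler() and the WaitOutcome enum are hypothetical names.
- The shutdown path sends SIGKILL directly, where the patch calls ProcessSignalTerminate.

The handler is installed with SA_RESTART to mirror glibc's signal(). The loop therefore does not rely on waitpid() being interrupted, only on nanosleep() being cut short.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the daemon's PENDING_TERMINATION flag. */
static volatile sig_atomic_t pending_termination = 0;

static void handle_term(int sig)
{
    (void) sig;
    pending_termination = 1;
}

/* Install the SIGTERM handler with SA_RESTART (BSD-style semantics). */
static int install_term_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGTERM, &sa, NULL);
}

typedef enum
{
    CHILD_EXITED,               /* child finished on its own */
    CHILD_STOPPED_FOR_SHUTDOWN, /* killed because termination is pending */
    WAIT_FAILED
} WaitOutcome;

static WaitOutcome wait_child_or_terminate(pid_t pid, int *status)
{
    const struct timespec interval = { 0, 100L * 1000 * 1000 }; /* 100 ms */
    assert(pid > 0); /* waitpid(-1 or 0, ...) could reap an unrelated child */
    for (;;)
    {
        pid_t r = waitpid(pid, status, WNOHANG); /* never blocks */
        if (r == pid)
        {
            return CHILD_EXITED;
        }
        if (r < 0 && errno != EINTR)
        {
            return WAIT_FAILED;
        }
        if (pending_termination)
        {
            /* Stand-in for ProcessSignalTerminate(), which escalates
             * SIGINT -> SIGTERM -> SIGKILL; here we go straight to SIGKILL. */
            kill(pid, SIGKILL);
            while (waitpid(pid, status, 0) < 0)
            {
                if (errno != EINTR)
                {
                    return WAIT_FAILED;
                }
            }
            return CHILD_STOPPED_FOR_SHUTDOWN;
        }
        /* nanosleep() is never restarted, even with SA_RESTART, so a
         * SIGTERM delivered here ends the sleep early with EINTR. */
        nanosleep(&interval, NULL);
    }
}
```

SIGTERM may land between the flag check and the nanosleep() call. In that case the sleep runs its full 100 ms and the next iteration picks up the flag. That window is why the interval is an upper bound on latency rather than the typical case.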

References (Linux man-pages 6.9.1):

  signal(2):
    "By default, in glibc 2 and later, the signal() wrapper function
     does not invoke the kernel system call. Instead, it calls
     sigaction(2) using flags that supply BSD semantics. [...] The
     BSD semantics are equivalent to calling sigaction(2) with the
     following flags: sa.sa_flags = SA_RESTART;"

  signal(7), "Interruption of system calls and library functions by
  signal handlers":
    "If a blocked call to one of the following interfaces is
     interrupted by a signal handler, then the call is automatically
     restarted after the signal handler returns if the SA_RESTART
     flag was used; otherwise the call fails with the error EINTR:
     [...] wait(2), wait3(2), wait4(2), waitid(2), and waitpid(2)."

  signal(7), same section:
    "The following interfaces are never restarted after being
     interrupted by a signal handler, regardless of the use of
     SA_RESTART; they always fail with the error EINTR when
     interrupted by a signal handler: [...] Sleep interfaces:
     clock_nanosleep(2), nanosleep(2), and usleep(3)."

Ticket: ENT-13720
Signed-off-by: Lars Erik Wik <lars.erik.wik@northern.tech>
(cherry picked from commit ec2627e)