Forward termination signals to child process#332
Merged
Conversation
Commit d2a3a16, "On Linux, relaunch the current process when running as PID 1 (buildbarn#311)", relaunches the command in a child process and then waits for it to exit, if the parent process has PID 1. If the command is ran in a container, any termination signal sent to the container is forwarded to the parent process. This unfortunately prevents the persistent state files from being saved to disk if the container is killed, as the child process never receives the shutdown signal. This commit forwards `SIGINT` and `SIGTERM` signals to the child process, which then saves the state files to disk, exits, and triggers a graceful exit in PID 1. Co-authored-by: Fredrik Medley <fredrik@meroton.com>
|
Contributor
Author
|
EdSchouten
approved these changes
Mar 24, 2026
Member
\o/ |
Contributor
Author
|
@EdSchouten Great! Is there anything else or can we go ahead and merge? |
Member
|
Oh sorry. Was waiting for CI to complete, and forgot to merge. |
aron-muon
added a commit
to aron-muon/bb-storage
that referenced
this pull request
May 13, 2026
The previous fix bumped time.Sleep(5) -> time.Sleep(time.Second) and
os.Exit(1) -> os.Exit(0) on the assumption that the fallback was
firing on clean shutdown. A bb_worker termination test on staging
showed the program still exits with code 2, never reaching the
fallback at all.
Root cause traced via strace + Go runtime source:
terminateWithSignal() does signal.Reset(sig) followed by
process.Signal(sig) to make the program "die from the original
signal" so init systems see a signal-style exit. But signal.Reset()
only disables Go's user-channel routing for that signal — it does
NOT install SIG_DFL. The runtime's signal trampoline is still in
place and intercepts the raised signal, dispatching it to
runtime.dieFromSignal() (runtime/signal_unix.go).
dieFromSignal() then does:
raise(sig); osyield x3; setsig(sig, _SIG_DFL); raise(sig); osyield x3
exit(2)
PID 1 in a PID namespace and multi-goroutine Go programs reliably
hit the exit(2) fall-through. Our time.Sleep(time.Second) +
os.Exit(0) never executed because the runtime exited first. The
behaviour predates buildbarn#332 (signal forwarding to child); buildbarn#332 just made
it observable for bb_worker because graceful shutdown is now reachable.
Fix: skip the signal-raise dance entirely. We initiated this
shutdown intentionally — either an explicit signal from the parent
or all routines completed cleanly — so just exit 0. Container exit 0
is "Succeeded" in k8s, vs the spurious "Failed: Error" the runtime
exit(2) produced.
Tradeoff: callers no longer observe a signal-style exit (e.g. 143
for SIGTERM). For PID 1 / single-process supervisors that's a real
loss; for k8s/systemd-restartPolicy=Always it's a no-op since
exit-code-meaning is identical. Better to lose the cosmetic signal
exit code than to ship a fundamentally broken shutdown path.
Validated on staging:
ghcr.io/aron-muon/bb-worker:20260513T141137Z-1d0616a (with sleep+exit(0))
-> exit 2 across 4 termination tests
ghcr.io/aron-muon/bb-worker:<next> (this commit)
-> expected: exit 0
References:
golang/go#19326
golang/go#46321
Go runtime: src/runtime/signal_unix.go dieFromSignal()
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Commit d2a3a16,
"On Linux, relaunch the current process when running as PID 1 (#311)", relaunches the command in a child process and then waits for it to exit, if the parent process has PID 1. If the command is ran in a container, any termination signal sent to the container is forwarded to the parent process. This unfortunately prevents the persistent state files from being saved to disk if the container is killed, as the child process never receives the shutdown signal.
This commit forwards
SIGINTandSIGTERMsignals to the child process, which then saves the state files to disk, exits, and triggers a graceful exit in PID 1.