Move poststart hook from runc create to runc start#5186
Move poststart hook from runc create to runc start#5186rata merged 3 commits intoopencontainers:mainfrom
Conversation
0aee5a8 to
2d88d2e
Compare
|
We will have to revert this if it breaks anyone. But I don't think hooks are so used (they were mostly unexported to upper layers), so hopefully we can do this and no one notices? |
|
I thought we decided to just keep the broken poststart hook as is and push people to use the new createRuntime and friends? EDIT: I think the issue is that opencontainers/runtime-spec#1169 missed a few changes when changing the explanation of |
|
I suspect the issue is we just need to update the lifecycle doc in the spec, since the description of
|
|
@cyphar this one is about poststart only. Only Currently runc implements I feel we need to fix the discrepancy in one way or another. |
|
@kolyshkin I built runc from your branch poststart (git commit 2d88d2e). |
|
can you add Co-authored add me @kolyshkin |
|
/cc @elezar : Evan, would it break any of the usages in nvidia container toolkit? |
|
The NVIDIA tooling uses only createContainer hooks in the CDI-based approach and the precreate hook in the legacy stack. Changing poststart behaviour should not affect this. |
halaney
left a comment
There was a problem hiding this comment.
We use a good bit of OCI hooks, but don't use this one (anymore, and digging thru the history it seems that's actually due to this bug I think).
This seems to make sense to me and matches the spec as I read it.
I haven't actually used any of your code here (although I've reviewed it). In any case, thank you for your contribution! I have acknowledged it by adding the |
rata
left a comment
There was a problem hiding this comment.
I'm a little scared of this change, does this enables something that was not possible before?
But if the consensus is to do the change, as most big users seem are fine with this, I'm fine. We can always revert. LGTM
|
cc @robmry would this address moby/moby#51077 ? |
|
The fedora failure seemed unrelated, re-running |
I think so, yes. |
lifubang
left a comment
There was a problem hiding this comment.
I think we should address the race condition for runc run here:
#5186 (comment)
rata
left a comment
There was a problem hiding this comment.
Found the race, posted an explanation and fixes. @lifubang @kolyshkin PTAL
| defer func() { | ||
| if retErr != nil { | ||
| // A poststart hook failed; kill the container. | ||
| if err := c.signal(unix.SIGKILL); err != nil && !errors.Is(err, ErrNotRunning) { |
There was a problem hiding this comment.
| if err := c.signal(unix.SIGKILL); err != nil && !errors.Is(err, ErrNotRunning) { | |
| if err := ignoreTerminateErrors(parent.terminate()); err != nil { |
This line is causing the race that @lifubang detected. With the suggestion here, I can't trigger it anymore. It is what the old code was doing (I didn't realize at first that this was changed! 😱 ). The key is that parent.terminate() does a kill AND a wait.
Doing this change alone is enough to fix the race. But there is another issue, that I think we should also fix.
The call to currentOCIState() has the side effect of changing the state to running. So the following happens:
func (r *runner) run(config *specs.Process) from util_linux.go
--> r.container.Run(process)
--> func (c *Container) Run(process *Process) error
---> return c.exec() // This returns the post-start error with PR
But to get there, we called currentOCIState() that ends up calling refreshState(). Because we already deleted the fifo (we call handle fifo just before the post-start hook, that calls the handleFifoResult that deletes the fifo), that can end up setting the state to running.
That is a complicated (or detailed, you choose :D) way to say that calling currentOCIState() ends up setting the state to running. And that causes issues with Destroy().
The thing is, the outer most run function aforementioned (the one that takes configs as param), has the r.destroy() defer in case of errors (that we do hit an error here). And destroy when we are in the running state returns an error if we still have an init.
Sooo, I think we should:
a) Keep the code as it was using p.terminate(), so we kill and wait for the process
b) Don't call currentOCIState() that causes the state machine to be inconsistent, probably just set the status to running manually (having a quick look, this seems to be what crun also does for these hooks).
So something like this:
diff --git libcontainer/container_linux.go libcontainer/container_linux.go
index adddebbf..2f7186e2 100644
--- libcontainer/container_linux.go
+++ libcontainer/container_linux.go
@@ -268,15 +268,24 @@ func (c *Container) postStart() (retErr error) {
defer func() {
if retErr != nil {
// A poststart hook failed; kill the container.
- if err := c.signal(unix.SIGKILL); err != nil && !errors.Is(err, ErrNotRunning) {
+ if err := ignoreTerminateErrors(c.initProcess.terminate()); err != nil {
logrus.WithError(err).Warn("kill after failed poststart")
}
}
}()
- s, err := c.currentOCIState()
- if err != nil {
- return err
+ // Build the OCI state directly rather than calling currentOCIState(),
+ // which would trigger refreshState() and transition c.state to
+ // "running". That transition causes Destroy() to fail if the init
+ // process is still alive when the poststart hook fails.
+ bundle, annotations := utils.Annotations(c.config.Labels)
+ s := &specs.State{
+ Version: specs.Version,
+ ID: c.ID(),
+ Bundle: bundle,
+ Annotations: annotations,
+ Status: specs.StateRunning,
+ Pid: c.initProcess.pid(),
}
return c.config.Hooks.Run(configs.Poststart, s)
}
There was a problem hiding this comment.
Well, the c.initProcess.terminate() (killing init) won't work when a container don't have own pidns (the case described and handled in (*container).signal.
OTOH I see that missing wait is the issue in the current implementation because we're still init's parent / child reaper. Wonder if adding c.initProcess.wait() will fix both issues (one with the state as well, because after wait the init is gone) without introducing some other bug.
There was a problem hiding this comment.
The test results seem stable now.
Rename c.signal to c.signalInit, and add c.signal which is a lock-less form of c.Signal. To be used by the next patch. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
No functional change. To be used by the next patch. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The runtime-spec [1] currently says: > 6. Runtime's start command is invoked with the unique identifier of > the container. > 7. The startContainer hooks MUST be invoked by the runtime. If any > startContainer hook fails, the runtime MUST generate an error, stop > the container, and continue the lifecycle at step 12. > 8. The runtime MUST run the user-specified program, as specified by > process. > 9. The poststart hooks MUST be invoked by the runtime. If any > poststart hook fails, the runtime MUST generate an error, stop the > container, and continue the lifecycle at step 12. > ... > 11. Runtime's delete command is invoked with the unique identifier of > the container. > 12. The container MUST be destroyed by undoing the steps performed > during create phase (step 2). > 13. The poststop hooks MUST be invoked by the runtime. If any poststop > hook fails, the runtime MUST log a warning, but the remaining hooks > and lifecycle continue as if the hook had succeeded. Currently, we do 9 before 8 (heck, even before 6), which is clearly against the spec and results in issues like the one described in [2]. Let's move running poststart hook to after the user-specified process has started. NOTE this patch only fixes the order and does not implement removing the container when the poststart hook failed (as this part of the spec is controversial -- destroy et al and should probably be, and currently are, part of "runc delete"). [1]: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#lifecycle [2]: opencontainers#5182 Reported-by: ningmingxiao <ning.mingxiao@zte.com.cn> Reported-by: Erik Sjölund <erik.sjolund@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
lifubang
left a comment
There was a problem hiding this comment.
LGTM
I think aligning with the OCI runtime spec is the right choice.
|
@rata can you take another look? |
rata
left a comment
There was a problem hiding this comment.
Sorry, I was sick the last days. Left one comment :)
The runtime-spec 1 currently says:
Currently, we do 9 before 8 (heck, even before 6), which is clearly
against the spec and results in issues like the one described in 2.
Let's move running poststart hook to after the user-specified process
has started.
NOTE this patch only fixes the order and does not implement removing
the container when the poststart hook failed (as this part of the spec
is controversial -- destroy et al and should probably be, and currently
are, part of "runc delete").
Fixes: #5182
Fixes: #4347
Closes: #4348