Skip to content

Move poststart hook from runc create to runc start#5186

Merged
rata merged 3 commits intoopencontainers:mainfrom
kolyshkin:poststart
Apr 8, 2026
Merged

Move poststart hook from runc create to runc start#5186
rata merged 3 commits intoopencontainers:mainfrom
kolyshkin:poststart

Conversation

@kolyshkin
Copy link
Copy Markdown
Contributor

The runtime-spec 1 currently says:

  1. Runtime's start command is invoked with the unique identifier of
    the container.
  2. The startContainer hooks MUST be invoked by the runtime. If any
    startContainer hook fails, the runtime MUST generate an error, stop
    the container, and continue the lifecycle at step 12.
  3. The runtime MUST run the user-specified program, as specified by
    process.
  4. The poststart hooks MUST be invoked by the runtime. If any
    poststart hook fails, the runtime MUST generate an error, stop the
    container, and continue the lifecycle at step 12.
    ...
  5. Runtime's delete command is invoked with the unique identifier of
    the container.
  6. The container MUST be destroyed by undoing the steps performed
    during create phase (step 2).
  7. The poststop hooks MUST be invoked by the runtime. If any poststop
    hook fails, the runtime MUST log a warning, but the remaining hooks
    and lifecycle continue as if the hook had succeeded.

Currently, we do 9 before 8 (heck, even before 6), which is clearly
against the spec and results in issues like the one described in 2.

Let's move running poststart hook to after the user-specified process
has started.

NOTE this patch only fixes the order and does not implement removing
the container when the poststart hook failed (as this part of the spec
is controversial -- destroy et al and should probably be, and currently
are, part of "runc delete").

Fixes: #5182
Fixes: #4347
Closes: #4348

@kolyshkin
Copy link
Copy Markdown
Contributor Author

@ningmingxiao @eriksjolund PTAL

@kolyshkin kolyshkin requested review from cyphar, lifubang and rata March 19, 2026 23:08
@kolyshkin kolyshkin added this to the 1.6.0 milestone Mar 19, 2026
@kolyshkin kolyshkin force-pushed the poststart branch 2 times, most recently from 0aee5a8 to 2d88d2e Compare March 19, 2026 23:12
@rata
Copy link
Copy Markdown
Member

rata commented Mar 20, 2026

We will have to revert this if it breaks anyone. But I don't think hooks are so used (they were mostly unexported to upper layers), so hopefully we can do this and no one notices?

@cyphar
Copy link
Copy Markdown
Member

cyphar commented Mar 20, 2026

I thought we decided to just keep the broken poststart hook as is and push people to use the new createRuntime and friends?

EDIT: I think the issue is that opencontainers/runtime-spec#1169 missed a few changes when changing the explanation of prestart to match what runc does in practice.

@cyphar
Copy link
Copy Markdown
Member

cyphar commented Mar 20, 2026

I suspect the issue is we just need to update the lifecycle doc in the spec, since the description of prestart says:

The prestart hooks MUST be called as part of the create operation after the runtime environment has been created (according to the configuration in config.json) but before the pivot_root or any equivalent operation has been executed.

@kolyshkin
Copy link
Copy Markdown
Contributor Author

@cyphar this one is about poststart only. Only prestart is deprecated in the spec (in favor of createContainer and friends).

Currently runc implements poststart hook as part of runc create, while both crun and youki implement it as part of start (i.e. per spec).

I feel we need to fix the discrepancy in one way or another.

@eriksjolund
Copy link
Copy Markdown
Contributor

@ningmingxiao @eriksjolund PTAL

@kolyshkin I built runc from your branch poststart (git commit 2d88d2e).
Then I tried the reproducer in #5182 .
The file /tmp/poststop-created was created.
In other words, it seems this PR fixes #5182 .

@rata
Copy link
Copy Markdown
Member

rata commented Mar 23, 2026

@kad this is the PR we were talking yesterday, can you please take a look?

I'd like to the @kad input before we merge, hooks are very much used by GPUs.

@ningmingxiao
Copy link
Copy Markdown
Contributor

ningmingxiao commented Mar 25, 2026

can you add Co-authored add me @kolyshkin

@kad
Copy link
Copy Markdown

kad commented Mar 25, 2026

/cc @elezar : Evan, would it break any of the usages in nvidia container toolkit?

@elezar
Copy link
Copy Markdown

elezar commented Mar 25, 2026

The NVIDIA tooling uses only createContainer hooks in the CDI-based approach and the precreate hook in the legacy stack. Changing poststart behaviour should not affect this.

Copy link
Copy Markdown

@halaney halaney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use a good bit of OCI hooks, but don't use this one (anymore, and digging thru the history it seems that's actually due to this bug I think).

This seems to make sense to me and matches the spec as I read it.

@kolyshkin
Copy link
Copy Markdown
Contributor Author

can you add Co-authored add me @kolyshkin

I haven't actually used any of your code here (although I've reviewed it). In any case, thank you for your contribution! I have acknowledged it by adding the Reported-by: tag to the last commit, hope it's all right.

Copy link
Copy Markdown
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little scared of this change, does this enables something that was not possible before?

But if the consensus is to do the change, as most big users seem are fine with this, I'm fine. We can always revert. LGTM

@thaJeztah
Copy link
Copy Markdown
Member

cc @robmry would this address moby/moby#51077 ?

@rata
Copy link
Copy Markdown
Member

rata commented Mar 31, 2026

The fedora failure seemed unrelated, re-running

@robmry
Copy link
Copy Markdown

robmry commented Mar 31, 2026

cc @robmry would this address moby/moby#51077 ?

I think so, yes.

Copy link
Copy Markdown
Member

@lifubang lifubang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should address the race condition for runc run here:
#5186 (comment)

Copy link
Copy Markdown
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found the race, posted an explanation and fixes. @lifubang @kolyshkin PTAL

defer func() {
if retErr != nil {
// A poststart hook failed; kill the container.
if err := c.signal(unix.SIGKILL); err != nil && !errors.Is(err, ErrNotRunning) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err := c.signal(unix.SIGKILL); err != nil && !errors.Is(err, ErrNotRunning) {
if err := ignoreTerminateErrors(parent.terminate()); err != nil {

This line is causing the race that @lifubang detected. With the suggestion here, I can't trigger it anymore. It is what the old code was doing (I didn't realize at first that this was changed! 😱 ). The key is that parent.terminate() does a kill AND a wait.

Doing this change alone is enough to fix the race. But there is another issue, that I think we should also fix.

The call to currentOCIState() has the side effect of changing the state to running. So the following happens:

func (r *runner) run(config *specs.Process) from util_linux.go
 --> r.container.Run(process)
       --> func (c *Container) Run(process *Process) error
             ---> return c.exec()  // This returns the post-start error with PR         

But to get there, we called currentOCIState() that ends up calling refreshState(). Because we already deleted the fifo (we call handle fifo just before the post-start hook, that calls the handleFifoResult that deletes the fifo), that can end up setting the state to running.

That is a complicated (or detailed, you choose :D) way to say that calling currentOCIState() ends up setting the state to running. And that causes issues with Destroy().

The thing is, the outer most run function aforementioned (the one that takes configs as param), has the r.destroy() defer in case of errors (that we do hit an error here). And destroy when we are in the running state returns an error if we still have an init.

Sooo, I think we should:
a) Keep the code as it was using p.terminate(), so we kill and wait for the process
b) Don't call currentOCIState() that causes the state machine to be inconsistent, probably just set the status to running manually (having a quick look, this seems to be what crun also does for these hooks).

So something like this:

diff --git libcontainer/container_linux.go libcontainer/container_linux.go
index adddebbf..2f7186e2 100644
--- libcontainer/container_linux.go
+++ libcontainer/container_linux.go
@@ -268,15 +268,24 @@ func (c *Container) postStart() (retErr error) {
        defer func() {
                if retErr != nil {
                        // A poststart hook failed; kill the container.
-                       if err := c.signal(unix.SIGKILL); err != nil && !errors.Is(err, ErrNotRunning) {
+                       if err := ignoreTerminateErrors(c.initProcess.terminate()); err != nil {
                                logrus.WithError(err).Warn("kill after failed poststart")
                        }
                }
        }()
 
-       s, err := c.currentOCIState()
-       if err != nil {
-               return err
+       // Build the OCI state directly rather than calling currentOCIState(),
+       // which would trigger refreshState() and transition c.state to
+       // "running". That transition causes Destroy() to fail if the init
+       // process is still alive when the poststart hook fails.
+       bundle, annotations := utils.Annotations(c.config.Labels)
+       s := &specs.State{
+               Version:     specs.Version,
+               ID:          c.ID(),
+               Bundle:      bundle,
+               Annotations: annotations,
+               Status:      specs.StateRunning,
+               Pid:         c.initProcess.pid(),
        }
        return c.config.Hooks.Run(configs.Poststart, s)
 }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the c.initProcess.terminate() (killing init) won't work when a container don't have own pidns (the case described and handled in (*container).signal.

OTOH I see that missing wait is the issue in the current implementation because we're still init's parent / child reaper. Wonder if adding c.initProcess.wait() will fix both issues (one with the state as well, because after wait the init is gone) without introducing some other bug.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the updated commit; @rata @lifubang PTAL

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test results seem stable now.

Rename c.signal to c.signalInit, and add c.signal which is a lock-less
form of c.Signal.

To be used by the next patch.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
No functional change. To be used by the next patch.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The runtime-spec [1] currently says:

> 6. Runtime's start command is invoked with the unique identifier of
>    the container.
> 7. The startContainer hooks MUST be invoked by the runtime. If any
>    startContainer hook fails, the runtime MUST generate an error, stop
>    the container, and continue the lifecycle at step 12.
> 8. The runtime MUST run the user-specified program, as specified by
>    process.
> 9. The poststart hooks MUST be invoked by the runtime. If any
>    poststart hook fails, the runtime MUST generate an error, stop the
>    container, and continue the lifecycle at step 12.
> ...
> 11. Runtime's delete command is invoked with the unique identifier of
>     the container.
> 12. The container MUST be destroyed by undoing the steps performed
>     during create phase (step 2).
> 13. The poststop hooks MUST be invoked by the runtime. If any poststop
>     hook fails, the runtime MUST log a warning, but the remaining hooks
>     and lifecycle continue as if the hook had succeeded.

Currently, we do 9 before 8 (heck, even before 6), which is clearly
against the spec and results in issues like the one described in [2].

Let's move running poststart hook to after the user-specified process
has started.

NOTE this patch only fixes the order and does not implement removing
the container when the poststart hook failed (as this part of the spec
is controversial -- destroy et al and should probably be, and currently
are, part of "runc delete").

[1]: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#lifecycle
[2]: opencontainers#5182

Reported-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Reported-by: Erik Sjölund <erik.sjolund@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Copy link
Copy Markdown
Member

@lifubang lifubang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I think aligning with the OCI runtime spec is the right choice.

@kolyshkin
Copy link
Copy Markdown
Contributor Author

@rata can you take another look?

Copy link
Copy Markdown
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was sick the last days. Left one comment :)

Copy link
Copy Markdown
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@rata rata merged commit 4c8d72d into opencontainers:main Apr 8, 2026
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

poststop hook is not run if poststart hook fails runc's poststart behaviour doesn't match the runtime-spec