rootlessutil: remove dead -r/ from nsenter args #4837

Open
MayCXC wants to merge 1 commit into containerd:main from MayCXC:fix/rootless-nsenter-argv0
@MayCXC MayCXC commented Apr 9, 2026

Summary

The -r/ flag in ParentMain's nsenter args has been a no-op since it was added. It was placed at args[0], which becomes argv[0] (the program name) when passed to syscall.Exec(arg0, args, env). nsenter consumes it as its own name and never parses it as a flag.
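
To illustrate the argv[0] behaviour described above, here is a minimal sketch. `flagsSeen` is a hypothetical helper that mimics how a getopt-style program skips argv[0]; it is not nsenter's actual parser. The argument values are taken from the strace output in this PR.

```go
package main

import "fmt"

// flagsSeen mimics a getopt-style program: argv[0] is treated as the
// program name and only argv[1:] is parsed for flags.
// Hypothetical helper for illustration; not nsenter's real parser.
func flagsSeen(argv []string) []string {
	if len(argv) == 0 {
		return nil
	}
	return argv[1:]
}

func main() {
	arg0 := "/usr/bin/nsenter" // binary path as shown in the strace output

	// Before: the slice handed to syscall.Exec(arg0, args, env) starts with "-r/",
	// so "-r/" lands in argv[0] and is never parsed.
	before := []string{"-r/", "-w/home/user", "--preserve-credentials", "-m", "-U"}

	// After: the slice starts with the binary path, per convention.
	after := []string{arg0, "-w/home/user", "--preserve-credentials", "-m", "-U"}

	fmt.Println(flagsSeen(before))
	fmt.Println(flagsSeen(after))
}
```

Both calls print the same flag list, which is why removing -r/ does not change behaviour.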

This is harmless today, but if someone corrects the argv ordering (e.g. prepending arg0 to the slice), -r/ would start working and break rootless container creation:

  1. nsenter opens the root fd (/) before setns
  2. After entering the mount namespace, fchdir(root_fd) + chroot(".") anchors the process to the host root
  3. In rootless mode, the host's /var/lib/containerd is owned by real root (uid 0), which is unmapped in the user namespace (appears as nobody/65534)
  4. Overlay mount lowerdir resolution fails with EACCES because the process cannot traverse the 0700 host directory
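
Step 3 can be sketched as an ID lookup. This is a simplified model of how a user namespace translates host IDs, assuming the kernel's default overflow ID of 65534. The mapping values (host uid 1000, subuid range starting at 100000) are hypothetical examples, and `idMap` / `toNamespaceID` are illustrative names, not part of nerdctl.

```go
package main

import "fmt"

// overflowID is the default kernel overflow uid/gid shown for unmapped IDs.
const overflowID = 65534

// idMap models one line of /proc/<pid>/uid_map: inside-ID, outside-ID, length.
type idMap struct {
	inside, outside, length uint32
}

// toNamespaceID returns how a host ID appears inside the user namespace,
// or overflowID when no mapping covers it.
func toNamespaceID(hostID uint32, maps []idMap) uint32 {
	for _, m := range maps {
		if hostID >= m.outside && hostID-m.outside < m.length {
			return m.inside + (hostID - m.outside)
		}
	}
	return overflowID
}

func main() {
	// Hypothetical rootless mapping: the unprivileged user becomes root
	// inside the namespace; a subuid range covers IDs 1..65536.
	maps := []idMap{
		{inside: 0, outside: 1000, length: 1},
		{inside: 1, outside: 100000, length: 65536},
	}

	fmt.Println(toNamespaceID(1000, maps)) // unprivileged host user maps to 0
	fmt.Println(toNamespaceID(0, maps))    // real root is unmapped: 65534
}
```

Under these assumptions, real root (uid 0) has no mapping, so a 0700 directory it owns grants the process only "other" permissions, which is none.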

This PR also sets argv[0] to arg0 (the nsenter binary path), matching the standard convention.

Verification

Strace comparison before and after, running nerdctl create in rootless mode:

Before (current code, -r/ accidentally in argv[0]):

execve("/usr/bin/nsenter", ["-r/", "-w/home/user", "--preserve-credentials", "-m", "-U", ...], ...)
setns(CLONE_NEWUSER) = 0
setns(CLONE_NEWNS) = 0
fchdir(3) = 0          # -r/ consumed as argv[0], no chroot
execve("nerdctl", ...)
mount("overlay", ...) = 0   # works because no chroot happened

If -r/ were at argv[1] (the latent bug):

setns(CLONE_NEWUSER) = 0
setns(CLONE_NEWNS) = 0
fchdir(3) = 0
chroot(".") = 0       # anchors to host root
fchdir(4) = 0
execve("nerdctl", ...)
mount("overlay", ...) = -1 EACCES   # host /var/lib/containerd inaccessible

After (this PR, -r/ removed, arg0 as argv[0]):

execve("/usr/bin/nsenter", ["/usr/bin/nsenter", "-w/home/user", "--preserve-credentials", "-m", "-U", ...], ...)
setns(CLONE_NEWUSER) = 0
setns(CLONE_NEWNS) = 0
execve("nerdctl", ...)
mount("overlay", ...) = 0   # paths resolve through mount namespace

The -r/ flag was placed at args[0], which becomes argv[0] (the
program name) when passed to syscall.Exec. nsenter never parsed it
as a flag, so it has been a no-op since it was added.

If -r/ were moved to a proper argv position, it would break rootless
container creation. nsenter opens the root fd before setns, so
chroot anchors path resolution to the host root. In rootless mode,
the host /var/lib/containerd is owned by real root (unmapped in the
user namespace), causing overlay lowerdir resolution to fail with
EACCES during WithAdditionalGIDs.

Remove -r/ entirely rather than fixing its position.

Signed-off-by: Aaron <aaron@omniband.ca>

@utafrali utafrali left a comment

The fix is correct: -r/ was sitting at argv[0] and was consumed by nsenter as its own program name, never parsed as a flag, so its removal is a no-op today and a safety improvement if argv ordering is ever corrected. Setting args[0] = arg0 now properly follows Unix convention. The only minor gap is the loss of the busybox nsenter compatibility comment without any replacement explanation.

args := []string{
"-r/", // root dir (busybox nsenter wants this to be explicitly specified),
}
args := []string{arg0}

The removed comment mentioned busybox nsenter compatibility with -r/. Now that the flag is intentionally absent, a short note here explaining why it is omitted would help future readers avoid re-adding it. For example:

// Note: -r/ (root dir) is intentionally omitted. nsenter opens the root fd
// before setns, so in rootless mode the subsequent chroot would anchor the
// process to host paths that are inaccessible inside the user namespace,
// breaking overlay mounts.
args := []string{arg0}

@AkihiroSuda AkihiroSuda added this to the v2.3.0 milestone Apr 9, 2026