User namespace#3
…n, User
Initial work by vikaschoudhary16 <vichoudh@redhat.com> on PR 64005. Rebased on Kubernetes 1.17 by Alban Crequy <alban@kinvolk.io>.
Conflict resolution: pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto was moved to staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto
Initial work by vikaschoudhary16 <vichoudh@redhat.com> on PR 64005 from two commits:
- Add Node-Level UserNamespace support
- Add HostUserNamespace in the pod spec

Using annotations instead of changing the PodSpec: alpha.kinvolk.io/userns = enabled | disabled
So, no feature gate is necessary.
Updated with the command: hack/update-generated-runtime.sh
When the CRI implementation does not support the new gRPC method GetRuntimeConfigInfo, consider that user namespaces are not enabled. Just print a log message but don't treat it as a fatal error. Otherwise, the Kubelet would crash at startup with the following:
> Kubelet failed to change kubelet pod dir ownership to remapped user:
> error while determining usernamespace configuration at runtime: failed
> to get container runtime info: container runtime info get failed: rpc
> error: code = Unimplemented desc = unknown method GetRuntimeConfigInfo
> for service runtime.v1alpha2.RuntimeService
Examples of kubectl commands with the associated logs in containerd/cri:
```
kubectl apply -f userns-tests/pod-simple.yaml
Namespace options for sandbox "4a03d62fa399fa6134f020d325716c659e9b55c539ca6a9c76f747e6d08d2258": &NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,User:NODE,}
kubectl apply -f userns-tests/pod-userns.yaml
Namespace options for sandbox "a723293f5498dcfa30e8e6d619131172502923ff5dd3cf9b745f84d29d9dc199": &NamespaceOption{Network:POD,Pid:CONTAINER,Ipc:POD,User:NODE_WIDE_REMAPPED,}
```
mauriciovasquezbernal left a comment:
General comments:
- I think uint32 should be used in all places that handle UIDs/GIDs. We could return 4294967295 ("no user ID") when there are errors. Mixing int and uint32 could cause overflow problems.
- If the annotation is "enabled" and the feature is not available in the runtime, it's safer to fail than to ignore the problem.
I still have to do a more detailed run...
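As a rough illustration of the first point, IDs could be kept as uint32 end to end, with 4294967295 returned on error. The sketch below is only that: `noID` and `parseID` are hypothetical names, not part of this PR.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// noID is the "no user ID" value, 4294967295, returned alongside errors.
const noID uint32 = math.MaxUint32

// parseID parses a decimal UID/GID into a uint32. Values that do not fit in
// 32 bits, negative values and the reserved noID value are rejected.
func parseID(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return noID, fmt.Errorf("invalid ID %q: %w", s, err)
	}
	if uint32(v) == noID {
		return noID, fmt.Errorf("ID %q is reserved", s)
	}
	return uint32(v), nil
}

func main() {
	fmt.Println(parseID("100000"))
	fmt.Println(parseID("4294967296")) // does not fit in 32 bits
}
```

Parsing with a bit size of 32 means an out-of-range value is reported as an error rather than silently wrapping, which is the overflow concern raised above.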
    uidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)}
    gidMapping := &runtimeapi.LinuxIDMapping{ContainerId: uint32(0)}
Is it really needed to do uint32(0)?
I guess not. Probably just to be explicit.
    userns, _ := pod.Annotations["alpha.kinvolk.io/userns"]
    klog.V(4).Infof("pod userns setting: %v", userns)
    if userns == "enabled" {
        if err := kl.chownDirForRemappedIDs(kl.getPodVolumesDir(pod.UID)); err != nil {
            kl.recorder.Eventf(pod, v1.EventTypeWarning, events.FailedMountVolume, "Unable to set ownership on mount volumes for pod %q: %v", format.Pod(pod), err)
            klog.Errorf("Unable to chown volumes for pod %q: %v; skipping pod", format.Pod(pod), err)
            return err
        }
    }
I think this operation has to be done before L1754, otherwise the container could be started before the right permissions are set on the volumes.
That sounds right. On the other hand, are the directories (and subdirectories) created before SyncPod()?
Could you comment on the upstream PR 64005? Even though this code differs by using annotations, the structure is the same in the upstream PR.
    containerUID := 0
    containerGID := 0
    uid, err := kl.getHostUID(containerUID)
    if err != nil {
        return fmt.Errorf("Failed to get remapped host UID corresponding to UID 0 in container namespace: %v", err)
    }
    gid, err := kl.getHostGID(containerGID)
    if err != nil {
        return fmt.Errorf("Failed to get remapped host GID corresponding to GID 0 in container namespace: %v", err)
    }
I don't understand this logic. Why calculate it each time if the container ID is always 0 and the result should be the same?
containerUID is zero (root in the userns), but uid is the corresponding ID on the host (e.g. 100000). So we correctly do an os.Chown(file, 100000, 100000) below.
What I meant is that the calculation is inside the loop but IIUC it's the same for all the files, so we could do it once.
    if err := kl.chownDirForRemappedIDs(kl.getPodsDir()); err != nil {
        klog.Fatalf("Kubelet failed to change kubelet pod dir ownership to remapped user: %v", err)
    }
Is this really needed? I did some tests with these lines commented out and it seems to work fine...
I thought you had a problem with:
Bad owner of /var/lib/kubelet/pods
I think this code aims to fix that, but with a bug. See comment.
    }
    if fileGID != 0 {
        klog.V(5).Infof("GID, %v, for path %v is not equal to 0. Skipping chowing assuming it to be FsGroup GID ", fileGID, file)
        continue
I think this continue is not what we want: If /var/lib/kubelet/pods is owned by 200000 (because of your previous test with docker and /etc/subuid configured with 200000), then this code skips the chown when you give containerd/cri a try with the 100000 configuration, even though the owner is not correct.
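One possible alternative, sketched under assumptions: compare the current owner with the host IDs that container root maps to now, instead of comparing with 0. `needsChown` is a hypothetical helper, and it does not try to recognise an FsGroup GID, which the original check was attempting to preserve.

```go
package main

import "fmt"

// needsChown reports whether a path owned by (fileUID, fileGID) should be
// re-owned, given the host IDs that container root currently maps to.
func needsChown(fileUID, fileGID, hostUID, hostGID uint32) bool {
	return fileUID != hostUID || fileGID != hostGID
}

func main() {
	// Directory left over from a run with a 200000 mapping, while the
	// current runtime is configured with 100000.
	fmt.Println(needsChown(200000, 200000, 100000, 100000))
	// Already owned by the current mapping: nothing to do.
	fmt.Println(needsChown(100000, 100000, 100000, 100000))
}
```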
    }
    lc.SecurityContext.NamespaceOptions = namespacesForPod(pod)

    if m.runtimeConfig != nil && m.runtimeConfig.IsUserNamespaceSupported() && !userNamespaceDefinedForPod(pod) {
I think we should raise an error:
https://github.com/vikaschoudhary16/community/blob/1cfe64b48107575c98cef4cd321e435482c60293/contributors/design-proposals/node/node-usernamespace-remapping.md#pod-admission-at-kubelet
> A pod admit handler will be introduced which will fail the pod admission at Kubelet [...]
> If pod spec has hostuserNamespace: false AND usernamespace remapping is NOT enabled at container runtime
But I don't know if it is the right place for this check. Shouldn't it be done before?
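Wherever the check ends up, its core could be as small as the sketch below. It assumes the annotation from this PR (alpha.kinvolk.io/userns) is what requests user namespaces. `admitPod` and `errUsernsUnsupported` are hypothetical names, not the admit handler from the design proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// usernsAnnotation is the annotation used in this PR to request user namespaces.
const usernsAnnotation = "alpha.kinvolk.io/userns"

// errUsernsUnsupported is a hypothetical error for rejected pods.
var errUsernsUnsupported = errors.New("pod requests user namespaces but the container runtime does not support them")

// admitPod rejects a pod that asks for user namespaces when the runtime
// cannot provide them, rather than silently running it without remapping.
func admitPod(annotations map[string]string, runtimeSupportsUserns bool) error {
	if annotations[usernsAnnotation] == "enabled" && !runtimeSupportsUserns {
		return errUsernsUnsupported
	}
	return nil
}

func main() {
	pod := map[string]string{usernsAnnotation: "enabled"}
	fmt.Println(admitPod(pod, false))
	fmt.Println(admitPod(pod, true))
}
```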
containerd/cri now implements user namespaces with either runtimeapi.NamespaceMode_POD or runtimeapi.NamespaceMode_NODE.
It can be used together with kinvolk/containerd-cri#1