feat(kubernetes): add sidecar supervisor topology #2076

Open
TaylorMutch wants to merge 6 commits into main from feat/kubernetes-sidecar-topology-v2

TaylorMutch (Collaborator) commented Jun 30, 2026

Summary

Adds the Kubernetes sidecar supervisor topology after the combined-topology base from #2074.
The default combined topology remains unchanged, and sidecar is opt-in.

sidecar moves pod-level network enforcement and gateway forwarding into a dedicated supervisor sidecar. The agent container can run as the resolved sandbox UID/GID with runAsNonRoot, no privilege escalation, and all Linux capabilities dropped.
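As a rough illustration of the agent container posture described above, the following sketch models the relevant Kubernetes `securityContext` fields as a plain Rust struct. The type `AgentSecurityContext` and the function `agent_security_context` are hypothetical names for this example and are not taken from the PR; the real pod spec builder may be structured quite differently.

```rust
/// Hypothetical, simplified mirror of the Kubernetes container
/// securityContext fields that matter for the agent container.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentSecurityContext {
    pub run_as_user: u32,                 // runAsUser
    pub run_as_group: u32,                // runAsGroup
    pub run_as_non_root: bool,            // runAsNonRoot
    pub allow_privilege_escalation: bool, // allowPrivilegeEscalation
    pub dropped_capabilities: Vec<String>, // capabilities.drop
}

/// Build the low-permission agent context for a resolved sandbox UID/GID:
/// non-root, no privilege escalation, every Linux capability dropped.
pub fn agent_security_context(uid: u32, gid: u32) -> AgentSecurityContext {
    AgentSecurityContext {
        run_as_user: uid,
        run_as_group: gid,
        run_as_non_root: true,
        allow_privilege_escalation: false,
        dropped_capabilities: vec!["ALL".to_string()],
    }
}
```

The point of the sketch is only that the agent needs nothing beyond the resolved UID/GID as input; the privileged network setup lives in other containers of the pod.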

This draft intentionally includes the prerequisite UID/GID commits from #1973 for now. We expect to rebase those out after that work lands separately.

Runtime validation status:

  • sidecar is experimental with Kata Containers.
  • sidecar is known to fail with gVisor because it depends on pod-local network rule setup.

Sidecar mode preserves gateway session and SSH behavior, but intentionally runs the process supervisor in network-only mode. Filesystem policy, process privilege dropping, and process/binary identity checks are not applied in this mode.

Related Issue

References #1827, #981, #899, #1305.

Related PRs: #1973, #2074, #2016.

Changes

  • Accept numeric sandbox process identities and propagate configurable sandbox UID/GID values.
  • Resolve Kubernetes sandbox UID/GID from explicit config or OpenShift SCC annotations.
  • Add the sidecar supervisor topology and the related processEnforcement and proxyUid configuration.
  • Render sidecar-mode sandbox pods with a network init container, non-root supervisor sidecar, and unprivileged agent container.
  • Add process-supervisor network-only behavior for sidecar mode while keeping SSH/session relay behavior intact.
  • Add sidecar e2e Helm values and Skaffold profile support.
  • Document topology choice, architecture diagrams, permission model, RuntimeClass validation status, and network-only tradeoffs.

Testing

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

sjenning and others added 5 commits June 30, 2026 15:09
Allow run_as_user and run_as_group to be either the literal 'sandbox'
or a numeric UID/GID within [1000, 2_000_000_000]. This removes the
hard dependency on a baked-in 'sandbox' user in container images,
enabling compute drivers to inject resolved UIDs at sandbox creation.

Phase 1 of #1959.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
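The validation rule in this commit message could look roughly like the sketch below. The constant names `MIN_SANDBOX_UID` and `MAX_SANDBOX_UID` and the range bounds come from the commit messages; the `SandboxIdentity` enum, the function name `parse_sandbox_identity`, and the error strings are assumptions made for illustration, not the actual openshell-policy API.

```rust
pub const MIN_SANDBOX_UID: u32 = 1000;
pub const MAX_SANDBOX_UID: u32 = 2_000_000_000;

/// Hypothetical representation of a validated run_as_user / run_as_group value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SandboxIdentity {
    /// The literal name "sandbox".
    Named,
    /// A numeric UID or GID within the allowed range.
    Numeric(u32),
}

/// Accept either the literal "sandbox" or a decimal ID in
/// [MIN_SANDBOX_UID, MAX_SANDBOX_UID]; reject everything else.
pub fn parse_sandbox_identity(value: &str) -> Result<SandboxIdentity, String> {
    if value == "sandbox" {
        return Ok(SandboxIdentity::Named);
    }
    // Require plain ASCII digits so inputs such as "+1000" are rejected.
    if value.is_empty() || !value.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("'{}' is neither 'sandbox' nor a numeric ID", value));
    }
    let id: u32 = value
        .parse()
        .map_err(|_| format!("'{}' does not fit in a 32-bit ID", value))?;
    if (MIN_SANDBOX_UID..=MAX_SANDBOX_UID).contains(&id) {
        Ok(SandboxIdentity::Numeric(id))
    } else {
        Err(format!(
            "{} is outside [{}, {}]",
            id, MIN_SANDBOX_UID, MAX_SANDBOX_UID
        ))
    }
}
```

Under these assumptions, `parse_sandbox_identity("1000")` is accepted while `"999"`, `"0"`, and `"root"` are rejected, which keeps system and root IDs out of the sandbox identity.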
Allow run_as_user and run_as_group to be numeric UIDs/GIDs, removing
the hard dependency on a baked-in 'sandbox' user in container images.

Changes:
- validate_sandbox_user(): accepts numeric UIDs without passwd lookup
  (logs OCSF event); keeps passwd check for "sandbox" name; rejects
  non-numeric non-sandbox strings that fail passwd lookup
- prepare_filesystem(): passes numeric UIDs/GIDs directly to chown()
  instead of requiring a passwd entry
- drop_privileges(): resolves numeric UIDs/GIDs directly via Uid::from_raw
  / Gid::from_raw; skips initgroups when target uid matches current euid;
  uses guard conditions before setgid/setuid calls
- session_user_and_home(): falls back to ("{uid}", "/sandbox") for
  numeric UIDs, avoiding a passwd lookup that will fail

Re-exports MIN_SANDBOX_UID and MAX_SANDBOX_UID from openshell-policy
so callers have consistent range constants.

Phase 2 of #1959.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
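The `session_user_and_home()` fallback described in this commit message might be sketched as follows. The function name and the `("{uid}", "/sandbox")` fallback come from the commit message; the signature, the `lookup_passwd` closure standing in for a real passwd lookup, and the `Option` return type are assumptions for the sake of a self-contained example.

```rust
/// Sketch: resolve the session user name and home directory.
///
/// `lookup_passwd` is a hypothetical stand-in for a passwd database lookup
/// that maps a user name to (name, home directory).
pub fn session_user_and_home<F>(run_as_user: &str, lookup_passwd: F) -> Option<(String, String)>
where
    F: Fn(&str) -> Option<(String, String)>,
{
    match run_as_user.parse::<u32>() {
        // Numeric UID: there may be no passwd entry in the image, so skip
        // the lookup and fall back to ("{uid}", "/sandbox").
        Ok(uid) => Some((uid.to_string(), "/sandbox".to_string())),
        // Named user (for example "sandbox"): consult the passwd database.
        Err(_) => lookup_passwd(run_as_user),
    }
}
```

With this shape, a numeric identity never touches the passwd database, which matches the stated goal of avoiding a lookup that will fail in images without a baked-in user.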
…hift SCC annotations

Phase 3 of the numeric-UID plan: allow operators to specify explicit
sandbox_uid/sandbox_gid in Kubernetes driver config, auto-detect from
OpenShift SCC namespace annotations, and propagate resolved values to
supervisor container env vars and PVC init container securityContext.

Changes:
- Add sandbox_uid/sandbox_gid fields to KubernetesComputeConfig
- Add SANDBOX_UID/SANDBOX_GID env var constants to openshell-core
- Implement resolve_sandbox_identity() to fetch namespace annotations
  and auto-detect OpenShift SCC UID ranges (sa.scc.uid-range)
- Pass resolved UID/GID through SandboxPodParams to pod spec builder
- Inject SANDBOX_UID/SANDBOX_GID env vars into supervisor container
- Update PVC init container securityContext with resolved UID/GID
  instead of hard-coded root
- Add comprehensive unit tests for resolution logic and annotation
  parsing (resolve_sandbox_uid, resolve_sandbox_gid, OpenShift SCC
  annotation parsing)

Signed-off-by: Seth Jennings <sjenning@redhat.com>
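A minimal sketch of the resolution logic described in this commit message follows. It assumes the namespace annotation `openshift.io/sa.scc.uid-range` uses the `<start>/<size>` form (for example `1000680000/10000`) and that an explicit config value takes precedence over auto-detection; that precedence, the helper `parse_scc_range_start`, and the signature of `resolve_sandbox_uid` shown here are assumptions, not the PR's actual code.

```rust
use std::collections::BTreeMap;

/// OpenShift namespace annotation that carries the allocated UID block.
pub const SCC_UID_RANGE_ANNOTATION: &str = "openshift.io/sa.scc.uid-range";

/// Parse "<start>/<size>" and return the first UID of the block.
/// Returns None for malformed values or an empty block.
pub fn parse_scc_range_start(value: &str) -> Option<u32> {
    let (start, size) = value.trim().split_once('/')?;
    let start: u32 = start.parse().ok()?;
    let size: u32 = size.parse().ok()?;
    if size == 0 {
        return None;
    }
    Some(start)
}

/// Assumed precedence: explicit config first, then the SCC annotation.
/// None means the caller has to fall back to its own default.
pub fn resolve_sandbox_uid(
    explicit: Option<u32>,
    namespace_annotations: &BTreeMap<String, String>,
) -> Option<u32> {
    explicit.or_else(|| {
        namespace_annotations
            .get(SCC_UID_RANGE_ANNOTATION)
            .and_then(|v| parse_scc_range_start(v))
    })
}
```

Taking the first UID of the block is one plausible choice; any UID inside the allocated range would satisfy a range-restricted SCC.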
…mples

Phase 4 of the numeric-UID plan: replace hardcoded SANDBOX_UID (10001)
in VM rootfs preparation with configurable sandbox_uid/sandbox_gid fields.

Changes:
- Add sandbox_uid/sandbox_gid to VmDriverConfig with serde derives
- Pass resolved UID/GID through prepare_sandbox_rootfs_from_image_root
  to ensure_sandbox_guest_user which writes /etc/passwd/group/gshadow
- Update BYOC Dockerfile: remove groupadd/useradd, document runtime UID
  injection and the ability to skip baked-in sandbox user
- Update gateway-config.mdx: document sandbox_uid/sandbox_gid for both
  Kubernetes (with OpenShift SCC autodetection) and VM drivers
- Update sandbox-compute-drivers.mdx: add Sandbox User Identity section
  explaining numeric UID support across all compute drivers
- Update rootfs tests to use non-default UIDs, verify config passthrough

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Add the Kubernetes sidecar supervisor topology, including the supervisor sidecar, network init container, low-permission agent shape, proxy UID configuration, and process enforcement mode selection.

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
copy-pr-bot Bot commented Jun 30, 2026
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Taylor Mutch <taylormutch@gmail.com>
TaylorMutch added the test:e2e (Requires end-to-end coverage) label Jun 30, 2026
@github-actions

Label test:e2e applied, but pull-request/2076 is at {"messa while the PR head is 6276de3. A maintainer needs to comment /ok to test 6276de38ac92b65b0b24a68b8d09a5eed06f50d7 to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

TaylorMutch marked this pull request as ready for review June 30, 2026 22:46