11 changes: 11 additions & 0 deletions .changeset/issue-106-passthrough-verify.md
@@ -0,0 +1,11 @@
---
bump: patch
---

dind-box: verify host-image passthrough actually seeded the nested daemon, and stop falsely reporting success when it did not (issue #106).

box-dind kept re-downloading multi-GB host images (~30 GB, ~1 hour) on first nested `docker run` even with `DIND_HOST_PASSTHROUGH_IMAGES` set, while the entrypoint still printed `image preload/passthrough complete` — so a misconfigured deployment (forgotten `-v /var/run/docker.sock:…:ro` mount, host missing that exact ref, or the `mode=public` filter dropping a locally-built/private image) looked healthy right up until the slow re-pull. This is the recurring symptom behind closed issues #94 and #102.

The entrypoint now verifies the copy after passthrough: for every **concrete** allowlist entry (explicit tag or `@sha256:` digest — bare repos and globs are skipped to avoid false alarms) it runs `docker image inspect` against the nested daemon. If an expected image is absent it emits a loud, actionable warning (whether the host socket is reachable but lacks the ref / was filtered by the mode, or no usable socket is mounted) and the completion line becomes `image preload/passthrough finished WITH WARNINGS` instead of the misleading `complete`. No silent no-op path can report success anymore. Re-pull still happens naturally — we do not auto-pull, which would mask the config error and incur the same multi-GB download.

Covered by new cases in `experiments/preload-unit-test.sh` and new `verify_ok`/`verify_miss` assertions in the CI-run `tests/dind/example-preload-images.sh`; deployment wiring and verification behavior documented in `docs/dind/USAGE.md` and `README.md`.
2 changes: 1 addition & 1 deletion README.md
@@ -218,7 +218,7 @@ Each row below has the same toolchain as its non-dind sibling **plus** a working
> - **Recommended secure invocation:** [`docker run --runtime=sysbox-runc konard/box-dind`](https://github.com/nestybox/sysbox) — Sysbox is a drop-in OCI runtime that runs system containers without `--privileged` and without exposing host devices.
> - **Do NOT bind-mount `/var/run/docker.sock`.** That gives the container root on the host ([Quarkslab](https://blog.quarkslab.com/why-is-exposing-the-docker-socket-a-really-bad-idea.html), [OWASP](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html)) and breaks the per-box `docker ps` scoping property.
> - **Storage:** the inner daemon writes to `/var/lib/docker` inside the container by default. For persistence, mount a volume: `-v box-dind-data:/var/lib/docker`.
> - **Reusing host images:** the nested daemon starts with an empty image store, so a fresh container re-downloads images the host already has. Seed it explicitly at startup with `DIND_PRELOAD_TARBALL` (mount `docker save` tarballs) or `DIND_PRELOAD_IMAGES` (pull from a registry/mirror); see [Reusing Host Images](docs/dind/USAGE.md#reusing-host-images-preload). For automatic seeding, mount the host socket at `-v /var/run/docker.sock:/var/run/host-docker.sock:ro` — host-image passthrough is on by default and copies the host's **public** images (those re-pullable from a public registry, so no local secrets or private credentials leak) into the inner daemon; `DIND_HOST_PASSTHROUGH=all` passes everything and `=off` disables it. To copy only specific images rather than every public one, set `DIND_HOST_PASSTHROUGH_IMAGES` to a space-separated allowlist of names/globs (e.g. `"konard/hive-mind konard/hive-mind-dind"`), composed with the mode filter. The host socket is mounted at a non-default path and read only at startup to seed images, so the inner daemon keeps its own isolated socket. See [Host-Image Passthrough](docs/dind/USAGE.md#host-image-passthrough-dind_host_passthrough).
> - **Reusing host images:** the nested daemon starts with an empty image store, so a fresh container re-downloads images the host already has. Seed it explicitly at startup with `DIND_PRELOAD_TARBALL` (mount `docker save` tarballs) or `DIND_PRELOAD_IMAGES` (pull from a registry/mirror); see [Reusing Host Images](docs/dind/USAGE.md#reusing-host-images-preload). For automatic seeding, mount the host socket at `-v /var/run/docker.sock:/var/run/host-docker.sock:ro` — host-image passthrough is on by default and copies the host's **public** images (those re-pullable from a public registry, so no local secrets or private credentials leak) into the inner daemon; `DIND_HOST_PASSTHROUGH=all` passes everything and `=off` disables it. To copy only specific images rather than every public one, set `DIND_HOST_PASSTHROUGH_IMAGES` to a space-separated allowlist of names/globs (e.g. `"konard/hive-mind konard/hive-mind-dind"`), composed with the mode filter. Pin a concrete tag/digest (e.g. `konard/hive-mind-dind:2.0.6`) and the entrypoint verifies the image actually landed in the nested daemon after passthrough — if it did not (forgotten socket mount, host missing that ref, or the mode filter dropped it) it warns loudly instead of falsely reporting "complete", so you are not surprised by a multi-GB re-pull on first run (issue #106). The host socket is mounted at a non-default path and read only at startup to seed images, so the inner daemon keeps its own isolated socket. See [Host-Image Passthrough](docs/dind/USAGE.md#host-image-passthrough-dind_host_passthrough).
> - **Usage examples:** see [`docs/dind/USAGE.md`](docs/dind/USAGE.md). Its examples are backed by executable tests under `tests/dind/`.

See [docs/case-studies/issue-80/CASE-STUDY.md](docs/case-studies/issue-80/CASE-STUDY.md) for the full design and threat model.
31 changes: 30 additions & 1 deletion docs/dind/USAGE.md
@@ -74,7 +74,7 @@ The entrypoint supports these environment variables:
| `DIND_HOST_PASSTHROUGH` | `public` | Copy images already present on the host into the nested daemon at startup when a host socket is mounted (see below). `public` only passes images with a RepoDigest from an allowlisted public registry; `all` passes every tagged image; `off` disables it. A quiet no-op when no host socket is mounted. |
| `DIND_HOST_DOCKER_SOCK` | `/var/run/host-docker.sock` | Path inside the container to the mounted *host* Docker socket used for passthrough. Deliberately **not** `/var/run/docker.sock`, so the inner daemon keeps its own isolated socket. |
| `DIND_HOST_PASSTHROUGH_REGISTRIES` | common public registries | Space-separated allowlist of registries treated as "public" in `DIND_HOST_PASSTHROUGH=public` mode (default: `docker.io ghcr.io quay.io gcr.io registry.k8s.io public.ecr.aws mcr.microsoft.com`). |
| `DIND_HOST_PASSTHROUGH_IMAGES` | _(empty)_ | Space-separated allowlist of image references / globs. When non-empty, only host images matching at least one entry are passed through, composed with the mode filter (so `public` still requires a public RepoDigest). Empty keeps the mode + registry filter only. One level finer than `DIND_HOST_PASSTHROUGH_REGISTRIES` — scope to specific repositories / image names. |
| `DIND_HOST_PASSTHROUGH_IMAGES` | _(empty)_ | Space-separated allowlist of image references / globs. When non-empty, only host images matching at least one entry are passed through, composed with the mode filter (so `public` still requires a public RepoDigest). Empty keeps the mode + registry filter only. One level finer than `DIND_HOST_PASSTHROUGH_REGISTRIES` — scope to specific repositories / image names. Each concrete entry (explicit tag/digest) is verified present in the nested daemon after passthrough; a missing one warns loudly instead of falsely reporting "complete" (issue #106). |

Use a named volume when the inner Docker state should survive container removal:

@@ -271,6 +271,35 @@ the nested daemon will otherwise re-pull from the registry on the first
`docker run` with no hint as to why (issue #102). Plain `box-dind` containers
that never set an allowlist still see no extra noise when no socket is mounted.

### Verifying the copy actually happened (issue #106)

A warning about a forgotten mount only covers one failure mode. Passthrough can
also quietly seed *nothing* for other reasons — the host does not have the image
under that exact reference, the socket is present but unreachable, or `public`
mode filtered out a locally-built image (no RepoDigest). In every case the
entrypoint used to print `image preload/passthrough complete` regardless, and
the first nested `docker run` then silently re-pulled the multi-GB image from the
registry (~30 GB, ~1 h downstream — `link-assistant/hive-mind#1914`/`#1946`).

So after passthrough runs, each **concrete** `DIND_HOST_PASSTHROUGH_IMAGES`
entry — one with an explicit tag or digest, no glob — is verified to actually be
present in the nested daemon (`docker image inspect <ref>`). When one is
missing, the entrypoint:

- emits a loud, actionable warning naming the un-seeded image(s) and the likely
cause (missing/unreachable socket, host lacks that exact ref, or the mode
filter dropped it — with the `DIND_HOST_PASSTHROUGH=all` remedy for
locally-built/private images), and
- ends the phase with `image preload/passthrough finished WITH WARNINGS`
instead of the misleading `...complete`, so logs never claim success when
nothing was copied.
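
The check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the entrypoint's actual code: the function names `missing_refs` and `finish_phase` and the `DOCKER_BIN` override are hypothetical, and the caller is assumed to pass only concrete references.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the post-passthrough check; not the real entrypoint.
# DOCKER_BIN is an assumed override so the logic can be exercised with a stub.

# Print every ref from "$@" that `docker image inspect` cannot find.
missing_refs() {
  local ref
  for ref in "$@"; do
    "${DOCKER_BIN:-docker}" image inspect "$ref" >/dev/null 2>&1 \
      || printf '%s\n' "$ref"
  done
}

# Emit the completion line; return non-zero when something was not seeded.
finish_phase() {
  local missing
  missing="$(missing_refs "$@" | tr '\n' ' ')"
  if [ -n "$missing" ]; then
    # Warnings go to stderr so the stdout log never claims success.
    echo "did NOT seed expected image(s) into the nested daemon: ${missing% }" >&2
    echo "image preload/passthrough finished WITH WARNINGS" >&2
    return 1
  fi
  echo "image preload/passthrough complete"
}
```

Nothing is pulled when a reference is missing; the sketch only reports, which matches the stated intent of not masking the configuration error behind an automatic download.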

Bare repositories (`konard/hive-mind`) and globs (`konard/hive-mind*`) are not
concrete — the host may hold them under any tag — so they are not individually
verified and never trigger a false alarm. To get this assertion for a specific
image, pin it in the allowlist with an explicit tag or digest, e.g.
`DIND_HOST_PASSTHROUGH_IMAGES=konard/hive-mind-dind:2.0.6`.
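
One way to draw the concrete / non-concrete line is sketched below. The function name `ref_is_concrete_sketch` is hypothetical and the logic is an assumption about how such a classifier could work, not a copy of the entrypoint's helper. It looks for a tag only after the last `/`, so a registry port such as `registry.example.com:5000` is not mistaken for a tag.

```bash
#!/usr/bin/env bash
# Hypothetical classifier; an illustration, not the entrypoint's implementation.
# Returns 0 when the reference names exactly one image (explicit tag or digest).
ref_is_concrete_sketch() {
  local ref="$1" last
  case "$ref" in
    *'*'*|*'?'*|*'['*) return 1 ;;  # glob characters: may match many images
    *@sha256:*)        return 0 ;;  # pinned by digest
  esac
  last="${ref##*/}"                 # drop registry/namespace, keeping name[:tag]
  case "$last" in
    *:*) return 0 ;;                # explicit tag after the final path segment
    *)   return 1 ;;                # bare repository: any tag could be on the host
  esac
}
```

For example, `ref_is_concrete_sketch "konard/hive-mind-dind:2.0.6"` succeeds, while `ref_is_concrete_sketch "konard/hive-mind"` fails and the entry would be skipped by verification.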

## Commit Cycles

`DIND_SKIP_DAEMON=1` is useful for setup containers where you want to install or
96 changes: 82 additions & 14 deletions experiments/preload-unit-test.sh
@@ -41,11 +41,24 @@ case "$1" in
[ "$host" = "1" ] && cat "$HOST_IMAGES" 2>/dev/null
exit 0 ;;
load)
cat >/dev/null 2>&1 || true # drain the piped tar stream like real `docker load`
echo "loaded" >> "$DOCKER_LOADED"; exit 0 ;;
echo "loaded" >> "$DOCKER_LOADED"
if [ "${2:-}" = "-i" ]; then
# Tarball load (`docker load -i file`): nothing is piped and the mock has
# no way to know which refs the tarball carried, so just record the load.
exit 0
fi
# Piped load (`docker -H .. save <ref> | docker load`): our mock `save`
# encodes the ref as a "REF:<ref>" line, so the loaded image becomes present
# in the inner daemon — mirroring real `docker load` so post-load
# verification (issue #106) sees what was actually seeded.
while IFS= read -r line; do
case "$line" in REF:*) printf '%s\n' "${line#REF:}" >> "$DOCKER_PRESENT" ;; esac
done
exit 0 ;;
save)
# `docker -H .. save <ref>` streams a tarball; mark it saved.
echo "$2" >> "$DOCKER_SAVED"; echo "fake-tar-stream"; exit 0 ;;
# `docker -H .. save <ref>` streams a tarball; mark it saved and encode the
# ref so the piped `docker load` can mark it present (see above).
echo "$2" >> "$DOCKER_SAVED"; printf 'REF:%s\n' "$2"; exit 0 ;;
pull)
echo "$2" >> "$DOCKER_PULLED"; echo "$2" >> "$DOCKER_PRESENT"; exit 0 ;;
*) exit 0 ;;
@@ -61,6 +74,11 @@ export DOCKER_PRESENT="$WORK/present.log"
export DOCKER_SAVED="$WORK/saved.log"
export HOST_IMAGES="$WORK/host-images.log"
export HOST_DIGESTS="$WORK/host-digests.log"
# Captured entrypoint stdout/stderr for the issue #106 verification cases.
# Exported so the `bash -c '! grep ...'` negative checks resolve the path inside
# their subshell (an unexported $WORK would silently miss the file).
export OUT_LOG="$WORK/out.log"
export ERR_LOG="$WORK/err.log"

# --- Source the real entrypoint for its functions only ---
# shellcheck disable=SC1090
@@ -138,17 +156,17 @@ check "no docker calls at all" bash -c '! test -s "$DOCKER_CALLS"'
echo "== Case 7: missing tarball path warns, no load =="
reset_state
DIND_HOST_PASSTHROUGH=off DIND_PRELOAD_TARBALL="$WORK/does-not-exist.tar" DIND_PRELOAD_IMAGES="" \
DOCKER_INFO_OK=1 preload_into_daemon 2>"$WORK/err.log"
DOCKER_INFO_OK=1 preload_into_daemon 2>"$ERR_LOG"
check "no load for missing path" bash -c '! grep -q "load -i" "$DOCKER_CALLS"'
check "warning emitted for missing path" grep -q "does not exist" "$WORK/err.log"
check "warning emitted for missing path" grep -q "does not exist" "$ERR_LOG"

echo "== Case 8: passthrough is a quiet no-op when no host socket is mounted =="
reset_state
DIND_HOST_PASSTHROUGH=public DIND_HOST_DOCKER_SOCK="$WORK/absent.sock" \
DIND_PRELOAD_TARBALL="" DIND_PRELOAD_IMAGES="" DOCKER_INFO_OK=1 \
preload_into_daemon 2>"$WORK/err.log"
preload_into_daemon 2>"$ERR_LOG"
check "no host save attempted without a socket" bash -c '! test -s "$DOCKER_SAVED"'
check "no warning emitted when socket simply absent" bash -c '! test -s "$WORK/err.log"'
check "no warning emitted when socket simply absent" bash -c '! test -s "$ERR_LOG"'

echo "== Case 8b: explicit allowlist + absent socket warns about the missing mount (issue #102) =="
reset_state
@@ -158,10 +176,10 @@ reset_state
DIND_HOST_PASSTHROUGH=public DIND_HOST_DOCKER_SOCK="$WORK/absent.sock" \
DIND_HOST_PASSTHROUGH_IMAGES="hello-world" \
DIND_PRELOAD_TARBALL="" DIND_PRELOAD_IMAGES="" DOCKER_INFO_OK=1 \
preload_into_daemon 2>"$WORK/err.log"
preload_into_daemon 2>"$ERR_LOG"
check "no host save attempted without a socket" bash -c '! test -s "$DOCKER_SAVED"'
check "warning names DIND_HOST_PASSTHROUGH_IMAGES" grep -q "DIND_HOST_PASSTHROUGH_IMAGES is set" "$WORK/err.log"
check "warning suggests the -v mount remediation" grep -q -- "-v /var/run/docker.sock:" "$WORK/err.log"
check "warning names DIND_HOST_PASSTHROUGH_IMAGES" grep -q "DIND_HOST_PASSTHROUGH_IMAGES is set" "$ERR_LOG"
check "warning suggests the -v mount remediation" grep -q -- "-v /var/run/docker.sock:" "$ERR_LOG"

echo "== Case 8c: present-but-unreachable socket still wins over the allowlist warning =="
reset_state
@@ -172,9 +190,9 @@ touch "$WORK/dead.sock"
DIND_HOST_PASSTHROUGH=public DIND_HOST_DOCKER_SOCK="$WORK/dead.sock" \
DIND_HOST_PASSTHROUGH_IMAGES="hello-world" \
DIND_PRELOAD_TARBALL="" DIND_PRELOAD_IMAGES="" DOCKER_INFO_OK=1 HOST_DOCKER_OK=0 \
preload_into_daemon 2>"$WORK/err.log"
check "unreachable-socket warning fires" grep -q "is not accessible; skipping passthrough" "$WORK/err.log"
check "missing-mount hint suppressed when a socket file exists" bash -c '! grep -q "DIND_HOST_PASSTHROUGH_IMAGES is set" "$WORK/err.log"'
preload_into_daemon 2>"$ERR_LOG"
check "unreachable-socket warning fires" grep -q "is not accessible; skipping passthrough" "$ERR_LOG"
check "missing-mount hint suppressed when a socket file exists" bash -c '! grep -q "DIND_HOST_PASSTHROUGH_IMAGES is set" "$ERR_LOG"'
rm -f "$WORK/dead.sock"

echo "== Case 9: public mode copies a Docker Hub image, skips a local one =="
@@ -294,6 +312,56 @@ check "empty allowlist still saves hive-mind" grep -qx "konard/hive-mind:latest"
check "empty allowlist still saves alpine" grep -qx "alpine:3.20" "$DOCKER_SAVED"
rm -f "$HOST_SOCK"

echo "== Case 19: concrete allowlisted image present after passthrough -> honest 'complete' (issue #106) =="
reset_state
# Host has the named image with a public RepoDigest; the socket is mounted, so
# passthrough copies it and the mock `load` marks it present in the inner daemon.
printf '%s\n' "konard/hive-mind-dind:2.0.6" > "$HOST_IMAGES"
echo "konard/hive-mind-dind:2.0.6|konard/hive-mind-dind@sha256:aaa " > "$HOST_DIGESTS"
make_sock "$HOST_SOCK"
DIND_HOST_PASSTHROUGH=public DIND_HOST_DOCKER_SOCK="$HOST_SOCK" \
DIND_HOST_PASSTHROUGH_IMAGES="konard/hive-mind-dind:2.0.6" \
DIND_PRELOAD_TARBALL="" DIND_PRELOAD_IMAGES="" DOCKER_INFO_OK=1 HOST_DOCKER_OK=1 \
preload_into_daemon >"$OUT_LOG" 2>"$ERR_LOG"
check "seeded concrete image was saved from host" grep -qx "konard/hive-mind-dind:2.0.6" "$DOCKER_SAVED"
check "honest 'complete' marker printed" grep -q "image preload/passthrough complete" "$OUT_LOG"
check "no verification warning when present" bash -c '! grep -q "did NOT seed" "$ERR_LOG"'
check "no 'WITH WARNINGS' marker when present" bash -c '! grep -q "finished WITH WARNINGS" "$OUT_LOG" "$ERR_LOG"'
rm -f "$HOST_SOCK"

echo "== Case 20: concrete allowlisted image absent -> loud warning, no false 'complete' (issue #106) =="
reset_state
# No host socket mounted, so nothing can be copied. The named concrete image is
# absent from the inner daemon: verification must catch it and suppress 'complete'.
DIND_HOST_PASSTHROUGH=public DIND_HOST_DOCKER_SOCK="$WORK/absent.sock" \
DIND_HOST_PASSTHROUGH_IMAGES="konard/hive-mind-dind:2.0.6" \
DIND_PRELOAD_TARBALL="" DIND_PRELOAD_IMAGES="" DOCKER_INFO_OK=1 \
preload_into_daemon >"$OUT_LOG" 2>"$ERR_LOG"
check "verification warns it did NOT seed the image" grep -q "did NOT seed expected image(s) into the nested daemon: konard/hive-mind-dind:2.0.6" "$ERR_LOG"
check "warning points at the missing -v mount" grep -q -- "-v /var/run/docker.sock:" "$ERR_LOG"
check "terminal marker is 'finished WITH WARNINGS'" grep -q "image preload/passthrough finished WITH WARNINGS" "$ERR_LOG"
check "misleading 'complete' is NOT printed" bash -c '! grep -q "image preload/passthrough complete" "$OUT_LOG" "$ERR_LOG"'

echo "== Case 21: glob / bare-repo allowlist entries never raise a false verification alarm (issue #106) =="
reset_state
# Neither a glob nor a bare repository is concrete, so verification must skip
# them (the host could hold any tag) and still report an honest 'complete'.
DIND_HOST_PASSTHROUGH=public DIND_HOST_DOCKER_SOCK="$WORK/absent.sock" \
DIND_HOST_PASSTHROUGH_IMAGES="konard/hive-mind* konard/other" \
DIND_PRELOAD_TARBALL="" DIND_PRELOAD_IMAGES="" DOCKER_INFO_OK=1 \
preload_into_daemon >"$OUT_LOG" 2>"$ERR_LOG"
check "no verification warning for non-concrete entries" bash -c '! grep -q "did NOT seed" "$ERR_LOG"'
check "honest 'complete' still printed" grep -q "image preload/passthrough complete" "$OUT_LOG"

echo "== Case 22: ref_is_concrete classification (direct calls) =="
reset_state
check "explicit tag is concrete" eval 'ref_is_concrete "konard/hive-mind-dind:2.0.6"'
check "explicit digest is concrete" eval 'ref_is_concrete "konard/hive-mind-dind@sha256:abc"'
check "bare repo is NOT concrete" eval '! ref_is_concrete "konard/hive-mind"'
check "glob is NOT concrete" eval '! ref_is_concrete "konard/hive-mind*"'
check "registry port w/o tag NOT concrete" eval '! ref_is_concrete "registry.example.com:5000/repo"'
check "registry port WITH tag is concrete" eval 'ref_is_concrete "registry.example.com:5000/repo:v1"'

echo "== Case 18: image-matching helper normalization (direct calls) =="
reset_state
# Like Case 13, drive the sourced helper in the current shell via `eval` so the