Ship static busybox shell in gpu-operator image#2434
The gpu-operator mounts a ConfigMap-backed `entrypoint.sh` into the nvidia-mig-manager container today: it waits for the driver-ready file, sources it as KEY=value env, derives `WITH_SHUTDOWN_HOST_GPU_CLIENTS=$IS_HOST_DRIVER`, and execs `nvidia-mig-manager`. That script requires a shell in the container image, which is currently provided by the `-dev` distroless variant via a busybox `/bin/sh` symlink. NVIDIA STIG policy is dropping `-dev` distroless as approved parent images, so the shell has to go, and that means the entrypoint logic has to live in the binary.

Move startup hooks into `nvidia-mig-manager` itself. A new `internal/startup` package provides `WaitForFile` (polls `os.Stat`) and `SourceEnvFile` (parses `KEY=value` lines with quote and comment handling, calls `os.Setenv`). `main()` runs the hooks before `cli.App.Run` parses flags, so any env vars sourced from `driver-ready` are visible to the `EnvVars:` declarations on each cli.Flag.

The hooks are opt-in via env vars:

- `WAIT_FOR_DRIVER_READY=<path>`: block on the file's existence
- `DRIVER_ENV_FILE=<path>`: source KEY=value into the process env
- `WAIT_FOR_FILE_INTERVAL=<duration>`: poll interval, default 5s

After sourcing, `IS_HOST_DRIVER` is mirrored into `WITH_SHUTDOWN_HOST_GPU_CLIENTS` for backward compatibility with the existing shell behavior. The cli flag picks up the env var as usual.

Drop the `SHELL ["/busybox/sh", "-c"]` directive and the `RUN ln -s /busybox/sh /bin/sh && rm -r /var/run && ln -s /run /var/run` step from the Dockerfile, and flip the base from `distroless/go:v4.0.5-dev` to `v4.0.5`. The `/var/run` -> `/run` symlink is provided by the non-`-dev` distroless base.

Companion to NVIDIA/gpu-operator#2434, which removes the `nvidia-mig-manager-entrypoint` ConfigMap and updates the state-mig-manager DaemonSet to invoke `nvidia-mig-manager` directly with the new env vars set on the container spec.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
cdesiniotis
left a comment
A couple of minor comments / questions. Otherwise this looks good to me!
    && rm -rf /var/lib/apt/lists/* \
    && mkdir /busybox \
    && cp /bin/busybox /busybox/busybox \
    && /busybox/busybox --install -s /busybox
What's the purpose of this statement? What is --install actually doing?
`--install` creates links for all the busybox internal commands (`rm`, `cat`, etc.). The `-s` flag makes them symlinks rather than hard links. What we're doing here is installing those symlinks in `/busybox`, copying that directory into the final image (symlinks are preserved), and adding `/busybox` to `PATH` so that direct command invocations resolve correctly.
Here's what /busybox directory looks like:
$ ls -l /busybox
-rwxr-xr-x 1 root root 1975064 May 20 04:48 busybox
lrwxrwxrwx 1 root root 16 May 20 04:48 cat -> /busybox/busybox
lrwxrwxrwx 1 root root 16 May 20 04:48 chgrp -> /busybox/busybox
lrwxrwxrwx 1 root root 16 May 20 04:48 chmod -> /busybox/busybox
lrwxrwxrwx 1 root root 16 May 20 04:48 chown -> /busybox/busybox
lrwxrwxrwx 1 root root 16 May 20 04:48 chroot -> /busybox/busybox

My understanding is that an invocation of, say, `cat` (which is resolved via `$PATH`) is dispatched as `/busybox/busybox cat`, since busybox is a "multi-call binary".
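To make the multi-call idea concrete, here is a toy Go sketch of the dispatch pattern: one binary that decides which command to run from the name it was invoked under. This is not busybox's code. The `applets` table, `resolveApplet`, and the binary name `multicall` are all hypothetical, and only two trivial applets are included.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// applets maps a command name to its implementation (toy examples only).
var applets = map[string]func(args []string) int{
	"echo": func(args []string) int { fmt.Println(strings.Join(args, " ")); return 0 },
	"true": func(args []string) int { return 0 },
}

// resolveApplet picks the applet from the basename of argv[0]. When the
// binary is invoked under its own name (self), the applet name is taken
// from the first argument instead, as in "multicall echo hi".
func resolveApplet(argv []string, self string) (name string, rest []string, ok bool) {
	if len(argv) == 0 {
		return "", nil, false
	}
	name, rest = filepath.Base(argv[0]), argv[1:]
	if name == self {
		if len(rest) == 0 {
			return "", nil, false
		}
		name, rest = rest[0], rest[1:]
	}
	_, ok = applets[name]
	return name, rest, ok
}

func main() {
	name, rest, ok := resolveApplet(os.Args, "multicall")
	if !ok {
		// No known applet requested: list what is available and return.
		names := make([]string, 0, len(applets))
		for n := range applets {
			names = append(names, n)
		}
		sort.Strings(names)
		fmt.Println("available applets:", strings.Join(names, ", "))
		return
	}
	os.Exit(applets[name](rest))
}
```

If this were built as `multicall` and symlinked as `echo`, running the symlink would reach the `echo` applet because the symlink's name arrives in `os.Args[0]`. That is the same reason a directory of applet symlinks on `PATH` is enough for manifests that call `rm` or `cat` directly.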
The only reason we used busybox before was that it came with the distroless `-dev` images. Since we are moving away from them, why can't we just add a statically built bash binary to the final image? Wouldn't bash be much simpler to use than a busybox shell?
The deployment manifests call out to coreutils such as `rm` and `cat` directly. We don't get those if we just add a static bash binary.
Flip the base from *-dev* to non-*-dev* distroless and source a static busybox from debian:trixie-slim. Init container wrappers, lifecycle hooks, and helper scripts continue to work via /bin/sh and busybox applet symlinks layered into the final image.

Part of NVIDIA/cloud-native-team#299.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
Flip the base from `*-dev*` to non-`*-dev*` distroless and source a static busybox from `debian:trixie-slim`. Init container wrappers, lifecycle hooks, and helper scripts continue to work via `/bin/sh` and busybox applet symlinks layered into the final image.

Part of NVIDIA/cloud-native-team#299.
Resolves #2435.