Ship static busybox shell in gpu-operator image #2434

Open
rajathagasthya wants to merge 1 commit into NVIDIA:main from rajathagasthya:worktree-distroless-dev

@rajathagasthya rajathagasthya commented May 6, 2026

Flip the base from *-dev* to non-*-dev* distroless and source a static busybox from debian:trixie-slim. Init container wrappers, lifecycle hooks, and helper scripts continue to work via /bin/sh and busybox applet symlinks layered into the final image.

Part of NVIDIA/cloud-native-team#299.
Resolves #2435.

@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch 5 times, most recently from 19fd65d to 14f5202 Compare May 6, 2026 20:25
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from 14f5202 to 20e9691 Compare May 7, 2026 15:50
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from 20e9691 to 9e3efb2 Compare May 19, 2026 18:05
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from 9e3efb2 to 448be34 Compare May 19, 2026 18:36
rajathagasthya added a commit to rajathagasthya/mig-parted that referenced this pull request May 19, 2026
The gpu-operator mounts a ConfigMap-backed `entrypoint.sh` into the
nvidia-mig-manager container today: it waits for the driver-ready
file, sources it as KEY=value env, derives
`WITH_SHUTDOWN_HOST_GPU_CLIENTS=$IS_HOST_DRIVER`, and execs
`nvidia-mig-manager`. That script requires a shell in the container
image, which is currently provided by the `-dev` distroless variant
via a busybox `/bin/sh` symlink. NVIDIA STIG policy is dropping
`-dev` distroless as approved parent images, so the shell has to
go — and that means the entrypoint logic has to live in the binary.

Move startup hooks into `nvidia-mig-manager` itself. A new
`internal/startup` package provides `WaitForFile` (polls
`os.Stat`) and `SourceEnvFile` (parses `KEY=value` lines with quote
and comment handling, calls `os.Setenv`). `main()` runs the hooks
before `cli.App.Run` parses flags, so any env vars sourced from
`driver-ready` are visible to the `EnvVars:` declarations on each
cli.Flag. The hooks are opt-in via env vars:

- `WAIT_FOR_DRIVER_READY=<path>` — block on the file's existence
- `DRIVER_ENV_FILE=<path>` — source KEY=value into the process env
- `WAIT_FOR_FILE_INTERVAL=<duration>` — poll interval, default 5s

After sourcing, `IS_HOST_DRIVER` is mirrored into
`WITH_SHUTDOWN_HOST_GPU_CLIENTS` for backward compatibility with
the existing shell behavior. The cli flag picks up the env var as
usual.

Drop the `SHELL ["/busybox/sh", "-c"]` directive and the
`RUN ln -s /busybox/sh /bin/sh && rm -r /var/run && ln -s /run
/var/run` step from the Dockerfile, and flip the base from
`distroless/go:v4.0.5-dev` to `v4.0.5`. The `/var/run` -> `/run`
symlink is provided by the non-`-dev` distroless base.

Companion to NVIDIA/gpu-operator#2434, which removes the
`nvidia-mig-manager-entrypoint` ConfigMap and updates the
state-mig-manager DaemonSet to invoke `nvidia-mig-manager` directly
with the new env vars set on the container spec.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
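
The startup hooks described in the commit message above could look roughly like the following. This is a minimal sketch, not the actual `internal/startup` package: the lowercase function names, the exact quote and comment handling, and the choice to mirror `IS_HOST_DRIVER` only when it is set are assumptions made for illustration. Only the three env var names, the 5s default, and the `os.Stat` / `os.Setenv` approach come from the commit message.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"time"
)

// waitForFile polls os.Stat until path exists, sleeping interval between tries.
func waitForFile(path string, interval time.Duration) error {
	for {
		_, err := os.Stat(path)
		if err == nil {
			return nil
		}
		if !os.IsNotExist(err) {
			return err
		}
		time.Sleep(interval)
	}
}

// parseEnvLine parses one KEY=value line. Blank lines and lines starting
// with '#' are skipped, and one pair of matching quotes around the value is
// stripped. (Assumed rules; the real parser may differ.)
func parseEnvLine(line string) (string, string, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return "", "", false
	}
	key, value, found := strings.Cut(line, "=")
	key = strings.TrimSpace(key)
	if !found || key == "" {
		return "", "", false
	}
	value = strings.TrimSpace(value)
	if n := len(value); n >= 2 && (value[0] == '"' || value[0] == '\'') && value[n-1] == value[0] {
		value = value[1 : n-1]
	}
	return key, value, true
}

// sourceEnvFile exports every KEY=value line of path into the process env.
func sourceEnvFile(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if key, value, ok := parseEnvLine(scanner.Text()); ok {
			if err := os.Setenv(key, value); err != nil {
				return err
			}
		}
	}
	return scanner.Err()
}

func fail(err error) {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}

func main() {
	interval := 5 * time.Second // default from the commit message
	if s := os.Getenv("WAIT_FOR_FILE_INTERVAL"); s != "" {
		d, err := time.ParseDuration(s)
		if err != nil {
			fail(err)
		}
		interval = d
	}
	if p := os.Getenv("WAIT_FOR_DRIVER_READY"); p != "" {
		if err := waitForFile(p, interval); err != nil {
			fail(err)
		}
	}
	if p := os.Getenv("DRIVER_ENV_FILE"); p != "" {
		if err := sourceEnvFile(p); err != nil {
			fail(err)
		}
	}
	// Mirror IS_HOST_DRIVER for compatibility with the old shell entrypoint.
	if v, ok := os.LookupEnv("IS_HOST_DRIVER"); ok {
		if err := os.Setenv("WITH_SHUTDOWN_HOST_GPU_CLIENTS", v); err != nil {
			fail(err)
		}
	}
	// The real binary would hand off to cli.App.Run here, after the env is set.
}
```

Running the hooks before flag parsing is what lets values sourced from the driver-ready file reach flags that read their defaults from the environment. With none of the three variables set, the sketch does nothing, which matches the opt-in behavior described above.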
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch 2 times, most recently from acd7fa9 to 8d75aec Compare May 20, 2026 03:06
@rajathagasthya rajathagasthya changed the title from "Remove shell dependency from validator pods" to "Ship static busybox shell in gpu-operator image" May 20, 2026
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from 8d75aec to f6ed616 Compare May 20, 2026 04:59
@cdesiniotis cdesiniotis left a comment

A couple of minor comments / questions. Otherwise this looks good to me!

Comment thread docker/Dockerfile
&& rm -rf /var/lib/apt/lists/* \
&& mkdir /busybox \
&& cp /bin/busybox /busybox/busybox \
&& /busybox/busybox --install -s /busybox

Contributor

What's the purpose of this statement? What is --install actually doing?

Contributor Author

@rajathagasthya rajathagasthya May 20, 2026

--install creates links for all of busybox's built-in commands (rm, cat, etc.), and -s makes them symlinks rather than hard links. What we're doing here is installing those symlinks in /busybox, copying that directory into the final image (symlinks are preserved), and adding /busybox to PATH so direct command invocations resolve correctly.

Here's what /busybox directory looks like:

$ ls -l /busybox
-rwxr-xr-x 1 root root 1975064 May 20 04:48  busybox
lrwxrwxrwx 1 root root      16 May 20 04:48  cat -> /busybox/busybox
lrwxrwxrwx 1 root root      16 May 20 04:48  chgrp -> /busybox/busybox
lrwxrwxrwx 1 root root      16 May 20 04:48  chmod -> /busybox/busybox
lrwxrwxrwx 1 root root      16 May 20 04:48  chown -> /busybox/busybox
lrwxrwxrwx 1 root root      16 May 20 04:48  chroot -> /busybox/busybox

My understanding is that invoking, say, cat (which is resolved via $PATH) is dispatched as /busybox/busybox cat, since busybox is a "multi-call binary".
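
For readers unfamiliar with the multi-call pattern, here is a small Go sketch of the dispatch idea: the binary looks at the name it was invoked under (the basename of `argv[0]`) and picks the applet from that, falling back to the first argument when it is called under its own name. The `resolveApplet` function and the sample argument vectors are hypothetical illustrations, not busybox's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// multiCallName is the name of the multi-call binary itself.
const multiCallName = "busybox"

// resolveApplet returns the applet to run and the arguments to pass to it.
// Invoked through a symlink such as /busybox/cat, the applet is the basename
// of argv[0]. Invoked as "busybox cat ...", the applet is argv[1].
func resolveApplet(argv []string) (string, []string) {
	if len(argv) == 0 {
		return "", nil
	}
	name := filepath.Base(argv[0])
	if name == multiCallName {
		if len(argv) < 2 {
			return "", nil // no applet given
		}
		return argv[1], argv[2:]
	}
	return name, argv[1:]
}

func main() {
	// Simulated argument vectors instead of os.Args, so the demo is
	// deterministic no matter how this program itself is launched.
	samples := [][]string{
		{"/busybox/cat", "/etc/hostname"},
		{"/busybox/busybox", "rm", "-f", "/tmp/x"},
		{"sh", "-c", "true"},
	}
	for _, argv := range samples {
		applet, args := resolveApplet(argv)
		fmt.Printf("argv=%q -> applet=%q args=%q\n", argv, applet, args)
	}
}
```

Because the symlink only changes the invoked name, a single static binary plus a directory of symlinks on PATH is enough to cover /bin/sh and the individual commands the manifests call.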

Contributor

The only reason we used busybox before was because it came with the distroless-dev images. Since we are moving away from them, why can't we just add a statically-built bash binary to the final image? Wouldn't bash be much simpler to use than a busybox shell?

Contributor Author

The deployment manifests call coreutils such as rm and cat directly. We don't get those if we just add a static bash binary.

Contributor

Fair enough

Comment thread docker/Dockerfile Outdated
@rajathagasthya rajathagasthya marked this pull request as ready for review May 20, 2026 16:22
Flip the base from *-dev* to non-*-dev* distroless and source a static
busybox from debian:trixie-slim. Init container wrappers, lifecycle
hooks, and helper scripts continue to work via /bin/sh and busybox
applet symlinks layered into the final image.

Part of NVIDIA/cloud-native-team#299.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
@rajathagasthya rajathagasthya force-pushed the worktree-distroless-dev branch from f6ed616 to c935136 Compare May 20, 2026 18:15