From 707e3623908ba06cf8930d1f0668eea7bdfaab5c Mon Sep 17 00:00:00 2001
From: Drew Newberry
Date: Mon, 16 Mar 2026 18:52:54 -0700
Subject: [PATCH] refactor(build): unify image build graph for cache reuse

Signed-off-by: Drew Newberry
---
 .../skills/debug-openshell-cluster/SKILL.md |   2 +-
 AGENTS.md                                   |   2 +-
 architecture/build-containers.md            |   8 +-
 architecture/gateway-single-node.md         |   8 +-
 deploy/docker/Dockerfile.cluster            | 270 ------------------
 deploy/docker/Dockerfile.gateway            | 105 -------
 deploy/docker/Dockerfile.images             | 232 +++++++++++++++
 tasks/scripts/cluster-deploy-fast.sh        |  35 +--
 tasks/scripts/docker-build-cluster.sh       |  79 +----
 tasks/scripts/docker-build-component.sh     | 180 ++----------
 tasks/scripts/docker-build-image.sh         | 160 +++++++++++
 tasks/scripts/docker-publish-multiarch.sh   | 210 +++-----------
 12 files changed, 482 insertions(+), 809 deletions(-)
 delete mode 100644 deploy/docker/Dockerfile.cluster
 delete mode 100644 deploy/docker/Dockerfile.gateway
 create mode 100644 deploy/docker/Dockerfile.images
 create mode 100755 tasks/scripts/docker-build-image.sh

diff --git a/.agents/skills/debug-openshell-cluster/SKILL.md b/.agents/skills/debug-openshell-cluster/SKILL.md
index 115a2aa5..ceb8ae84 100644
--- a/.agents/skills/debug-openshell-cluster/SKILL.md
+++ b/.agents/skills/debug-openshell-cluster/SKILL.md
@@ -312,7 +312,7 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
 | `metrics-server` errors in logs | Normal k3s noise, not the root cause | These errors are benign — look for the actual failing health check component |
 | Stale NotReady nodes from previous deploys | Volume reused across container recreations | The deploy flow now auto-cleans stale nodes; if it still fails, manually delete NotReady nodes (see Step 2) or choose "Recreate" when prompted |
 | gRPC `UNIMPLEMENTED` for newer RPCs in push mode | Helm values still point at older pulled images instead of the pushed refs | Verify rendered `openshell-helmchart.yaml` uses the expected push refs (`server`, `sandbox`, `pki-job`) and not `:latest` |
-| Sandbox pods crash with `/opt/openshell/bin/openshell-sandbox: no such file or directory` | Supervisor binary missing from cluster image | The cluster image was built/published without the `supervisor-builder` stage. Rebuild with `mise run docker:build:cluster` and recreate gateway. Bootstrap auto-detects via `HEALTHCHECK_MISSING_SUPERVISOR` marker |
+| Sandbox pods crash with `/opt/openshell/bin/openshell-sandbox: no such file or directory` | Supervisor binary missing from cluster image | The cluster image was built/published without the `supervisor-builder` target in `deploy/docker/Dockerfile.images`. Rebuild with `mise run docker:build:cluster` and recreate gateway. Bootstrap auto-detects via `HEALTHCHECK_MISSING_SUPERVISOR` marker |
 | `HEALTHCHECK_MISSING_SUPERVISOR` in health check logs | `/opt/openshell/bin/openshell-sandbox` not found in gateway container | Rebuild cluster image: `mise run docker:build:cluster`, then `openshell gateway destroy && openshell gateway start` |
 
 ## Full Diagnostic Dump
diff --git a/AGENTS.md b/AGENTS.md
index f5cf5269..688eed1b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -99,7 +99,7 @@ These pipelines connect skills into end-to-end workflows. Individual skill files
 
 ## Cluster Infrastructure Changes
 
-- If you change cluster bootstrap infrastructure (e.g., `openshell-bootstrap` crate, `Dockerfile.cluster`, `cluster-entrypoint.sh`, `cluster-healthcheck.sh`, deploy logic in `openshell-cli`), update the `debug-openshell-cluster` skill in `.agents/skills/debug-openshell-cluster/SKILL.md` to reflect those changes.
+- If you change cluster bootstrap infrastructure (e.g., `openshell-bootstrap` crate, `deploy/docker/Dockerfile.images`, `cluster-entrypoint.sh`, `cluster-healthcheck.sh`, deploy logic in `openshell-cli`), update the `debug-openshell-cluster` skill in `.agents/skills/debug-openshell-cluster/SKILL.md` to reflect those changes.
 
 ## Documentation
 
diff --git a/architecture/build-containers.md b/architecture/build-containers.md
index 705b00d6..2e0d664d 100644
--- a/architecture/build-containers.md
+++ b/architecture/build-containers.md
@@ -6,7 +6,7 @@ OpenShell produces two container images, both published for `linux/amd64` and `l
 
 The gateway runs the control plane API server. It is deployed as a StatefulSet inside the cluster container via a bundled Helm chart.
 
-- **Dockerfile**: `deploy/docker/Dockerfile.gateway`
+- **Docker target**: `gateway` in `deploy/docker/Dockerfile.images`
 - **Registry**: `ghcr.io/nvidia/openshell/gateway:latest`
 - **Pulled when**: Cluster startup (the Helm chart triggers the pull)
 - **Entrypoint**: `openshell-server --port 8080` (gRPC + HTTP, mTLS)
@@ -15,11 +15,11 @@ The gateway runs the control plane API server. It is deployed as a StatefulSet i
 
 The cluster image is a single-container Kubernetes distribution that bundles the Helm charts, Kubernetes manifests, and the `openshell-sandbox` supervisor binary needed to bootstrap the control plane.
 
-- **Dockerfile**: `deploy/docker/Dockerfile.cluster`
+- **Docker target**: `cluster` in `deploy/docker/Dockerfile.images`
 - **Registry**: `ghcr.io/nvidia/openshell/cluster:latest`
 - **Pulled when**: `openshell gateway start`
 
-The supervisor binary (`openshell-sandbox`) is cross-compiled in a build stage and placed at `/opt/openshell/bin/openshell-sandbox`. It is exposed to sandbox pods at runtime via a read-only `hostPath` volume mount — it is not baked into sandbox images.
+The supervisor binary (`openshell-sandbox`) is built by the shared `supervisor-builder` stage in `deploy/docker/Dockerfile.images` and placed at `/opt/openshell/bin/openshell-sandbox`. It is exposed to sandbox pods at runtime via a read-only `hostPath` volume mount — it is not baked into sandbox images.
 
 ## Sandbox Images
 
@@ -42,7 +42,7 @@ The incremental deploy (`cluster-deploy-fast.sh`) fingerprints local Git changes
 | Changed files | Rebuild triggered |
 |---|---|
 | Cargo manifests, proto definitions, cross-build script | Gateway + supervisor |
-| `crates/openshell-server/*`, `Dockerfile.gateway` | Gateway |
+| `crates/openshell-server/*`, `deploy/docker/Dockerfile.images` | Gateway |
 | `crates/openshell-sandbox/*`, `crates/openshell-policy/*` | Supervisor |
 | `deploy/helm/openshell/*` | Helm upgrade |
 
diff --git a/architecture/gateway-single-node.md b/architecture/gateway-single-node.md
index 679bc338..999bc6aa 100644
--- a/architecture/gateway-single-node.md
+++ b/architecture/gateway-single-node.md
@@ -29,7 +29,7 @@ Out of scope:
 - `crates/openshell-bootstrap/src/push.rs`: Local development image push into k3s containerd.
 - `crates/openshell-bootstrap/src/paths.rs`: XDG path resolution.
 - `crates/openshell-bootstrap/src/constants.rs`: Shared constants (image name, container/volume/network naming).
-- `deploy/docker/Dockerfile.cluster`: Container image definition (k3s base + Helm charts + manifests + entrypoint).
+- `deploy/docker/Dockerfile.images` (target `cluster`): Container image definition (k3s base + Helm charts + manifests + entrypoint).
 - `deploy/docker/cluster-entrypoint.sh`: Container entrypoint (DNS proxy, registry config, manifest injection).
 - `deploy/docker/cluster-healthcheck.sh`: Docker HEALTHCHECK script.
 - Docker daemon(s):
@@ -226,7 +226,7 @@ After deploy, the CLI calls `save_active_gateway(name)`, writing the gateway nam
 
 ## Container Image
 
-The gateway image is defined in `deploy/docker/Dockerfile.cluster`:
+The cluster image is defined by target `cluster` in `deploy/docker/Dockerfile.images`:
 
 ```
 Base: rancher/k3s:v1.35.2-k3s1
@@ -296,7 +296,7 @@ GPU support is part of the single-node gateway bootstrap path rather than a sepa
 
 - `openshell gateway start --gpu` threads a boolean deploy option through `crates/openshell-cli`, `crates/openshell-bootstrap`, and `crates/openshell-bootstrap/src/docker.rs`.
 - When enabled, the cluster container is created with Docker `DeviceRequests`, which is the API equivalent of `docker run --gpus all`.
-- `deploy/docker/Dockerfile.cluster` installs NVIDIA Container Toolkit packages in a dedicated Ubuntu stage and copies the runtime binaries, config, and `libnvidia-container` shared libraries into the final Ubuntu-based cluster image.
+- `deploy/docker/Dockerfile.images` installs NVIDIA Container Toolkit packages in a dedicated Ubuntu stage and copies the runtime binaries, config, and `libnvidia-container` shared libraries into the final Ubuntu-based cluster image.
 - `deploy/docker/cluster-entrypoint.sh` checks `GPU_ENABLED=true` and copies GPU-only manifests from `/opt/openshell/gpu-manifests/` into k3s's manifests directory.
 - `deploy/kube/gpu-manifests/nvidia-device-plugin-helmchart.yaml` installs the NVIDIA device plugin chart, currently pinned to `0.18.2`, along with GPU Feature Discovery and Node Feature Discovery.
 - k3s auto-detects `nvidia-container-runtime` on `PATH`, registers the `nvidia` containerd runtime, and creates the `nvidia` `RuntimeClass` automatically.
@@ -452,7 +452,7 @@ openshell/
 - `crates/openshell-cli/src/main.rs` -- CLI command definitions
 - `crates/openshell-cli/src/run.rs` -- CLI command implementations
 - `crates/openshell-cli/src/bootstrap.rs` -- auto-bootstrap from sandbox create
-- `deploy/docker/Dockerfile.cluster` -- container image definition
+- `deploy/docker/Dockerfile.images` -- shared image build definition (cluster target)
 - `deploy/docker/cluster-entrypoint.sh` -- container entrypoint script
 - `deploy/docker/cluster-healthcheck.sh` -- Docker HEALTHCHECK script
 - `deploy/kube/manifests/openshell-helmchart.yaml` -- OpenShell Helm chart manifest
diff --git a/deploy/docker/Dockerfile.cluster b/deploy/docker/Dockerfile.cluster
deleted file mode 100644
index 49e29a98..00000000
--- a/deploy/docker/Dockerfile.cluster
+++ /dev/null
@@ -1,270 +0,0 @@
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-
-# k3s cluster image with OpenShell Helm charts and manifests
-#
-# Multi-stage build: extracts k3s artifacts from the upstream rancher/k3s
-# Alpine image and layers them onto the NVIDIA Ubuntu base image.
-#
-# This image includes:
-#   - k3s binary and all supporting binaries (containerd-shim, runc, CNI, etc.)
-#   - k9s for interactive cluster debugging (via `openshell doctor exec -- k9s`)
-#   - openshell-sandbox supervisor binary (side-loaded into sandbox pods via hostPath)
-#   - Packaged OpenShell Helm chart
-#   - HelmChart CR for auto-deploying OpenShell
-#   - Custom entrypoint for DNS configuration in Docker environments
-#
-# The gateway image (openshell/gateway) is pulled at runtime from the
-# distribution registry. Sandbox images are pulled from the community registry
-# (ghcr.io/nvidia/openshell-community/sandboxes). The supervisor binary is
-# embedded in this cluster image and exposed to sandbox pods via a hostPath
-# volume mount.
-# Registry credentials are generated by the entrypoint script at container start.
-#
-# The helm charts are built by the docker:build:cluster mise task
-# and placed in deploy/docker/.build/ before this Dockerfile is built.
-
-# Tracked upstream vulns in rancher/k3s:v1.35.2-k3s1 (bundled Go dependencies):
-#   GHSA-pwhc-rpq9-4c8w  containerd v2.1.5-k3s1 (local privesc via CRI dir perms;
-#                        upstream patched in 2.1.5 -- may be scanner false positive
-#                        from the -k3s1 suffix)
-#   GHSA-p436-gjf2-799p  docker/cli v28.3.2 (Windows-only plugin path hijack; N/A)
-#   GHSA-9h8m-3fm2-qjrq  otel/sdk v1.39.0 (macOS-only PATH hijack; N/A for Linux)
-#   CVE-2024-36623       docker/docker v25.0.8 (streamformatter race condition)
-# Bump K3S_VERSION when a release with updated dependencies ships.
-
-ARG K3S_VERSION=v1.35.2-k3s1
-ARG K9S_VERSION=v0.50.18
-ARG HELM_VERSION=v3.17.3
-ARG NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.2-1
-
-# ---------------------------------------------------------------------------
-# Stage 1: Extract k3s artifacts from upstream rancher image (Alpine-based)
-# ---------------------------------------------------------------------------
-FROM rancher/k3s:${K3S_VERSION} AS k3s
-
-# ---------------------------------------------------------------------------
-# Stage 1b: Download k9s binary for interactive cluster debugging
-# ---------------------------------------------------------------------------
-FROM ubuntu:24.04 AS k9s
-ARG K9S_VERSION
-ARG TARGETARCH
-RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && \
-    curl -fsSL "https://github.com/derailed/k9s/releases/download/${K9S_VERSION}/k9s_Linux_${TARGETARCH}.tar.gz" \
-    | tar xz -C /tmp k9s && \
-    chmod +x /tmp/k9s && \
-    rm -rf /var/lib/apt/lists/*
-
-# ---------------------------------------------------------------------------
-# Stage 1c: Download helm binary for in-container chart upgrades
-# ---------------------------------------------------------------------------
-FROM ubuntu:24.04 AS helm
-ARG HELM_VERSION
-ARG TARGETARCH
-RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && \
-    curl -fsSL "https://get.helm.sh/helm-${HELM_VERSION}-linux-${TARGETARCH}.tar.gz" \
-    | tar xz --strip-components=1 -C /tmp "linux-${TARGETARCH}/helm" && \
-    chmod +x /tmp/helm && \
-    rm -rf /var/lib/apt/lists/*
-
-# ---------------------------------------------------------------------------
-# Stage 1d: Build openshell-sandbox supervisor binary
-# ---------------------------------------------------------------------------
-# The supervisor binary runs inside every sandbox pod. It is built here and
-# placed on the k3s node filesystem at /opt/openshell/bin/openshell-sandbox,
-# then mounted into sandbox pods via a read-only hostPath volume.
-FROM --platform=$BUILDPLATFORM rust:1.88-slim AS supervisor-builder
-ARG TARGETARCH
-ARG BUILDARCH
-ARG OPENSHELL_CARGO_VERSION
-ARG CARGO_TARGET_CACHE_SCOPE=default
-ARG SCCACHE_MEMCACHED_ENDPOINT
-
-# Install build dependencies
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    cmake g++ make protobuf-compiler curl && rm -rf /var/lib/apt/lists/*
-
-# Install cross-compilation toolchain, sccache, + Rust target (no-ops for native builds)
-COPY deploy/docker/cross-build.sh /usr/local/bin/
-RUN . cross-build.sh && install_cross_toolchain && install_sccache && add_rust_target
-
-WORKDIR /build
-
-# Copy dependency manifests first for better caching
-COPY Cargo.toml Cargo.lock ./
-COPY crates/openshell-cli/Cargo.toml crates/openshell-cli/Cargo.toml
-COPY crates/openshell-core/Cargo.toml crates/openshell-core/Cargo.toml
-COPY crates/openshell-policy/Cargo.toml crates/openshell-policy/Cargo.toml
-COPY crates/openshell-providers/Cargo.toml crates/openshell-providers/Cargo.toml
-COPY crates/openshell-router/Cargo.toml crates/openshell-router/Cargo.toml
-COPY crates/openshell-sandbox/Cargo.toml crates/openshell-sandbox/Cargo.toml
-COPY crates/openshell-server/Cargo.toml crates/openshell-server/Cargo.toml
-COPY crates/openshell-bootstrap/Cargo.toml crates/openshell-bootstrap/Cargo.toml
-
-# Create dummy source files to build dependencies
-RUN mkdir -p crates/openshell-cli/src crates/openshell-core/src crates/openshell-policy/src \
-    crates/openshell-providers/src crates/openshell-router/src crates/openshell-sandbox/src \
-    crates/openshell-server/src crates/openshell-bootstrap/src && \
-    echo "fn main() {}" > crates/openshell-cli/src/main.rs && \
-    echo "fn main() {}" > crates/openshell-sandbox/src/main.rs && \
-    echo "fn main() {}" > crates/openshell-server/src/main.rs && \
-    touch crates/openshell-core/src/lib.rs && \
-    touch crates/openshell-policy/src/lib.rs && \
-    touch crates/openshell-providers/src/lib.rs && \
-    touch crates/openshell-router/src/lib.rs && \
-    touch crates/openshell-bootstrap/src/lib.rs
-
-# Copy proto files needed for build
-COPY proto/ proto/
-
-# Build dependencies only (cached unless Cargo.toml/lock changes)
-RUN --mount=type=cache,id=cargo-registry-supervisor-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
-    --mount=type=cache,id=cargo-target-supervisor-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
-    --mount=type=cache,id=sccache-supervisor-${TARGETARCH},sharing=locked,target=/tmp/sccache \
-    . cross-build.sh && cargo_cross_build -p openshell-sandbox 2>/dev/null || true
-
-# Copy actual source code
-COPY crates/ crates/
-
-# Touch source files to ensure they're rebuilt (not the cached dummy)
-RUN touch crates/openshell-sandbox/src/main.rs \
-    crates/openshell-core/build.rs \
-    proto/*.proto
-
-# Build the supervisor binary
-RUN --mount=type=cache,id=cargo-registry-supervisor-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
-    --mount=type=cache,id=cargo-target-supervisor-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
-    --mount=type=cache,id=sccache-supervisor-${TARGETARCH},sharing=locked,target=/tmp/sccache \
-    . cross-build.sh && \
-    if [ -n "${OPENSHELL_CARGO_VERSION:-}" ]; then \
-      sed -i -E '/^\[workspace\.package\]/,/^\[/{s/^version[[:space:]]*=[[:space:]]*".*"/version = "'"${OPENSHELL_CARGO_VERSION}"'"/}' Cargo.toml; \
-    fi && \
-    cargo_cross_build --release -p openshell-sandbox && \
-    mkdir -p /build/out && \
-    cp "$(cross_output_dir release)/openshell-sandbox" /build/out/
-
-# ---------------------------------------------------------------------------
-# Stage 2: Install NVIDIA container toolkit on Ubuntu
-# ---------------------------------------------------------------------------
-FROM ubuntu:24.04 AS nvidia-toolkit
-
-ARG NVIDIA_CONTAINER_TOOLKIT_VERSION
-
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    gpg curl ca-certificates && \
-    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
-    | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \
-    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
-    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
-    | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list && \
-    apt-get update && \
-    apt-get install -y --no-install-recommends \
-        "nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \
-        "nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \
-        "libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \
-        "libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" && \
-    rm -rf /var/lib/apt/lists/*
-
-# ---------------------------------------------------------------------------
-# Stage 3: Runtime on NVIDIA hardened Ubuntu base
-# ---------------------------------------------------------------------------
-FROM nvcr.io/nvidia/base/ubuntu:noble-20251013
-
-# Install runtime dependencies that k3s expects from the host OS.
-#   - iptables: used by flannel/kube-proxy for network policy and NAT rules
-#   - mount/umount: needed by kubelet for volume mounts (provided by mount package)
-#   - ca-certificates: TLS verification for registry pulls
-#   - conntrack: k3s/kube-proxy uses conntrack for connection tracking
-#   - dnsutils: nslookup used by entrypoint/healthcheck for DNS probe
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    ca-certificates \
-    iptables \
-    mount \
-    dnsutils \
-    && rm -rf /var/lib/apt/lists/*
-
-# Copy the full /bin directory from k3s (contains all statically-linked
-# binaries and their symlinks: k3s, kubectl, crictl, ctr, containerd,
-# containerd-shim-runc-v2, runc, cni plugins, busybox, coreutils,
-# ip, ipset, conntrack, nsenter, pigz, etc.)
-COPY --from=k3s /bin/ /bin/
-
-# Copy k9s binary for interactive cluster debugging via `openshell doctor exec -- k9s`
-COPY --from=k9s /tmp/k9s /usr/local/bin/k9s
-
-# Copy helm binary for in-container chart upgrades (used by cluster-deploy-fast.sh)
-COPY --from=helm /tmp/helm /usr/local/bin/helm
-
-# Copy iptables/nftables tooling (xtables-nft-multi, iptables-detect.sh, etc.)
-# These are in /bin/aux/ in the k3s image and must be on PATH.
-# Note: the Ubuntu iptables package provides /usr/sbin/iptables, but k3s
-# expects its own bundled version at /bin/aux/iptables. Both are on PATH;
-# k3s finds its copy via /bin/aux in PATH.
-
-# Copy CA certificates from k3s (bundled Alpine CA bundle).
-# The Ubuntu ca-certificates package also installs certs; having both is fine.
-COPY --from=k3s /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/k3s-ca-certificates.crt
-
-# Copy timezone data used by k3s/Go for time.LoadLocation
-COPY --from=k3s /usr/share/zoneinfo/ /usr/share/zoneinfo/
-
-# Set environment variables matching the upstream k3s image.
-# PATH includes /bin/aux for iptables tooling and /var/lib/rancher/k3s/data/cni
-# for runtime-extracted CNI binaries.
-ENV PATH="/var/lib/rancher/k3s/data/cni:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin/aux" \
-    CRI_CONFIG_FILE="/var/lib/rancher/k3s/agent/etc/crictl.yaml"
-
-# Copy NVIDIA Container Toolkit files from the build stage.
-# k3s auto-detects nvidia-container-runtime on PATH and registers it as a
-# containerd runtime + creates the "nvidia" RuntimeClass automatically.
-COPY --from=nvidia-toolkit /usr/bin/nvidia-cdi-hook /usr/bin/
-COPY --from=nvidia-toolkit /usr/bin/nvidia-container-runtime /usr/bin/
-COPY --from=nvidia-toolkit /usr/bin/nvidia-container-runtime-hook /usr/bin/
-COPY --from=nvidia-toolkit /usr/bin/nvidia-container-cli /usr/bin/
-COPY --from=nvidia-toolkit /usr/bin/nvidia-ctk /usr/bin/
-COPY --from=nvidia-toolkit /etc/nvidia-container-runtime /etc/nvidia-container-runtime
-COPY --from=nvidia-toolkit /usr/lib/*-linux-gnu/libnvidia-container*.so* /usr/lib/
-
-# Copy the openshell-sandbox supervisor binary to the node filesystem.
-# Sandbox pods mount /opt/openshell/bin as a read-only hostPath volume
-# to side-load the supervisor without baking it into every sandbox image.
-COPY --from=supervisor-builder /build/out/openshell-sandbox /opt/openshell/bin/openshell-sandbox
-
-# Create directories for manifests, charts, and configuration
-RUN mkdir -p /var/lib/rancher/k3s/server/manifests \
-    /var/lib/rancher/k3s/server/static/charts \
-    /etc/rancher/k3s \
-    /opt/openshell/manifests \
-    /opt/openshell/charts \
-    /opt/openshell/gpu-manifests \
-    /run/flannel
-
-# Copy entrypoint script that configures DNS for Docker environments
-# This script detects the host gateway IP and configures CoreDNS to use it
-COPY deploy/docker/cluster-entrypoint.sh /usr/local/bin/cluster-entrypoint.sh
-RUN chmod +x /usr/local/bin/cluster-entrypoint.sh
-
-# Copy healthcheck script that verifies cluster readiness
-COPY deploy/docker/cluster-healthcheck.sh /usr/local/bin/cluster-healthcheck.sh
-RUN chmod +x /usr/local/bin/cluster-healthcheck.sh
-
-# Registry credentials for pulling component images at runtime are generated
-# by the entrypoint script at /etc/rancher/k3s/registries.yaml.
-
-# Copy packaged helm charts to a staging directory that won't be
-# overwritten by the /var/lib/rancher/k3s volume mount. The entrypoint
-# script copies them into the k3s static charts directory at container start.
-COPY deploy/docker/.build/charts/*.tgz /opt/openshell/charts/
-
-# Copy Kubernetes manifests to a persistent location that won't be overwritten by the volume mount.
-# The bootstrap code will copy these to /var/lib/rancher/k3s/server/manifests/ after cluster start.
-COPY deploy/kube/manifests/*.yaml /opt/openshell/manifests/
-
-# Copy GPU-specific manifests (deployed conditionally by entrypoint when GPU_ENABLED=true)
-COPY deploy/kube/gpu-manifests/*.yaml /opt/openshell/gpu-manifests/
-
-# Use custom entrypoint that configures DNS before starting k3s
-ENTRYPOINT ["/usr/local/bin/cluster-entrypoint.sh"]
-
-HEALTHCHECK --interval=5s --timeout=5s --start-period=20s --retries=60 \
-    CMD ["/usr/local/bin/cluster-healthcheck.sh"]
diff --git a/deploy/docker/Dockerfile.gateway b/deploy/docker/Dockerfile.gateway
deleted file mode 100644
index 05d2a46f..00000000
--- a/deploy/docker/Dockerfile.gateway
+++ /dev/null
@@ -1,105 +0,0 @@
-# syntax=docker/dockerfile:1.4
-
-# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-# SPDX-License-Identifier: Apache-2.0
-
-# OpenShell Gateway Docker image
-# Multi-stage build with cross-compilation support for multi-arch
-
-# Stage 1: Rust builder (runs on build platform, cross-compiles for target)
-FROM --platform=$BUILDPLATFORM rust:1.88-slim AS builder
-ARG TARGETARCH
-ARG BUILDARCH
-ARG OPENSHELL_CARGO_VERSION
-ARG CARGO_TARGET_CACHE_SCOPE=default
-
-# Install build dependencies
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    cmake g++ make protobuf-compiler curl && rm -rf /var/lib/apt/lists/*
-
-# Install cross-compilation toolchain, sccache, + Rust target (no-ops for native builds)
-COPY deploy/docker/cross-build.sh /usr/local/bin/
-RUN . cross-build.sh && install_cross_toolchain && install_sccache && add_rust_target
-
-ARG SCCACHE_MEMCACHED_ENDPOINT
-
-WORKDIR /build
-
-# Copy dependency manifests first for better caching
-COPY Cargo.toml Cargo.lock ./
-COPY crates/openshell-cli/Cargo.toml crates/openshell-cli/Cargo.toml
-COPY crates/openshell-core/Cargo.toml crates/openshell-core/Cargo.toml
-COPY crates/openshell-providers/Cargo.toml crates/openshell-providers/Cargo.toml
-COPY crates/openshell-router/Cargo.toml crates/openshell-router/Cargo.toml
-COPY crates/openshell-sandbox/Cargo.toml crates/openshell-sandbox/Cargo.toml
-COPY crates/openshell-server/Cargo.toml crates/openshell-server/Cargo.toml
-COPY crates/openshell-bootstrap/Cargo.toml crates/openshell-bootstrap/Cargo.toml
-
-# Create dummy source files to build dependencies
-RUN mkdir -p crates/openshell-cli/src crates/openshell-core/src crates/openshell-providers/src crates/openshell-router/src crates/openshell-sandbox/src crates/openshell-server/src crates/openshell-bootstrap/src && \
-    echo "fn main() {}" > crates/openshell-cli/src/main.rs && \
-    echo "fn main() {}" > crates/openshell-sandbox/src/main.rs && \
-    echo "fn main() {}" > crates/openshell-server/src/main.rs && \
-    touch crates/openshell-core/src/lib.rs && \
-    touch crates/openshell-providers/src/lib.rs && \
-    touch crates/openshell-router/src/lib.rs && \
-    touch crates/openshell-bootstrap/src/lib.rs
-
-# Copy proto files needed for build
-COPY proto/ proto/
-
-# Build dependencies only (cached unless Cargo.toml/lock changes).
-# sccache uses memcached in CI (SCCACHE_MEMCACHED_ENDPOINT) or the local
-# disk cache mount for local dev builds. The cargo-target mount gives cargo
-# a persistent target/ dir for true incremental rebuilds on source changes.
-RUN --mount=type=cache,id=cargo-registry-gateway-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
-    --mount=type=cache,id=cargo-target-gateway-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
-    --mount=type=cache,id=sccache-gateway-${TARGETARCH},sharing=locked,target=/tmp/sccache \
-    . cross-build.sh && cargo_cross_build --release -p openshell-server 2>/dev/null || true
-
-# Copy actual source code
-COPY crates/ crates/
-
-# Touch source files to ensure they're rebuilt (not the cached dummy).
-# Touch build.rs and proto files to force proto code regeneration when the
-# cargo target cache mount retains stale OUT_DIR artifacts from prior builds.
-RUN touch crates/openshell-server/src/main.rs \
-    crates/openshell-core/build.rs \
-    proto/*.proto
-
-# Build the actual application
-RUN --mount=type=cache,id=cargo-registry-gateway-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
-    --mount=type=cache,id=cargo-target-gateway-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
-    --mount=type=cache,id=sccache-gateway-${TARGETARCH},sharing=locked,target=/tmp/sccache \
-    . cross-build.sh && \
-    if [ -n "${OPENSHELL_CARGO_VERSION:-}" ]; then \
-      sed -i -E '/^\[workspace\.package\]/,/^\[/{s/^version[[:space:]]*=[[:space:]]*".*"/version = "'"${OPENSHELL_CARGO_VERSION}"'"/}' Cargo.toml; \
-    fi && \
-    cargo_cross_build --release -p openshell-server && \
-    cp "$(cross_output_dir release)/openshell-server" /build/openshell-server
-
-# Stage 2: Runtime (uses target platform)
-# NVIDIA hardened Ubuntu base for supply chain consistency.
-FROM nvcr.io/nvidia/base/ubuntu:noble-20251013 AS runtime
-
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    ca-certificates && rm -rf /var/lib/apt/lists/*
-
-RUN useradd --create-home --user-group openshell
-
-WORKDIR /app
-
-COPY --from=builder /build/openshell-server /usr/local/bin/
-
-# Copy migrations to the build-time manifest directory expected by sqlx
-RUN mkdir -p /build/crates/openshell-server
-COPY crates/openshell-server/migrations /build/crates/openshell-server/migrations
-
-USER openshell
-EXPOSE 8080
-
-# Health checks are handled by Kubernetes liveness/readiness probes (tcpSocket).
-# No Docker HEALTHCHECK is needed since this image runs inside a k3s cluster.
-
-ENTRYPOINT ["openshell-server"]
-CMD ["--port", "8080"]
diff --git a/deploy/docker/Dockerfile.images b/deploy/docker/Dockerfile.images
new file mode 100644
index 00000000..84edf449
--- /dev/null
+++ b/deploy/docker/Dockerfile.images
@@ -0,0 +1,232 @@
+# syntax=docker/dockerfile:1.4
+
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Shared OpenShell image build graph.
+#
+# Targets:
+#   gateway             Final gateway image
+#   cluster             Final cluster image
+#   gateway-builder     Release openshell-server binary
+#   supervisor-builder  Release openshell-sandbox binary
+
+ARG K3S_VERSION=v1.35.2-k3s1
+ARG K9S_VERSION=v0.50.18
+ARG HELM_VERSION=v3.17.3
+ARG NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.2-1
+
+# ---------------------------------------------------------------------------
+# Shared Rust build stages
+# ---------------------------------------------------------------------------
+FROM --platform=$BUILDPLATFORM rust:1.88-slim AS rust-builder-base
+ARG TARGETARCH
+ARG BUILDARCH
+ARG OPENSHELL_CARGO_VERSION
+ARG CARGO_TARGET_CACHE_SCOPE=default
+ARG SCCACHE_MEMCACHED_ENDPOINT
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    cmake g++ make protobuf-compiler curl && rm -rf /var/lib/apt/lists/*
+
+COPY deploy/docker/cross-build.sh /usr/local/bin/
+RUN . cross-build.sh && install_cross_toolchain && install_sccache && add_rust_target
+
+WORKDIR /build
+
+FROM rust-builder-base AS rust-builder-skeleton
+
+COPY Cargo.toml Cargo.lock ./
+COPY crates/openshell-bootstrap/Cargo.toml crates/openshell-bootstrap/Cargo.toml
+COPY crates/openshell-cli/Cargo.toml crates/openshell-cli/Cargo.toml
+COPY crates/openshell-core/Cargo.toml crates/openshell-core/Cargo.toml
+COPY crates/openshell-policy/Cargo.toml crates/openshell-policy/Cargo.toml
+COPY crates/openshell-providers/Cargo.toml crates/openshell-providers/Cargo.toml
+COPY crates/openshell-router/Cargo.toml crates/openshell-router/Cargo.toml
+COPY crates/openshell-sandbox/Cargo.toml crates/openshell-sandbox/Cargo.toml
+COPY crates/openshell-server/Cargo.toml crates/openshell-server/Cargo.toml
+COPY crates/openshell-tui/Cargo.toml crates/openshell-tui/Cargo.toml
+COPY crates/openshell-core/build.rs crates/openshell-core/build.rs
+COPY proto/ proto/
+
+RUN mkdir -p \
+    crates/openshell-bootstrap/src \
+    crates/openshell-cli/src \
+    crates/openshell-core/src \
+    crates/openshell-policy/src \
+    crates/openshell-providers/src \
+    crates/openshell-router/src \
+    crates/openshell-sandbox/src \
+    crates/openshell-server/src \
+    crates/openshell-tui/src && \
+    touch crates/openshell-bootstrap/src/lib.rs && \
+    printf 'fn main() {}\n' > crates/openshell-cli/src/main.rs && \
+    touch crates/openshell-core/src/lib.rs && \
+    touch crates/openshell-policy/src/lib.rs && \
+    touch crates/openshell-providers/src/lib.rs && \
+    touch crates/openshell-router/src/lib.rs && \
+    touch crates/openshell-sandbox/src/lib.rs && \
+    printf 'fn main() {}\n' > crates/openshell-sandbox/src/main.rs && \
+    touch crates/openshell-server/src/lib.rs && \
+    printf 'fn main() {}\n' > crates/openshell-server/src/main.rs && \
+    touch crates/openshell-tui/src/lib.rs
+
+FROM rust-builder-skeleton AS rust-deps
+
+RUN --mount=type=cache,id=cargo-registry-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
+    --mount=type=cache,id=cargo-git-${TARGETARCH},sharing=locked,target=/usr/local/cargo/git \
+    --mount=type=cache,id=cargo-target-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
+    --mount=type=cache,id=sccache-${TARGETARCH},sharing=locked,target=/tmp/sccache \
+    . cross-build.sh && cargo_cross_build --release -p openshell-server -p openshell-sandbox
+
+FROM rust-deps AS rust-workspace
+
+COPY crates/ crates/
+
+RUN touch \
+    crates/openshell-core/build.rs \
+    crates/openshell-sandbox/src/main.rs \
+    crates/openshell-server/src/main.rs \
+    proto/*.proto && \
+    if [ -n "${OPENSHELL_CARGO_VERSION:-}" ]; then \
+      sed -i -E '/^\[workspace\.package\]/,/^\[/{s/^version[[:space:]]*=[[:space:]]*".*"/version = "'"${OPENSHELL_CARGO_VERSION}"'"/}' Cargo.toml; \
+    fi
+
+FROM rust-workspace AS gateway-builder
+
+RUN --mount=type=cache,id=cargo-registry-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
+    --mount=type=cache,id=cargo-git-${TARGETARCH},sharing=locked,target=/usr/local/cargo/git \
+    --mount=type=cache,id=cargo-target-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
+    --mount=type=cache,id=sccache-${TARGETARCH},sharing=locked,target=/tmp/sccache \
+    . cross-build.sh && \
+    cargo_cross_build --release -p openshell-server && \
+    mkdir -p /build/out && \
+    cp "$(cross_output_dir release)/openshell-server" /build/out/
+
+FROM rust-workspace AS supervisor-builder
+
+RUN --mount=type=cache,id=cargo-registry-${TARGETARCH},sharing=locked,target=/usr/local/cargo/registry \
+    --mount=type=cache,id=cargo-git-${TARGETARCH},sharing=locked,target=/usr/local/cargo/git \
+    --mount=type=cache,id=cargo-target-${TARGETARCH}-${CARGO_TARGET_CACHE_SCOPE},sharing=locked,target=/build/target \
+    --mount=type=cache,id=sccache-${TARGETARCH},sharing=locked,target=/tmp/sccache \
+    . cross-build.sh && \
+    cargo_cross_build --release -p openshell-sandbox && \
+    mkdir -p /build/out && \
+    cp "$(cross_output_dir release)/openshell-sandbox" /build/out/
+
+# ---------------------------------------------------------------------------
+# Final gateway image
+# ---------------------------------------------------------------------------
+FROM nvcr.io/nvidia/base/ubuntu:noble-20251013 AS gateway
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ca-certificates && rm -rf /var/lib/apt/lists/*
+
+RUN useradd --create-home --user-group openshell
+
+WORKDIR /app
+
+COPY --from=gateway-builder /build/out/openshell-server /usr/local/bin/
+
+RUN mkdir -p /build/crates/openshell-server
+COPY crates/openshell-server/migrations /build/crates/openshell-server/migrations
+
+USER openshell
+EXPOSE 8080
+
+ENTRYPOINT ["openshell-server"]
+CMD ["--port", "8080"]
+
+# ---------------------------------------------------------------------------
+# Cluster asset stages
+# ---------------------------------------------------------------------------
+FROM rancher/k3s:${K3S_VERSION} AS k3s
+
+FROM ubuntu:24.04 AS k9s
+ARG K9S_VERSION
+ARG TARGETARCH
+RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && \
+    curl -fsSL "https://github.com/derailed/k9s/releases/download/${K9S_VERSION}/k9s_Linux_${TARGETARCH}.tar.gz" \
+    | tar xz -C /tmp k9s && \
+    chmod +x /tmp/k9s && \
+    rm -rf /var/lib/apt/lists/*
+
+FROM ubuntu:24.04 AS helm
+ARG HELM_VERSION
+ARG TARGETARCH
+RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && \
+    curl -fsSL "https://get.helm.sh/helm-${HELM_VERSION}-linux-${TARGETARCH}.tar.gz" \
+    | tar xz --strip-components=1 -C /tmp "linux-${TARGETARCH}/helm" && \
+    chmod +x /tmp/helm && \
+    rm -rf /var/lib/apt/lists/*
+
+FROM ubuntu:24.04 AS nvidia-toolkit
+ARG NVIDIA_CONTAINER_TOOLKIT_VERSION
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    gpg curl
ca-certificates && \ + curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \ + | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && \ + curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \ + | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \ + | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list && \ + apt-get update && \ + apt-get install -y --no-install-recommends \ + "nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \ + "nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \ + "libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" \ + "libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}" && \ + rm -rf /var/lib/apt/lists/* + +# --------------------------------------------------------------------------- +# Final cluster image +# --------------------------------------------------------------------------- +FROM nvcr.io/nvidia/base/ubuntu:noble-20251013 AS cluster + +RUN apt-get update && apt-get install -y --no-install-recommends \ + ca-certificates \ + iptables \ + mount \ + dnsutils \ + && rm -rf /var/lib/apt/lists/* + +COPY --from=k3s /bin/ /bin/ +COPY --from=k9s /tmp/k9s /usr/local/bin/k9s +COPY --from=helm /tmp/helm /usr/local/bin/helm +COPY --from=k3s /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/k3s-ca-certificates.crt +COPY --from=k3s /usr/share/zoneinfo/ /usr/share/zoneinfo/ + +ENV PATH="/var/lib/rancher/k3s/data/cni:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/bin/aux" \ + CRI_CONFIG_FILE="/var/lib/rancher/k3s/agent/etc/crictl.yaml" + +COPY --from=nvidia-toolkit /usr/bin/nvidia-cdi-hook /usr/bin/ +COPY --from=nvidia-toolkit /usr/bin/nvidia-container-runtime /usr/bin/ +COPY --from=nvidia-toolkit /usr/bin/nvidia-container-runtime-hook /usr/bin/ +COPY --from=nvidia-toolkit /usr/bin/nvidia-container-cli /usr/bin/ +COPY --from=nvidia-toolkit 
/usr/bin/nvidia-ctk /usr/bin/ +COPY --from=nvidia-toolkit /etc/nvidia-container-runtime /etc/nvidia-container-runtime +COPY --from=nvidia-toolkit /usr/lib/*-linux-gnu/libnvidia-container*.so* /usr/lib/ +COPY --from=supervisor-builder /build/out/openshell-sandbox /opt/openshell/bin/openshell-sandbox + +RUN mkdir -p /var/lib/rancher/k3s/server/manifests \ + /var/lib/rancher/k3s/server/static/charts \ + /etc/rancher/k3s \ + /opt/openshell/manifests \ + /opt/openshell/charts \ + /opt/openshell/gpu-manifests \ + /run/flannel + +COPY deploy/docker/cluster-entrypoint.sh /usr/local/bin/cluster-entrypoint.sh +RUN chmod +x /usr/local/bin/cluster-entrypoint.sh + +COPY deploy/docker/cluster-healthcheck.sh /usr/local/bin/cluster-healthcheck.sh +RUN chmod +x /usr/local/bin/cluster-healthcheck.sh + +COPY deploy/docker/.build/charts/*.tgz /opt/openshell/charts/ +COPY deploy/kube/manifests/*.yaml /opt/openshell/manifests/ +COPY deploy/kube/gpu-manifests/*.yaml /opt/openshell/gpu-manifests/ + +ENTRYPOINT ["/usr/local/bin/cluster-entrypoint.sh"] + +HEALTHCHECK --interval=5s --timeout=5s --start-period=20s --retries=60 \ + CMD ["/usr/local/bin/cluster-healthcheck.sh"] diff --git a/tasks/scripts/cluster-deploy-fast.sh b/tasks/scripts/cluster-deploy-fast.sh index 213c2e25..1b2991da 100755 --- a/tasks/scripts/cluster-deploy-fast.sh +++ b/tasks/scripts/cluster-deploy-fast.sh @@ -149,13 +149,13 @@ matches_gateway() { Cargo.toml|Cargo.lock|proto/*|deploy/docker/cross-build.sh) return 0 ;; - crates/openshell-core/*|crates/openshell-providers/*) + crates/openshell-core/*|crates/openshell-policy/*|crates/openshell-providers/*) return 0 ;; crates/openshell-router/*) return 0 ;; - crates/openshell-server/*|deploy/docker/Dockerfile.gateway) + crates/openshell-server/*|deploy/docker/Dockerfile.images) return 0 ;; *) @@ -173,7 +173,7 @@ matches_supervisor() { crates/openshell-core/*|crates/openshell-policy/*|crates/openshell-router/*) return 0 ;; - crates/openshell-sandbox/*) + 
crates/openshell-sandbox/*|deploy/docker/Dockerfile.images) return 0 ;; *) @@ -206,7 +206,7 @@ compute_fingerprint() { local committed_trees="" case "${component}" in gateway) - committed_trees=$(git ls-tree HEAD Cargo.toml Cargo.lock proto/ deploy/docker/cross-build.sh crates/openshell-core/ crates/openshell-providers/ crates/openshell-router/ crates/openshell-server/ deploy/docker/Dockerfile.gateway 2>/dev/null || true) + committed_trees=$(git ls-tree HEAD Cargo.toml Cargo.lock proto/ deploy/docker/cross-build.sh crates/openshell-core/ crates/openshell-policy/ crates/openshell-providers/ crates/openshell-router/ crates/openshell-server/ deploy/docker/Dockerfile.images 2>/dev/null || true) ;; supervisor) committed_trees=$(git ls-tree HEAD Cargo.toml Cargo.lock proto/ deploy/docker/cross-build.sh crates/openshell-core/ crates/openshell-policy/ crates/openshell-router/ crates/openshell-sandbox/ 2>/dev/null || true) @@ -315,32 +315,21 @@ if [[ "${build_supervisor}" == "1" ]]; then _cluster_image=$(docker inspect --format '{{.Config.Image}}' "${CONTAINER_NAME}" 2>/dev/null) CLUSTER_ARCH=$(docker image inspect --format '{{.Architecture}}' "${_cluster_image}" 2>/dev/null || echo "amd64") - # Build the supervisor binary using docker buildx with a lightweight build. - # We use the same cross-build.sh helpers as the full cluster image but only - # compile openshell-sandbox, then extract the binary via --output. + # Build the supervisor binary from the shared image build graph, then + # extract it via --output so fast deploys reuse the same Rust cache. SUPERVISOR_BUILD_DIR=$(mktemp -d) trap 'rm -rf "${SUPERVISOR_BUILD_DIR}"' EXIT # Compute cargo version from git tags for the supervisor binary. 
-    SUPERVISOR_VERSION_ARGS=()
-    if [[ -n "${OPENSHELL_CARGO_VERSION:-}" ]]; then
-        SUPERVISOR_VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${OPENSHELL_CARGO_VERSION}")
-    else
+    _cargo_version=${OPENSHELL_CARGO_VERSION:-}
+    if [[ -z "${_cargo_version}" ]]; then
         _cargo_version=$(uv run python tasks/scripts/release.py get-version --cargo 2>/dev/null || true)
-        if [[ -n "${_cargo_version}" ]]; then
-            SUPERVISOR_VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${_cargo_version}")
-        fi
     fi

-    docker buildx build \
-        --file deploy/docker/Dockerfile.cluster \
-        --target supervisor-builder \
-        --build-arg "BUILDARCH=$(docker version --format '{{.Server.Arch}}')" \
-        --build-arg "TARGETARCH=${CLUSTER_ARCH}" \
-        ${SUPERVISOR_VERSION_ARGS[@]+"${SUPERVISOR_VERSION_ARGS[@]}"} \
-        --output "type=local,dest=${SUPERVISOR_BUILD_DIR}" \
-        --platform "linux/${CLUSTER_ARCH}" \
-        .
+    DOCKER_PLATFORM="linux/${CLUSTER_ARCH}" \
+        DOCKER_OUTPUT="type=local,dest=${SUPERVISOR_BUILD_DIR}" \
+        OPENSHELL_CARGO_VERSION="${_cargo_version}" \
+        tasks/scripts/docker-build-image.sh supervisor-builder

     # Copy the built binary into the running k3s container
     docker exec "${CONTAINER_NAME}" mkdir -p /opt/openshell/bin
diff --git a/tasks/scripts/docker-build-cluster.sh b/tasks/scripts/docker-build-cluster.sh
index 80dc2a48..425d8e75 100755
--- a/tasks/scripts/docker-build-cluster.sh
+++ b/tasks/scripts/docker-build-cluster.sh
@@ -3,89 +3,12 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0

-# Build the k3s cluster image with bundled helm charts.
-#
-# Environment:
-#   IMAGE_TAG - Image tag (default: dev)
-#   K3S_VERSION - k3s version override (optional; default in Dockerfile.cluster)
-
-#   DOCKER_PLATFORM - Target platform (optional)
-#   DOCKER_BUILDER - Buildx builder name (default: auto-select)
-#   DOCKER_PUSH - When set to "1", push instead of loading into local daemon
-#   IMAGE_REGISTRY - Registry prefix for image name (e.g. ghcr.io/org/repo)
 set -euo pipefail

-IMAGE_TAG=${IMAGE_TAG:-dev}
-IMAGE_NAME="openshell/cluster"
-if [[ -n "${IMAGE_REGISTRY:-}" ]]; then
-    IMAGE_NAME="${IMAGE_REGISTRY}/cluster"
-fi
-DOCKER_BUILD_CACHE_DIR=${DOCKER_BUILD_CACHE_DIR:-.cache/buildkit}
-CACHE_PATH="${DOCKER_BUILD_CACHE_DIR}/cluster"
-
-mkdir -p "${CACHE_PATH}"
-
-# Select builder — prefer native "docker" driver for local single-arch builds
-# to avoid slow tarball export from the docker-container driver.
-BUILDER_ARGS=()
-if [[ -n "${DOCKER_BUILDER:-}" ]]; then
-    BUILDER_ARGS=(--builder "${DOCKER_BUILDER}")
-elif [[ -z "${DOCKER_PLATFORM:-}" && -z "${CI:-}" ]]; then
-    _ctx=$(docker context inspect --format '{{.Name}}' 2>/dev/null || echo default)
-    BUILDER_ARGS=(--builder "${_ctx}")
-fi
-
-CACHE_ARGS=()
-if [[ -z "${CI:-}" ]]; then
-    # Local development: use filesystem cache with docker-container driver.
-    if docker buildx inspect ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} 2>/dev/null | grep -q "Driver: docker-container"; then
-        CACHE_ARGS=(
-            --cache-from "type=local,src=${CACHE_PATH}"
-            --cache-to "type=local,dest=${CACHE_PATH},mode=max"
-        )
-    fi
-fi
-
-# Create build directory for charts
 mkdir -p deploy/docker/.build/charts

-# Package helm chart
 echo "Packaging helm chart..."
 helm package deploy/helm/openshell -d deploy/docker/.build/charts/

-# Build cluster image (no bundled component images — they are pulled at runtime
-# from the distribution registry; credentials are injected at deploy time)
 echo "Building cluster image..."
-
-OUTPUT_FLAG="--load"
-if [[ "${DOCKER_PUSH:-}" == "1" ]]; then
-    OUTPUT_FLAG="--push"
-elif [[ "${DOCKER_PLATFORM:-}" == *","* ]]; then
-    # Multi-platform builds cannot use --load; push is required.
-    OUTPUT_FLAG="--push"
-fi
-
-# Compute cargo version from git tags (same scheme as docker-build-component.sh).
-VERSION_ARGS=()
-if [[ -n "${OPENSHELL_CARGO_VERSION:-}" ]]; then
-    VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${OPENSHELL_CARGO_VERSION}")
-else
-    CARGO_VERSION=$(uv run python tasks/scripts/release.py get-version --cargo 2>/dev/null || true)
-    if [[ -n "${CARGO_VERSION}" ]]; then
-        VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${CARGO_VERSION}")
-    fi
-fi
-
-docker buildx build \
-    ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} \
-    ${DOCKER_PLATFORM:+--platform ${DOCKER_PLATFORM}} \
-    ${CACHE_ARGS[@]+"${CACHE_ARGS[@]}"} \
-    ${VERSION_ARGS[@]+"${VERSION_ARGS[@]}"} \
-    -f deploy/docker/Dockerfile.cluster \
-    -t ${IMAGE_NAME}:${IMAGE_TAG} \
-    ${K3S_VERSION:+--build-arg K3S_VERSION=${K3S_VERSION}} \
-    --provenance=false \
-    ${OUTPUT_FLAG} \
-    .
-
-echo "Done! Cluster image: ${IMAGE_NAME}:${IMAGE_TAG}"
+exec tasks/scripts/docker-build-image.sh cluster "$@"
diff --git a/tasks/scripts/docker-build-component.sh b/tasks/scripts/docker-build-component.sh
index f20d7295..312e5ac0 100755
--- a/tasks/scripts/docker-build-component.sh
+++ b/tasks/scripts/docker-build-component.sh
@@ -3,159 +3,35 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0

-# Generic Docker image builder for OpenShell components.
-# Usage: docker-build-component.sh [extra docker build args...]
-#
-#   docker-build-component.sh gateway -> Dockerfile.gateway -> openshell/gateway:dev
-#   docker-build-component.sh cluster -> Dockerfile.cluster -> openshell/cluster:dev
-#
-# Environment:
-#   IMAGE_TAG - Image tag (default: dev)
-#   DOCKER_PLATFORM - Target platform (optional, e.g. linux/amd64)
-#   DOCKER_BUILDER - Buildx builder name (default: auto-select)
-#   DOCKER_PUSH - When set to "1", push instead of loading into local daemon
-#   IMAGE_REGISTRY - Registry prefix for image name (e.g. ghcr.io/org/repo)
 set -euo pipefail

-sha256_16() {
-    if command -v sha256sum >/dev/null 2>&1; then
-        sha256sum "$1" | awk '{print substr($1, 1, 16)}'
-    else
-        shasum -a 256 "$1" | awk '{print substr($1, 1, 16)}'
-    fi
-}
-
-sha256_16_stdin() {
-    if command -v sha256sum >/dev/null 2>&1; then
-        sha256sum | awk '{print substr($1, 1, 16)}'
-    else
-        shasum -a 256 | awk '{print substr($1, 1, 16)}'
-    fi
-}
-
-detect_rust_scope() {
-    local dockerfile="$1"
-    local rust_from
-    rust_from=$(grep -E '^FROM --platform=\$BUILDPLATFORM rust:[^ ]+' "$dockerfile" | head -n1 | sed -E 's/^FROM --platform=\$BUILDPLATFORM rust:([^ ]+).*/\1/' || true)
-    if [[ -n "${rust_from}" ]]; then
-        echo "rust-${rust_from}"
-        return
-    fi
-
-    if grep -q "rustup.rs" "$dockerfile"; then
-        echo "rustup-stable"
-        return
-    fi
-
-    echo "no-rust"
-}
-
-COMPONENT=${1:?"Usage: docker-build-component.sh [variant] [extra-args...]"}
+COMPONENT=${1:?"Usage: docker-build-component.sh [extra-args...]"}
 shift

-# Resolve Dockerfile path and image name.
-# If the component has a subdirectory layout, consume the next positional arg
-# as a variant name (default: base).
-VARIANT=""
-COMPONENT_DIR="deploy/docker/${COMPONENT}"
-if [[ -d "${COMPONENT_DIR}" ]]; then
-    # Subdirectory layout — check for a variant argument.
-    if [[ $# -gt 0 && ! "$1" == --* ]]; then
-        VARIANT="$1"
-        shift
-    fi
-    VARIANT=${VARIANT:-base}
-    DOCKERFILE="${COMPONENT_DIR}/Dockerfile.${VARIANT}"
-    if [[ "${VARIANT}" == "base" ]]; then
-        IMAGE_NAME="openshell/${COMPONENT}"
-    else
-        IMAGE_NAME="openshell/${COMPONENT}-${VARIANT}"
-    fi
-else
-    # Flat layout: deploy/docker/Dockerfile.
-    DOCKERFILE="deploy/docker/Dockerfile.${COMPONENT}"
-    IMAGE_NAME="openshell/${COMPONENT}"
-fi
-
-if [[ ! -f "${DOCKERFILE}" ]]; then
-    echo "Error: Dockerfile not found: ${DOCKERFILE}" >&2
-    exit 1
-fi
-
-# Prefix with registry when set (e.g. ghcr.io/org/repo/gateway:tag).
-# Replaces the default "openshell/" prefix with the registry path.
-if [[ -n "${IMAGE_REGISTRY:-}" ]]; then
-    _suffix="${IMAGE_NAME#openshell/}"
-    IMAGE_NAME="${IMAGE_REGISTRY}/${_suffix}"
-fi
-
-IMAGE_TAG=${IMAGE_TAG:-dev}
-DOCKER_BUILD_CACHE_DIR=${DOCKER_BUILD_CACHE_DIR:-.cache/buildkit}
-CACHE_PATH="${DOCKER_BUILD_CACHE_DIR}/${COMPONENT}${VARIANT:+-${VARIANT}}"
-
-mkdir -p "${CACHE_PATH}"
-
-# Select the builder. For local (single-arch) builds use a builder with the
-# native "docker" driver so images land directly in the Docker image store —
-# no slow tarball export via the docker-container driver.
-# Multi-platform builds (DOCKER_PLATFORM set) keep the current builder which
-# is typically docker-container.
-BUILDER_ARGS=()
-if [[ -n "${DOCKER_BUILDER:-}" ]]; then
-    BUILDER_ARGS=(--builder "${DOCKER_BUILDER}")
-elif [[ -z "${DOCKER_PLATFORM:-}" && -z "${CI:-}" ]]; then
-    # Pick the builder matching the active docker context (uses docker driver).
-    _ctx=$(docker context inspect --format '{{.Name}}' 2>/dev/null || echo default)
-    BUILDER_ARGS=(--builder "${_ctx}")
-fi
-
-CACHE_ARGS=()
-if [[ -z "${CI:-}" ]]; then
-    # Local development: use filesystem cache with docker-container driver.
-    if docker buildx inspect ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} 2>/dev/null | grep -q "Driver: docker-container"; then
-        CACHE_ARGS=(
-            --cache-from "type=local,src=${CACHE_PATH}"
-            --cache-to "type=local,dest=${CACHE_PATH},mode=max"
-        )
-    fi
-fi
-
-OUTPUT_FLAG="--load"
-if [[ "${DOCKER_PUSH:-}" == "1" ]]; then
-    OUTPUT_FLAG="--push"
-elif [[ "${DOCKER_PLATFORM:-}" == *","* ]]; then
-    # Multi-platform builds cannot use --load; push is required.
-    OUTPUT_FLAG="--push"
-fi
-
-SCCACHE_ARGS=()
-if [[ -n "${SCCACHE_MEMCACHED_ENDPOINT:-}" ]]; then
-    SCCACHE_ARGS=(--build-arg "SCCACHE_MEMCACHED_ENDPOINT=${SCCACHE_MEMCACHED_ENDPOINT}")
-fi
-
-VERSION_ARGS=()
-if [[ -n "${OPENSHELL_CARGO_VERSION:-}" ]]; then
-    VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${OPENSHELL_CARGO_VERSION}")
-elif [[ "${COMPONENT}" == "gateway" ]]; then
-    CARGO_VERSION=$(uv run python tasks/scripts/release.py get-version --cargo)
-    VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${CARGO_VERSION}")
-fi
-
-LOCK_HASH=$(sha256_16 Cargo.lock)
-RUST_SCOPE=${RUST_TOOLCHAIN_SCOPE:-$(detect_rust_scope "${DOCKERFILE}")}
-CACHE_SCOPE_INPUT="v1|${COMPONENT}|${VARIANT:-base}|${LOCK_HASH}|${RUST_SCOPE}"
-CARGO_TARGET_CACHE_SCOPE=$(printf '%s' "${CACHE_SCOPE_INPUT}" | sha256_16_stdin)
-
-docker buildx build \
-    ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} \
-    ${DOCKER_PLATFORM:+--platform ${DOCKER_PLATFORM}} \
-    ${CACHE_ARGS[@]+"${CACHE_ARGS[@]}"} \
-    ${SCCACHE_ARGS[@]+"${SCCACHE_ARGS[@]}"} \
-    ${VERSION_ARGS[@]+"${VERSION_ARGS[@]}"} \
-    --build-arg "CARGO_TARGET_CACHE_SCOPE=${CARGO_TARGET_CACHE_SCOPE}" \
-    -f "${DOCKERFILE}" \
-    -t "${IMAGE_NAME}:${IMAGE_TAG}" \
-    --provenance=false \
-    "$@" \
-    ${OUTPUT_FLAG} \
-    .
+case "${COMPONENT}" in
+    gateway)
+        exec tasks/scripts/docker-build-image.sh gateway "$@"
+        ;;
+    ci)
+        OUTPUT_ARGS=(--load)
+        if [[ "${DOCKER_PUSH:-}" == "1" ]]; then
+            OUTPUT_ARGS=(--push)
+        elif [[ "${DOCKER_PLATFORM:-}" == *","* ]]; then
+            OUTPUT_ARGS=(--push)
+        fi
+
+        exec docker buildx build \
+            ${DOCKER_BUILDER:+--builder ${DOCKER_BUILDER}} \
+            ${DOCKER_PLATFORM:+--platform ${DOCKER_PLATFORM}} \
+            -f deploy/docker/Dockerfile.ci \
+            -t "openshell/ci:${IMAGE_TAG:-dev}" \
+            --provenance=false \
+            "$@" \
+            ${OUTPUT_ARGS[@]+"${OUTPUT_ARGS[@]}"} \
+            .
+        ;;
+    *)
+        echo "Error: unsupported component '${COMPONENT}'" >&2
+        exit 1
+        ;;
+esac
diff --git a/tasks/scripts/docker-build-image.sh b/tasks/scripts/docker-build-image.sh
new file mode 100755
index 00000000..80f36786
--- /dev/null
+++ b/tasks/scripts/docker-build-image.sh
@@ -0,0 +1,160 @@
+#!/usr/bin/env bash
+
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+set -euo pipefail
+
+sha256_16() {
+    if command -v sha256sum >/dev/null 2>&1; then
+        sha256sum "$1" | awk '{print substr($1, 1, 16)}'
+    else
+        shasum -a 256 "$1" | awk '{print substr($1, 1, 16)}'
+    fi
+}
+
+sha256_16_stdin() {
+    if command -v sha256sum >/dev/null 2>&1; then
+        sha256sum | awk '{print substr($1, 1, 16)}'
+    else
+        shasum -a 256 | awk '{print substr($1, 1, 16)}'
+    fi
+}
+
+detect_rust_scope() {
+    local dockerfile="$1"
+    local rust_from
+    rust_from=$(grep -E '^FROM --platform=\$BUILDPLATFORM rust:[^ ]+' "$dockerfile" | head -n1 | sed -E 's/^FROM --platform=\$BUILDPLATFORM rust:([^ ]+).*/\1/' || true)
+    if [[ -n "${rust_from}" ]]; then
+        echo "rust-${rust_from}"
+        return
+    fi
+
+    if grep -q "rustup.rs" "$dockerfile"; then
+        echo "rustup-stable"
+        return
+    fi
+
+    echo "no-rust"
+}
+
+TARGET=${1:?"Usage: docker-build-image.sh [extra-args...]"}
+shift
+
+DOCKERFILE="deploy/docker/Dockerfile.images"
+if [[ ! -f "${DOCKERFILE}" ]]; then
+    echo "Error: Dockerfile not found: ${DOCKERFILE}" >&2
+    exit 1
+fi
+
+IS_FINAL_IMAGE=0
+IMAGE_NAME=""
+DOCKER_TARGET=""
+case "${TARGET}" in
+    gateway)
+        IS_FINAL_IMAGE=1
+        IMAGE_NAME="openshell/gateway"
+        DOCKER_TARGET="gateway"
+        ;;
+    cluster)
+        IS_FINAL_IMAGE=1
+        IMAGE_NAME="openshell/cluster"
+        DOCKER_TARGET="cluster"
+        ;;
+    supervisor-builder)
+        DOCKER_TARGET="supervisor-builder"
+        ;;
+    *)
+        echo "Error: unsupported target '${TARGET}'" >&2
+        exit 1
+        ;;
+esac
+
+if [[ -n "${IMAGE_REGISTRY:-}" && "${IS_FINAL_IMAGE}" == "1" ]]; then
+    IMAGE_NAME="${IMAGE_REGISTRY}/${IMAGE_NAME#openshell/}"
+fi
+
+IMAGE_TAG=${IMAGE_TAG:-dev}
+DOCKER_BUILD_CACHE_DIR=${DOCKER_BUILD_CACHE_DIR:-.cache/buildkit}
+CACHE_PATH="${DOCKER_BUILD_CACHE_DIR}/images"
+mkdir -p "${CACHE_PATH}"
+
+BUILDER_ARGS=()
+if [[ -n "${DOCKER_BUILDER:-}" ]]; then
+    BUILDER_ARGS=(--builder "${DOCKER_BUILDER}")
+elif [[ -z "${DOCKER_PLATFORM:-}" && -z "${CI:-}" ]]; then
+    _ctx=$(docker context inspect --format '{{.Name}}' 2>/dev/null || echo default)
+    BUILDER_ARGS=(--builder "${_ctx}")
+fi
+
+CACHE_ARGS=()
+if [[ -z "${CI:-}" ]]; then
+    if docker buildx inspect ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} 2>/dev/null | grep -q "Driver: docker-container"; then
+        CACHE_ARGS=(
+            --cache-from "type=local,src=${CACHE_PATH}"
+            --cache-to "type=local,dest=${CACHE_PATH},mode=max"
+        )
+    fi
+fi
+
+SCCACHE_ARGS=()
+if [[ -n "${SCCACHE_MEMCACHED_ENDPOINT:-}" ]]; then
+    SCCACHE_ARGS=(--build-arg "SCCACHE_MEMCACHED_ENDPOINT=${SCCACHE_MEMCACHED_ENDPOINT}")
+fi
+
+VERSION_ARGS=()
+if [[ -n "${OPENSHELL_CARGO_VERSION:-}" ]]; then
+    VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${OPENSHELL_CARGO_VERSION}")
+else
+    CARGO_VERSION=$(uv run python tasks/scripts/release.py get-version --cargo 2>/dev/null || true)
+    if [[ -n "${CARGO_VERSION}" ]]; then
+        VERSION_ARGS=(--build-arg "OPENSHELL_CARGO_VERSION=${CARGO_VERSION}")
+    fi
+fi
+
+LOCK_HASH=$(sha256_16 Cargo.lock)
+RUST_SCOPE=${RUST_TOOLCHAIN_SCOPE:-$(detect_rust_scope "${DOCKERFILE}")}
+CACHE_SCOPE_INPUT="v2|shared|release|${LOCK_HASH}|${RUST_SCOPE}"
+CARGO_TARGET_CACHE_SCOPE=$(printf '%s' "${CACHE_SCOPE_INPUT}" | sha256_16_stdin)
+
+K3S_ARGS=()
+if [[ "${TARGET}" == "cluster" && -n "${K3S_VERSION:-}" ]]; then
+    K3S_ARGS=(--build-arg "K3S_VERSION=${K3S_VERSION}")
+fi
+
+TAG_ARGS=()
+if [[ "${IS_FINAL_IMAGE}" == "1" ]]; then
+    TAG_ARGS=(-t "${IMAGE_NAME}:${IMAGE_TAG}")
+fi
+
+OUTPUT_ARGS=()
+if [[ -n "${DOCKER_OUTPUT:-}" ]]; then
+    OUTPUT_ARGS=(--output "${DOCKER_OUTPUT}")
+elif [[ "${IS_FINAL_IMAGE}" == "1" ]]; then
+    if [[ "${DOCKER_PUSH:-}" == "1" ]]; then
+        OUTPUT_ARGS=(--push)
+    elif [[ "${DOCKER_PLATFORM:-}" == *","* ]]; then
+        OUTPUT_ARGS=(--push)
+    else
+        OUTPUT_ARGS=(--load)
+    fi
+else
+    echo "Error: DOCKER_OUTPUT must be set when building target '${TARGET}'" >&2
+    exit 1
+fi
+
+docker buildx build \
+    ${BUILDER_ARGS[@]+"${BUILDER_ARGS[@]}"} \
+    ${DOCKER_PLATFORM:+--platform ${DOCKER_PLATFORM}} \
+    ${CACHE_ARGS[@]+"${CACHE_ARGS[@]}"} \
+    ${SCCACHE_ARGS[@]+"${SCCACHE_ARGS[@]}"} \
+    ${VERSION_ARGS[@]+"${VERSION_ARGS[@]}"} \
+    ${K3S_ARGS[@]+"${K3S_ARGS[@]}"} \
+    --build-arg "CARGO_TARGET_CACHE_SCOPE=${CARGO_TARGET_CACHE_SCOPE}" \
+    -f "${DOCKERFILE}" \
+    --target "${DOCKER_TARGET}" \
+    ${TAG_ARGS[@]+"${TAG_ARGS[@]}"} \
+    --provenance=false \
+    "$@" \
+    ${OUTPUT_ARGS[@]+"${OUTPUT_ARGS[@]}"} \
+    .
diff --git a/tasks/scripts/docker-publish-multiarch.sh b/tasks/scripts/docker-publish-multiarch.sh
index 7bb6dc84..6847c90f 100755
--- a/tasks/scripts/docker-publish-multiarch.sh
+++ b/tasks/scripts/docker-publish-multiarch.sh
@@ -3,88 +3,31 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0

-# Unified multi-arch build and push for all OpenShell images.
-#
-# Usage:
-#   docker-publish-multiarch.sh --mode registry   # Push to DOCKER_REGISTRY
-#   docker-publish-multiarch.sh --mode ecr        # Push to ECR
-#
-# Environment:
-#   IMAGE_TAG - Image tag (default: dev)
-#   K3S_VERSION - k3s version override (optional; default in Dockerfile.cluster)
-
-#   DOCKER_PLATFORMS - Target platforms (default: linux/amd64,linux/arm64)
-#   RUST_BUILD_PROFILE - Rust build profile for sandbox (default: release)
-#   TAG_LATEST - If true, add/update :latest tag (default: false)
-#   EXTRA_DOCKER_TAGS - Additional tags to add (comma or space separated)
-#
-# Registry mode env:
-#   DOCKER_REGISTRY - Registry URL (required, e.g. ghcr.io/myorg)
-#
-# ECR mode env:
-#   AWS_ACCOUNT_ID - AWS account ID (default: 012345678901)
-#   AWS_REGION - AWS region (default: us-west-2)
 set -euo pipefail

-sha256_16() {
-    if command -v sha256sum >/dev/null 2>&1; then
-        sha256sum "$1" | awk '{print substr($1, 1, 16)}'
-    else
-        shasum -a 256 "$1" | awk '{print substr($1, 1, 16)}'
-    fi
-}
-
-sha256_16_stdin() {
-    if command -v sha256sum >/dev/null 2>&1; then
-        sha256sum | awk '{print substr($1, 1, 16)}'
-    else
-        shasum -a 256 | awk '{print substr($1, 1, 16)}'
-    fi
-}
-
-detect_rust_scope() {
-    local dockerfile="$1"
-    local rust_from
-    rust_from=$(grep -E '^FROM --platform=\$BUILDPLATFORM rust:[^ ]+' "$dockerfile" | head -n1 | sed -E 's/^FROM --platform=\$BUILDPLATFORM rust:([^ ]+).*/\1/' || true)
-    if [[ -n "${rust_from}" ]]; then
-        echo "rust-${rust_from}"
-        return
-    fi
-
-    if grep -q "rustup.rs" "$dockerfile"; then
-        echo "rustup-stable"
-        return
-    fi
-
-    echo "no-rust"
+usage() {
+    echo "Usage: docker-publish-multiarch.sh --mode " >&2
+    exit 1
 }

-# ---------------------------------------------------------------------------
-# Parse arguments
-# ---------------------------------------------------------------------------
 MODE=""
 while [[ $# -gt 0 ]]; do
-    case $1 in
-        --mode) MODE="$2"; shift 2 ;;
-        *) echo "Unknown argument: $1" >&2; exit 1 ;;
+    case "$1" in
+        --mode)
+            MODE="$2"
+            shift 2
+            ;;
+        *)
+            echo "Unknown argument: $1" >&2
+            usage
+            ;;
     esac
 done

-if [[ -z "$MODE" ]]; then
-    echo "Usage: docker-publish-multiarch.sh --mode " >&2
-    exit 1
-fi
+[[ -n "${MODE}" ]] || usage

-# ---------------------------------------------------------------------------
-# Common variables
-# ---------------------------------------------------------------------------
 IMAGE_TAG=${IMAGE_TAG:-dev}
 PLATFORMS=${DOCKER_PLATFORMS:-linux/amd64,linux/arm64}
-CARGO_VERSION=${OPENSHELL_CARGO_VERSION:-}
-if [[ -z "${CARGO_VERSION}" ]]; then
-    CARGO_VERSION=$(uv run python tasks/scripts/release.py get-version --cargo)
-fi
-EXTRA_BUILD_FLAGS=""
 TAG_LATEST=${TAG_LATEST:-false}
 EXTRA_DOCKER_TAGS_RAW=${EXTRA_DOCKER_TAGS:-}
 EXTRA_TAGS=()
@@ -92,39 +35,25 @@ EXTRA_TAGS=()
 if [[ -n "${EXTRA_DOCKER_TAGS_RAW}" ]]; then
     EXTRA_DOCKER_TAGS_RAW=${EXTRA_DOCKER_TAGS_RAW//,/ }
     for tag in ${EXTRA_DOCKER_TAGS_RAW}; do
-        if [[ -n "${tag}" ]]; then
-            EXTRA_TAGS+=("${tag}")
-        fi
+        [[ -n "${tag}" ]] && EXTRA_TAGS+=("${tag}")
     done
 fi

-# ---------------------------------------------------------------------------
-# Mode-specific configuration
-# ---------------------------------------------------------------------------
-case "$MODE" in
+case "${MODE}" in
     registry)
         REGISTRY=${DOCKER_REGISTRY:?Set DOCKER_REGISTRY to push multi-arch images (e.g. ghcr.io/myorg)}
-        IMAGE_PREFIX="openshell-"
         ;;
     ecr)
         AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID:-012345678901}
         AWS_REGION=${AWS_REGION:-us-west-2}
-        ECR_HOST="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
-        REGISTRY="${ECR_HOST}/openshell"
-        IMAGE_PREFIX=""
-        EXTRA_BUILD_FLAGS="--provenance=false --sbom=false"
+        REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/openshell"
         ;;
     *)
-        echo "Unknown mode: $MODE (expected 'registry' or 'ecr')" >&2
-        exit 1
+        echo "Unknown mode: ${MODE}" >&2
+        usage
         ;;
 esac

-# ---------------------------------------------------------------------------
-# Select or create a multi-platform buildx builder.
-# If DOCKER_BUILDER is set (e.g. by CI with remote BuildKit nodes), use it.
-# Otherwise fall back to creating a local "multiarch" builder.
-# ---------------------------------------------------------------------------
 BUILDER_NAME=${DOCKER_BUILDER:-multiarch}
 if docker buildx inspect "${BUILDER_NAME}" >/dev/null 2>&1; then
     echo "Using existing buildx builder: ${BUILDER_NAME}"
@@ -134,109 +63,48 @@ else
     docker buildx create --name "${BUILDER_NAME}" --use --bootstrap
 fi

-# ---------------------------------------------------------------------------
-# Resolve Dockerfile path for a component.
-# ---------------------------------------------------------------------------
-resolve_dockerfile() {
-    local comp="$1"
-    echo "deploy/docker/Dockerfile.${comp}"
-}
+export DOCKER_BUILDER="${BUILDER_NAME}"
+export DOCKER_PLATFORM="${PLATFORMS}"
+export DOCKER_PUSH=1
+export IMAGE_REGISTRY="${REGISTRY}"

-# ---------------------------------------------------------------------------
-# Step 1: Build and push the gateway image as a multi-arch manifest.
-# Uses cross-compilation in the Dockerfile (BUILDPLATFORM != TARGETPLATFORM)
-# so Rust compiles natively and only the final stage runs on the target arch.
-# Sandbox images are maintained in the community repo and not built here.
-# ---------------------------------------------------------------------------
 echo "Building multi-arch gateway image..."
-LOCK_HASH=$(sha256_16 Cargo.lock)
-GATEWAY_DOCKERFILE=$(resolve_dockerfile "gateway")
-BUILD_ARGS="--build-arg OPENSHELL_CARGO_VERSION=${CARGO_VERSION}"
-if [ -n "${SCCACHE_MEMCACHED_ENDPOINT:-}" ]; then
-    BUILD_ARGS="${BUILD_ARGS} --build-arg SCCACHE_MEMCACHED_ENDPOINT=${SCCACHE_MEMCACHED_ENDPOINT}"
-fi
-RUST_SCOPE=${RUST_TOOLCHAIN_SCOPE:-$(detect_rust_scope "${GATEWAY_DOCKERFILE}")}
-CACHE_SCOPE_INPUT="v1|gateway|base|${LOCK_HASH}|${RUST_SCOPE}"
-CARGO_TARGET_CACHE_SCOPE=$(printf '%s' "${CACHE_SCOPE_INPUT}" | sha256_16_stdin)
-BUILD_ARGS="${BUILD_ARGS} --build-arg CARGO_TARGET_CACHE_SCOPE=${CARGO_TARGET_CACHE_SCOPE}"
-FULL_IMAGE="${REGISTRY}/${IMAGE_PREFIX}gateway"
-docker buildx build \
-    --platform "${PLATFORMS}" \
-    -f "${GATEWAY_DOCKERFILE}" \
-    -t "${FULL_IMAGE}:${IMAGE_TAG}" \
-    ${EXTRA_BUILD_FLAGS} \
-    ${BUILD_ARGS} \
-    --push \
-    .
-
-# ---------------------------------------------------------------------------
-# Step 2: Package helm charts (architecture-independent)
-# ---------------------------------------------------------------------------
+tasks/scripts/docker-build-image.sh gateway
+
 mkdir -p deploy/docker/.build/charts
 echo "Packaging helm chart..."
 helm package deploy/helm/openshell -d deploy/docker/.build/charts/

-# ---------------------------------------------------------------------------
-# Step 3: Build and push multi-arch cluster image.
-# The cluster image includes the supervisor binary (built from Rust source)
-# and k3s. Gateway images are pulled at runtime from the registry; sandbox
-# images are pulled from the community registry.
-# ---------------------------------------------------------------------------
-echo ""
+echo
 echo "Building multi-arch cluster image..."
-CLUSTER_DOCKERFILE="deploy/docker/Dockerfile.cluster"
-CLUSTER_RUST_SCOPE=${RUST_TOOLCHAIN_SCOPE:-$(detect_rust_scope "${CLUSTER_DOCKERFILE}")}
-CLUSTER_CACHE_SCOPE_INPUT="v1|cluster|base|${LOCK_HASH}|${CLUSTER_RUST_SCOPE}"
-CLUSTER_CARGO_SCOPE=$(printf '%s' "${CLUSTER_CACHE_SCOPE_INPUT}" | sha256_16_stdin)
-CLUSTER_BUILD_ARGS=""
-if [ -n "${SCCACHE_MEMCACHED_ENDPOINT:-}" ]; then
-  CLUSTER_BUILD_ARGS="--build-arg SCCACHE_MEMCACHED_ENDPOINT=${SCCACHE_MEMCACHED_ENDPOINT}"
-fi
-CLUSTER_IMAGE="${REGISTRY}/${IMAGE_PREFIX:+${IMAGE_PREFIX}}cluster"
-docker buildx build \
-  --platform "${PLATFORMS}" \
-  -f "${CLUSTER_DOCKERFILE}" \
-  -t "${CLUSTER_IMAGE}:${IMAGE_TAG}" \
-  ${K3S_VERSION:+--build-arg K3S_VERSION=${K3S_VERSION}} \
-  --build-arg "CARGO_TARGET_CACHE_SCOPE=${CLUSTER_CARGO_SCOPE}" \
-  ${CLUSTER_BUILD_ARGS} \
-  ${EXTRA_BUILD_FLAGS} \
-  --push \
-  .
-
-# ---------------------------------------------------------------------------
-# Step 4: Apply additional tags by copying manifests.
-# Use --prefer-index=false to carbon-copy the source manifest format instead of
-# wrapping it in an OCI image index (which the registry v3 proxy can't serve).
-# ---------------------------------------------------------------------------
+tasks/scripts/docker-build-image.sh cluster
+
 TAGS_TO_APPLY=("${EXTRA_TAGS[@]}")
-if [ "$TAG_LATEST" = true ]; then
+if [[ "${TAG_LATEST}" == "true" ]]; then
   TAGS_TO_APPLY+=("latest")
 fi
-if [ ${#TAGS_TO_APPLY[@]} -gt 0 ]; then
+if [[ ${#TAGS_TO_APPLY[@]} -gt 0 ]]; then
   for component in gateway cluster; do
-    FULL_IMAGE="${REGISTRY}/${IMAGE_PREFIX:+${IMAGE_PREFIX}}${component}"
+    full_image="${REGISTRY}/${component}"
     for tag in "${TAGS_TO_APPLY[@]}"; do
-      if [ "${tag}" = "${IMAGE_TAG}" ]; then
-        continue
-      fi
-      echo "Tagging ${FULL_IMAGE}:${tag}..."
+      [[ "${tag}" == "${IMAGE_TAG}" ]] && continue
+      echo "Tagging ${full_image}:${tag}..."
       docker buildx imagetools create \
         --prefer-index=false \
-        -t "${FULL_IMAGE}:${tag}" \
-        "${FULL_IMAGE}:${IMAGE_TAG}"
+        -t "${full_image}:${tag}" \
+        "${full_image}:${IMAGE_TAG}"
     done
   done
 fi
 
-echo ""
+echo
 echo "Done! Multi-arch images pushed to ${REGISTRY}:"
-echo " ${REGISTRY}/${IMAGE_PREFIX}gateway:${IMAGE_TAG}"
-echo " ${REGISTRY}/${IMAGE_PREFIX:+${IMAGE_PREFIX}}cluster:${IMAGE_TAG}"
-if [ "$TAG_LATEST" = true ]; then
+echo " ${REGISTRY}/gateway:${IMAGE_TAG}"
+echo " ${REGISTRY}/cluster:${IMAGE_TAG}"
+if [[ "${TAG_LATEST}" == "true" ]]; then
   echo " (all also tagged :latest)"
 fi
-if [ ${#EXTRA_TAGS[@]} -gt 0 ]; then
+if [[ ${#EXTRA_TAGS[@]} -gt 0 ]]; then
   echo " (all also tagged: ${EXTRA_TAGS[*]})"
 fi
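For review purposes, the tag-selection rule used by the new tagging loop in `docker-publish-multiarch.sh` can be isolated as a pure Bash function. The loop starts from `EXTRA_TAGS`, appends `latest` when `TAG_LATEST` is `"true"`, and skips any tag equal to `IMAGE_TAG`, since that tag was already pushed by `docker-build-image.sh`. This is a minimal sketch, not part of the patch: the function name `tags_to_apply` and its argument order are hypothetical, and it only prints the tags that would be passed to `docker buildx imagetools create`, without contacting a registry.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the patch) that mirrors the TAGS_TO_APPLY logic.
# Usage: tags_to_apply IMAGE_TAG TAG_LATEST [EXTRA_TAG...]
# Prints, one per line, every tag that would be copied from IMAGE_TAG.
tags_to_apply() {
  local image_tag="$1" tag_latest="$2"
  shift 2
  # Start from the extra tags, like TAGS_TO_APPLY=("${EXTRA_TAGS[@]}").
  local tags=("$@")
  # Append "latest" only when TAG_LATEST is the literal string "true".
  if [[ "${tag_latest}" == "true" ]]; then
    tags+=("latest")
  fi
  local tag
  for tag in "${tags[@]}"; do
    # IMAGE_TAG was already pushed by the build step, so re-tagging it is skipped.
    [[ "${tag}" == "${image_tag}" ]] && continue
    printf '%s\n' "${tag}"
  done
}
```

For example, `tags_to_apply v1.2.3 true v1.2.3 stable` prints `stable` and then `latest`: the duplicate `v1.2.3` is dropped because it matches the tag the build already pushed.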