
fix(cluster): detect stale sandbox base image in local cluster #335

@johntmyers

Bug Report

Agent Diagnostic

During development of #268 (bypass detection), the sandbox base image was updated in the openshell-community repo to include iptables (OpenShell-Community#36). Although the upstream image was rebuilt and pushed to ghcr.io, the local k3s cluster kept using its cached copy, which lacks iptables, so bypass detection silently degraded.

The only workaround was to evict the cached image manually:

    openshell doctor exec -- crictl rmi ghcr.io/nvidia/openshell-community/sandboxes/base:latest

Expected Behavior

The local cluster should detect when base:latest (or any :latest-tagged sandbox image) has a newer digest available upstream, and either:

  1. Automatically re-pull on sandbox creation (preferred for :latest tags)
  2. Warn the user that the cached image is stale
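The two options above can be combined into a single decision. The sketch below makes assumptions that are not in the issue: the cached and upstream digests are already known (how the upstream digest is fetched is out of scope), :latest images are re-pulled automatically, and any other tag only produces a warning. StaleImageAction and stale_image_action are hypothetical names.

```rust
/// Hypothetical outcome of comparing a cached sandbox image with the registry.
#[derive(Debug, PartialEq, Eq)]
enum StaleImageAction {
    /// Digests match, or the registry could not be reached: keep the cache.
    UseCached,
    /// Option 1: re-pull before creating the sandbox.
    Repull,
    /// Option 2: keep the cache but tell the user it is stale.
    Warn { cached: String, upstream: String },
}

/// Hypothetical helper. `upstream_digest` is whatever the registry currently
/// reports for the tag, or None if the lookup failed (assumed to come from
/// elsewhere; no registry client is shown here).
fn stale_image_action(
    is_latest_tag: bool,
    cached_digest: &str,
    upstream_digest: Option<&str>,
) -> StaleImageAction {
    match upstream_digest {
        // Registry unreachable: fall back to the cached copy.
        None => StaleImageAction::UseCached,
        // Same content upstream and locally: nothing to do.
        Some(up) if up == cached_digest => StaleImageAction::UseCached,
        // Newer digest upstream: re-pull :latest automatically (preferred).
        Some(_) if is_latest_tag => StaleImageAction::Repull,
        // Newer digest for any other tag: only warn (an assumed policy).
        Some(up) => StaleImageAction::Warn {
            cached: cached_digest.to_string(),
            upstream: up.to_string(),
        },
    }
}

fn main() {
    // base:latest was rebuilt upstream, so its digest no longer matches.
    let action = stale_image_action(true, "sha256:aaa", Some("sha256:bbb"));
    println!("{action:?}");
}
```

Keeping the registry lookup outside the function makes the policy easy to unit-test and lets a failed lookup degrade to the current behavior instead of blocking sandbox creation.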

Current Behavior

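k3s caches the image after the first pull and never re-checks the registry for updates to the :latest tag. Sandbox pods do not use imagePullPolicy: Always, so kubelet keeps starting containers from the cached copy. Because Kubernetes defaults the policy to Always when the field is omitted and the tag is :latest, the sandbox pod spec presumably sets IfNotPresent explicitly.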
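A simplified model of how kubelet treats the three imagePullPolicy values, following the Kubernetes documentation, shows why a cached :latest image is never refreshed under IfNotPresent and why Always does not re-download an unchanged image. The type and function names are hypothetical, and the registry lookup is stubbed out as a closure.

```rust
/// The three Kubernetes imagePullPolicy values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PullPolicy {
    Always,
    IfNotPresent,
    Never,
}

/// What kubelet does with the image before starting the container.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    UseCached,
    Pull,
    Fail,
}

/// `local_digest` is the digest of the cached image (None if absent);
/// `resolve_remote` stands in for asking the registry what the tag points to.
fn decide(
    policy: PullPolicy,
    local_digest: Option<&str>,
    resolve_remote: impl Fn() -> String,
) -> Decision {
    match (policy, local_digest) {
        // IfNotPresent: any cached copy wins, however old. This is the bug.
        (PullPolicy::IfNotPresent, Some(_)) => Decision::UseCached,
        (PullPolicy::IfNotPresent, None) => Decision::Pull,
        // Never: only a cached copy can be used; otherwise startup fails.
        (PullPolicy::Never, Some(_)) => Decision::UseCached,
        (PullPolicy::Never, None) => Decision::Fail,
        // Always: resolve the tag to a digest first and reuse the cache
        // only on an exact digest match.
        (PullPolicy::Always, local) => {
            let remote = resolve_remote();
            if local == Some(remote.as_str()) {
                Decision::UseCached
            } else {
                Decision::Pull
            }
        }
    }
}

fn main() {
    // Upstream base:latest moved from "sha256:old" to "sha256:new".
    let upstream = || "sha256:new".to_string();
    println!("{:?}", decide(PullPolicy::IfNotPresent, Some("sha256:old"), upstream));
    println!("{:?}", decide(PullPolicy::Always, Some("sha256:old"), upstream));
}
```

Under Always the cost of an unchanged image is one registry lookup rather than a full download, which is why it is a reasonable default for mutable tags.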

Possible Fix

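Set imagePullPolicy: Always on sandbox pod specs when the image tag is :latest or the reference is untagged (an untagged reference resolves to :latest). This matches the Kubernetes default, which applies Always in exactly those cases when the field is omitted, and ensures kubelet checks the registry for a newer :latest digest every time a sandbox is created. For pinned tags (e.g., :v1.2.3), the current IfNotPresent behavior is correct.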

The relevant code is in crates/openshell-server/src/sandbox/mod.rs where the pod spec is constructed.
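The tag rule from the proposed fix can be sketched as a small helper applied while building the container spec. The helper, the Container stand-in, and sandbox_container are hypothetical: the server's actual pod-spec types are not shown in the issue, although the stand-in assumes the same image / image_pull_policy Option<String> shape that k8s-openapi's Container uses. Treating digest-pinned references (name@sha256:...) as IfNotPresent is an added assumption, since their content cannot change.

```rust
/// Pick the pull policy for an image reference: Always for :latest or
/// untagged references, IfNotPresent for pinned tags and digests.
fn pull_policy_for(image: &str) -> &'static str {
    // A digest pins the exact content, so re-checking the registry is pointless.
    if image.contains('@') {
        return "IfNotPresent";
    }
    // Only the last path component can carry a tag. This keeps a registry
    // port such as `localhost:5000/base` from being mistaken for a tag.
    let last_component = image.rsplit('/').next().unwrap_or(image);
    match last_component.rsplit_once(':') {
        Some((_, "latest")) | None => "Always",
        Some(_) => "IfNotPresent",
    }
}

/// Hypothetical stand-in for the sandbox container spec.
#[derive(Debug, Default)]
struct Container {
    image: Option<String>,
    image_pull_policy: Option<String>,
}

/// Hypothetical constructor: sets the pull policy explicitly from the tag.
fn sandbox_container(image: &str) -> Container {
    Container {
        image: Some(image.to_string()),
        image_pull_policy: Some(pull_policy_for(image).to_string()),
    }
}

fn main() {
    for image in [
        "ghcr.io/nvidia/openshell-community/sandboxes/base:latest",
        "localhost:5000/base:v1.2.3",
    ] {
        println!("{:?}", sandbox_container(image));
    }
}
```

Setting the policy explicitly, rather than omitting the field and relying on the Kubernetes default, keeps the behavior visible in the code and independent of how the spec is serialized.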

Reproduction

  1. Deploy a local cluster
  2. Create a sandbox (pulls base:latest)
  3. Update the base image upstream (e.g., add a package)
  4. Create another sandbox; it still uses the old cached image
