
perf(docker): move version ARG below cached layers to fix cache invalidation #385

Merged
pimlock merged 2 commits into main from perf-docker-version-arg-cache/pm
Mar 17, 2026

pimlock (Collaborator) commented Mar 17, 2026

Summary

Fix Docker layer cache invalidation caused by declaring ARG OPENSHELL_CARGO_VERSION near the top of each builder stage. The version includes a git commit hash (e.g. 0.0.7-dev.11+g085b131ae), so its value changes on every commit. Because every RUN instruction after an ARG declaration implicitly receives that value, each new value caused a cache miss at the first RUN below the declaration and invalidated all downstream layers, including expensive dependency installs, toolchain setup, and the Rust dependency pre-build.

Related Issue

N/A — discovered via CI timing analysis.

Changes

  • Moved ARG OPENSHELL_CARGO_VERSION from the top of each builder stage to just before the RUN that uses it, in all 5 Dockerfiles:
    • Dockerfile.gateway
    • Dockerfile.cluster
    • Dockerfile.cli-macos
    • Dockerfile.python-wheels
    • Dockerfile.python-wheels-macos
  • Removed unused ARG OPENSHELL_IMAGE_TAG from Dockerfile.cli-macos, Dockerfile.python-wheels, and Dockerfile.python-wheels-macos
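The fix relies on how Docker scopes build arguments: per the Dockerfile reference, every RUN after an ARG declaration in a stage implicitly sees the argument's value as an environment variable, so a changed value causes a cache miss at the first such RUN, and every later layer in the stage rebuilds with it. The sketch below is a toy model that encodes only that rule; it is not BuildKit's real cache-key algorithm. The stage contents (FROM base AS builder, the apt-get and cargo steps) and the second version string are hypothetical stand-ins, not copied from the repository's Dockerfiles.

```python
import hashlib

# Toy model of per-stage layer caching. It encodes only Docker's documented
# ARG rule: every RUN after an ARG declaration implicitly sees that ARG's
# value, so the value becomes part of the cache key of each such RUN.
# Illustration only, not BuildKit's real algorithm. The build context is
# held fixed, so COPY steps never miss on their own here.


def cache_keys(instructions, build_args):
    """Return one chained cache key per instruction of a single stage."""
    keys = []
    parent = ""
    declared = []  # ARG names declared so far in this stage
    for instr in instructions:
        op, _, rest = instr.partition(" ")
        if op == "ARG":
            # The ARG line itself keys only on its text; defaults are ignored.
            declared.append(rest.split("=", 1)[0])
        env = ""
        if op == "RUN":
            # Each RUN sees every ARG declared above it as an env variable.
            env = ";".join(f"{n}={build_args.get(n, '')}" for n in declared)
        digest = hashlib.sha256(f"{parent}|{instr}|{env}".encode()).hexdigest()
        keys.append(digest)
        parent = digest  # chaining: one miss propagates to every later layer
    return keys


def rebuilt(instructions, old_args, new_args):
    """List the instructions whose cache key differs between two builds."""
    old = cache_keys(instructions, old_args)
    new = cache_keys(instructions, new_args)
    return [i for i, a, b in zip(instructions, old, new) if a != b]


# Hypothetical builder stage, not copied from Dockerfile.gateway.
BEFORE = [
    "FROM base AS builder",
    "ARG OPENSHELL_CARGO_VERSION",  # declared at the top of the stage
    "RUN apt-get update && apt-get install -y build-essential",
    "COPY Cargo.toml Cargo.lock ./",
    "RUN cargo fetch",  # stand-in for the dependency pre-build
    "COPY . .",
    "RUN cargo build --release",  # the step that actually uses the version
]

# Same stage with the ARG moved to just before the RUN that uses it.
AFTER = [line for line in BEFORE if not line.startswith("ARG")]
AFTER.insert(len(AFTER) - 1, "ARG OPENSHELL_CARGO_VERSION")

OLD = {"OPENSHELL_CARGO_VERSION": "0.0.7-dev.11+g085b131ae"}
NEW = {"OPENSHELL_CARGO_VERSION": "0.0.7-dev.12+g0000000"}  # hypothetical next commit

if __name__ == "__main__":
    print("before:", rebuilt(BEFORE, OLD, NEW))
    print("after: ", rebuilt(AFTER, OLD, NEW))
```

With the ARG at the top, the model rebuilds all five steps from the apt-get install onward; with the ARG moved down, only the final RUN cargo build --release misses. Real builds also miss at COPY . . whenever sources change, so in practice the gain is that the package-install and dependency layers above that point stay cached.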

Context

A two-week bisect of the build-gateway / Build gateway CI job showed two regressions:

| Period | Avg. duration | Cause |
| --- | --- | --- |
| Mar 6-7 | ~2 min | Baseline (before the version ARG was introduced) |
| Mar 8-11 | ~5.5 min | +3.5 min after 68525bb8 added ARG OPENSHELL_CARGO_VERSION at the top of the stage |
| Mar 12-16 | ~8-9 min | +2.5 min from the base image migration and Dockerfile refactors compounding the issue |

Expected improvement: roughly 5-6 minutes recovered on gateway builds, because the dependency installation, toolchain setup, and dependency pre-build layers stay cached across commits.

Testing

  • mise run pre-commit passes apart from two pre-existing failures unrelated to this change: the port-8080-in-use integration test and missing license headers on 3 unrelated files
  • Unit tests added/updated — N/A, Dockerfile-only change
  • E2E tests added/updated — N/A, will be validated by CI build times

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) — N/A


The OPENSHELL_CARGO_VERSION build arg contains a git commit hash that
changes on every build (e.g. 0.0.7-dev.11+g085b131ae). Declaring this
ARG near the top of each Dockerfile invalidated every layer below it --
including expensive dependency installs, toolchain setup, and the Rust
dependency pre-build step -- on every single commit.

Move the ARG declaration to just before the RUN that actually uses it so
upstream layers stay cached. This recovers ~5-6 minutes per build on the
gateway image (from ~9m back toward ~2-3m) and similarly improves cluster,
CLI, and Python wheel builds.

Also removes unused OPENSHELL_IMAGE_TAG ARG from cli-macos, python-wheels,
and python-wheels-macos Dockerfiles.

pimlock self-assigned this Mar 17, 2026
pimlock added the e2e label Mar 17, 2026
pimlock requested review from drew and johntmyers and removed request for drew Mar 17, 2026 01:20
pimlock merged commit 18fb7af into main Mar 17, 2026
11 of 12 checks passed
pimlock deleted the perf-docker-version-arg-cache/pm branch Mar 17, 2026 01:56