feat(stack): pull-through registry cache on by default#500
Open
bussyjd wants to merge 1 commit into
Contributor
Can't we just say 'if not present'? instead of |
On a clean install every `obol stack up` made the k3d node pull every image directly from ghcr.io / docker.io. On the v1337 demo on spark1 this cost ~10 min waiting for LiteLLM alone, and `obol stack down && obol stack up` re-paid the same cost because the next k3d node also pulled fresh.

The pull-through cache containers (docker.io, ghcr.io, quay.io) were already implemented for OBOL_DEVELOPMENT=true. This commit promotes them to the default for all users so the second `obol stack up` on the same host completes the LiteLLM rollout in <2 min vs ~10 min on a cold host.

Changes:

- Three pull-through caches (ports 54100-54102) are now started for all users on every `obol stack up`, regardless of OBOL_DEVELOPMENT.
- The local push target (localhost:54103) stays gated behind OBOL_DEVELOPMENT=true: it is only needed for `just dev-frontend` hot-swap and adds no value for regular installs.
- New `--no-registry-cache` flag on `obol stack up` (env: OBOL_DISABLE_REGISTRY_CACHE=true) for hosts behind a corporate proxy with their own caching, or with tight disk constraints.
- `reclaimLeakedDevK3dNetworks` (called on `obol stack purge`) now runs for all users, not just dev mode, since the mirror containers are created for everyone and hold Docker networks open after cluster delete.
- CLAUDE.md "Dev Registry Cache" section renamed to "Registry Cache" and split into "Pull-through caches (default for all installs)" and "Local push target (OBOL_DEVELOPMENT only)" sub-sections.
- Tests: golden snapshots for pull-through-only and dev-mode registries.yaml; OBOL_DISABLE_REGISTRY_CACHE early-exit test; mirror invariant tests (count, remoteURL presence/absence).
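The gating rules in this commit message can be sketched roughly as below. This is an illustration only, not the code in the PR: the `mirror` struct, `selectMirrors`, the port-to-registry assignment, and the upstream URLs are all assumptions made for the example. Only the port range, the two environment variable names, and the "three caches plus a dev-only push target" split come from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// mirror describes one registry container started next to the k3d cluster.
// Hypothetical shape; the real struct in internal/stack may differ.
type mirror struct {
	Name      string // upstream registry served, or "local" for the push target
	Port      int    // host port
	RemoteURL string // upstream to proxy; empty for the local push target
}

// Ports 54100-54102 come from the commit message. Which registry sits on
// which port, and the upstream URLs, are assumptions for this sketch.
var pullThroughMirrors = []mirror{
	{Name: "docker.io", Port: 54100, RemoteURL: "https://registry-1.docker.io"},
	{Name: "ghcr.io", Port: 54101, RemoteURL: "https://ghcr.io"},
	{Name: "quay.io", Port: 54102, RemoteURL: "https://quay.io"},
}

// localPushMirror has no RemoteURL: it is a plain registry, not a proxy.
var localPushMirror = mirror{Name: "local", Port: 54103}

// selectMirrors returns the registries to start for one `obol stack up`.
// Assumption: the opt-out is an early exit that skips everything.
func selectMirrors(devMode, cacheDisabled bool) []mirror {
	if cacheDisabled {
		return nil
	}
	out := append([]mirror{}, pullThroughMirrors...)
	if devMode {
		out = append(out, localPushMirror)
	}
	return out
}

func main() {
	devMode := os.Getenv("OBOL_DEVELOPMENT") == "true"
	disabled := os.Getenv("OBOL_DISABLE_REGISTRY_CACHE") == "true"
	for _, m := range selectMirrors(devMode, disabled) {
		fmt.Printf("%s on port %d (proxy: %t)\n", m.Name, m.Port, m.RemoteURL != "")
	}
}
```

Keeping the selection in one pure function makes the invariants listed under "Tests" (mirror count, remoteURL presence or absence) straightforward to assert without starting any containers.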
5b39484 to
3fc09ac
Summary

- Pull-through registry caches now start on every `obol stack up`, not just when `OBOL_DEVELOPMENT=true`.
- The local push target (`localhost:54103`) stays gated behind `OBOL_DEVELOPMENT=true`; it exists only for `just dev-frontend` hot-swap.
- New `--no-registry-cache` flag on `obol stack up` (also readable via `OBOL_DISABLE_REGISTRY_CACHE=true`) for hosts behind a corporate proxy or with tight disk constraints.

Why

On a clean install every `obol stack up` makes the k3d node pull every image fresh from ghcr.io / docker.io. On the v1337 demo on spark1 this cost ~10 min waiting for LiteLLM alone, and `obol stack down && obol stack up` re-pays the same cost because the next k3d node also pulls fresh.

The pull-through cache containers were already built and working for dev mode. Promoting them to the default means the second `obol stack up` on the same host completes the LiteLLM rollout in <2 min vs ~10 min today. Disk footprint: ~0–2 GB per cache container, only what has been pulled.

What's changed

- `internal/stack/dev_registry.go`: Split `devRegistryMirrors` into `pullThroughMirrors` (3 caches, always on) and `localPushMirror` (dev-only). New `ensureRegistryCaches(cfg, u, devMode)` function; legacy `ensureDevRegistries` kept as a thin wrapper. `devRegistrySetup` → `registrySetup` (type alias for back-compat). `renderRegistriesConfig` now takes a mirror slice instead of hardcoding all-dev.
- `internal/stack/backend_k3d.go`: `ensureRegistryCaches` called unconditionally (unless `OBOL_DISABLE_REGISTRY_CACHE=true`); `devMode` flag controls whether `localhost:54103` is included.
- `internal/stack/stack.go`: `reclaimLeakedDevK3dNetworks` now runs for all users (not just dev mode) since the mirror containers are created for everyone and hold Docker networks open after cluster delete.
- `cmd/obol/main.go`: New `--no-registry-cache` flag on `obol stack up`.
- `CLAUDE.md`: "Dev Registry Cache" section renamed to "Registry Cache", split into pull-through (all users) and local push target (dev-only) subsections, with opt-out callout.

Test plan

- `go build ./...`: clean
- `go test ./internal/stack/ ./cmd/obol/ -count=1`: all pass (both packages)
- Golden snapshots for pull-through-only and dev-mode `registries.yaml`; `OBOL_DISABLE_REGISTRY_CACHE` early-exit test; mirror invariant tests (count, remoteURL presence/absence)
- `obol stack down && obol stack up`: `docker ps | grep k3d-obol` shows 3 cache containers; second `up` is faster (layers cached)

Risks

- […] `--no-registry-cache` opt-out.
- […] `reclaimLeakedDevK3dNetworks` guards against now applies to all users; the function is updated to run for all users (not just dev mode) on `obol stack purge`.

Out of scope

- `localhost:54103` local push target stays gated behind `OBOL_DEVELOPMENT=true`.
- `obol stack down` behaviour: cache containers intentionally persist across down/up cycles.
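For readers unfamiliar with how a mirror slice becomes a `registries.yaml`, here is a minimal sketch of the idea behind `renderRegistriesConfig` taking a slice. It is not the function from this PR: `cacheEndpoint`, `renderRegistries`, and the container hostnames are hypothetical. The only thing taken from outside the PR is the general `mirrors:` / `endpoint:` layout that k3s reads from `registries.yaml`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cacheEndpoint maps an upstream registry name to the URL the k3d node
// should try first. Hypothetical type for this sketch.
type cacheEndpoint struct {
	Registry string
	Endpoint string
}

// renderRegistries emits a k3s-style registries.yaml "mirrors" section.
// Entries are sorted by registry name so the output is deterministic,
// which is what makes golden-snapshot comparisons stable.
func renderRegistries(eps []cacheEndpoint) string {
	if len(eps) == 0 {
		return "mirrors: {}\n"
	}
	sorted := append([]cacheEndpoint{}, eps...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Registry < sorted[j].Registry
	})

	var b strings.Builder
	b.WriteString("mirrors:\n")
	for _, e := range sorted {
		// %q double-quotes the value, which is also valid YAML for plain ASCII.
		fmt.Fprintf(&b, "  %q:\n    endpoint:\n      - %q\n", e.Registry, e.Endpoint)
	}
	return b.String()
}

func main() {
	// Hostnames below are placeholders, not the names the project uses.
	fmt.Print(renderRegistries([]cacheEndpoint{
		{Registry: "ghcr.io", Endpoint: "http://cache-ghcr:5000"},
		{Registry: "docker.io", Endpoint: "http://cache-docker:5000"},
		{Registry: "quay.io", Endpoint: "http://cache-quay:5000"},
	}))
}
```

Passing the slice in, rather than reading a package-level list, lets the same renderer serve both the pull-through-only and dev-mode cases; the caller decides which entries to include.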