Conversation

develra (Contributor) commented Jan 24, 2026

I'll leave it up to y'all to decide if the changes/risks here are worth the reduction in image size. Thanks!

Reduced image size

| Metric          | Original | New     | Reduction     |
| --------------- | -------- | ------- | ------------- |
| Image Size      | 60.1 GB  | 48.2 GB | 11.9 GB (20%) |
| Filesystem Size | 49 GB    | 44 GB   | 5 GB (10%)    |

Note: Image size includes all layers; filesystem size is the actual disk usage inside the container.

  • Added --no-cache to uv pip install (Safe)
    uv's cache only helps when the same packages are installed repeatedly in the same environment. In a Docker build each layer starts fresh, so the cache is written once and never read again; skipping it saves space at no cost. A sketch of the install line follows this list.

  • Removed Intel MKL numpy (Less sure)
    Removed the Intel MKL numpy install from Intel's Anaconda channel. Intel's channel only carries numpy 1.26.4 (a 1.x release), while the base image ships numpy 2.0.2, so installing Intel's build would downgrade numpy and break packages compiled against the numpy 2.x ABI. The base image's numpy 2.0.2 is built with OpenBLAS optimizations and is compatible with all installed packages; a quick way to confirm the BLAS backend is sketched after this list.

  • Removed preprocessing package (Less sure)
    The package is unmaintained (last release 2017) and pins nltk==3.2.4, which cannot run on Python 3.11 because inspect.formatargspec was removed from the standard library. A one-liner reproducing the failure follows this list.

  • Updated scikit-learn to 1.5.2 (Less sure)
    Changed the pin from scikit-learn==1.2.2 to scikit-learn==1.5.2. The 1.2.2 binary wheels were built against the numpy 1.x ABI and fail under numpy 2.x with "numpy.dtype size changed" errors, while 1.5.x keeps API compatibility with 1.2.x. The original pin was for eli5/learntools compatibility, which should still hold with 1.5.x. A quick import smoke test is sketched after this list.

  • Added uv cache cleanup to clean-layer.sh (Safe)
    Added /root/.cache/uv/* to the cleanup script, which previously removed only the pip cache. The script runs after package installs, and the cache is not needed at runtime. A sketch of the change follows this list.
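
For the --no-cache change, a minimal sketch of what the install step could look like (the package names here are placeholders, not the image's actual package list):

```bash
# Hypothetical body of a Dockerfile RUN step; the real package list lives elsewhere.
# --no-cache tells uv not to read from or write to its cache directory,
# so nothing accumulates under /root/.cache/uv in this layer.
uv pip install --no-cache package-a package-b
```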
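
To confirm the retained numpy is the 2.0.2 build linked against OpenBLAS (a sanity check, not part of the change itself):

```bash
# Print numpy's version and build configuration; with the Intel MKL install
# removed, this should report 2.0.2 and an OpenBLAS-backed BLAS/LAPACK.
python -c "import numpy; print(numpy.__version__); numpy.show_config()"
```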
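
The nltk==3.2.4 breakage is easy to reproduce, since inspect.formatargspec no longer exists on Python 3.11:

```bash
# On Python 3.11 this raises AttributeError, which is why nltk==3.2.4
# (and therefore the preprocessing package) fails at import time.
python -c "import inspect; inspect.formatargspec"
```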
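
A quick smoke test for the new scikit-learn pin against the image's numpy 2.x (illustrative only):

```bash
# Importing sklearn alongside numpy 2.x should succeed with 1.5.2; the old
# 1.2.2 wheels fail here with a "numpy.dtype size changed" error.
python -c "import numpy, sklearn; print(numpy.__version__, sklearn.__version__)"
```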
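
The clean-layer.sh change is a one-line addition next to the existing pip cleanup; a sketch under the assumption that the script removes caches with rm -rf (the actual script contents may differ):

```bash
# clean-layer.sh (sketch): drop package-manager caches after installs.
rm -rf /root/.cache/pip/*   # pip cache cleanup already in the script
rm -rf /root/.cache/uv/*    # uv cache cleanup added by this change
```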

develra requested review from calderjo and djherbis on January 24, 2026 01:37
develra force-pushed the optimize-layers-for-better-squashing branch from 9773e95 to 8abc702 on January 24, 2026 01:48