Reorganize dependency installation for better squashing #1523
+64
−26
I'll leave it up to y'all to decide if the changes/risks here are worth the reduction in image size. Thanks!
Reduced image size
Note: Image size includes all layers; filesystem size is the actual disk usage inside the container.
Added --no-cache to uv pip install (Safe)
The uv cache only helps when installs are repeated in the same environment. In a Docker build the cache is written into the image layer and is not reused by later installs, so it adds size without any benefit.
Removed Intel MKL numpy (Less sure)
Removed the Intel MKL numpy install from Intel's Anaconda channel. Intel's channel only offers numpy 1.26.4 (numpy 1.x), but the base image ships numpy 2.0.2. Installing Intel's numpy would downgrade numpy and break packages compiled against the numpy 2.x ABI.
The base image's numpy 2.0.2 uses OpenBLAS optimizations and is compatible with all installed packages.
Removed preprocessing package (Less sure)
The package is unmaintained (last release 2017, no updates in 7+ years) and requires nltk==3.2.4, which is incompatible with Python 3.11 because inspect.formatargspec was removed. It therefore cannot function on Python 3.11.
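The removal of inspect.formatargspec can be confirmed inside the image with a short check. This is a minimal sketch; the helper name `has_formatargspec` is made up for illustration and is not part of this PR.

```python
import inspect
import sys


def has_formatargspec() -> bool:
    """Return True if this interpreter still provides inspect.formatargspec."""
    return hasattr(inspect, "formatargspec")


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    status = "available" if has_formatargspec() else "missing"
    print(f"Python {version}: inspect.formatargspec is {status}")
```

On Python 3.11 or later the attribute is expected to be missing, which is why any package that calls it fails.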
Updated scikit-learn to 1.5.2 (Less sure)
Changed from scikit-learn==1.2.2 to scikit-learn==1.5.2. scikit-learn 1.2.2 binary wheels are incompatible with numpy 2.x ABI, causing "numpy.dtype size changed" errors. scikit-learn 1.5.x maintains API compatibility with 1.2.x. The original pin was for eli5/learntools compatibility, which should work with 1.5.x.
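One way to catch this kind of ABI mismatch early is an import smoke test run after the install step. The sketch below is only an illustration: the helper `smoke_import` and the list of module names are assumptions, not something this PR adds.

```python
import importlib


def smoke_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module; report success or the error that stopped it."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ABI mismatches can surface as ValueError or ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Module names are illustrative; adjust to the packages pinned in the image.
    for name in ("numpy", "sklearn"):
        ok, detail = smoke_import(name)
        print(f"{name}: {'ok' if ok else 'FAILED'} ({detail})")
```

A "numpy.dtype size changed" failure would show up in the detail string rather than at notebook runtime.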
Added uv cache cleanup to clean-layer.sh (Safe)
Added /root/.cache/uv/* to the cleanup script, which previously cleaned only the pip cache and left the uv cache in place. The cleanup script runs after package installs, and the cache is not needed at runtime.
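To check that the cleanup actually leaves the caches empty, the remaining size of each cache directory can be measured inside the built image. This is a rough sketch: `dir_size_bytes` is a hypothetical helper, and /root/.cache/pip is assumed to be the pip cache location (only the uv path is stated above).

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under path; returns 0 if path is missing."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted
            if os.path.isfile(file_path) and not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


if __name__ == "__main__":
    # /root/.cache/uv comes from the description above; /root/.cache/pip is assumed.
    for cache_dir in ("/root/.cache/uv", "/root/.cache/pip"):
        print(f"{cache_dir}: {dir_size_bytes(cache_dir)} bytes")
```

Both directories should report 0 bytes after clean-layer.sh has run in the same layer.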