How to Cache Python Pip Requirements for Reliable Docker Builds
You’re working on a Python project, and every time you run docker compose up --build, the RUN pip install -r requirements.txt step fails halfway because your internet connection is slower than a dial-up modem. When you retry, Docker starts from scratch and re-downloads every package. Frustrating? Absolutely.
Why does this happen?
- Docker’s Layer Caching: If a step (like pip install) fails, Docker invalidates the cache for that layer and everything after it.
- No Persisted Pip Cache: By default, pip doesn’t save downloaded packages between builds, so every failure means starting over (the naive Dockerfile below shows the pattern).
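For context, here is a minimal sketch of the kind of Dockerfile that runs into this; the base image and layout mirror the examples later in this post:

FROM python:3.11
WORKDIR /app
COPY requirements.txt .
# No cache mount: a failure here throws away every package downloaded so far,
# and the next build starts the downloads from zero.
RUN pip install -r requirements.txt
COPY . .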
Cache Packages or Go Offline
Here’s how I solved this for good and how you can too.
Use Docker BuildKit Cache Mounts
Modern Docker builds (BuildKit) can persist pip’s cache across runs.
# syntax=docker/dockerfile:1.4
FROM python:3.11
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# Copy requirements first to leverage Docker layer caching
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
COPY . .
CMD celery -A myapp worker -l info -Q ${CELERY_QUEUE}
What’s happening here?
- --mount=type=cache,target=/root/.cache/pip tells Docker to reuse pip’s cache directory across builds.
- Even if the build fails, the packages downloaded so far are retained for the next attempt.
Run it with BuildKit enabled (recent Docker releases use BuildKit by default, so the variable is optional):
DOCKER_BUILDKIT=1 docker compose up --build
Why this works:
- BuildKit’s cache mounts persist the pip cache between builds.
- Subsequent builds skip downloading packages already in the cache.
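A quick way to see the cache in action is to rebuild with plain progress output; the image tag here is just an example:

DOCKER_BUILDKIT=1 docker build --progress=plain -t myapp .

If requirements.txt changes, the pip step still runs, but it pulls packages from the cache mount instead of downloading them again, so the install finishes in a fraction of the time.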
Offline Installs with Pre-Downloaded Packages
No internet? No problem. Pre-download packages and install them offline.
Download packages locally
On your host machine:
pip download -r requirements.txt -d ./pip_packages
This creates a pip_packages folder with all dependencies.
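One caveat: pip download fetches wheels for the machine it runs on. If your host doesn’t match the python:3.11 image (for example, an Apple Silicon laptop building a linux/amd64 image), you can request wheels for the target platform instead. This is a sketch and assumes all of your dependencies publish manylinux wheels:

pip download -r requirements.txt -d ./pip_packages \
    --only-binary=:all: \
    --platform manylinux2014_x86_64 \
    --python-version 311 \
    --implementation cp

Note that --only-binary=:all: is required whenever --platform is used, so any source-only dependency will need a different approach.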
Modify the Dockerfile
FROM python:3.11
ENV PYTHONUNBUFFERED=1
WORKDIR /app
# Copy pre-downloaded packages
COPY pip_packages /pip_packages
COPY requirements.txt .
# Install from local directory (no internet!)
RUN pip install --no-index --find-links=/pip_packages -r requirements.txt
COPY . .
CMD celery -A myapp worker -l info -Q ${CELERY_QUEUE}
Key flags:
- --no-index: Skip PyPI entirely.
- --find-links=/pip_packages: Resolve packages from the local directory instead.
Perfect for:
- Airplane coding.
- Rural internet (or no internet).
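Before baking the folder into an image, it’s worth a quick sanity check that it really is self-contained. A throwaway virtual environment on the host does the job (paths are illustrative):

python -m venv /tmp/offline-check
/tmp/offline-check/bin/pip install --no-index --find-links=./pip_packages -r requirements.txt

If this succeeds with the network disabled, the Docker build will too.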
Hybrid Caching for Best Results
Combine BuildKit caching with a local package fallback:
# syntax=docker/dockerfile:1.4
FROM python:3.11
ENV PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
# Try an online install first, reusing the BuildKit cache
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt || true  # Don't fail the build if the network drops mid-install
# Fallback to local packages
COPY pip_packages /pip_packages
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --no-index --find-links=/pip_packages -r requirements.txt
COPY . .
CMD celery -A myapp worker -l info -Q ${CELERY_QUEUE}
Why this rocks:
- Speed of BuildKit caching + reliability of local packages.
- Retries failed downloads gracefully.
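For completeness, here is a minimal docker-compose.yml sketch that pairs with the Dockerfile above; the service name and queue value are assumptions rather than part of the original setup:

services:
  worker:
    build: .
    environment:
      CELERY_QUEUE: default  # expanded at runtime by the CMD in the Dockerfile

Because the CMD uses shell form, ${CELERY_QUEUE} is resolved from the container environment when the worker starts, so the same image can serve different queues.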
Final Thoughts
- Use BuildKit if you control the build environment. It’s seamless and fast.
- Pre-download packages for offline scenarios or flaky networks.
- Hybrid approach is gold for mission-critical builds.
Pro Tips:
- Always pin versions in requirements.txt (e.g., requests==2.31.0) to avoid surprises.
- For teams, set up a private PyPI mirror (like devpi) for blazing-fast, reliable builds (see the sketch below).
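If you do run a mirror such as devpi, pointing pip at it from the Dockerfile is straightforward; the URL below is a placeholder for wherever your mirror actually lives:

ARG PIP_INDEX_URL=https://pypi.org/simple
ENV PIP_INDEX_URL=${PIP_INDEX_URL}
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

Then build with:

docker build --build-arg PIP_INDEX_URL=http://devpi.internal:3141/root/pypi/+simple/ .

pip reads PIP_INDEX_URL as if it were --index-url, so no changes to requirements.txt are needed.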
By caching pip’s downloads or going fully offline, I turned my Docker builds from a hair-pulling ordeal into a smooth process. Now I can finally focus on coding, not waiting for packages to download.