An Introduction to Docker

Learn how to use Docker and Docker Compose to containerize your projects, with a focus on Machine Learning workflows.

What is Docker?

Docker is a platform that lets you package, distribute, and run applications inside lightweight, portable containers. Think of a container as a sealed box that carries your app and every single dependency it needs (libraries, runtime, environment variables) so it runs identically no matter where it’s opened.

The key difference between a Docker container and a virtual machine is that containers share the host OS kernel, making them dramatically lighter and faster to spin up. A VM boots an entire guest OS; a container just starts an isolated process, often in well under a second.

This matters especially in Machine Learning: an image that pins Python 3.10, CUDA 11.8, and a specific PyTorch version gives you the same software environment on your laptop, a colleague’s workstation, or a cloud GPU instance (only the host’s NVIDIA driver lives outside the image). No more “it works on my machine.”


Core concepts

Before running any commands, it helps to understand three building blocks:

| Concept | Description |
| --- | --- |
| Image | A read-only blueprint (filesystem snapshot + metadata). Built once, run many times. |
| Container | A running (or stopped) instance of an image. Isolated, ephemeral by default. |
| Registry | A remote store for images. Docker Hub is the public default; you can also self-host a registry or use a cloud provider’s. |
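Images are addressed by references such as nginx, redis:7-alpine, or nvidia/cuda:12.4.1-cudnn9-runtime-ubuntu22.04, all of which appear later in this post. Below is a simplified sketch of how Docker expands such a reference into registry, repository, and tag. The function is our own and ignores digests (@sha256:...) and other edge cases. The rules it follows: a first path component containing a dot or colon, or equal to localhost, names the registry; otherwise Docker Hub (docker.io) is assumed, single-name images get the implicit library/ namespace, and a missing tag means latest.

```python
def parse_image_ref(ref):
    """Split an image reference into (registry, repository, tag).

    Simplified model of Docker's normalization; digests are not handled.
    """
    registry = "docker.io"
    name = ref
    first, sep, rest = ref.partition("/")
    # A first component that looks like a host (has "." or ":", or is localhost) is a registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, name = first, rest
    # The tag is whatever follows the last ":" in the final path component.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    else:
        tag = "latest"
    # Official Docker Hub images live under the implicit "library/" namespace.
    if registry == "docker.io" and "/" not in name:
        name = "library/" + name
    return registry, name, tag
```

For example, parse_image_ref("localhost:5000/ml-trainer:cuda") points at a self-hosted registry on port 5000, while a plain ml-trainer:latest built on your machine normalizes to docker.io/library/ml-trainer.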

Running your first container

Assuming Docker is already installed, start with:

# Pull and run the official hello-world image
docker run hello-world

A more practical example, Nginx:

# Run Nginx in the background (-d), mapping host port 8080 to container port 80
docker run -d -p 8080:80 --name my-site nginx

Open http://localhost:8080 and you’ll see the Nginx welcome page. Now explore a few essential commands:

# List running containers
docker ps

# List ALL containers (including stopped)
docker ps -a

# Stream live logs
docker logs -f my-site

# Open an interactive shell inside the container
docker exec -it my-site bash

# Stop and remove the container
docker stop my-site && docker rm my-site
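These commands are also easy to drive from a script. Here is a minimal sketch in Python (the helper names are our own), assuming the docker CLI is on your PATH. docker ps accepts a Go-template --format flag, and {{json .}} prints one JSON object per container, one per line.

```python
import json
import subprocess

def parse_ps_output(text):
    """Parse the one-JSON-object-per-line output of `docker ps --format '{{json .}}'`."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def running_containers():
    """Return (name, image, status) for each running container; needs the docker CLI."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{json .}}"],  # argv list, so no shell quoting needed
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [(c["Names"], c["Image"], c["Status"]) for c in parse_ps_output(out)]
```

With the Nginx container from above running, running_containers() would include an entry for my-site.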

Writing a Dockerfile

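A Dockerfile is the recipe for building a custom image. Each instruction is one build step, and the instructions that change the filesystem (RUN, COPY, ADD) each create a new layer. Docker caches these steps. On a rebuild, it reuses every step up to the first one whose input changed, then rebuilds that step and everything after it.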

Here’s a minimal example for a Python ML project:

# --- Stage: base image ---
FROM python:3.11-slim

# Set a working directory inside the container
WORKDIR /app

# Copy dependency list first (maximizes cache reuse)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the source code
COPY . .

# Expose a port (documentation only — actual binding happens at runtime)
EXPOSE 8000

# Default command when the container starts
CMD ["python", "train.py"]

Tip: Put COPY requirements.txt and RUN pip install before COPY . . so that Docker reuses the cached layer on every rebuild unless dependencies change.
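To see why the order matters, here is a toy model of the build cache. It is a simplified sketch, not Docker's actual algorithm, and it ignores .dockerignore. Each step's cache key chains its parent's key, the instruction text, and, for COPY, a hash of the copied files. The function names are our own.

```python
import hashlib

# The steps of the Dockerfile above, one string per instruction.
DOCKERFILE_STEPS = [
    "FROM python:3.11-slim",
    "WORKDIR /app",
    "COPY requirements.txt .",
    "RUN pip install --no-cache-dir -r requirements.txt",
    "COPY . .",
    "EXPOSE 8000",
    'CMD ["python", "train.py"]',
]

def layer_keys(steps, files):
    """Chain a cache key through the steps: parent key + instruction (+ copied file contents)."""
    keys, parent = [], ""
    for step in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(step.encode())
        if step.startswith("COPY"):
            src = step.split()[1]
            # "." copies the whole build context; otherwise a single named file.
            names = sorted(files) if src == "." else [src]
            for name in names:
                h.update(name.encode())
                h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys

def rebuilt_steps(steps, old_files, new_files):
    """Steps whose cache key changed between two versions of the build context."""
    old, new = layer_keys(steps, old_files), layer_keys(steps, new_files)
    return [step for step, a, b in zip(steps, old, new) if a != b]
```

In this model, editing train.py only invalidates COPY . . and the steps after it, so pip install stays cached. Editing requirements.txt invalidates everything from COPY requirements.txt onward. If you moved COPY . . above the pip install, every code edit would re-run the install.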

Build and run it:

# Build the image and tag it
docker build -t ml-trainer:latest .

# Run interactively, mounting a local data folder into the container
docker run -it --rm \
  -v "$(pwd)/data:/app/data" \
  ml-trainer:latest

Here the -v flag creates a bind mount: your local ./data folder is visible inside the container at /app/data, and changes on either side show up on the other. This is the standard way to feed datasets into a container without baking them into the image or rebuilding it.
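Inside the container, the training code only sees the container-side path. Below is a minimal sketch of how a train.py might locate its data. The DATA_DIR variable, the *.csv pattern, and the function name are our own assumptions, not something Docker provides.

```python
import os
from pathlib import Path

DEFAULT_DATA_DIR = "/app/data"  # the container-side path used with -v above

def find_dataset_files(data_dir=None, pattern="*.csv"):
    """Return the dataset files under the mounted data directory, sorted by name."""
    # An explicit argument wins, then the (hypothetical) DATA_DIR variable, then the default.
    root = Path(data_dir or os.environ.get("DATA_DIR", DEFAULT_DATA_DIR))
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; was it mounted with -v?")
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

Failing loudly when the directory is missing turns a forgotten -v into a clear error rather than a training run on zero files.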


GPU support for ML workloads

If you need CUDA inside the container, use NVIDIA’s official base images:

FROM nvidia/cuda:12.4.1-cudnn9-runtime-ubuntu22.04

WORKDIR /app
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "train.py"]

Build this image under its own tag (the run command below uses ml-trainer:cuda) and pass --gpus all at runtime to expose the host GPUs.

Important: To use the --gpus flag, your host machine must have the NVIDIA Container Toolkit installed. This toolkit bridges the gap between Docker and your host’s NVIDIA drivers.

docker run --gpus all -it --rm \
  -v "$(pwd)/data:/app/data" \
  ml-trainer:cuda

This means your entire training environment — Python version, CUDA version, framework versions — is reproducible and shareable as a single image tag.
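A quick way to confirm the GPUs actually reached the container is to call nvidia-smi -L at startup. With --gpus all, the NVIDIA Container Toolkit normally makes nvidia-smi available inside containers built from NVIDIA's CUDA images. The sketch below is a minimal version of that check. The helper names are hypothetical, and the line format the regex expects is an assumption based on typical nvidia-smi -L output.

```python
import re
import shutil
import subprocess

# Matches lines such as "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-...)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)$")

def parse_gpu_list(text):
    """Parse `nvidia-smi -L` output into (index, name, uuid) tuples."""
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus

def visible_gpus():
    """Return the GPUs visible in this container, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)

def require_gpu():
    """Raise early if no GPU is visible, instead of quietly training on the CPU."""
    gpus = visible_gpus()
    if not gpus:
        raise RuntimeError("No GPU visible in the container; was it started with --gpus all?")
    return gpus
```

Calling require_gpu() at the top of train.py makes a forgotten --gpus all, or a host without the NVIDIA Container Toolkit, fail immediately with a readable message.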


Docker Compose

Real projects rarely consist of a single service. A typical ML system might have:

  • A training service that reads data and writes model artifacts
  • A serving API (e.g., FastAPI) that loads the model and exposes predictions
  • A database (PostgreSQL, Redis) for experiment metadata or caching

And often more.

Docker Compose lets you define and orchestrate all of these in one docker-compose.yml file.

Example: ML training + serving stack

# docker-compose.yml
services:

  # ── Training job ────────────────────────────────────────
  trainer:
    build:
      context: .
      dockerfile: Dockerfile.train
    volumes:
      - ./data:/app/data          # mount dataset
      - ./artifacts:/app/artifacts  # persist model outputs
    environment:
      EPOCHS: "50"
      LEARNING_RATE: "0.001"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  # ── Prediction API ───────────────────────────────────────
  api:
    build:
      context: .
      dockerfile: Dockerfile.serve
    ports:
      - "8000:8000"
    volumes:
      - ./artifacts:/app/artifacts  # read trained model
    depends_on:
      - redis
    environment:
      REDIS_URL: redis://redis:6379

  # ── Cache / message broker ───────────────────────────────
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:
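Inside the trainer container, EPOCHS and LEARNING_RATE arrive as ordinary environment variables, which are always strings. Here is a minimal sketch of how a training script might read them. The dataclass and the fallback defaults (10 epochs, 0.001) are hypothetical.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    learning_rate: float

def load_config(env=None):
    """Build a TrainConfig from environment variables such as those set in docker-compose.yml."""
    env = os.environ if env is None else env
    # Environment values are strings, so convert them explicitly; defaults are illustrative.
    return TrainConfig(
        epochs=int(env.get("EPOCHS", "10")),
        learning_rate=float(env.get("LEARNING_RATE", "0.001")),
    )
```

Because the values come from the environment, you can change hyperparameters in docker-compose.yml (or per run) without rebuilding the image.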

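The depends_on key makes Compose start redis before api. By default it only controls start order and does not wait until Redis is ready to accept connections. To wait for readiness, give redis a healthcheck and use the long form of depends_on with condition: service_healthy. Services communicate over a shared internal network that Compose creates. Notice how api refers to redis by its service name, not by an IP address.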
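Starting first is not the same as being ready, so serving code often waits for its dependencies itself. Here is a minimal sketch of a hypothetical startup helper for the api service. It reads a URL such as REDIS_URL and retries a plain TCP connection until one succeeds or the timeout expires. Inside the Compose network, the hostname redis resolves to the Redis container. An open port is a strong hint, not a guarantee, that Redis is ready for commands.

```python
import socket
import time
from urllib.parse import urlsplit

def wait_for_service(url, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to the URL's host:port succeeds; False on timeout."""
    parts = urlsplit(url)  # e.g. "redis://redis:6379" -> hostname "redis", port 6379
    host, port = parts.hostname, parts.port
    if host is None or port is None:
        raise ValueError(f"URL must include a host and port: {url!r}")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or name not resolvable yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the api service this would run before the model is loaded, for example wait_for_service(os.environ["REDIS_URL"]).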

Essential Compose commands

# Build images and start all services in the background
docker compose up -d --build

# Follow logs from a specific service
docker compose logs -f api

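# Scale a service to multiple replicas. Note: api publishes the fixed host
# port 8000, so only one replica can bind it. Publish just the container port
# (ports: - "8000") before scaling, and Docker assigns each replica a free host port.
docker compose up -d --scale api=3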

# Run a one-off command in a service (e.g., run evaluation script)
docker compose run --rm trainer python evaluate.py

# Stop and remove containers (volumes are kept by default)
docker compose down

# Stop AND remove named volumes (careful: deletes persisted data)
docker compose down -v

Essential commands reference

| Command | Description |
| --- | --- |
| docker ps | List running containers |
| docker ps -a | List all containers (including stopped) |
| docker images | List locally available images |
| docker stop <id> | Gracefully stop a container |
| docker rm <id> | Remove a stopped container |
| docker rmi <id> | Remove a local image |
| docker exec -it <id> bash | Open an interactive shell inside a container |
| docker logs -f <id> | Stream container logs |
| docker volume ls | List all volumes |
| docker system prune | Remove stopped containers, unused networks, dangling images, and build cache (add -a to remove all unused images) |

Why this matters for Machine Learning

Containerizing ML projects solves some of the field’s most common pain points:

  • Reproducibility: Pin exact library versions; share the image, not a list of pip install commands.
  • Onboarding: A new team member runs docker compose up and has a full working environment in minutes.
  • Experiment isolation: Run multiple experiments with different hyperparameter sets simultaneously, each in its own container, without dependency conflicts.
  • Deployment parity: The image you build and test is exactly the image you deploy, so there are no surprises from environment mismatches between development and production.
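For experiment isolation, here is a minimal sketch (our own helper names) that turns a hyperparameter grid into one docker compose run command per combination, reusing the trainer service from the Compose file above. docker compose run -e KEY=VALUE sets that variable for the one-off container. RUN_NAME is a hypothetical variable that a training script could use to write each run's artifacts to its own folder, because all runs share the ./artifacts mount.

```python
import itertools
import subprocess

def experiment_commands(grid, service="trainer"):
    """Return one `docker compose run` argv list per hyperparameter combination."""
    keys = sorted(grid)  # stable order, so run numbers are reproducible
    commands = []
    for i, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        cmd = ["docker", "compose", "run", "--rm"]
        for key, value in zip(keys, values):
            cmd += ["-e", f"{key}={value}"]  # overrides the value from docker-compose.yml
        # RUN_NAME is hypothetical: train.py would have to use it to pick an output folder.
        cmd += ["-e", f"RUN_NAME=run-{i}", service]
        commands.append(cmd)
    return commands

def launch_all(commands):
    """Start every experiment concurrently, each in its own container, and wait for all."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]  # one exit code per experiment
```

For example, launch_all(experiment_commands({"EPOCHS": ["10", "50"], "LEARNING_RATE": ["0.01", "0.001"]})) starts four isolated training containers in parallel.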

That’s all for now. This first post covers Docker because future posts will build on it.