Health Checks and Zero-Downtime Deployments with Docker Compose
Implement robust health checks and rolling updates for production environments without downtime. Practical examples with NGINX, PostgreSQL, and application containers.
Jean-Pierre Broeders
Freelance DevOps Engineer
Production environments run 24/7. Containers can crash, databases can lock up, and updates need to happen without users noticing anything. Health checks and zero-downtime deployments form the foundation for reliable systems.
Why Health Checks Are Essential
A container can be running without the application inside actually working. The database might be unreachable. The API could be throwing timeout errors. A simple docker ps only shows whether the process is active — not whether it's healthy.
Health checks detect these issues automatically. When a health check fails, the container can restart, or the load balancer routes traffic to healthy instances.
Health Check Configuration
A basic health check for a web application looks like this:
```yaml
services:
  web:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```
What happens?
- Every 30 seconds the endpoint gets checked
- The check can take a maximum of 10 seconds
- After 3 failed attempts the container is marked "unhealthy"
- Failed checks during the first 40 seconds are ignored (startup time)
For databases, a different approach works better:
```yaml
services:
  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s
```
pg_isready checks whether PostgreSQL actually accepts connections. This is more reliable than just checking if the process is running.
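The same check can be run by hand to verify it behaves as expected before relying on it (assuming the service is named postgres, as above):

```shell
# Run the health check command manually inside the running container.
docker compose exec postgres pg_isready -U postgres

# Exit code 0 means the server accepts connections; 1 means it is rejecting
# them (e.g. still starting up); 2 means it did not respond at all.
```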
Dependencies Between Services
Applications often start too quickly, before the database is ready. This causes startup crashes. The depends_on option combines well with health checks:
```yaml
services:
  web:
    image: myapp:latest
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/ready"]
      interval: 15s
      timeout: 5s
      retries: 3

  postgres:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
      interval: 10s

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
```
The web application only starts when both PostgreSQL and Redis are healthy. This prevents race conditions and unnecessary crashes.
Zero-Downtime Deployments
Updates without downtime require a rolling update strategy. Docker Compose doesn't support this by default, but there are two practical solutions.
Option 1: Blue-Green with Proxy
Run two identical stacks behind a load balancer. Update one while the other handles traffic:
```yaml
services:
  web-blue:
    image: myapp:v1
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s

  web-green:
    image: myapp:v2
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s

  nginx:
    image: nginx:alpine
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "80:80"
    depends_on:
      web-blue:
        condition: service_healthy
      web-green:
        condition: service_healthy
```
NGINX routes traffic to healthy containers. A simple configuration:
```nginx
upstream backend {
    server web-blue:8080 max_fails=3 fail_timeout=30s;
    server web-green:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout http_500 http_502 http_503;
    }

    location /health {
        access_log off;
        return 200 "OK";
    }
}
```
During deployment, first web-green is stopped and replaced with the new version. NGINX automatically routes everything to web-blue. Once web-green is healthy, web-blue gets replaced.
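The rollout described above can be scripted. A sketch, assuming the service names from the compose file and Docker Compose v2; the `wait_healthy` helper is illustrative, not a built-in command:

```shell
#!/bin/sh
# Blue-green rollout sketch. Assumes services web-blue and web-green as above.
set -e

# Poll Docker's health status for a compose service until it reports "healthy".
wait_healthy() {
  service="$1"
  tries=30
  while [ "$tries" -gt 0 ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' \
      "$(docker compose ps -q "$service")")
    [ "$status" = "healthy" ] && return 0
    tries=$((tries - 1))
    sleep 2
  done
  echo "$service did not become healthy in time" >&2
  return 1
}

# Replace green first; blue keeps serving traffic via NGINX.
docker compose up -d --no-deps web-green
wait_healthy web-green

# Green is healthy again, so blue can be replaced the same way.
docker compose up -d --no-deps web-blue
wait_healthy web-blue
```

The script assumes the compose file already references the new image tags; `--no-deps` prevents dependent services from being restarted along the way.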
Option 2: Rolling Update with Replicas
For larger deployments, Docker Swarm works better. A simple migration:
```yaml
version: "3.8"

services:
  web:
    image: myapp:latest
    deploy:
      replicas: 4
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
        failure_action: rollback  # roll back automatically when an update fails
      rollback_config:
        parallelism: 1
        delay: 10s
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
```
The configuration updates one replica at a time with 10 seconds delay. The start-first option starts new containers before stopping old ones. If problems occur, Swarm automatically rolls back.
Deploy with: docker stack deploy -c docker-compose.yml myapp
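The rollout can be followed, and reverted if needed, with standard Swarm commands (the service name myapp_web follows from the stack name used above):

```shell
# Watch old and new replicas side by side during the rolling update.
docker service ps myapp_web

# Revert to the previous image manually if something slips past the health checks.
docker service update --rollback myapp_web
```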
Monitoring Health Checks
Health checks are worthless without monitoring. A simple solution with a health check aggregator:
```yaml
services:
  healthcheck-monitor:
    image: docker:cli  # ships the docker CLI; plain alpine does not
    command: |
      sh -c '
      while true; do
        echo "=== Health Check Status ==="
        docker ps --format "table {{.Names}}\t{{.Status}}"
        sleep 60
      done
      '
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
```
For production this is too basic. Better solutions integrate with Prometheus or Grafana. An example with metrics export:
```yaml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8081:8080"
```
cAdvisor exposes per-container metrics in a format Prometheus can scrape. This provides real-time insight into the resource usage and state of all services.
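A minimal Prometheus scrape configuration for this setup could look like the fragment below. The job name is arbitrary, and it assumes Prometheus runs on the same compose network so the cadvisor service name resolves:

```yaml
# prometheus.yml fragment: scrape cAdvisor over the shared compose network.
scrape_configs:
  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]
```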
Practical Tips
Test health checks locally. A failing health check in production means downtime. Verify that the endpoint responds quickly and reliably:
```shell
docker compose up -d
docker inspect --format='{{.State.Health.Status}}' container_name
```
Distinguish between liveness and readiness. A liveness check detects crashes. A readiness check determines whether the container can handle traffic. Both are important:
| Check Type | Purpose | Action on failure |
|---|---|---|
| Liveness | Has the app crashed? | Restart container |
| Readiness | Can the app handle traffic? | Don't send new requests |
Docker Compose only supports liveness checks. For readiness checks, a load balancer is needed that checks both endpoints.
Avoid overly aggressive timeouts. A health check running every 5 seconds can overload the database. Start with slow intervals and tighten only when necessary.
Log failed health checks. This helps with debugging:
```yaml
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:8080/health || (echo \"health check failed at $(date)\" && exit 1)"]
```
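Docker stores the output of the most recent health checks, so anything the check command prints can be read back when a container turns unhealthy:

```shell
# Show the stored output of the last few health checks for a container.
docker inspect \
  --format '{{range .State.Health.Log}}{{.Start}}: {{.Output}}{{end}}' \
  container_name
```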
Conclusion
Health checks and zero-downtime deployments aren't a luxury. They form the foundation for reliable production environments. Start with simple health checks on critical services. Add dependencies to prevent race conditions. Implement a deployment strategy that prevents downtime.
Without these fundamentals, a production environment remains vulnerable to unexpected crashes and downtime during updates. With the right configuration, systems run stable and updates can happen without users noticing anything.
