Resource Limits and Performance Tuning for Docker Compose in Production
Setting CPU and memory limits, configuring logging drivers, and monitoring container performance in a Docker Compose production environment.
Jean-Pierre Broeders
Freelance DevOps Engineer
Deploying a Docker Compose stack without resource limits is like driving a car without a speedometer. It works — until it doesn't. And by then, it's usually too late.
Why resource limits aren't optional
Without explicit limits, a single container can consume all available memory. The Linux OOM killer steps in, but it doesn't distinguish between your database and your logging sidecar. The result: random containers getting killed, often exactly the wrong ones.
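Beyond limits, Compose also exposes the kernel's OOM score adjustment, which tells the OOM killer which containers to spare. A minimal sketch, assuming a Postgres service you want protected (the service name and value are illustrative; negative values make a kill less likely):

```yaml
services:
  db:
    image: postgres:16
    # Lower score = less attractive OOM victim; range is -1000 to 1000
    oom_score_adj: -500
```

This doesn't replace memory limits, but it biases the kernel toward killing a disposable sidecar instead of your database.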
The deploy.resources section in Compose fixes this:
services:
  api:
    image: myapp/api:latest
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 128M
There's a subtle but important difference between limits and reservations. Limits are hard caps: the container cannot exceed them. Reservations are soft guarantees: the amount a container should be able to count on — for memory, a soft floor the kernel tries to preserve when the host comes under pressure. On a server with 4GB RAM running five services, the sum of all reservations should stay under 4GB, while limits can go higher to accommodate peak loads.
CPU limits: shares vs quota
CPU limiting has more nuance than it appears. Docker uses two mechanisms: CPU shares and CPU quota.
services:
  worker:
    image: myapp/worker:latest
    cpu_shares: 512
    deploy:
      resources:
        limits:
          cpus: '0.5'
The cpus: '0.5' setting restricts the container to a maximum of 50% of one core — always, regardless of how busy the server is. CPU shares work differently: they're a relative weight, set at the service level rather than under deploy.resources. A container with 512 shares gets half the CPU compared to one with 1024 shares, but only when there's actual contention. On an idle server, it makes no difference.
For most production setups, a combination works best. Hard CPU limits on CPU-intensive workers, shares for services that occasionally spike but don't consistently need much CPU.
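A sketch of that split, assuming a CPU-bound worker and a bursty API (service names are illustrative):

```yaml
services:
  worker:
    image: myapp/worker:latest
    deploy:
      resources:
        limits:
          cpus: '0.5'    # hard cap: never more than half a core
  api:
    image: myapp/api:latest
    cpu_shares: 512      # relative weight, only matters under contention
```

The worker can never starve the host; the API can burst freely on an idle server but yields gracefully when things get busy.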
Memory: the silent killer
Memory issues are trickier than CPU problems. A Node.js API with a slow leak, a Java service with an oversized heap, a Redis instance without maxmemory — these problems creep in gradually.
A few practical settings that help:
services:
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    deploy:
      resources:
        limits:
          memory: 300M

  api:
    image: myapp/api:latest
    environment:
      - NODE_OPTIONS=--max-old-space-size=384
    deploy:
      resources:
        limits:
          memory: 512M
The trick is setting the application limit slightly lower than the container limit. Redis gets 256MB as its own limit, but the container allows 300MB. That margin covers overhead from the Redis process itself. Without it, the container crashes before Redis can clean up gracefully.
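The same pattern applies to a JVM service: cap the heap below the container limit so garbage collection, metaspace, and thread stacks have headroom. A sketch, assuming a hypothetical reports service (the -Xmx value is an illustrative starting point, not a universal rule):

```yaml
services:
  reports:
    image: myapp/reports:latest
    environment:
      # Heap capped well below the container limit; the gap covers JVM overhead
      - JAVA_TOOL_OPTIONS=-Xmx384m
    deploy:
      resources:
        limits:
          memory: 512M
```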
Logging: the forgotten resource hog
By default, Docker sends all stdout/stderr to JSON files on disk. Without configuration, those grow indefinitely. On a busy API with debug logging enabled, producing gigabytes per day is entirely realistic.
services:
  api:
    image: myapp/api:latest
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
Three files of 10MB max per container. Keeps the disk clean without logs disappearing entirely. For serious monitoring, an external logging driver works better:
services:
  api:
    logging:
      driver: syslog
      options:
        syslog-address: "tcp://logserver:514"
        tag: "api"
Or the appropriate driver for a Loki or Elasticsearch stack. The point: think about logging before the disk fills up.
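To avoid repeating the logging block in every service, a YAML anchor in an x- extension field works well (a sketch; Compose ignores top-level keys starting with x-):

```yaml
x-logging: &default-logging
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

services:
  api:
    image: myapp/api:latest
    logging: *default-logging
  worker:
    image: myapp/worker:latest
    logging: *default-logging
```

One place to change the policy, applied everywhere.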
Health checks that actually matter
A health check that only verifies whether the container is running is useless. The container can run perfectly fine while the application inside is completely deadlocked.
services:
  api:
    image: myapp/api:latest
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
That /health endpoint should do more than return 200 OK. Check the database connection. Check if Redis is reachable. Check if the event queue isn't backing up. A health endpoint that genuinely reflects service health makes the difference between a problem that auto-recovers and one that triggers a 3 AM phone call.
Performance monitoring without extra tools
Docker itself provides more insight than most teams realize:
# Real-time resource usage per container
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
# Detailed resource usage inspection
docker inspect --format='{{.HostConfig.Memory}}' container_name
For something more structured, a simple script that logs stats every minute:
#!/bin/bash
while true; do
  docker stats --no-stream --format \
    "{{.Name}},{{.CPUPerc}},{{.MemPerc}},{{.MemUsage}}" \
    >> /var/log/docker-stats.csv
  sleep 60
done
Not flashy, but effective. A CSV loadable in Grafana or even a spreadsheet gives a clear picture of actual resource usage after a few days. That data is essential for tuning limits properly — limits too tight cause OOM kills, limits too loose waste resources.
The right restart policy
Closely related to resource management is the restart policy. A container that gets OOM-killed and immediately restarts with restart: always can enter a crash loop that takes down the entire server.
services:
  api:
    deploy:
      resources:
        limits:
          memory: 512M
      # restart_policy replaces the shorthand restart: key; setting both
      # is contradictory, so pick one
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
The delay and max_attempts prevent a failing container from restarting endlessly: after three failed attempts, Docker stops restarting it (the window, which bounds how long attempts are counted, is honored on Swarm). That gives room to investigate the problem instead of masking it.
Wrapping up
Setting resource limits costs maybe an hour during initial deployment. That hour pays for itself the first time a memory leak doesn't take down your entire server. Start with conservative limits, monitor actual usage, and adjust based on data. Not the other way around.
