Kubernetes Deployments That Actually Work in Production
Practical patterns for Kubernetes deployments: rolling updates, health checks, resource limits, and the pitfalls teams keep falling into.
Jean-Pierre Broeders
Freelance DevOps Engineer
Most Kubernetes tutorials end at kubectl apply -f deployment.yaml and call it a day. But there's a massive gap between a working demo and a stable production environment. Here are the patterns that bridge that gap.
Rolling updates without downtime
A standard Deployment already does rolling updates. But the defaults are too aggressive for most workloads. This configuration is a better match for real workloads:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v2.4.1
          ports:
            - containerPort: 8080
maxUnavailable: 0 is the key. It guarantees the full set of desired replicas stays available throughout the rollout: Kubernetes first starts one new pod (maxSurge: 1) and only terminates an old one once the new one is ready. The rollout takes slightly longer, sure. But users won't notice a thing.
Health checks: not optional
Without health checks, Kubernetes has no idea whether a container is actually ready to receive traffic. The result: requests hitting an application that hasn't finished starting. Two checks are essential:
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2
The readinessProbe controls when a pod receives traffic. The livenessProbe restarts pods that are stuck. A common mistake: using the same endpoint for both. Works fine until a pod responds slowly to the liveness check and gets unnecessarily restarted — right in the middle of a traffic spike.
Keep /healthz simple: process is running, done. Make /ready stricter: database connection works, caches are warm, dependencies are reachable.
Setting resource limits
Running without resource limits is asking for trouble. One container with a memory leak can take down the entire node.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi
The requests determine scheduling — how much space Kubernetes reserves. The limits are the ceiling. A solid approach: start with generous limits, monitor actual usage for a week using Prometheus or kubectl top pods, then adjust.
Be careful with CPU limits. There's a growing camp that says: don't set CPU limits, only requests. The reasoning is that CPU throttling causes more damage than a pod that briefly spikes. Memory limits, though, are always needed. An OOMKill of one pod is better than a node that completely locks up.
Graceful shutdown
Kubernetes sends SIGTERM to the container and then waits terminationGracePeriodSeconds (default: 30 seconds) before sending SIGKILL. The application needs to catch that SIGTERM signal and finish in-flight requests.
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
That 5-second preStop sleep gives the load balancer time to remove the pod from rotation. Without it, requests still arrive while the pod is already shutting down.
Pod Disruption Budgets
During cluster maintenance (node upgrades, autoscaling), Kubernetes evicts pods. A PDB prevents all pods from disappearing at once:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-server
With 3 replicas and minAvailable: 2, Kubernetes can only evict one pod at a time during maintenance. Simple, but it prevents unexpected downtime during cluster operations.
What teams forget
A few things that regularly get overlooked:
Image pull policy. Always use specific tags, never :latest in production. And set imagePullPolicy: IfNotPresent so nodes don't re-pull the image on every pod start.
Anti-affinity rules. Without pod anti-affinity, all replicas can end up on the same node. That node goes down, everything's gone.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - api-server
          topologyKey: kubernetes.io/hostname
Namespace isolation. Run production and staging in separate namespaces with ResourceQuotas. A staging deployment that goes haywire should never consume production resources.
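To make that isolation concrete, a ResourceQuota caps what a namespace can consume in total. A sketch for a staging namespace; the name and the numbers are illustrative and should follow actual capacity:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging  # hypothetical namespace name
spec:
  hard:
    requests.cpu: "4"       # sum of all CPU requests in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"         # sum of all CPU limits
    limits.memory: 16Gi
    pods: "20"
```

Once the quota is in place, pods in that namespace are rejected at admission if they lack resource requests and limits, which also enforces the resource discipline from earlier.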
Wrapping up
Kubernetes deployment configuration isn't a one-time thing. It's a cycle of deploying, monitoring, adjusting. The configuration above is a starting point — not the finish line. Measure what the application actually needs and adapt based on real data. That delivers more value than any best practices checklist ever will.
