Kubernetes Deployments That Actually Work in Production

Practical patterns for Kubernetes deployments: rolling updates, health checks, resource limits, and the pitfalls teams keep falling into.

Jean-Pierre Broeders

Freelance DevOps Engineer

February 23, 2026 · 5 min. read


Most Kubernetes tutorials end at kubectl apply -f deployment.yaml and call it a day. But there's a massive gap between a working demo and a stable production environment. Here are the patterns that bridge that gap.

Rolling updates without downtime

A standard Deployment already does rolling updates. But the defaults are too aggressive for most workloads. This configuration better matches reality:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v2.4.1
          ports:
            - containerPort: 8080

maxUnavailable: 0 is the key. It guarantees the full desired number of pods stays available throughout the rollout: Kubernetes first creates one extra pod (maxSurge: 1) and terminates an old one only after the replacement passes its readiness check. The rollout takes slightly longer? Sure. But users won't notice a thing.
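Triggering and watching such a rollout from the command line looks roughly like this (the new tag is illustrative; substitute your own image):

```shell
# Trigger a rolling update by changing the container image
kubectl set image deployment/api-server api=registry.example.com/api:v2.4.2

# Watch the rollout; exits non-zero if it doesn't complete in time
kubectl rollout status deployment/api-server --timeout=120s

# If something is wrong, roll back to the previous ReplicaSet
kubectl rollout undo deployment/api-server
```

With maxUnavailable: 0, `rollout status` will show new pods becoming ready before old ones are terminated.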

Health checks: not optional

Without health checks, Kubernetes has no idea whether a container is actually ready to receive traffic. The result: requests hitting an application that hasn't finished starting. Two checks are essential:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 2

The readinessProbe controls when a pod receives traffic. The livenessProbe restarts pods that are stuck. A common mistake: using the same endpoint for both. Works fine until a pod responds slowly to the liveness check and gets unnecessarily restarted — right in the middle of a traffic spike.

Keep /healthz simple: process is running, done. Make /ready stricter: database connection works, caches are warm, dependencies are reachable.
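One way to implement that split, sketched here in Python with just the standard library. The check_database function is a stand-in for whatever dependency checks your service actually needs:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database() -> bool:
    """Placeholder dependency check; swap in a real connection ping."""
    return True


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: the process answers, nothing more.
            self._reply(200, {"status": "alive"})
        elif self.path == "/ready":
            # Readiness: stricter -- dependencies must be reachable.
            if check_database():
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "not ready"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code: int, body: dict):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        # Keep probe traffic out of the request logs.
        pass
```

Serve it with `HTTPServer(("", 8080), ProbeHandler).serve_forever()`. The point of the split: /healthz never touches dependencies, so a slow database can't get the pod restarted.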

Setting resource limits

Not setting resource limits is asking for trouble. One container with a memory leak can take down an entire node.

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi

The requests determine scheduling — how much space Kubernetes reserves. The limits are the ceiling. A solid approach: start with generous limits, monitor actual usage for a week using Prometheus or kubectl top pods, then adjust.

Be careful with CPU limits. There's a growing camp that says: don't set CPU limits, only requests. The reasoning: CPU throttling does more damage than a pod that briefly spikes. Memory limits, though, are always needed. An OOMKill of one pod is better than a node that locks up completely.
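In that requests-only camp, the resources block from above would look like this (same illustrative numbers, CPU limit dropped):

```yaml
resources:
  requests:
    cpu: 100m        # still set: the scheduler needs this to place the pod
    memory: 256Mi
  limits:
    memory: 512Mi    # memory limit stays; no cpu limit means no throttling
```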

Graceful shutdown

Kubernetes sends SIGTERM to the container and then waits terminationGracePeriodSeconds (default: 30 seconds) before sending SIGKILL. The application needs to catch that SIGTERM signal and finish in-flight requests.

spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: api
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]

That 5-second preStop sleep gives the load balancer time to remove the pod from rotation. Without it, requests still arrive while the pod is already shutting down.
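On the application side, catching SIGTERM can be sketched like this in Python; a real service would also start failing its readiness probe once the flag is set:

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


# Port 0 picks a free port for this sketch; the manifest above exposes 8080.
server = HTTPServer(("127.0.0.1", 0), Handler)
shutting_down = threading.Event()


def on_sigterm(signum, frame):
    # Mark the pod as draining so new work can be refused.
    shutting_down.set()
    # shutdown() blocks until serve_forever() has finished its in-flight
    # requests, so call it from a separate thread.
    threading.Thread(target=server.shutdown, daemon=True).start()


signal.signal(signal.SIGTERM, on_sigterm)
```

Combined with the preStop sleep, this gives a clean sequence: load balancer drains first, then the application finishes what it was doing, and SIGKILL never has to fire.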

Pod Disruption Budgets

During cluster maintenance (node upgrades, autoscaling), Kubernetes evicts pods. A PDB prevents all pods from disappearing at once:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-server

With 3 replicas and minAvailable: 2, Kubernetes can only evict one pod at a time during maintenance. Simple, but it prevents unexpected downtime during cluster operations.
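If the replica count changes (for example via a HorizontalPodAutoscaler), a percentage-based maxUnavailable may fit better than a fixed minAvailable; a variant of the same PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  maxUnavailable: 25%   # scales with the replica count
  selector:
    matchLabels:
      app: api-server
```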

What teams forget

A few things that regularly get overlooked:

Image pull policy. Always use specific tags, never :latest in production. And set imagePullPolicy: IfNotPresent so nodes don't re-pull the image on every pod start.

Anti-affinity rules. Without pod anti-affinity, all replicas can end up on the same node. That node goes down, everything's gone.

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - api-server
          topologyKey: kubernetes.io/hostname

Namespace isolation. Run production and staging in separate namespaces with ResourceQuotas. A staging deployment that goes haywire should never consume production resources.
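A ResourceQuota that caps a staging namespace could look like this; the numbers are illustrative and should match what the cluster can actually spare:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: staging-quota
  namespace: staging
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "50"
```

Once the quota is exhausted, new pods in staging are simply rejected instead of competing with production workloads.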

Wrapping up

Kubernetes deployment configuration isn't a one-time thing. It's a cycle of deploying, monitoring, adjusting. The configuration above is a starting point — not the finish line. Measure what the application actually needs and adapt based on real data. That delivers more value than any best practices checklist ever will.
