Kubernetes Networking: How Services Find Each Other
Service discovery, DNS debugging, and the differences between ClusterIP, NodePort, and LoadBalancer — practical examples you can use right away.
Jean-Pierre Broeders
Freelance DevOps Engineer
Deploying Pods is step one. But once service A needs to talk to service B, things get interesting. Kubernetes has its own networking model and it differs quite a bit from what most developers are used to on traditional servers.
The networking model in brief
Every Pod gets its own IP address. No NAT, no port mapping between Pods. They can reach each other as if they're on the same flat network. Sounds simple, but under the hood there's quite a bit of magic happening via CNI plugins like Calico, Flannel, or Cilium.
The problem: Pod IPs are ephemeral. Every time a Pod is recreated or rescheduled, it gets a new address. Connecting directly to IP addresses isn't viable. That's where Services come in.
ClusterIP, NodePort, and LoadBalancer
Three Service types, each for a different scenario.
| Type | Reachable from | Typical use case |
|---|---|---|
| ClusterIP | Inside the cluster only | Internal service-to-service communication |
| NodePort | Any node on a fixed port (30000-32767) | Quick & dirty external access, dev/test |
| LoadBalancer | Externally via cloud load balancer | Production traffic from outside |
ClusterIP is by far the most common. A backend API that's only accessed by the frontend container? ClusterIP. No reason to expose it externally.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-api
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: user-api
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
```
Other Pods reach this service via user-api.production.svc.cluster.local or simply user-api when running in the same namespace.
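A quick way to verify this from inside the cluster is to spin up a throwaway Pod and hit the Service by its DNS name. The image tag and the /healthz path here are examples — substitute whatever endpoint your API actually serves:

```shell
# One-off pod that curls the Service via cluster DNS, then cleans itself up
kubectl run curl-test --image=curlimages/curl:8.8.0 --rm -it --restart=Never -- \
  curl -s http://user-api.production.svc.cluster.local/healthz
```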
NodePort opens the same port on every node. Useful for testing, but in production this is almost never what you want. The limited port range and the need to know node IPs make it impractical.
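For completeness, a NodePort variant of the same Service might look like this — the nodePort value is an example within the allowed range, and can be omitted to let Kubernetes pick one:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-api-nodeport
spec:
  type: NodePort
  selector:
    app: user-api
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080   # optional; must fall within 30000-32767
```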
LoadBalancer only works in cloud environments (or with MetalLB on-prem). Each Service gets an external load balancer provisioned. With dozens of services, that gets expensive fast — one LoadBalancer per service is rarely the right call.
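The manifest itself is minimal — the cloud controller does the heavy lifting of provisioning the external load balancer when it sees the type:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-api-lb
spec:
  type: LoadBalancer
  selector:
    app: user-api
  ports:
    - port: 80
      targetPort: 8080
```

Once provisioned, the external address shows up under EXTERNAL-IP in `kubectl get svc user-api-lb`.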
Ingress: the better solution for HTTP traffic
Instead of ten LoadBalancers, use a single Ingress Controller that routes HTTP traffic based on hostname or path.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-api
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-api
                port:
                  number: 80
```
One external IP, multiple services. TLS termination at the Ingress level. Saves a bunch on LoadBalancer costs and keeps management straightforward.
DNS debugging: when services can't find each other
Sooner or later, something goes wrong with DNS inside the cluster. The standard debugging approach:
```shell
# Start a debug pod
kubectl run dns-debug --image=busybox:1.36 --rm -it -- sh

# From inside the pod:
nslookup user-api.production.svc.cluster.local
nslookup kubernetes.default
cat /etc/resolv.conf
```
That resolv.conf file is crucial. It lists which DNS server (CoreDNS) is being used and which search domains are active. A common mistake: CoreDNS pods aren't running or are stuck in CrashLoopBackOff.
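For reference, a healthy Pod's resolv.conf typically looks something like this — the nameserver IP is the cluster DNS Service's ClusterIP and varies per cluster, and the first search domain matches the Pod's own namespace:

```
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

The search domains are what let a Pod in the production namespace resolve the short name user-api without spelling out the FQDN.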
```shell
# Check CoreDNS status
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50
```
Nine times out of ten it's one of these three:
- CoreDNS pods have crashed (check the logs)
- A NetworkPolicy is blocking DNS traffic on port 53
- The service name is misspelled (yes, really)
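The second cause deserves a concrete fix: if a deny-all egress policy is in place, DNS itself needs an explicit exception. A sketch of such a rule — the namespace label shown is the standard one Kubernetes sets automatically, but verify it exists in your cluster:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}            # applies to all pods in this namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```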
Headless Services for StatefulSets
Sometimes individual Pods need to be addressable rather than load-balanced. Think databases or message brokers where each instance has its own identity.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None   # This makes it headless
  selector:
    app: postgres
  ports:
    - port: 5432
```
With clusterIP: None, Kubernetes doesn't create a virtual IP. Instead, a DNS lookup returns all Pod IPs directly. Combined with a StatefulSet, individual Pods become reachable via postgres-0.postgres.default.svc.cluster.local, postgres-1.postgres..., and so on.
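The link between the two is the StatefulSet's serviceName field, which must point at the headless Service. An abbreviated sketch — image and replica count are examples:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres   # must match the headless Service's name
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
```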
NetworkPolicies: restricting traffic
By default, every Pod can communicate with every other Pod. In production, that's a risk. NetworkPolicies act as a firewall within the cluster.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: user-api-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: user-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```
Only Pods with label app: frontend can reach the user-api on port 8080. Everything else gets blocked. Note: not every CNI plugin supports NetworkPolicies. Flannel doesn't by default, Calico and Cilium do.
Common mistakes
Port confusion. A Service has port (what the Service listens on) and targetPort (the container's port). They don't need to match, but if they're wrong, traffic simply never reaches the Pod. No error message, just timeouts.
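One way to rule out port confusion is to bypass the Service and port-forward straight to a Pod, then try again via the Service, and compare. If the first works and the second times out, the Service definition is the problem:

```shell
# Talk to the container directly (targetPort); substitute a real pod name
kubectl port-forward pod/<pod-name> 8080:8080 -n production

# Talk via the Service (port)
kubectl port-forward svc/user-api 8080:80 -n production
```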
Forgetting the namespace. Services are namespace-scoped. Reaching user-api from a different namespace requires user-api.production or the full FQDN. Without the namespace qualifier, DNS only searches within the calling Pod's own namespace.
Selector mismatch. The Service's selector must exactly match the labels on the Pods. One typo and the Service has zero endpoints. Quick check:
```shell
kubectl get endpoints user-api -n production
```
Empty endpoints? The selector doesn't match or no Pods are running with those labels.
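To compare the Service's selector against the actual Pod labels side by side:

```shell
# What the Service selects on
kubectl get svc user-api -n production -o jsonpath='{.spec.selector}'

# Which pods actually carry those labels
kubectl get pods -n production -l app=user-api --show-labels
```

If the first command prints a selector but the second returns no pods, the mismatch is found.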
Wrapping up
Kubernetes networking isn't complicated once the mental model clicks: Pods have IPs, Services provide stable addresses, DNS glues it together. Most issues come down to configuration mistakes — wrong ports, missing labels, blocked DNS traffic. With the debugging steps above, figuring out what went wrong usually takes minutes, not hours.
