Service A in namespace-a calls Service B in namespace-b via Kubernetes DNS. Connectivity works 99% of the time, but 1-2% of requests return 5xx errors. No obvious pattern. The services are ClusterIP. Where do you start debugging?
Intermittent failures suggest transient network issues or endpoint instability. First, check whether the endpoints are healthy: kubectl get endpoints service-b -n namespace-b -o yaml. Are all pods listed? If the endpoint list is fluctuating (pods appearing and disappearing), requests fail in the window between an endpoint changing and kube-proxy re-syncing its rules, or when they hit a pod that has just become unready.
Check the Service and its endpoints: kubectl get svc service-b -n namespace-b and kubectl get pods -n namespace-b -l app=service-b. Verify the label selector matches. If the selector is wrong, no endpoints populate.
Verify the endpoint IP addresses are correct: kubectl get endpoints service-b -n namespace-b -o wide shows which pod IPs are backing the service. Cross-check with kubectl get pods -n namespace-b -o wide. The IPs should match.
Check if pods are crashing or restarting. Intermittent failures often mean a pod is repeatedly cycling (CrashLoopBackOff or constant restart). Run: kubectl get pods -n namespace-b --watch and watch for restarts over time. If a pod restarts every 30 seconds, requests to that pod fail during the downtime.
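As a rough sanity check on what error rate a single cycling pod should produce, here is a back-of-envelope sketch (hypothetical numbers; it assumes kube-proxy spreads connections evenly across the endpoints and that the flaky pod still receives its share of traffic while down):

```python
# Back-of-envelope: expected failure rate when one of N backends is
# intermittently down. Hypothetical numbers; assumes even connection
# spreading across endpoints.

def expected_error_rate(num_endpoints: int, down_fraction: float) -> float:
    """Fraction of all requests that hit the flaky pod while it's down."""
    return down_fraction / num_endpoints

# 5 backends, one pod unavailable 10% of the time:
print(f"{expected_error_rate(5, 0.10):.1%}")  # 2.0%
```

If the observed error rate is far off from this estimate, the cycling pod is probably not the whole story.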
Next, examine kube-proxy behavior on the nodes. Run: kubectl get nodes and then SSH to a node: ssh node-name. Check the iptables rules for the service; the service DNAT rules live in the nat table: iptables -t nat -L -n | grep <service-ip>. If using IPVS: ipvsadm -L -n shows the virtual servers and their backends.
Check logs on a pod in Service A: kubectl logs -f pod-name -n namespace-a. Look for connection errors, timeouts, or DNS failures. If the pod is doing DNS lookups each request, transient DNS failures cause intermittent errors. Check the pod's /etc/resolv.conf: kubectl exec pod-name -n namespace-a -- cat /etc/resolv.conf.
Follow-up: You find that one endpoint pod is in CrashLoopBackOff, restarting every 10 seconds. But removing this pod from the service still shows 1-2% errors (not 20-25% as you'd expect). Why?
You migrate from iptables kube-proxy mode to IPVS. Service connectivity works, but now you see sporadic connection timeouts. The same services and endpoints work fine in iptables mode. What changed?
IPVS and iptables modes handle connections differently. iptables mode programs netfilter NAT rules that are evaluated per packet; IPVS (the Linux Virtual Server subsystem) is an in-kernel L4 load balancer that uses hash-table lookups. Both work, but IPVS has its own timeout defaults, scheduling algorithms, and interactions with conntrack.
First, verify IPVS is actually running: SSH to a node and run: ipvsadm -L -n. You should see services and their backends. If it's empty, IPVS isn't properly configured.
Check connection tracking settings: cat /proc/sys/net/netfilter/nf_conntrack_max. IPVS relies on netfilter conntrack. If the max is too low and all connection slots are used, new connections get dropped. Compare the value before and after the migration. If you haven't tuned it, the default might be too low for your workload.
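The sizing intuition can be sketched as follows (hypothetical rates; the real inputs come from your workload and your conntrack timeout settings):

```python
# Rough conntrack sizing check (hypothetical numbers): each tracked
# connection occupies one slot for roughly its lifetime plus the
# conntrack timeout, so the steady-state table size is about
# new_connections_per_second * average_entry_lifetime_seconds.

def conntrack_slots_needed(new_conns_per_sec: float,
                           avg_entry_lifetime_sec: float) -> int:
    return int(new_conns_per_sec * avg_entry_lifetime_sec)

nf_conntrack_max = 131072  # value read from /proc/sys/net/netfilter/nf_conntrack_max
needed = conntrack_slots_needed(2000, 120)  # 2k conns/s, entries live ~120s

print(needed)                     # 240000
print(needed > nf_conntrack_max)  # True -> table will fill; raise nf_conntrack_max
```

When the table fills, the kernel drops new connections and logs "nf_conntrack: table full, dropping packet", which is worth grepping for in dmesg.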
Session affinity is controlled by the Service's sessionAffinity field in both modes; it defaults to None, and IPVS does not add stickiness on its own. Check the service definition: kubectl get svc service-b -n namespace-b -o yaml | grep -i affinity. The IPVS-specific gotcha is connection state: IPVS keeps its own connection entries, so traffic on existing or recently tracked connections can stay pinned to a backend that has since been removed or become unhealthy until those entries expire, whereas in iptables mode new connections simply stop matching removed backends.
Check the IPVS timeout settings: ipvsadm -L --timeout. The defaults are 900s TCP, 120s TCP-FIN, and 300s UDP. If your connections are long-lived and idle for longer than the TCP timeout, IPVS expires the entry and the next packet on that connection is dropped or reset. Adjust if necessary: ipvsadm --set 900 120 300 (tcp, tcpfin, udp timeouts), or enable application/OS TCP keepalives at an interval below the IPVS timeout.
Also check whether the issue is specific to certain nodes: kubectl get nodes -o wide, then see if the timeouts only occur when traffic passes through particular nodes. That would suggest a node-specific IPVS configuration issue.
Follow-up: You increase conntrack_max and IPVS timeout. The timeouts stop, but now you see connection refused errors intermittently. What's happening?
You create a headless Service (clusterIP: None). When a pod queries the DNS name, it gets back 3 A records (for 3 backend pods). But the pod always connects to the first IP, and traffic isn't balanced. Why?
Headless services return multiple A records, but the client (your application) decides which IP to use. If the client always picks the first record, that's a client-side issue, not Kubernetes.
Verify DNS is returning all records: kubectl exec pod-name -- nslookup headless-service.namespace.svc.cluster.local. You should see 3 addresses listed. If only 1 is returned, the issue is DNS/endpoints, not load balancing.
Check the service's endpoints: kubectl get endpoints headless-service -n namespace -o yaml. Are all 3 pod IPs listed? If not, they're not ready or the label selector is wrong.
The real issue is usually the client library. Languages like Java often cache DNS results or use the first result for efficiency. To get true load balancing with headless services, the client needs to resolve the DNS name and randomly select from the results, or connect to different IPs on retry.
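A minimal sketch of what such a client has to do, using a hypothetical hard-coded record list in place of a real lookup (a real client would re-resolve the name, e.g. with socket.getaddrinfo, on each attempt):

```python
import random

# Minimal sketch of client-side balancing over a headless service's
# A records. The resolved records are a hypothetical hard-coded list
# so the sketch is self-contained.

RESOLVED_A_RECORDS = ["10.0.1.5", "10.0.2.7", "10.0.3.9"]

def pick_backend(records, attempt=0):
    """Pick a backend: random order per call, rotating away from failures on retry."""
    shuffled = random.sample(records, len(records))  # new permutation each call
    return shuffled[attempt % len(shuffled)]

# Over many calls every backend gets used, unlike "always take the first record":
seen = {pick_backend(RESOLVED_A_RECORDS) for _ in range(200)}
print(sorted(seen))
```

The retry path matters as much as the random pick: on connection failure, calling pick_backend with attempt+1 moves the client to a different record instead of hammering the dead one.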
If the client library doesn't randomize its selection, headless services won't balance. Use a regular ClusterIP service instead, where kube-proxy picks a backend per connection (random selection in iptables mode, round-robin by default in IPVS mode).
Alternatively, use connection pooling or HTTP keep-alive, which makes per-request balancing less important: a single persistent connection serves many requests, and the aggregate load is spread across connections from different client pods.
Follow-up: You switch to a regular ClusterIP service. DNS now returns a single ClusterIP. The pod connects to that IP, but traffic is still going to only 1 backend pod. Why isn't kube-proxy load-balancing?
You have a Service with sessionAffinity: ClientIP set. Requests from the same client are supposed to stick to the same pod. But affinity seems broken—a single client's requests go to different pods. What's wrong?
Session affinity uses the client IP to determine which backend pod to route to. But "client IP" in Kubernetes is tricky. If traffic is proxied through intermediate services or sidecars, kube-proxy sees the intermediate IP, not the original client.
Check whether traffic reaches the service through an intermediary. Inspect the client pod's containers: kubectl get pod pod-name -n namespace -o jsonpath='{.spec.containers[*].name}'. If there's a service mesh sidecar (Istio, Linkerd), the sidecar load-balances across endpoints itself and bypasses kube-proxy, so Service sessionAffinity is effectively ignored. If traffic instead passes through a shared proxy or NAT (an ingress gateway, for example), kube-proxy sees the proxy's IP, not the client's: requests funneled through one proxy replica all look like one client, while requests spread across multiple proxy replicas with different IPs break affinity.
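The effect can be sketched with a toy affinity function. The hash scheme here is hypothetical (kube-proxy actually uses the iptables "recent" match or IPVS persistence), but the consequence for proxied traffic is the same:

```python
import hashlib

# Toy ClientIP affinity: the backend is chosen deterministically from the
# source IP that kube-proxy observes. Hypothetical hash scheme.

BACKENDS = ["pod-a", "pod-b", "pod-c"]

def backend_for(source_ip: str) -> str:
    digest = hashlib.sha256(source_ip.encode()).digest()
    return BACKENDS[digest[0] % len(BACKENDS)]

# A direct client always lands on the same backend:
assert backend_for("203.0.113.10") == backend_for("203.0.113.10")

# Behind two proxy replicas, kube-proxy sees the proxy IPs instead, so the
# SAME end user can land on different backends depending on which replica
# forwarded the request:
print(backend_for("10.0.0.1"), backend_for("10.0.0.2"))
```

The takeaway: affinity is only as good as the source IP kube-proxy sees; anything that rewrites that IP rewrites the affinity.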
Verify the session affinity timeout: kubectl get svc -n namespace -o yaml | grep -A5 sessionAffinity. If sessionAffinityConfig.clientIP.timeoutSeconds is very low (e.g., 10 seconds), affinity entries expire between requests and the same client gets re-mapped, possibly to a different pod. The default is 10800 seconds (3 hours); raise the timeout if clients are idle between requests.
Check the kube-proxy mode: ps aux | grep kube-proxy on a node and look for the --proxy-mode flag (iptables, ipvs, or the long-deprecated userspace). In IPVS mode, ClientIP affinity is implemented as persistent virtual-service entries; in iptables mode, it uses the iptables "recent" match. Both work when configured correctly, but they track and expire entries differently.
Test affinity with a simple loop: for i in {1..10}; do kubectl exec pod -- curl -s http://service/; done, ideally against an endpoint that echoes the serving pod's hostname. If the responses come from different backends, affinity is broken; if they all come from the same backend, it works.
Follow-up: Session affinity is set, timeout is high, kube-proxy is in IPVS mode. But a single client IP still connects to different backend pods. Why is persistence not working?
You expose a Service as NodePort on port 30080. Port 30080 on every node should route to the service. But traffic to node1:30080 works fine, and traffic to node2:30080 is dropped (connection refused). Both nodes are in the same cluster. What's happening?
NodePort services should be reachable on all nodes. If one node works and another doesn't, there's a node-specific issue. Be careful with the obvious first check, though: ss -tuln | grep 30080 on node2 may show nothing even when the NodePort works, because in iptables and IPVS modes the port is implemented as packet-redirect rules rather than a listening socket (only some kube-proxy versions hold the port open to reserve it). Treat a missing LISTEN entry as inconclusive and verify kube-proxy and its rules directly.
Check kube-proxy status on node2: kubectl get pod -n kube-system -o wide | grep kube-proxy. Is the kube-proxy pod running on node2? If it's Pending or CrashLoopBackOff, kube-proxy isn't active, so NodePort rules aren't created.
If kube-proxy is running, check its logs: kubectl logs -n kube-system -l k8s-app=kube-proxy (add -p to see the previous instance if the pod has restarted). Look for errors while programming iptables/IPVS rules, such as permission denied.
Manually verify the iptables rules on node2. NodePort rules live in the nat table, in the KUBE-NODEPORTS chain: iptables -t nat -L KUBE-NODEPORTS -n | grep 30080. If no rules exist, kube-proxy didn't set them up. If rules exist, check that they're correct: they should jump into the service chain, which DNATs the incoming traffic to a backend pod.
Check whether the Service has externalTrafficPolicy: Local, which makes a node drop NodePort traffic when no backend pods run on it: kubectl get svc -o yaml | grep -i externalTraffic. If it's Local and no service pods are scheduled on node2, traffic to node2:30080 is dropped by design (the trade-off that preserves the client source IP).
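A toy model of the externalTrafficPolicy: Local behavior (hypothetical node and pod layout):

```python
# Toy model of externalTrafficPolicy: Local (assumption-level sketch):
# a node only forwards NodePort traffic if a backend pod runs locally;
# otherwise the traffic is dropped rather than bounced to another node.

pods_by_node = {"node1": ["service-b-pod-1"], "node2": []}

def nodeport_reachable(node: str) -> bool:
    return len(pods_by_node[node]) > 0

print(nodeport_reachable("node1"))  # True  -> node1:30080 works
print(nodeport_reachable("node2"))  # False -> node2:30080 dropped by design
```

With the default Cluster policy, node2 would instead forward the traffic to a pod on node1 (at the cost of SNAT hiding the client IP).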
Follow-up: Both nodes have kube-proxy running, iptables rules exist on both, but node2:30080 still doesn't work. You tcpdump on node2 and see the traffic arriving but being silently dropped. Where is it being dropped?
You have two services: one ClusterIP (internal only) and one LoadBalancer (external). When an external client connects to the LoadBalancer service, they reach the pods. But when a pod inside the cluster tries to connect to the LoadBalancer's external IP, it fails. Why?
This is a classic hairpin traffic problem. When internal traffic tries to reach the LoadBalancer's external IP, it gets routed outside the cluster (to the load balancer), which then tries to route it back inside. Many cloud load balancers don't support hairpin NAT, so the traffic is dropped or times out.
The solution is to use the internal ClusterIP instead of the external IP when calling from inside the cluster. Get the ClusterIP: kubectl get svc loadbalancer-service -o yaml | grep clusterIP:. Have internal pods use this IP instead of the external IP.
Alternatively, enable hairpin mode if your CNI supports it. There's no built-in kubectl command to inspect the CNI; check which network plugin is deployed (kubectl get pods -n kube-system usually shows the Flannel, Calico, or Weave pods) and consult that plugin's documentation, along with the kubelet's hairpin-mode setting. Some plugins support hairpin NAT, which lets internal traffic loop back through the external IP.
Another workaround: set externalTrafficPolicy: Local on the LoadBalancer service. This can reduce the routing complexity: kubectl patch svc loadbalancer-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'. However, this only works if pods are distributed across nodes; if all pods are on one node, external traffic to other nodes will fail.
The best practice is to have internal callers use the ClusterIP (or the service's cluster DNS name) and reserve the external IP for actual external clients. If pods are calling the service by its external IP, that's a DNS/config issue; fix the configuration to use the internal name.
Follow-up: You've set externalTrafficPolicy: Local and pods are distributed across nodes. Internal pod still can't reach the LoadBalancer's external IP. Why?
DNS resolution for a service works: nslookup service.namespace.svc.cluster.local returns the ClusterIP. But connectivity to that IP fails (connection refused). The service exists and has endpoints. What's the disconnect?
DNS resolution and IP routing are separate layers. Just because DNS works doesn't mean traffic reaches the pods. Verify the ClusterIP is correct: kubectl get svc -o wide and compare the IP with the nslookup result. They should match.
Check if the service has endpoints: kubectl get endpoints service-name. If it's empty (no endpoints), kube-proxy has no backends to route to. The connection gets refused because there's nowhere to send it.
Verify the service's label selector is correct: kubectl get svc service-name -o yaml | grep -A5 selector. Cross-check against pod labels: kubectl get pods --show-labels. The selectors must match exactly.
Check if the pods are ready: kubectl get pods -o wide and look at the Ready column. If a pod is NotReady (0/1), it won't appear in endpoints. Without a readiness probe, a pod appears in endpoints as soon as its containers are running.
If endpoints are populated, kube-proxy is routing correctly, and the "connection refused" comes from the pod itself: the process is running but nothing is listening on the port the traffic arrives at. Check the pod's app config: is it listening on the expected port? Is the port name in the service definition correct?
Verify the port mapping: kubectl get svc service-name -o yaml | grep -A10 'ports:'. The service's targetPort must match the port the container actually listens on (the containerPort); the service's own port can be anything.
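The relationship between the port fields can be sketched like this (hypothetical specs):

```python
# Sketch of the port fields that must line up: clients connect to the
# Service "port"; kube-proxy forwards to "targetPort"; the container
# must actually listen on that target port. Hypothetical specs.

def misconfigured(svc_ports, ctr_ports):
    """Return service port entries whose targetPort no container listens on."""
    return [p for p in svc_ports if p["targetPort"] not in ctr_ports]

container_ports = [8080]  # what the pod spec declares / the app binds

print(misconfigured([{"port": 80, "targetPort": 8080}], container_ports))
# [] -> mapping is fine
print(misconfigured([{"port": 80, "targetPort": 9090}], container_ports))
# [{'port': 80, 'targetPort': 9090}] -> forwards to a port nobody listens on
```

A named targetPort adds one more link in the chain: the name in the service must match a named containerPort in the pod spec.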
Follow-up: Endpoints are populated, port mapping is correct, but connections are still refused. You exec into a backend pod and curl localhost:port directly—it works. So the pod is listening. Why does traffic via the service fail?
You have EndpointSlices enabled on your cluster (newer Kubernetes). Service connectivity intermittently fails in a way that's different from iptables/IPVS issues. What does EndpointSlices change?
EndpointSlices are a scalability improvement over Endpoints. Instead of a single Endpoints object with all pod IPs, Kubernetes splits them into EndpointSlices (default 100 pods per slice). This reduces API server load and improves performance at scale.
For most workloads, EndpointSlices are transparent. But when debugging, check them instead of Endpoints: kubectl get endpointslices -l kubernetes.io/service-name=service-name (slices get generated name suffixes, so select them by this label rather than by name). Each slice shows a subset of the backend pods.
If some traffic fails intermittently, it might be because kube-proxy is still processing a slice update. During the transition (when pods are added/removed), there's a brief window where the slice is updated but kube-proxy hasn't synced yet. Traffic to newly removed pods fails; traffic to newly added pods might not be routed yet.
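A toy model of that sync window (hypothetical IPs; real kube-proxy syncs in response to watch events, typically within milliseconds to seconds):

```python
# Toy model of the kube-proxy sync window: the EndpointSlice updates as
# soon as a pod is removed, but kube-proxy keeps routing with its
# last-synced snapshot until it re-syncs its rules.

live_pods = {"10.0.1.5", "10.0.2.7", "10.0.3.9"}
kube_proxy_snapshot = set(live_pods)  # rules currently programmed

# Pod 10.0.2.7 terminates; the slice updates, kube-proxy hasn't synced yet.
live_pods.discard("10.0.2.7")

def send_request(target: str) -> str:
    return "ok" if target in live_pods else "connection error"

# During the window, one of the three programmed backends is already gone:
print(sorted(send_request(ip) for ip in kube_proxy_snapshot))
# ['connection error', 'ok', 'ok']

# After kube-proxy syncs, traffic only goes to live pods:
kube_proxy_snapshot = set(live_pods)
print(sorted(send_request(ip) for ip in kube_proxy_snapshot))
# ['ok', 'ok']
```

The standard mitigations shrink or mask this window rather than eliminate it: graceful shutdown with a preStop sleep, readiness probes, and client-side retries.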
Check if there are many EndpointSlices being created/deleted: kubectl get endpointslices -w while scaling the deployment. If you're constantly scaling (adding/removing pods), you might see churn in the slices.
To diagnose, monitor slice updates: kubectl describe endpointslice <slice-name> and watch how often the resource changes (resourceVersion/generation). Frequent churn shouldn't cause failures by itself, but it adds load on the API server and kube-proxy.
On recent Kubernetes versions, EndpointSlices are always on and cannot be disabled. On older versions you could fall back to Endpoints-based proxying via feature gates (e.g., --feature-gates=EndpointSliceProxying=false on kube-proxy), but only as a temporary debugging step: single Endpoints objects don't scale well beyond a few hundred pods.
Follow-up: You have EndpointSlices enabled. You scale from 5 to 500 pods rapidly. During scaling, some clients see 5xx errors. After scaling completes, errors stop. Why does rapid scaling cause transient failures?