Kubernetes Interview Questions

Resource Requests, Limits, and QoS Classes


Your Java app only uses 2GB heap (verified by jcmd), but pods are getting OOMKilled with a 3GB limit. The logs show memory exhausted but no heap dump. What's actually consuming the memory?

Heap is only part of Java's memory footprint. The 3GB limit is the entire cgroup limit—heap + metaspace + off-heap structures + JVM internals + native libraries + file buffers. You set Xmx=2G but forgot about the other 1GB.

First, identify what's taking the extra memory. On the node, inspect the pod's cgroup: under cgroup v1 that's a path like /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/memory.limit_in_bytes (the pod's QoS class appears in the path); under cgroup v2 the equivalent file is memory.max. Then check memory inside the container: ps aux | grep java shows VSZ vs RSS. VSZ includes all mapped memory; RSS is the actual physical pages, and RSS (plus kernel-charged memory like page cache) is what counts against the cgroup limit.

For Java specifically: metaspace is not heap. MaxMetaspaceSize is unlimited by default, so an app that loads many classes can grow it without bound. Set -XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512M explicitly. Direct ByteBuffer allocations (off-heap) also don't count against Xmx—they come from the JVM's native memory. If you're using NIO or Netty heavily, direct buffers can be huge; cap them with -XX:MaxDirectMemorySize. Thread stacks (roughly 1MB per thread by default) add up too.

Use the JVM's Native Memory Tracking to break down memory usage: start the JVM with -XX:NativeMemoryTracking=summary, then run jcmd <pid> VM.native_memory summary inside the pod. For a quick heap snapshot, jcmd <pid> GC.heap_info works on modern JDKs (jmap -heap was removed in JDK 9+). File and socket buffers add up too—page cache for files the app reads is charged to the container's cgroup, so heavy file I/O inflates the pod's memory accounting.

The fix: increase your memory limit to account for JVM overhead. A rough starting point: limit = heap + max metaspace + direct memory + thread stacks, plus ~10% headroom. So if Xmx=2G, a limit of 3-3.5GB is a safer starting point—then measure with Native Memory Tracking rather than guessing. Alternatively, reduce the heap and add explicit metaspace and direct-memory constraints.
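That sizing might look like the following pod sketch (the image, names, and flag values are illustrative assumptions, not from the original scenario):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-app                  # hypothetical name
spec:
  containers:
  - name: app
    image: example/java-app:1.0   # placeholder image
    env:
    - name: JAVA_TOOL_OPTIONS     # picked up by the JVM at startup
      value: >-
        -Xmx2g
        -XX:MaxMetaspaceSize=512m
        -XX:MaxDirectMemorySize=256m
        -XX:NativeMemoryTracking=summary
    resources:
      requests:
        memory: 3Gi
      limits:
        memory: 3Gi               # heap + metaspace + direct buffers + stacks + headroom
```

With requests == limits on memory, this also moves the pod toward the Guaranteed end of the QoS spectrum.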

Follow-up: You increase the limit to 4GB and the app stops getting OOMKilled. But now it's consuming the full 4GB and staying there. Is this normal?

You set requests={cpu: 100m, memory: 512Mi} and limits={cpu: 1000m, memory: 1Gi} on a pod. Kubernetes schedules it. But during a traffic spike, the pod uses 1.2GB and gets OOMKilled anyway. The node had plenty of free memory. Why?

Memory limits are hard boundaries enforced by cgroups. When your app tries to allocate beyond 1Gi, the kernel's OOM killer terminates the process immediately—it doesn't matter if the node has 100GB free. Limits are enforced per container by its own cgroup, not against node-wide capacity.

Your pod's actual usage exceeded the limit. Verify the limit was actually 1Gi: kubectl get pod -o yaml | grep -A5 'limits'. If it shows 1Gi, then your app tried to use 1.2GB, hit the hard limit, and crashed.

This is different from requests, which influence scheduling only. Requests = "I need at least this much"; limits = "I can never have more than this." If you set request=512Mi, the scheduler might place the pod on a node with only 600MB to spare. At runtime the pod can still try to use up to its 1Gi limit, but if the node can't provide it, kubelet eviction or OOM kills follow.

The fix: either increase the limit to what the app actually needs (measure peak usage with kubectl top pod --containers over time), or fix the app to not use so much memory under load (cache tuning, connection pool limits, batch size reductions).

Alternatively, reconsider QoS. If this pod is critical, ensure it's Guaranteed class (requests == limits) so it's not evicted when the node has pressure. If it's Burstable, it can be evicted, which is acceptable for batch workloads.
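For reference, a resources block that yields Guaranteed QoS (the CPU values here are illustrative; Guaranteed requires requests == limits for every resource on every container):

```yaml
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "1"        # must equal the request on every resource...
    memory: 1Gi     # ...in every container for Guaranteed QoS
```

Any mismatch—even omitting a CPU request while setting a CPU limit—drops the pod to Burstable.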

Follow-up: The node has 50GB free memory. The pod is still OOMKilled at 1GB limit even though there's no other pods contending. Why is the limit being enforced so strictly?

You set identical requests and limits on two pods: both have memory={requests: 2Gi, limits: 2Gi}. Pod A survives node memory pressure; Pod B gets evicted. Why the difference?

Both pods have the same QoS class: Guaranteed (requests == limits on all resources). Guaranteed pods are the last candidates for kubelet eviction, so if one was evicted and one wasn't, something else must differ.

First, verify they both have Guaranteed class: kubectl get pod <name> -o jsonpath='{.status.qosClass}'. If both say Guaranteed but one was evicted, check the eviction order. When the node is under memory pressure, Kubernetes evicts pods by priority. Check if Pod B has lower priority: kubectl get pod -o yaml | grep -A2 'priorityClassName'. If Pod A has a higher priority class, it survives and Pod B is evicted.

Even with identical QoS and priority, look at actual memory usage at eviction time. The kubelet's final tiebreaker is usage relative to requests, so the pod consuming more of its 2Gi request (Pod B at 1.8GB vs Pod A at 1.5GB) ranks ahead in the eviction order.

Verify the node was actually under memory pressure: kubectl describe node <node> | grep MemoryPressure. If MemoryPressure=True, the kubelet started evicting pods. The kubelet ranks candidates by whether usage exceeds requests, then by pod priority, then by how far usage exceeds requests—which in practice produces the familiar BestEffort > Burstable > Guaranteed order. Since Guaranteed pods by definition never exceed their memory requests, they are only reclaimed when no lower-class candidates remain, or when system daemons overrun their reservations.

Another factor: a different resource. Pod age is sometimes suspected, but the kubelet does not use age in its eviction ranking. Check instead whether the evicted pod was exceeding a resource other than memory—heavy writes to emptyDir or the writable layer can trigger eviction under disk pressure regardless of memory QoS. The eviction message in kubectl describe pod <name> names the resource that triggered it.
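The kubelet's memory-eviction ranking can be sketched as follows (a simplified model for intuition, not the actual kubelet source; the pod values are made up):

```python
# Simplified model of the kubelet's memory-eviction ordering:
# 1) pods whose usage exceeds their memory request come first,
# 2) then lower-priority pods before higher-priority ones,
# 3) then larger usage-above-request before smaller.
def eviction_order(pods):
    return sorted(
        pods,
        key=lambda p: (
            p["usage"] <= p["request"],    # False (exceeding request) sorts first
            p["priority"],                 # lower priority evicted earlier
            -(p["usage"] - p["request"]),  # most over-request evicted earlier
        ),
    )

pods = [  # usage/request in Mi; all hypothetical
    {"name": "besteffort", "request": 0,    "usage": 800,  "priority": 0},
    {"name": "burstable",  "request": 512,  "usage": 1500, "priority": 0},
    {"name": "guaranteed", "request": 2048, "usage": 1800, "priority": 0},
]
# burstable is 988Mi over its request, besteffort 800Mi over,
# guaranteed never exceeds its request, so it ranks last.
print([p["name"] for p in eviction_order(pods)])  # ['burstable', 'besteffort', 'guaranteed']
```

Note how a Guaranteed pod can never satisfy criterion (1), which is what makes the QoS classes fall out of this ranking.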

Follow-up: Both pods have Guaranteed class, same priority, same memory usage. But one gets evicted and the other doesn't. How is the kubelet deciding?

You create three pods: Pod A (requests=1Gi, limits=1Gi), Pod B (requests=0.5Gi, limits=2Gi), Pod C (requests=0, limits=0). Categorize their QoS classes. Under memory pressure, which gets evicted first?

Pod A is Guaranteed (all resources have requests == limits == 1Gi). Pod B is Burstable (requests don't match limits; it has some guaranteed portion 0.5Gi and can burst to 2Gi). Pod C is BestEffort (no requests or limits).

Under memory pressure on the node, the eviction order is BestEffort > Burstable > Guaranteed. So Pod C is evicted first, then Pod B, then Pod A is the last to go.

Within each QoS class, the kubelet then sorts by priority, and then by memory usage relative to requests. For BestEffort pods like Pod C, the request is 0, so any usage at all counts as excess—the BestEffort pod using the most RAM is evicted first.

For Burstable pods (Pod B), the kubelet ranks by how far usage exceeds the request: using 1.5GB against a 0.5Gi request puts Pod B 1GB over, well ahead of any Guaranteed pod, which by definition cannot exceed its request. Separately, the kubelet sets each container's oom_score_adj so that if the kernel itself runs out of memory before the kubelet reacts, the OOM killer prefers the same order: Guaranteed containers get -997, BestEffort gets 1000, and Burstable gets a value scaled by request size.
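The oom_score_adj assignment can be sketched like this (an approximation of the kubelet's QoS policy; the function name and clamp details are mine):

```python
def oom_score_adj(qos_class, memory_request_bytes=0, node_capacity_bytes=1):
    """Approximation of the kubelet's per-container oom_score_adj policy."""
    if qos_class == "Guaranteed":
        return -997          # nearly immune to the kernel OOM killer
    if qos_class == "BestEffort":
        return 1000          # first choice for the kernel OOM killer
    # Burstable: the larger the request relative to node capacity, the
    # lower (safer) the score, clamped to stay strictly between the
    # Guaranteed and BestEffort values.
    adj = 1000 - int(1000 * memory_request_bytes / node_capacity_bytes)
    return min(max(adj, 3), 999)

GI = 1024 ** 3
# A Burstable pod requesting 8Gi on a 16Gi node sits in the middle:
print(oom_score_adj("Burstable", 8 * GI, 16 * GI))  # 500
```

Higher scores make a process more likely to be picked by the kernel OOM killer, which is why a Burstable pod with a tiny request is almost as exposed as BestEffort.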

In practice: Pod C goes first because BestEffort. Pod B goes next because Burstable. Pod A survives because Guaranteed and has explicit resource guarantees.

Follow-up: You want to protect Pod C from eviction. Can you achieve this by changing its limits?

You have a namespace with a ResourceQuota: requests.memory=50Gi, limits.memory=60Gi. You've deployed 5 pods with requests=8Gi, limits=12Gi each. The 6th pod fails to create even though the requests math says there's room. Why?

Check the actual consumption: kubectl describe resourcequota -n <namespace> shows Used vs Hard. The quota is cumulative across all pods. 5 pods * 8Gi requests = 40Gi used. The 6th pod needs 8Gi, bringing requests to 48Gi—still under the 50Gi hard cap, so on the requests side it fits.

But check the limits side: 5 pods * 12Gi = 60Gi used, which already equals the 60Gi hard cap for limits.memory. The 6th pod's 12Gi would take it to 72Gi. Even though the requests quota has room, the limits quota is exhausted, so the pod is rejected.

Kubernetes enforces the requests and limits quotas independently. A pod is admitted only if both its requests and its limits fit within the respective quotas.

Note that quota violations are rejected at admission, so the pod object is never created—you won't see it Pending. If the pod comes from a Deployment, look at the owning ReplicaSet's events: kubectl describe rs <name> or kubectl get events -n <namespace>. You should see a "forbidden: exceeded quota" message naming limits.memory.

Solutions: (1) raise the limits.memory quota, (2) reduce the pod's limits (if possible), (3) use a LimitRange to cap or default per-container limits lower, (4) delete or scale down existing pods to free quota.
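For reference, a ResourceQuota with separate requests and limits caps looks like this (names and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-quota           # hypothetical name
  namespace: team-a         # hypothetical namespace
spec:
  hard:
    requests.memory: 50Gi   # sum of all pods' memory requests
    limits.memory: 100Gi    # sum of all pods' memory limits (checked independently)
```

Both caps are evaluated at admission time for every pod created in the namespace.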

Follow-up: You increase the limits quota to 200Gi. The pod still doesn't schedule. What else could be blocking it?

Your cluster has a LimitRange set to max memory=2Gi. You try to deploy a pod with memory.limit=3Gi. The pod is rejected before scheduling. What's the error?

LimitRange enforces per-pod/per-container constraints on requests and limits. If max=2Gi, any pod trying to use more than 2Gi per container is rejected at admission time, before even hitting the scheduler.

The rejection happens at pod creation: kubectl apply -f pod.yaml returns a forbidden error along the lines of "maximum memory usage per Container is 2Gi, but limit is 3Gi." The pod object never exists in the cluster.

Verify the LimitRange: kubectl get limitrange -n <namespace> and kubectl describe limitrange <name>. You'll see the max=2Gi constraint. The pod spec must respect this.

Options to fix: (1) reduce your pod's limit to ≤2Gi (setting request == limit at 2Gi also gives you Guaranteed QoS), (2) if you genuinely need more memory, ask the cluster admin to raise the LimitRange max, (3) if it's a temporary need, deploy to a different namespace with looser LimitRange constraints (if one exists).

LimitRange also sets defaults. If you don't specify a limit, it defaults to the LimitRange's default, which might be too low. Specify explicit limits to override defaults.
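A LimitRange of that shape might look like this (the name and default values are illustrative assumptions):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range    # hypothetical name
spec:
  limits:
  - type: Container
    max:
      memory: 2Gi          # the cap that rejects a 3Gi limit at admission
    default:
      memory: 1Gi          # applied when a container omits its limit
    defaultRequest:
      memory: 512Mi        # applied when a container omits its request
```

The default and defaultRequest fields are what silently fill in values on pods that don't specify any, which is why explicit limits in your manifests are safer.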

Follow-up: The LimitRange max is 2Gi, but you have a pod running with 3Gi limit that was deployed yesterday. How?

You set CPU requests and limits: requests=500m, limits=2000m. The pod runs for 5 minutes and Prometheus shows it averaged only 100m. Yet the scheduler treats that node's capacity as spoken for, and other pods won't fit on it. Why is the pod "hogging" resources it isn't using?

CPU requests and limits work differently than memory. CPU requests reserve scheduling capacity—the scheduler assumes this pod will use 500m, so it subtracts 500m from the node's available CPU when placing other pods. Even if the pod only uses 100m at runtime, the reservation is still there.

Memory requests/limits govern physical capacity; CPU requests/limits govern scheduling and time-sharing. CPU is compressible—if a pod requests 500m but uses 100m, the spare 400m of cycles aren't wasted at runtime: other pods on the node can burst into them (up to their own limits). What's consumed is schedulable capacity—the scheduler still subtracts the full 500m from the node's allocatable CPU when placing other pods, so fewer pods fit even though the cycles are mostly idle.

Limits (2000m) are enforced, but by throttling rather than killing: the kernel's CFS bandwidth controller gives the container a runtime quota per scheduling period (by default, limit-in-cores × 100ms), and once the quota is spent the container is throttled until the next period. This holds on both cgroup v1 and v2—CPU limits are not advisory. Exceeding a CPU limit never OOM-kills a pod; it just makes it slower. Note also that the scheduler ignores limits entirely and places pods based on requests alone, which is why total limits on a node can exceed its capacity.
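The millicores-to-cgroup translation can be sketched as follows (a simplified model of the cgroup-v1 mapping; function names are illustrative, not kubelet API):

```python
# Simplified model of how millicores map onto cgroup-v1 CPU controls:
CFS_PERIOD_US = 100_000   # default CFS scheduling period: 100ms
SHARES_PER_CPU = 1024     # cgroup cpu.shares weight for one full core

def cpu_shares(request_millicores):
    """Relative weight under contention, derived from the CPU request."""
    return request_millicores * SHARES_PER_CPU // 1000

def cfs_quota_us(limit_millicores):
    """Hard runtime budget per period, derived from the CPU limit."""
    return limit_millicores * CFS_PERIOD_US // 1000

# requests=500m, limits=2000m from the scenario:
print(cpu_shares(500))     # 512 -> half a core's weight when contended
print(cfs_quota_us(2000))  # 200000 -> 200ms of CPU time per 100ms period (2 cores)
```

The shares value only matters when cores are contended; the quota bites unconditionally, which is the throttling described above.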

If 10 pods are on an 8-core (8000m) node, each with 1000m limit, total limits = 10,000m > 8000m. Kubernetes allows this overcommit because CPU is burstable. The pods share CPU time; they can't all use 100% simultaneously but they can spike and burst.

Memory is not burstable. If 10 pods each have 1GB memory limit on a node with 8GB, the moment they all try to use 1GB, OOMKills happen. That's why memory is handled conservatively—requests must fit on the node capacity.

Follow-up: Your node has 8 cores and 10 pods with 1000m CPU request each (10,000m total). All 10 pods spike to 1000m usage simultaneously. What happens?

You're running a mission-critical database pod. You set requests=limits=16Gi memory to ensure Guaranteed QoS and zero eviction risk. The pod gets evicted anyway during a node disruption. Why didn't the guarantee work?

Guaranteed QoS protects from kubelet eviction (memory pressure, disk pressure), but not from node controller evictions (node failure, node termination). If the node is drained or fails, even Guaranteed pods are terminated. Guarantees mean "won't be evicted for resource pressure" not "won't be evicted ever."

Check what triggered the eviction: kubectl describe pod <name> to see the deletion reason. Look for "node.kubernetes.io/not-ready" or "node.kubernetes.io/unreachable" or "node-drain" events. These are node-level disruptions, not kubelet eviction.

Node drains happen during cluster upgrades, node maintenance, or explicit drain commands: kubectl drain <node-name>. Drains go through the eviction API, which respects PodDisruptionBudgets (PDBs): if evicting the pod would violate the budget—that is, the PDB currently allows 0 disruptions—the eviction is refused and the drain blocks and retries. If there's no PDB at all, the pod has no such protection; and a drain run with --disable-eviction deletes pods directly, bypassing PDB checks entirely.

To protect critical workloads: (1) run multiple replicas and set a PodDisruptionBudget (e.g. minAvailable=1) so a drain can't take them all down at once, (2) use a high priorityClass (or system-cluster-critical / system-node-critical for system components), which protects against scheduler preemption and ranks the pod later in kubelet eviction—though priority does not stop a drain, (3) spread replicas across nodes (podAntiAffinity or topology spread constraints) so one node disruption can't take down the service.

Guaranteed QoS + PDB + high priority + replication = protection from both resource eviction and node disruption.
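A PodDisruptionBudget for this setup might look like the following (the name and label selector are illustrative assumptions):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb             # hypothetical name
spec:
  minAvailable: 1          # a voluntary eviction is refused if it would
                           # drop the number of ready replicas below 1
  selector:
    matchLabels:
      app: critical-db     # assumed pod label
```

With a single replica this PDB makes drains block indefinitely, which is why PDBs only make sense alongside replication.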

Follow-up: You set a PDB with minAvailable=1. The pod still gets evicted during a drain and there are other replicas. Why didn't the PDB protect it?
