Kubernetes Interview Questions

HPA with Custom and External Metrics


Your HPA is configured with targetAverageUtilization=50%. Every 2 minutes, replicas scale from 3 to 30 and back. The application is handling the traffic fine at 3 replicas. Why is it thrashing?

Flapping means the replica count oscillates wildly, usually because the metrics are noisy or the HPA is responding to temporary spikes. Check the HPA status: kubectl get hpa -w to watch it in real-time. Then check the underlying metric: kubectl get hpa -o yaml to see what metric it's tracking (CPU, memory, custom).

For CPU-based scaling, the issue is often measurement variance. CPU usage spikes during traffic, HPA scales up, CPU drops, HPA scales down, traffic returns, repeat. This is called "thrashing." The fix is a stabilization window: set behavior.scaleDown.stabilizationWindowSeconds (and, if needed, behavior.scaleUp.stabilizationWindowSeconds) to 300-600 seconds. Within that window, HPA uses the most conservative of its recent recommendations instead of reacting to every sample, which smooths out temporary fluctuations. Note the defaults: scale-down already uses 300 seconds, scale-up uses 0.

Also check the scaling policy. By default, HPA can double the pod count on each scale-up. If you jump from 3 to 6 to 12 to 24 to 30 in quick succession, the new replicas may still be warming up, so CPU looks high even though existing capacity is sufficient. Add an entry under behavior.scaleUp.policies with type: Pods and a small value (e.g., 5 pods per periodSeconds) to cap each scale-up at a fixed number instead of doubling.
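A minimal sketch of such an HPA, using autoscaling/v2 field names (the Deployment name and the exact numbers are illustrative, not prescriptive):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # illustrative target
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 300   # smooth scale-up over 5 min
      policies:
      - type: Pods
        value: 5                        # add at most 5 pods...
        periodSeconds: 60               # ...per minute, instead of doubling
    scaleDown:
      stabilizationWindowSeconds: 600   # wait out dips before removing pods
```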

Finally, check if 50% target is realistic. If most of your traffic is bursty (sudden spikes), 50% target means you need room to absorb spikes without hitting 100%. Try 70-80% if traffic is somewhat steady, or use custom metrics instead of CPU if you want more predictable scaling.

Follow-up: You add stabilization window of 600 seconds. The flapping still happens every 2 minutes, completely ignoring the stabilization setting. What's wrong?

You configure HPA to scale based on a custom metric "http_requests_per_second" from Prometheus. The metric exists in Prometheus, but HPA stays at 1 replica and says "unable to get metric." Why?

Custom metrics require a custom metrics adapter (Prometheus Adapter, Stackdriver Adapter, etc.) running in the cluster. HPA can't directly query Prometheus; it needs an adapter that implements the custom metrics API.

Check if the adapter is installed: kubectl get deployment -A | grep adapter or kubectl api-resources | grep custom.metrics. If no custom.metrics.k8s.io API exists, the adapter isn't running.

If the adapter is running, verify it's configured to expose your metric. The adapter needs a ConfigMap that maps Prometheus queries to K8s metrics. Check: kubectl get configmap -n custom-metrics -o yaml (or similar namespace). Look for a rule that translates your Prometheus metric name (http_requests_per_second) to a custom metric name (requests.per.second or similar).
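A sketch of one such rule, assuming the standard Prometheus Adapter rules format and that the application exports a counter named http_requests_total (both the series name and the rate window are illustrative):

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"   # exposes http_requests_per_second
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```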

Then verify the metric is actually available: kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/<namespace>/pods/*/http_requests_per_second. If this returns data, the adapter is working. If it returns "not found" or an error, either the adapter isn't exposing the metric or the metric name is wrong.

Check the HPA resource: kubectl get hpa -o yaml. Verify the metric name matches what the adapter exposes. Then check the adapter logs: kubectl logs -n custom-metrics deployment/prometheus-adapter to see if there are errors.

Common issues: (1) adapter not configured, (2) metric name mismatch, (3) Prometheus not scraping the application, (4) metric absent or null in Prometheus (no data yet).
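For reference, the HPA side would reference the adapter-exposed metric roughly like this (autoscaling/v2 syntax; the target value is illustrative):

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"   # scale so each pod handles ~100 req/s
```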

Follow-up: The adapter is installed and the metric is available. The HPA status still says "unable to compute replica count." What now?

You configure HPA with both CPU and custom metrics. The CPU metric shows 50%, custom metric shows 200% target utilization. The HPA scales based on one but ignores the other. Which does it choose?

HPA with multiple metrics uses the one that requires the highest replica count. It calculates replicas for each metric independently, then uses the max. So if CPU requires 5 replicas and custom metric requires 20 replicas, HPA scales to 20.

Verify this by checking the HPA status: kubectl get hpa -o yaml and look at the status.metrics field. It shows the computed replica count for each metric. Find the one with the highest value—that's the one driving the scaling decision.

If you want to scale based on BOTH metrics (requiring all thresholds to be met), you can't with standard HPA. You'd need a custom controller or an advanced tool like KEDA. Standard HPA is "OR" logic (scale if ANY metric is high); it's not "AND" logic.

If one metric is consistently ignored, check if that metric is unavailable or erroring. HPA skips metrics that can't be computed. So if the custom metric is always "not found," only CPU is used. Check the adapter logs and verify the metric is actually being scraped.

Another issue: target values might be on different scales. If the CPU target means 50% of the pod's request, and the custom metric target is an absolute 1000 requests/sec, they're not directly comparable—and they don't need to be. HPA computes a desired replica count per metric and uses whichever requires more scaling.
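A sketch of a two-metric HPA (values illustrative); HPA computes a desired replica count for each entry independently and scales to the maximum:

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"
```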

Follow-up: The CPU metric shows 30% (no scaling needed) and custom metric shows 80% (scaling needed). Yet HPA doesn't scale. Why?

You use KEDA to scale a deployment based on Kafka lag. The lag metric shows 50,000 messages behind, but KEDA scales to only 1 replica. The scaler is configured correctly. What's happening?

KEDA uses scalers that connect to external systems (Kafka, AWS SQS, Redis, etc.). Each scaler defines target metrics and how many replicas are needed for a given metric value.

Check the ScaledObject (or ScaledJob) configuration: kubectl get scaledobject -o yaml. Look at the triggers section and the specific Kafka scaler settings. The key parameter is lagThreshold—this defines how many messages each pod can handle.

If lagThreshold=50000 and you have 50,000 messages of lag, KEDA calculates: 50000 / 50000 = 1 replica. That's correct based on the threshold. If you want more replicas, either increase the lag or decrease the lagThreshold (i.e., each pod handles fewer messages, so you need more pods).
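A sketch of the relevant ScaledObject fields, assuming KEDA's Kafka scaler (the broker address, topic, consumer group, and deployment name are all illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaler
spec:
  scaleTargetRef:
    name: consumer           # deployment to scale (illustrative)
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092
      consumerGroup: my-group
      topic: orders
      lagThreshold: "5000"   # 50,000 lag / 5,000 per pod => 10 replicas
```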

Verify the lag value is being read correctly: kubectl logs deployment/keda-operator to see if the scaler is connecting to Kafka successfully. Check for errors like "unable to read lag" or "broker unavailable."

Also check the minReplicaCount and maxReplicaCount: kubectl get scaledobject -o yaml | grep -A5 'min\|max'. If minReplicaCount=1, the scaler won't go below 1 even if lag is 0. If maxReplicaCount is too low, it caps the scaling.

As a test, manually increase lagThreshold to a very small number to force higher scaling. If replicas spike, the scaler works and your threshold was just too high.

Follow-up: You decrease lagThreshold and replicas scale up correctly. But when lag goes to 0, replicas don't scale down. Why?

Your metrics server is reporting CPU metrics for some pods but not others. HPA on pods without metrics shows "missing: 1m cpu". What's the issue?

Metrics server takes time to collect and aggregate data. New pods might not have metrics for 1-2 minutes. But if pods have been running for hours and still show "missing: 1m cpu," there's a real issue.

First, verify metrics-server is running: kubectl get deployment metrics-server -n kube-system. If it's not there, install it: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml (adjust URL as needed).

If metrics-server is running, check its logs for errors: kubectl logs deployment/metrics-server -n kube-system | tail -20. Look for scrape errors, connection issues, or permission denials.

Check if the pods are exposing metrics on the kubelet ports. Metrics-server queries the kubelet on each node. If a pod's node has firewall rules blocking kubelet access, metrics won't be collected. Check: kubectl describe node <node> for any conditions or errors related to metrics.

Manually query metrics to see what's available: kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods | jq . (or pipe through python -m json.tool if jq isn't installed). This shows all pods with metrics. If your problematic pods are absent, metrics-server isn't collecting for them.

Check if those pods have resource requests defined. metrics-server collects usage regardless of requests, but HPA needs requests to turn raw usage into a utilization percentage, so missing requests break utilization-based scaling even when metrics exist. Verify: kubectl get pod <pod> -o yaml | grep -A10 'resources'.
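For reference, per-container requests look like this (values illustrative); HPA's utilization targets divide observed usage by these request values:

```yaml
containers:
- name: app
  resources:
    requests:
      cpu: 500m        # utilization % is measured against this
      memory: 256Mi
    limits:
      cpu: "1"
      memory: 512Mi
```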

If metrics are missing for only specific pods (e.g., on a specific node), that node might have networking issues or a misbehaving kubelet. Restart the kubelet on that node: ssh <node> 'sudo systemctl restart kubelet'.

Follow-up: Metrics-server is running and logs look clean. Pods have resource requests. Yet kubectl top pod shows metrics for some pods and "unknown" for others. All pods are identical.

Your HPA is configured with averageValue=1000 for a custom metric. The metric is summed across all pods (total rate). But HPA is calculating replicas based on the sum, not the average. Why?

Check the metric type. There are four: Resource (CPU/memory), Pods (per-pod custom metrics), Object (a single value read from one cluster object, such as an Ingress), and External (metrics from outside the cluster). For a per-pod average, use type: Pods. For a cluster-wide sum, use type: Object or type: External.

If the metric is summed across the cluster (e.g., total HTTP requests to the service), and you specify averageValue, HPA will still use the sum divided by replica count. Example: total_requests = 10,000, current_replicas = 2, average_per_pod = 5,000. If averageValue target = 1,000, HPA calculates desired_replicas = 10,000 / 1,000 = 10.

To confirm: kubectl get hpa -o yaml and look at the metrics section. If it says type: Pods and pods.metric.selector references a single-instance metric, it's per-pod. If it says type: Object, it's cluster-wide.
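The two forms look like this side by side (metric and object names illustrative); Pods averages a metric across pods, Object reads one value from a named object:

```yaml
metrics:
- type: Pods                       # per-pod average
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"
- type: Object                     # one value read from a cluster object
  object:
    metric:
      name: requests_per_second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-ingress
    target:
      type: Value
      value: "10000"
```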

If the metric is truly summed (e.g., prometheus.io:http_requests_total), but you want to scale based on individual pod load, you need to transform the metric in the adapter. The Prometheus Adapter can divide by pod count to get per-pod average. Check the adapter ConfigMap: kubectl get configmap -n custom-metrics -o yaml | grep -A20 'rules' to see if the metric is being aggregated.

If the aggregation is wrong, you need to fix the Prometheus Adapter rules. Or, expose a pre-aggregated metric from your application (e.g., requests_per_pod), and use that instead.

Follow-up: You fix the metric type to Pods. HPA now shows the correct replica calculation, but it still scales based on the Object (sum) value. How is this possible?

Your HPA's targetCPUUtilizationPercentage is 70%, and the deployment has resource requests set. But HPA scales based on actual CPU (from top), not the request-relative percentage. Why?

CPU utilization is calculated as: (actual_cpu_usage / cpu_request) * 100. If the target is 70% and actual usage is 700m with 1000m request, utilization = 70%. If 1400m usage with 1000m request, utilization = 140%, which exceeds the target, so HPA scales up.

But if resource requests are not defined on the pod, HPA can't compute a utilization percentage at all: it reports an error such as "missing request for cpu" and skips the metric. If the HPA instead specifies an absolute target (averageValue in milliCPU), it compares raw usage to that value, with no request involved. So if you see scaling driven by absolute values (e.g., scaling when 2 cores are used, regardless of request size), an absolute target is the likely cause.

Verify resource requests are defined: kubectl get pod -o yaml | grep -A10 'resources:'. Check if both requests and limits are set. If only limits are set, HPA uses the limit as the request (for CPU calculations).

If requests are defined but HPA is still using absolute values, check the HPA spec: kubectl get hpa -o yaml. If it has targetCPUUtilizationPercentage: 70, it should use percentage-based scaling. But if there's also a metrics section with type: Resource and targetAverageValue (instead of targetAverageUtilization), it's using absolute values.

The fix: use a Utilization target (averageUtilization, a percentage) rather than an AverageValue target (absolute milliCPU). A single Resource metric entry takes one or the other, never both; if the HPA lists multiple metric entries, it scales to whichever requires the most replicas.
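The two target forms, for contrast (autoscaling/v2 syntax; numbers illustrative):

```yaml
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70   # percent of the pod's CPU request
# vs. the absolute form, which ignores requests entirely:
# - type: Resource
#   resource:
#     name: cpu
#     target:
#       type: AverageValue
#       averageValue: 500m     # raw milliCPU per pod
```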

Follow-up: The HPA clearly shows targetCPUUtilizationPercentage: 70 and resource requests are set. But monitoring shows HPA makes scaling decisions based on absolute CPU (e.g., always scales at 500m regardless of request). What's happening?

You set HPA with minReplicas: 10, maxReplicas: 100. Your deployment is scaled to 50 replicas. The metric shows high usage (should scale to 80 replicas), but HPA doesn't scale up. It's been 10 minutes. What's blocking it?

Check for scaling policies that limit the scale-up rate. Newer K8s versions support behavior.scaleUp.policies, each with a type (Pods or Percent), a value, and a periodSeconds. If a policy allows only 5 pods per 300-second period, HPA adds 5 pods every 5 minutes. From 50 to 80 is 30 pods; at 5 pods per period, that takes 6 periods = 30 minutes.

Check the HPA: kubectl get hpa -o yaml | grep -A15 'behavior'. Look at the scaleUp.policies entries (and selectPolicy, which picks between them). If the allowed value and periodSeconds are restrictive, that explains slow scaling.
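A sketch of a rate-limited scaleUp stanza; with these illustrative numbers, growing from 50 to 80 replicas takes roughly six periods:

```yaml
behavior:
  scaleUp:
    selectPolicy: Max      # when multiple policies match, use the most permissive
    policies:
    - type: Pods
      value: 5             # at most 5 new pods...
      periodSeconds: 300   # ...every 5 minutes => 50 -> 80 takes ~30 min
    - type: Percent
      value: 10            # or 10% growth per period, whichever allows more
```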

Alternatively, check for an up-scale delay: behavior.scaleUp.stabilizationWindowSeconds (or the legacy --horizontal-pod-autoscaler-upscale-delay controller flag on older clusters) holds off scale-up until the recommendation has been stable for the window. Note that the downscale stabilization window only delays scale-down, not scale-up.

Also verify the metric is actually high and HPA sees it: kubectl describe hpa shows the current metric and desired replica count. If desired replicas = 50 (same as current), HPA isn't responding to the metric. This means the metric isn't exceeding the threshold, or there's a compute error.

Also check whether the new pods can actually run. A ResourceQuota on the namespace or insufficient cluster capacity leaves the extra pods Pending, so the ready replica count never climbs: kubectl get resourcequota and kubectl get pods | grep Pending. Note that maxUnavailable and maxSurge in the deployment strategy only throttle rolling updates; they don't limit HPA-driven replica changes.

Follow-up: You set maxChange to 0 (no cap) and cooldown to 0. The HPA status shows desired=80. But replicas stay at 50 and don't increase. What's preventing the rollout?
