You're draining a node for maintenance. kubectl drain hangs for exactly 30 minutes, then times out. You see 3 pods stuck in Terminating state. Two are database statefulsets with PDB minAvailable=2. The third is a singleton Job. Walk through debugging and unblocking the drain.
Drain timeout usually means PDB constraints prevent eviction. Debug step by step:
1. Check the stuck pods:
kubectl get pods --all-namespaces -o wide | grep Terminating
# Note: a terminating pod still reports phase=Running, so filter on the STATUS column, not --field-selector
kubectl describe pod stuck-pod-name -n namespace
# Check events for eviction warnings
2. Check PDB constraints:
kubectl get pdb -A
kubectl describe pdb database-pdb -n production
# Key field: "Disruptions allowed: 0" means no more evictions allowed
3. The issue: drain needs to evict a database pod, but PDB says minAvailable=2 and only 2 replicas exist. Evicting any pod violates the constraint.
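The arithmetic behind that deadlock can be sketched with a tiny helper (hypothetical, not a kubectl feature): for a minAvailable PDB, voluntary disruptions allowed = currently healthy pods minus minAvailable, floored at zero.

```shell
# Hypothetical helper illustrating minAvailable PDB math (not a real
# kubectl command): allowed = healthy - minAvailable, never below zero.
disruptions_allowed() {
  healthy=$1
  min_available=$2
  allowed=$(( healthy - min_available ))
  [ "$allowed" -lt 0 ] && allowed=0
  echo "$allowed"
}

disruptions_allowed 2 2   # the stuck drain: 2 healthy, minAvailable=2 -> 0
disruptions_allowed 3 2   # after scaling to 3 replicas -> 1
```

With 2 healthy replicas and minAvailable=2 the budget is 0, which is exactly the "Disruptions allowed: 0" you see in `kubectl describe pdb`.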
4. Solution options:
Option A: Temporarily relax the PDB
kubectl patch pdb database-pdb -n production -p '{"spec":{"minAvailable":1}}'
# Now drain can proceed
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
# After drain completes, restore PDB
kubectl patch pdb database-pdb -n production -p '{"spec":{"minAvailable":2}}'
Option B: Scale up the StatefulSet first, then drain
kubectl scale sts database -n production --replicas=3
# Wait for new replica to be ready
kubectl wait --for=condition=Ready pod -l app=database -n production --timeout=300s
# Now drain won't violate PDB (can evict 1 of 3)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
Option C: For the singleton Job, kill it if it's non-critical
kubectl delete job singleton-job -n production --grace-period=30
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
5. Verify drain completes:
kubectl get nodes node-1
# Should show Ready,SchedulingDisabled after drain (the node is cordoned, its pods evicted)
Best Practice: Define PDB for all critical workloads. Always check "Disruptions allowed" before maintenance.
Follow-up: Design a pre-drain check script that identifies PDB violations before running kubectl drain and fails fast instead of hanging for 30 minutes.
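One way to start on that follow-up: a small filter over `kubectl get pdb -A -o json` that prints every PDB whose budget is already exhausted. The function name and wiring are illustrative, not a standard tool.

```shell
# Sketch of a pre-drain check: read PDB JSON on stdin and print any PDB
# that currently allows zero disruptions (these are what make drain hang).
blocked_pdbs() {
  jq -r '.items[]
         | select(.status.disruptionsAllowed == 0)
         | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Intended usage against a live cluster:
#   if [ -n "$(kubectl get pdb -A -o json | blocked_pdbs)" ]; then
#     echo "aborting: exhausted PDBs found" >&2; exit 1
#   fi
#   kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
```

A fuller version would also check that the exhausted PDBs actually cover pods on the target node, but even this coarse check fails in seconds instead of hanging for 30 minutes.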
Your team defined a PDB for a critical microservice: "minAvailable: 1" across 3 replicas. In production, you have 2 replicas actually running (one node's pod failed to schedule due to resource constraints). During a node drain, the PDB allows 1 disruption, but that would drop you to 1 replica, which breaks your SLA. How should the PDB have been defined?
This is a gap between PDB definition and reality. PDB enforces constraints, but assumes your target state is already running:
1. Check current vs. desired state:
kubectl get deployment critical-service -o json | jq '.spec.replicas, .status.replicas'
# If desired=3 but replicas running=2, you have an issue
2. The PDB definition was wrong because it didn't account for the real constraint: "minimum 2 replicas must be healthy." But with minAvailable=1, drain can evict down to 1.
3. Correct approach: Use maxUnavailable instead of minAvailable for clearer intent:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb
spec:
  maxUnavailable: 1  # Never evict more than 1 pod at a time
  selector:
    matchLabels:
      app: critical-service
With maxUnavailable, disruptions allowed = healthy pods - (desired - maxUnavailable), so the budget is computed against the desired replica count. With 3 healthy replicas that is 3 - 2 = 1 eviction allowed; with only 2 healthy (your scenario) it is 2 - 2 = 0, so drain is blocked until the missing replica recovers. This is exactly the protection minAvailable=1 failed to provide.
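The difference can be checked numerically with a hypothetical helper mirroring the eviction API's budget calculation for maxUnavailable:

```shell
# Hypothetical helper: disruptions allowed under a maxUnavailable PDB is
# healthy - (desired - maxUnavailable), floored at zero. Missing replicas
# consume the budget because it is computed against the desired count.
allowed_max_unavailable() {
  healthy=$1
  desired=$2
  max_unavailable=$3
  allowed=$(( healthy - (desired - max_unavailable) ))
  [ "$allowed" -lt 0 ] && allowed=0
  echo "$allowed"
}

allowed_max_unavailable 3 3 1   # all 3 healthy -> 1 eviction allowed
allowed_max_unavailable 2 3 1   # one replica missing -> 0, drain blocked
```

Compare with minAvailable=1: with 2 healthy, 2 - 1 = 1 eviction would still be allowed, dropping you to a single replica.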
4. Better approach: Use percentage to scale dynamically:
spec:
  maxUnavailable: "33%"  # Rounded up: never evict more than 1 of 3 pods
  selector:
    matchLabels:
      app: critical-service
5. Add health checks to ensure your desired state matches actual state. Fix the resource constraints that prevented the 3rd pod from scheduling:
kubectl get nodes -o json | jq '.items[] | {name:.metadata.name, allocatable:.status.allocatable.memory}'
# Add node capacity or reduce pod memory requests
6. Once all 3 pods are running, your PDB is now safe:
kubectl get pdb -A -o json | jq '.items[] | {name:.metadata.name, "disruptions allowed":.status.disruptionsAllowed}'
# Should show "disruptions allowed: 1" with 3 replicas and maxUnavailable=1
Key Lesson: PDB is only safe if your desired replica count is actually running. Before maintenance, verify all replicas are healthy.
Follow-up: How would you add a pre-drain validation webhook that prevents drain if any Deployment has fewer running replicas than desired?
You're rolling out a new version of your application. During the rollout, 10 pods are updating. You start draining a node where one of the updating pods lives. The PDB says minAvailable=2, and you have 3 total. But the pod you're trying to evict is from the old ReplicaSet (v1), and the new ReplicaSet (v2) already has 2 pods running. Should drain allow this? What's the PDB behavior?
This is subtle: PDB is label-based, not version-aware. It doesn't understand ReplicaSets or rolling updates:
1. When you deploy a new version with rolling update strategy, both old and new ReplicaSets exist temporarily:
kubectl get rs -l app=myapp
# Mid-rollout output (DESIRED CURRENT READY):
# app-deployment-abc123   1   1   1   (old, v1, scaling down)
# app-deployment-def456   2   2   2   (new, v2)
2. But the PDB is tied to the label selector (e.g., app=myapp), which matches BOTH old and new pods:
kubectl get pdb app-pdb -o json | jq '.spec.selector.matchLabels'
# {"app": "myapp"}
3. PDB counts: You have 3 total pods matching the label (2 from v2 + 1 from v1). minAvailable=2, so disruptions allowed=1.
4. Drain tries to evict the v1 pod. PDB allows it (1 disruption allowed). The pod is evicted and dies. Now you have 2 pods (both v2), which still meets minAvailable=2.
5. However, if the drain happens BEFORE v2 replicas are fully ready, you could violate SLA:
# Bad scenario:
# ReplicaSet v1: 2 running
# ReplicaSet v2: 1 ready, 1 pending
# Total: 3 pods matching PDB label
# minAvailable=2, so 1 disruption allowed
# Drain evicts a v1 pod, leaving 1 v1 + 1 v2 ready = 2 total (meets PDB)
# But functionally you have only 2 of 3 desired replicas
Solution: Use multiple PDBs for rolling updates or a stricter PDB
Option A: Update PDB during rolling deployment:
kubectl patch pdb app-pdb -p '{"spec":{"minAvailable":3}}' # No disruptions during rollout
# After rollout completes, restore
kubectl patch pdb app-pdb -p '{"spec":{"minAvailable":2}}'
Option B: Use maxUnavailable with higher value during rollout:
spec:
  maxUnavailable: 0  # During updates, prevent ALL voluntary disruptions
Option C: Schedule drains outside of rollout windows
Best Practice: Coordinate drain operations with your deployment pipeline. Don't drain while a rolling update is in progress.
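That best practice can be enforced mechanically. A sketch that flags Deployments still mid-rollout from `kubectl get deploy -A -o json` (the replica-field handling is a simplification of what `kubectl rollout status` checks):

```shell
# Sketch: print Deployments whose updated/ready replica counts lag the
# desired count, i.e. a rolling update is still in progress.
rollouts_in_progress() {
  jq -r '.items[]
         | select((.status.updatedReplicas // 0) < (.spec.replicas // 1)
                  or (.status.readyReplicas // 0) < (.spec.replicas // 1))
         | "\(.metadata.namespace)/\(.metadata.name)"'
}

# Intended usage: kubectl get deploy -A -o json | rollouts_in_progress
# If it prints anything, postpone the drain.
```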
Follow-up: Design a system that automatically adjusts PDB constraints during rolling updates to prevent accidental SLA violations.
You have a cluster with 100 nodes. You're implementing PDBs for all workloads. You define minAvailable=1 for most services, but a few services don't have a PDB at all. When you drain a node, the ones without a PDB are evicted immediately (no protection). Your manager asks: should EVERY workload have a PDB? What's the right policy?
PDB policy is about criticality, not universal requirement:
1. Categorize your workloads:
kubectl get deploy -A -o json | jq '.items[] | {name:.metadata.name, namespace:.metadata.namespace, "pod count":.status.replicas}'
# Separate into:
# - Critical (databases, APIs, payment systems)
# - Standard (web services, batch jobs)
# - Non-critical (dev tools, experimental services)
2. For CRITICAL workloads: Require PDB with minAvailable or maxUnavailable
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
spec:
  minAvailable: 2  # Always keep 2 replicas during voluntary disruptions
  selector:
    matchLabels:
      tier: critical
3. For STANDARD workloads: Recommend PDB with maxUnavailable=1
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      tier: standard
4. For NON-CRITICAL workloads: PDB optional, focus on graceful shutdown
5. Enforce this with a policy (e.g., OPA Gatekeeper):
# Gatekeeper constraint: all Deployments with tier=critical MUST have PDB
# All Deployments with tier=standard SHOULD have PDB
# Non-critical can skip
6. Alternative: Use Pod Quality of Service (QoS) classes to guide PDB requirements:
# Guaranteed QoS pods (request==limit): Higher priority, require PDB
# Burstable QoS pods: Recommend PDB
# BestEffort QoS pods: No PDB needed, can be evicted freely
7. Validate your policy:
kubectl get pdb -A
# Count PDBs and confirm every critical workload is covered
kubectl get deploy -l tier=critical -A --no-headers | wc -l
# Should have PDB for each
Right Answer: Every CRITICAL workload needs PDB. Standard workloads should have it. Non-critical services don't need it. Your drain operations should never disrupt critical SLAs unexpectedly.
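The coverage check from step 7 can be automated. A hedged sketch, assuming the labeling convention above (critical workloads carry tier=critical) and using a simple subset match on matchLabels rather than full selector semantics:

```shell
# Sketch of a PDB-coverage audit: flag Deployments labeled tier=critical
# that no PDB in the same namespace selects. A PDB "covers" a Deployment
# here if its matchLabels are a subset of the pod template labels.
audit_pdb_coverage() {
  jq -rn --argjson deploys "$1" --argjson pdbs "$2" '
    $deploys.items[]
    | select(.metadata.labels.tier == "critical")
    | . as $d
    | ($d.spec.template.metadata.labels // {}) as $pod
    | select(
        [ $pdbs.items[]
          | select(.metadata.namespace == $d.metadata.namespace)
          | (.spec.selector.matchLabels // {})
          | to_entries
          | all(.value == $pod[.key])
        ] | any | not)
    | "\($d.metadata.namespace)/\($d.metadata.name) missing PDB"'
}

# Intended usage:
#   audit_pdb_coverage "$(kubectl get deploy -A -o json)" \
#                      "$(kubectl get pdb -A -o json)"
```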
Follow-up: Design an audit script that flags critical workloads missing PDB and generates alerts before production incidents occur.
You're draining a node. A StatefulSet pod (stateful-db-0) is stuck in Terminating for 5 minutes with a PDB maxUnavailable=1 on 3 replicas. The pod has a preStop hook that's hanging. The PDB technically allows 1 disruption, but since there's only 1 pod terminating, it's not being evicted—the preStop just hangs indefinitely. How does PDB interact with preStop hooks and forceful termination?
PDB and preStop hooks are orthogonal. PDB controls eviction permission, but doesn't force termination. preStop hangs are separate:
1. PDB check happens BEFORE eviction attempt:
# Drain calls evict API
# Kubernetes checks: "Is this pod covered by a PDB?"
# If yes, checks: "Can I evict this pod without violating PDB?"
# If no (violates PDB), eviction fails and pod stays
# If yes, proceeds with termination sequence
2. In your case: PDB allows disruption (maxUnavailable=1, only 1 pod terminating), so eviction check passes. But then the preStop hook hangs during actual termination:
kubectl get pod stateful-db-0 -n production -o json | jq '.status.conditions[] | select(.type=="Ready")'
# Should show status: "False" because the pod is terminating
kubectl logs stateful-db-0 -n production 2>&1 | grep -i "prestop\|shutdown"
3. The issue is NOT PDB—it's the preStop hook. PDB allowed the eviction, but pod won't actually die.
4. Solution: Force kill the pod after gracePeriodSeconds (default 30s):
# Option A: Let it timeout naturally
# After 30s, kubelet force-kills
# Monitor:
kubectl get pod stateful-db-0 -n production --watch
Option B: Force-delete immediately (use with care: force-deleting a StatefulSet pod can briefly leave two pods claiming the same identity)
kubectl delete pod stateful-db-0 -n production --grace-period=0 --force
Option C: Delete with a longer grace period so preStop has time to finish (terminationGracePeriodSeconds cannot be patched on a running pod)
kubectl delete pod stateful-db-0 -n production --grace-period=120
5. Fix the preStop hook itself so it can never hang indefinitely. There is no per-hook timeout field; a preStop hook is only bounded by terminationGracePeriodSeconds, so wrap the script in an explicit timeout:
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "timeout 20 /graceful-shutdown.sh || true"]
6. Drain will now proceed (PDB allows the eviction, the wrapped preStop exits within 20s, the pod dies):
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --grace-period=30
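The `timeout` wrapper is what converts a hanging shutdown script into a bounded one, and its behavior is easy to verify locally: GNU coreutils `timeout` kills the command and exits with status 124 when the limit is reached.

```shell
# Demonstration: timeout kills a hanging command after 1 second and
# reports exit status 124 (GNU coreutils convention).
status=0
timeout 1 sh -c 'sleep 30' || status=$?
echo "exit=$status"   # prints exit=124
```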
Follow-up: Design a test that validates PDB + preStop behavior for critical workloads before they reach production.
Your cluster has a PDB for a Deployment: minAvailable=2 out of 3 replicas. But your infrastructure team notices that the node running the 3rd replica is about to fail (hardware degradation detected). They want to preemptively drain it. Drain will fail because it would violate the PDB. Your app goes down if the node fails suddenly. How do you handle this scenario?
This is a legitimate conflict: PDB prevents disruption that would cause catastrophic failure. You need a preemptive mitigation:
1. First, assess the real risk:
# Check node condition
kubectl describe node at-risk-node-1
# Look for MemoryPressure, DiskPressure, PIDPressure: True
# If condition is already True, node could fail any moment
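Step 1 can be scripted. A sketch that pulls only the abnormal conditions out of `kubectl get node <name> -o json`:

```shell
# Sketch: print any node condition that signals trouble -- pressure
# conditions that are True, or Ready that is not True.
bad_node_conditions() {
  jq -r '.status.conditions[]
         | select((.type == "Ready" and .status != "True")
                  or (.type != "Ready" and .status == "True"))
         | "\(.type)=\(.status)"'
}

# Intended usage: kubectl get node at-risk-node-1 -o json | bad_node_conditions
```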
2. Short-term fix: Temporarily modify PDB to allow drain:
kubectl patch pdb deployment-pdb -p '{"spec":{"minAvailable":1}}'
# Now drain will proceed
kubectl drain at-risk-node-1 --ignore-daemonsets --delete-emptydir-data
# Wait for pod to reschedule on a healthy node
kubectl wait --for=condition=Ready pod -l app=deployment --timeout=300s
# Restore PDB
kubectl patch pdb deployment-pdb -p '{"spec":{"minAvailable":2}}'
3. Better approach: Taint the node instead of drain (pods can tolerate taint temporarily):
kubectl taint node at-risk-node-1 hardware-degradation=true:NoExecute
# Workloads will be evicted based on their tolerations
# If a pod has tolerationSeconds=300, it gets 5 minutes to reschedule
# If no toleration, it's evicted immediately
4. Configure tolerationSeconds for critical workloads:
spec:
  template:
    spec:
      tolerations:
      - key: hardware-degradation
        operator: Equal
        value: "true"
        effect: NoExecute
        tolerationSeconds: 300  # 5 minutes to reschedule elsewhere
5. At the 5-minute mark the taint manager deletes the pod directly, bypassing the eviction API, so the PDB cannot block it. This forces the workload off the bad node.
6. The original PDB was too strict for a degrading-node scenario. Consider:
# Use maxUnavailable instead
spec:
  maxUnavailable: 1  # If infrastructure needs to drain, we accept 1 disruption
This is safer than minAvailable=2 for infrastructure-initiated evictions.
Best Practice: Align PDB constraints with infrastructure SLAs. If hardware can fail, PDB should allow at least 1 disruption.
Follow-up: Design a policy that prevents PDB from being too restrictive relative to infrastructure reliability (e.g., if hardware failure rate is 1%, PDB shouldn't prevent 100% of disruptions).
Your team uses a DaemonSet (monitoring agent) that runs on every node. When you drain a node, the DaemonSet pod is also evicted. You're using kubectl drain --ignore-daemonsets to skip the check. But the drain takes 3 hours because monitoring is down during the drain, and the cluster logs no metrics. You miss a critical alert that would have caught a performance issue. What went wrong?
--ignore-daemonsets is a blunt instrument. It tells drain to skip DaemonSet pods, but it doesn't manage their lifecycle:
1. Understand what --ignore-daemonsets actually does:
kubectl drain node-1 --ignore-daemonsets
# The flag means: "Don't try to evict DaemonSet pods, and don't wait for them"
# The agent keeps running on the cordoned node, but it dies the moment the node is shut down or rebooted for maintenance, and the DaemonSet controller cannot replace it on any other node
2. The result: the node goes unmonitored for the whole maintenance window. You miss metrics and alerts.
3. Better approach: label the node for maintenance and make the DaemonSet avoid labeled nodes, so the agent shuts down cleanly (flushing its buffers) before the node goes away:
kubectl label node node-1 maintenance=true
Update the DaemonSet to NOT run on maintenance nodes:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-agent
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: maintenance
                operator: NotIn
                values: ["true"]
4. The DaemonSet controller now removes the agent from node-1. Note a DaemonSet pod is node-bound, so it is not "rescheduled elsewhere"; the point is that agents on every other node keep running, so cluster-level monitoring continues:
kubectl get pods -A -o wide | grep monitoring-agent
# The agent should be gone from node-1 and present on all other nodes
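To verify coverage before and after maintenance, a sketch that diffs node names against the nodes actually running an agent (the `app=monitoring-agent` label is assumed from the manifest above):

```shell
# Sketch: list nodes with no Running monitoring-agent pod, given JSON
# from `kubectl get nodes -o json` and `kubectl get pods -A -o json`.
nodes_without_agent() {
  jq -rn --argjson nodes "$1" --argjson pods "$2" '
    [ $pods.items[]
      | select(.metadata.labels.app == "monitoring-agent"
               and .status.phase == "Running")
      | .spec.nodeName ] as $covered
    | $nodes.items[].metadata.name
    | . as $n
    | select(($covered | index($n)) | not)'
}

# Intended usage:
#   nodes_without_agent "$(kubectl get nodes -o json)" \
#                       "$(kubectl get pods -A -o json)"
```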
5. Then drain normally:
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
6. Remove the maintenance label after maintenance:
kubectl label node node-1 maintenance-
# The DaemonSet pod will respawn on the node
Alternative: Use Pod Disruption Budget for DaemonSet
7. Drain skips DaemonSet pods with --ignore-daemonsets, but the eviction API itself does check PDBs, so a PDB on the agent at least blocks other tooling from evicting it and surfaces availability pressure in `kubectl get pdb` output. Note that percentage values (e.g. minAvailable: "90%") require a controller with a scale subresource, which DaemonSets lack, so use an integer:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: monitoring-agent-pdb
spec:
  minAvailable: 90  # Integer count; size this to ~90% of your node count
  selector:
    matchLabels:
      app: monitoring-agent
This won't prevent drain (which skips DaemonSets) but will alert you if it violates availability goals.
Follow-up: Design a drain workflow that ensures critical system DaemonSets (monitoring, logging, security) stay operational during node maintenance.
You have a multi-tenant cluster. Tenant A has a Deployment with PDB minAvailable=2. Tenant B has a pod that requires all nodes available for a time-sensitive job. You're draining a node and Tenant A's PDB blocks the drain for 2 hours. Tenant B's job deadline passes. How do you prevent cross-tenant PDB conflicts?
Multi-tenant clusters require careful PDB isolation and priority:
1. First, verify the PDB scope:
kubectl get pdb -A
# Each tenant should have PDB in their namespace
kubectl describe pdb tenant-a-pdb -n tenant-a-ns
2. The issue: Tenant A's restrictive PDB affects cluster-wide drain operations, impacting Tenant B.
3. Implement tenant-scoped PDB with priority levels:
# Tenant A (lower priority)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tenant-a-pdb
  namespace: tenant-a-ns
  labels:
    tenant: a
    priority: standard
spec:
  maxUnavailable: 1  # Flexible constraint
---
# Tenant B (critical job, higher priority)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: tenant-b-pdb
  namespace: tenant-b-ns
  labels:
    tenant: b
    priority: critical
spec:
  minAvailable: 2  # Stricter constraint
4. Implement a pre-drain check that respects priority:
#!/bin/bash
set -euo pipefail
NODE=$1
# Fail fast only if a critical-priority PDB's budget is actually exhausted
BLOCKED=$(kubectl get pdb -A -o json | jq -r \
  '.items[]
   | select(.metadata.labels.priority == "critical"
            and .status.disruptionsAllowed == 0)
   | "\(.metadata.namespace)/\(.metadata.name)"')
if [[ -n "$BLOCKED" ]]; then
  echo "Cannot drain $NODE: critical PDBs with no disruption budget: $BLOCKED"
  exit 1
fi
# Otherwise proceed
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
5. Alternatively, tune how unhealthy pods are treated with unhealthyPodEvictionPolicy (policy/v1, Kubernetes 1.26+):
kubectl patch pdb tenant-a-pdb -n tenant-a-ns -p '{"spec":{"unhealthyPodEvictionPolicy":"AlwaysAllow"}}'
# AlwaysAllow lets not-yet-Ready pods be evicted even when the budget is exhausted, so a drain isn't blocked by pods that aren't serving traffic anyway (the default, IfHealthyBudget, only evicts them while the budget is healthy)
6. For cross-tenant conflicts, cap what each tenant can do with PDBs. A ResourceQuota can limit the number of PDB objects per namespace via object count quota (limiting PDB strictness itself requires an admission policy such as Gatekeeper):
# In tenant A's namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a-ns
spec:
  hard:
    pods: "100"
    count/poddisruptionbudgets.policy: "2"  # At most 2 PDB objects
Best Practice: In multi-tenant clusters, use platform-level drain orchestration that respects all PDBs but prioritizes critical tenants. Drain during tenant-agreed maintenance windows, not ad-hoc.
Follow-up: Design a drain scheduling API that allows tenant admins to negotiate maintenance windows without blocking each other's critical workloads.