Your cluster runs Flannel VXLAN overlay. You're scaling to 500 nodes across AWS regions. Bandwidth costs are spiraling—$12K/month in inter-region traffic. Your network team says "Why are you encapsulating everything?" You realize Flannel encapsulation is adding 50+ bytes to every packet. Can you switch to Calico/BGP routing without rebuilding the cluster?
Yes, but it requires careful planning. Flannel VXLAN vs Calico BGP represent different architectural philosophies: overlay vs underlay. The migration has network implications.
Phase 1: Understand current state
kubectl get daemonset -n kube-system -o wide | grep flannel
kubectl describe daemonset kube-flannel -n kube-system | grep Image
kubectl exec -n kube-system kube-flannel-xxxxx -- ip route show
kubectl exec -n kube-system kube-flannel-xxxxx -- cat /etc/kube-flannel/net-conf.json
Confirm VXLAN is active:
ssh node-1 ip link show | grep vxlan
ssh node-1 bridge fdb show | head -10
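To check many nodes at once, a small parser over `ip -d link show` output confirms the VXLAN device and reads its MTU. The heredoc below is an assumed sample of that output format; in practice pipe in `ssh node-N ip -d link show` instead.

```shell
# Sketch: detect an active VXLAN device and report the flannel.1 MTU.
parse_vxlan() {
  awk '/vxlan/ { found = 1 }
       /flannel\.1/ && /mtu/ { for (i = 1; i <= NF; i++) if ($i == "mtu") mtu = $(i + 1) }
       END { if (found) print "vxlan active, flannel.1 mtu=" mtu; else print "no vxlan device" }'
}
# Sample input (assumed format); replace with: ssh node-1 ip -d link show | parse_vxlan
parse_vxlan <<'EOF'
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
    link/ether 0a:1b:2c:3d:4e:5f brd ff:ff:ff:ff:ff:ff
    vxlan id 1 local 10.0.1.5 dev eth0 srcport 0 0 dstport 8472
EOF
```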
Phase 2: Plan Calico migration
Option A: Rolling replacement (preferred for large clusters)
- Install Calico components alongside Flannel (Calico as policy controller, Flannel continues routing)
- Drain nodes one by one, uninstall Flannel, install Calico CNI
- Validate pod-to-pod connectivity after each node
Option B: Create a new cluster, migrate via service mesh (safest but expensive)
Step 1: Install Calico operator and resources in monitoring mode:
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml
Step 2: Configure Calico to coexist (CNI chaining):
The operator's Installation resource (operator.tigera.io/v1) controls the Calico CNI; Calico also ships a flannel-migration controller that automates much of this cutover. A minimal Installation that keeps BGP off until you are ready:
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  cni:
    type: Calico
  calicoNetwork:
    bgp: Disabled   # keep Flannel routing until cutover
Verify coexistence:
kubectl get daemonset -n calico-system
kubectl get pods -n calico-system -o wide
Step 3: Drain and migrate nodes (one per hour to monitor impact):
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
ssh node-1 sudo systemctl stop kubelet
ssh node-1 sudo rm -rf /var/lib/cni/flannel /etc/cni/net.d/*flannel*
ssh node-1 sudo systemctl start kubelet
# Wait for the calico-node pod to start and write the new CNI config
sleep 30
kubectl uncordon node-1
kubectl describe node node-1 | grep -E 'Ready|network'
Test pod connectivity:
kubectl run test-pod-2 --image=alpine -- sleep 3600
POD2_IP=$(kubectl get pod test-pod-2 -o jsonpath='{.status.podIP}')
kubectl run test-pod-1 --image=alpine -it --rm -- ping -c 3 "$POD2_IP"
# Pod names aren't DNS-resolvable, so ping the pod IP (or put a Service in front)
Phase 3: Switch Calico to BGP (underlay) mode
Edit the BGP configuration (at this scale, disable the full node-to-node mesh and peer with your routers instead; AS numbers below are placeholders):
kubectl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
  asNumber: 64512
EOF
Configure BGP peers (your network routers; peer IP and AS are placeholders):
kubectl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack-router
spec:
  peerIP: 10.0.0.1
  asNumber: 64512
EOF
Verify BGP peering:
kubectl exec -n calico-system calico-node-xxxxx -- calicoctl node status
Expected output: "Calico process is running" and "BGP status: up"
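With many peers, eyeballing the status table doesn't scale. A sketch that counts established sessions from `calicoctl node status` output (the table format in the heredoc is a sample; pipe in the real command output instead):

```shell
# Count BGP sessions in the Established state.
count_established() {
  awk -F'|' '/Established/ { n++ } END { print n + 0 }'
}
# Sample input; replace with: kubectl exec ... calicoctl node status | count_established
count_established <<'EOF'
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.1.6     | node-to-node mesh | up    | 08:32:10 | Established |
| 10.0.1.7     | node-to-node mesh | start | 08:32:12 | Connect     |
+--------------+-------------------+-------+----------+-------------+
EOF
```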
Phase 4: Monitor and validate
Compare bandwidth before/after:
ssh node-1 sar -n DEV 1 5 | grep -E 'eth0|vxlan'
# Check packet overhead reduction
ping -c 100 -s 1472 -M do pod-ip # max unfragmented payload at 1500 MTU
Expected: inter-node bandwidth drops roughly 5-10%. The 50-byte VXLAN header is only ~3% of a full-size packet, but savings compound from avoided fragmentation and from small packets, where the header is a much larger fraction.
Cost savings: ~10% of the $12K/month inter-region bill ≈ $1.2K/month.
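A back-of-envelope check of those numbers (assumptions: $12K/month inter-region bill, full-size 1500-byte frames, ~10% total bandwidth reduction):

```shell
BILL=12000            # current monthly inter-region cost in dollars (assumed)
MTU=1500              # physical MTU
OVERHEAD=50           # VXLAN: 14 eth + 20 IP + 8 UDP + 8 VXLAN
# Per-packet overhead as a percentage of the wire size
pct=$(awk -v o="$OVERHEAD" -v m="$MTU" 'BEGIN { printf "%.1f", o * 100 / m }')
echo "per-packet overhead: ${pct}%"
# Savings if total bandwidth drops ~10% (header removal plus fewer fragments)
savings=$(awk -v b="$BILL" 'BEGIN { printf "%.0f", b * 0.10 }')
echo "estimated monthly savings: \$${savings}"
```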
Rollback plan: Keep Flannel manifests in GitOps repo with version tag. If issues occur:
git checkout flannel-v0.21.0
kubectl apply -f flannel-daemonset.yaml
kubectl drain node-1 --ignore-daemonsets
ssh node-1 sudo systemctl stop kubelet && sudo rm -rf /var/lib/cni/calico && sudo systemctl start kubelet
Follow-up: BGP requires your infrastructure team to configure the routers. What happens if they refuse? Design a hybrid approach that reduces costs without requiring router changes.
You've just switched from Flannel to Cilium. Pod-to-pod connectivity works fine, but now kube-proxy is gone and some services are broken. Your NodePort services don't respond. What happened and how do you debug?
Cilium replaces kube-proxy entirely, but the replacement isn't automatic. Cilium needs specific configuration to handle LoadBalancer services and NodePorts. If services are broken, you likely have a Cilium service proxy misconfiguration or a mismatch between Cilium's eBPF and your service topology.
Debug flow:
1. Verify Cilium replaced kube-proxy:
kubectl get pods -n kube-system -l app=kube-proxy
# Should return nothing
Verify Cilium agents are running:
kubectl get daemonset -n cilium
kubectl get pods -n cilium -o wide
2. Check Cilium configuration for services:
kubectl get configmap cilium-config -n cilium -o yaml | grep -E 'kube-proxy|service-proxy-name|bpf-map-dynamic-size-ratio'
Ensure key configs are set (names vary slightly by Cilium version; check your cilium-config):
bpf-map-dynamic-size-ratio: "0.0025" # fraction of system memory for dynamically sized eBPF maps
enable-node-port: "true" # NodePort handling in eBPF
enable-host-port: "true" # hostPort handling
3. Test NodePort directly:
kubectl get svc | grep NodePort
kubectl exec -it debug-pod -- curl http://node-ip:node-port
# If it fails, the service proxy isn't working
4. Check Cilium service map:
kubectl exec -n cilium cilium-xxxxx -- cilium service list
kubectl exec -n cilium cilium-xxxxx -- cilium service get 1234
If the service isn't listed, it wasn't programmed into eBPF.
5. Inspect eBPF maps directly:
kubectl exec -n cilium cilium-xxxxx -- bpftool map show
kubectl exec -n cilium cilium-xxxxx -- bpftool map dump name cilium_lb_services_v4 | head -20
Verify your service IP is in the map.
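To script that verification, a sketch that checks whether a ClusterIP appears in the service table (the heredoc mimics an assumed `cilium service list` layout; feed in the real output from the agent pod):

```shell
# Report whether a given ClusterIP was programmed into Cilium's service table.
service_programmed() {  # usage: ... | service_programmed <cluster-ip>
  grep -q "$1" && echo "programmed" || echo "MISSING"
}
# Sample input; replace with: kubectl exec ... cilium service list | service_programmed 10.96.0.10
service_programmed "10.96.0.10" <<'EOF'
ID   Frontend          Service Type   Backend
1    10.96.0.1:443     ClusterIP      1 => 10.0.1.5:6443
2    10.96.0.10:53     ClusterIP      1 => 10.244.1.3:53
EOF
```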
6. Check Cilium logs for errors:
kubectl logs -n cilium -l k8s-app=cilium --tail=100 | grep -i 'service\|lb\|proxy'
Common error: "failed to create service entry" or "eBPF map full"
Fix: If an eBPF map is full, raise the map sizing ratio in the cilium-config ConfigMap (the default is a small fraction of system memory; key name per the Cilium docs) and restart the agents:
kubectl patch configmap cilium-config -n cilium --type merge -p '{"data":{"bpf-map-dynamic-size-ratio":"0.005"}}'
kubectl rollout restart daemonset/cilium -n cilium
# Monitor for completion
kubectl rollout status daemonset/cilium -n cilium
7. Validate NodePort again:
kubectl exec -it debug-pod -- curl http://node-ip:node-port -v
# Should succeed now
Prevention: When migrating from kube-proxy to Cilium, always:
1. Run the connectivity check first: cilium connectivity test (from the cilium CLI)
2. Enable Cilium's monitoring: hubble observe --verdict DROPPED
3. Test all service types (ClusterIP, NodePort, LoadBalancer) in audit mode before cutover
Follow-up: How would you handle session affinity (sticky sessions) without kube-proxy? Design a solution that works for gRPC and WebSocket traffic.
You're running Calico on a 100-node cluster. Monitoring shows high CPU on calico-node pods and slow pod startup times (45 seconds vs normal 5 seconds). The calico-node pods are consuming 800m CPU each. What's the bottleneck and how do you investigate?
High CPU in calico-node typically indicates: policy reconciliation storms, BGP churn, or eBPF map contention. Pod startup slowness suggests the CNI plugin is blocking on IP allocation or policy programming.
Debug sequence:
1. Correlate CPU spike with events:
kubectl top pods -n calico-system -l k8s-app=calico-node --containers
kubectl describe pod calico-node-xxxxx -n calico-system | grep -A 10 Events
Check if spikes correlate with pod deployments, node additions, or policy updates.
2. Check BGP stability:
kubectl exec -n calico-system calico-node-xxxxx -- calicoctl node status
# Expected: BGP status: up
# If showing "down" or frequent changes, BGP is thrashing
Monitor BGP peering flaps:
kubectl logs -n calico-system -l k8s-app=calico-node | grep -E 'bgp.*state|Peer.*Up|Peer.*Down' | tail -50
High volume of Up/Down events = peering instability.
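To quantify the thrashing, a sketch that counts state transitions per peer from the log stream (the log line format in the heredoc is an assumption; adjust the regex to what your calico-node/bird version actually emits):

```shell
# Count BGP peer Up/Down transitions per peer address.
count_flaps() {
  grep -oE 'Peer [0-9.]+ (Up|Down)' | awk '{ flap[$2]++ } END { for (p in flap) print p, flap[p] " transitions" }'
}
# Sample input; replace with: kubectl logs -n calico-system -l k8s-app=calico-node | count_flaps
count_flaps <<'EOF'
2024-05-01 08:00:01 bird: Peer 10.0.1.7 Up
2024-05-01 08:00:14 bird: Peer 10.0.1.7 Down
2024-05-01 08:00:20 bird: Peer 10.0.1.7 Up
2024-05-01 08:01:02 bird: Peer 10.0.2.9 Up
EOF
```

Peers with many transitions in a short window are the ones to investigate first.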
3. Check policy reconciliation load:
kubectl exec -n calico-system calico-node-xxxxx -- calicoctl get globalnetworkpolicies | wc -l
# Count Calico policies
kubectl get networkpolicies --all-namespaces --no-headers | wc -l
# Count Kubernetes NetworkPolicies
If you have 1000+ policies, reconciliation becomes expensive.
Profile policy processing:
kubectl logs -n calico-system -l k8s-app=calico-node --tail=500 | grep -E 'Reconcile|ProcessUpdate' | wc -l
4. Monitor IP allocation performance:
kubectl describe daemonset calico-node -n calico-system | grep -A 5 "Limits\|Requests"
# Check memory and CPU limits
Run a deployment spike and measure pod startup time:
# kubectl run creates a single pod, so launch the spike in a loop
time for i in $(seq 1 10); do kubectl run "test-$i" --image=alpine --restart=Never --overrides='{"spec":{"terminationGracePeriodSeconds":0}}' -- sleep 300; done
# Then watch how long the pods take to reach Running
kubectl get pods -o wide | grep -c '^test-'
5. Check eBPF map usage:
ssh node-1 sudo bpftool map show | grep -E 'cali|felix'
# Pick a Calico map name from the previous command's output (names vary by version), then:
ssh node-1 sudo bpftool map dump name cali_v4_routes | wc -l
If maps are at 95%+ capacity, Calico can't program new routes efficiently.
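The capacity check is simple arithmetic over the entry count (from the dump) and max_entries (from `bpftool map show`); the numbers below are illustrative:

```shell
# Compute eBPF map utilization as a percentage.
map_usage_pct() {  # usage: map_usage_pct <entries> <max_entries>
  awk -v e="$1" -v m="$2" 'BEGIN { printf "%.0f\n", e * 100 / m }'
}
map_usage_pct 62000 65536   # prints 95
```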
Common fixes:
Fix 1: Increase calico-node resource limits
kubectl set resources daemonset calico-node -n calico-system --limits=cpu=1,memory=512Mi --requests=cpu=500m,memory=256Mi
Fix 2: Reduce reconciliation churn by relaxing Felix's dataplane refresh interval (FelixConfiguration field names per the Calico docs):
calicoctl patch felixconfiguration default --patch '{"spec":{"iptablesRefreshInterval":"60s"}}'
Fix 3: Use Felix's Prometheus metrics to find the hotspots:
calicoctl patch felixconfiguration default --patch '{"spec":{"prometheusMetricsEnabled":true}}'
sleep 120
kubectl exec -n calico-system calico-node-xxxxx -- wget -qO- http://localhost:9091/metrics | grep felix_
# Dataplane apply/resync timings (e.g. felix_int_dataplane_apply_time_seconds) show where reconciliation time goes; exact metric names vary by version
Fix 4: Split policies into smaller, more specific rules
Instead of:
spec:
  podSelector: {}        # matches all pods
  ingress:
  - from:
    - podSelector: {}    # every pod evaluated against every rule
Use labeled tiers:
spec:
  podSelector:
    matchLabels:
      tier: api          # narrower scope, fewer rules to evaluate
Follow-up: How do you scale a single Calico deployment to handle 1000+ nodes? At what point do you need to switch architectures?
Your cluster spans 3 availability zones in the same region. You're using Flannel with VXLAN overlay. Pod A in AZ1 pings Pod B in AZ3—latency is 35ms instead of expected 2-3ms. Network engineers say the underlay is fine. Why is the overlay adding so much latency and how do you fix it?
VXLAN overlay encapsulation can introduce latency through multiple mechanisms: increased packet size causing fragmentation, MTU mismatches, or additional processing in the VXLAN tunnel endpoints.
Investigate latency source:
1. Verify underlay latency is good:
ssh node-az1 ping -c 100 node-az3-ip | grep avg
# Expected: 1-3ms
2. Check pod-to-pod latency in detail:
kubectl run latency-test-az1 --image=nicolaka/netshoot \
  --overrides='{"spec":{"nodeSelector":{"topology.kubernetes.io/zone":"us-east-1a"}}}' -- sleep 3600
kubectl run latency-test-az3 --image=nicolaka/netshoot \
  --overrides='{"spec":{"nodeSelector":{"topology.kubernetes.io/zone":"us-east-1c"}}}' -- sleep 3600
AZ3_IP=$(kubectl get pod latency-test-az3 -o jsonpath='{.status.podIP}')
# Pod names aren't DNS-resolvable, so ping the pod IP:
kubectl exec latency-test-az1 -- sh -c "for i in \$(seq 1 100); do ping -c 1 $AZ3_IP; done" | tee latency.txt
grep time= latency.txt | awk -F'time=' '{print $2}' | awk -F' ' '{print $1}' | sort -n | tail -1
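Beyond the maximum, min/avg/max together reveal whether latency is uniformly high or bimodal (a few slow outliers). A sketch over ping output (the heredoc lines mimic ping's format; feed the real latency.txt instead):

```shell
# Summarize ping RTTs (ms) from ping output on stdin.
rtt_stats() {
  grep -oE 'time=[0-9.]+' | cut -d= -f2 | sort -n | \
    awk '{ a[NR] = $1; sum += $1 } END { printf "min=%s avg=%.1f max=%s n=%d\n", a[1], sum / NR, a[NR], NR }'
}
# Sample input; replace with: rtt_stats < latency.txt
rtt_stats <<'EOF'
64 bytes from 10.244.3.8: icmp_seq=1 ttl=62 time=34.8 ms
64 bytes from 10.244.3.8: icmp_seq=2 ttl=62 time=35.2 ms
64 bytes from 10.244.3.8: icmp_seq=3 ttl=62 time=2.1 ms
EOF
```

A min near 2ms with a max near 35ms points at intermittent queuing or fragmentation rather than a uniformly slow path.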
Isolate latency: measure pod-to-node, node-to-node, node-to-pod to find bottleneck.
3. Check MTU and fragmentation:
AZ3_IP=$(kubectl get pod latency-test-az3 -o jsonpath='{.status.podIP}')
kubectl exec latency-test-az1 -- ping -c 3 -M do -s 1472 "$AZ3_IP"
# If "Frag needed but DF set", MTU is too small
Check current MTU on nodes:
ssh node-az1 ip link show | grep mtu
VXLAN adds 50-byte overhead (14 + 20 + 8 + 8 = 50). If physical MTU is 1500, VXLAN MTU should be 1450.
ssh node-az1 sudo ip link set dev flannel.1 mtu 1450   # temporary; persist via the "MTU" field in Flannel's net-conf.json
Verify the Flannel config:
kubectl get configmap kube-flannel-cfg -n kube-system -o yaml | grep -A 5 net-conf.json
4. Enable Flannel DirectRouting:
kubectl get configmap kube-flannel-cfg -n kube-system -o yaml | grep -E 'Backend|Type|DirectRouting'
If DirectRouting is disabled, enable it (hosts on the same L2 subnet then exchange direct routes instead of encapsulating):
kubectl edit configmap kube-flannel-cfg -n kube-system
# In net-conf.json:
# "Backend": {
#   "Type": "vxlan",
#   "DirectRouting": true
# }
Then restart the kube-flannel pods to pick up the change.
5. Measure VXLAN processing overhead:
ssh node-az1 ethtool -S eth0 | grep -E 'rx_csum|tx_csum|rx_packets|tx_packets'
# Compare before/after enabling hardware offload
Enable TSO (TCP Segmentation Offload) and GSO (Generic Segmentation Offload) if supported:
ssh node-az1 ethtool -K eth0 tso on gso on
6. Check if cross-AZ traffic is being unnecessarily routed through a NAT/gateway:
traceroute latency-test-az3-ip
# Verify direct node-to-node path, not through a gateway
7. Consider switching to Calico BGP (underlay) if latency is critical
With BGP, packets aren't encapsulated—they're routed directly by the underlay network. Latency drops to underlay baseline (1-3ms).
Quick fix ranking by impact:
1. Enable DirectRouting (immediate, ~5ms reduction where nodes share an L2 subnet)
2. Fix MTU mismatch (immediate, if fragmentation is happening)
3. Enable hardware offload (2-3ms reduction)
4. Migrate to BGP/underlay (5-10ms reduction, but requires architecture change)
Follow-up: Your latency-sensitive trading application needs sub-1ms pod-to-pod latency. Which CNI would you choose and why? Design the network architecture.
You're choosing between Calico, Cilium, and Flannel for a new production cluster. Your requirements: 300 nodes, multi-region, policy enforcement, load balancing, and cost control. You have 2 weeks to decide and 4 weeks to deploy. Which do you pick and why? Walk through your evaluation criteria.
Evaluation framework (score each on scale 1-10):
Criterion 1: Operational Complexity
Flannel: 9/10 (simple, fewer moving parts)
Calico: 6/10 (more config, especially for BGP)
Cilium: 3/10 (eBPF learning curve, requires kernel expertise)
Winner: Flannel if you want low ops burden; Cilium if you're willing to invest.
Criterion 2: Policy Enforcement & Observability
Flannel: 3/10 (no native policies; needs Calico alongside it — the Canal combination)
Calico: 8/10 (rich policy language, but limited east-west flow observability)
Cilium: 10/10 (Hubble gives packet-level flow visibility, plus L7 policies)
Winner: Cilium for security/compliance; Calico for policy-heavy workloads.
Criterion 3: Multi-Region Support
Flannel: 5/10 (VXLAN works, but high bandwidth cost across regions)
Calico: 9/10 (BGP with route reflection, designed for multi-region)
Cilium: 7/10 (Cilium Mesh exists, still maturing)
Winner: Calico for cost-efficient multi-region.
Criterion 4: Load Balancing (replacing kube-proxy)
Flannel: 0/10 (requires kube-proxy)
Calico: 4/10 (iptables mode requires kube-proxy; the newer eBPF dataplane can replace it)
Cilium: 9/10 (eBPF-based kube-proxy replacement, supports session affinity)
Winner: Cilium if you want modern service proxy.
Criterion 5: Resource Overhead
Flannel: 9/10 (20-50m CPU, 100-200m memory per node)
Calico: 6/10 (150-300m CPU, 300-500m memory)
Cilium: 4/10 (500-800m CPU, 1Gi memory, but it replaces kube-proxy)
Winner: Flannel for resource-constrained; Cilium competitive if you discount kube-proxy savings.
Criterion 6: Community & Production Maturity
Flannel: 9/10 (stable for years)
Calico: 10/10 (widely deployed, mature)
Cilium: 8/10 (growing adoption, still some instability reports)
Winner: Calico; Flannel is safe; Cilium is modern.
For a 300-node multi-region cluster, I'd recommend:
Primary choice: Calico (BGP mode) if cost and stability are priorities
Alternative: Cilium if you need advanced observability and want kube-proxy replacement
Skip: Flannel for multi-region (high bandwidth costs)
Decision logic:
IF (policy_enforcement == "critical" AND observability == "high") THEN Cilium
ELSE IF (multi_region == TRUE AND cost == "critical") THEN Calico
ELSE IF (simplicity == "priority") THEN Flannel (but single-region only)
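That decision logic can be written down as a small shell function (a sketch; the four yes/no inputs collapse the criteria above, and the thresholds are judgment calls, not hard rules):

```shell
# Pick a CNI from coarse yes/no answers to the evaluation criteria.
choose_cni() {  # args: policy_critical observability_high multi_region cost_critical
  if [ "$1" = "yes" ] && [ "$2" = "yes" ]; then
    echo "Cilium"
  elif [ "$3" = "yes" ] && [ "$4" = "yes" ]; then
    echo "Calico"
  else
    echo "Flannel"
  fi
}
choose_cni no no yes yes   # prints Calico
```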
Deployment timeline for Calico:
Week 1: Lab testing (3 nodes, 2 regions)
Week 2: BGP peer config with network team, policy design
Week 3: Staging deployment (50 nodes, shadow traffic)
Week 4: Production rollout (rolling 50 nodes/week)
Risk mitigation:
- Keep kube-proxy as fallback (don't remove immediately)
- Test policy updates in canary namespace first
- Monitor BGP stability during first 2 weeks
- Maintain Flannel manifests for rollback
Follow-up: Your cluster has mixed workloads: latency-sensitive services (1ms requirement) and batch jobs (cost-optimized). Can you use different CNIs for different workload types? Design this hybrid architecture.
You've deployed Cilium with eBPF on a cluster running older Linux kernels (4.9). Services work sometimes. You're seeing random packet drops and sporadic connection resets. Cilium logs show "bpf verifier error." What's happening and how do you recover?
eBPF programs require specific Linux kernel features. Older kernels (pre-5.x) have incomplete eBPF support, missing helpers, and verifier limitations. This causes runtime failures and unpredictable packet loss.
Diagnosis:
1. Check kernel version on nodes:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'
If you see 4.9.x or 4.14.x, you've identified the problem.
2. Verify eBPF verifier errors:
kubectl logs -n cilium -l k8s-app=cilium | grep -i 'verifier'
# Look for: "invalid memory size" or "unreachable instructions"
3. Check eBPF program load status:
kubectl exec -n cilium cilium-xxxxx -- cilium status --verbose
# Shows which eBPF features probed successfully on this kernel
If features are reported unavailable or programs failed to load, eBPF isn't fully active.
4. Confirm kernel capabilities:
ssh node-1 cat /boot/config-$(uname -r) | grep -E 'CONFIG_BPF|CONFIG_HAVE_EBPF_JIT|CONFIG_BPF_EVENTS'
# Should all be =y
On older kernels, many of these will be missing or =m (module).
Recovery options:
Option A: Shrink Cilium's eBPF footprint to what the kernel supports (immediate, but loses performance benefits). Cilium's datapath always uses eBPF, so there is no fully eBPF-free mode; instead, disable the advanced features that need newer kernels and let iptables handle those paths:
helm upgrade cilium cilium/cilium \
  --set kubeProxyReplacement=disabled \
  --set bpf.masquerade=false
kubectl rollout restart daemonset/cilium -n cilium
Re-deploy kube-proxy if it was removed. This falls back to iptables-based service handling; expect roughly 10-15% throughput loss versus the full eBPF datapath.
Option B: Upgrade nodes to a newer kernel (plan 2-3 hours per node, including drain):
kubectl drain node-1 --ignore-daemonsets
ssh node-1 "sudo apt-get update && sudo apt-get install -y linux-generic-hwe-20.04"   # exact package varies by distro; target a 5.x kernel
ssh node-1 sudo reboot
# Wait for the node to rejoin the cluster
kubectl wait --for=condition=Ready node/node-1 --timeout=10m
kubectl uncordon node-1
After all kernels are upgraded, re-enable the full feature set (e.g. --set kubeProxyReplacement=strict via helm) and restart Cilium:
kubectl rollout restart daemonset/cilium -n cilium
Option C: Replace with Calico (safer, but requires CNI switch)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
# See Network Policies question for full migration steps
Immediate fix (stabilize cluster):
1. Disable the failing eBPF features NOW (Option A) to restore stability
2. Plan kernel upgrades for this weekend
3. Test eBPF mode in staging with new kernels
4. Gradually migrate nodes: drain node → upgrade kernel → rejoin → enable eBPF
Prevention for future:
- Document the minimum kernel requirement in the runbook (5.10+ for recent Cilium releases; check the official system-requirements page)
- Add a pre-flight check to provisioning: verify uname -r and review the feature probes in cilium status
- Include a kernel version gate in the node provisioning script:
#!/bin/bash
# Abort provisioning if the kernel can't support Cilium eBPF.
MIN_KERNEL="5.10"
CURRENT=$(uname -r | cut -d. -f1,2)
# sort -V puts the older version first ([ -lt ] can't compare "5.10"-style strings)
if [ "$(printf '%s\n' "$MIN_KERNEL" "$CURRENT" | sort -V | head -n1)" != "$MIN_KERNEL" ]; then
  echo "ERROR: kernel $CURRENT too old for Cilium eBPF (need >= $MIN_KERNEL)" >&2
  exit 1
fi
Follow-up: You're pinned to old kernel due to legacy workload dependencies. How would you run Cilium alongside a kernel that doesn't support eBPF? Design a workaround.
You have a legacy application that requires IP spoofing capability (custom network stacks, real-time packet shaping). Your CNI plugin (Calico) normally prevents this for security. How do you safely enable IP spoofing for specific pods while keeping default deny for others?
IP spoofing is a privileged capability. CNIs block it by default via reverse-path filtering (rp_filter) and network namespacing. To allow selective spoofing, you need to bypass the CNI's restrictions at the pod level while maintaining cluster security.
Approach: Use pod security policies + custom eBPF rules + network namespace overrides.
Step 1: Create a SecurityPolicy for spoofing-enabled pods
apiVersion: v1
kind: Namespace
metadata:
  name: legacy-network-apps
  labels:
    require-spoofing: "true"
---
# Note: PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes 1.25;
# on newer clusters, express the same constraints via Pod Security Admission
# or a validating admission policy. PSPs are cluster-scoped (no namespace).
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: allow-spoof
spec:
  privileged: false
  allowPrivilegeEscalation: true
  allowedCapabilities:
  - NET_RAW    # required for crafting packets with arbitrary source IPs
  - NET_ADMIN
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    - min: 1
      max: 65535
Step 2: Create RBAC to restrict which pods can get NET_RAW
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spoof-pod-creator
  namespace: legacy-network-apps
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spoof-pod-binding
  namespace: legacy-network-apps
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spoof-pod-creator
subjects:
- kind: ServiceAccount
  name: spoof-app
  namespace: legacy-network-apps
Step 3: Configure reverse-path filter bypass for these pods
Note that Calico enforces its own anti-spoofing beyond rp_filter; recent versions expose a per-pod allowance (the cni.projectcalico.org/allowedSourcePrefixes annotation, gated by Felix's workloadSourceSpoofing setting — check your version's docs). Inside the pod, use an init container to disable rp_filter in the pod's network namespace:
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
  namespace: legacy-network-apps
  annotations:
    requires-spoofing: "true"
spec:
  serviceAccountName: spoof-app
  initContainers:
  - name: disable-rp-filter
    image: busybox:latest
    command:
    - /bin/sh
    - -c
    - |
      sysctl -w net.ipv4.conf.all.rp_filter=0
      sysctl -w net.ipv4.conf.default.rp_filter=0
    securityContext:
      privileged: true
  containers:
  - name: app
    image: your-legacy-app:latest
    securityContext:
      capabilities:
        add:
        - NET_RAW
        - NET_ADMIN
      runAsUser: 1000
Step 4: Network policy: Isolate spoofing pods
Even with spoofing enabled, restrict their network access to prevent lateral movement:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-spoof-pod
  namespace: legacy-network-apps
spec:
  podSelector:
    matchLabels:
      app: legacy-app
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - protocol: TCP
      port: 9090
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: external-networks
    ports:
    - protocol: UDP
      port: 53    # DNS only
  - to:
    - ipBlock:
        cidr: 10.20.0.0/16    # specific destination for packet shaping
Step 5: Verify spoofing capability
kubectl exec legacy-app -- cat /proc/sys/net/ipv4/conf/all/rp_filter
# Should return: 0 (disabled)
Test IP spoofing:
kubectl exec -i legacy-app -- python3 - <<'EOF'
# Assumes scapy is installed in the legacy-app image
from scapy.all import IP, ICMP, send
send(IP(src="192.168.1.100", dst="10.0.0.1")/ICMP())
EOF
# Packets should leave with the spoofed source IP
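To verify from the capture side, a sketch that flags packets whose source IP differs from the pod's own address. It assumes tcpdump's ICMP line format (source address without a port suffix) and an illustrative pod IP:

```shell
POD_IP=10.244.2.15   # the pod's real IP (illustrative)
# Flag tcpdump lines whose source address is not the pod's own IP.
flag_spoofed() {
  awk -v self="$POD_IP" '$2 == "IP" && $3 != self { print "SPOOFED src=" $3 }'
}
# Sample input; replace with: tcpdump -i eth0 -n -l icmp | flag_spoofed
flag_spoofed <<'EOF'
08:00:01.000001 IP 10.244.2.15 > 10.0.0.1: ICMP echo request, id 1, seq 1, length 64
08:00:02.000001 IP 192.168.1.100 > 10.0.0.1: ICMP echo request, id 1, seq 2, length 64
EOF
```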
Step 6: Monitoring & Alerting
Log spoofing activity for compliance:
kubectl exec legacy-app -- sh -c 'tcpdump -i eth0 -l "not src host $(hostname -i)"' | tee /var/log/spoofed-packets.log
# BPF capture filter syntax; double quotes so $(hostname -i) expands inside the pod
Alert if spoofing pod sends traffic to unexpected destinations:
- alert: UnauthorizedSpoofedTraffic
  # metric name is illustrative; use whatever your flow exporter provides
  expr: rate(egress_packets{pod_label_requires_spoofing="true",destination_namespace!="external-networks"}[5m]) > 100
  annotations:
    summary: "Spoofing pod {{ $labels.pod_name }} sending traffic outside approved range"
Security audit trail:
kubectl get events -n legacy-network-apps | grep -E 'NET_RAW|privileged'
# Query the API server audit log (path depends on your audit configuration):
sudo grep 'legacy-network-apps' /var/log/kubernetes/audit.log | jq '.requestObject.spec.securityContext'
Follow-up: How would you monitor for unauthorized IP spoofing attempts across your cluster? Design a detection system that flags suspect network activity.
You've deployed Cilium in a cluster with thousands of pods. After a week, you notice pod-to-pod DNS queries are failing intermittently (1-2% of requests). The issue is DNS resolution timing out. Your infrastructure team says "Network is fine, check your CNI." Cilium's DNS proxy might be the culprit. How do you debug and fix this?
Cilium includes a DNS proxy for security and observability. If it's misconfigured or overloaded, DNS queries timeout and pods can't reach services by hostname.
Diagnosis:
1. Verify DNS is failing:
kubectl run debug-pod --image=nicolaka/netshoot -it --rm -- nslookup kubernetes.default
# Run it several times: do some succeed while others time out?
2. Confirm the DNS proxy is active and check its counters:
kubectl exec -n cilium cilium-xxxxx -- cilium status --verbose | grep -i dns
# Should show the DNS proxy enabled
kubectl exec -n cilium cilium-xxxxx -- cilium metrics list | grep -i dns
# Look for query/failure counters such as cilium_dns_queries_total and cilium_dns_failures_total (names vary by version)
Root causes (most common):
1. DNS proxy is CPU-saturated (high load, small replicas)
2. Upstream DNS server (kube-dns/coredns) is slow
3. DNS query caching is misconfigured
4. DNS proxy pod is overloaded with too many queries
Fix 1: Increase Cilium agent resources so the DNS proxy isn't CPU-starved:
kubectl set resources daemonset/cilium -n cilium \
  --limits=cpu=1000m,memory=1Gi \
  --requests=cpu=500m,memory=512Mi
Fix 2: Raise the minimum TTL Cilium honors for DNS responses so repeat queries are served from cache (option name per the Cilium agent docs; verify against your version):
kubectl patch configmap cilium-config -n cilium --type merge -p '{"data":{"tofqdns-min-ttl":"300"}}'
kubectl rollout restart daemonset/cilium -n cilium
Fix 3: Monitor upstream DNS performance:
kubectl run dns-perf-test --image=alpine -it --rm -- \
  time nslookup kubernetes.default
# Query time should be well under 100ms
If upstream is slow, scale coredns:
kubectl scale deployment coredns -n kube-system --replicas=3
kubectl get deployment coredns -n kube-system
Prevention:
- alert: DNSQueryLatency
  expr: histogram_quantile(0.95, cilium_dns_query_latency_seconds_bucket) > 0.5
  for: 5m
  annotations:
    summary: "DNS queries p95 latency > 500ms"
- alert: DNSProxyErrors
  expr: rate(cilium_dns_failures_total[5m]) > 0.01
  annotations:
    summary: "DNS proxy error rate {{ $value }}/sec"
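The 1-2% failure rate in the scenario is easy to sanity-check against raw counters; the counts below are illustrative (e.g. from query/failure metrics, where your version exposes them):

```shell
# Failure rate as a percentage of total queries.
dns_failure_pct() {  # usage: dns_failure_pct <failures> <total>
  awk -v f="$1" -v t="$2" 'BEGIN { printf "%.2f\n", f * 100 / t }'
}
pct=$(dns_failure_pct 150 10000)   # pct is 1.50
echo "failure rate: ${pct}%"
# Flag anything above a 1% threshold
awk -v p="$pct" 'BEGIN { exit !(p > 1.0) }' && echo "ALERT: above 1% threshold"
```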
Follow-up: How would you troubleshoot DNS resolution if you suspect the problem is with the application's DNS client (retry behavior, timeout settings) vs. the CNI?