Kafka Interview Questions

Quotas, Rate Limiting, and Multi-Tenancy


One data engineering team's producer is saturating your Kafka cluster: 5M msg/sec (50% of total cluster capacity). Other teams' throughput drops to 1-2K msg/sec. Design a quota system to fairly isolate this team's traffic.

Kafka quotas: Enforce per-principal (user and/or client.id) rate limits. Producer quota = max bytes/sec into the broker; consumer quota = max bytes/sec out. (A request_percentage quota can additionally cap the broker CPU time a client consumes.)

Diagnosis: (1) Monitor kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec per topic (scraped via JMXTrans or the Prometheus JMX exporter) to find the hot topics. (2) Identify the top producer from the broker's per-client request and quota metrics. (3) Map that client.id to a principal (username) via audit logs or the broker's request/authorizer logs.

Quota design for fair sharing: (1) Total cluster capacity: 10M msg/sec ≈ 10GB/sec = 10,000 MB/sec (assuming ~1KB messages). (2) 5 teams total, so the fair share is 2,000 MB/sec per team. (3) Set quotas: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type users --entity-name data-eng-team-1 --alter --add-config producer_byte_rate=2097152000 (2,000 MiB/sec ≈ 2GB/sec).
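A quick sketch of that fair-share arithmetic, using the numbers from the scenario above:

```python
# Fair-share quota sizing from the scenario above (assumes ~1 KB messages).
CLUSTER_CAPACITY_MB_S = 10_000   # 10 GB/sec total cluster capacity
NUM_TEAMS = 5

fair_share_mb_s = CLUSTER_CAPACITY_MB_S / NUM_TEAMS      # MB/sec per team
producer_byte_rate = int(fair_share_mb_s) * 1024 * 1024  # value for kafka-configs.sh

print(fair_share_mb_s)     # 2000.0
print(producer_byte_rate)  # 2097152000
```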

Quota enforcement: Broker delays responses for quota-violating clients. If team exceeds quota, their produce request is throttled (delayed, not rejected). Wait time = (bytes_over_quota / rate_limit). This allows short bursts but prevents sustained overload.
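A minimal sketch of that delay computation, assuming the simple bytes-over-quota model described above (the broker's actual algorithm averages over its configured sample windows):

```python
def throttle_delay_ms(observed_bytes: int, window_ms: int, quota_bytes_per_sec: int) -> float:
    """Approximate broker throttle delay: how long to stall the client so its
    observed rate over the window falls back under the quota."""
    allowed = quota_bytes_per_sec * window_ms / 1000
    over = max(0, observed_bytes - allowed)
    return over / quota_bytes_per_sec * 1000  # milliseconds

# Client sent 3 GB in a 1-second window against a 2 GB/sec quota:
print(throttle_delay_ms(3_000_000_000, 1000, 2_000_000_000))  # 500.0
```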

Configuration: The static broker property quota.producer.default is deprecated; prefer a dynamic default: kafka-configs.sh --bootstrap-server localhost:9092 --entity-type users --entity-default --alter --add-config producer_byte_rate=1048576000 (≈1GB/sec for every user without an explicit quota). Then set specific quotas for teams.

Monitoring: Track the broker's per-client throttle-time metrics for Produce/Fetch quotas, and kafka.server:type=DelayedOperationPurgatory,delayedOperation=Produce,name=PurgatorySize to see how many requests are parked. If throttle times are consistently high, quotas are too aggressive.

Production example at Uber: 100+ teams producing to central Kafka cluster. Per-team quota = 100 MB/sec. Prevents single misbehaving team from affecting SLA for others. Quota burst window = 5 seconds (allows temporary spikes).

Follow-up: If you set producer quota to 2GB/sec but a team's workload needs 3GB/sec for one hour daily (batch job), how do you handle this without penalizing them or affecting other teams?

Quotas are enabled. A producer hits the quota limit and receives throttling: 1-second delay. The producer's application retries aggressively. Retry storms cause broker overload. Design backoff strategy.

Root cause: Quota throttling is not an error; the broker signals it via the throttle_time_ms field in the response (and delays the response or mutes the connection). But if the application layer treats the slow response as a failure and retries aggressively (e.g. a low request.timeout.ms plus retries=3), 100 concurrent producers × 3 retries = 300 extra requests pile up, further overloading the broker.

Mitigation 1 - Exponential backoff: Use the client's exponential backoff: retry.backoff.ms=100, retry.backoff.max.ms=10000 (the max setting requires Kafka clients 3.7+, KIP-580). Each retry waits min(100ms × 2^attempt, 10s): 100ms, 200ms, 400ms, ..., capped at 10s. Add jitter where possible so retries don't synchronize.
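The schedule can be sketched as follows (the jitter flag is an illustration of the idea, not a Kafka client setting):

```python
import random

def backoff_ms(attempt: int, base_ms: int = 100, cap_ms: int = 10_000, jitter: bool = False) -> float:
    """retry.backoff.ms-style exponential backoff: base * 2^attempt, capped."""
    delay = min(base_ms * (2 ** attempt), cap_ms)
    if jitter:
        # Full jitter spreads retries out so clients don't retry in lockstep.
        delay = random.uniform(0, delay)
    return delay

print([backoff_ms(a) for a in range(8)])
# [100, 200, 400, 800, 1600, 3200, 6400, 10000]
```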

Mitigation 2 - Respect throttle time: The Kafka response includes the broker's suggested wait (throttle_time_ms), and the official Java client already honors it by holding back further requests for that long. Don't layer application-level retries on top that ignore this signal; there is no client config to set here (reconnect.backoff.ms governs reconnects after connection failures, not throttling).

Mitigation 3 - Circuit breaker: If producer receives 3 consecutive THROTTLED responses, circuit opens: producer stops sending for 30s, then tries again. This gives quota time to reset.
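This circuit breaker lives in application code, not in the Kafka client; a minimal sketch, using the threshold and cooldown assumed above:

```python
import time

class ThrottleCircuitBreaker:
    """Open after `threshold` consecutive throttled responses; stay open for
    `cooldown_s` seconds, then allow one attempt again (half-open)."""

    def __init__(self, threshold: int = 3, cooldown_s: float = 30.0):
        self.threshold, self.cooldown_s = threshold, cooldown_s
        self.consecutive = 0
        self.opened_at = None

    def record(self, throttled: bool) -> None:
        self.consecutive = self.consecutive + 1 if throttled else 0
        if self.consecutive >= self.threshold:
            self.opened_at = time.monotonic()

    def allow_send(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at, self.consecutive = None, 0  # half-open: try again
            return True
        return False
```

A producer loop would call `record(...)` after each response and skip sending while `allow_send()` is False.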

Mitigation 4 - Request batching: Instead of sending 1000 small messages individually, batch into 10 large requests. Fewer requests = less broker overhead per byte. Set linger.ms=100, batch.size=1048576 (batch.size is in bytes; ~1MB).

Production pattern at Stripe: Their client libraries respect throttle_time_ms from broker. When throttled, they wait exactly the broker's suggested time, then retry once. No retry storms.

Follow-up: If throttle_time_ms is 5 seconds and your SLA requires <100ms latency, should you increase quota or redesign the application?

You have different priority tiers: team A (critical), team B (normal), team C (best-effort). During overload, team C should be deprioritized. How do you implement priority-based quotas?

Kafka native quotas: Kafka doesn't natively support priority tiers. Quotas are flat: principal A gets 2GB/sec, principal B gets 1GB/sec, etc. No priority during overload.

Solution 1 - Quota ratios: Allocate quotas based on importance: Team A (critical) = 5GB/sec, Team B (normal) = 2GB/sec, Team C (best-effort) = 500MB/sec. During overload, Team C hits quota first and gets throttled. Other teams continue. Trade-off: requires manual quota sizing.
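The ratio-based sizing can be sketched as a weight-to-quota mapping (the weights and the 7.5 GB/sec budget are illustrative assumptions):

```python
def tier_quotas(total_bytes_per_sec: int, weights: dict) -> dict:
    """Split a total byte budget into flat per-tier quotas by weight."""
    scale = total_bytes_per_sec / sum(weights.values())
    return {tier: int(w * scale) for tier, w in weights.items()}

# 7.5 GB/sec budget, weighted so best-effort hits its ceiling first under load:
quotas = tier_quotas(7_500_000_000, {"critical": 10, "normal": 4, "best-effort": 1})
print(quotas)
# {'critical': 5000000000, 'normal': 2000000000, 'best-effort': 500000000}
```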

Solution 2 - Burst allowance: Quotas are measured over a sliding window of samples (quota.window.num samples of quota.window.size.seconds each; defaults 11 × 1s), so widening the window lets short spikes average out: e.g. quota.window.num=10 tolerates a brief 3GB/sec burst against a 1GB/sec quota as long as the ~10-second average stays under quota. Set the default via --entity-default; give critical teams a higher per-user quota, best-effort a lower one. Note there is no separate "burst rate" knob; the window only smooths the average.

Solution 3 - Multi-tenancy broker: Run separate Kafka brokers for each tier: critical cluster, normal cluster, best-effort cluster. Critical cluster has premium hardware, dedicated resources. Overload in best-effort cluster doesn't affect critical. Cost: 3x broker infrastructure.

Solution 4 - Application-level prioritization: Implement priority queue on producer side. High-priority messages are sent first. Low-priority messages are batched, sent only when capacity allows. Kafka level sees all traffic as equal; app enforces priorities.
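A producer-side priority buffer along these lines, as a sketch (a real drain loop would pop from it and call the Kafka producer's send()):

```python
import heapq
import itertools

class PriorityBuffer:
    """Producer-side priority queue: lower number = higher priority.
    The sequence counter preserves FIFO order within a priority level."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def put(self, priority: int, message: bytes) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), message))

    def pop(self) -> bytes:
        return heapq.heappop(self._heap)[2]

buf = PriorityBuffer()
buf.put(2, b"low")
buf.put(0, b"critical")
buf.put(1, b"normal")
print(buf.pop())  # b'critical'
```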

Production at Amazon: They use Solution 1 + Solution 4 hybrid. Quotas provide baseline isolation. Application queues add priority scheduling on top.

Follow-up: If critical team's quota is 5GB/sec but they only use 2GB/sec, can you dynamically allocate the unused 3GB/sec to normal team? How do you prevent quota reservation hoarding?

You have 10 consumer groups sharing one topic. One group is lagging (consuming slowly). Its slowness is proportional to other groups' throughput (they share broker resources). Design consumer quota isolation.

Consumer quota model: Similar to producer, but note that quotas apply per user principal or per client.id, not per consumer group. To throttle a group as a unit, give all of its members the same client.id (the quota is shared across all connections with that id): kafka-configs.sh --bootstrap-server localhost:9092 --entity-type clients --entity-name consumer-group-1 --alter --add-config consumer_byte_rate=1048576000 (≈1GB/sec).

Diagnosis of lag: Use kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group consumer-group-1 --describe. Check LAG column. If LAG > 0 and increasing, consumer is slower than producer rate.

Root causes of lag: (1) Consumer processing is slow (app logic bottleneck, not quota); (2) Consumer is throttled (consumer quota hit); (3) Network/GC pauses; (4) Broker is overloaded (shared resource).

Isolation strategy: (1) Set a per-group consumer quota (via a shared client.id or principal per group): Group A (critical) = 2GB/sec, Group B (normal) = 500MB/sec. (2) If Group B hits its quota, it gets throttled; Group A continues at full rate. (3) Monitor lag independently for each group; if lag persists despite headroom under the quota, the root cause is not the quota (look at processing time).

Verification: Temporarily increase Group B's quota to 2GB/sec and measure lag. If lag drops, quota was bottleneck. If lag unchanged, processing is slow.
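That verification logic reduces to comparing lag before and after the temporary quota bump; a sketch (the 20% drop tolerance is an assumption):

```python
def lag_bottleneck(lag_before: int, lag_after: int, tolerance: float = 0.2) -> str:
    """Classify the bottleneck after temporarily raising a group's quota:
    if lag dropped noticeably, the quota was the limiter; otherwise suspect
    application processing time."""
    if lag_before == 0:
        return "no-lag"
    drop = (lag_before - lag_after) / lag_before
    return "quota-limited" if drop >= tolerance else "processing-limited"

print(lag_bottleneck(100_000, 20_000))  # quota-limited
print(lag_bottleneck(100_000, 95_000))  # processing-limited
```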

Trade-off: A high consumer quota (2GB/sec) lets the broker spend more CPU/network serving that group; a low quota protects the broker but increases that group's lag and end-to-end latency. Find a balance: size quotas so the sum across groups stays within ~80% of the broker's serving capacity.

Follow-up: If you have 10 consumer groups with 1GB/sec quota each = 10GB/sec total consumption, but broker can only serve 5GB/sec, which groups get starved? Is this random or deterministic?

A producer is hitting quota limit. You increase quota from 1GB/sec to 2GB/sec. Producer traffic jumps to 1.9GB/sec, then plateaus. Is producer quota-limited or bottlenecked elsewhere?

Diagnosis: The producer went from quota-capped at 1GB/sec to 1.9GB/sec after the quota bump, but it never reached the new 2GB/sec limit, so something other than the quota is now the bottleneck.

Investigation: (1) Check producer latency: if p99 latency increased (e.g., 5ms → 50ms), producer is backing off. Likely broker is overloaded; (2) Check broker CPU: if broker CPU is 95%+, broker can't serve faster; (3) Check network: if network interface is 80%+ saturated, network is bottleneck; (4) Check GC pauses: if broker has 500ms GC pause every 10s, latency jitter causes producer to slow down (adaptive backoff); (5) Check partition leadership: if partition leader is on slow broker, rebalance to faster broker.

Root cause likely: Broker CPU or GC. Quota was artificial limiter; now removed, natural bottleneck is exposed.

Solutions: (1) Scale broker: add more CPU, faster storage; (2) Reduce message size or increase batching; (3) Monitor broker JVM: tune GC settings (-Xms8G -Xmx8G for fixed heap size, reduces GC frequency); (4) Rebalance partitions to less-loaded brokers.

Production insight at Confluent: Many customers hit broker ceiling (~100MB/sec per broker for large messages). Increasing quota helps only until natural bottleneck appears.

Follow-up: If broker is at 95% CPU and you scale to 10 brokers (10x capacity), but producer still can't reach 2GB/sec, what else could be limiting? Is it producer-side (network saturation, app slowness)?

You have quotas configured in ZooKeeper (classic Kafka). You migrate to KRaft. Where are quotas stored now? Can you migrate quota configs without downtime?

ZooKeeper quota storage: Quotas are stored under /config/users and /config/clients in ZK as JSON config znodes. Example: /config/users/producer-app contains {"version":1,"config":{"producer_byte_rate":"1048576000"}}.

KRaft quota storage: Quotas are stored in the metadata log (like topics, ACLs). They're compacted and replicated across KRaft controllers.

Migration steps: (1) Export all quotas while still on ZooKeeper as a checkpoint: kafka-configs.sh --bootstrap-server broker:9092 --entity-type users --describe (the tool talks to a broker, not to ZooKeeper directly, on modern versions). (2) Run the KIP-866 ZK-to-KRaft migration; during the dual-write phase the controller copies existing metadata, including client quotas, into the KRaft metadata log. (3) After brokers are fully on KRaft, verify: kafka-configs.sh --bootstrap-server kraft-broker:9092 --entity-type users --describe and diff against the step-1 export. (4) Re-apply any missing quota manually: kafka-configs.sh --bootstrap-server kraft-broker:9092 --entity-type users --entity-name producer-app --alter --add-config producer_byte_rate=1048576000. (5) Decommission ZooKeeper only once the diff is clean.

Downtime prevention: During the dual-write phase the KRaft controller keeps ZooKeeper and the metadata log in sync, so quota enforcement is continuous; there is no window where producers run unthrottled. Finalize the migration and remove ZK only after the verification diff is clean.

Edge case: A quota altered mid-migration must land in the authoritative store (the KRaft controller once the migration has started); a tool that still writes to ZooKeeper directly can silently lose the change. Mitigation: route all quota changes through kafka-configs.sh --bootstrap-server during the transition, and re-run the export/diff before decommissioning ZK.
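A small diff check, run against the pre-migration export and the KRaft describe output (parsed into dicts; the formats here are illustrative), catches anything that would otherwise be lost:

```python
def quota_diff(zk_quotas: dict, kraft_quotas: dict) -> dict:
    """Compare quotas exported from ZooKeeper with those visible in KRaft.
    Returns entries that are missing or differ, so nothing is lost before
    ZooKeeper is decommissioned."""
    issues = {}
    for principal, zk_cfg in zk_quotas.items():
        kraft_cfg = kraft_quotas.get(principal)
        if kraft_cfg != zk_cfg:
            issues[principal] = {"zk": zk_cfg, "kraft": kraft_cfg}
    return issues

zk = {"producer-app": {"producer_byte_rate": 1048576000}}
print(quota_diff(zk, {}))  # producer-app missing in KRaft -> flagged
print(quota_diff(zk, zk))  # {} -> safe to remove from ZooKeeper
```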

Follow-up: If you forget to apply quotas to KRaft before decommissioning ZK, all producers become unlimited. How do you recover without downtime?

Your cluster has 1000 producer applications. Each should have unique quota. Manual configuration of 1000 quotas is error-prone. Design an automated, scalable quota system.

Automated quota provisioning: (1) Applications register quota need via API: POST /quotas { app_id: "order-processor-v2", producer_byte_rate: 100MB/sec }; (2) Quota service validates and stores in database; (3) Kafka sync job periodically reads database and applies to cluster: for each app: kafka-configs.sh --entity-name {app} --alter --add-config producer_byte_rate={rate}; (4) Monitor quota usage and auto-scale: if app consistently uses 80% of quota, trigger alert for team to request higher quota.

Quota database schema:

{ app_id, principal, producer_byte_rate, consumer_byte_rate, created_at, updated_at, approved_by, business_justification }

Approval workflow: (1) App requests quota via UI; (2) Platform team reviews (business justification, resource availability); (3) If approved, quota is applied automatically via sync job; (4) If rejected, requester is notified with reason.

Auto-scaling quotas: Monitor the broker's per-client quota metrics (kafka.server:type=Produce,user={principal},client-id={app}, with byte-rate and throttle-time attributes). If the 90th percentile of byte-rate exceeds 0.8 × quota_limit for 7 days, auto-escalate the quota by 50% (with team notification).
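The escalation rule can be sketched as follows (thresholds from the text; requiring the condition on every one of the last 7 days is an assumption):

```python
def should_escalate(daily_p90_bytes: list, quota_bytes: int,
                    threshold: float = 0.8, days: int = 7) -> bool:
    """Escalate when p90 usage exceeded threshold * quota on each of the
    last `days` days."""
    recent = daily_p90_bytes[-days:]
    return len(recent) == days and all(u > threshold * quota_bytes for u in recent)

quota = 1_000_000_000  # 1 GB/sec
print(should_escalate([900_000_000] * 7, quota))          # True -> raise quota 50%
print(should_escalate([900_000_000] * 6 + [100], quota))  # False -> one quiet day
```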

Production at Databricks: 5000+ applications share multi-tenant clusters. Automated quota provisioning system handles scaling. Average approval time: 2 minutes. Error rate: <0.1% (misconfigured quotas).

Follow-up: If an app's quota is auto-escalated and it starts consuming 10x more data (due to bug), how do you detect and revert within minutes? Should you implement hard limits or soft warnings?
