Your Redis instance is configured with maxmemory 32GB and policy allkeys-lru. Today, memory usage hit 32GB and clients start seeing error: OOM command not allowed when used memory > maxmemory. But the eviction policy should kick in. DEBUG OBJECT on keys shows 10 keys still in memory with high idletime (30+ days). Why isn't LRU evicting them?
With allkeys-lru, eviction runs synchronously: when a command that allocates memory arrives and used_memory exceeds maxmemory, Redis evicts keys before executing it. An OOM error under allkeys-lru therefore means eviction either never ran or could not free enough memory. First verify the policy is actually applied with CONFIG GET maxmemory-policy — a stray noeviction (the default) produces exactly this error. Then check INFO stats for the evicted_keys counter: if it is 0, eviction never triggered, which again points at a misapplied policy. Also note that Redis uses approximated LRU: it samples maxmemory-samples keys (default 5) and evicts the best candidate from that sample, so long-idle keys can survive many eviction cycles. Raising the sample size improves accuracy at some CPU cost — CONFIG SET maxmemory-samples 10 gets close to true LRU. Finally, memory outside the keyspace (replication backlog, client output buffers, Lua scripts) counts against maxmemory but cannot be evicted, so a handful of idle keys plus large buffers can still produce OOM. To diagnose idle keys, run OBJECT IDLETIME on a sample of keys (DEBUG OBJECT also reports idletime, but is not intended for production use).
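The sampling behavior above can be sketched in a few lines. This is a hypothetical simplification of Redis's approximated LRU (not the real evict.c implementation): sample maxmemory-samples keys and evict the one idle longest, which shows why a small sample can repeatedly miss the truly oldest keys.

```python
import random
import time

def pick_eviction_candidate(last_access, maxmemory_samples=5, now=None, rng=random):
    """Sketch of approximated LRU: last_access maps key -> last-access timestamp."""
    now = time.time() if now is None else now
    sample = rng.sample(list(last_access), min(maxmemory_samples, len(last_access)))
    # Evict the sampled key with the highest idle time.
    return max(sample, key=lambda k: now - last_access[k])

# When the sample covers the whole keyspace this degenerates to exact LRU:
access = {"a": 100.0, "b": 50.0, "c": 200.0}
victim = pick_eviction_candidate(access, maxmemory_samples=3, now=300.0)
```

With maxmemory_samples smaller than the keyspace, each call sees only a random subset, so a 30-day-idle key can survive many eviction rounds by luck.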
Follow-up: If you switch from allkeys-lru to allkeys-lfu, what changes in terms of which keys get evicted?
Your cache has maxmemory-policy volatile-ttl (evict keys with the soonest expiration). You notice that during peak traffic (10K QPS), random cache misses spike and users report stale data being served from a fallback source. Memory is at 95% of maxmemory. INFO stats shows the evicted_keys counter increasing by 1000/sec. But your app doesn't intend for these keys to be evicted. What's happening?
Volatile-ttl evicts the keys with the nearest expiration, so if most keys carry short TTLs (seconds or minutes), peak-traffic memory pressure turns eviction into a constant churn and causes misses on keys you intended to keep. Check INFO stats: evicted_keys should show recent growth (watch it live with redis-cli -r -1 -i 1 INFO stats | grep evicted_keys; redis-cli --stat shows overall memory/key trends but not eviction counters). MEMORY DOCTOR can flag obvious memory problems, though it won't compute an ideal maxmemory for you. Root cause analysis: (1) run RANDOMKEY in a loop and call TTL on each sampled key to see the TTL distribution, or (2) iterate with SCAN and collect TTL stats across the keyspace. Fix: (1) increase maxmemory to reduce pressure (e.g., 64GB instead of 32GB), or (2) change policy to allkeys-lru so eviction favors keeping frequently accessed keys rather than evicting whatever expires soonest, or (3) shorten TTLs on less-critical keys so volatile-ttl concentrates eviction on them, or (4) implement two-tier caching: hot keys in Redis, cold keys in a second store (e.g., memcached or a database). For peak traffic: use MEMORY USAGE / MEMORY STATS to find oversized keys (large hashes/lists) and split them, and use pipelining to reduce round-trips. Test with redis-benchmark -c 100 -t get,set -q while monitoring evicted_keys.
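The TTL-distribution step above can be reduced to a small summarizer. A hypothetical helper, assuming you've already collected TTL values via SCAN + TTL (redis-cli conventions: -1 means the key has no TTL, -2 means the key is missing, positive values are seconds to expiry):

```python
def summarize_ttls(ttls):
    """Summarize a sample of TTL values collected from SCAN + TTL."""
    live = [t for t in ttls if t >= 0]          # keys with a real TTL
    no_ttl = sum(1 for t in ttls if t == -1)    # keys volatile-* will never evict
    return {
        "sampled": len(ttls),
        "no_ttl": no_ttl,
        "under_60s": sum(1 for t in live if t < 60),
        "min_ttl": min(live) if live else None,
    }

stats = summarize_ttls([30, 45, 5, -1, 120, 10, -1])
```

A large under_60s fraction confirms that volatile-ttl has a deep pool of soon-to-expire keys to churn through under memory pressure.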
Follow-up: How would you implement a "do not evict" policy for certain keys while allowing others to be evicted?
You just deployed a new feature that caches user session data in Redis. Within 30 minutes, Redis memory grows from 10GB to 28GB (maxmemory is 32GB). No eviction occurs, and new SET commands start returning OOM errors. You check the data: 10M keys with an average size of 2.8 KB each (~28GB). This is expected, but eviction should have kicked in when approaching maxmemory. Why didn't it?
Start with the memory-accounting distinction: maxmemory is enforced against used_memory (bytes logically allocated by Redis), not used_memory_rss (the resident set size the OS sees). With high fragmentation (used_memory_rss / used_memory > 1.5), the OS can run out of physical memory while Redis still reports headroom; conversely, used_memory can hit maxmemory while RSS looks fine. In this scenario, check CONFIG GET maxmemory-policy first — noeviction (the default) never evicts and returns exactly this OOM error, and is the most likely cause. If the policy is correct, other suspects: (1) Sampling accuracy: eviction triggers when used_memory exceeds maxmemory at command time; CONFIG SET maxmemory-samples 10 improves candidate selection and takes effect immediately, no restart required. (2) Replica role: replicas don't evict on their own — eviction is driven by the master, and a replica's memory can exceed maxmemory during replication. Run INFO replication to check the role. (3) Persistence overhead: if a BGSAVE is running (INFO persistence shows rdb_bgsave_in_progress:1), copy-on-write can temporarily add a large fraction of the dataset to RSS; wait for it to finish. Immediate fixes: (1) CONFIG SET maxmemory 40gb for headroom (only if the host actually has the RAM), (2) MEMORY PURGE to release jemalloc fragmentation, (3) set EXPIRE on session keys (e.g., 1 hour) instead of storing them persistently. Long-term: alert when used_memory approaches 90% of maxmemory and scale preemptively.
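The used_memory vs used_memory_rss relationship above is just a ratio check, mirroring the mem_fragmentation_ratio field in INFO memory. A minimal sketch (thresholds are the conventional rules of thumb, not hard limits):

```python
def fragmentation_status(used_memory, used_memory_rss):
    """Classify mem_fragmentation_ratio = used_memory_rss / used_memory."""
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        return ratio, "likely swapping"       # RSS below logical usage: pages swapped out
    if ratio > 1.5:
        return ratio, "high fragmentation"    # allocator holds pages Redis isn't using
    return ratio, "ok"

ratio, verdict = fragmentation_status(used_memory=20 * 2**30,
                                      used_memory_rss=36 * 2**30)
```

Here 36GB RSS over 20GB used_memory gives a ratio of 1.8, the "high fragmentation" case discussed in the follow-up.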
Follow-up: If fragmentation is high (e.g., 1.8x), what causes it and how do you reduce it without restarting Redis?
You're running Redis with maxmemory-policy allkeys-lfu for a recommendation engine. After 1 week of production, you notice that some keys that should be frequently accessed are being evicted. Using OBJECT FREQ on them shows low frequency. But your logs show the app queries these keys 100 times per day. Why is LFU evicting high-frequency keys?
Redis LFU is an 8-bit probabilistic (Morris-style) counter per key, not a raw hit count, and it decays over time so historically hot keys don't stay hot forever. Check CONFIG GET lfu-decay-time (default 1): per the Redis docs, that's the number of minutes that must elapse for a key's counter to be halved (or decremented once it's at 10 or below). A key queried 100 times per day is touched roughly every 15 minutes — with a 1-minute decay interval, its counter decays back down between accesses, so OBJECT FREQ reports it as cold even though it's hot by your application's standards. Run OBJECT FREQ on several such keys: if they all show low values (1-5), decay is outpacing your access pattern. Raise the interval with CONFIG SET lfu-decay-time 5 (or higher). Also check lfu-log-factor (default 10): it controls how fast the counter saturates — a higher factor makes the counter grow more slowly, which distinguishes very hot keys but makes moderately accessed keys look colder, so for this problem lowering it (not raising it) can also help. These changes take effect on new accesses but don't retroactively rewrite existing counters; OBJECT FREQ is read-only and cannot reset them. If you need a clean slate, one option is key versioning: append a suffix (user:123:v2) so keys start fresh. For a recommendation engine, consider tiering: hot keys in Redis under LFU, colder keys in an L2 cache (e.g., memcached) or the database. Test with redis-benchmark -c 1000 -t get -q -r 100 (-r sets the keyspace size) to observe LFU behavior under load.
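The decay-vs-access race can be made concrete with a small simulation. This is a simplified sketch following the Redis docs' description of the LFU counter (probabilistic increment, halve-or-decrement decay), not the exact evict.c code:

```python
import random

LFU_INIT_VAL = 5  # Redis starts new keys at this counter value

def lfu_increment(counter, lfu_log_factor=10, rng=random):
    """On access: increment with probability 1 / (1 + base * log_factor)."""
    if counter >= 255:
        return counter
    base = max(counter - LFU_INIT_VAL, 0)
    if rng.random() < 1.0 / (1.0 + base * lfu_log_factor):
        counter += 1
    return counter

def lfu_decay(counter, idle_minutes, lfu_decay_time=1):
    """Per docs: each decay period halves the counter (decrements if <= 10)."""
    for _ in range(int(idle_minutes // lfu_decay_time)):
        counter = counter // 2 if counter > 10 else max(counter - 1, 0)
    return counter

# A key touched once every 15 minutes under the default 1-minute decay:
counter = LFU_INIT_VAL
for _ in range(100):
    counter = lfu_decay(counter, idle_minutes=15)  # 15 decay periods per access
    counter = lfu_increment(counter)
```

Fifteen decay periods between accesses erase far more than one increment adds, so the counter never climbs — exactly the "hot key looks cold" symptom in the question.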
Follow-up: If you need consistent LFU behavior across restarts (frequency counters don't survive restart), how would you preserve this data?
Your Redis is set to volatile-random eviction with maxmemory 64GB. During a traffic spike, you see 50% cache misses, and INFO stats shows evicted_keys climbing by 100K/sec. The problem: most keys don't have a TTL set, so volatile-random has very few candidates to choose from and cycles through the same few keys repeatedly. How do you fix this without restarting?
Volatile-random with mostly non-expiring keys is a bad fit — only keys carrying a TTL are eviction candidates, so Redis churns through that small subset (and starts returning OOM errors once it's exhausted). First, measure TTL coverage: run RANDOMKEY in a loop and check TTL on each sample. If >50% return -1 (no TTL), your key population is TTL-deficient. Options, both applicable live: (1) Change policy dynamically: CONFIG SET maxmemory-policy allkeys-lfu (or allkeys-lru) so every key becomes an eviction candidate; this takes effect immediately. (2) Backfill TTLs on existing keys: iterate the keyspace with SCAN (e.g., SCAN <cursor> MATCH <pattern> COUNT 1000) and call EXPIRE on each key, ideally with jittered TTL values so millions of keys don't all expire in the same second.
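The jitter detail in option (2) matters enough to sketch. A hypothetical backfill helper (in production you would call r.expire(key, ttl) via redis-py inside the SCAN loop; the function and parameter names here are illustrative):

```python
import random

def jittered_ttls(keys, base_ttl=3600, jitter=600, rng=random):
    """Assign each key a TTL spread across [base_ttl, base_ttl + jitter] seconds,
    so a mass backfill doesn't create one giant synchronized expiry spike."""
    return {key: base_ttl + rng.randint(0, jitter) for key in keys}

ttls = jittered_ttls(["session:1", "session:2", "session:3"])
```

Without jitter, a backfill of 100M keys with a flat one-hour TTL would hand the expiry cycle (and volatile-* eviction) a massive simultaneous candidate wave an hour later.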
Follow-up: If migrating to allkeys-lfu causes evictions of keys your app doesn't expect, how would you whitelist certain keys from eviction?
You're building a multi-tenant Redis cache where each tenant has a quota. Tenant A has quota 10GB, Tenant B has 10GB (total maxmemory 32GB, 12GB reserved for ops). After 2 months, Tenant B grows to 15GB (exceeds quota). Tenant A complains that their cache misses increased because Tenant B is consuming their space. How do you enforce per-tenant limits?
Redis doesn't natively support per-tenant memory limits, so enforcement has to live in the application — or you run one Redis instance per tenant, which is the cleanest isolation. Application-level approach: (1) Key namespacing: prefix all keys with the tenant ID (tenant:A:user:123). (2) Usage accounting: periodically SCAN each tenant's prefix (SCAN <cursor> MATCH tenant:A:* COUNT 1000) and sum MEMORY USAGE for each key to track per-tenant consumption — this walk has real overhead, so run it on a schedule, not per request. (3) Per-tenant eviction: when Tenant B exceeds quota, select keys under tenant:B:* (via SCAN or an app-maintained LRU index) and UNLINK them (non-blocking, unlike DEL). (4) Admission control: while a tenant is over quota, reject or degrade their writes at the application layer until usage drops back under the limit.
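The accounting pass in steps (1)-(2) is straightforward to sketch. A hypothetical helper, assuming keys follow the tenant:<id>:<rest> convention above and you've already collected (key, bytes) pairs via SCAN + MEMORY USAGE:

```python
from collections import defaultdict

def tenant_usage(key_sizes):
    """Sum bytes per tenant from (key, size) pairs; key format: tenant:<id>:<rest>."""
    usage = defaultdict(int)
    for key, size in key_sizes:
        parts = key.split(":", 2)
        if parts[0] == "tenant" and len(parts) == 3:
            usage[parts[1]] += size
    return dict(usage)

def over_quota(usage, quotas):
    """Return tenants exceeding their byte quota (no quota = unlimited)."""
    return {t: used for t, used in usage.items()
            if used > quotas.get(t, float("inf"))}

usage = tenant_usage([("tenant:A:u:1", 4096), ("tenant:B:u:1", 8192),
                      ("tenant:B:u:2", 8192)])
breaches = over_quota(usage, {"A": 10_000, "B": 12_000})
```

Tenants flagged in breaches are the ones whose keys the eviction/admission-control steps (3)-(4) would then target.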
Follow-up: How would you implement a fair eviction strategy if Tenant B exceeds quota—should you evict from Tenant B only, or also proportionally from Tenant A?
Your Redis uses maxmemory 100GB and policy allkeys-lru. Memory grows predictably to ~95GB during peak hours, then shrinks as traffic dies down. But during a DDoS attack (1M requests/sec to random keys), memory spikes to 120GB and the OS OOM-killer terminates Redis with SIGKILL. SLOWLOG shows SET commands stalled behind eviction work. How do you prevent the OOM crash and keep Redis available during attacks?
The problem is that under extreme load, eviction can't keep up. Redis does run eviction before executing a command that would exceed maxmemory, but memory can still overshoot: evicted values may be freed lazily, and replication backlogs plus client output buffers grow outside the keyspace. At 1M requests/sec, RSS can blow past maxmemory and the kernel OOM-killer delivers the SIGKILL. Prevention layers: (1) Leave OS headroom: open-source Redis has no maxmemory-reserve setting (that's a managed-Redis concept), so set maxmemory well below physical RAM — e.g., 100GB maxmemory on a 128GB host — so overshoot lands in free memory instead of triggering the OOM-killer. (2) CONFIG SET maxmemory-samples 10 for more accurate LRU selection; much higher values mostly burn CPU. (3) Cap client-side memory: bound buffers with client-output-buffer-limit, set maxmemory-clients (Redis 7+) so client buffers have an explicit budget, and lower maxclients if connection floods are part of the attack. (4) tcp-backlog smooths connection spikes, but it's a startup-only parameter: set it in redis.conf (together with the kernel's net.core.somaxconn) and restart during a maintenance window. (5) Rate-limit at the application level: reject requests above a QPS threshold before they reach Redis. For immediate DDoS mitigation: (1) rate-limit at the network layer with iptables (e.g., per-IP packet limits), (2) require authentication (requirepass or ACLs) so unauthenticated clients are rejected — protected-mode only helps when Redis is accidentally exposed with no auth configured, (3) put Redis behind a rate-limiting proxy (e.g., Envoy or HAProxy). Test with redis-benchmark -c 1000 -t set -P 16 -q (-P pipelines requests per connection) while monitoring memory growth, and use LATENCY DOCTOR or LATENCY HISTORY eviction-cycle to spot eviction stalls during the spike.
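The application-level rate limiting in layer (5) is commonly a token bucket. A minimal sketch (class and parameter names are illustrative, not from any particular library): each client gets rate requests/sec with bursts up to burst, and anything beyond is rejected before it ever reaches Redis.

```python
import time

class TokenBucket:
    """Allow `rate` requests/sec with bursts up to `burst`; injectable clock for testing."""
    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens, self.last = burst, clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Deterministic demo with a frozen fake clock: the burst of 3 is spent, then rejection.
t = [0.0]
bucket = TokenBucket(rate=10, burst=3, clock=lambda: t[0])
results = [bucket.allow() for _ in range(4)]
```

In a real deployment you would key one bucket per client IP or API key in front of the Redis call, so attack traffic is shed cheaply in the application tier instead of as eviction churn inside Redis.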
Follow-up: If you can't prevent the OOM crash, how would you design failover to backup Redis and recover data?