Your Redis primary has RDB persistence enabled (save 3600 1, i.e., a snapshot at most once per hour). A critical query bug causes 100K mistaken FLUSHALL operations in 2 seconds. The last RDB snapshot was 55 minutes old, so after rolling back to it you're missing 55 minutes of writes. Your backup team asks: "Why not use AOF?" How do you evaluate RDB vs AOF for this scenario?
RDB captures point-in-time snapshots, so you can only recover to the last snapshot (55 minutes ago). AOF logs every write, so with appendfsync everysec you could recover to within about one second of the bug. However, AOF also logs the FLUSHALL, so recovery isn't automatic: you'd need to strip the FLUSHALL entries from the AOF file first. RDB is better for: (1) fast startup (the snapshot is already serialized and compressed), (2) low disk write volume (one file per hour vs every write). AOF is better for: (1) fine-grained recovery (rewind to just before the bug), (2) durability (every write hits disk with appendfsync always). For this scenario: use hybrid persistence (Redis 4.0+, on by default in modern versions): appendonly yes, appendfsync everysec, aof-use-rdb-preamble yes, plus a periodic BGREWRITEAOF to compact the log. Each rewrite then produces a compact RDB-format preamble followed by an incremental AOF tail, giving (1) fast crash recovery via the preamble and (2) recent-write durability via the tail. In the bug scenario: do not just run sed -i '/FLUSHALL/d' appendonly.aof. The AOF stores commands in RESP framing (the FLUSHALL entry spans three lines: *1, $8, FLUSHALL), so a line-based delete leaves orphaned length headers and corrupts the file. Remove each complete entry, validate with redis-check-aof, then restart. Better prevention: (1) use ACLs to block FLUSHALL/FLUSHDB from app servers, (2) implement soft-delete: mark records as deleted instead of flushing, (3) run risky queries against a replica, (4) enable aof-rewrite-incremental-fsync to avoid long pauses during AOF rewrite.
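To make the sed pitfall concrete, here is a minimal sketch of a RESP-aware filter that drops FLUSHALL/FLUSHDB entries whole. filter_aof is a hypothetical helper, not part of Redis, and it assumes a pure-RESP AOF (no RDB preamble at the front); always run redis-check-aof on the result before restarting.

```python
def filter_aof(data: bytes, banned=(b"FLUSHALL", b"FLUSHDB")) -> bytes:
    """Return a copy of `data` with banned commands' RESP entries removed."""
    out = bytearray()
    i = 0
    while i < len(data):
        start = i
        if data[i:i + 1] != b"*":
            raise ValueError("expected a command header at offset %d" % i)
        j = data.index(b"\r\n", i)
        nargs = int(data[i + 1:j])          # *<nargs>\r\n
        i = j + 2
        args = []
        for _ in range(nargs):
            if data[i:i + 1] != b"$":
                raise ValueError("expected a bulk-string header at offset %d" % i)
            j = data.index(b"\r\n", i)
            arglen = int(data[i + 1:j])     # $<len>\r\n
            i = j + 2
            args.append(data[i:i + arglen])
            i += arglen + 2                 # skip the argument and its trailing \r\n
        if args and args[0].upper() not in banned:
            out += data[start:i]            # keep the entire entry verbatim
    return bytes(out)

# usage: open("appendonly.aof.filtered", "wb").write(
#            filter_aof(open("appendonly.aof", "rb").read()))
```

Because the parser walks the *N/$len framing rather than lines, the SET entries around the FLUSHALL survive intact.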
Follow-up: If you must use RDB but want finer-grained recovery, how can you increase snapshot frequency without impacting performance?
You're running Redis with AOF enabled, fsync: always (every write syncs to disk). Disk I/O is fine, but SLOWLOG shows some SET commands taking 50-100ms (normal is <1ms). Profiling shows the delay is in AOF fsync calls. Your ops team is concerned about data loss if you switch to fsync: everysec. What's the right tradeoff?
fsync: always is the safest (no acknowledged write is lost on crash) but slowest (every write waits for disk). fsync: everysec syncs once per second (batching writes), losing up to ~1 second of data on crash (acceptable for most workloads). fsync: no never calls fsync itself; the OS flushes on its own schedule (typically every ~30 seconds on Linux), so a power loss can cost tens of seconds of writes. The tradeoff: (1) always: zero data loss, but 50-100ms latency per write on a slow fsync path. (2) everysec: ~1s RPO (recovery point objective), <1ms latency. (3) no: sub-millisecond latency, weakest durability. For most production apps, everysec is the sweet spot. To implement: CONFIG SET appendfsync everysec on all replicas first (verify no issues), then on the primary, and persist the change with CONFIG REWRITE. Monitor with LATENCY DOCTOR before/after. If you absolutely need fsync: always but can't afford 50ms latency: (1) use an SSD (faster fsync), (2) use a battery-backed write cache on the disk controller, (3) batch operations: pipeline multiple SETs into one round-trip so they share a single fsync (e.g., 10 SETs paying roughly one fsync instead of 10 * 50ms). Verify data safety: (1) test crash recovery: kill -9 the Redis process, restart, and check that data matches expectations, (2) use redis-check-aof to validate AOF integrity after a crash. Long-term: measure your actual data-loss tolerance: if losing 1 second of cache data is acceptable (it is for most caches), use everysec; if Redis is your system of record rather than a cache, keep the stronger durability setting.
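A back-of-envelope model of the pipelining advice, assuming commands processed in the same event-loop iteration share one fsync under appendfsync always (toy arithmetic to size the win, not a benchmark):

```python
def total_fsync_wait_ms(n_writes: int, fsync_ms: float, batch_size: int) -> float:
    """Rough total fsync wait if writes arrive in pipelines of `batch_size`,
    each batch paying for a single fsync."""
    batches = -(-n_writes // batch_size)   # ceiling division
    return batches * fsync_ms

# 10 unpipelined SETs at 50ms fsync each: 500ms total.
# The same 10 SETs in one pipeline: ~50ms total.
```

The model explains why batching recovers most of the everysec latency while keeping always-level durability for the batch as a whole.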
Follow-up: If you switch to fsync: everysec and lose 1 second of data during a crash, how would you detect and alert on this data loss?
You upgraded to Redis 7.0 and enabled hybrid persistence: RDB + AOF. After 1 week, your AOF file is 50GB while RDB is only 2GB. You're concerned about startup time and disk usage. Also, bgrewriteaof is taking 2 hours and blocking writes during the rewrite. How do you optimize?
RDB is 2GB because it's a compressed point-in-time snapshot; AOF is 50GB because it logs every command uncompressed, so hot keys bloat it (e.g., SET on the same key a million times occupies a million AOF entries, while the RDB stores only the final value). Fixes: (1) enable aof-use-rdb-preamble yes (the default in recent versions) so each rewrite emits a compact RDB-format base plus an incremental tail; startup speed then approaches RDB-only. (2) tune auto-aof-rewrite-percentage and auto-aof-rewrite-min-size so rewrites trigger long before the file reaches 50GB. (3) note that BGREWRITEAOF runs in a forked child and does not block writes for its whole duration; the observed stalls are usually the fork itself (cost grows with dataset size) plus bursts of rewrite disk I/O, so set aof-rewrite-incremental-fsync yes to fsync the rewrite in small chunks instead of one huge flush. (4) on Redis 7.0, the multi-part AOF layout (an appendonlydir directory holding a base file plus incremental files) makes rewrites cheaper, since the parent process no longer has to buffer and replay a rewrite diff.
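To reason about where to set the thresholds, here is a simplified sketch of Redis's auto-rewrite trigger: a rewrite fires once the AOF exceeds auto-aof-rewrite-min-size AND has grown auto-aof-rewrite-percentage percent beyond its size right after the previous rewrite (the base size). A planning aid, not the Redis source.

```python
def should_rewrite(current_bytes: int, base_bytes: int,
                   percentage: int = 100,
                   min_size: int = 64 * 1024 * 1024) -> bool:
    """Mirror of the auto-AOF-rewrite condition (simplified)."""
    if current_bytes < min_size:
        return False                    # too small to bother rewriting
    if base_bytes == 0:
        return True                     # no previous rewrite recorded
    growth_pct = (current_bytes - base_bytes) * 100 / base_bytes
    return growth_pct >= percentage
```

With a 2GB base and percentage=50, the rewrite triggers near 3GB, long before the file can balloon to 50GB.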
Follow-up: If bgrewriteaof fails midway (OOM or disk full), how would you recover the original AOF and safely retry?
Your Redis primary and replica are configured with RDB persistence. Primary is backed up every 6 hours via BGSAVE. After a primary crash, you restore from a 6-hour-old backup, but the replica is 4 hours ahead (it was still running and accepting reads). Clients complained about reading stale data from the replica for days after recovery. How do you design persistence to avoid this?
The issue: the replica kept running and is 4 hours ahead of your 6-hour-old backup, so restoring the primary from that backup leaves two bad options: reattach the replica and a full resync wipes its fresher data, or leave it detached and it keeps serving data inconsistent with the new primary. Fix: (1) enable persistence on the replica too (save "60 1000" plus appendonly yes) so its state survives restarts and can itself serve as a restore source. (2) on failure, compare freshness before restoring anything: check master_repl_offset in INFO replication and LASTSAVE / INFO persistence on both nodes; if the replica is ahead of the restored primary, promote the replica instead: run REPLICAOF NO ONE on it, then attach the restored node to it with REPLICAOF <new-primary-host> <port>. (3) implement dual backups: back up the primary's RDB and the replica's RDB separately, and on a crash restore whichever is newer. Keep replica-read-only yes: writing directly to a replica is not the answer (those writes vanish on the next resync); promotion is how you make its fresher data authoritative.
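The freshness comparison in step (2) can be sketched as a small decision helper. pick_restore_source is hypothetical; note that replication offsets are only directly comparable while both nodes share the same replication ID (same replication history), which is why the LASTSAVE fallback exists.

```python
from typing import Optional

def pick_restore_source(primary_lastsave: int, replica_lastsave: int,
                        primary_offset: Optional[int] = None,
                        replica_offset: Optional[int] = None) -> str:
    """Decide which node's data to treat as authoritative after a crash.

    Prefers replication offsets (finer-grained) when both are known and
    from the same replication history; otherwise falls back to comparing
    LASTSAVE Unix timestamps from INFO persistence."""
    if primary_offset is not None and replica_offset is not None:
        return "replica" if replica_offset > primary_offset else "primary"
    return "replica" if replica_lastsave > primary_lastsave else "primary"
```

In the scenario above, the replica's LASTSAVE is 4 hours newer than the restored backup's, so the helper would say to promote the replica.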
Follow-up: If both primary and replica crash simultaneously, how do you determine which backup to restore from and avoid data loss?
You're using AOF with fsync: everysec. During a sustained write spike (100K writes/sec), AOF file size grows 500MB/min. After 1 hour, AOF is 30GB. Disk starts to fill up, and after 2 hours, disk is 95% full. AOF stops flushing (disk full error). Redis is now at risk of OOM and AOF corruption. How do you prevent and recover?
This is a perfect storm: high write rate + limited disk + no early warning. (When the AOF write or fsync fails, Redis starts rejecting writes with an error rather than silently dropping data, so the first symptom is usually application-side write errors.) Prevention: (1) monitor AOF size: alert when it reaches 50% of the free disk space, because a rewrite needs room for a second, compacted copy alongside the original. Check du -sh on the AOF (or on the appendonlydir directory in Redis 7) every 60 seconds, or read aof_current_size from INFO persistence. (2) configure auto-rewrite: CONFIG SET auto-aof-rewrite-percentage 50 (rewrite once the AOF has grown 50% beyond its size after the last rewrite). This compacts redundant commands and shrinks the file. (3) lower auto-aof-rewrite-min-size so rewrites can trigger earlier on smaller files. (4) as a stopgap during extreme write spikes, temporarily set appendonly no and lean on more frequent BGSAVEs, accepting a larger recovery window. Recovery (disk full): (1) try BGREWRITEAOF to compact in place; on a nearly full disk it will likely fail (it needs space for the new copy), so proceed to (2). (2) move the AOF to a larger disk: redis-cli SHUTDOWN (clean shutdown, flushes the final AOF state), copy the AOF file (or the whole appendonlydir), update dir in redis.conf, restart. (3) if the AOF was corrupted by partial writes on the full disk: use redis-check-aof --fix appendonly.aof to truncate at the corruption point. (4) as a last resort switch to RDB-only: CONFIG SET appendonly no, BGSAVE, then safely clean up the AOF. Test with a disk-growth simulation: replay write load matching the spike and chart disk usage vs time. Set up alerts for disk usage >80%.
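The two alert conditions above can be sketched as one check, assuming you sample the AOF size (du, or aof_current_size from INFO persistence) and filesystem stats (e.g. shutil.disk_usage). The thresholds are the ones from the text; aof_alerts is a hypothetical monitoring helper.

```python
def aof_alerts(aof_bytes: int, disk_free_bytes: int, disk_total_bytes: int) -> list:
    """Return the list of triggered disk/AOF alerts, most severe first."""
    alerts = []
    used_pct = (disk_total_bytes - disk_free_bytes) * 100 / disk_total_bytes
    if used_pct >= 80:
        alerts.append("disk usage above 80%")
    if aof_bytes >= disk_free_bytes:
        # BGREWRITEAOF writes a second, compacted copy before swapping it in,
        # so it can fail outright once free space drops below the AOF size.
        alerts.append("AOF larger than free space; rewrite will likely fail")
    elif aof_bytes >= disk_free_bytes // 2:
        alerts.append("AOF above 50% of free space; rewrite soon")
    return alerts
```

In the scenario (30GB AOF, 95% full disk) both alerts would already be firing, which is exactly the early warning that was missing.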
Follow-up: If you need fast recovery (minutes, not hours) from a 30GB AOF file after a crash, what optimization would you prioritize?
You're running Redis with RDB snapshots. A developer accidentally commits code that corrupts the RDB format (e.g., wrong serialization of custom data). The corrupt RDB is saved and replicated to all replicas. Now all Redis instances refuse to start: redis-server exits with "corrupted RDB file". You have no recent clean backup. How do you recover?
Corrupt RDB = Redis can't restart = production outage. If AOF was enabled alongside RDB, recover from it: (1) stop redis-server on all instances, (2) move the corrupt snapshot aside (mv dump.rdb dump.rdb.bak), (3) restart; with appendonly yes and no loadable RDB, Redis replays the AOF and recovers (slower startup, but the data comes back). If there is no AOF: (1) establish the corruption window: use the LASTSAVE timestamp to see when the bad snapshot was written. (2) restore the newest clean backup from before that point (e.g., 6 hours ago) and accept the data loss for the interim period. Prevention: (1) enable AOF (appendonly yes) as a safety net; a command log is more resilient to a single bad serialization than a monolithic binary snapshot. (2) validate every backup before rotating old ones: run redis-check-rdb against each new dump, and periodically pull a copy with redis-cli --rdb <path> and load it on a staging instance. (3) keep multiple backup generations so one corrupt snapshot can't take out your only restore point.
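A cheap pre-flight check worth running on every backup before rotation: a valid RDB file starts with the 5-byte magic "REDIS" followed by a 4-digit version number (e.g. b"REDIS0011"). This catches truncated or wrong-format files early; it is not a substitute for redis-check-rdb, which walks the whole file.

```python
def rdb_header_ok(header: bytes) -> bool:
    """True if the first 9 bytes look like a valid RDB header
    ("REDIS" magic plus a 4-digit version)."""
    return (len(header) >= 9
            and header[:5] == b"REDIS"
            and header[5:9].isdigit())

# usage: rdb_header_ok(open("dump.rdb", "rb").read(9))
```

Wiring this into the backup job means a snapshot that fails the check never overwrites an older, known-good generation.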
Follow-up: If the corruption was caused by a memory issue or CPU bug (not code), how would you verify data integrity before resuming production?
You're implementing Redis as a critical cache for a payment system. Persistence is non-negotiable: zero data loss on crash. But you're concerned about fsync: always slowing down transactions. You have SSD and battery-backed cache. What persistence strategy balances durability and performance?
For payment systems, durability trumps performance, but you can still optimize: (1) use hybrid RDB + AOF: appendonly yes, appendfsync always, aof-use-rdb-preamble yes, auto-aof-rewrite-min-size 100mb. On startup Redis loads the RDB-format preamble (fast), then replays the AOF tail (no acknowledged write is lost). (2) with an SSD, fsync: always is often acceptable: fsync latency is typically well under 5ms. Measure with redis-benchmark -t set -c 1 -q before and after. (3) with a battery-backed write cache, the controller can acknowledge the fsync from cache and flush on battery during power loss; verify battery health via the vendor's tooling. (4) for replication durability, call WAIT <numreplicas> <timeout-ms> after critical transactions: it blocks until the write is acknowledged by that many replicas, so a synced copy exists off-box (note: WAIT guarantees replica acknowledgment, not replica fsync). (5) run crash tests: write test transactions, kill -9 redis-server, restart, and verify every transaction is present. For production: (1) run appendonly yes with appendfsync always (or at least everysec) on the replica too, so two durable copies exist, (2) back up RDB + AOF hourly to S3, (3) watch aof_delayed_fsync in INFO persistence (it counts fsyncs delayed beyond 2 seconds) and run LATENCY DOCTOR; alert when the counter grows. Test with payloads matching production (average transaction size).
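The crash-test verification in step (5) boils down to a set diff: record the IDs the app believes it wrote before the kill -9, re-query after restart, and report what's missing. A minimal sketch (missing_after_recovery is a hypothetical test harness helper):

```python
def missing_after_recovery(written_ids, recovered_ids):
    """IDs written before the crash but absent after restart.

    With appendfsync always this should be empty; with everysec it should
    be bounded by roughly the last second of writes."""
    return sorted(set(written_ids) - set(recovered_ids))
```

The same diff, run continuously against an application-side ledger, also serves as a data-loss detector and alert source after any unplanned restart.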
Follow-up: If you need even stronger durability (multi-region replication), how would you extend this strategy?