You're implementing atomic bank transfer: debit account A, credit account B. You run two separate Redis commands: DECRBY account:A 100, then INCRBY account:B 100. Between the two commands, a network error occurs and the second command never reaches Redis. Account A is debited but B isn't credited—money vanishes! How does Lua scripting prevent this?
Lua scripts execute atomically: Redis processes commands on a single thread, so while a script runs no other client's commands can interleave. Use EVAL: EVAL 'redis.call("DECRBY", KEYS[1], ARGV[1]); redis.call("INCRBY", KEYS[2], ARGV[1]); return 1' 2 account:A account:B 100. If the network dies on the client side after the command reaches Redis (before the client receives the response), the script still completes on the server. To verify: run redis-cli EVAL ... and kill the connection with CTRL+C while the script is running; reconnect and query the balances—the transfer will have completed. Atomicity guarantee: (1) no client can observe partial state (A debited but B not yet credited), (2) no other commands interleave with the script. Caveat: this is isolation, not rollback—if the script itself raises a runtime error midway, writes it already made are not undone, so validate inputs before the first write. Lua also doesn't protect against a crash of Redis itself. If Redis crashes midway: (1) use persistence (AOF) to recover. In modern Redis (effects replication, the default since 5.0 and the only mode since 7.0), the AOF records the individual write commands the script produced, and replay restores them. (2) make scripts idempotent so a client-side retry after an ambiguous failure is safe: guard each transfer with a unique ID—have the script first SET a per-transfer key with NX and return early if the key already exists, meaning the transfer was already applied.
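The idempotency guard can be sketched outside Redis. In this illustrative Python sketch a plain dict stands in for the keyspace, and `transfer` (a hypothetical helper, not a Redis API) mirrors the script's logic: check the per-transfer guard key, then debit, then credit. In the real script these steps run inside one EVAL, so they are atomic.

```python
# Sketch of the idempotent-transfer pattern; a dict stands in for Redis.

def transfer(store, src, dst, amount, transfer_id):
    """Apply the transfer once; repeated calls with the same id are no-ops."""
    guard = "applied:" + transfer_id          # mirrors SET <key> NX in the script
    if guard in store:
        return 0                              # already applied, safe to retry
    store[guard] = 1
    store[src] -= amount                      # DECRBY
    store[dst] += amount                      # INCRBY
    return 1

store = {"account:A": 500, "account:B": 100}
assert transfer(store, "account:A", "account:B", 100, "t1") == 1
assert transfer(store, "account:A", "account:B", 100, "t1") == 0  # retry: no double debit
assert store["account:A"] == 400 and store["account:B"] == 200
```

The key design point: the guard key is written *inside* the same atomic unit as the transfer, so "applied" and "balances moved" can never disagree.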
Follow-up: If the Lua script calls a command that's very slow (e.g., FLUSHDB on a large database), how would you prevent the entire Redis from hanging?
Your Lua script hits an error: redis.call("LRANGE", "mylist", 0, -1) returns an array, but the script treats it like a hash: local val = result["field"]. That index is nil, and the first operation on the nil value raises a Lua error that aborts the script. While a script is running, Redis can't serve other clients—they see "BUSY Redis is busy running a script" errors. How do you debug and fix?
Lua type errors abort the script, and while a script runs Redis can't process other commands (blocking everyone). Debug: (1) test the script in isolation: redis-cli --eval script.lua runs it against a dev instance, and redis-cli --ldb --eval script.lua starts the built-in Lua debugger so you can single-step and inspect values. (2) add type checks before access: EVAL 'local result = redis.call("LRANGE", "mylist", 0, -1); if type(result) == "table" then ... else return redis.error_reply("wrong type") end' 0. (3) use redis.pcall instead of redis.call: on failure it returns a table with an err field instead of aborting, so the script can handle the error and reply gracefully: EVAL 'local result = redis.pcall("LRANGE", "mylist", 0, -1); if result.err then return redis.error_reply(result.err) else ... end' 0. To unstall blocked clients: (1) run SCRIPT KILL from another connection—this works only if the script hasn't performed any write yet. (2) if the script has already written, Redis refuses SCRIPT KILL (to avoid leaving partial effects); the only way out is SHUTDOWN NOSAVE, accepting loss of unpersisted data. (3) prevent by: (a) testing scripts thoroughly before production, (b) tuning the slow-script threshold: CONFIG SET lua-time-limit 5000 (5 seconds; renamed busy-reply-threshold in Redis 7). Note this does not auto-abort scripts—after the limit Redis merely starts answering other clients with BUSY and accepts SCRIPT KILL, (c) adding error checking in the script. Verify under load: redis-benchmark can drive arbitrary commands, e.g. redis-benchmark -c 100 -n 10000 EVALSHA <sha> 0 with 100 concurrent clients, to confirm the script doesn't block for long.
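The redis.pcall contract—return an error value instead of raising—can be mimicked in any language. This sketch (hypothetical `pcall` helper, not a Redis API) shows the shape of the pattern and reproduces the scenario's bug: indexing an array as if it were a hash.

```python
# Mimic redis.pcall: convert an exception into an error value the caller
# can inspect, instead of letting it abort the whole "script".

def pcall(fn, *args):
    try:
        return {"ok": fn(*args)}
    except Exception as exc:
        return {"err": str(exc)}

def read_field(result):
    # The bug from the scenario: indexing an array as if it were a hash.
    return result["field"]

outcome = pcall(read_field, ["a", "b", "c"])   # LRANGE returns an array
assert "err" in outcome                        # handled, not crashed
outcome = pcall(read_field, {"field": 42})
assert outcome == {"ok": 42}
```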
Follow-up: If a script repeatedly times out (lua-time-limit exceeded), how would you optimize it without rewriting the entire script?
Your Lua script uses redis.call inside a loop: for i=1, 1000 do redis.call("SET", "key"..i, "value") end. The script runs for 5 seconds, blocking all clients. A monitoring system detects the block and alerts: "Redis under heavy load". You want to keep atomicity but reduce blocking time. What's the tradeoff?
Lua scripts block Redis's single-threaded event loop. Long scripts = no other client's commands execute. This isn't an atomicity problem (all 1000 SETs complete atomically) but a fairness/latency problem (other clients starve). Tradeoffs: (1) break into multiple smaller scripts: instead of 1 script with 1000 SETs, run 10 scripts with 100 SETs each. Each is separately atomic, but state between scripts is partial (not atomic across all 1000 keys). Good where partial updates are acceptable (e.g., caches). (2) use MSET instead of a Lua loop: MSET key1 val1 ... key1000 val1000. Still 1 atomic command and much faster (no script overhead). Limitations: each argument is capped by proto-max-bulk-len (512MB by default), and you must construct the entire payload client-side. (3) use client-side pipelining: send 100 SETs in a batch without waiting for each reply. Not atomic across the batch but atomic per command; fastest for non-transactional workloads. (4) Redis Cluster sharding: spread keys across nodes and run a script per node in parallel—but note that in cluster mode a script may only touch keys in a single hash slot, so this requires rewriting application logic and key design. Recommendation: use MSET for bulk SET operations (fastest), client-side pipelining for non-transactional workloads (acceptable consistency), and Lua scripts only when true atomicity across multiple keys is required. Monitor with redis-cli --latency-history to detect script blocking and SLOWLOG GET to identify slow commands; alert if slowlog entries exceed 100ms.
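The "10 scripts of 100 SETs" and MSET options share one preprocessing step: chunking the key/value pairs and flattening each chunk into a command argument list. A minimal sketch (the `mset_batches` helper is illustrative, not a client-library function):

```python
# Split 1000 SETs into batches of 100; each batch becomes one MSET-style
# argument list (key1, val1, key2, val2, ...). Each batch is atomic on its
# own, but the overall update is NOT atomic across batches.

def mset_batches(pairs, batch_size=100):
    batches = []
    for i in range(0, len(pairs), batch_size):
        args = []
        for key, val in pairs[i:i + batch_size]:
            args.extend([key, val])           # flatten to MSET argument order
        batches.append(args)
    return batches

pairs = [("key%d" % i, "value") for i in range(1, 1001)]
batches = mset_batches(pairs)
assert len(batches) == 10
assert len(batches[0]) == 200                 # 100 key/value pairs flattened
assert batches[0][:2] == ["key1", "value"]
```

Each resulting list would be sent as one MSET (or one small script), bounding per-command blocking time at the cost of cross-batch atomicity.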
Follow-up: If you can't avoid a long-running Lua script, how would you prevent it from blocking critical operations like PING health checks?
You deploy a Lua script to increment a counter, but during a rolling update you deploy a new script with a different SHA1 hash. Old clients (calling EVALSHA with the old SHA) and new clients (with the new SHA) run simultaneously. Each call is individually atomic, but the two versions implement different logic against the same counter, so mixed execution produces inconsistent results. How do you safely version Lua scripts?
Different script SHAs = different script bodies, and each EVAL/EVALSHA call is individually atomic—Redis still runs scripts one at a time, so old and new scripts never interleave mid-execution—but consecutive calls do see each other's effects, and if the two versions disagree on data layout (e.g., the old script stores the counter as a plain integer while the new one stores a hash with a count field), they corrupt each other's state. Safe versioning: (1) keep the data layout backward-compatible across script versions, or have the new script migrate the old format on first touch. (2) load scripts at deploy time with SCRIPT LOAD and call EVALSHA, falling back to EVAL when the server replies NOSCRIPT (after a restart or SCRIPT FLUSH)—most client libraries support this pattern. (3) pin the script version to the release artifact so all app instances of a given release use the same SHA, and roll the fleet so mixed versions overlap only briefly. (4) on Redis 7+, consider Functions (FUNCTION LOAD), which give scripts stable names and explicit versions instead of content-addressed SHAs.
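The EVALSHA-with-fallback pattern above can be sketched end to end. Here `FakeRedis` is an in-memory stand-in for a real client (its `evalsha` raises when the SHA isn't cached and its `eval` both runs and caches the script, mirroring the real commands); `run_script` and `NoScriptError` are illustrative names, not redis-py API.

```python
# Sketch of EVALSHA with EVAL fallback on NOSCRIPT.
import hashlib

class NoScriptError(Exception):
    pass

class FakeRedis:
    def __init__(self):
        self.cache = {}                       # sha -> script body

    def script_load(self, script):
        # Redis really does key the cache by SHA1 of the script body.
        sha = hashlib.sha1(script.encode()).hexdigest()
        self.cache[sha] = script
        return sha

    def evalsha(self, sha):
        if sha not in self.cache:
            raise NoScriptError(sha)          # models the NOSCRIPT reply
        return "ran:" + sha

    def eval(self, script):
        return self.evalsha(self.script_load(script))

def run_script(client, script, sha):
    try:
        return client.evalsha(sha)            # fast path: already cached
    except NoScriptError:
        return client.eval(script)            # fallback re-caches the script

r = FakeRedis()
script = "return 1"
sha = hashlib.sha1(script.encode()).hexdigest()
assert run_script(r, script, sha) == "ran:" + sha   # NOSCRIPT -> EVAL fallback
assert sha in r.cache                               # cached for the next call
```

With this in place, a SCRIPT FLUSH (or a restarted instance) costs one EVAL round-trip per script per client instead of an outage.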
Follow-up: If you deployed a buggy new script and need to immediately roll back without deploying client code, how would you do this?
Your Lua script acquires a distributed lock using SET with NX and EX, does work, then releases the lock with DEL. But if the script is aborted midway (e.g., killed after exceeding lua-time-limit before the DEL runs), the lock is never released and stays held until its TTL expires—and a blind DEL on retry can delete a lock that has since expired and been acquired by another client. Clients trying to acquire the lock see inconsistent state. How do you make lock release safe and atomic?
The problem: a single DEL is atomic—it can't half-delete a key—but if the script is aborted before the DEL runs, the lock stays held until its TTL expires, and a blind DEL from a retry can remove a lock that meanwhile expired and was acquired by another client. Solution: (1) release the lock with a check-and-delete script that verifies ownership: EVAL 'if redis.call("GET", KEYS[1]) == ARGV[1] then return redis.call("DEL", KEYS[1]) else return 0 end' 1 lock:resource <unique-token>, where the token is the random value stored at acquisition (SET lock:resource <token> NX EX 30). The compare and the delete happen inside one atomic script, so no other client can interleave between them. (2) always set a TTL at acquisition so an unreleased lock self-heals: the EX is the safety net when release never runs. (3) keep the critical section out of the script: acquire the lock, do the work client-side, then call the short release script—a tiny script can't plausibly exceed lua-time-limit. Note also that SCRIPT KILL refuses to kill a script that has already performed a write, so from the clients' perspective the release script either runs entirely or not at all.
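The check-and-delete release mirrored over a dict (illustrative `release_lock` helper; in Redis both steps run inside one script, so no other client can sneak in between the GET and the DEL):

```python
# Ownership-checked lock release, sketched against an in-memory dict.

def release_lock(store, key, token):
    if store.get(key) == token:               # only the owner may release
        del store[key]                        # DEL
        return 1
    return 0                                  # expired, or someone else's lock

store = {"lock:resource": "token-abc"}
assert release_lock(store, "lock:resource", "wrong-token") == 0
assert "lock:resource" in store               # foreign token: lock untouched
assert release_lock(store, "lock:resource", "token-abc") == 1
assert "lock:resource" not in store
assert release_lock(store, "lock:resource", "token-abc") == 0  # idempotent
```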
Follow-up: If you can't modify the script (e.g., third-party code) and it times out frequently, what operational workaround would you implement?
Your Lua script iterates over a ZSET using ZRANGE and performs complex calculations on each member. The script is deterministic in intent, but members whose scores come from floating-point arithmetic can tie or order differently across Redis versions, and the script's final writes depend on iteration order. If primary and replicas each re-execute the script (verbatim script replication), they can produce different final results—replication divergence. How do you ensure the script replicates correctly?
Script non-determinism is only dangerous when the script itself is re-executed on replicas. In modern Redis (effects replication, the default since 5.0 and the only mode since 7.0), the primary runs the script once and propagates the resulting write commands to replicas and the AOF, so replicas never re-execute the Lua and cannot diverge this way. On older versions using verbatim replication, fix the script itself: (1) sort explicitly: EVAL 'local members = redis.call("ZRANGE", KEYS[1], 0, -1); table.sort(members); ...' to force a deterministic processing order regardless of what Redis returns. (2) avoid randomness: old Redis rejects writes issued after non-deterministic commands like RANDOMKEY, TIME, or SRANDMEMBER; use SCAN with a fixed cursor or pass the needed values in via ARGV. (3) prefer ZRANGEBYSCORE with explicit bounds over positional ranges so the selected set is identical across versions. (4) verify: run the script against identical datasets on both versions and diff the output. To step through a script interactively, use the Lua debugger: redis-cli --ldb --eval script.lua (it runs in a forked session, so stepping doesn't block the server). For already-diverged data: (1) trigger a full resync of the replica from a fresh RDB snapshot of the primary, which rewrites the replica's state. (2) or export the suspect keys from primary and replica with redis-cli --raw --csv, diff them, and repair divergence manually.
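The "sort before you depend on order" fix can be shown in miniature: two servers that hand back the same members in different internal order still compute the same order-sensitive result once the script sorts explicitly. (`checksum_members` is an illustrative stand-in for the script's per-member calculation.)

```python
# Force a deterministic processing order before any result depends on it,
# mirroring table.sort(members) in the Lua script.

def checksum_members(members):
    total = 0
    for rank, m in enumerate(sorted(members)):   # explicit sort fixes the order
        total += rank * len(m)                   # order-sensitive calculation
    return total

# Same members, different arrival order -> identical result after sorting.
assert checksum_members(["b", "a", "c"]) == checksum_members(["c", "b", "a"])
```

Without the `sorted()` call, the rank-weighted sum would depend on arrival order and two executions could legitimately disagree.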
Follow-up: If a Lua script calls redis.call("FLUSHALL"), can replicas be prevented from also executing this script?
Your app uses Lua scripts heavily: 10K scripts loaded across 100 Redis instances. The script cache has no per-script eviction and no size limit—it simply grows—and its memory footprint has become significant. SCRIPT FLUSH would clear ALL scripts, causing a burst of NOSCRIPT errors for every client using EVALSHA. You need to retire old scripts without breaking clients. How do you manage script lifecycle?
Redis doesn't provide a per-script delete (only SCRIPT FLUSH, or SCRIPT FLUSH ASYNC to clear the whole cache without blocking). Managing 10K scripts requires strategy: (1) track everything you deploy: keep a registry mapping script name/version to its SHA1 (the SHA is simply SHA1 of the script body, so it can be computed offline), and audit instances with SCRIPT EXISTS sha1 [sha1 ...] to see which known scripts are cached. (2) make every client resilient to a cleared cache: call EVALSHA and fall back to EVAL (which re-caches the script) on a NOSCRIPT error—then SCRIPT FLUSH ASYNC in a low-traffic window costs one extra round-trip per script per client, not an outage. (3) consolidate: 10K scripts usually means near-duplicates generated by interpolating values into script bodies; move the varying parts into KEYS/ARGV so one body (one SHA) serves all callers. (4) on Redis 7+, migrate to Functions: FUNCTION LOAD registers a named library, FUNCTION DELETE removes exactly one library, and FUNCTION LIST enumerates them—the selective lifecycle management the EVAL cache lacks.
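Because the cache key really is SHA1 of the script body, the registry in point (1) can be built and audited entirely offline. A minimal sketch (the registry layout, script names, and `live_versions` map are all illustrative):

```python
# Offline script registry: EVALSHA hashes are SHA1 of the script body, so a
# deploy manifest can be computed and audited without touching Redis.
import hashlib

def sha_of(script):
    return hashlib.sha1(script.encode()).hexdigest()

# (name, version) -> script body; names are hypothetical
registry = {
    ("incr_counter", 1): "return redis.call('INCR', KEYS[1])",
    ("incr_counter", 2): "return redis.call('INCRBY', KEYS[1], ARGV[1])",
}
live_versions = {"incr_counter": 2}

# SHAs the fleet should have cached (current versions only); anything a
# SCRIPT EXISTS audit finds outside this set is safe to drop at next flush.
expected = {sha_of(body) for (name, ver), body in registry.items()
            if live_versions[name] == ver}
assert len(expected) == 1
assert sha_of(registry[("incr_counter", 1)]) not in expected
```

The `expected` set is what you'd pass to SCRIPT EXISTS on each instance; misses tell you which clients still need a fallback-driven reload.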
Follow-up: If you have a script that's used by 1000 concurrent clients and you want to upgrade it, how would you roll out the new version without dropping requests?
Your Lua script uses redis.call() but performs no I/O to external systems (no HTTP calls). However, it reads from Redis, performs 1000 operations, then writes results. On a read-only replica (replica-read-only yes), the redis.call("SET") inside the script fails with a READONLY error. The script can't easily detect replica mode and adapt. How do you handle read-only replicas?
Replicas are read-only by default (replica-read-only yes), so a script that attempts a write on a replica fails with READONLY. Fix: (1) route by role on the client side: the ROLE command (or INFO replication, role:slave) tells you whether an instance is a replica; send scripts that write only to the primary. (2) on Redis 7+, use EVAL_RO / EVALSHA_RO for read-only scripts: these variants reject any script that attempts a write and are explicitly allowed on replicas, making the read/write split enforceable rather than a convention. (3) inside the script, handle the failure gracefully instead of parsing INFO (which returns one raw bulk string, awkward to inspect from Lua): issue the write with redis.pcall and return a clean error reply when result.err contains READONLY. (4) for read-heavy workloads, split the work: run the read-only portion on replicas (EVAL_RO), collect results client-side, and send a small write script to the primary; batch the writes to amortize round-trips. Test: execute a write script against an actual replica and verify it errors gracefully rather than half-applying, and verify the read-only variants still run there.
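The client-side routing in point (1) reduces to classifying a script by the commands it uses. A sketch, where WRITE_COMMANDS is an illustrative subset (not the full Redis write-command list) and `commands_used` is assumed to come from your deploy tooling's metadata for each script:

```python
# Route write scripts to the primary and read-only scripts to a replica.

WRITE_COMMANDS = {"SET", "DEL", "INCR", "DECRBY", "INCRBY", "EXPIRE"}

def script_writes(commands_used):
    """True if any command in the script is a write (per our known subset)."""
    return any(c.upper() in WRITE_COMMANDS for c in commands_used)

def route(commands_used):
    return "primary" if script_writes(commands_used) else "replica"

assert route(["GET", "LRANGE"]) == "replica"
assert route(["GET", "SET"]) == "primary"
```

On Redis 7+ the same classification decides between EVAL_RO (replica-safe) and plain EVAL; the server then enforces what the router assumed.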
Follow-up: If you have a complex Lua script that must execute on both primary and replica (for consistency), how would you ensure both produce identical results?