You implement a real-time notification system using Redis Pub/Sub. Clients SUBSCRIBE to channels such as "user:123:notifications", and the server PUBLISHes events to them. During a traffic spike, a subscriber process is killed (SIGKILL). The server keeps publishing, and when the subscriber restarts 2 minutes later, every event published during the outage is gone. How do you add persistence to Pub/Sub?
Pub/Sub is fire-and-forget: events aren't persisted, and a subscriber must be connected at publish time to receive them. If the subscriber is down, the events are lost. Solutions: (1) use Redis Streams instead: XADD appends the message to a stream, where it persists. Subscribers use XREAD to fetch entries after their last-known ID, so a subscriber that was down can catch up from where it left off. (2) dual-write: publish to Pub/Sub AND write to a queue/stream. Pub/Sub serves real-time subscribers, the queue serves subscribers that were offline. (3) implement an ack pattern: when a subscriber finishes processing an event, it records the event's ID as last-processed-id. On restart, it reads last-processed-id and fetches everything after it. (4) backup message queue: for critical events, write to a durable queue (Kafka, RabbitMQ) alongside Pub/Sub. Kafka persists messages for replay. (5) client-side buffering: a component that stays up (for example, the publishing client) buffers events on disk while the subscriber is down and replays them once it restarts. For your scenario: (1) migrate to Streams: change the publisher from PUBLISH to XADD, which takes field-value pairs: XADD user:123:notifications * event <event_data>. Change the subscriber from SUBSCRIBE to XREAD: XREAD BLOCK 1000 STREAMS user:123:notifications $. The $ means "only entries added after this call", which suits a brand-new subscriber; a restarting subscriber must pass its last-known ID instead, or it will skip everything added while it was down. (2) track last-read: when the subscriber processes an event, store its ID: SET user:123:last-read-id
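The Streams catch-up pattern described above (XREAD from the last stored ID, then store the new ID) might look like the following minimal sketch. It assumes `client` is a redis-py `redis.Redis(decode_responses=True)` instance; the function name `consume_once`, the `cursor_key` parameter, and the `handle` callback are hypothetical names chosen for illustration.

```python
def consume_once(client, stream, cursor_key, handle, count=100, block_ms=1000):
    """Read one batch of entries after the saved cursor and process them.

    Assumption: `client` behaves like redis.Redis(decode_responses=True).
    `handle(entry_id, fields)` is a hypothetical application callback.
    Returns the number of entries processed.
    """
    # With no saved cursor, "0-0" replays the stream from its first entry.
    last_id = client.get(cursor_key) or "0-0"

    # XREAD returns entries with IDs strictly greater than last_id,
    # blocking up to block_ms milliseconds if there are none yet.
    reply = client.xread({stream: last_id}, count=count, block=block_ms)

    processed = 0
    for _stream_name, entries in reply or []:
        for entry_id, fields in entries:
            handle(entry_id, fields)
            # Advance the cursor only after the handler returns. A crash
            # between the two steps replays this entry on restart, so the
            # handler sees each entry at least once and should be idempotent.
            client.set(cursor_key, entry_id)
            processed += 1
    return processed
```

A subscriber would call this in a loop, for example `consume_once(client, "user:123:notifications", "user:123:last-read-id", handle)`. After a 2-minute outage, the first call resumes from the stored ID rather than from `$`, so nothing published during the outage is skipped.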
Follow-up: If you use Streams but also have subscribers that need real-time notifications (can't afford latency of XREAD), how would you combine both?
Your system has 1M subscribers on the channel "broadcasts" (SUBSCRIBE broadcasts). The server PUBLISHes a message, but at that moment no subscribers are connected (they have all crashed or are reconnecting), so PUBLISH reports 0 receivers and the message is dropped. This is a design flaw: Pub/Sub doesn't guarantee delivery. How do you ensure critical broadcasts reach all subscribers?
Pub/Sub assumes subscribers are always connected. If they are not, messages are lost. For critical messages, use: (1) persistent queue: instead of PUBLISH, write to a list or stream. Subscribers periodically poll the queue (e.g., every 10 seconds), so every subscriber eventually gets the message as long as it is retained. (2) acknowledgement pattern: after a subscriber receives a PUBLISHed message, it sends an ACK. The server tracks which subscribers ack'd and retries the rest (resend via a private channel, or log for manual intervention). (3) use Redis Streams with consumer groups: XADD broadcasts * message <payload>. XGROUP CREATE broadcasts group-1 0 (the group reads from the start of the stream). Each subscriber is a consumer: XREADGROUP GROUP group-1 consumer-1 STREAMS broadcasts >. Note that a consumer group delivers each message to only one consumer in the group, so for a broadcast where every subscriber must see every message, give each subscriber its own group (or have each one XREAD from its own last-seen ID). Within a group, delivery is at-least-once rather than exactly-once: an entry stays in the pending list until it is XACKed, so a consumer that fails and restarts can re-read it. (4) combine Pub/Sub + Stream: real-time subscribers use Pub/Sub (fast), offline subscribers use the Stream (persistent). Dual-write: PUBLISH and XADD both. (5) explicit subscriber registration: subscribers register themselves (e.g., SADD subscribers:active
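The consumer-group approach above can be sketched as follows, under the assumption that each subscriber gets its own group so that every subscriber sees every broadcast. `client` is assumed to be a redis-py `redis.Redis(decode_responses=True)` instance; `ensure_group`, `drain`, and `handle` are hypothetical names for illustration.

```python
def ensure_group(client, stream, group):
    """Create the consumer group if it does not exist yet.

    Assumption: `client` behaves like redis.Redis(decode_responses=True).
    """
    try:
        # id="0" makes the group start at the beginning of the stream;
        # mkstream=True creates the stream if it is missing.
        client.xgroup_create(stream, group, id="0", mkstream=True)
    except Exception as exc:
        # Redis answers BUSYGROUP when the group already exists.
        if "BUSYGROUP" not in str(exc):
            raise


def drain(client, stream, group, consumer, handle, count=100):
    """Process this consumer's pending entries, then any new ones.

    `handle(entry_id, fields)` is a hypothetical application callback.
    Returns the number of entries processed and acknowledged.
    """
    processed = 0
    # "0" re-reads entries delivered to this consumer but never XACKed
    # (for example, because it crashed). ">" asks for entries that have
    # not been delivered to any consumer in the group yet.
    for start_id in ("0", ">"):
        while True:
            reply = client.xreadgroup(group, consumer, {stream: start_id}, count=count)
            entries = reply[0][1] if reply else []
            if not entries:
                break
            for entry_id, fields in entries:
                handle(entry_id, fields)
                # Acknowledge only after successful processing, which
                # gives at-least-once delivery.
                client.xack(stream, group, entry_id)
                processed += 1
    return processed
```

On startup, a subscriber would call `ensure_group(client, "broadcasts", "subscriber-42")` and then `drain(...)` in a loop. The one-group-per-subscriber choice trades extra group metadata on the server for true fan-out; with a single shared group, each broadcast would reach only one subscriber.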
Follow-up: If you have 1M subscribers on Streams consumer group and one subscriber crashes, how do you handle the backlog of unack'd messages?
You use PSUBSCRIBE with the pattern "user:*:notifications" to subscribe to all user notification channels. During a traffic spike, 10K events/sec are published across those channels, but your subscriber process can only handle 1K events/sec (processing is slow). Events queue up in the subscriber's buffer until it crashes (OOM). How do you handle this?
Slow subscribers in Pub/Sub cause buffering: Redis queues undelivered events in the subscriber's output buffer. If the buffer exceeds client-output-buffer-limit (for the pubsub class the default is a 32mb hard limit, or 8mb sustained for 60 seconds), Redis forcibly disconnects the subscriber. A processing rate of 1K/sec against a publish rate of 10K/sec means the backlog grows by 9K events every second. Solutions: (1) increase processing speed: optimize the subscriber code (parallelize, batch, cache). Can you get from 1K to 5K/sec? (2) bound the buffer and accept loss: Redis does not drop individual events for a slow subscriber; once the limit is hit, it disconnects the client and discards its whole buffer. Tune the limit with CONFIG SET client-output-buffer-limit "pubsub 100mb 25mb 60" (hard limit, soft limit, soft seconds; the soft values here are illustrative): lower it to disconnect slow subscribers sooner, or raise it to buffer more. If stale events are expendable, drop them in the subscriber rather than queuing them in memory. (3) scale subscribers: use multiple subscriber processes, each subscribed to a subset of channels. 10 processes * 1K/sec = 10K/sec total. Redis glob patterns have no numeric ranges, so a pattern like user:0-99:* does not match users 0 through 99; partition on the last digit of the user ID instead: process-0 uses PSUBSCRIBE user:*0:notifications, process-1 uses user:*1:notifications, and so on. (4) use dedicated subscriber nodes: if the publishing app and the processing app are the same process, separate them, so the publisher publishes fast and subscribers process on different machine(s). (5) switch to Streams: use Streams with consumer groups instead of Pub/Sub. Entries are stored in the stream itself rather than in a per-client output buffer, so capacity is bounded by server memory (or a MAXLEN cap), and subscribers consume at their own pace. Streams handle backpressure better. For your scenario: (1) measure current subscriber performance: time the processing of 1K events and identify the bottleneck (network, DB query, computation). (2) optimize: if it is a DB query, add caching; if it is computation, parallelize. (3) test the improvement: target 5K/sec if possible. (4) if 5K/sec is still < 10K/sec, switch to multiple subscribers or Streams. Prevention: (1) monitor the subscriber buffer: run CLIENT LIST and check omem (output buffer memory); alert well below the hard limit you configured (e.g., > 100MB only makes sense if the limit is raised above that). (2) measure the event publish rate against the subscriber consumption rate. Alert if publish > consumption for > 5 minutes. (3) implement backpressure: if the subscriber buffer grows, slow the publishers down by sending events to an intermediate queue and draining it at the subscriber's pace.
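The buffer monitoring and the backlog arithmetic above can be sketched as follows. This is a rough illustration, not a tuned monitor: `slow_pubsub_clients` and `seconds_until_disconnect` are hypothetical helper names, the rows are assumed to have the shape redis-py's `client_list()` returns (dicts of strings with `flags`, `omem`, and `addr`), and the 1 KiB average event size used in the usage note is an assumption that is not in the scenario.

```python
def slow_pubsub_clients(client_rows, threshold_bytes):
    """Return (addr, omem) for Pub/Sub clients above an output-buffer threshold.

    Assumption: `client_rows` looks like the list of dicts returned by
    redis-py's client_list(), where values are strings.
    """
    flagged = []
    for row in client_rows:
        omem = int(row.get("omem", 0))
        # In CLIENT LIST output, the "P" flag marks a Pub/Sub subscriber.
        if "P" in row.get("flags", "") and omem > threshold_bytes:
            flagged.append((row.get("addr"), omem))
    # Largest buffers first, so the worst offender is at the top.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


def seconds_until_disconnect(publish_rate, consume_rate, avg_event_bytes, hard_limit_bytes):
    """Estimate how long a backlog takes to fill the output buffer.

    This ignores protocol overhead and the soft limit, so treat the result
    as an order-of-magnitude estimate only.
    """
    backlog_bytes_per_sec = (publish_rate - consume_rate) * avg_event_bytes
    if backlog_bytes_per_sec <= 0:
        return float("inf")  # the consumer keeps up, so the buffer never fills
    return hard_limit_bytes / backlog_bytes_per_sec
```

For example, `seconds_until_disconnect(10_000, 1_000, 1024, 32 * 1024 * 1024)` estimates how quickly a 9K events/sec backlog of 1 KiB events would exhaust the default 32mb hard limit: a matter of seconds, which is why raising the limit alone does not fix a sustained 10x rate mismatch. A periodic job could pass `client.client_list()` to `slow_pubsub_clients` and alert on any result.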
Follow-up: If you can't optimize processing speed and must handle 10K events/sec with a 1K events/sec subscriber, should you drop events or queue indefinitely?
Your Redis Pub/Sub system publishes sensitive data (e.g., payment notifications). An attacker on the network sniffs packets and intercepts PUBLISH messages containing payment amounts and user IDs. Unless the connection uses TLS, Pub/Sub sends data in plaintext. How do you secure Pub/Sub?
Pub/Sub security concerns: (1) plaintext data on the network: an attacker can sniff sensitive info. (2) unauthenticated subscribers: without auth configured, anyone who can reach the server can SUBSCRIBE. (3) no encryption of message content. Fix: (1) TLS at the transport layer: configure Redis with TLS (tls-port 6380). All traffic, including PUBLISH/SUBSCRIBE, is encrypted, so sniffing yields only ciphertext. (2) authentication: require AUTH before SUBSCRIBE. Create a dedicated user, e.g. ACL SETUSER app_user on >password +subscribe +publish plus a channel pattern as in (4), and disable or password-protect the default user so unauthenticated connections are rejected. (3) encryption at the application level: the app encrypts the message before PUBLISH: encrypted_msg = encrypt(payment_info, key). The subscriber decrypts: decrypt(encrypted_msg, key). With both layers (TLS + app-level), message content stays protected even where TLS is terminated or the Redis host itself is compromised. (4) per-channel access control: use ACLs with channel patterns. Channel patterns use the & prefix (~ is for key patterns): ACL SETUSER user1 on >password resetchannels &user1:* +subscribe lets user1 subscribe only to user1:* channels and prevents subscribing to everything. (5) avoid sensitive data in channel names: don't use payment:amount:$50 as a channel name. Use a generic channel (payment:processed) and put the encrypted details in the message. (6) short-lived subscriptions: Pub/Sub subscriptions themselves have no TTL, so keep subscription metadata in a key with PEXPIRE and have the application unsubscribe or disconnect the client when that key expires. This prevents indefinite listening. Prevention: (1) always use TLS for Pub/Sub over a network. (2) enforce AUTH on all Redis clients. (3) encrypt sensitive message content (payment, PII). (4) audit SUBSCRIBE commands: log who subscribes to which channels and alert on unexpected subscriptions. (5) implement channel whitelisting: only allowed channels can be subscribed to. Implementation: (1) configure TLS: in redis.conf set tls-port 6380, tls-cert-file, and tls-key-file, and set port 0 to disable the plaintext listener. (2) set up ACLs: grant each application user only the commands and channels it needs, e.g. ACL SETUSER app on >password resetchannels &payment:* +subscribe +publish, rather than +@all ~*. (3) encrypt messages: in publish code, serialize and encrypt before publishing: token = cipher.encrypt(json.dumps(payment_info).encode()); redis.publish(channel, token). In subscribe code: payment_info = json.loads(cipher.decrypt(received_message)). Here cipher is a symmetric cipher object, such as Fernet from the cryptography package. (4) test: capture traffic with tcpdump and verify that no plaintext messages appear. Also test with TLS disabled on the client (the connection should fail).
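The application-level encryption step above might look like the following minimal sketch. It swaps in Fernet (authenticated symmetric encryption from the `cryptography` package) as one possible cipher; the source does not name a specific scheme. `publish_encrypted` and `decrypt_message` are hypothetical helper names, and `client` is assumed to be a redis-py `redis.Redis` instance. Key distribution and rotation are out of scope here.

```python
import json

from cryptography.fernet import Fernet


def publish_encrypted(client, channel, payload, key):
    """Serialize, encrypt, and publish a payload.

    Assumptions: `client` exposes publish(channel, message) like redis.Redis,
    and `key` is a Fernet key shared with subscribers out of band.
    """
    plaintext = json.dumps(payload).encode("utf-8")
    token = Fernet(key).encrypt(plaintext)  # URL-safe base64 bytes
    return client.publish(channel, token)


def decrypt_message(data, key):
    """Decrypt and deserialize a message body received from Pub/Sub."""
    if isinstance(data, str):
        # Clients created with decode_responses=True deliver str, not bytes.
        data = data.encode("ascii")
    # decrypt() raises cryptography.fernet.InvalidToken if the message was
    # tampered with or encrypted under a different key.
    plaintext = Fernet(key).decrypt(data)
    return json.loads(plaintext.decode("utf-8"))
```

A publisher would generate the key once with `Fernet.generate_key()`, store it in a secrets manager, and call `publish_encrypted(client, "payment:processed", {"user_id": 123, "amount": "50.00"}, key)`. A subscriber passes each received message's `data` field to `decrypt_message`. Because Fernet authenticates as well as encrypts, a forged or altered message fails to decrypt instead of yielding garbage.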
Follow-up: If you need to revoke a subscriber's access mid-subscription, how would you disconnect them without affecting other subscribers?