MongoDB Interview Questions

Write Concern, Read Concern, and Consistency


Your e-commerce application processes a payment: debit the customer's wallet and credit the seller's wallet. You use write concern "majority" to ensure both writes are durable on a majority of replicas before returning success to the customer. However, 1% of customers report being charged twice. Investigation shows: the customer's wallet was debited successfully (write concern "majority" confirmed), but the seller's wallet credit failed (network error). Your application retried the whole transfer (debit + credit), but by then the customer had seen success and moved on. Next month, they see the duplicate charge. Design how write concern failures should be handled.

Write concern "majority" makes each write individually durable, but provides no atomicity across the two operations. When the debit succeeds with "majority" and the credit fails, you have divergence: one wallet updated, the other not.

Solutions: (1) Multi-document transactions: wrap both updates in a transaction with write concern "majority". Debit and credit succeed atomically or both roll back; if the credit fails, the entire transaction aborts and the debit is undone (no divergence); (2) Idempotent writes: assign each transfer a unique ID and run all steps in one transaction: `db.transfers.insertOne({_id: transferId, from, to, amount, status: "pending"}); db.wallets.updateOne({userId: from}, {$inc: {balance: -amount}}); db.wallets.updateOne({userId: to}, {$inc: {balance: amount}}); db.transfers.updateOne({_id: transferId}, {$set: {status: "complete"}})`. On a retry, the insert into `transfers` fails with a duplicate-key error and the transaction aborts before any balance is touched; (3) Saga pattern: debit and credit are separate operations recorded in a saga log. If the credit fails, the saga engine triggers a compensation (undo the debit). Asynchronous, eventually consistent; (4) Outbox pattern: the debit and an outbox entry are written together, a background job publishes the entry to a message queue, a consumer credits the seller's wallet and marks the entry processed. The pending credit is durably recorded, so it survives crashes and retries safely.

For your case: use multi-document transactions with write concern "majority". Debit and credit succeed atomically or fail together; no divergence.

Follow-up: If transactions are too slow for your latency SLA, how would you redesign using eventual consistency without duplicate charges?

You have a replica set with 3 nodes. You process an insert with write concern "majority" (the default). The write goes to the PRIMARY, the PRIMARY applies it to the oplog, and 2 secondaries replicate. MongoDB confirms the "majority" ack. Then the PRIMARY crashes 1 millisecond after acknowledging, before the write reaches the on-disk journal or checkpoint. After the PRIMARY recovers, the write is gone. Your customer's order disappeared. Why didn't write concern "majority" prevent data loss?

Write concern "majority" means a majority of replica set members have applied the write, but by itself it does not pin the write to disk. Whether the acknowledgment also waits for the on-disk journal depends on `j` and the replica set's `writeConcernMajorityJournalDefault` setting (true by default on modern versions). If journaling is not required (`j: false`, or the default relaxed), an acknowledged write sits in memory until the next journal flush or checkpoint, and a crash in that window can lose it on that node; with cascading failures (PRIMARY plus a SECONDARY crashing before persisting), even a majority-acknowledged write can be lost.

MongoDB doesn't guarantee zero data loss with "majority" alone when journaling is relaxed. To guarantee persistence to disk: (1) Use `{w: "majority", j: true}`: forces the journal write to disk on acknowledging nodes before returning, so a majority hold the write durably, not just in memory. Adds roughly 10-20ms per write due to disk I/O; (2) Use a larger replica set with a higher `w`: with 5 nodes and `w: 3`, even if 2 nodes crash, 3 still hold the write, so it survives; (3) Accept potential data loss: design the application to tolerate occasional missing writes (idempotent retries, audit trail).

Best practice: use `{w: "majority", j: true}` for financial/critical data. For non-critical data (analytics, logs), `{w: "majority"}` without an explicit journal requirement is acceptable (faster, slightly less safe if journaling defaults are relaxed). Verify configuration: `rs.conf().writeConcernMajorityJournalDefault` shows whether "majority" acknowledgments wait for the journal (the older `settings.getLastErrorDefaults` mechanism is deprecated).

Follow-up: If you need <50ms write latency with maximum durability, how would you balance write concern and journal settings?

Your application uses read concern "majority" to read data, which should return committed writes. However, after you write a document with write concern "majority", you immediately read with read concern "majority" and get "no results" (document not found). Then you read with read concern "local" and see the document. This seems inconsistent: if write concern "majority" succeeded, why isn't the majority-read seeing it?

Write concern "majority" and read concern "majority" are complementary but not synchronous: a write acknowledged by a majority eventually becomes visible to majority-reads everywhere, but timing and replication lag open a window where a majority-read on a lagging node misses it.

Explanation: (1) the write reaches the PRIMARY and 1 SECONDARY (2 of 3 nodes, a majority), and the PRIMARY acks the client; (2) at that moment, the read is routed to the other SECONDARY, which hasn't replicated the write yet. Read concern "majority" on a node returns that node's own snapshot of majority-committed data, so a lagging SECONDARY serves an older committed view that predates the write; (3) even on a caught-up node, the majority-commit point (lastCommittedOpTime) advances slightly behind the acknowledgment, so a read in that window can still miss the write.

Root cause: a temporal race condition. Between the write ack and the read there is a window where the read can miss the write if it is routed to a lagging SECONDARY. Solutions: (1) Read from the PRIMARY after the write: `db.collection.insertOne(doc); db.collection.findOne({_id: doc._id})` with the default (PRIMARY) read preference guarantees you see your own write; (2) Use causally consistent sessions (`{causalConsistency: true}`): subsequent reads in the same session include all of the session's prior writes, even on SECONDARYs; (3) Pass the write's cluster time explicitly: after the insert, extract the operation time and read with `readConcern: {level: "majority", afterClusterTime: <time>}`, which makes the read wait until replication reaches that time; (4) Retry: if the read returns nothing, retry after ~100ms to give replication time to catch up.

Follow-up: How would you design a payment system that guarantees read-after-write consistency for all users, even those reading from SECONDARYs?

You run a distributed application across 3 AWS regions with 3 MongoDB nodes per region in one replica set (9 nodes total). Nodes are synchronized via cross-region replication with ~100ms latency between regions. Write concern is set to `{w: 3}`. After a write acks, you read from the nearest region and sometimes get stale data (the write hasn't replicated to that region yet). The user sees "no orders", then after a refresh sees the order. Design a consistency model.

With 9 nodes spread across 3 regions and write concern `{w: 3}`, a write only needs acknowledgment from 3 nodes, and those could all be in a single region (e.g., us-west). When a user reads from a different region (e.g., eu-west), that region's 3 nodes may not have replicated the write yet: a stale read.

Design options: (1) Strong consistency across regions: use write concern "majority" (5 of 9 for the full set), which forces writes to replicate to at least 2 regions before ack. Adds ~100ms per write; (2) Read-your-writes consistency per user: use causally consistent sessions. After a write, carry the write's clusterTime in the response; the client's next read uses `readConcern: {level: "majority", afterClusterTime: clusterTime}`, forcing the read to wait for replication to that point. Hides replication lag from the user; (3) Tag-aware write concern: keep writes on the local region's PRIMARY, but define a custom write concern with replica set tags (tag each node with its region and configure `settings.getLastErrorModes` so the custom mode requires acks from 2 regions), plus `wtimeout: 5000` to bound the wait. This guarantees the write exists in at least 2 regions before ack; (4) Accept eventual consistency: display "order may take 1-2 seconds to appear due to replication" to the user, or implement polling with backoff.

Recommended for user-facing writes: read-your-writes consistency (option 2). Flow: after `insertOne`, return `{insertedId, clusterTime}` to the client (drivers expose the cluster time on the session or command response). On the next read, either run the query in the same causally consistent session, or attach `readConcern: {level: "majority", afterClusterTime: clusterTime}` to the find command. Transparent to the user, and guarantees they see their own write.

Follow-up: If write concern "majority" across 9 nodes is too slow, how would you maintain strong consistency with <50ms write latency?

You implement a queue system on MongoDB: workers dequeue items {_id, status: "pending"} and update to status: "processing". Your query: `db.queue.findOneAndUpdate({status: "pending"}, {$set: {status: "processing", workerId: id}}, {returnNewDocument: true})`. Write concern is "majority" (default). Two workers simultaneously try to dequeue the same item and both succeed (both see {status: "processing"}). This means the item is processed twice. Why doesn't write concern prevent this race condition?

Write concern "majority" governs durability, not isolation: it guarantees each acknowledged write reaches a majority of nodes, but says nothing about how concurrent operations on the same document are arbitrated. Both workers' updates being durable is entirely compatible with the item being handed out twice.

In fact, `findOneAndUpdate` is atomic at the document level: the match and the update happen as one operation, so on a healthy PRIMARY two workers cannot both claim the same "pending" item; the loser's filter no longer matches. Double processing comes from the edges: (1) an ambiguous network error: worker A's update succeeds but the acknowledgment is lost, the client retries, and the item ends up processed twice; (2) a failover: the claim is written on the old PRIMARY, rolled back before it replicates, and the item reappears as "pending" for another worker; (3) a stalled worker: a recovery job re-queues an item that timed out while the original worker eventually finishes it too.

Prevention: (1) One-claim-per-item insert: workers compete to insert a claim document keyed by the item: `db.workers_processing.insertOne({_id: itemId, workerId: myId})`. The first worker succeeds; the second gets a duplicate-key error. Simple, but doesn't mark the queue item itself as processing; (2) Optimistic locking: add a version field `{_id, version: 1, status: "pending"}` and include it in the filter: `findOneAndUpdate({_id: itemId, version: 1, status: "pending"}, {$set: {status: "processing", workerId: id}, $inc: {version: 1}})`. The update matches only if the version is unchanged (CAS-like behavior), so a late worker's attempt matches nothing; (3) Distributed locking: before dequeuing, acquire a lock in a separate collection: `db.locks.insertOne({_id: itemId, lockId: ObjectId()})` (fails if the lock already exists). The worker that succeeds dequeues the item, then releases the lock; add a TTL expiry so a crashed worker doesn't hold the lock forever; (4) Transactions: wrap the find + update in a transaction with write concern "majority". If two workers' transactions touch the same document concurrently, write-conflict detection aborts one of them.

Recommended: use distributed locking (option 3) for simplicity. Or use transactions if you need atomicity across multiple items.

Follow-up: Design a distributed queue on MongoDB that handles 10K enqueues/sec and 5K dequeues/sec with guaranteed at-most-once processing.

Your MongoDB setup uses read concern "snapshot" for consistent point-in-time reads. However, when you increase traffic from 1000 req/sec to 10000 req/sec, you see SnapshotUnavailable/SnapshotTooOld errors from MongoDB. Latency also spikes from 50ms to 500ms. Why does read concern "snapshot" fail under load, and how would you scale?

Read concern "snapshot" (used inside multi-document transactions, and since MongoDB 5.0 on certain single `find`/`aggregate` reads) provides point-in-time consistency: all data in a read comes from the same majority-committed point in time across the cluster. To serve that, the storage engine (WiredTiger) must retain every document version needed to reconstruct that point in time, which consumes cache and disk.

Under load (10K req/sec), many concurrent snapshot reads pin old document versions: history accumulates in the WiredTiger cache and history store, and the oldest retained point in time falls further behind. A read that references a point in time older than the retained window fails with SnapshotUnavailable/SnapshotTooOld, and latency spikes because the cache pressure from pinned history slows every operation.

Scaling options: (1) Reduce snapshot usage: use read concern "majority" (no point-in-time pinning) for non-critical reads and reserve "snapshot" for reads that genuinely need point-in-time consistency; (2) Widen the snapshot history window: raise the `minSnapshotHistoryWindowInSeconds` server parameter so the engine retains history longer, at the cost of more cache and history-store disk usage; (3) Spread snapshot reads across SECONDARYs: each node maintains its own storage-engine history, so distributing reads reduces per-node pinning; (4) Upgrade MongoDB: 5.0+ stores history durably on disk and handles long snapshot windows better than 4.x; (5) Shard the data: each shard retains history only for its own subset, spreading the pressure.

For 10K req/sec: use read concern "majority" for 80% of reads (non-critical), "snapshot" for 20% (critical consistency). This reduces snapshot pressure while maintaining consistency where needed.

Follow-up: Design a system that uses appropriate read concern for different query types based on consistency requirements and performance targets.

You process a transaction with write concern "majority" and read concern "snapshot". Inside the transaction, you: (1) read a document, (2) modify it based on read value, (3) write it back. Everything succeeds. However, when you query outside the transaction immediately after, the written value is different than what you wrote. The transaction sees its own write (reads new value), but external readers see old value. Is MongoDB broken?

MongoDB transactions use snapshot isolation, and the write concern applies at commit time. When you commit with write concern "majority", MongoDB ensures a majority of nodes have the commit in their oplog; "majority" doesn't mean "all", so minority nodes may not have it yet.

An external reader after the commit hits one of two cases: (1) the reader queries the PRIMARY: it sees the write (the PRIMARY always has the latest data); (2) the reader queries a SECONDARY that hasn't replicated yet: it sees the old value (replication lag). This is by design: SECONDARYs are eventually consistent.

If you see different values for same document within the same session/transaction, that's a different issue (isolation level problem). If you see different values in external queries, it's replication lag.

Guarantee consistency: (1) Read from the PRIMARY after the commit: `session.commitTransaction(); db.collection.findOne({_id})` uses the PRIMARY by default and sees the committed write; (2) Use causal consistency (`{causalConsistency: true}`): subsequent reads in the same session see the session's committed writes, even on SECONDARYs; (3) Note that read concern "majority" alone does not help here: a lagging SECONDARY serves its own majority-committed snapshot without waiting for newer writes. To make a SECONDARY read wait, combine "majority" with `afterClusterTime` from the commit, which causally consistent sessions attach automatically.

For your case: likely replication lag. Route reads that must see committed data to the PRIMARY or use causally consistent sessions, or accept eventual consistency and design the application accordingly.

Follow-up: Design a payment processing system that uses transactions, write concern, and read concern to guarantee strong consistency without sacrificing latency.
