Your e-commerce application needs to transfer $100 from buyer wallet to seller wallet atomically: debit buyer, credit seller. Without transactions, if mongod crashes between the two updates, money disappears. You implement this with a multi-document transaction on MongoDB 4.0+. However, you notice that if the buyer wallet update takes 5 seconds and the seller wallet update takes 10 seconds, the total transaction latency is 15 seconds. Your application SLA requires transactions to complete in <1 second. Why doesn't transaction latency improve with faster hardware, and how would you redesign?
Transaction latency is bounded by the slowest operation inside the transaction, and the operations run sequentially, so a 5-second update plus a 10-second update is 15 seconds end to end. If the seller wallet update takes 10 seconds (say, due to a slow query or lock contention), the entire transaction waits. Faster hardware won't help when the bottleneck is algorithmic (e.g., a document lock wait, or a missing index forcing a collection scan).
Root cause investigation: Check whether the seller wallet update's query predicate triggers a full collection scan. Example: `const session = db.getMongo().startSession(); session.startTransaction(); const sdb = session.getDatabase("shop"); sdb.buyers.updateOne({_id: buyer}, {$inc: {balance: -100}}); sdb.sellers.updateOne({userId: seller}, {$inc: {balance: 100}}); session.commitTransaction();` (database name illustrative; operations must run through the session-bound handle, or they execute outside the transaction). If the second update filters on `userId` without an index, it scans all sellers—slow. Fix: add an index with `db.sellers.createIndex({userId: 1})` so the update is a single index lookup.
Alternative redesign for the SLA: use a write-ahead log (WAL) pattern: (1) Insert a transaction record: `{type: "transfer", buyer, seller, amount: 100, status: "pending"}`; (2) Update the buyer atomically: `db.buyers.updateOne({_id: buyer}, {$inc: {balance: -100}, $push: {txIds: txId}})`; (3) Update the seller the same way, but with `$inc: {balance: 100}`; (4) Mark the transaction complete: `db.transactions.updateOne({_id: txId}, {$set: {status: "complete"}})`. Each step is a fast single-document write, and if any step crashes, recovery can replay incomplete transactions—the `txIds` array makes replay idempotent, since a wallet that already contains the txId is skipped. This trades multi-document atomicity for speed, providing eventual consistency.
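The four steps above can be sketched in memory. This is a minimal illustration of the pending-record-plus-replay idea, not a MongoDB API; the collection stand-ins, names, and starting balances are all assumptions for the example.

```javascript
// In-memory sketch of the write-ahead-record transfer pattern.
const transactions = new Map(); // txId -> {buyer, seller, amount, status}
const balances = new Map([["buyer1", 500], ["seller1", 0]]);
const applied = new Map([["buyer1", new Set()], ["seller1", new Set()]]); // txIds already applied per wallet

function transfer(txId, buyer, seller, amount) {
  // (1) Record intent first, so a crash leaves a replayable "pending" row.
  transactions.set(txId, { buyer, seller, amount, status: "pending" });
  // (2) Debit the buyer, recording the txId so replay is idempotent.
  if (!applied.get(buyer).has(txId)) {
    balances.set(buyer, balances.get(buyer) - amount);
    applied.get(buyer).add(txId);
  }
  // (3) Credit the seller the same way.
  if (!applied.get(seller).has(txId)) {
    balances.set(seller, balances.get(seller) + amount);
    applied.get(seller).add(txId);
  }
  // (4) Mark complete; recovery replays any row still "pending".
  transactions.get(txId).status = "complete";
}

function recover() { // replay every incomplete transfer after a crash
  for (const [txId, tx] of transactions) {
    if (tx.status === "pending") transfer(txId, tx.buyer, tx.seller, tx.amount);
  }
}

transfer("tx1", "buyer1", "seller1", 100);
```

Because each per-wallet step checks its `applied` set, running `recover()` after a crash (or even after success) never double-applies a transfer.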
Verify transaction performance: `db.currentOp()` shows active transactions with `secs_running`. Transactions taking >1s should be logged for investigation.
Follow-up: What are the risks of the WAL approach vs true transactions? How would you ensure consistency if the process crashes after buyer update but before marking complete?
You're implementing a banking system where account transfers must be ACID compliant. Your initial design uses MongoDB transactions: transfer $100 from account A to B. However, you discover that each transaction holds locks on the documents for accounts A and B for its duration, preventing concurrent transfers involving the same accounts. With 100 concurrent transfer requests, you see timeout errors due to lock waits. How do transactions handle concurrency, and how would you optimize?
MongoDB transactions take intent locks at the database and collection level, while WiredTiger provides document-level concurrency control. When transaction T1 modifies document A, it effectively holds an exclusive lock on A until commit or abort. A concurrent transaction T2 that tries to write A doesn't wait indefinitely—it fails with a WriteConflict error (labeled TransientTransactionError) and must retry; a non-transactional write to A blocks until T1 finishes. Either way, writes to the same document serialize, which preserves isolation but hurts concurrency.
With 100 concurrent transfers, if many involve the same accounts, you create a bottleneck: all transfers to popular accounts serialize behind locks. Lock wait time accumulates, causing timeouts.
Optimization: (1) Batch transfers by account to reduce contention—process 10 transfers to account B sequentially, then move to next account; (2) Use sharding by account ID to distribute transfers across shards—each shard handles transfers for a subset of accounts independently, allowing true parallelism; (3) Redesign schema to avoid document conflicts: instead of a single account balance, use a distributed ledger pattern with immutable transaction records and compute balance via aggregation. Each insert to transaction log doesn't conflict with concurrent inserts (append-only).
Example sharding: `sh.shardCollection("bank.accounts", {accountId: 1})` splits accounts across shards. Transfer within same shard still locks, but transfers between shards run in parallel.
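Option (3) above, the append-only ledger, can be sketched in memory. The array stands in for an insert-only collection, and `recordTransfer`/`balance` are illustrative names, not MongoDB APIs; the `balance` function mirrors what an aggregation like `db.ledger.aggregate([{$match: {account}}, {$group: {_id: null, bal: {$sum: "$delta"}}}])` would compute.

```javascript
// Sketch of the append-only ledger pattern: every transfer is an
// immutable entry; balances are derived, never updated in place.
const ledger = []; // stands in for an insert-only MongoDB collection

function recordTransfer(from, to, amount) {
  // Appends never conflict with concurrent appends, unlike $inc
  // against a single hot account document.
  ledger.push({ account: from, delta: -amount });
  ledger.push({ account: to, delta: amount });
}

function balance(account) {
  // Equivalent of a $match + $group/$sum aggregation over the ledger.
  return ledger
    .filter(e => e.account === account)
    .reduce((sum, e) => sum + e.delta, 0);
}

recordTransfer("A", "B", 100);
recordTransfer("A", "B", 50);
```

In practice you would periodically snapshot balances so the aggregation only scans entries since the last snapshot.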
Monitor lock contention: `db.currentOp({ns: "bank.accounts"})` shows operations against the collection; if `secs_running` is high, locks are being held for a long time. `db.serverStatus().locks` reports aggregate lock statistics by resource type (global, database, collection), not per individual collection.
Follow-up: Design a high-throughput ledger system for millions of daily transfers without using traditional locks. What tradeoffs exist?
Your transaction spans 3 collections: orders, inventory, and payments. If any step fails, the entire transaction rolls back. However, you notice that after rollback, your application can't determine why the transaction failed (e.g., was it a conflict, timeout, or error?). You implement a retry loop that retries 5 times if the transaction fails, but occasionally see duplicate orders or inventory inconsistencies. What's happening?
Transaction rollback reverts changes to those 3 collections, but external side effects (e.g., emails sent, webhooks triggered) aren't rolled back. If your app sends "order confirmed" email before the transaction commits, then the transaction rolls back, the email was sent but order doesn't exist—inconsistency.
Duplicate orders can occur if: (1) the commit outcome is unknown (a network hiccup during commit, surfaced with the UnknownTransactionCommitResult label)—the transaction actually committed, but the app re-runs the whole thing and inserts twice; (2) retry logic doesn't use idempotent keys—each retry is a distinct operation. Fix: use a unique idempotency key (e.g., derived from {userId, orderTime, merchantId}): `db.orders.updateOne({idempotencyKey: key}, {$set: order}, {upsert: true})`. A second attempt finds the existing order and updates it in place instead of inserting a duplicate; back the key with a unique index so concurrent retries can't slip through.
Error classification: MongoDB surfaces transaction failures differently: (1) WriteConflict (labeled TransientTransactionError): retry the whole transaction; (2) other errors labeled TransientTransactionError: retry; (3) DuplicateKey: don't retry (logic error); (4) errors labeled UnknownTransactionCommitResult on commit: retry the commit, or verify whether the write actually landed before re-running the transaction. Use `db.serverStatus().transactions` to compare committed vs. aborted counts.
Best practice: (1) Use idempotent operations; (2) Don't retry indefinitely—set max retries (3-5); (3) Log all transaction failures for debugging; (4) For side effects, use compensating transactions: if email fails, store in failed-emails queue for async retry; (5) Wrap external side effects in try-catch outside transaction to isolate them from rollback.
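Practices (1)–(3) can be combined in one wrapper. This is a hedged sketch: `txnBody` stands in for your transaction function, and the error objects here are hand-built fakes—real drivers attach the `errorLabels` array themselves.

```javascript
// Bounded retry wrapper keyed on MongoDB's transient error labels.
async function withRetries(txnBody, maxRetries = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await txnBody();
    } catch (err) {
      const transient = Array.isArray(err.errorLabels) &&
        err.errorLabels.includes("TransientTransactionError");
      // Logic errors (e.g., duplicate key) are rethrown immediately;
      // transient errors retry up to maxRetries, then give up.
      if (!transient || attempt >= maxRetries) throw err;
    }
  }
}
```

Each caught-and-retried failure is a natural place to log the attempt number and error label for later debugging.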
Follow-up: Design an idempotency framework that would work across multiple domains (orders, payments, shipping) without creating architectural coupling.
You're running a replica set with 5 nodes. Your application uses multi-document transactions with write concern "majority". A transaction commits on PRIMARY after majority ack (3 of 5), then you immediately read that data from a SECONDARY. The SECONDARY hasn't replicated the committed data yet and returns stale results. How do you guarantee read-after-write consistency with transactions?
The issue: after write concern "majority", the data is durable but not yet replicated to all SECONDARYs. If you read from a SECONDARY that lags, you get stale data. This violates strong consistency guarantees of transactions.
Solutions: (1) Read from the PRIMARY after the transaction commits: `session.startTransaction(); db.orders.insertOne(order, {session}); session.commitTransaction(); db.orders.findOne({_id: order._id})` (the second read uses the default PRIMARY read preference), guaranteeing you read your own write; (2) Use causal consistency: sessions started with `{causalConsistency: true}` (the driver default) guarantee that every read in the session observes all prior writes from that session, even on SECONDARYs—the SECONDARY waits for oplog replication if necessary; (3) Use read concern `afterClusterTime`: after commit, the session's `operationTime` reflects the commit timestamp; pass `readConcern: {level: "majority", afterClusterTime: <that timestamp>}` on subsequent queries to force a SECONDARY to serve data at that timestamp or later.
Code example (Node driver style): `const session = client.startSession({causalConsistency: true}); session.startTransaction(); await orders.insertOne(order, {session}); await session.commitTransaction(); await orders.findOne({_id: order._id}, {session});`—because the final read runs on the same causally consistent session, the driver automatically attaches `afterClusterTime` derived from the session's `operationTime`.
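The `afterClusterTime` mechanic can be modeled in memory. This toy replica set is purely illustrative—none of these structures or functions are MongoDB APIs—but it shows why a read without the constraint can be stale while a read carrying the commit's timestamp cannot.

```javascript
// Toy model: a secondary refuses to answer from a snapshot older than
// the requested afterClusterTime and first catches up on its "oplog".
let clusterTime = 0;
const primaryData = new Map();
const secondary = { appliedTime: 0, data: new Map(), oplog: [] };

function writeOnPrimary(key, value) {
  clusterTime++;
  primaryData.set(key, value);
  secondary.oplog.push({ t: clusterTime, key, value }); // replicated later
  return clusterTime; // the operationTime a session would track
}

function readOnSecondary(key, afterClusterTime = 0) {
  // "Wait" for replication by applying pending oplog entries until the
  // secondary has caught up to the requested timestamp.
  while (secondary.appliedTime < afterClusterTime && secondary.oplog.length) {
    const op = secondary.oplog.shift();
    secondary.data.set(op.key, op.value);
    secondary.appliedTime = op.t;
  }
  return secondary.data.get(key);
}

const commitTime = writeOnPrimary("order1", "confirmed");
```

A plain read returns `undefined` here (the oplog hasn't been applied), while `readOnSecondary("order1", commitTime)` forces catch-up first.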
Verify consistency: query both PRIMARY and SECONDARY with the same filter and confirm the results match. If the SECONDARY lags, check `rs.printSecondaryReplicationInfo()`, alert on replication lag, and investigate network or I/O bottlenecks.
Follow-up: What's the performance impact of causal consistency vs eventual consistency? When would you choose each?
Your transaction modifies a document inside a nested array: `db.users.updateOne({_id: userId, emails: {$elemMatch: {address: oldAddr}}}, {$set: {"emails.$.address": newAddr}})`. This must be atomic—if it fails, the email mustn't be partially updated. You wrap each update in a transaction, but after 100K user email updates you realize the transaction overhead is significant: each operation takes 50ms. You need to complete 100K updates in <1 hour (36ms per update). Should you use bulk writes instead?
Transaction overhead comes from: (1) Starting session; (2) Taking locks on the document; (3) Checking write concern (majority); (4) Committing. For single-document updates, the transaction overhead (45ms) swamps the actual update (5ms). For bulk operations without atomicity requirements across documents, direct updates are faster.
However, note that a single `updateOne` on one document is already atomic in MongoDB—no transaction is needed to keep one email update from partially completing. Options: (1) Drop the per-update transaction and rely on single-document atomicity, optionally using `arrayFilters`: `db.users.updateOne({_id: userId}, {$set: {"emails.$[elem].address": newAddr}}, {arrayFilters: [{"elem.address": oldAddr}]})`; (2) If you do need transactional grouping, batch many updates into one transaction: `session.startTransaction(); for (let i = 0; i < 100; i++) sdb.users.updateOne({_id: userIds[i]}, {...}); session.commitTransaction();` (with `sdb` a session-bound database handle) amortizes the overhead—1 transaction per 100 updates; (3) If cross-document atomicity isn't required, use bulk writes: `db.users.bulkWrite([{updateOne: {filter, update}}, ...])` without a transaction—fastest, since the server applies the whole batch in one round trip.
For your 100K updates: batching 10 per transaction gives 10K transactions; even if each transaction still takes ~50ms, that's 500 seconds—comfortably inside the one-hour SLA, and larger batches shrink it further. Whichever route you take, make the updates idempotent (matching on the old address means a re-applied update matches nothing) and checkpoint progress at the application level so a crash mid-way can resume without double-applying.
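The batching itself is mechanical. A sketch, assuming hypothetical `updates` built for each user and a batch size of 1,000 (both illustrative choices); each batch would feed one `db.users.bulkWrite(batch)` call, or one transaction if grouping were actually required.

```javascript
// Split a large list of update operations into bulkWrite-sized batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 100K updateOne operations in bulkWrite's document format.
const updates = Array.from({ length: 100000 }, (_, i) => ({
  updateOne: {
    filter: { _id: i },
    update: { $set: { "emails.$[elem].address": "new@example.com" } },
  },
}));

const batches = chunk(updates, 1000); // 100 batches of 1000 operations
```

Checkpointing after each completed batch (e.g., persisting the last batch index) is what makes a crash resumable.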
Follow-up: How would you design a system to handle 100K email updates with atomicity requirements and <1 hour SLA?
You discover that MongoDB's transaction implementation uses snapshot isolation: each transaction reads as of a specific point-in-time snapshot, not live data. This prevents dirty reads, non-repeatable reads, and phantom reads within the transaction: if a concurrent transaction modifies related data mid-transaction, yours doesn't see the change, because it keeps reading its snapshot. But that raises a question: can two concurrent transactions still commit mutually inconsistent writes? Is snapshot isolation safe for all use cases?
Yes—snapshot isolation permits write skew. Example: an invariant says the combined balance of accounts A and B must stay >= 0, and each holds $100. T1 reads both balances from its snapshot (total $200) and, seeing sufficient funds, withdraws $150 from A. Concurrently, T2 reads the same snapshot (also $200) and withdraws $150 from B. Their write sets are disjoint, so neither conflicts with the other; both commit, and the combined balance is now -$100. Each transaction's check was valid against its own snapshot, yet together they violated the invariant.
MongoDB's snapshot isolation doesn't prevent write skew—it prevents dirty reads, non-repeatable reads, and lost updates, but not all anomalies. For finance/accounting, write skew is unacceptable. You must add application logic: (1) Use serializable isolation if available (not in MongoDB—would require explicit locking); (2) Add invariant check in application: after transfer, re-read total and verify it matches expectation; (3) Use optimistic locking with versioning: each account has {version: N}; transfer increments version; if versions changed unexpectedly, abort and retry; (4) Redesign to avoid read-then-write-based-on-read patterns—instead, use append-only ledger where each transfer is an immutable entry.
Verify snapshot behavior: start two sessions; T1 and T2 each read the same documents, then T1 updates document A and T2 updates document B based on what they read, and both commit. Because their write sets don't overlap, snapshot isolation allows both—this is exactly the write-skew window. (Had both written the same document, one would have aborted with a WriteConflict.)
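That interleaving can be simulated directly. A minimal in-memory sketch, with `beginTxn`/`commitDebit` as illustrative stand-ins for snapshot reads and commits: both transactions check the invariant against the same snapshot, write to different documents, and both "commit" because their write sets are disjoint.

```javascript
// Simulating write skew under snapshot isolation.
const store = new Map([["A", 100], ["B", 100]]); // invariant: A + B >= 0

function beginTxn() { return new Map(store); }   // point-in-time snapshot
function commitDebit(account, amount) {          // disjoint single-doc write
  store.set(account, store.get(account) - amount);
}

// Two concurrent transactions snapshot before either commits.
const snap1 = beginTxn();
const snap2 = beginTxn();
const check1 = snap1.get("A") + snap1.get("B") - 150 >= 0; // passes: 200 - 150
const check2 = snap2.get("A") + snap2.get("B") - 150 >= 0; // passes: 200 - 150
if (check1) commitDebit("A", 150); // different documents, so no
if (check2) commitDebit("B", 150); // write conflict: both commit
```

After both commits the combined balance is -100: each check was locally valid, but the invariant is broken, which is what mitigation (2) or (3) from the previous paragraph has to catch.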
Follow-up: Design a payment system that uses MongoDB transactions while preventing write skew anomalies.
Your transaction writes to a collection with a unique index. Inside the transaction, you attempt to insert a document with a duplicate key: `session.startTransaction(); db.users.insertOne({email: "dup@test.com"}); db.users.insertOne({email: "dup@test.com"}); session.commitTransaction()`. The second insert violates the unique index. The entire transaction aborts and rolls back. But your application expected to handle this error gracefully. Instead of detecting which email is duplicate, you get a generic "transaction aborted" error. How would you handle this?
The second insert fails with a duplicate key error (code 11000) on the unique index, which forces the whole transaction to abort, rolling back the first insert too. The server's error does name the violated index and key, but if your retry wrapper only surfaces a generic "transaction aborted" result, that detail is lost by the time your application handles it.
Better approach: (1) Use upsert semantics: `db.users.updateOne({email: newEmail}, {$set: {email: newEmail, ...}}, {upsert: true})` succeeds whether inserting or updating existing email—no error; (2) Check for existence before insertion: `if (!db.users.findOne({email})) db.users.insertOne({email, ...})`—but this is non-atomic (check-then-act race condition); (3) Use bulk writes with `ordered: false`: inserts continue even if one fails, returning per-operation errors detailing which failed; (4) Pre-validate schema outside transaction: validate all emails are unique before transactional batch; (5) Use application-level deduplication: maintain in-memory set of emails being inserted to catch duplicates before hitting database.
For your scenario: pre-validate inside the transaction before inserting (with `emails` passed in as an argument): `const existing = db.users.find({email: {$in: emails}}).toArray(); if (existing.length > 0) throw new Error("Duplicate emails: " + existing.map(e => e.email).join(", "));`. This fails fast with a clear message instead of a generic transaction abort. Keep the unique index as the backstop: a concurrent insert between your check and your write will still surface a duplicate key error at commit.
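Option (3) above—unordered bulk writes with per-operation errors—can be sketched in memory. The `Set` stands in for a collection with a unique index on `email`, and the result shape loosely mirrors a bulk-write result (`insertedCount` plus an array of write errors); the exact field names here are illustrative.

```javascript
// Sketch of ordered:false semantics: every insert is attempted, and
// each failure is reported individually with the offending value.
const uniqueEmails = new Set(["taken@test.com"]); // existing unique values

function bulkInsertUnordered(docs) {
  const writeErrors = [];
  let insertedCount = 0;
  docs.forEach((doc, index) => {
    if (uniqueEmails.has(doc.email)) {
      // 11000 is MongoDB's duplicate key error code.
      writeErrors.push({ index, code: 11000, email: doc.email });
    } else {
      uniqueEmails.add(doc.email);
      insertedCount++;
    }
  });
  return { insertedCount, writeErrors };
}

const result = bulkInsertUnordered([
  { email: "new@test.com" },
  { email: "taken@test.com" },
  { email: "also-new@test.com" },
]);
```

The caller can report back exactly which emails failed and why, which answers the bulk-import follow-up: surface `writeErrors` to the user instead of aborting the whole import.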
Follow-up: Design a schema and validation strategy for handling bulk user imports where some emails may be duplicates. How would you report back which emails failed and why?
You implement a distributed job queue using MongoDB transactions. Each job is a document: {_id, status, worker_id, startTime}. Workers query for jobs with status "pending", start a transaction, update status to "running" and set worker_id, commit, then execute the job. If a worker crashes, the job stays "running" forever. You implement a recovery mechanism: every 5 minutes, scan for jobs with status "running" and startTime > 5 minutes old, update to "pending", and reassign. However, you see jobs being executed by two workers simultaneously. Why, and how would you fix?
Race condition: the 5-minute timeout can't distinguish a crashed worker from a merely slow one. If worker A is still executing past the threshold, the recovery scanner resets the job to "pending", worker B claims it, and both workers execute the same job simultaneously.
Root cause: the recovery scanner also has a time-of-check-time-of-use (TOCTOU) bug. It reads jobs with status="running" and startTime > 5 minutes old, then updates them to "pending" as two separate steps. Between the scanner's read and its write, the owning worker can finish the job (or update its state), and the scanner's write clobbers that newer state—reopening a job that was already done or actively progressing.
Fix: (1) Make the claim a single atomic operation with a lease: `db.jobs.findOneAndUpdate({_id: jobId, $or: [{status: "pending"}, {status: "running", leaseExpires: {$lt: now}}]}, {$set: {status: "running", workerId: W, leaseExpires: now + leaseMs}})`—the match and the update happen atomically on the server, so a worker and the recovery scanner can never both win the same job; (2) Use a version/generation number with compare-and-swap: `db.jobs.updateOne({_id: jobId, version: v}, {$set: {status: "running", workerId: W}, $inc: {version: 1}})`, and have recovery also `$inc` the version when resetting—a monotonically increasing version means a stale worker's CAS always fails; (3) Use distributed locking (e.g., the Redlock algorithm) so a worker holds a lock while executing and recovery skips locked jobs. (Note: a MongoDB TTL index deletes documents after expiry, so it can clean up stale lock documents but can't reset a job's status—keep the lease check in the claim filter instead.)
Best practice: implement heartbeats instead of a single coarse timeout. The worker periodically updates `{heartbeat: now}` while executing, and recovery only reclaims jobs whose heartbeat is stale. A live-but-slow worker keeps renewing its lease; only a truly dead worker loses it.
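The lease-plus-heartbeat scheme can be sketched in memory. The `claim` function models the atomic `findOneAndUpdate` filter (pending, or running with an expired lease), and `heartbeat` models lease renewal; all names, the 5-minute lease, and the `Map` store are illustrative assumptions.

```javascript
// In-memory sketch of lease-based job claiming with heartbeats.
const jobs = new Map(); // _id -> {status, workerId, leaseExpires}
jobs.set("job1", { status: "pending", workerId: null, leaseExpires: 0 });

// Models an atomic findOneAndUpdate: the claimability check and the
// claim itself happen as one step, so two claimants can't both win.
function claim(jobId, workerId, now, leaseMs = 300000) {
  const job = jobs.get(jobId);
  const claimable = job.status === "pending" ||
    (job.status === "running" && job.leaseExpires < now);
  if (!claimable) return false;
  job.status = "running";
  job.workerId = workerId;
  job.leaseExpires = now + leaseMs;
  return true;
}

// Lease renewal: fails if another worker has since taken the job,
// telling the slow worker it has lost ownership and must stop.
function heartbeat(jobId, workerId, now, leaseMs = 300000) {
  const job = jobs.get(jobId);
  if (job.workerId !== workerId) return false;
  job.leaseExpires = now + leaseMs;
  return true;
}
```

A worker that sees `heartbeat` return false must abandon the job rather than finish it, which is what closes the duplicate-execution window.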
Follow-up: Design a production-grade job queue system on MongoDB that handles worker crashes, duplicate execution, and maintains ordering guarantees.