You run a 2TB MongoDB production database. You implement daily backups using `mongodump` at 2 AM: `mongodump --out /backup/$(date +%Y%m%d)`. Backups complete in 4 hours. One day, a developer accidentally runs a batch delete that removes 300GB of data. You try to restore the latest backup with `mongorestore /backup/20240315`, but mongorestore takes 8 hours and causes a massive I/O spike on the production server, impacting customer queries. How would you design a faster backup and restore strategy?
Problems with mongodump/mongorestore: (1) Slow backup (4 hours): every document is streamed over the network with CPU-intensive BSON serialization; (2) Slow restore (8 hours): inserts replay with limited parallelism, leaving the multi-threaded storage engine mostly idle; (3) High I/O impact: all data and every index are rewritten during restore, starving concurrent customer reads.
Better strategies: (1) Filesystem snapshots: LVM snapshots (Linux) or block-level snapshots (AWS EBS, GCP Persistent Disk). Snapshotting 2TB takes seconds (copy-on-write metadata only), and restoring means attaching a new volume created from the snapshot, which provisions near-instantly (cloud volumes hydrate blocks lazily in the background). Minimal downtime during restore; (2) Replica set backup: run a hidden member with a long oplog window (e.g., 7 days of retention) kept in sync at all times. For recovery, reconfigure the hidden member to be electable (it already holds the data) and replay only the oplog entries needed to reach the desired point in time, instead of reloading 2TB from an archive; (3) Managed backups: MongoDB Atlas Cloud Backup or Ops Manager take point-in-time snapshots with minimal load on the cluster; (4) Parallel mongorestore: `mongorestore --numParallelCollections=8 /backup` restores collections concurrently (still slow for one huge collection, but often 2-3x faster than serial).
For your incident: use a lagged hidden replica. Keep a hidden member that applies the oplog on a delay (say 2 hours). When the batch delete lands, stop replication on the lagged member before it applies the delete; it still holds the pre-incident data. Then replay oplog entries forward up to just before the delete, and selectively apply legitimate post-incident writes if needed. Because the incident was only 2 hours ago and the oplog retains 24 hours, every entry you need is still available, so only the recent changes move rather than the full 2TB.
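The recovery cut-off logic above can be sketched as a filter over timestamped oplog entries. This is a minimal illustration, not driver code: `entries_to_replay` and the dict-shaped entries are hypothetical stand-ins for real oplog documents and their BSON timestamps.

```python
from datetime import datetime, timedelta

def entries_to_replay(oplog, backup_ts, incident_ts):
    """Select the oplog entries to replay on top of the restored data:
    everything after the backup point, strictly before the bad operation."""
    return [e for e in oplog if backup_ts < e["ts"] < incident_ts]

t0 = datetime(2024, 3, 15, 2, 0)                 # last good backup point
oplog = [{"ts": t0 + timedelta(minutes=m), "op": f"write-{m}"}
         for m in (10, 50, 90, 130)]             # simulated entries
incident = t0 + timedelta(hours=2)               # batch delete lands here

replay = entries_to_replay(oplog, t0, incident)
print([e["op"] for e in replay])  # ['write-10', 'write-50', 'write-90']
```

The write at minute 130 falls after the incident and is deliberately excluded; a real tool would also filter out the delete operations themselves.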
Follow-up: If you can't use snapshots (your storage doesn't support them), design a backup strategy that allows sub-hour restore times.
Your MongoDB cluster is sharded across 3 shards. You run `mongodump --oplog --archive=backup.archive` to back up with the oplog. However, when you restore to a test environment using `mongorestore --archive=backup.archive --oplogReplay`, the restored cluster is inconsistent: some shards have 100M docs, others 95M, even though every source shard holds 100M. Why is sharded backup/restore inconsistent?
Sharded mongodump snapshots each shard independently, not atomically. Between dumping shard 1 and dumping shard 2, chunk migrations can move data between shards. Example: shard 1 is dumped first (100M docs), then the balancer migrates 5M docs from shard 2 to shard 1, then shard 2 is dumped (now 95M docs). The 5M migrated docs appear in neither dump: the backup totals 195M against a 200M source, inconsistent.
Additionally, `--oplogReplay` replays oplog entries, but if the shard topology changed during the backup window (rebalancing), replaying that oplog against a different chunk distribution can misapply or duplicate operations.
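The anatomy of the inconsistency can be reproduced with a toy simulation (all names and numbers are illustrative, scaled down from the 100M-doc scenario):

```python
def dump_shards_sequentially(shards, migration):
    """Dump shards one at a time. A chunk migration that runs between the
    per-shard dumps moves docs from a not-yet-dumped shard to an
    already-dumped one, so the moved docs land in neither dump."""
    dumped = {}
    for name in list(shards):
        dumped[name] = set(shards[name])         # snapshot this shard now
        if migration and migration["after"] == name:
            shards[migration["src"]] -= migration["docs"]
            shards[migration["dst"]] |= migration["docs"]
    return dumped

shards = {"shard1": set(range(100)), "shard2": set(range(100, 200))}
migration = {"after": "shard1", "src": "shard2", "dst": "shard1",
             "docs": set(range(100, 105))}       # balancer moves 5 docs mid-backup

dumped = dump_shards_sequentially(shards, migration)
captured = dumped["shard1"] | dumped["shard2"]
print(len(captured))  # 195: the 5 migrated docs are in neither dump
```

With the migration reversed (moving docs from an already-dumped shard to a not-yet-dumped one), the same docs would instead appear twice; either way the backup is torn.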
Fix: (1) Stop the balancer during backup: `sh.stopBalancer()` (it waits for in-progress migrations to finish; confirm with `sh.isBalancerRunning()`), dump, then `sh.startBalancer()`. This freezes chunk distribution for the backup window; (2) Use a backup service with proper coordination: some managed providers quiesce or coordinate all shards during backup to guarantee consistency; (3) Mirror to an unsharded staging replica set first: continuously replicate the cluster's data into a single unsharded replica set (for datasets that fit on one node), dump the staging set (a single node's dump cannot be torn by migrations), and restore that to the target. More work, but consistent by construction; (4) Use enterprise backup tools: MongoDB Ops Manager or Atlas Cloud Backup take cluster-wide consistent backups of sharded deployments.
Verify consistency after restore: compare per-collection document counts between the source and restored clusters (e.g., `db.collection.countDocuments({})` via mongos on each side). If counts mismatch, redo the backup with the balancer disabled.
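A simple way to script that verification, assuming you have already gathered cluster-wide counts per namespace on both sides; `count_mismatches` is a hypothetical helper:

```python
def count_mismatches(source, restored):
    """Compare per-namespace document counts between the source and the
    restored cluster; any difference flags a torn (mid-migration) backup."""
    return {ns: (source[ns], restored.get(ns))
            for ns in source if source[ns] != restored.get(ns)}

source   = {"app.users": 100_000_000, "app.orders": 200_000_000}
restored = {"app.users": 100_000_000, "app.orders": 195_000_000}
print(count_mismatches(source, restored))
# {'app.orders': (200000000, 195000000)} -> redo backup with balancer stopped
```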
Follow-up: Design a sharded cluster backup strategy that maintains consistency without stopping the balancer (zero downtime).
You keep incremental backups using the oplog: a full backup on Sunday, then incremental oplog capture during the week. By Friday you have accumulated 5 days of oplog (500GB). To restore to Friday's state you must replay 5 days of oplog into the 2TB database, which takes 12 hours. Also, the oplog backups are stored on the same SAN as the database; if the SAN fails, both the database and its backups are lost. Design a more resilient strategy.
Issues: (1) Slow oplog replay (12 hours)—oplog application is sequential, can't parallelize; (2) Single point of failure (SAN)—if SAN fails, database and all backups lost; (3) Large oplog (500GB)—5 days of oplog is huge, indicates high write volume or inefficient oplog storage.
Resilience improvements: (1) Geographic replication: keep backups on a separate site/SAN. After the Sunday full backup, replicate it to cloud storage (S3, GCS) or a secondary site immediately; this protects against site-level failure; (2) Optimize oplog sizing: the oplog is a capped collection whose default size is roughly 5% of free disk space (WiredTiger), not RAM. Resize it with `db.adminCommand({replSetResizeOplog: 1, size: 20000})` (size in MB) to extend retention toward 10-14 days; MongoDB 4.4+ can also enforce `storage.oplogMinRetentionHours`. Longer retention reduces the need for frequent full backups; (3) Point-in-time restore without long oplog replay: keep multiple full backups (Sunday, Wednesday, Friday) and restore directly to the nearest one, trading storage for restore speed; (4) Async oplog replay: open the database for reads once the base data is restored and replay the oplog in the background; queries may see slightly stale data but the service stays up.
Recommended: Keep full backups on Monday, Thursday, Sunday (3 backups, covers any day). Plus incremental oplog (max 7 days stored). To restore to Friday, use Thursday backup + 1 day oplog replay (much faster than 5 days).
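The restore-path choice under this Mon/Thu/Sun schedule is mechanical; a small sketch (the day encoding and the `restore_plan` helper are illustrative):

```python
FULL_BACKUP_DAYS = ["Mon", "Thu", "Sun"]           # assumed schedule
WEEK = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def restore_plan(target_day):
    """Pick the most recent full backup on or before the target day and
    count the days of oplog to replay on top of it."""
    t = WEEK.index(target_day)
    base = max(WEEK.index(d) for d in FULL_BACKUP_DAYS if WEEK.index(d) <= t)
    return WEEK[base], t - base

print(restore_plan("Fri"))  # ('Thu', 1): Thursday full + 1 day of oplog
print(restore_plan("Wed"))  # ('Mon', 2): worst case is 2 days of replay
```

With this schedule no target day is ever more than 2 days of oplog away from a full backup, versus 5 days in the Sunday-only scheme.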
Follow-up: How would you design a backup schedule that minimizes storage while maintaining <4 hour restore time for any date in the past 30 days?
Your MongoDB instance has a critical database with customer data. You back up to an external service daily. One day a developer runs a drop command that wipes the databases. You try to restore, but the latest backup is 24 hours old: all transactions from the past day are lost, and customer retention drops 5%. Design a backup architecture that reduces RPO (Recovery Point Objective) to under 1 hour.
A 24-hour RPO is a legacy backup posture; production systems need continuous backup. Strategies to reduce RPO: (1) Continuous oplog archival: stream oplog entries to external storage (AWS S3, Google Cloud Storage) as they occur, via a change stream or oplog tailer. RPO shrinks to the streaming lag, typically seconds; (2) Multi-region replication: replicate to a secondary region in real time and fail over to it if the primary region dies. Writes acknowledged with a write concern spanning regions (e.g., "majority" across regions) are never lost; (3) Hourly backups: take a backup every hour instead of daily; worst-case RPO = 1 hour, but 24 backups a day accumulate, so a retention policy is needed; (4) Incremental hourly backups: each hourly backup stores only the changes since the previous hour (an oplog slice or binary diff), keeping the 1-hour RPO without 24x the storage.
For your case: implement continuous oplog archival. Every write is recorded in the oplog; tail it (via a change stream, or a tailable cursor on `local.oplog.rs`) and push entries to S3 continuously. Note that `mongodump --oplog` only captures the oplog window spanning a single dump; it does not stream. After a drop, restore the last full backup and replay the archived oplog up to just before the drop. RPO shrinks to the streaming lag: seconds, not hours.
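A checkpointed tailer might be structured like the sketch below. The upload is simulated with an in-memory function rather than a real S3 client, and all names are hypothetical; the key property is that the checkpoint only advances after an acknowledged upload, so restarts and failures never create gaps:

```python
class OplogArchiver:
    """Sketch of a continuous oplog tailer with a resume checkpoint."""
    def __init__(self, upload):
        self.upload = upload
        self.checkpoint = 0        # last timestamp safely archived

    def run_once(self, oplog):
        for entry in oplog:
            if entry["ts"] <= self.checkpoint:
                continue           # already archived; skipped on resume
            self.upload(entry)     # raises on failure -> checkpoint stays put
            self.checkpoint = entry["ts"]

archived, failed_once = [], [False]

def upload(entry):                 # simulated S3 put: fails once at ts=3
    if entry["ts"] == 3 and not failed_once[0]:
        failed_once[0] = True
        raise IOError("network blip")
    archived.append(entry["ts"])

tailer = OplogArchiver(upload)
oplog = [{"ts": t} for t in range(1, 6)]
try:
    tailer.run_once(oplog)         # uploads 1, 2; fails at 3
except IOError:
    pass
tailer.run_once(oplog)             # resumes: retries 3, then 4, 5
print(archived)  # [1, 2, 3, 4, 5]
```

A production version would persist the checkpoint durably (so the process survives MongoDB and tailer restarts) and batch uploads rather than pushing one entry at a time.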
Verification: To test restore, simulate drop: `db.dropDatabase()`, restore from backup, replay oplog. Verify all data appears.
Follow-up: Implement continuous oplog archival to S3 that guarantees no gaps in oplog history and handles MongoDB restarts gracefully.
Your backup strategy streams the oplog (MongoDB's equivalent of a binary log) to S3 continuously. However, you discover that during a 30-minute network partition between MongoDB and S3, 50K oplog entries were never uploaded. When you restored from backup and replayed the available oplog, those 50K entries were missing, leaving the restore inconsistent with the source. How would you guarantee backup completeness across network failures?
The problem: the oplog tailer streams to S3, but when an upload fails (network issue) the entries exist only in the server's oplog. The oplog is a capped collection, so after the retention window (24-48 hours) those entries are overwritten: now they are gone for good, in neither the backup nor the local oplog.
Consistency guarantee: (1) Dual backup destination: stream the oplog to both S3 AND local disk simultaneously; if S3 fails, local disk has the data, and you periodically sync local disk to S3. As long as one destination succeeds per write, the entry is preserved; (2) Buffer with acknowledgment: hold entries in a local spool until S3 acknowledges the upload (individual oplog entries cannot be pinned, though MongoDB 4.4+ offers `storage.oplogMinRetentionHours` to guarantee a minimum window). If S3 fails for an extended period the spool fills; cap it (e.g., 10GB) and apply backpressure or page an operator when full; (3) Sync before oplog expiry: periodically check S3 backup timestamps; if S3 has received nothing within the oplog retention window (e.g., 24 hours), trigger a full backup and sync before entries age out; (4) Store resume cursors: track the last oplog timestamp successfully backed up and restart from that cursor, so if 50K entries failed to upload, the next attempt resumes from the last acknowledged timestamp and retries exactly those entries.
Recommended: dual destination (S3 + local disk) with a resume cursor; on network recovery, retry the failed batch. Sketch: `try { await s3.uploadOplogEntry(entry); recordCheckpoint(entry.ts); } catch (err) { localDisk.write(entry) || backpressure.stop(); }`
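Expanded into a runnable sketch (the S3 call is simulated in memory; `DualDestinationArchiver` and its method names are hypothetical):

```python
class DualDestinationArchiver:
    """Write each oplog entry to S3 first; on failure fall back to a local
    spool, then drain the spool to S3 once connectivity returns. An entry
    is only 'safe' once at least one destination holds it."""
    def __init__(self, remote_write):
        self.remote_write = remote_write   # e.g. an S3 put; raises on failure
        self.spool = []                    # stand-in for a local-disk spool
        self.uploaded = []

    def archive(self, entry):
        try:
            self.remote_write(entry)
            self.uploaded.append(entry)
        except IOError:
            self.spool.append(entry)       # durable locally until S3 recovers

    def drain_spool(self):
        while self.spool:
            entry = self.spool[0]
            self.remote_write(entry)       # raises -> stop, retry later
            self.uploaded.append(entry)
            self.spool.pop(0)

network_up = [False]
def s3_put(entry):                         # simulated partition
    if not network_up[0]:
        raise IOError("partition")

arch = DualDestinationArchiver(s3_put)
for ts in range(5):
    arch.archive({"ts": ts})      # all 5 land in the local spool
network_up[0] = True
arch.drain_spool()                # partition heals; spool drains to S3
print(len(arch.uploaded), len(arch.spool))  # 5 0
```

The spool drains in order, so the archived oplog stays contiguous even across the outage.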
Follow-up: Design a backup system that provides RPO=0 while handling extended outages (network down for 1 week) without stopping production writes.
You implement a backup strategy where you replicate data to a standby MongoDB cluster in a different region (async replication). The standby is read-only and kept 2 hours behind primary (to protect against logical errors—if someone runs drop database, the standby won't have it yet, giving 2 hours to stop replication before applying bad changes). However, if the primary fails and you failover to standby, customer data from the last 2 hours is lost. How would you maintain standby safety (protection from logical errors) while minimizing RPO?
The tradeoff: standby lag protects from logical errors (accidental drop, mass delete) by giving time to detect and stop replication. But it increases RPO. With 2-hour lag, worst-case data loss is 2 hours.
Better design: (1) Separate the standby and backup roles: keep a zero-lag standby (synchronous or near-synchronous replication) for fast failover, plus a separate replica lagged by 2 hours purely for logical-error protection. A logical error is repaired from the lagged replica; a primary failure fails over to the zero-lag standby; (2) Semantic replication validation: instead of time-based lag, validate changes before applying them. Example: when a drop command arrives, check whether it matches an approved pattern (e.g., only test_* databases may be dropped) and reject unexpected drops before they reach the standby; (3) Oplog checkpoints: record oplog position markers every 10 minutes; if a logical error is detected, roll the standby back to the last good checkpoint. This allows a much shorter lag (10 minutes instead of 2 hours); (4) Change stream filtering: on the standby, filter out destructive operations: `watch([{$match: {operationType: {$nin: ["drop", "dropDatabase"]}}}])`. Note that filtering ordinary deletes would silently diverge the standby from the primary, so restrict filtering to drop-style commands and alert on them instead.
For your case: use semantic validation. When a dropDatabase command appears in the replication stream, check the database name against an approved pattern (e.g., test_*, staging_*); if it is not approved, alert and halt replication. The standby keeps applying non-blocked operations in real time (near-zero RPO). Protection comes from validation, not lag.
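The validation rule can be expressed as a small predicate; `APPROVED_DROP_PATTERNS` and the operation shape are assumptions, modeled loosely on change stream events:

```python
import fnmatch

APPROVED_DROP_PATTERNS = ["test_*", "staging_*"]   # assumed policy

def safe_to_apply(op):
    """Return True if the replicated operation may be applied to the
    standby; destructive ops are allowed only on approved databases."""
    if op.get("operationType") not in ("drop", "dropDatabase"):
        return True
    return any(fnmatch.fnmatch(op["db"], p) for p in APPROVED_DROP_PATTERNS)

print(safe_to_apply({"operationType": "insert", "db": "customers"}))        # True
print(safe_to_apply({"operationType": "dropDatabase", "db": "test_load"}))  # True
print(safe_to_apply({"operationType": "dropDatabase", "db": "customers"}))  # False: alert, halt replication
```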
Follow-up: Design a backup strategy that provides both RPO < 1 hour and RTO < 5 minutes for a 10TB production database across 2 regions.
You receive a compliance requirement: all backups must be encrypted and stored in a separate geographic region from the primary database. Your current backup uses `mongodump --archive | gzip > /backups/db.archive.gz` stored on NFS mounted in the same data center. You upgrade to: `mongodump --archive | openssl enc -aes-256-cbc -pass file:/secure/key.txt | aws s3 cp - s3://backup-bucket/db.archive.enc`. However, now restore is complicated: you must decrypt before mongorestore. Also, encryption key management is manual. Design an enterprise backup strategy.
Issues with manual encryption: (1) Key management is manual (risky, easy to lose keys); (2) Restore requires manual decryption steps (error-prone); (3) No versioning or audit trail for backups; (4) No automated retention policy (backups accumulate indefinitely).
Enterprise approach: (1) Use a managed backup service: MongoDB Atlas Cloud Backup or AWS Backup provide encryption, key rotation, audit logging, retention policies, and one-click restore, handling encryption transparently; (2) If self-hosted, use a hardware security module (HSM) or KMS for key management: keys live in the HSM, which performs encryption/decryption, and the application never sees the raw key; (3) Implement a backup lifecycle: `aws s3api put-bucket-lifecycle-configuration --bucket backup-bucket --lifecycle-configuration '{"Rules": [{"ID": "delete-old-backups", "Status": "Enabled", "Filter": {"Prefix": ""}, "Expiration": {"Days": 90}}]}'` auto-deletes backups older than 90 days (compliance retention); (4) Audit trail: enable S3 access logging and CloudTrail for all backup operations so compliance can audit who accessed which backups and when.
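The lifecycle JSON can be generated rather than hand-written, then passed to `aws s3api put-bucket-lifecycle-configuration` or boto3's `put_bucket_lifecycle_configuration`; `lifecycle_policy` is a hypothetical helper emitting the documented S3 rule shape:

```python
def lifecycle_policy(retention_days, glacier_after=None):
    """Build an S3 lifecycle configuration that expires backups after the
    compliance retention window, optionally tiering to Glacier first."""
    rule = {
        "ID": f"expire-after-{retention_days}d",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},                 # apply to the whole bucket
        "Expiration": {"Days": retention_days},
    }
    if glacier_after is not None:
        rule["Transitions"] = [{"Days": glacier_after,
                                "StorageClass": "GLACIER"}]
    return {"Rules": [rule]}

policy = lifecycle_policy(90, glacier_after=30)
print(policy["Rules"][0]["Expiration"])  # {'Days': 90}
```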
For compliance: use AWS Backup + KMS (Key Management Service). AWS Backup encrypts backups with KMS keys automatically, logs every operation to CloudTrail, and enforces retention in backup vaults. Restore is a single call, e.g. `aws backup start-restore-job --recovery-point-arn arn:... --metadata ... --iam-role-arn arn:...`
Follow-up: Design a backup system that meets SOC2 compliance requirements (data encryption, audit logging, retention policies, access control).
Your backup archive has grown to 500GB. Restoring takes 10 hours because MongoDB processes each insert sequentially. You want to parallelize restore to 4 hours. However, MongoDB 4.4's `mongorestore` has limited parallelization (only per-collection, not per-document). How would you speed up restore without upgrading?
Sequential restore limitation: mongorestore parallelizes across collections (`--numParallelCollections`), but a single collection is restored by a limited set of insertion workers, so one 400GB collection dominates the wall clock no matter how many small collections restore alongside it. (Check whether your tooling supports `--numInsertionWorkersPerCollection`, which adds intra-collection parallelism.) Per-collection parallelism helps with many similarly sized collections, not with a few large ones.
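A back-of-the-envelope model makes the bottleneck concrete (throughput numbers are illustrative, chosen to match the 10-hour baseline):

```python
def restore_hours(total_gb, gb_per_hour_per_worker, workers, largest_gb):
    """Rough restore-time model: total work splits across workers, but the
    wall clock can never beat the single largest collection, since one
    collection is handled by one worker under per-collection parallelism."""
    parallel = total_gb / (workers * gb_per_hour_per_worker)
    bottleneck = largest_gb / gb_per_hour_per_worker
    return max(parallel, bottleneck)

# 500GB archive at an assumed ~50GB/h per worker
print(restore_hours(500, 50, 1, largest_gb=400))  # 10.0h sequential baseline
print(restore_hours(500, 50, 4, largest_gb=400))  # 8.0h: the 400GB collection dominates
print(restore_hours(500, 50, 4, largest_gb=50))   # 2.5h once it is split up
```

The model shows why adding workers alone barely helps: the 400GB collection must be split before parallelism pays off.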
Workarounds: (1) Split large collections before backup: if the users collection is 400GB, partition it into users_0 ... users_9 (40GB each) before backup, then restore the partitions in parallel (e.g., 4 workers restoring 2-3 partitions each), cutting the 10-hour restore to roughly 3 hours if throughput scales; (2) Use streaming inserts: instead of mongorestore, write a restore script that reads the archive and issues batched `insertMany` calls with `{ordered: false}` (unordered batches let the server apply them with internal parallelism); batch around 10K documents; (3) Upgrade tooling: newer mongorestore releases improve parallel restore; (4) Seed via replication: restore the archive onto a fresh standalone node off the critical path, add it to the replica set, and let oplog catch-up bring it current before promoting it, so production never takes the restore I/O; (5) Restore close to the data: if the backup is in S3, restore onto a new instance in the same region (low latency, high bandwidth), then sync it to production over the network.
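Option (2)'s batching logic is simple to sketch; with pymongo you would feed each batch to `collection.insert_many(batch, ordered=False)` (the batching helper below is illustrative and database-free):

```python
def batches(docs, batch_size=10_000):
    """Yield fixed-size batches for unordered insertMany calls; with
    ordered=False the server can apply each batch with internal parallelism
    and one failed document does not abort the rest of the batch."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch          # trailing partial batch

docs = ({"_id": i} for i in range(25_000))   # simulated archive stream
sizes = [len(b) for b in batches(docs)]
print(sizes)  # [10000, 10000, 5000]
```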
Recommended for your case: partition the large collections and restore the partitions in parallel. Script: `for i in {0..9}; do mongorestore --archive=archive_part_$i.archive & done; wait` (cap the number of concurrent processes to what your disks can absorb).
Follow-up: Design a zero-downtime restore strategy where you restore to a shadow environment and switch traffic without interruption.