MongoDB Interview Questions

Profiler and Slow Query Analysis


Your MongoDB profiler is disabled in production to avoid overhead. You notice the application is slow (2-3 second latencies) but can't identify which queries are problematic. Your ops team enables the profiler with `db.setProfilingLevel(1, {slowms: 500})` to log queries slower than 500ms. After 1 hour, the profiler's overhead causes CPU to spike to 95%, and the application becomes even slower, ironically. Profiler logging itself is now a bottleneck. How would you profile production without impacting performance?

The MongoDB profiler adds overhead: at level 2 every operation, and at level 1 every operation slower than `slowms`, is written to the system.profile collection. At level 2 with 10K queries/sec, the profiler must insert 10K documents/sec into system.profile—a significant I/O and CPU cost. Enabling the profiler in production can make performance worse.

Low-overhead profiling strategies: (1) Sampling: profile only a fraction of slow operations. MongoDB 4.4+ supports `sampleRate` in profiling: `db.setProfilingLevel(1, {slowms: 500, sampleRate: 0.1})` profiles only 10% of the operations that exceed `slowms`, cutting profiler write overhead by 90%; (2) Increase the slowms threshold: instead of 500ms, use 5000ms so only the very slowest queries are profiled. Most queries are fast; the profiler then logs only true outliers; (3) Query logging on the application side: replace the MongoDB profiler with application-level logging. Wrap MongoDB queries with timing: `const start = Date.now(); const result = await collection.find(...).toArray(); const latency = Date.now() - start;` and call `logger.info({query, latency})` only if latency exceeds a threshold. Application logging is often cheaper than the MongoDB profiler (no extra writes inside MongoDB); (4) Use MongoDB Atlas or a cloud-based solution: managed profiling offloads analysis to a separate system with low overhead; (5) Profile on a SECONDARY or in a test environment: enable full profiling on a replica set SECONDARY that isn't serving production traffic, analyze its profile, and apply fixes on the PRIMARY; (6) Time-based profiling: enable the profiler only during off-peak hours (e.g., 2-4 AM) when load is low, capturing queries without impacting production.

Recommended: application-level logging (#3) with sampling, plus an increased threshold (#2). Log all queries slower than 1000ms plus a 1% sample of everything else: this catches real problems without measurable overhead.
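The application-side approach can be sketched as a small wrapper. This is a minimal sketch: `runQuery` and `log` are hypothetical stand-ins for your driver call and logger, and the threshold and sample rate are placeholder values.

```javascript
// Decide whether to record an operation: always log true outliers,
// otherwise keep a small random sample (rand is injected for testability).
function shouldLog(latencyMs, slowMs, sampleRate, rand = Math.random()) {
  return latencyMs > slowMs || rand < sampleRate;
}

// Wrap any async query function with timing plus sampled logging.
function makeProfiledRunner(runQuery, log, { slowMs = 1000, sampleRate = 0.01 } = {}) {
  return async function profiledRun(label, ...args) {
    const start = Date.now();
    const result = await runQuery(...args);
    const latency = Date.now() - start;
    if (shouldLog(latency, slowMs, sampleRate)) {
      log({ label, latency, outlier: latency > slowMs });
    }
    return result;
  };
}
```

Because the decision runs in the application, MongoDB performs no extra writes, and the sample rate can be tuned per endpoint.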

Follow-up: Design a profiling system that captures representative slow queries without impacting a production cluster handling 100K req/sec.

You enable `db.setProfilingLevel(1, {slowms: 100})` to capture slow queries. After 1 hour, you query `db.system.profile.find({millis: {$gt: 1000}}).sort({millis: -1}).limit(100)` expecting to see the top 100 slowest queries. However, results are inconsistent: the same query sometimes appears as 100ms, sometimes as 2000ms (a 20x variation). Why are profile timings unreliable?

Profile timings have multiple sources of variation: (1) Cold cache: the first execution of a query may be cold (data not in the WiredTiger cache) while later runs are warm. Same query: first run 2000ms, subsequent runs 100ms; (2) Concurrent load: when the server is busy, a query waits in a queue before executing. The profiler records total latency including queue wait, not just execution time, so the same query takes longer under load; (3) Lock contention: under heavy writes, a query can block waiting for locks. The profiler records wall-clock time (including lock waits), not CPU execution time; (4) Clock adjustments: the profiler uses wall-clock timestamps, which can shift with system clock adjustments; (5) Sampling: if the profiler samples (say, 10% of slow operations), you see only a subset of executions, and a small sample can easily misrepresent the latency distribution.

Better analysis: (1) Look at the execution stage breakdown, not just total latency. Profiler entries include an `execStats` stage tree with per-stage timing estimates (e.g., IXSCAN 50ms, FETCH 100ms, SORT 1800ms), which identifies the true bottleneck; (2) Use explain() on specific queries instead of the profiler. `db.collection.find({...}).explain("executionStats")` shows keysExamined, docsExamined, and execution stages—more reliable than a single profiler timing; (3) Aggregate profiler data: instead of looking at single query instances, aggregate: `db.system.profile.aggregate([{$match: {...}}, {$group: {_id: "$command", avgMillis: {$avg: "$millis"}, p99Millis: {$percentile: {input: "$millis", p: [0.99], method: "approximate"}}}}])` ($percentile requires MongoDB 7.0+) gives a statistical view (average, percentile) that smooths out outliers; (4) Log independent metrics: CPU usage, lock time, and disk I/O during the query. Cross-reference with the profiler to determine whether latency is query-internal or environmental.
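The same statistical summary can be computed application-side once latency samples are collected; a minimal sketch using the nearest-rank percentile definition:

```javascript
// Summarize a set of latency samples (millis values pulled from
// system.profile or application logs) instead of trusting any single entry.
function summarize(latencies) {
  const sorted = [...latencies].sort((a, b) => a - b);
  const avg = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  // Nearest-rank p99: the value at index ceil(0.99 * n) - 1.
  const p99 = sorted[Math.ceil(0.99 * sorted.length) - 1];
  return { n: sorted.length, avg, p99 };
}
```

A query whose samples are [100, 100, 100, 100, 2000] averages 480ms with a p99 of 2000ms—both numbers together tell you it is usually fast with rare bad outliers, which a single profiler entry cannot.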

Follow-up: Design a query profiling system that distinguishes between query execution time, lock wait time, and cache miss latency.

Your profiler shows a complex aggregation pipeline taking 30 seconds. Profiler output: `"executionStages": {stage: "SORT", input: {stage: "GROUP", input: {...}}}`. This shows the stage tree but doesn't clearly identify which stage is the bottleneck. You know SORT is there, but SORT usually takes milliseconds. Where are the 30 seconds? The profiler doesn't show per-stage timing. How would you identify the bottleneck?

The MongoDB profiler shows the execution stage tree but not reliable per-stage timing in all versions. To identify the bottleneck: (1) Use explain with executionStats: `db.collection.explain("executionStats").aggregate([...])` returns detailed stats. Examine each stage for `executionTimeMillisEstimate`, or compare `nReturned` against `totalDocsExamined` (a high examined-to-returned ratio marks a wasteful stage); (2) Time pipeline branches separately: run each candidate portion as its own aggregation (e.g., just the `$group`, then the `$group` plus `$sort`) and time each run to see which part is slow; (3) Incrementally add stages and time: run the pipeline with just the first stage, then first+second, etc. The stage that causes the jump in latency is the bottleneck; (4) Monitor resource usage during execution: while the aggregation runs, check `db.currentOp()` for in-progress operations, plus CPU, disk I/O, and memory—this reveals whether the bottleneck is I/O (slow disk) or CPU (expensive computation); (5) Check stage output volume: large intermediate results slow down downstream stages. For example, if $group produces 1M intermediate documents, a $sort over 1M docs is expensive. Explain shows `nReturned` per stage—a huge jump means that stage produces a large result set that hurts everything downstream.

For your 30-second aggregation: check `nReturned` for the GROUP stage. If GROUP produces 10M documents and SORT has to sort them all, 25 seconds in SORT is plausible—the fix is reducing intermediate volume. If GROUP produces 100K docs and SORT still takes 25 seconds, SORT itself is the bottleneck (sorting 100K docs should take well under a second).
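Checking intermediate volumes can be mechanized. This sketch walks a simplified explain stage tree (assumed shape: `{stage, nReturned, inputStage}`; real explain output nests more fields) and flags the stage producing the most documents:

```javascript
// List per-stage output volumes (leaf first) and flag the largest producer,
// since a huge intermediate result set makes every downstream stage expensive.
function stageVolumes(root) {
  const volumes = [];
  for (let s = root; s; s = s.inputStage) {
    volumes.push({ stage: s.stage, nReturned: s.nReturned });
  }
  volumes.reverse(); // leaf (data source) first, root (final stage) last
  const biggest = volumes.reduce((a, b) => (b.nReturned > a.nReturned ? b : a));
  return { volumes, biggest };
}
```

Reading the volumes leaf-to-root makes the "huge jump" obvious: the stage after the largest producer is the one paying for it.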

Follow-up: Design an aggregation performance tuning system that automatically suggests query rewrites to eliminate slow stages.

You investigate a slow query from profiler: `db.orders.find({customerId: "123", status: "completed"}).sort({createdAt: -1}).limit(100)`. Profiler shows: millis: 5000, keysExamined: 50M, nReturned: 100. This means MongoDB examined 50M index keys to return 100 documents (500K:1 ratio, very inefficient). You add an index {customerId: 1, status: 1, createdAt: -1} thinking this should help. However, profiler still shows keysExamined: 50M after adding the index. Why didn't the new index help?

The index {customerId: 1, status: 1, createdAt: -1} should filter efficiently. If keysExamined is still 50M, consider: (1) MongoDB chose a different index: the query planner may prefer an existing index (e.g., {customerId: 1}) over the new compound index based on its cached plan. Force the new index: `find({...}).hint({customerId: 1, status: 1, createdAt: -1})`; (2) Index field order doesn't follow the ESR rule (Equality, Sort, Range): the equality fields (customerId, status) must come first and the sort field (createdAt) after them. {customerId: 1, status: 1, createdAt: -1} follows ESR; an index like {createdAt: -1, customerId: 1, status: 1} would not serve this filter efficiently; (3) The index isn't used for sorting: explain() should show the index covering both the filter and the sort. If a separate SORT stage still appears, the index isn't helping the sort (check that the sort direction matches the index direction); (4) Stale plan cache: MongoDB caches the winning plan per query shape, and a stale entry can keep an old plan in use. Clear it with `db.orders.getPlanCache().clear()` and re-run.

Debug approach: (1) Run explain("executionStats") with the new index: `find({customerId, status}).hint({customerId: 1, status: 1, createdAt: -1}).explain("executionStats")`. Compare keysExamined before and after adding the index; (2) If it's still 50M, check whether the predicates are selective at all. If 50M documents actually match {customerId: "123", status: "completed"}, no index can avoid examining them—the problem is data cardinality, not indexing; (3) If keysExamined drops but latency remains high, the bottleneck has shifted elsewhere (e.g., a SORT stage, large document fetches, or memory constraints).
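The examined-to-returned check can be turned into a reusable guard. The field names follow MongoDB's `executionStats` document; the `maxRatio` threshold here is an arbitrary assumption to tune for your workload:

```javascript
// Flag inefficient queries from explain("executionStats") output.
// A high keysExamined : nReturned ratio means the index is not selective
// for this predicate; 50M keys for 100 results is a 500K:1 ratio.
function indexEfficiency(stats, maxRatio = 100) {
  const { totalKeysExamined, totalDocsExamined, nReturned } = stats;
  const keyRatio = nReturned === 0 ? Infinity : totalKeysExamined / nReturned;
  const docRatio = nReturned === 0 ? Infinity : totalDocsExamined / nReturned;
  return { keyRatio, docRatio, efficient: keyRatio <= maxRatio };
}
```

Running this over sampled explain output before and after an index change gives a concrete before/after number instead of a gut feeling.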

Follow-up: How would you design a system to automatically detect and suggest index improvements for slow queries without requiring manual analysis?

Your MongoDB has 1000 collections. You enable profiling on all of them with `db.setProfilingLevel(1, {slowms: 1000})`. After 1 week, the system.profile collection has grown to 500GB (system.profile is a small capped collection by default, but here it had been recreated without a size cap). MongoDB now spends significant resources managing the profile collection, and queries against the profile itself are slow. Disk space is constrained. How would you manage profiler collection growth?

An uncapped profiler collection grows without bounds. With 1000 collections and profiling enabled, system.profile becomes a dumping ground.

Management strategies: (1) Use a capped collection: system.profile is normally capped (it rolls over automatically); if it was recreated uncapped, restore the cap: `db.setProfilingLevel(0); db.system.profile.drop(); db.createCollection("system.profile", {capped: true, size: 1000000000}); db.setProfilingLevel(1)`. That caps it at ~1GB (adjust to suit); (2) Archive and prune: copy old profiler entries to an archive collection: `db.system.profile.aggregate([{$match: {ts: {$lt: new Date(Date.now() - 7*24*60*60*1000)}}}, {$out: "profile_archive"}])` ($out cannot target a `system.*` namespace, hence the separate name). With a properly capped system.profile, old entries then age out on their own; (3) Reduce sampling/threshold: instead of profiling all queries >1000ms, raise the threshold to 5000ms or lower the sample rate. Fewer logged queries means a smaller profile collection; (4) Use external profiling: disable the MongoDB profiler and log from the application to an external system (ELK, Datadog, Prometheus). Application-level logging keeps MongoDB clean; (5) Index the profile collection: if you query the profile heavily, add indexes on {ts} or {millis} to speed up analysis (profiling must be disabled while you modify system.profile): `db.system.profile.createIndex({ts: -1})`.

Recommended: convert to capped collection (1GB max) + increase slowms threshold to 5000ms. This keeps profile manageable and focuses on true outliers.
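Sizing the cap is simple arithmetic; the slow-op rate and average entry size below are assumptions you would measure on your own cluster:

```javascript
// Bytes needed for a capped system.profile collection to retain
// `retentionHours` of slow-op entries, given a measured slow-op rate
// and average profile-document size.
function profileCapBytes(slowOpsPerSec, avgEntryBytes, retentionHours) {
  return Math.ceil(slowOpsPerSec * 3600 * retentionHours * avgEntryBytes);
}
```

At 2 slow ops/sec and ~1KB per entry, a day of retention needs about 173MB—pass that as `size` when recreating the capped collection.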

Follow-up: Design a profiling and analytics pipeline that captures slow queries long-term without overwhelming MongoDB storage.

You're debugging a slow transaction: `db.collection.find({...}).explain()` shows the query is fast (100ms), but inside a transaction it takes 3000ms. The profiler shows different stages: inside the transaction, a COLLSCAN appears instead of an IXSCAN. Same collection, same query, different plan inside vs outside the transaction. Why do transactions use different query plans?

Transactions have isolation requirements: they read from a snapshot of the data taken at a specific point in time. Anything committed after that snapshot—including a newly built index—is invisible to the transaction. Transactions also hold document-level write locks and put pressure on the WiredTiger cache, which adds latency on top of any plan difference.

Likely reasons for the plan difference: (1) Index not visible at the snapshot: if the index finished building after the transaction's snapshot was taken, queries inside the transaction cannot use it and fall back to COLLSCAN—the most common cause of IXSCAN-outside, COLLSCAN-inside behavior; (2) Stale plan cache entry: MongoDB caches the winning plan per query shape, so the plan actually executed can differ from the plan a fresh explain() reports; (3) Read concern: transactions read at snapshot read concern, while a standalone query typically reads at "local". Maintaining snapshot history under heavy write load makes reads slower even with an identical plan; (4) Write conflicts and cache pressure: long-running transactions pin old versions in the WiredTiger cache and can conflict with concurrent writes, inflating wall-clock latency.

Debug: (1) Run explain inside the transaction: `const session = db.getMongo().startSession(); session.startTransaction(); session.getDatabase(db.getName()).collection.find({...}).explain("executionStats"); session.abortTransaction()`. Compare the explain output inside vs outside the transaction; (2) Check whether the index existed when the transaction started: `db.collection.getIndexes()` can differ from what the snapshot sees if indexes were created after the transaction began; (3) Run the query without a transaction and confirm it's fast—if so, transaction overhead is confirmed; (4) Try forcing the index inside the transaction: `find({...}).hint(indexName)` to override the planner's choice.

Mitigation: (1) Warm the cache by running the query outside the transaction first; (2) Use explicit hints on queries inside transactions; (3) Keep transactions small (few documents, short duration) to minimize lock time, cache pressure, and snapshot age.

Follow-up: Design a transaction profiling system that alerts when transaction latency exceeds expected query latency by >10x.

You use profiler to analyze application performance. Profiler shows average query latency is 50ms, but application users report 5-second latencies. You run the same queries directly in mongosh and confirm they complete in 50ms. Why the 100x discrepancy? Is profiler lying?

The profiler measures MongoDB query execution time only. It doesn't include: (1) Network latency: the query travels from app to MongoDB server, which takes time if they're far apart. mongosh runs locally or in-region; the app might be remote; (2) Connection pool wait: if the pool is exhausted, the app's query waits in a queue before it is even sent to MongoDB. The profiler sees 50ms of execution while the app waited 4 seconds for a connection; (3) Driver overhead: the MongoDB driver adds BSON encoding/decoding and cursor management on the app side; (4) Application-level processing: the app may fetch results and then sort, filter, or aggregate them in memory before responding. The profiler sees the 50ms fetch but not the 4 seconds of processing; (5) Reading from a SECONDARY: the profiler entries you examined are from the PRIMARY (50ms), but if the app reads from a SECONDARY, that node may have a colder cache or heavier load; (6) GC pauses: if the app runs on a garbage-collected runtime (Java, Node, Python), the query completes but a GC pause stalls the response—the user sees 5 seconds while MongoDB saw a 50ms query.

Diagnosis: (1) Add application-level timing: wrap queries with start/end logging in app code, and compare app latency vs profiler latency to locate the gap; (2) Check the connection pool: driver monitoring events expose connection checkout wait time—if checkout waits approach 4 seconds, the pool is saturated; (3) Network latency: ping the MongoDB server from the app host; it should be <10ms in-region. If it's >100ms, the network is the issue; (4) Monitor GC: for a JVM app, enable GC logs and check for pauses coinciding with slow responses; (5) Check SECONDARY behavior: if the app reads from a SECONDARY, profile that node too—and note that causally consistent reads or readConcern "majority" can wait on replication state.
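With timestamps captured in the application, the attribution itself is arithmetic. The field names here are illustrative; checkout times can come from the driver's connection-pool monitoring events:

```javascript
// Break end-to-end latency into components from app-recorded timestamps (ms).
function attributeLatency(t) {
  return {
    poolWaitMs: t.connectionCheckedOut - t.requested, // waiting for a pooled connection
    serverMs: t.firstReply - t.sent,                  // network round trip + server execution
    appProcessingMs: t.responded - t.firstReply,      // decoding + app-side work
    totalMs: t.responded - t.requested,
  };
}
```

In the 50ms-vs-5-second scenario, this immediately shows whether the missing 4.95 seconds sits in pool wait, the network/server round trip, or app-side processing.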

Follow-up: Design a comprehensive latency attribution system that breaks down 5-second response into network + queue + execution + app processing + GC components.

Your profiler shows a query taking 1000ms. You check the explain() output and see: "planSummary": "IXSCAN", "executionStats": {keysExamined: 100, nReturned: 100, "totalDocsExamined": 100}. This looks perfect (index used, only 100 docs scanned and returned). Why are 100 documents taking 1000ms when they should take <10ms?

Perfect IXSCAN stats with slow execution suggests: (1) CPU contention: 100 documents need ~10ms of CPU, but the server is busy (100 other queries running), so the query waits for CPU and the profiler records 1000ms of wall-clock time including the wait; (2) Document I/O: keysExamined=100, nReturned=100 is efficient, but fetching each document body can require disk I/O. If the 100 documents sit on 100 different disk pages, that's 100 page faults—the index holds keys, not document bodies; (3) Downstream stages: the plan summary shows IXSCAN but not later stages. If the query feeds a $group or $sort, those stages may be the slow part; (4) Lock waits: the query blocks waiting on locks held by other operations (e.g., heavy writes), inflating wall-clock latency; (5) Network transfer: 100 large documents (say 100KB each) is 10MB sent over multiple TCP round trips, which takes real time over a network, especially a WAN.

Diagnosis: (1) Check the execution stages in the profiler: if the stage tree shows GROUP or SORT after the IXSCAN, those are the bottleneck; (2) Check document size: fetching 100 large documents (1MB each) takes far longer than 100 small ones; (3) Check lock stats: `db.serverStatus().locks` shows lock statistics; high acquire-wait counts point to contention; (4) Look at per-stage timing: `explain("executionStats")` reports `executionTimeMillisEstimate` per stage in recent MongoDB versions; (5) Check cache residency: WiredTiger cache hit vs miss—if all 100 documents miss the cache, each fetch pays a disk read.

Follow-up: Design a profiling system that separates CPU time, I/O time, lock wait time, and network time for queries.
