Your application queries are slow. You run `db.orders.find({status: "completed", amount: {$gte: 100}}).explain("executionStats")` and see: "executionStats": {totalDocsExamined: 10M, nReturned: 500K, executionStages: {stage: "COLLSCAN"}}. This means MongoDB scanned 10M documents to return 500K—a 20:1 ratio, very inefficient. You create an index {status: 1, amount: 1}. Explain now shows totalDocsExamined: 500K, matching nReturned. But queries still feel slow (2 seconds). What else would you optimize?
The index is working (COLLSCAN replaced by IXSCAN), but 500K returned documents are still slow to transmit and process. Optimizations: (1) Add a projection to shrink each document: `.find({status: "completed", amount: {$gte: 100}}, {_id: 1, orderId: 1, amount: 1})` instead of returning full documents (reduces network transfer); (2) Add a limit: `.limit(100)` if the application only needs the first 100 results; (3) Check whether the sort is satisfied by the index: if the query sorts by `amount` and the index is {status: 1, amount: 1}, the index can satisfy the sort—look for a SORT stage in explain. If a SORT stage exists, the sort field is missing from (or misplaced in) the index; (4) Use an aggregation pipeline: `db.orders.aggregate([{$match: {status: "completed", amount: {$gte: 100}}}, {$limit: 100}, {$project: {orderId: 1, amount: 1}}])` stops after 100 matches instead of processing all 500K docs; (5) Add a timeout: `.maxTimeMS(5000)` aborts runaway queries and surfaces the issue to the application.
In this case: use projection + limit. Example: `.find({status: "completed", amount: {$gte: 100}}, {_id: 1, orderId: 1, amount: 1}).limit(100)`. This should complete in well under 100ms. Verify that explain shows an IXSCAN stage with nReturned: 100 and totalDocsExamined: 100 (near-perfect index usage).
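A quick way to make the "examined vs. returned" check repeatable is a small helper over explain("executionStats") output. A minimal sketch: the field names (`totalDocsExamined`, `nReturned`) are real explain fields, but the threshold of 10 is an arbitrary assumption for illustration.

```javascript
// Flag inefficient queries from explain("executionStats") output.
// A ratio near 1 means the index delivers almost exactly the docs returned.
function scanEfficiency(executionStats, maxRatio = 10) {
  const examined = executionStats.totalDocsExamined;
  const returned = executionStats.nReturned;
  const ratio = returned === 0 ? Infinity : examined / returned;
  return { ratio, efficient: ratio <= maxRatio };
}

// Before indexing: 10M examined for 500K returned -> ratio 20, flagged.
console.log(scanEfficiency({ totalDocsExamined: 10_000_000, nReturned: 500_000 }));
// After index + projection + limit: 100 examined for 100 returned -> ratio 1.
console.log(scanEfficiency({ totalDocsExamined: 100, nReturned: 100 }));
```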
Follow-up: If pagination requires returning ALL 500K results (10 pages of 50K each), how would you optimize without changing the index?
You optimize a slow query by adding an index. Explain shows the query now uses index correctly, but when you run it in production with 100 concurrent users, latency spikes to 10+ seconds. Individual explain execution takes 100ms. Explain stats show: "executionStages": {stage: "IXSCAN", "nReturned": 50K, keysExamined: 50K, "executionStats": {executionStages: [...]}} with multiple stages (IXSCAN, FETCH, SORT). Why does explain show fast but production is slow?
Explain("executionStats") executes the full query, but it runs one query in isolation against a warm cache—production runs 100 at once. Under that concurrency you hit different bottlenecks: (1) Queueing: WiredTiger caps concurrent operations with read/write tickets; once tickets are exhausted, operations queue; (2) Cache eviction: the combined working set (50K docs per query × 100 queries, up to 5M docs) doesn't fit in the WiredTiger cache, so pages are evicted aggressively and re-read from disk; (3) CPU saturation: 100 queries each doing an in-memory SORT exhausts CPU; (4) Network saturation: returning 50K documents × 100 concurrent connections saturates bandwidth.
Investigation: (1) Run `db.currentOp()` to see all active operations. Check secs_running (and microsecs_running)—operations consistently running >5 seconds indicate queueing; (2) Monitor `db.serverStatus().wiredTiger.cache` for eviction activity and pages read into cache; (3) Check CPU usage—if pegged at 100%, CPU is the bottleneck; (4) Enable the profiler: `db.setProfilingLevel(1, {slowms: 1000})` logs queries taking >1s, then `db.system.profile.find().sort({millis: -1}).limit(10)` shows the slowest ones.
Fixes: (1) Reduce result set size: add a limit to return 1,000 docs instead of 50K; (2) Move the SORT out of MongoDB: return unsorted docs quickly and sort in application code; (3) Use a read preference such as secondaryPreferred to spread reads across replica set members (accepting possible replication lag).
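The currentOp check in step (1) can be scripted. A sketch of summarizing `db.currentOp().inprog` to spot queueing: the entry fields (`op`, `ns`, `secs_running`) match real currentOp documents, but the sample data and the 5-second threshold are fabricated for illustration.

```javascript
// Return operations that have been running at least thresholdSecs.
function longRunningOps(inprog, thresholdSecs = 5) {
  return inprog
    .filter((op) => (op.secs_running ?? 0) >= thresholdSecs)
    .map((op) => ({ ns: op.ns, op: op.op, secs: op.secs_running }));
}

// Hypothetical snapshot of db.currentOp().inprog under load.
const sample = [
  { op: "query", ns: "shop.orders", secs_running: 12 },
  { op: "query", ns: "shop.orders", secs_running: 1 },
  { op: "getmore", ns: "shop.orders", secs_running: 7 },
];
console.log(longRunningOps(sample));
// Two operations at >=5s here would be a sign of queueing, not slow plans.
```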
Follow-up: How would you design a query benchmarking system to catch performance regressions before they hit production under load?
You inherit a legacy application with 50 queries accessing the users collection. For performance, you want to create one "god index" on all 10 fields involved in any query predicate: {email: 1, status: 1, country: 1, age: 1, role: 1, lastLogin: 1, ...}. The idea is that any query on any combination of these fields can then use the index. However, after creating this index, you notice some queries are actually slower than before (using COLLSCAN). Why would a compound index with all fields make some queries slower?
Compound indexes have strict field ordering (the prefix rule, often paired with the ESR—Equality, Sort, Range—guideline). Your 10-field index is ordered left-to-right: {email: 1, status: 1, country: 1, ...}. MongoDB can use the index efficiently only if the query's predicates constrain a left-most prefix of the index fields.
Example: the query `db.users.find({age: 30, country: "US"})` constrains fields from the middle of the index, so there is no usable prefix. MongoDB would have to scan the entire index and then fetch every candidate document—often slower than a straight COLLSCAN, since a full index scan plus random document fetches costs more than one sequential collection scan. So the planner chooses COLLSCAN instead.
Solution: Create multiple focused indexes instead of one "god index". (1) Profile all 50 queries: identify which fields each uses in its filter; (2) Group by common patterns: if 20 queries filter by {email, ...}, create {email: 1}; if 15 queries filter by {status, country, ...}, create {status: 1, country: 1}; (3) Accept COLLSCAN for queries that are rare or return small result sets.
Index design rule: (1) Identify top 80% of queries (by frequency/importance); (2) Create indexes to support them; (3) Let remaining 20% use COLLSCAN if they're infrequent. Don't create indexes for every possible query combination.
Verify: Run explain on each of your 50 queries with the god index. If any show COLLSCAN (or IXSCAN with huge keysExamined vs nReturned), remove god index and create specific indexes.
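The prefix rule behind this answer can be sketched as a small check: a query's equality predicates must cover a left-most prefix of the index keys. This is a simplification (the real planner also handles ranges, sorts, and multikey cases), offered only to make the left-to-right constraint concrete.

```javascript
// Simplified prefix-rule check: can a set of equality predicates use
// a compound index efficiently? True only if the queried fields form a
// contiguous left-most prefix of the index's key list.
function usesIndexPrefix(queryFields, indexFields) {
  const wanted = new Set(queryFields);
  let covered = 0;
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // prefix ends at first unconstrained key
    covered++;
  }
  return covered === wanted.size;
}

const godIndex = ["email", "status", "country", "age", "role", "lastLogin"];
console.log(usesIndexPrefix(["email", "status"], godIndex)); // true: prefix
console.log(usesIndexPrefix(["age", "country"], godIndex));  // false: mid-index
```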
Follow-up: Design an automated system to analyze 100+ queries and recommend the minimal set of indexes to optimize them all.
You run an explain on a find query with sort and see: "executionStages": {stage: "SORT", input: {stage: "IXSCAN"}}. This means MongoDB uses IXSCAN to find matching documents, then sorts results in-memory (SORT stage). The SORT is consuming CPU and memory. Your index is {status: 1, createdAt: 1} and query is: `find({status: "active"}).sort({amount: -1})`. Why isn't the index used for sorting?
The index {status: 1, createdAt: 1} is used for filtering on status but can't be used for sorting by amount (amount isn't in the index). To use index for sorting, the sort field must be in the index in the correct position: {status: 1, amount: -1}.
MongoDB can use an index for a sort only if: (1) the sort fields follow the equality-filter fields as a contiguous run in the index; (2) the sort directions match the index directions exactly, or are their exact reverse (the index can be walked backward)—e.g., {status: 1, amount: -1} supports both sort({amount: -1}) and sort({amount: 1}) when status is an equality match.
Fix: Change index to {status: 1, amount: -1} to support both filter and sort. Query becomes: find({status: "active"}).sort({amount: -1}) will now use index for both filtering and sorting—no SORT stage.
Verify: Run explain on the modified query. The winning plan should show FETCH over IXSCAN with no SORT stage. If the SORT stage disappears, sorting is handled by the index—good.
Alternative: if you can't change index (other queries depend on createdAt), move SORT to application code after fetch. Trade MongoDB CPU for application CPU (usually cheaper to scale application tier).
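The two rules above can be captured in a checker. A sketch under stated assumptions: equality predicates consume a left-most prefix, the sort keys must follow contiguously, and directions must match the index pattern or its exact reverse. The real planner handles more cases (multikey, ranges), so treat this as an approximation.

```javascript
// Can an index satisfy a sort without an in-memory SORT stage?
function indexCoversSort(index, sort, equalityFields = []) {
  const keys = Object.keys(index);
  const sortKeys = Object.keys(sort);
  // Equality fields must be the left-most prefix of the index.
  if (!equalityFields.every((f, i) => keys[i] === f)) return false;
  // Sort keys must immediately follow, in order.
  const tail = keys.slice(equalityFields.length, equalityFields.length + sortKeys.length);
  if (tail.length !== sortKeys.length || !tail.every((k, i) => k === sortKeys[i])) return false;
  // Directions must all match the index, or all be inverted (backward walk).
  const forward = sortKeys.every((k) => sort[k] === index[k]);
  const backward = sortKeys.every((k) => sort[k] === -index[k]);
  return forward || backward;
}

console.log(indexCoversSort({ status: 1, createdAt: 1 }, { amount: -1 }, ["status"])); // false
console.log(indexCoversSort({ status: 1, amount: -1 }, { amount: -1 }, ["status"]));   // true
```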
Follow-up: If you have 5 different queries, each sorting by different fields, can you create one index to support all sorts efficiently? Why or why not?
You notice a query that returns 5K documents but scans 1M documents (totalDocsExamined >> nReturned). The index exists but appears inefficient. Explain shows: "executionStages": {stage: "IXSCAN", "keysExamined": 1M, "nReturned": 5K}. This means MongoDB is scanning all 1M index keys but only returning 5K. The query filter looks selective: `{userId: "123", status: {$in: ["completed", "pending", "shipped"]}}`. Why is the index not selective enough?
keysExamined (1M) far exceeding nReturned (5K) with this filter usually means the status predicate is not being applied from the index. If the index is only {userId: 1} and userId "123" matches 1M documents, MongoDB scans all 1M index keys for that user, fetches each document, and filters on status afterwards—only 5K survive. The index narrows on userId but does nothing for status.
Better index: {userId: 1, status: 1}. Note that $in is index-optimizable: against the compound index, MongoDB expands {$in: ["completed", "pending", "shipped"]} into one bounded scan per value, so keysExamined drops to roughly nReturned.
Optimization: (1) Create the compound index {userId: 1, status: 1} so each $in value becomes a bounded index scan; (2) If one status dominates (e.g., "completed" is 80% of matches), split the query: run `find({userId: "123", status: "completed"})` first, then fetch the remaining statuses separately in application code; (3) Accept the inefficiency if latency is acceptable (e.g., if the query completes in 100ms, leave it); (4) Check userId's selectivity: if userId "123" really matches 1M documents, reconsider the data model—no index makes an unselective equality predicate selective.
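The difference between the single-field and compound index can be made concrete with a toy cost model. All counts below are hypothetical, and the model is deliberately simplified: with {userId: 1} every key for the user is scanned, while {userId: 1, status: 1} turns the $in into one bounded scan per status value.

```javascript
// Estimate keysExamined for the filter
// {userId: "123", status: {$in: ["completed", "pending", "shipped"]}}
// under two candidate indexes. userDocsByStatus maps status -> doc count
// for this user (hypothetical distribution).
function estimateKeysExamined(indexFields, userDocsByStatus) {
  const total = Object.values(userDocsByStatus).reduce((a, b) => a + b, 0);
  const wanted = ["completed", "pending", "shipped"];
  if (indexFields.length === 1) return total; // {userId: 1}: scan every key for the user
  // {userId: 1, status: 1}: one bounded scan per $in value
  return wanted.reduce((sum, s) => sum + (userDocsByStatus[s] ?? 0), 0);
}

// 1M docs for userId "123", only 5K in the wanted statuses.
const dist = { completed: 2000, pending: 2000, shipped: 1000, archived: 995000 };
console.log(estimateKeysExamined(["userId"], dist));           // 1000000
console.log(estimateKeysExamined(["userId", "status"], dist)); // 5000
```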
Follow-up: How would you redesign the schema or query to make status filtering more index-optimizable?
You have a query that uses a multi-key index on an array field: `{tags: [...]}` with index {tags: 1}. Query: `find({tags: {$all: ["urgent", "customer-facing"]}})`. Explain shows: "executionStats": {keysExamined: 100K, nReturned: 500, totalDocsExamined: 500}. This means MongoDB examined 100K index keys but only 500 documents and returned 500. Why are keysExamined so much higher than doc count?
Multi-key indexes create one index entry per array element. Document {_id: 1, tags: ["urgent", "customer-facing", "bug"]} creates 3 index entries. Query {tags: {$all: ["urgent", "customer-facing"]}} means "find docs with both tags present".
With $all, the planner typically chooses index bounds for just one of the values (say "urgent") and applies the remaining predicates as filters. A multikey index also stores one key per array element, so a single document can be encountered under several keys, and keysExamined counts every encounter. Here, scanning those bounds plus duplicate entries examined ~100K keys to identify the 500 matching documents.
This pattern is normal for multikey indexes with $all—keysExamined >> nReturned isn't necessarily bad; it reflects the multikey structure. To verify efficiency: (1) Compare nReturned vs totalDocsExamined—they should be close (500 vs 500 is good); if totalDocsExamined >> nReturned, real inefficiency exists at the fetch level; (2) Judge whether keysExamined is reasonable: 100K keys for 500 results is acceptable if latency meets your target, but it signals that the value chosen for the index bounds ("urgent") is not very selective.
Optimization if slow: (1) Use exact tag match (single tag): if queries usually filter one specific tag first, query {tags: "urgent"} uses index more efficiently; (2) Pre-compute frequently-used tag combinations: add field {tagCombos: ["urgent+customer-facing"]} and index that instead of array index; (3) Use search engine: switch to Atlas Search for complex multi-tag queries.
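Option (2) above—pre-computing tag combinations—can be sketched at write time. The field name `tagCombos`, the `+` separator, and the "hot tags" set are illustrative assumptions, not an established convention: the point is that a sorted, materialized pair becomes a single scalar value that an equality match can find with one bounded scan.

```javascript
// Materialize sorted pairs of "hot" tags into scalar combo strings.
// Stored on the document alongside tags; indexed with {tagCombos: 1}.
function tagCombos(tags, hotTags) {
  const hot = tags.filter((t) => hotTags.has(t)).sort(); // canonical order
  const combos = [];
  for (let i = 0; i < hot.length; i++)
    for (let j = i + 1; j < hot.length; j++)
      combos.push(`${hot[i]}+${hot[j]}`);
  return combos;
}

const hot = new Set(["urgent", "customer-facing", "vip"]);
console.log(tagCombos(["urgent", "customer-facing", "bug"], hot));
// -> ["customer-facing+urgent"]; the $all query becomes
// find({tagCombos: "customer-facing+urgent"}) — one bounded scan.
```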
Follow-up: How would you optimize queries on large tag arrays (documents with 100+ tags, searching for multi-tag combinations)?
You notice a query that sometimes is fast (100ms) and sometimes slow (5000ms), with same parameters and data. Running explain multiple times shows different execution plans: sometimes uses index A, sometimes index B, sometimes COLLSCAN. Your indexes are: {userId: 1} and {status: 1}, and query is: `find({userId: "123", status: "active"})`. Why is MongoDB choosing different plans?
MongoDB's query planner trials candidate plans: when multiple indexes could serve a query, it runs the candidates for a short evaluation period, picks the one that produces results fastest, and caches it for that query shape. If the cached plan's performance later degrades (data distribution shifts, or the plan does far more work than it did during the trial), MongoDB re-plans—possibly picking a different winner.
With your two indexes {userId: 1} and {status: 1}, MongoDB might choose either depending on: (1) Data distribution: if userId is very selective (100 docs) but status matches 1M docs, the userId index wins; if the distribution shifts, the status index might win a later trial; (2) Cache state: if the userId index is resident in memory and the status index isn't, the userId index can win the trial on cache hits alone, even if it examines more keys; (3) Plan cache churn: a cached plan is re-evaluated when its measured work exceeds the baseline from its trial, and invalidated by index builds/drops—each re-planning can land on a different winner.
Fix: (1) Create a compound index {userId: 1, status: 1} that satisfies both predicates together, removing the ambiguity—MongoDB will use it consistently; (2) Hint the index: `.find({userId: "123", status: "active"}).hint({userId: 1, status: 1})` forces a specific index (at the cost of flexibility if the data changes); (3) Accept the variability if latency is acceptable (100ms vs 5s is bad, but 200ms vs 300ms is fine).
Debug: Run explain with `executionStats` multiple times and collect execution plans. If they differ, check MongoDB version (older versions have worse plan selection) and consider upgrading or implementing compound index.
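The plan-collection step above can feed a simple plan-change detector: derive a signature from the winningPlan's stage/index shape and alert when it changes for the same query shape. The explain fields used (`stage`, `inputStage`, `indexName`) are real; the in-memory Map and the query-shape key format are illustrative assumptions.

```javascript
// Build a compact signature of a winningPlan tree from explain output.
function planSignature(stage) {
  if (!stage) return "";
  const name = stage.indexName ? `${stage.stage}(${stage.indexName})` : stage.stage;
  return stage.inputStage ? `${name}<-${planSignature(stage.inputStage)}` : name;
}

// Alert (return true) when the plan for a query shape differs from last seen.
const seen = new Map();
function checkPlan(queryShape, winningPlan) {
  const sig = planSignature(winningPlan);
  const prev = seen.get(queryShape);
  seen.set(queryShape, sig);
  return prev !== undefined && prev !== sig;
}

const planA = { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "userId_1" } };
const planB = { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1" } };
console.log(checkPlan("find:users:userId,status", planA)); // false: first sighting
console.log(checkPlan("find:users:userId,status", planB)); // true: plan changed
```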
Follow-up: How would you implement a query telemetry system to detect and alert on plan changes?
You have a slow aggregation pipeline: `db.orders.aggregate([{$match: {status: "completed"}}, {$lookup: {from: "customers", localField: "customerId", foreignField: "_id", as: "customer"}}, {$sort: {amount: -1}}, {$limit: 100}])`. Explain shows: "executionStats": {totalDocsExamined: 10M, nReturned: 100} with stages showing $match uses COLLSCAN, then $lookup for all 10M matched docs. Why isn't the aggregation optimized?
The issue: $match has no index on status, so it collection-scans all 10M docs. Then $lookup runs once per matched order (10M lookups against customers). Even though the final result is 100 docs (after $limit), MongoDB processes every matched doc first, because $sort is a blocking stage: it needs all its input before it can emit the top 100.
MongoDB's optimizer does coalesce $sort + $limit into a top-100 in-memory sort (it never materializes the full sorted set), but that doesn't help here: the sort runs after $lookup, so every matched document still pays for the join before the limit can discard it.
Optimizations: (1) Add an index on status: `db.orders.createIndex({status: 1})` makes $match use IXSCAN; better yet, {status: 1, amount: -1} also covers the sort; (2) Reorder the pipeline: amount is a field of orders (not of the joined customer), so $sort and $limit can move before $lookup without changing the result: `[{$match: {status: "completed"}}, {$sort: {amount: -1}}, {$limit: 100}, {$lookup: ...}]`—now only 100 documents are joined; (3) If the sort keyed on a looked-up field instead, that reordering would be invalid and you'd need another approach (e.g., denormalize the sort key onto orders); (4) Batch lookups: split into smaller batches and join in application code.
Recommended: create {status: 1, amount: -1} and reorder to $match → $sort → $limit → $lookup. Verify with explain that $match uses IXSCAN, no in-memory SORT stage appears, and $lookup processes only 100 docs.
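A reordered version of this pipeline, written as a plain array (mongosh/driver syntax). Because `amount` lives on orders rather than on the joined customer, sorting and limiting before $lookup preserves the original semantics while joining only 100 documents; the supporting index {status: 1, amount: -1} is an assumption carried over from the recommendation.

```javascript
// Reordered aggregation: match -> sort -> limit -> lookup.
const pipeline = [
  { $match: { status: "completed" } },
  { $sort: { amount: -1 } }, // satisfied by index {status: 1, amount: -1}
  { $limit: 100 },           // join only the top 100 orders
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customer",
    },
  },
];
// db.orders.aggregate(pipeline) — $lookup now runs 100 times, not 10M.
console.log(pipeline.map((s) => Object.keys(s)[0]).join(" -> "));
// -> $match -> $sort -> $limit -> $lookup
```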
Follow-up: Design an aggregation pipeline analyzer that recommends stage reordering and index creation for performance.