Python Interview Questions

Memory Management and Garbage Collection

A production service processes 10M objects/hour. Memory grows from 500MB to 2GB over 6 hours, then stabilizes. GC collection times spike from 50ms to 500ms during the memory bump. What's the GC behavior and how do you optimize?

CPython uses generational garbage collection: new objects start in generation 0 and survivors are promoted to Gen1, then Gen2. Gen0 is collected most often (by default after 700 net object allocations), Gen1 after every 10 Gen0 collections, and Gen2 after every 10 Gen1 collections. Your pattern shows Gen2 collection kicking in as the heap grows: a Gen2 collection scans every tracked object, so the pause scales with the 2GB heap, hence the 500ms spike. Solutions: (1) tune `gc.set_threshold()`: the default is `(700, 10, 10)`; raise the thresholds to reduce collection frequency, at the cost of higher memory transients. (2) disable automatic GC during critical sections: `gc.disable(); ... critical work ...; gc.collect(); gc.enable()`. (3) use `gc.collect(0)` to collect only the youngest generation rather than the whole heap. (4) reduce object churn: cache objects, reuse allocations (e.g., list pooling), use `__slots__` to reduce per-object overhead. Measure with `gc.get_stats()` for per-generation collection counts and `tracemalloc.take_snapshot()` for allocation hotspots; profile with `cProfile` + `pstats` to identify which code creates the most objects. For latency-critical apps, consider pause-reduction strategies: call `gc.freeze()` after startup so long-lived objects are excluded from collection, or schedule full collections explicitly during quiet periods.
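A minimal sketch of the threshold tuning and disable/collect pattern described above (the numbers are illustrative, not recommendations):

```python
import gc

# Inspect current thresholds and per-generation allocation counts.
print(gc.get_threshold())   # default: (700, 10, 10)
print(gc.get_count())

# Raise thresholds: fewer, later collections, higher memory transients.
gc.set_threshold(50_000, 20, 20)

# Disable automatic collection around a latency-critical section, then
# collect explicitly once a pause is acceptable again.
gc.disable()
try:
    pass  # ... latency-critical work ...
finally:
    freed = gc.collect()    # full collection; gc.collect(0) is Gen0 only
    gc.enable()
print("unreachable objects found:", freed)
```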

Follow-up: If you disable GC to reduce pauses, how do you guarantee garbage is eventually collected without creating unbounded growth?

A long-running API service shows memory leak: RSS grows from 100MB to 800MB over 48 hours. Heap analysis shows all memory is "reachable" (not garbage), but you can't find a reference cycle. What's likely holding the memory?

If memory is reachable but not obviously leaked, check: (1) global caches: decorators, singletons, or module-level dicts that accumulate entries without eviction. `@functools.lru_cache(maxsize=128)` caps entries; without `maxsize` it grows indefinitely. Use `sys.getsizeof()` (shallow size only) or `tracemalloc` to measure cache growth. (2) module-level state: modules live in `sys.modules` for the life of the process, so anything accumulated at module scope is never released. (3) C extensions holding buffers: external libraries (numpy, protobuf) may retain memory after use, invisible to the Python heap. (4) thread-local storage: `threading.local()` data persists per thread; if threads are created and destroyed frequently, stale thread-local data accumulates. Debug: use `gc.get_referrers(obj)` to trace what holds references, and `objgraph.show_most_common_types()` to see which types are growing. Use `memory_profiler` to track memory per line: add the `@profile` decorator to suspect functions and run with `python -m memory_profiler script.py`. Check whether GC is even running: `gc.get_stats()` shows per-generation collection counts; if the Gen2 count never changes, full collections aren't happening. Force one with `gc.collect()` and measure the memory delta. For caches, use bounded structures: `collections.OrderedDict` plus manual eviction, or `cachetools.TTLCache` for time-based expiry.
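A sketch of one of the bounded-cache options mentioned above: `collections.OrderedDict` with manual LRU eviction (the class name and cap are illustrative):

```python
from collections import OrderedDict

class BoundedCache:
    """LRU cache with a hard entry cap, so it cannot grow without limit."""

    def __init__(self, maxsize=1024):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        try:
            self._data.move_to_end(key)      # mark as most recently used
        except KeyError:
            return default
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)   # evict least recently used

cache = BoundedCache(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)                            # evicts "a"
print(cache.get("a"), cache.get("c"))        # prints: None 3
```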

Follow-up: If a reference cycle involves objects from C extensions, how do you debug and break the cycle?

You migrate a batch job from sync Python (8 workers, 1.5GB memory each = 12GB total) to async with 1000 concurrent coroutines. Total memory is now 3GB. All coroutines are "parked" waiting for I/O. Why is async so much more memory-efficient, and what's the memory floor?

Async coroutines are lightweight: roughly 1-2KB per suspended coroutine, versus ~8MB of default stack per OS thread, and versus a full interpreter plus duplicated application state per worker process (your 1.5GB each). 1000 coroutines * ~2KB = ~2MB of framework overhead; everything else is application data shared within one process. Memory floor: the asyncio event loop machinery is tiny (tens of KB) and suspended coroutine frames total a megabyte or two, so the floor is dominated by per-operation application state (DB connection buffers, HTTP response buffers). Most memory is application-level, not framework. Risks with high coroutine counts: (1) scheduler overhead: managing thousands of tasks has a cost, and throughput typically peaks somewhere between 10k and 100k coroutines depending on the work, (2) event-loop latency: everything shares one thread, so if one coroutine is CPU-bound, all 1000 stall. Test scaling: measure throughput at 10, 100, 1000, and 10k coroutines to find the sweet spot. Measure memory with `tracemalloc.take_snapshot().statistics('lineno')` to see top allocators; most should be application data, not framework. If memory grows unbounded with coroutine count, check for leaked tasks (`create_task()` results never awaited or even referenced), frames pinned by stored exception tracebacks, or application-level buffers not being freed.
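A rough way to observe the per-coroutine cost yourself, using `tracemalloc` around a batch of parked tasks (the exact byte figure varies by Python version):

```python
import asyncio
import tracemalloc

async def worker(stop: asyncio.Event) -> None:
    # A "parked" coroutine: suspended on I/O, it costs only its frame
    # and task bookkeeping, not an OS thread stack.
    await stop.wait()

async def main(n: int = 1000) -> None:
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    stop = asyncio.Event()
    # Keep references to the tasks: fire-and-forget create_task() calls
    # are a common source of leaked tasks.
    tasks = [asyncio.create_task(worker(stop)) for _ in range(n)]
    await asyncio.sleep(0)          # let every task start and suspend
    after, _ = tracemalloc.get_traced_memory()
    print(f"~{(after - before) / n:.0f} bytes per parked coroutine")
    stop.set()
    await asyncio.gather(*tasks)

asyncio.run(main())
```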

Follow-up: How do you handle the case where an async coroutine throws an exception but the exception traceback pins frames in memory indefinitely?

A service uses weak references extensively: `weakref.ref(obj)` to avoid circular references. After an update, garbage collection drops from 2 events/sec to 0.5 events/sec, but heap size is stable. What changed and is this good or bad?

Reduced GC frequency likely means fewer objects are being allocated (good: less churn), or GC was inadvertently disabled (bad). Check: (1) code changes that reduce object creation (caching, pooling), (2) GC settings: someone called `gc.disable()`; verify with `gc.isenabled()`, (3) weak references clearing referents sooner, so fewer objects survive into older generations. Pros: fewer GC pauses, lower CPU. Cons: if GC was disabled entirely, cyclic garbage will accumulate and memory will eventually explode when you least expect it. Verify GC is running: `gc.get_count()` counts allocations per generation and resets when that generation is collected; if the counts only ever grow and never reset, collections aren't happening. Weak references are useful for caches, observers, etc., but they don't replace GC: objects are still reclaimed via reference counting plus cycle detection; a weakref simply doesn't keep its referent alive. If heap size is stable with lower GC frequency, excellent: you've likely improved efficiency. Monitor over time: run a one-week benchmark comparing the memory curve, pause times, and throughput. If memory drifts up over days, the collection cadence may be insufficient; lower the thresholds or add explicit `gc.collect()` calls during off-peak hours.
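Two quick checks from the answer above: confirming the collector is enabled, and confirming that a weakref does not keep its referent alive (in CPython, refcounting frees the object the moment the last strong reference goes):

```python
import gc
import weakref

class Node:
    pass

# Is automatic collection actually running?
print("GC enabled:", gc.isenabled())
print("per-generation allocation counts:", gc.get_count())

# A weak reference does not count as a strong reference.
obj = Node()
ref = weakref.ref(obj)
print(ref() is obj)      # True: referent still alive
del obj                  # last strong reference gone; CPython frees it now
print(ref())             # None: the weakref was cleared
```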

Follow-up: How do weak references interact with `__del__` finalizers, and can they cause deadlocks?

A data processing pipeline loads a 500MB DataFrame into memory, processes it, and should release it. Memory is freed, but OS RSS doesn't drop—still shows 500MB allocated. After several cycles, RSS hits process limit. How do you force OS memory reclamation?

Python's allocators don't immediately return freed memory to the OS: pymalloc keeps small-object arenas around, and the underlying C allocator (glibc malloc, jemalloc) holds freed pages in caches for reuse. When you delete the DataFrame, the heap is marked free internally, but the OS reclaims nothing until the allocator explicitly releases pages. Solutions: (1) call `gc.collect()` after deleting the object to make sure nothing cyclic is still pinning it, (2) on glibc, force trimming with `ctypes.CDLL(None).malloc_trim(0)`; this is not portable across platforms or allocators, (3) move large processing to a subprocess: spawn a worker, process the data, exit; the OS reclaims all memory when the process ends. (4) use memory-mapped files: `mmap` the data and process it without loading it into the heap. (5) process in chunks instead of loading all 500MB at once. For production: use subprocess workers for batch jobs and let them exit after the work completes. Verify: check RSS via `ps` or `resource.getrusage()` before and after `gc.collect()`. Note: modern allocators (glibc malloc, jemalloc) are tuned for throughput, not reclamation; they keep freed memory in caches. On glibc, the `MALLOC_TRIM_THRESHOLD_` environment variable controls when memory is returned. For data pipelines with repeated load/process/release cycles, subprocesses are the safest bet.
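A sketch of the subprocess-worker approach, using `multiprocessing.Pool` with `maxtasksperchild=1` so each worker exits (and the OS reclaims its entire address space) after one job; `process_chunk` and its workload are stand-ins:

```python
from multiprocessing import get_context

def process_chunk(name: str) -> int:
    # Stand-in for loading and processing one large DataFrame: all memory
    # allocated here is returned to the OS when this worker process exits.
    data = list(range(1_000_000))
    return sum(data) % 97

if __name__ == "__main__":
    ctx = get_context("spawn")      # fresh interpreter per worker
    # maxtasksperchild=1 recycles each worker after a single task, so a
    # long-lived parent never accumulates the workers' RSS.
    with ctx.Pool(processes=2, maxtasksperchild=1) as pool:
        print(pool.map(process_chunk, ["a", "b", "c", "d"]))
```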

Follow-up: If you process 1000 items in parallel via subprocesses, each using 500MB, how do you manage resource limits without swapping?

A service experiences unpredictable pauses (100-500ms) during peak traffic. Profiling with `py-spy` shows time is spent in `gc.collect()` but GC threshold is high (set to 10000 for Gen0). These massive pauses happen only at 8am when batch jobs start. Why is GC firing despite high threshold?

Several possible triggers: (1) manual `gc.collect()` calls in the batch jobs; a no-argument call always runs a full collection regardless of thresholds (search with `grep -r "gc.collect" src/` and audit), (2) the Gen1/Gen2 cadence: raising the Gen0 threshold to 10k only spaces out Gen0 collections; with the default `(_, 10, 10)`, Gen1 still fires after every 10 Gen0 collections and Gen2 after every 10 Gen1 collections, and a Gen2 collection scans the whole heap, (3) the 8am batch jobs sharply increasing the allocation rate, so even a 10k threshold is crossed over and over, (4) debug settings such as `gc.set_debug(gc.DEBUG_SAVEALL)`, which keeps otherwise-collectable garbage alive in `gc.garbage` and inflates later collections. Debug: instrument the batch window: log `gc.get_count()` and `gc.get_stats()` before and after, and enable `gc.set_debug(gc.DEBUG_STATS)` to log every collection event. Solutions: (1) remove unnecessary `gc.collect()` calls (Python collects automatically), (2) raise the Gen1/Gen2 thresholds too, e.g. `gc.set_threshold(10000, 50, 50)`, to make full collections rarer, (3) run batch jobs off-peak or at low priority, (4) use GC time windows: `gc.disable()` during peak traffic, explicit `gc.collect()` during lulls, (5) profile the batch job to reduce object churn. Then tune thresholds from the logged data rather than guesswork.
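To see exactly when collections fire and which generation they hit, you can hook `gc.callbacks` (available since Python 3.3); a minimal logger sketch:

```python
import gc
import time

_starts = {}

def gc_logger(phase, info):
    # The collector calls this with phase "start"/"stop"; info includes
    # the generation being collected and, on "stop", objects collected.
    gen = info["generation"]
    if phase == "start":
        _starts[gen] = time.perf_counter()
    else:
        ms = (time.perf_counter() - _starts.pop(gen, time.perf_counter())) * 1000
        print(f"GC gen{gen}: {ms:.2f}ms, collected={info.get('collected')}")

gc.callbacks.append(gc_logger)
gc.collect()                 # trigger one full (Gen2) collection to demo
gc.callbacks.remove(gc_logger)
```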

Follow-up: How would you design a "GC avoidance" strategy for a real-time system that can't tolerate 100ms pauses?

You're profiling a Django request handler. Each request creates ~500 temporary objects. In tests, memory is stable. In production (100 requests/sec), memory grows 50MB/hour. GC runs every second. Why doesn't production GC work?

Concurrency is the difference: in tests, requests are sequential; in production, 100 requests/sec means requests overlap. At 100 req/sec * 500 objects/req = 50k new objects/sec against a default Gen0 threshold of 700, Gen0 collection fires almost constantly, burning CPU on GC instead of work. Worse, objects that survive a collection because a concurrent request still references them get promoted to older generations, where they linger until a slow Gen1/Gen2 collection. Solutions: (1) raise the Gen0 threshold, e.g. `gc.set_threshold(10000)`, to batch collections, (2) reduce object churn per request: cache objects, reuse data structures, (3) use request-local cleanup: Django middleware that explicitly drops request-local state after the response, (4) implement object pooling for frequently-allocated types. Debug: compare GC stats between test (sequential, one request at a time) and production (concurrent); use `locust` or `ab` to simulate load in staging, and `gc.get_stats()` to see collection counts at different concurrency levels. Profile with `py-spy top` or `py-spy record` to see where production differs from tests; likely culprits are Django ORM queries creating too many objects, or middleware accumulating state. For Django specifically, check whether `connection.queries` is growing unboundedly: with `DEBUG=True`, Django keeps a list of every executed query.
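One way to run the test-vs-production comparison: a tiny, hypothetical middleware that samples GC counts around each request. It follows Django's middleware calling convention but is duck-typed, so the sketch needs no Django import:

```python
import gc

class GCCountMiddleware:
    """Logs per-generation GC allocation counts around each request, so
    churn can be compared between sequential tests and concurrent load."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        before = gc.get_count()
        response = self.get_response(request)
        after = gc.get_count()
        # Replace print() with your logger in real code.
        print(f"gc counts before={before} after={after}")
        return response

# Duck-typed usage without Django:
mw = GCCountMiddleware(lambda request: "200 OK")
print(mw("GET /"))
```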

Follow-up: If you reduce GC threshold to collect more frequently, what's the performance cost, and how do you measure it?

A service uses many small objects (e.g., namedtuples, dataclass instances). Heap fragmentation is severe: malloc reports 50MB free but process can't allocate a contiguous 10MB. Requests start failing with "out of memory" despite 100MB available. How do you fix fragmentation?

Heap fragmentation happens when the allocator can't find contiguous free space: small objects with different lifetimes leave a "Swiss cheese" heap where free blocks are too small to satisfy large requests. CPython is susceptible: pymalloc serves objects up to 512 bytes from 256KB arenas, and an arena can only be returned to the OS when it is completely empty, so a single long-lived object pins the whole arena. Solutions: (1) switch the underlying allocator to `jemalloc` or `mimalloc` (better fragmentation behavior), e.g. `LD_PRELOAD=libjemalloc.so python` on Linux, or link Python against jemalloc, (2) object pooling: pre-allocate a batch of objects up front and reuse them instead of allocating/freeing, so lifetimes stay uniform, (3) use the `array` module or `ctypes` buffers for bulk fixed-size data instead of many small Python objects; far less fragmentation, (4) tune `MALLOC_MMAP_THRESHOLD_` (glibc) so large allocations use `mmap` directly and are returned to the OS on free. Debug: profile heap usage over time with `valgrind --tool=massif`, and compare allocator-reported free memory against the largest allocation that actually succeeds; if plenty is nominally free but a 10MB request fails, fragmentation is confirmed. For services with stable working sets, object pooling is proven effective: pre-allocate at startup, hand objects out via get/put, and never delete/recreate.
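A minimal object-pool sketch (names are illustrative; this single-threaded version ignores the thread-safety concerns raised in the follow-up):

```python
from collections import deque

class ObjectPool:
    """Pre-allocates objects in one batch so lifetimes are uniform,
    avoiding the alloc/free churn that fragments the heap."""

    def __init__(self, factory, size):
        self._factory = factory
        self._free = deque(factory() for _ in range(size))

    def acquire(self):
        # Fall back to a fresh allocation if exhausted; a stricter pool
        # could block or raise instead.
        return self._free.popleft() if self._free else self._factory()

    def release(self, obj):
        self._free.append(obj)

pool = ObjectPool(lambda: bytearray(4096), size=100)
buf = pool.acquire()
buf[:5] = b"hello"                  # ... use the buffer ...
buf[:] = bytes(len(buf))            # reset before returning it
pool.release(buf)
```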

Follow-up: How do you implement a thread-safe object pool that doesn't itself become a bottleneck or leak objects?
