Python Interview Questions

Generators, Iterators, and yield from


A data pipeline yields 1M items/sec through chained generators: producer -> transformer -> consumer. Context switches between generators cause CPU overhead. How do you optimize?

Generators context-switch on every `yield` (roughly a microsecond per suspension/resumption). At 1M yields/sec, switching dominates. Solutions: (1) batch processing—yield lists of ~100 items instead of singles (cuts switches 100x). (2) use `yield from` (PEP 380)—CPython forwards values through the delegating generator in C rather than via a Python-level loop; it does not eliminate the subgenerator's frame, but it reduces per-item delegation cost. (3) profile with `py-spy` to confirm switching is actually the bottleneck. (4) inline the loops if the data fits in memory—no yields, no switching. (5) use numpy for bulk operations—the work happens in C with no per-item Python switching. Measure with `timeit`: single-yield vs batched. Batches of 10-100 items typically balance latency against throughput. For true streaming workloads generators are still the right tool; batching is usually the biggest single win.
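The batching idea in (1) can be sketched as follows; `batched_producer` and `consume_batched` are hypothetical names for illustration:

```python
import itertools

def producer(n):
    """One item per yield: one generator suspension per item."""
    for i in range(n):
        yield i

def batched_producer(n, batch_size=100):
    """One list per yield: one suspension per batch_size items."""
    it = iter(range(n))
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

def consume_batched(gen):
    """The inner loop over a batch is a plain for loop, no suspension."""
    total = 0
    for batch in gen:
        for item in batch:
            total += item
    return total
```

Timing both variants with `timeit` on your own data sizes is the only reliable way to pick a batch size.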

Follow-up: How do you implement a generator that supports both pull (consumer next()) and push (producer feeds)?

A generator function uses `yield` inside a try/except. Generator cleanup on exception is unreliable—`finally` blocks sometimes don't execute. How do you ensure cleanup?

Generator cleanup is subtle: a `finally` inside the generator runs when the generator is exhausted, explicitly closed, or garbage-collected (GC calls `close()`, which raises `GeneratorExit` at the paused `yield`). The catch is timing: GC-driven cleanup is nondeterministic and may not run at all at interpreter shutdown. Solutions: (1) wrap the generator with `contextlib.closing` (or build it with `contextlib.contextmanager`)—cleanup happens via the context-manager protocol at block exit. (2) explicitly call `generator.close()` in the consumer's `finally` block: `try: ... finally: gen.close()`. (3) put `try/finally` inside the generator body—the cleanup code runs on close, exhaustion, or a propagating exception. (4) implement `__del__` only as a last resort—finalizers can be delayed indefinitely. Best: `try/finally` in the generator plus an explicit `close()` (or `closing()`) in the consumer guarantees deterministic cleanup.
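A minimal sketch combining `try/finally` inside the generator with `contextlib.closing` in the consumer, so cleanup runs even when the consumer stops early (names are illustrative):

```python
from contextlib import closing

events = []

def tracked_gen(log):
    try:
        yield 1
        yield 2
    finally:
        # Runs on exhaustion, on close(), or when GeneratorExit propagates.
        log.append("cleaned")

with closing(tracked_gen(events)) as gen:
    first = next(gen)  # consume one item, then abandon the generator

# Leaving the with-block called gen.close(), which raised GeneratorExit
# at the paused yield, so the finally block ran deterministically.
```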

Follow-up: How do you implement generator cleanup that works across multiple levels of nesting?

You use `itertools.chain()` to concatenate 100 generators. After 10M iterations, memory is stable but CPU is high—profile shows time in chain bookkeeping. How do you optimize?

`chain()` adds per-item overhead: each `next()` dispatches to whichever sub-generator is currently active. Solutions: (1) merge generators at the source—combine the logic of the 100 generators into one instead of chaining them. (2) use `itertools.chain.from_iterable()` on the list of generators—slightly more efficient than `chain(*gens)` and it avoids materializing the argument tuple. (3) profile to confirm chain is actually the culprit (e.g. `py-spy record` against the process and inspect the flamegraph). (4) batch chain outputs—collect N items, yield the batch; this divides the per-item dispatch cost by N. Measure: time the chain in isolation to quantify its overhead. For 100 generators * 10M items, if chain adds only ~5% overhead, this optimization is low-priority.
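A sketch of (2) and (4): `chain.from_iterable` over a list of generators, plus a small hypothetical `batched` helper to amortize per-item dispatch:

```python
from itertools import chain, islice

def gen(start):
    yield from range(start, start + 3)

gens = [gen(i * 10) for i in range(3)]

# from_iterable takes the iterable of generators directly, avoiding the
# chain(*gens) argument tuple and advancing sub-iterators in C.
merged = chain.from_iterable(gens)

def batched(iterable, n):
    """Group an iterable into lists of up to n items."""
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch
```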

Follow-up: How do you implement a merged generator that efficiently handles generators of different speeds?

A generator produces 1k items/sec. Consumer is slow (100 items/sec). Generator queues fill, consuming memory unboundedly. How do you implement backpressure?

Without backpressure, a fast producer outruns a slow consumer and queued items accumulate without bound. Solutions: (1) use `queue.Queue` with `maxsize`—`put()` blocks when the queue is full, throttling the producer. (2) use `asyncio.Queue` with `await queue.put()`—cooperative, non-blocking backpressure. (3) use multiprocessing with a bounded queue. (4) rate-limit the producer, e.g. `time.sleep()` between yields—crude but effective. (5) prefer a pull-based architecture—the consumer calls `next()` on the producer, so a plain generator chain is naturally backpressured; unbounded growth only appears once you add push-style buffering between them. Best: asyncio or a thread pool with a bounded queue. Measure queue depth over time; unbounded growth means backpressure is missing.
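A thread-based sketch of (1): a bounded `queue.Queue` makes `put()` block whenever the consumer falls behind (function and sentinel names are made up):

```python
import queue
import threading

def run_pipeline(n_items, maxsize=10):
    q = queue.Queue(maxsize=maxsize)  # bounded: put() blocks when full
    results = []
    SENTINEL = object()

    def producer():
        for i in range(n_items):
            q.put(i)          # blocks when queue is full -> backpressure
        q.put(SENTINEL)       # signal end of stream

    def consumer():
        while True:
            item = q.get()
            if item is SENTINEL:
                break
            results.append(item * 2)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `maxsize=10`, the producer can never be more than ten items ahead, so memory stays bounded regardless of the speed mismatch.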

Follow-up: How do you handle backpressure when producer and consumer are in different processes?

You use `itertools.tee()` to split a generator into N independent consumers. After consuming 1M items, memory grows 100MB per tee'd generator. Is this normal?

`tee()` buffers items so independent consumers can iterate at their own pace: everything the fastest consumer has seen but the slowest has not is held in memory. Solutions: (1) if consumers don't actually need independent iteration, don't use tee—consume once and store the results. (2) if tee is necessary, keep consumers advancing at a similar rate—if one lags by a million items, the buffer holds a million items. (3) note that tee's `n` parameter sets the number of returned iterators, not a buffer limit—tee itself places no bound on buffered items. (4) use external storage (bounded queue, file, database) instead of tee when consumers diverge. Measure memory during tee usage: 100MB over 1M items is ~100 bytes per item, typical for small Python objects; if it grows without bound, one consumer has stalled.
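A sketch of the "similar rate" advice in (2): advancing all tee'd copies in lockstep keeps tee's internal buffer at O(1). The `lockstep` helper is hypothetical; each "consumer" is modeled as a per-item function:

```python
from itertools import tee

def lockstep(iterable, *funcs):
    """Apply each func to every item, advancing all tee'd iterators
    together so the tee buffer never accumulates a backlog."""
    iters = tee(iterable, len(funcs))
    for items in zip(*iters):
        yield tuple(f(x) for f, x in zip(funcs, items))

rows = list(lockstep(range(4), lambda x: x * 2, lambda x: x * x))
```

If the consumers cannot be driven in lockstep (e.g. they live in different threads), a bounded queue per consumer is the safer pattern.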

Follow-up: How do you implement efficient tee for multiple consumers without exponential memory growth?

A generator filters items (1M items/sec). Performance test shows 40% CPU time is in the generator function itself. Flamegraph shows time in `yield` statement. Can you optimize yield overhead?

Yield overhead (frame save/restore) is intrinsic to generators; you can't optimize the `yield` statement itself. Solutions: (1) use a list comprehension instead if the data fits in memory—no per-item suspension. (2) move complex logic out of the generator—keep the hot predicate in one tight loop rather than split across generator layers. (3) use numpy/pandas for bulk operations—compiled code. (4) implement the hot path in Cython or a C extension—compiled iteration can be an order of magnitude faster. (5) profile more carefully—"time in yield" in a flamegraph often includes the filter logic executed on the same line or just before it. Use `py-spy` to sample actual hot lines. With 40% attributed to yield plus filtering, the filter predicate is the more likely cost. Measure: time the filter independently; if the logic is 95% of the time, optimize the logic.
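One quick way to separate suspension overhead from predicate cost, in the spirit of (5): time the same filter as a generator and as a list comprehension. A rough sketch; the trivial predicate here stands in for the real one:

```python
import timeit

data = list(range(10_000))

def gen_filter(items):
    for x in items:
        if x % 2 == 0:   # stand-in for the real, possibly expensive predicate
            yield x

def list_filter(items):
    return [x for x in items if x % 2 == 0]

t_gen = timeit.timeit(lambda: list(gen_filter(data)), number=50)
t_list = timeit.timeit(lambda: list_filter(data), number=50)
# If t_gen is much larger than t_list, per-item suspension is the cost;
# if they are close, the predicate dominates and yield is not the problem.
```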

Follow-up: How do you profile generator efficiency to distinguish between yield overhead and application logic?

You implement a generator that reads from a file in 1MB chunks. After reading 10k chunks (10GB file), seek position is correct, but generator sometimes returns duplicate chunks. How do you debug?

Duplicate chunks almost always mean the file position moved between resumptions: the generator yields, something else seeks (or reads from) the same handle, and the next `read()` starts at a stale offset. Solutions: (1) log chunk offsets—record `f.tell()` before each read along with a hash of the chunk; identical data at different logged offsets confirms a seek bug. (2) give the generator exclusive ownership of the file handle—no other code should seek or read it between yields. (3) use a context manager for the handle: `with open(path, 'rb') as f: for chunk in generator(f): ...` ensures deterministic cleanup. (4) test edge cases: file size not a multiple of the chunk size, seek at EOF, an error mid-chunk. Generators are safe for file reading as long as nothing else disturbs the handle's position. Test: read the same file twice and verify the chunk sequences match.
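A sketch of the offset-logging idea in (1): yield `(offset, chunk)` pairs so a repeated or out-of-order offset immediately flags a seek problem. `chunked` is a hypothetical helper, and `BytesIO` stands in for a real file:

```python
import io

def chunked(f, chunk_size):
    """Yield (offset, chunk) pairs; offsets must be strictly increasing."""
    while True:
        offset = f.tell()          # record position before each read
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield offset, chunk

buf = io.BytesIO(b"abcdefghij")
chunks = list(chunked(buf, 4))
offsets = [off for off, _ in chunks]
# A duplicate or out-of-order offset here would confirm a seek bug.
assert offsets == sorted(set(offsets))
```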

Follow-up: How do you safely share a file handle across multiple generators without seek conflicts?

A recursive generator traverses a tree (1M nodes). At depth 100, recursion limit is hit. Using `sys.setrecursionlimit()` is risky. How do you safely handle deep recursion?

The recursion limit (default 1000) guards against C-stack overflow; raising it blindly is risky. Solutions: (1) convert to iteration with an explicit stack—keep a list (or `collections.deque`) of pending nodes and pop/process instead of recursing. (2) note that `yield from` (PEP 380) does not help here: each delegation level still holds a generator frame, so a recursive generator built on `yield from` hits the same limit. (3) bound tree depth at the application level if possible. (4) raise the limit carefully—`sys.setrecursionlimit(5000)` only if the maximum depth is known and the OS stack is large enough (check `resource.getrlimit(resource.RLIMIT_STACK)` on Unix). Best: the iterative approach is the most robust. Test the generator at various tree depths, including degenerate linked-list-shaped trees.
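A sketch of (1): pre-order traversal with an explicit stack, safe at any depth. The dict-based node shape is illustrative:

```python
def iter_tree(root):
    """Pre-order DFS without recursion: depth is limited by list size,
    not by the interpreter's recursion limit."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node["value"]
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node["children"]))

# Degenerate 10,000-level-deep tree: a recursive generator would hit the limit.
root = {"value": 0, "children": []}
node = root
for i in range(1, 10_000):
    child = {"value": i, "children": []}
    node["children"].append(child)
    node = child

values = list(iter_tree(root))
```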

Follow-up: How do you convert recursive generators to iterative while preserving output order?
