Large application has 500+ modules. Import time at startup is 2 seconds. Each module imports others, creating dependency DAG. How do you find and optimize import bottlenecks?
Python imports are sequential: module A imports B, which imports C; each module must be found, parsed/compiled (or its cached `.pyc` loaded), and executed. Solutions: (1) profile imports: `python -X importtime app.py 2>&1 | sort -t'|' -k2 -rn | head` sorts by the cumulative-time column; optimize the top 10. (2) lazy imports: import inside the functions that need a module, deferring the cost until first use. (3) untangle circular imports: if A imports B which imports A, Python hands B a partially initialized A (it does not load A twice; it returns the incomplete cached module)—restructure so shared code lives in a third module. (4) move heavy `__init__.py` code into functions: don't run `expensive_function()` at import time. (5) use `importlib.util.LazyLoader` to register a module whose body executes only on first attribute access. Measure: `python -X importtime app.py` logs every import with self and cumulative times (on stderr). For a 2-second startup: profile, find the top 3 slow imports, lazy-load them. Common culprits: scipy, sklearn, tensorflow (their `__init__` modules do heavy work at import time). If you use them: import inside functions.
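A minimal sketch of lazy loading with `importlib.util.LazyLoader` (this follows the recipe in the `importlib` documentation; `json` stands in here for a heavy library such as scipy):

```python
import importlib.util
import sys

def lazy_import(name):
    """Register `name` so the import returns immediately and the module
    body only executes on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"module {name!r} not found")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness; real execution is deferred
    return module

json = lazy_import("json")      # fast: module body has not run yet
print(json.dumps({"lazy": True}))  # first attribute access triggers the import
```

The trade-off is the same as function-scope imports: the cost moves from startup to first use, but call sites keep using a normal module-level name.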
Follow-up: How do you implement lazy module initialization without breaking circular dependencies?
You use lazy imports for expensive libraries inside functions: `def analyze(): import scipy`. First call to `analyze()` takes 1 second (import). Subsequent calls are instant. In production, is this acceptable?
Lazy imports defer the overhead: you pay the import cost either at startup or on the first call. For HTTP servers, a 1s first-request latency is bad for SLAs. For CLI tools, a 1s wait on first run is acceptable if infrequent. Solutions: (1) prewarm in the background: start the import on a separate thread before the first call needs it. (2) rely on caching: after the first import, Python serves the module from `sys.modules`, so subsequent calls are instant. (3) measure: if the first call to `analyze()` is rare (once per hour), 1s is acceptable; if common, preload at startup. (4) use `importlib.import_module()` to control exactly when imports happen. Best: lazy import for library-like, conditionally used code; preload for application-critical paths. For production: profile whether first-call latency actually reaches users. If fewer than 1% of requests hit a cold first call, lazy import is fine; if 10%+ do, preload or accept the startup time.
Follow-up: How do you prewarm lazy imports without blocking server startup?
Module A imports B at module level, B imports A at module level (circular import). Python loads both, but attributes are undefined (not set yet), causing AttributeError. How do you fix?
Circular imports at module level create partial initialization. When A imports B imports A, A is not fully initialized (code after `import B` hasn't run yet). Solutions: (1) defer imports to function scope: in A, define function that imports B only when needed (function calls are lazy). (2) restructure: move shared code to module C, both A and B import C (no circle). (3) import at end of module: move `import B` to bottom of A after all definitions, so A is mostly initialized when B imports A. (4) use TYPE_CHECKING guard: `if TYPE_CHECKING: import B` for type hints only (not runtime). (5) split module: if A has X and Y, move Y to separate module Y_mod, B imports Y_mod (breaks circle). Best: avoid circular imports entirely via good design. If unavoidable: defer to function scope.
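A self-contained demo of fix (1), deferring one side of the cycle to function scope. The module names `a_mod`/`b_mod` are made up, and the modules are written to a temp directory so the example runs as-is:

```python
import pathlib
import sys
import tempfile
import textwrap

tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "a_mod.py").write_text(textwrap.dedent("""
    import b_mod                # a -> b at module level

    def greet():
        return "a sees " + b_mod.name()
"""))
pathlib.Path(tmp, "b_mod.py").write_text(textwrap.dedent("""
    def name():
        # deferred import: by the time name() runs, a_mod is fully
        # initialized in sys.modules, so the cycle is harmless
        import a_mod
        return "b (and a_mod has greet: %s)" % hasattr(a_mod, "greet")
"""))
sys.path.insert(0, tmp)

import a_mod
print(a_mod.greet())  # → "a sees b (and a_mod has greet: True)"
```

If `b_mod` instead did `import a_mod` at module level, it would receive a half-initialized `a_mod` (no `greet` yet), which is exactly the AttributeError scenario above.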
Follow-up: How do you detect circular imports automatically in CI?
You add a module to sys.path dynamically: `sys.path.insert(0, '/custom/path')`. After adding, `import custom_module` works. But later, removing from sys.path and re-adding doesn't re-import (still uses cached module). How do you force reimport?
Python caches modules in `sys.modules`. Removing a directory from `sys.path` doesn't invalidate that cache; the old module object stays importable. Solutions: (1) explicit reload: `importlib.reload(module)` re-executes the module from the current `sys.path`. (2) remove from `sys.modules`: `del sys.modules['module_name']` clears the cache so the next import starts fresh. (3) combined: `del sys.modules[name]; importlib.invalidate_caches(); importlib.import_module(name)` guarantees a reimport—`invalidate_caches()` matters because path finders cache directory listings and may miss files added after interpreter startup. (4) for development: the `-B` flag prevents writing `.pyc` files (it does not clear `sys.modules`). Best: if the path changes, explicitly reload. For production: paths should be static; dynamic path changes are rare. Test: verify the correct module is loaded after the path change—`print(module.__file__)` should show the new path.
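The combined approach from (3) as a small helper (`force_reimport` is a made-up name; `json` stands in for `custom_module`):

```python
import importlib
import sys

def force_reimport(name):
    """Drop the cached module and import it fresh from the current sys.path.
    invalidate_caches() refreshes the path finders' directory caches, which
    otherwise may not see files added after interpreter startup."""
    sys.modules.pop(name, None)       # clear the cache entry, if any
    importlib.invalidate_caches()     # make finders re-scan sys.path entries
    return importlib.import_module(name)

mod = force_reimport("json")
print(mod.__file__)                   # verify which file was actually loaded
```

The `__file__` check at the end is the practical verification step: after a path change it should point into the new directory.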
Follow-up: How do you safely reload modules without breaking application state?
Namespace packages (PEP 420) allow split modules across directories. After adding module to new directory in Python path, import fails. What's missing?
Namespace packages have no `__init__.py`. Solutions: (1) ensure the *parent* directory is on `sys.path`: `sys.path.insert(0, '/path/to/parent_dir')`—the entry must be the directory containing the package, not the package directory itself. (2) Python 3.3+ auto-detects namespace packages (no `__init__.py` needed); if directories were added after startup, call `importlib.invalidate_caches()` so the finders re-scan. (3) if using regular packages, add `__init__.py` to each directory. (4) test: `python -c "import package; print(package.__path__)"` should list every contributing directory (note that namespace packages have no meaningful `__file__`, so don't test with that). If ImportError, it's a path issue. For namespace packages: verify Python 3.3+, all parent directories on `sys.path`, and no `__init__.py` anywhere (one would turn that portion into a regular package and hide the other directories). Best: use regular packages (`__init__.py`) for clarity unless namespace packages are explicitly needed.
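A runnable sketch of a namespace package split across two directories (the package name `nspkg` and the module names are made up; both portions are created in temp directories so the example is self-contained):

```python
import importlib
import pathlib
import sys
import tempfile

# Two separate directories each contribute a portion of the same package.
# Neither contains an __init__.py, so Python 3.3+ merges them.
d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
pathlib.Path(d1, "nspkg").mkdir()
pathlib.Path(d2, "nspkg").mkdir()
pathlib.Path(d1, "nspkg", "alpha.py").write_text("VALUE = 1\n")
pathlib.Path(d2, "nspkg", "beta.py").write_text("VALUE = 2\n")

sys.path[:0] = [d1, d2]           # both PARENT directories go on sys.path
importlib.invalidate_caches()     # let the finders see the new directories

from nspkg import alpha, beta     # modules resolve from different directories
print(alpha.VALUE, beta.VALUE)    # → 1 2
print(len(list(__import__("nspkg").__path__)))  # → 2 (both portions merged)
```

Dropping an `__init__.py` into either `nspkg` directory would turn that portion into a regular package and shadow the other, which is the usual cause of the "import fails" symptom in the question.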
Follow-up: When should you use namespace packages vs regular packages?
Application imports module X which has side effects (prints "Loading X", modifies global state). In tests, importing X multiple times executes side effects multiple times. How do you isolate tests?
Module-level code (anything not inside functions/classes) runs on first import, once per process; later imports hit the `sys.modules` cache and do not re-run it. In tests this cuts both ways: side effects re-execute only if something clears the cache (e.g., `importlib.reload` or deleting the `sys.modules` entry), and otherwise global state mutated by one test leaks into the next. Solutions: (1) avoid module-level side effects: move them into an explicit `initialize()` function called by the application. (2) process isolation: run each test in a separate process (e.g., via pytest plugins such as `xdist`). (3) clear `sys.modules` before each test: `del sys.modules['module_x']` so the next import re-runs the module body against a clean slate. (4) mock side effects: patch `print` and global-state mutations in tests. Best: don't have module-level side effects; if necessary, encapsulate them in explicit setup/teardown functions. If side effects are truly unavoidable, use subprocess-based isolation so each test gets a fresh interpreter.
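A sketch of approach (3): evicting the module from `sys.modules` so each "test" gets a fresh import with fresh state. The module name `noisy` and helper `fresh_import` are made up; the module is written to a temp directory so the demo runs standalone:

```python
import contextlib
import importlib
import io
import pathlib
import sys
import tempfile

tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "noisy.py").write_text('print("Loading noisy")\nSTATE = []\n')
sys.path.insert(0, tmp)

def fresh_import(name):
    """Drop the cached module so the next import re-executes its
    top-level code, giving each test an isolated copy."""
    sys.modules.pop(name, None)
    return importlib.import_module(name)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    m1 = fresh_import("noisy")
    m1.STATE.append("leaked")     # a "test" mutating module state
    m2 = fresh_import("noisy")    # next "test" gets a clean copy

print(buf.getvalue().count("Loading noisy"))  # → 2 (side effect re-ran)
print(m2.STATE)                               # → [] (no leaked state)
```

In pytest, the same eviction naturally lives in a fixture's setup so every test function sees a fresh module.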
Follow-up: How do you design modules to be test-friendly and avoid side effects?
Using importlib to dynamically load plugins. After loading 1000 plugins, memory grows significantly. Are loaded modules cached forever?
Yes: modules are cached in `sys.modules` until explicitly removed. 1000 modules × 100KB each ≈ 100MB cached. Solutions: (1) accept the memory cost: 100MB for 1000 modules is often reasonable. (2) if plugins are temporary, explicitly remove them: `del sys.modules['module_name']`—memory is actually freed only once no other references to the module or its objects remain. (3) load on demand: keep only hot plugins in memory and lazy-load the rest on first use. (4) use separate processes for plugin isolation: each process has its own `sys.modules` and can be killed to reclaim memory. Best: understand that modules are per-process singletons. For 1000 plugins: if memory is an issue, consider separate plugin processes or lazy loading with unload-on-timeout.
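A minimal load/unload sketch (`load_plugin`/`unload_plugin` are made-up names; `json` stands in for a plugin module). Note the caveat from (2): eviction only frees memory once every outside reference to the module is gone.

```python
import gc
import importlib
import sys

def load_plugin(name):
    """Normal cached import of a plugin module by dotted name."""
    return importlib.import_module(name)

def unload_plugin(name):
    """Evict the plugin from sys.modules so it becomes collectable.
    Memory is reclaimed only if nothing else still references the
    module or objects defined in it."""
    sys.modules.pop(name, None)
    gc.collect()  # encourage collection of the now-unreferenced module

p = load_plugin("json")
unload_plugin("json")
print("json" in sys.modules)  # → False: the next import re-executes the module
```

This is why process-level isolation is the only *guaranteed* way to reclaim plugin memory: killing the process discards its `sys.modules` and every lingering reference at once.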
Follow-up: How do you implement plugin unloading without breaking references?
staticmethod/classmethod decorators not working correctly on imported modules: after dynamic reload, staticmethod is lost. How do you preserve decorators across reloads?
Decorators are applied at class-definition time. On reload, classes are re-created and decorators re-applied; if the reload is incomplete, stale class objects (and instances still bound to them) linger. Solutions: (1) full reload: `importlib.reload()` re-executes the module code entirely. (2) clear the cache: `del sys.modules[name]` ensures a fresh import. (3) verify decorators after reload: `assert isinstance(Class.__dict__['static_method'], staticmethod)`—check `__dict__`, because attribute access on the class returns the underlying function, not the descriptor. (4) for production: avoid reloading modules (reload is mainly a development tool). Best: test reloading thoroughly in a dev environment. For production: restart the process for a clean slate rather than attempting a live reload.
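A runnable check that a `staticmethod` survives `importlib.reload()` (the module name `deco_mod` is made up; it is written to a temp directory so the demo is self-contained):

```python
import importlib
import pathlib
import sys
import tempfile

tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "deco_mod.py").write_text(
    "class Tool:\n"
    "    @staticmethod\n"
    "    def ping():\n"
    "        return 'pong'\n"
)
sys.path.insert(0, tmp)

import deco_mod
old_tool = deco_mod.Tool
deco_mod = importlib.reload(deco_mod)   # re-executes the module body

# Inspect the descriptor on the NEW class via __dict__; plain attribute
# access (deco_mod.Tool.ping) would return the unwrapped function.
print(isinstance(deco_mod.Tool.__dict__["ping"], staticmethod))  # → True
print(deco_mod.Tool is old_tool)  # → False: reload created a new class object
```

The `Tool is old_tool` check is the important one for debugging "lost decorator" reports: any instance created before the reload still points at the old class, so it can look as if the decorator disappeared.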
Follow-up: How do you implement safe live-reloading for development without losing application state?