Python Interview Questions

Bytecode, Compilation, and Execution


Production Scenario Interview Questions

Your team's microservice processes 100K requests/sec. A .pyc file becomes corrupted in production, causing bytecode validation failures across 5000 workers. The corruption happened mid-deployment. How do you diagnose and recover?

The Python runtime validates .pyc headers (magic number, then timestamp or source hash) before execution; corrupted files fail with ImportError or bytecode validation errors. Recovery: (1) clear __pycache__ directories across all workers (rm -rf), (2) restart workers to force recompilation from .py sources, (3) pre-compile bytecode in CI/CD with py_compile.compile() to catch corruption early, (4) switch to hash-based .pyc files (PEP 552, Python 3.7+): validation keys on a hash of the source rather than a timestamp, making it reproducible and immune to clock skew, (5) monitor __pycache__ size and staleness, and version bytecode artifacts in your deployment pipelines.
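Step (3) and (4) above can be combined in CI: compile ahead of time with a hash-based .pyc so corruption surfaces at build time, not in production. A minimal sketch (the throwaway module here stands in for your real sources):

```python
import pathlib
import py_compile
import tempfile

# Write a throwaway module so the example is self-contained; in CI you
# would point this at your checked-out sources instead.
src = pathlib.Path(tempfile.mkdtemp()) / "handler.py"
src.write_text("def handle(req):\n    return req\n")

# Compile with a hash-based .pyc (PEP 552): CHECKED_HASH embeds the
# source hash, so validation no longer depends on timestamps.
pyc = py_compile.compile(
    str(src),
    invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    doraise=True,  # raise PyCompileError in CI instead of printing
)
print(pyc)  # path to the generated __pycache__/handler.cpython-*.pyc
```

With doraise=True, a corrupt or unparseable source fails the build immediately instead of shipping a bad .pyc.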

Follow-up: How does importlib.util.cache_from_source() map source paths into the __pycache__ structure? What's the performance difference between compile-on-import and pre-compiled bytecode in high-frequency dynamic imports?

Your data pipeline imports 2000+ dynamically-named modules from S3 at runtime. Bytecode caching exploded to 8GB across workers. CPU profile shows 60% time in pyximport and compile(). How do you optimize?

Root cause: identical code is recompiled repeatedly with no bytecode reuse. Solutions: (1) cache compiled bytecode in shared Redis/Memcached keyed by (source_hash, bytecode magic number, optimization_level), (2) pre-freeze known modules rather than compiling at runtime (frozen modules load via importlib.machinery.FrozenImporter), (3) implement compile-once-load-many: compute a source hash, check the cache before calling compile(), and serialize code objects with marshal—code objects are not picklable, and marshal output is version-specific, hence the magic number in the key, (4) disable __pycache__ writes for dynamic modules with PYTHONDONTWRITEBYTECODE and manage the cache yourself, (5) pre-compile with compileall.compile_dir() at initialization, not per-import, (6) profile with dis.dis() to confirm the generated bytecode is what you expect to be reusing.

Follow-up: How do code objects (accessed via func.__code__) behave across process boundaries? Can you safely ship bytecode between workers with marshal, given that code objects aren't picklable?

Your platform runs untrusted Python code submitted by users. You need bytecode inspection to prevent malicious sys.exit(), exec(), or file I/O. How do you audit bytecode at load time?

Use the dis module to inspect bytecode opcodes before execution. (1) Extract the code object with compile() (or a function's __code__), (2) recurse into co_consts, where nested functions and classes live as code objects, (3) scan for dangerous opcodes: IMPORT_NAME, LOAD_GLOBAL/LOAD_NAME (for sys/os), CALL_FUNCTION (CALL in 3.11+) targeting dangerous callables, (4) classify opcodes with the dis collections (dis.hasname, dis.hasfree, etc.). Stricter: use RestrictedPython's compile_restricted() to inject runtime guards. For advanced cases, implement a custom import hook (an importlib.abc.MetaPathFinder registered on sys.meta_path) to validate bytecode before a module lands in sys.modules. Note: bytecode inspection isn't foolproof—obfuscation can hide intent—so pair it with runtime sandboxing (seccomp, pledge) or containers.
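A minimal version of that audit, recursing through co_consts and flagging imports and blocked names (the `BLOCKED_NAMES` set and `audit` helper are illustrative, not a complete policy—per the caveat above, this is one layer, not a sandbox):

```python
import dis

BLOCKED_NAMES = {"exec", "eval", "__import__", "open", "sys", "os"}

def audit(code) -> list[str]:
    """Recursively scan a code object for imports and blocked globals."""
    findings = []
    for ins in dis.get_instructions(code):
        if ins.opname == "IMPORT_NAME":
            findings.append(f"import of {ins.argval!r}")
        elif ins.opname in ("LOAD_GLOBAL", "LOAD_NAME") and ins.argval in BLOCKED_NAMES:
            findings.append(f"use of blocked name {ins.argval!r}")
    # Nested functions/classes live in co_consts as code objects.
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            findings.extend(audit(const))
    return findings

user_code = compile("import os\nos.system('ls')", "<submitted>", "exec")
print(audit(user_code))  # flags the import of os and the use of os
```

Because the scan runs on the compiled code object, it catches the same constructs whether users submit source or pre-built bytecode.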

Follow-up: How does the co_freevars tuple in code objects affect bytecode sandboxing? Can closure variables leak sensitive data across audit boundaries?

Your CI/CD pipeline builds Python wheels for 15 target environments (py3.9-3.13, x86/ARM, Linux/macOS/Windows). Build times spike when __pycache__ optimization levels differ. How do you standardize bytecode across environments?

Bytecode is version-specific: the .pyc magic number encodes the interpreter version, so there is no portable bytecode format across Python versions—ship .py source in wheels and compile per environment. Best practices: (1) compile with an identical optimization level everywhere (0 = none, 1 = -O, 2 = -OO); the level is encoded in the .pyc filename (.opt-1/.opt-2 suffix), (2) check sys.flags.optimize at startup to detect mismatched runtime flags (it is read-only; the level is set via -O/-OO or PYTHONOPTIMIZE), (3) pre-compile in CI with python -m compileall -o 1 build_dir (the -o option requires Python 3.9+) so install-time compilation can't diverge, (4) keep bytecode out of the wheel itself and let pip generate it at install time for the target interpreter, (5) verify imports on every target version in CI (e.g. with importlib.util.find_spec()) before release. Pin exact Python versions per CI target—one bytecode build per (version, optimization level), never one build for all.
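The CI pre-compile step in (3) has a programmatic equivalent via compileall; this sketch builds a throwaway tree (standing in for your wheel build directory) and compiles it at one pinned level:

```python
import compileall
import pathlib
import tempfile

# Self-contained demo tree; in CI this would be your build directory.
pkg = pathlib.Path(tempfile.mkdtemp()) / "pkg"
pkg.mkdir()
(pkg / "mod.py").write_text("VALUE = 42\n")

# Pre-compile with one pinned optimization level (here 1, i.e. -O) so
# every target environment gets identical bytecode for this interpreter.
ok = compileall.compile_dir(str(pkg), optimize=1, quiet=1)

# The resulting .pyc name encodes both version and opt level,
# e.g. __pycache__/mod.cpython-312.opt-1.pyc
print(sorted(p.name for p in (pkg / "__pycache__").iterdir()))
```

Running this once per (Python version, optimization level) pair in CI makes bytecode drift between targets visible as a build failure rather than a runtime surprise.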

Follow-up: How does PYTHONOPTIMIZE interact with compile()'s optimize argument? What's the performance delta between -O and -OO in production workloads?

Your monorepo has 50K Python files. At import time, you're loading bytecode from a network mount (NFS). File stat() calls are killing latency. How do you architect bytecode delivery for fast cold starts?

The problem: Python's import system validates .pyc freshness on every load via timestamp/hash comparison, and the stat() round-trips compound over NFS. Solutions: (1) pre-stage bytecode to local disk on container startup with rsync/HTTP, (2) use zip imports (the zipimport module) to bundle compiled modules into a single .zip on sys.path—50K stat() calls collapse to one file open, (3) pre-compute hash-based .pyc files (PEP 552) in CI with UNCHECKED_HASH invalidation so the loader skips source validation entirely, (4) keep the zip bundle on local disk or a ramdisk so reads never touch the network, (5) add a memoization layer over your own dynamic-import paths if the same modules are resolved by name repeatedly, (6) containerize with bytecode baked in at build time—don't compile at runtime. For extreme cases, freeze modules (importlib.machinery.FrozenImporter) so bytecode is linked into the interpreter binary as C data.
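The zip-bundle approach in (2) can be demonstrated end to end; the module and bundle names here are invented for the example. One .pyc goes into a zip, the zip goes on sys.path, and zipimport serves the import with no per-module stat() of loose files:

```python
import pathlib
import py_compile
import sys
import tempfile
import zipfile

tmp = pathlib.Path(tempfile.mkdtemp())

# Compile one throwaway module, then bundle its .pyc into a zip so the
# import system touches a single archive instead of stat()ing thousands
# of files over NFS.
src = tmp / "fastmod.py"
src.write_text("ANSWER = 42\n")
pyc = py_compile.compile(str(src), cfile=str(tmp / "fastmod.pyc"), doraise=True)

bundle = tmp / "bundle.zip"
with zipfile.ZipFile(bundle, "w") as zf:
    # The entry name inside the zip is the importable module name.
    zf.write(pyc, "fastmod.pyc")

sys.path.insert(0, str(bundle))  # zipimport handles zip entries on sys.path
import fastmod
print(fastmod.ANSWER)  # 42
```

In production the bundling happens once at image build time, and workers only ever open the archive.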

Follow-up: How does zipimport interoperate with importlib.resources? Can you layer bytecode caching over both?

You're shipping a Python library that must work on CPython, PyPy, and Jython. Bytecode formats are incompatible. How do you deliver multi-runtime compatible packages?

Different runtimes have different bytecode formats (CPython's is tagged by importlib.util.MAGIC_NUMBER; PyPy has its own format; Jython compiles to JVM bytecode). Don't ship runtime-specific bytecode in wheels. Instead: (1) distribute only .py source files and let each runtime compile at install time, (2) configure your build to exclude __pycache__ and *.pyc from the archive (e.g. via MANIFEST.in or your build backend's exclude settings), (3) set PYTHONDONTWRITEBYTECODE=1 in test/CI to force source loading, (4) if you must ship bytecode, build runtime-specific artifacts per implementation and version—the cache tag (sys.implementation.cache_tag, e.g. cpython-312) keeps their __pycache__ entries separate. Modern approach: a PEP 517/518 build backend (pyproject.toml) lets you customize wheel content per target. For universal support, always include .py source—bytecode is an optimization, not a requirement.
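You can see why one set of .pyc files cannot serve every runtime by printing the identifiers the import system keys on; all three values below vary across implementations and versions:

```python
import importlib.util
import sys

# The magic number identifies this interpreter's bytecode format; it
# changes between CPython versions and differs across implementations.
print(sys.implementation.name)            # e.g. 'cpython'
print(importlib.util.MAGIC_NUMBER.hex())  # version-specific bytecode tag
print(sys.implementation.cache_tag)       # e.g. 'cpython-312'
```

A CI smoke test on each target runtime can record these values and refuse to publish an artifact whose embedded magic number doesn't match.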

Follow-up: How does importlib.machinery.ExtensionFileLoader handle bytecode for .pyd/.so extensions? Can you interleave bytecode and native code in a single wheel?

Your platform must support live code reloading (hot reload) for DevEx. Reloading triggers re-import, but bytecode caches the old code object. Users report stale behavior. How do you force bytecode invalidation on reload?

Bytecode caching, together with sys.modules, prevents module-level code from re-executing. For hot reload: (1) importlib.reload(module) re-executes the module's code in its existing namespace—recompiled if the source changed—but old class objects and live instances persist, (2) removing sys.modules[module_name] before a fresh import also works, at the cost of a new module object, (3) call importlib.invalidate_caches() first to clear finder caches, essential when files appear or change on disk, (4) if stale .pyc files are suspected, delete them (importlib.util.cache_from_source() gives the path), (5) wire these steps into filesystem watchers (watchdog), (6) for development, PYTHONDONTWRITEBYTECODE=1 disables bytecode writing altogether. Note: reloading doesn't update class definitions captured by live instances—frameworks like Werkzeug's dev reloader sidestep this by restarting the worker process rather than patching objects in place.
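Steps (1) and (3) fit in a few lines; this self-contained sketch writes a module, edits it, and reloads. The explicit mtime bump illustrates the classic gotcha: with timestamp-based .pyc validation, an edit within the filesystem's mtime resolution can be missed:

```python
import importlib
import os
import pathlib
import sys
import tempfile

# Self-contained demo: write a module, import it, change it, reload it.
moddir = pathlib.Path(tempfile.mkdtemp())
modfile = moddir / "live.py"
modfile.write_text("GREETING = 'v1'\n")
sys.path.insert(0, str(moddir))

import live
print(live.GREETING)  # v1

modfile.write_text("GREETING = 'v2'\n")
# Bump mtime explicitly so timestamp-based .pyc validation cannot miss
# a sub-second edit (a classic hot-reload gotcha on coarse filesystems).
st = modfile.stat()
os.utime(modfile, (st.st_atime, st.st_mtime + 2))

importlib.invalidate_caches()    # drop finder/loader caches first
live = importlib.reload(live)    # re-executes the module in place
print(live.GREETING)  # v2
```

Any instances created from pre-reload classes would still reference the old class objects; the reload only swaps what the module's names point to.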

Follow-up: How does importlib.reload() interact with circular imports? Can you safely reload a module if it's already in a dependent's namespace?

Your team ships a Python 3.12 codebase that runs under -OO (sys.flags.optimize == 2). Docstrings and assertions are stripped. Staging crashes with "missing docstring in API spec." How do you preserve docs while keeping bytecode optimizations?

-OO strips docstrings and removes assert statements, breaking introspection-based tooling (docs generators, API specs, assertion-based validation). Solutions: (1) decouple docstrings from code: store them in external .yaml/.json metadata files loaded at import time via importlib.resources, (2) extract docstrings before compilation: parse the source with ast, harvest them via ast.get_docstring(), and store them as ordinary data—only literal docstring positions are stripped by -OO, regular string constants survive, (3) disable -OO for API/spec modules and apply it only to performance-critical paths (compute kernels), (4) use environment-specific builds: development with no optimization (full docs and asserts), production with -O (asserts removed, docstrings kept), reserving -OO for code that is never introspected, (5) remember that -OO's real win over -O is memory (docstrings absent from bytecode), not speed—measure before assuming it buys throughput. Recommended: keep -O as the production default; use -OO only where memory is tight and introspection is never needed.
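The AST extraction in (2) looks like this: harvest docstrings before any optimization can strip them, then compile the same tree with optimize=2 to show the difference (the `charge` function is a made-up example):

```python
import ast

source = '''
def charge(amount):
    """Charge the given amount, in cents."""
    assert amount > 0
    return amount
'''

tree = ast.parse(source)

# Harvest docstrings at the AST level, before -OO stripping can happen;
# the resulting mapping is plain data that survives optimized bytecode.
docs = {
    node.name: ast.get_docstring(node)
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
}
print(docs)  # {'charge': 'Charge the given amount, in cents.'}

# Compile the same tree at optimize=2 (-OO): __doc__ is gone from the
# compiled function, but the harvested mapping still has it.
code = compile(tree, "<api>", "exec", optimize=2)
ns: dict = {}
exec(code, ns)
print(ns["charge"].__doc__)  # None
```

Run the harvest as a build step and ship the mapping alongside the optimized bytecode; spec generators then read the mapping instead of __doc__.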

Follow-up: How does ast.get_docstring() interact with compile() optimization flags? Can you extract and store docstrings at AST level before bytecode compilation?
