A microservice uses a custom metaclass to auto-register route handlers. At startup, all models are instantiated dynamically via `type()` to create 500+ classes. Introspection shows class creation takes 5 seconds, blocking service startup. How do you optimize?
Metaclasses run `__new__` and `__init__` per class, so 500 classes means roughly 1,000 metaclass calls. Solutions: (1) lazy class creation — don't create all classes at startup; create each one on demand when its route is first accessed, using a module-level `__getattr__` (PEP 562) to intercept class access. (2) cache the metaclass's work: compute the shared class dict once and reuse it across `type()` calls. (3) if you create classes dynamically, prefer `types.new_class()` over raw `type()` — it is not faster, but it resolves the metaclass correctly and runs `__init_subclass__`/`__set_name__` hooks, avoiding subtle bugs. (4) spread registration across module imports instead of one explicit startup pass: each `import models.users` registers only that module's classes, amortizing the cost. Benchmark with `timeit` per 100 classes to find the bottleneck, and profile with `cProfile` to see which metaclass method (`__new__`, `__init__`, `__call__`) dominates. For 500 routes, consider a plain route table (dict of name -> handler) instead of dynamic classes — simpler and vastly faster. If metaclasses are necessary, keep `__new__` minimal and move complex initialization into `__init__`. Test in staging with the real class count to ensure startup meets the SLA (<1 second).
Follow-up: If metaclass `__new__` modifies the class dict, how do you ensure subclasses inherit the modifications correctly?
You're implementing an ORM using metaclasses. Each model field (Column, ForeignKey) is a descriptor. After defining 100 models with 20 fields each, introspection (`inspect.getmembers(Model)`) takes 2 seconds per model. How do you cache metadata without stale data?
Descriptor lookups are expensive: `inspect.getmembers()` walks the MRO and calls `__get__` on each descriptor. At 100 models * 20 fields = 2,000 descriptors, every introspection pass is O(n) over the fields. Solutions: (1) cache metadata in metaclass `__new__`: collect the field list into `cls._fields` at class creation and read that instead of re-introspecting. (2) note that `functools.cached_property` caches per instance, but this metadata is class-level — cache it on the class itself or via a caching class-level descriptor instead. (3) defer descriptor evaluation until a field is actually accessed, rather than evaluating everything during introspection. (4) for frameworks, cache metadata at module load time via a registry: `registry['Model'] = {'fields': [...], 'indexes': [...]}`. Keep the cache in sync: invalidate on model modification or use version stamps. Test: measure introspection time before and after caching; if 2 s per model is unacceptable, the cache is necessary. SQLAlchemy's declarative system (`declarative_base()`) caches table metadata effectively — worth studying. Pitfall: a global cache goes stale if models change (hot reload); use versioned cache keys or a TTL. In production, models are usually static (defined at startup), so caching is safe.
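Solution (1) — collecting the field list once in the metaclass — can be sketched as follows. `Column` and `ModelMeta` are illustrative names, not a real ORM API:

```python
# Sketch: the metaclass gathers descriptor fields once, at class creation,
# so later lookups read a plain tuple instead of re-walking the MRO.
class Column:
    def __set_name__(self, owner, name):
        self.name = name
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

class ModelMeta(type):
    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        fields = []
        for base in reversed(cls.__mro__):        # include inherited fields
            for attr, value in vars(base).items():
                if isinstance(value, Column):
                    fields.append(attr)
        cls._fields = tuple(dict.fromkeys(fields))  # de-dupe, keep order
        return cls

class User(metaclass=ModelMeta):
    id = Column()
    name = Column()
```

Reading `User._fields` is now a constant-time attribute lookup; `inspect.getmembers` is never needed on the hot path.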
Follow-up: How do you implement thread-safe cache invalidation when models are modified dynamically?
A metaclass implements schema validation: when a model is defined, metaclass validates all fields against a schema. Validation is complex (100 checks per field), and startup time with 100 models is 10 seconds. Validation results don't change. How do you avoid repeated validation?
Validation in metaclass `__new__` runs for every class definition. If validation is deterministic (same input, same output), cache results by input signature. Solutions: (1) compute a hash of the field definitions — `schema_hash = hashlib.sha256(json.dumps(sorted(fields)).encode()).hexdigest()` — and check the cache before validating. (2) skip validation in production if schemas are trusted (validated during development only); code under an `if __debug__:` guard is stripped when Python runs with `-O`. (3) split validation: move complex checks into the test suite (run once per CI), leaving only fast essential checks in the metaclass. (4) lazy validation: don't validate in the metaclass; defer to first model instantiation or an explicit validation call. (5) parallel validation: with 100 independent models, validate in parallel via `multiprocessing.Pool` or `concurrent.futures.ProcessPoolExecutor`. Measure with `cProfile` to confirm validation actually dominates the 10 seconds before optimizing; if it does, aggressive caching is justified. In production, assume schemas were validated during development and keep the runtime metaclass minimal (registration only, no validation). Test: ensure schema changes are detected (the hash changes), triggering revalidation.
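Solution (1) — memoizing deterministic validation by a schema hash — can be sketched like this. The cache and the `validate_fields`/`expensive_check` helpers are hypothetical, not part of any framework:

```python
import hashlib
import json

_validation_cache = {}  # schema hash -> validation result

def schema_hash(fields):
    # Canonical JSON of the sorted field spec gives a stable cache key.
    payload = json.dumps(sorted(fields.items()), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def validate_fields(fields, expensive_check):
    key = schema_hash(fields)
    if key not in _validation_cache:
        _validation_cache[key] = expensive_check(fields)  # runs once per schema
    return _validation_cache[key]
```

Because the key is derived from the field definitions themselves, any schema change produces a new hash and triggers revalidation automatically — the test case in the answer falls out for free.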
Follow-up: How do you handle schema migrations when cached validation results are invalidated?
You're using a metaclass that tracks all instances of a class via `__call__`. The registry grows to 10M instances. Introspection on the class (e.g., `Model.instances`) now takes 500ms (iterating 10M objects). Memory is also high due to holding references. How do you scale the registry?
Holding 10M strong references keeps every instance alive, preventing garbage collection. Solutions: (1) use weak references: `cls._instances = weakref.WeakSet()` lets instances be collected once no external references remain; the registry shrinks automatically as instances die. (2) bound the registry instead of letting it grow forever: keep only the last N instances in a `collections.deque(maxlen=N)` or another LRU structure (`functools.lru_cache` memoizes function results, not instance registries, so it is not the right tool here). (3) move the registry to an external store: keep instance IDs (cheap integers) in Redis or SQLite; lookup is an O(1) hash or index probe instead of O(n) iteration. (4) iterate in batches: if you must walk instances, page through them (first 1,000, then the next 1,000) rather than materializing all 10M at once. (5) use `__slots__` on instances to reduce per-object memory (a class with `__slots__` needs `'__weakref__'` among the slots to remain weakly referenceable). At this scale, weak references plus an external store is the standard pattern. Test: confirm the registry shrinks as instances are deleted, and measure memory before/after weak refs — it should drop dramatically. If 500 ms iteration is still unacceptable, batch or move to the database. Also ask whether you need a global instance registry at all — most applications don't.
Follow-up: How do you implement a weak reference registry that safely handles instances that define `__del__` finalizers?
A metaclass dynamically inherits from multiple base classes to mix in behavior. With 50 mixin classes and 10 model classes using different combinations, the MRO (Method Resolution Order) becomes complex. Debugging shows method calls are sometimes using wrong base class implementations. How do you ensure correct MRO?
Multiple-inheritance MRO is determined by the C3 linearization algorithm, and with many mixins it is non-obvious. Solutions: (1) inspect the MRO explicitly: `print(Model.__mro__)` shows the actual resolution order, and Python raises `TypeError` at class definition if no consistent C3 linearization exists, so impossible orderings are caught early. (2) test method resolution: for each method, verify it comes from the expected base class (e.g. check `Model.method.__qualname__` or `Model.method.__module__`). (3) use `super()` consistently so every class in the chain gets called, instead of hardcoding parent class names. (4) limit mixin depth: keep hierarchies to 2-3 levels and fold related concerns into fewer base classes. (5) prefer composition over inheritance: pass behavior objects to `__init__` instead of mixing behavior in via bases. For debugging, add instrumentation to the metaclass: `print(f"MRO for {cls.__name__}: {[c.__name__ for c in cls.__mro__]}")` at class creation time. Test: write unit tests that assert resolution, e.g. `assert Model.method_name is ExpectedClass.method_name`. Pitfall: `super()` in multiple inheritance follows the MRO, not the syntactic parent — always test the combined hierarchies, not mixins in isolation. Best practice: avoid multiple inheritance if possible; when you need mixins, keep each mixin independent (inheriting only from a shared base, not from other mixins).
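Points (1) and (3) together can be sketched with cooperative mixins: each mixin calls `super()`, so the whole MRO chain runs regardless of combination order. The class names here are illustrative:

```python
# Cooperative mixins: every override calls super(), so behavior composes
# along the C3-linearized MRO instead of silently shadowing a base method.
class Base:
    def setup(self):
        return ["base"]

class AuditMixin(Base):
    def setup(self):
        return ["audit"] + super().setup()

class CacheMixin(Base):
    def setup(self):
        return ["cache"] + super().setup()

class Model(AuditMixin, CacheMixin):
    pass

# Inspect the actual resolution order instead of guessing:
mro_names = [c.__name__ for c in Model.__mro__]
```

Here `super()` inside `AuditMixin.setup` dispatches to `CacheMixin.setup` (the next class in `Model`'s MRO), not to `Base` — exactly the subtlety the pitfall above warns about.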
Follow-up: How do you implement mixins that safely compose without requiring specific MRO ordering?
You're implementing a plugin system using metaclasses. Plugins register themselves by inheriting from `BasePlugin`. After loading 100 plugins, the namespace is polluted: `vars(BasePlugin)` shows 100 entries. Accessing a non-existent plugin attribute returns a default value instead of raising AttributeError, making bugs harder to spot. How do you structure this?
Using the class namespace for plugin registration is an anti-pattern: it pollutes class scope and hides errors. Solutions: (1) use an explicit registry dict — `PLUGINS = {}` plus a `register` classmethod that does `PLUGINS[cls.__name__] = cls` — and access plugins via `PLUGINS['PluginName']` instead of `BasePlugin.PluginName`. (2) intercept plugin access with `__getattr__` (defined on the metaclass, so it applies to class-level lookups): consult the registry only when an attribute is actually requested, and raise `AttributeError` when the plugin is unknown. (3) decouple registration from inheritance: instead of requiring plugins to subclass `BasePlugin`, use a registration decorator — `@register_plugin('name')` — that adds the class to the registry. (4) if you must guard all attribute access, `__getattribute__` can restrict lookups to known plugins, again raising `AttributeError` for unknown ones; this preserves error visibility. For 100 plugins, the explicit registry dict is best practice: O(1) lookup, a clean namespace, and visible errors. Test: verify that accessing a non-existent plugin raises `AttributeError`, not returns `None` or a default. Measure: keep plugin loading under 1 second for 100 plugins. For hot-reloadable plugins, invalidate the registry cache when plugins are reloaded.
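Solutions (1) and (3) combine naturally into a decorator-based registry. This is a minimal sketch; `PLUGINS`, `register_plugin`, and `get_plugin` are hypothetical names:

```python
# Explicit plugin registry: a plain dict plus a registration decorator.
# No class-namespace pollution, and unknown plugins fail loudly.
PLUGINS = {}

def register_plugin(name):
    def decorator(cls):
        PLUGINS[name] = cls  # registration is explicit, not via inheritance
        return cls
    return decorator

def get_plugin(name):
    try:
        return PLUGINS[name]
    except KeyError:
        # Raise instead of returning a default, so typos surface immediately.
        raise AttributeError(f"no plugin registered under {name!r}") from None

@register_plugin("csv")
class CsvExporter:
    pass
```

Lookup is a single O(1) dict probe, `vars(BasePlugin)` stays clean (there is no `BasePlugin` at all), and a misspelled plugin name raises `AttributeError` at the call site.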
Follow-up: How do you implement a plugin system that supports dynamic loading/unloading without stale references in production?
Your data validation metaclass adds `__init_subclass__` hooks to models. When a user defines a model with a typo in a field name, the hook should catch it and raise a clear error. But currently, typos are silently ignored. How do you make errors visible?
`__init_subclass__` is called when a class is subclassed, allowing validation at definition time. To catch typos: (1) maintain a set of allowed field names and validate against it: `if field_name not in VALID_FIELDS: raise ValueError(f"Unknown field: {field_name}")`. (2) apply `typing.dataclass_transform` (PEP 681, Python 3.11+) so static type checkers understand the generated fields and flag unknown names during development. (3) add a strict mode (`raise_on_unknown_field = True` in the metaclass config) so any unknown field raises an error. (4) implement `__getattr__` on the metaclass to raise `AttributeError` for unknown fields (this governs attribute access on the class itself, not on instances). To keep errors visible: (a) run tests with `-W error` to convert warnings into errors, (b) use static analysis (mypy, pylint) to catch unknown attributes, (c) enable IDE linting (e.g. pylint in VSCode). Best practice: `dataclasses` or `attrs` with `slots=True` and explicit field definitions catch many typos automatically. For a custom metaclass, validate in `__init_subclass__` and raise at class definition time, not at runtime. Test: define a model with a typo'd field and verify the exception fires during class definition, not later during instantiation.
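Solution (1) can be sketched with an `__init_subclass__` hook that checks declared attributes against an allow-list at class-definition time. `VALID_FIELDS` and `BaseModel` are illustrative, and the "not callable, not underscore" filter is a simplifying assumption:

```python
# Sketch: a typo in a field name fails the moment the class is defined,
# not later at runtime.
VALID_FIELDS = {"id", "name", "email"}

class BaseModel:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        declared = {
            attr for attr in vars(cls)
            if not attr.startswith("_") and not callable(getattr(cls, attr))
        }
        unknown = declared - VALID_FIELDS
        if unknown:
            raise TypeError(f"{cls.__name__}: unknown field(s) {sorted(unknown)}")

class User(BaseModel):  # valid: both names are in the allow-list
    id = 0
    name = ""
```

Defining `class Bad(BaseModel): nmae = ""` would raise `TypeError` immediately at the `class` statement — exactly the "visible at definition time" behavior the answer calls for.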
Follow-up: How do you provide actionable error messages (e.g., "Did you mean field 'name'?") for typos in metaclass validation?
A metaclass enables dynamic method generation: for each database table, a metaclass generates `get_by_<field>()` accessor methods, one per column. Across all tables this produces roughly 25k generated methods, and class creation is slow. How do you optimize?
Generating 25k methods at class creation time is expensive (each function object carries overhead). Solutions: (1) lazy method generation: use `__getattr__` (on the metaclass) to generate a method on first access and cache the result — the first call to `get_by_name()` builds it; subsequent calls use the cached method. (2) use a single parameterized method instead of N generated ones: `get_by(field, value)` replaces every `get_by_<field>(value)`, collapsing 25k functions into one.
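Solution (1) can be sketched with a metaclass `__getattr__` that synthesizes `get_by_<field>` on first access and caches it on the class. `TableMeta`, `fields`, and the in-memory `rows` are illustrative stand-ins for real table metadata:

```python
# Lazy method generation: no accessor exists until someone asks for it;
# after the first access it is cached on the class like a normal method.
class TableMeta(type):
    def __getattr__(cls, name):
        if name.startswith("get_by_"):
            field = name[len("get_by_"):]
            if field in cls.fields:
                def finder(value, _field=field):
                    return [row for row in cls.rows if row.get(_field) == value]
                setattr(cls, name, staticmethod(finder))  # cache on the class
                return finder
        raise AttributeError(name)

class Users(metaclass=TableMeta):
    fields = ("id", "name")
    rows = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
```

Because `type.__getattr__` only fires for attributes missing from the class, the cached method short-circuits the hook on every subsequent lookup — generation cost is paid once per accessor actually used, not 25k times at startup.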
Follow-up: How do you cache generated methods safely when they depend on runtime state that might change?