Production Scenario Interview Questions
Your ORM framework needs to support dynamic attribute access on 10K+ model instances per request. Engineers are implementing __getattr__ for lazy-loading relationships. Response time degrades 40% after rollout. What's the performance killer?
The problem: __getattr__ is called on every attribute miss, and without caching, repeated database lookups occur. To be precise, __getattribute__ runs on every attribute access, and __getattr__ runs only after normal lookup fails—but a lazy field that is never cached fails lookup every time, so every read pays the full fallback path plus a query. Under load, this becomes a bottleneck. Solutions: (1) cache resolved attributes in the instance __dict__ after first access, so subsequent lookups succeed through normal attribute resolution and never reach __getattr__, (2) use __slots__ to reduce per-instance memory overhead (note: __slots__ without __dict__ rules out __dict__ caching, so pick one strategy), (3) implement the descriptor protocol (__get__) for lazy properties instead of __getattr__—functools.cached_property does exactly this, (4) defer lazy loading: batch queries instead of per-instance lookups, (5) memoize expensive relationship resolution with functools.lru_cache on a loader function keyed by primary key. Rough orders of magnitude: a __getattr__ miss costs hundreds of nanoseconds before you even touch the database; a descriptor tens; an instance __dict__ hit around ten. For 10K instances with 5 lazy fields each, that difference adds up to tens or hundreds of milliseconds per request. Profile the __getattr__ body (e.g. with cProfile) to verify it does no I/O; move I/O to initialization or batched queries.
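A minimal sketch of strategy (1) + (3) combined, using a hypothetical `LazyField` non-data descriptor that caches its result in the instance `__dict__` (the same trick `functools.cached_property` uses); the `Customer`/`orders` names are illustrative only:

```python
class LazyField:
    # Non-data descriptor (no __set__): once the value is cached in the
    # instance __dict__, normal attribute lookup finds it there and neither
    # the descriptor nor __getattr__ is consulted again.
    def __init__(self, loader):
        self.loader = loader
        self.name = loader.__name__

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.loader(obj)            # stands in for one DB round-trip
        obj.__dict__[self.name] = value     # cache for all later reads
        return value


class Customer:
    def __init__(self, pk):
        self.pk = pk
        self.load_count = 0

    @LazyField
    def orders(self):
        self.load_count += 1                # counts simulated queries
        return [f"order-{self.pk}-{i}" for i in range(2)]


c = Customer(7)
first = c.orders        # descriptor fires, loader runs once
second = c.orders       # served straight from c.__dict__, no loader call
```

After the first access, `"orders"` lives in `c.__dict__`, so the 10K-instance hot path degrades to plain dictionary lookups.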
Follow-up: How does __getattr__ interact with __slots__? If a class defines __slots__ without __dict__, does __getattr__ still trigger? What about descriptor protocol precedence?
Your team implements a custom container: Counter-like class with __getitem__, __setitem__, __len__. Users report that len() returns 0 even after adding items. Iteration breaks too. What's the bug?
The problem: implementing __getitem__ doesn't automatically enable len() or iteration—you must also implement __len__ (and __iter__ for reliable iteration). Python's data model doesn't infer __len__ from __getitem__; each special method is independent. The class needs: (1) __len__() to return the item count, (2) __iter__() for iteration (the legacy fallback—__getitem__ with integer indexing from 0—also works, but is fragile), (3) __contains__() for the `in` operator (otherwise Python falls back to iterating), (4) __bool__() only if you want truthiness that differs from the len()-based default. Correct implementation: subclass dict (or collections.abc.MutableMapping) or explicitly implement __len__ returning the count of stored items. Gotcha: if __len__ returns 0, bool(obj) is False, so `if counter:` takes the wrong branch. If you implement __getitem__ and __iter__ but forget __len__, iteration works but len() raises TypeError. The collections.abc ABCs (Sequence, Mapping, MutableMapping) enforce these contracts: inheriting from them makes instantiation fail with TypeError when an abstract method is missing, catching the bug early.
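A sketch of the ABC-based fix, assuming a hypothetical `TagCounter` mapping; `MutableMapping` derives `__contains__`, `get`, `items`, `pop`, and friends from the five abstract methods supplied here:

```python
from collections.abc import MutableMapping

class TagCounter(MutableMapping):
    # The ABC refuses to instantiate a subclass missing any of the five
    # abstract methods (__getitem__, __setitem__, __delitem__, __iter__,
    # __len__), so "forgot __len__" fails loudly at construction time.
    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):                  # without this, len() has no source
        return len(self._data)


counter = TagCounter()
counter["python"] = 2
counter["dunder"] = 1
```

`len(counter)`, iteration, `in`, and truthiness all now agree, because they all derive from the same two primitives (`__iter__` and `__len__`).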
Follow-up: How does the collections.abc hierarchy enforce dunder method contracts? Can you have a valid __getitem__ without __iter__?
Your codebase has deeply nested data models: Customer → Orders → Items → Prices. Engineers use __repr__ for debugging, but reprs are 500KB+ strings. Memory profiler shows repr() calls consuming 2GB. How do you optimize?
The problem: __repr__ is called recursively on nested objects, generating massive strings. A naive implementation returns something like str(self.__dict__), which calls __repr__ on every attribute; for deep structures the output explodes combinatorially. Solutions: (1) implement a bounded __repr__ with a max-depth limit and an ellipsis for deeper objects—"Customer(...)" instead of full recursion, (2) use the stdlib reprlib module: reprlib.repr() bounds the size of container reprs, and the reprlib.recursive_repr() decorator guards self-referential cycles, (3) cache repr strings if objects are effectively immutable (cache invalidation on mutation is then your problem), (4) for debugging-only use, keep __repr__ lightweight—just type and id—and put detailed output in __str__ or a dedicated dump method, (5) implement a compact variant for logging systems that need readable but bounded output. Example: `def __repr__(self): return f"{type(self).__name__}(id={self.id!r})"`. Avoid sys._getframe() or threading.local() depth-tracking hacks; passing an explicit depth parameter through a helper is simpler and testable. Verify with memory_profiler (or tracemalloc) that repr calls don't allocate gigabytes.
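A minimal sketch of the depth-bounded pattern, using a hypothetical `Node` tree; the public `__repr__` delegates to a private helper that threads an explicit depth budget:

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __repr__(self):
        return self._repr(depth=2)      # show at most 2 levels of children

    def _repr(self, depth):
        if depth <= 0:
            return f"{type(self).__name__}(...)"    # truncate, don't recurse
        kids = ", ".join(child._repr(depth - 1) for child in self.children)
        return f"{type(self).__name__}({self.name!r}, [{kids}])"


deep = Node("a", [Node("b", [Node("c", [Node("d")])])])

cyclic = Node("x")
cyclic.children.append(cyclic)          # self-reference: depth bound still terminates
```

The depth budget also makes cycles safe as a side effect: the recursion bottoms out at `Node(...)` regardless of the object graph's shape.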
Follow-up: How does __repr__ interact with gc cycles? If __repr__ references self, can circular refs leak memory? Does str() call __repr__ or __str__?
Your distributed cache client uses __hash__ on user-provided model objects to shard across 10K nodes. Cache misses spike because __hash__ returns different values on each call. What's wrong?
The problem: __hash__ must be deterministic—the same object must return the same hash for its entire lifetime, and in a distributed system the hash must also agree across processes. If __eq__ is implemented but __hash__ isn't, Python 3 sets __hash__ = None (unhashable). If __hash__ reads mutable attributes, the hash changes when the object mutates, silently breaking dict, set, and cache keys. Rules: (1) if you implement __eq__ and want hashable objects, you must implement __hash__ too, (2) the hash must be constant for the object's lifetime, (3) if two objects compare equal (__eq__ returns True), they must hash equal, (4) never hash mutable fields (lists, dicts), (5) hash only immutable fields, or compute the hash once in __init__ and cache it. Debugging: print(hash(obj)) before and after mutation to catch drift. For sharding across nodes there is a second trap: the built-in hash() of str and bytes is salted per process (hash randomization, controlled by PYTHONHASHSEED), so it is not stable across interpreters—use a stable digest such as hashlib over a canonical serialization for shard selection, and use frozenset/tuple (never lists) for composite keys. For custom objects, implement __hash__ as hash() of a tuple of the immutable fields. If the object is mutable, don't implement __hash__ at all (leave it unhashable).
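A sketch of the "hash once at init, from immutable fields only" pattern, with a hypothetical `CacheKey` value object:

```python
class CacheKey:
    # Hash computed once at construction, from immutable fields only:
    # equal keys hash equal, and the hash can never drift after the fact
    # because nothing mutable participates in it.
    __slots__ = ("user_id", "tags", "_hash")

    def __init__(self, user_id, tags):
        self.user_id = user_id
        self.tags = frozenset(tags)         # frozenset, never a raw list
        self._hash = hash((self.user_id, self.tags))

    def __eq__(self, other):
        if not isinstance(other, CacheKey):
            return NotImplemented
        return (self.user_id, self.tags) == (other.user_id, other.tags)

    def __hash__(self):
        return self._hash


a = CacheKey(42, ["x", "y"])
b = CacheKey(42, ["y", "x"])    # same logical key, different construction order
```

Note this keeps the eq/hash contract within one process; for shard selection across nodes you would still feed a canonical serialization of `(user_id, sorted(tags))` to hashlib rather than relying on the salted built-in hash().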
Follow-up: What's the relationship between __eq__ and __hash__? If you implement __eq__ without __hash__, what happens? Can you have __hash__ without __eq__?
Your financial system models Money(amount, currency). You need to support Money(100, 'USD') + Money(50, 'USD') = Money(150, 'USD') but Money(100, 'USD') + Money(50, 'EUR') should fail. How do you implement rich comparison + arithmetic dunder methods correctly?
Implement __add__, __sub__, __lt__, __eq__ with type safety and currency validation. Key patterns: (1) __add__ checks the currency match first, else raises ValueError, (2) return NotImplemented (never None) when the operand type is unsupported—this lets Python try the right operand's __radd__, (3) __eq__ should also return NotImplemented for unknown types (Python then falls back to identity comparison, which reads as False), while __lt__ may raise if currencies differ, since ordering requires consistency, (4) symmetric operations: __add__ and __radd__ must together handle both Money + x and x + Money, (5) stay consistent: if __eq__ is defined and objects should be hashable, derive __hash__ from the same fields, (6) avoid infinite mutual recursion between __add__ and __radd__. Example: `def __add__(self, other): if not isinstance(other, Money): return NotImplemented; if self.currency != other.currency: raise ValueError("Currency mismatch"); return Money(self.amount + other.amount, self.currency)`. For commutative operations, `def __radd__(self, other): return self.__add__(other)` suffices. Test both operand orders (a + b and b + a) and verify return values—None must never escape a dunder method; only NotImplemented or a result.
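The patterns above can be sketched with a frozen dataclass (which supplies a consistent `__eq__`/`__hash__` pair for free); `__sub__` is elided for brevity:

```python
from dataclasses import dataclass

@dataclass(frozen=True)         # frozen=True: auto __eq__ plus a matching __hash__
class Money:
    amount: int
    currency: str

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented               # let Python try other.__radd__
        if self.currency != other.currency:
            raise ValueError("currency mismatch")
        return Money(self.amount + other.amount, self.currency)

    __radd__ = __add__                          # addition is commutative here

    def __lt__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        if self.currency != other.currency:
            raise ValueError("cannot order different currencies")
        return self.amount < other.amount
```

Mixed-currency addition raises ValueError; `Money(1, "USD") + 5` raises TypeError, because both `Money.__add__` and `int.__radd__` return NotImplemented and Python exhausts the resolution chain.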
Follow-up: What's the difference between returning None and NotImplemented from a dunder method? How does Python's operator resolution chain work for __add__ vs __radd__?
Your framework auto-registers model classes in a global registry using __init_subclass__. Subclasses fail to register, and introspection shows __init_subclass__ is never called. How do you debug?
__init_subclass__ (PEP 487) is an implicit classmethod called on the parent when a subclass is defined, not when it is instantiated. Common mistakes: (1) if the base class doesn't define __init_subclass__, it inherits object.__init_subclass__, which does nothing—define the hook explicitly in the base class, (2) __init_subclass__ should call super().__init_subclass__(**kwargs) so cooperative mixins elsewhere in the MRO also run, (3) keyword arguments in the class statement (class Child(Parent, key=value)) are forwarded to __init_subclass__—if the hook doesn't consume them and they reach object.__init_subclass__, Python raises TypeError, (4) the hook fires at class-definition time, which means registration only happens when the module defining the subclass is actually imported—if subclasses "fail to register", check that their modules are imported before the registry is read. Debugging: add print statements in __init_subclass__; verify the super() call chain. Example of incorrect code: `class Base: pass` then `class Child(Base, name="test"): pass` raises TypeError because object.__init_subclass__ accepts no keyword arguments. Fix: give Base a hook that consumes the keyword, e.g. `def __init_subclass__(cls, name=None, **kwargs): super().__init_subclass__(**kwargs)`. Prefer __init_subclass__ over metaclass-based registration—it's cleaner and avoids metaclass conflicts.
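A minimal registry sketch, assuming a hypothetical `Model` base with a `table` class keyword:

```python
class Model:
    registry = {}

    def __init_subclass__(cls, table=None, **kwargs):
        # Consume our keyword, then chain so cooperative mixins still run.
        super().__init_subclass__(**kwargs)
        cls.table = table if table is not None else cls.__name__.lower()
        Model.registry[cls.table] = cls


class Customer(Model):                  # hook fires here, at definition time
    pass

class Order(Model, table="orders"):     # class keyword consumed by the hook
    pass
```

Note that `Model` itself never appears in the registry: the hook fires only for subclasses, and only when each subclass's defining module is actually imported.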
Follow-up: How do __init_subclass__ and metaclass __new__ interact? What's the execution order when both are present? Can __init_subclass__ access the final __dict__ of the subclass?
Your database query DSL supports chaining: Query().filter(x=1).select('id', 'name').order_by('id'). Each method returns self, but the final query object has wrong state. What's the dunder method issue?
The problem is usually mutable shared state in the chain, sometimes compounded by __getattr__/__setattr__ side effects. Issues: (1) if __getattr__ forwards method calls to an internal builder, mutations land on the builder while self is never updated, (2) if __setattr__ is implemented to track changes, writes that go through object.__setattr__ or direct __dict__ access bypass it, (3) a chain method that forgets to return self silently breaks the chain—the next call runs on None, (4) if __call__ triggers query execution mid-chain, state may be frozen too early. Debugging: implement __repr__ to show internal state at each step, and add logging in __setattr__/__getattr__ to trace mutations. Fix: ensure each chain method either mutates self and returns self, or—better—returns a new instance. Immutability prevents shared-state bugs: `return Query(self.filters + [("x", 1)])` instead of `self.filters.append(("x", 1)); return self`, because two chains started from the same base object can otherwise see each other's mutations. If the DSL manages a resource (connection, transaction), add __enter__/__exit__ so stateful scopes are explicit. Test whether chain order matters: compare the state produced by Query().A().B() and Query().B().A(), and document any order dependency.
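A sketch of the immutable-builder fix, with a hypothetical `Query` whose `_evolve` helper copies state into a fresh instance at every chain step:

```python
class Query:
    # Immutable builder: every chain step returns a *new* Query, so no
    # step can corrupt state shared with an earlier, half-built chain.
    def __init__(self, filters=(), fields=(), order=None):
        self.filters = tuple(filters)
        self.fields = tuple(fields)
        self.order = order

    def _evolve(self, **changes):
        state = {"filters": self.filters, "fields": self.fields, "order": self.order}
        state.update(changes)
        return Query(**state)

    def filter(self, **conditions):
        return self._evolve(filters=self.filters + tuple(conditions.items()))

    def select(self, *fields):
        return self._evolve(fields=self.fields + fields)

    def order_by(self, field):
        return self._evolve(order=field)

    def __repr__(self):         # shows internal state at any point in the chain
        return f"Query(filters={self.filters}, fields={self.fields}, order={self.order!r})"


base = Query()
q = base.filter(x=1).select("id", "name").order_by("id")
```

After the chain, `base` is untouched; reusing it to start a second query cannot inherit the first query's filters.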
Follow-up: How do __enter__/__exit__ interact with __setattr__? If a context manager mutates __dict__ during __exit__, do those mutations persist after context exit?
You implement a custom numeric type: Decimal-like class with __add__, __mul__, etc. Tests pass locally, but in production with numpy arrays, operations fail silently or return wrong types. What's the issue?
NumPy dispatches operations through its own protocols (NEP 13/NEP 18) rather than plain Python dunder dispatch. When you write numpy_array + custom_obj, ndarray.__add__ runs first (the array is the left operand) and by default broadcasts the operation element-wise over your object or coerces it—your __radd__ never runs, so operations fail silently or return the wrong type. Solutions: (1) set __array_ufunc__ = None on your class: ndarray's operators then return NotImplemented for your type and Python falls back to your reflected dunder methods, (2) implement __array_ufunc__(self, ufunc, method, *inputs, **kwargs) to intercept numpy operations and return a result or NotImplemented, (3) implement __array__ to control conversion to an ndarray when you do want coercion, (4) set __array_priority__ (higher wins) to hint to numpy's legacy dispatch that your type should handle mixed operations. Key insight: your __add__ is only tried first when your object is the left operand; with an ndarray on the left you need the numpy interop protocol, not just __radd__. Subclassing numpy.ndarray is another option when you need full numpy compatibility. Test custom numeric types explicitly against numpy arrays, not just Python scalars.
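A minimal sketch of option (1), assuming numpy is installed; `Celsius` is a hypothetical wrapper type:

```python
import numpy as np

class Celsius:
    # Opt out of numpy's ufunc machinery: with __array_ufunc__ = None,
    # ndarray operators return NotImplemented for this type, and Python
    # falls back to our reflected dunder methods (NEP 13 behavior).
    __array_ufunc__ = None

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __add__(self, other):
        other_data = other.data if isinstance(other, Celsius) else np.asarray(other)
        return Celsius(self.data + other_data)

    __radd__ = __add__              # ndarray + Celsius now lands here


t = Celsius([0.0, 10.0])
left = t + np.array([1.0, 1.0])     # Celsius.__add__: our object on the left
right = np.array([1.0, 1.0]) + t    # ndarray defers -> Celsius.__radd__
```

Without the `__array_ufunc__ = None` line, the second expression would instead broadcast `np.add` element-wise and produce an object-dtype ndarray rather than a `Celsius`.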
Follow-up: What does __array_priority__ do? How does __array_ufunc__ interact with __add__? Can you override numpy.add() for custom types?