Python Interview Questions

__slots__ and Memory Optimization

Production Scenario Interview Questions

Your platform has 10M user model instances in memory. Memory usage is 45GB. You add __slots__ to User class, expecting 50% reduction. Memory drops to 42GB—only 6% savings. What went wrong?

__slots__ saves memory by eliminating the per-instance __dict__, but several factors limit savings: (1) if a parent class doesn't define __slots__, child instances still get a __dict__; __slots__ only helps if the entire inheritance chain uses it, (2) slot descriptors (one per slot) live on the class, not on each instance, so savings scale with instance count, not slot count, (3) eliminating __dict__ saves roughly 200-300 bytes per instance on 64-bit CPython (the exact figure depends on attribute count and Python version), (4) __slots__ only shrinks instance overhead; large referenced objects (lists, dicts, strings) are unaffected. Do the math first: for `class User: __slots__ = ('id', 'name', 'email')`, 10M instances × ~280 bytes ≈ 2.8GB, which matches the ~3GB you actually saved. The expectation of a 50% reduction was the real error: most of the 45GB lives in the objects the attributes reference, not in instance overhead. To verify and find the remaining memory: (1) audit the inheritance chain and ensure every parent defines __slots__ (even an empty tuple), (2) use sys.getsizeof(instance) to confirm the per-instance reduction, (3) profile with tracemalloc or memory_profiler to see where the other 42GB lives, (4) if User holds a list or string-heavy attribute, deduplicating or interning those values will save far more than __slots__ can.
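The overhead claims above are easy to check directly. A minimal sketch (exact byte counts vary by CPython version; `PlainUser` and `SlottedUser` are illustrative names):

```python
import sys

class PlainUser:
    def __init__(self, id, name, email):
        self.id = id
        self.name = name
        self.email = email

class SlottedUser:
    __slots__ = ('id', 'name', 'email')

    def __init__(self, id, name, email):
        self.id = id
        self.name = name
        self.email = email

plain = PlainUser(1, 'alice', 'a@example.com')
slotted = SlottedUser(1, 'alice', 'a@example.com')

# A slotted instance has no per-instance __dict__ at all.
print(hasattr(slotted, '__dict__'))   # False

# Fair comparison: the plain instance's footprint includes its __dict__.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
print(plain_size, slotted_size)       # slotted is smaller per instance
```

Note that `sys.getsizeof` measures only shallow overhead; attribute values themselves are counted by neither variant, which is exactly why __slots__ can't touch the bulk of a 45GB heap.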

Follow-up: How much memory does __slots__ save per instance on average? Does it affect attribute access speed? What happens if you inherit from a class without __slots__?

You implement a time-series data-point class with __slots__ for 100M points. Each point has 5 fields (timestamp, open, high, low, close). After deployment, users report they can't add custom attributes. They expect to do: `point.my_custom_field = value`. How do you handle dynamic attributes with __slots__?

__slots__ prevents dynamic attributes by design: instances have no __dict__, so you can't add attributes that weren't declared as slots. Issues: (1) users expect Python's usual flexibility to attach attributes dynamically, (2) __slots__ blocks this completely, raising AttributeError, (3) if you add '__dict__' to __slots__, you regain flexibility but lose most of the memory savings. Solutions: (1) document that the class is closed (no dynamic attributes) and that users shouldn't extend models, (2) if extensibility is required, include a metadata dict in __slots__: `__slots__ = ('timestamp', 'open', 'high', 'low', 'close', '_metadata')`, initialize `self._metadata = {}` in __init__, and users do `point._metadata['custom'] = value`, (3) use __getattr__/__setattr__ to intercept unknown attribute names and delegate to the metadata dict transparently, (4) provide an explicit extension helper, e.g. `add_custom_attribute(point, 'field', value)`, that stores into metadata, (5) reconsider __slots__: if users genuinely need flexibility, profile memory to verify the savings justify the restriction. For immutable time-series data, document clearly: "Point instances don't support dynamic attributes." As an escape hatch, users can subclass without __slots__: `class ExtendablePoint(Point): pass` (a subclass that doesn't define __slots__ gets a __dict__ again, at the usual memory cost). Reading metadata back: `custom = point._metadata.get('custom_field')`. Testing: try adding an attribute and verify the error message is actionable; the default "'Point' object has no attribute ..." should ideally be replaced by one that points users at _metadata.
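The delegation idea in points (2) and (3) can be sketched as follows, assuming a `_metadata` slot that absorbs any attribute name not declared as a slot:

```python
class Point:
    __slots__ = ('timestamp', 'open', 'high', 'low', 'close', '_metadata')

    def __init__(self, timestamp, open_, high, low, close):
        self._metadata = {}      # '_metadata' is itself a slot, so this is safe
        self.timestamp = timestamp
        self.open = open_
        self.high = high
        self.low = low
        self.close = close

    def __setattr__(self, name, value):
        if name in Point.__slots__:
            object.__setattr__(self, name, value)   # real slot: normal assignment
        else:
            self._metadata[name] = value            # unknown name: overflow to dict

    def __getattr__(self, name):
        # Only called when normal (slot) lookup fails.
        try:
            return self._metadata[name]
        except KeyError:
            raise AttributeError(f'{name!r} not set; see _metadata') from None

p = Point(1700000000, 1.0, 2.0, 0.5, 1.5)
p.my_custom_field = 42    # transparently stored in p._metadata
print(p.my_custom_field)  # 42
print(p.high)             # 2.0
```

The trade-off: points that actually use custom attributes pay for a dict again, but the 100M points that don't stay at slot-only size.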

Follow-up: If you add 'metadata' to __slots__, how much does that reduce memory savings? Can you use __getattr__/__setattr__ to delegate to metadata transparently?

Your ORM uses __slots__ for model instances. Serialization (pickle, JSON) breaks: `pickle.dumps(model)` crashes with "can't pickle _sre.SRE_Pattern object" (a regex compiled in the class). How do you handle __slots__ with serialization?

Pickling __slots__ instances needs special handling when slot values reference unpicklable objects. Issues: (1) if a slot holds a file handle, database connection, lock, or (on older Pythons) a compiled regex, pickle fails, (2) nothing automatically excludes unpicklable slot values, (3) pickle protocols 0 and 1 cannot pickle slotted instances at all unless you define __getstate__; protocol 2+ handles __slots__ automatically via a (dict, slots_dict) state tuple, but still fails on unpicklable values. Solutions: (1) implement __getstate__ to exclude the unpicklable slots: `def __getstate__(self): return {name: getattr(self, name) for name in self.__slots__ if name != '_compiled'}`, (2) reconstruct the excluded slots in __setstate__, (3) use __reduce__ for fine-grained control over pickling, (4) use pickle protocol 2 or higher (protocol 4 is a sensible modern floor), (5) for JSON serialization, implement a custom `json.JSONEncoder` subclass that walks __slots__, (6) consider the dill library, which serializes more object types than pickle, (7) lazy-compile the regex on first access instead of in __init__, so it never needs serializing. Testing: pickle and unpickle, then verify state is preserved and the regex still matches. For cross-service serialization (JSON), avoid pickle entirely: use a pydantic model or an explicit to_dict()/from_dict() pair.
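A sketch of the __getstate__/__setstate__ approach for a slot holding a compiled regex. (On modern CPython, `re.Pattern` objects are themselves picklable, so this stands in for genuinely unpicklable values like locks or connections; `Route` is an illustrative name.)

```python
import pickle
import re

class Route:
    __slots__ = ('pattern', '_compiled')

    def __init__(self, pattern):
        self.pattern = pattern
        self._compiled = re.compile(pattern)

    def __getstate__(self):
        # Persist only the source string, never the compiled object.
        return {'pattern': self.pattern}

    def __setstate__(self, state):
        # Rebuild the excluded slot from the persisted state.
        self.pattern = state['pattern']
        self._compiled = re.compile(self.pattern)

route = Route(r'/users/(\d+)')
clone = pickle.loads(pickle.dumps(route))
print(clone._compiled.match('/users/42').group(1))   # '42'
```

Because `__getstate__` returns a plain dict and `__setstate__` consumes it, the same pair also makes a clean basis for a JSON `to_dict()`.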

Follow-up: How do __getstate__ and __setstate__ interact with __slots__? Does pickle protocol affect __slots__ handling? What's the difference between pickle and dill for __slots__?

You optimize a Node class with __slots__: `__slots__ = ('value', 'left', 'right')` for tree structures. Creating a tree with 1M nodes is 2x slower than without __slots__. Why is __slots__ making code slower?

__slots__ saves memory but can, in principle, affect speed: (1) slot access goes through descriptors, which in some CPython versions is marginally slower (or faster) than __dict__ access, (2) descriptor lookup isn't equally optimized across versions, (3) access patterns matter: in a hot loop, even nanoseconds of per-access difference compound. However, __slots__ does not make code 2x slower; something else is wrong. Investigation: (1) benchmark attribute access in isolation: create instances and time `node.value = x; x = node.value` with timeit, (2) verify __slots__ is actually active: `hasattr(node, '__dict__')` is False only if the whole inheritance chain is slotted, (3) profile with cProfile or py-spy to find the real bottleneck, which is likely not __slots__, (4) check for GC and allocation churn if you create and destroy many nodes, (5) check the inheritance chain: if a parent lacks __slots__, you pay both costs, (6) compare like with like: make sure before/after runs use the same Python version and the same test data. Practical guidance: (1) use __slots__ when instance counts reach the millions and memory is the constraint, (2) for deep tree traversals, cache locality matters more than attribute-access mechanics, (3) hoist attribute lookups in hot loops: `v = self.value` once instead of repeated `self.value`. Most likely the regression is something correlated with your change (a different interpreter version, different test data, an __init__ doing extra work) rather than __slots__ itself.
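To separate real descriptor cost from noise, benchmark attribute access in isolation. A minimal timeit sketch (timings vary by machine and CPython version, so no expected numbers):

```python
import timeit

class PlainNode:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

class SlotNode:
    __slots__ = ('value', 'left', 'right')

    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def bench(cls, n=100_000):
    node = cls(0)
    def run():
        for i in range(n):
            node.value = i      # slot/dict write
            _ = node.value      # slot/dict read
    return min(timeit.repeat(run, number=1, repeat=5))

print(f'plain:   {bench(PlainNode):.4f}s')
print(f'slotted: {bench(SlotNode):.4f}s')
# The two are typically within a few percent of each other;
# a 2x gap almost certainly comes from somewhere else.
```

If this micro-benchmark shows near-parity while the full tree build is 2x slower, the regression lives elsewhere: allocation patterns, GC, or a change that shipped alongside __slots__.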

Follow-up: Does descriptor access for __slots__ have measurable performance impact? When is __slots__ worth using? What's the trade-off between memory and performance?

Your codebase mixes classes with and without __slots__. Inheritance is complex: class A (no slots), class B(A) with __slots__, class C(B). Subclasses assign attributes that aren't declared in any __slots__, and the assignments silently succeed because they land in the __dict__ inherited from A. Debugging is a nightmare. How do you enforce __slots__ consistently?

Inconsistent __slots__ usage causes subtle bugs. Issues: (1) if A has no __slots__, its instances have a __dict__; when B(A) adds __slots__, B's instances still get A's __dict__, defeating the memory savings, (2) if B declares __slots__ but not 'x', then `b.x = 1` doesn't raise; the attribute silently goes into the inherited __dict__, (3) the result is confusing: some attributes live in slots, others in __dict__. Solutions: (1) establish a policy: within a hierarchy, either every class uses __slots__ or none does, (2) enforce it at the root: give the base class `__slots__ = ()` and require every subclass to declare its own, (3) use `__slots__ = ()` in intermediate classes that add no attributes, (4) audit with a script: cross-check `grep -rn "^class " src/` against `grep -rn "__slots__" src/` to find classes missing __slots__, (5) lean on tooling: pylint's assigning-non-slot check (E0237) flags assignments to undeclared slots, and recent mypy versions can check slot assignments as well, (6) add a class comment: `# Uses __slots__ for memory efficiency; do not add dynamic attributes`, (7) test it: assign an unknown attribute and assert AttributeError is raised, and assert `hasattr(instance, '__dict__')` is False for every class in the hierarchy. Refactoring path: start with `__slots__ = ()` in the base, then declare the needed attributes in each subclass: `class Base: __slots__ = ()` and `class Derived(Base): __slots__ = ('x', 'y')`.
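The broken and the consistent hierarchies side by side, as a sketch you can turn into test assertions:

```python
class A:                       # no __slots__: instances carry a __dict__
    pass

class B(A):                    # adding __slots__ here is too late
    __slots__ = ('x',)

b = B()
b.typo_attr = 1                # silently lands in the __dict__ inherited from A
print(hasattr(b, '__dict__'))  # True: A defeated B's __slots__

class Base:                    # consistent chain: empty slots at the root
    __slots__ = ()

class Derived(Base):
    __slots__ = ('x', 'y')

d = Derived()
print(hasattr(d, '__dict__'))  # False: no __dict__ anywhere in the chain
try:
    d.typo_attr = 1
except AttributeError as exc:
    print(exc)                 # typos now fail loudly instead of silently
```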

Follow-up: If parent doesn't have __slots__, do child's __slots__ still prevent __dict__? How do you detect instances with both __dict__ and __slots__? Can you refactor existing code to use __slots__ safely?

You optimize a large data model with __slots__ for 50M instances. Code uses dataclass features (eq, hash, repr). Mypy reports slot names aren't available for type checking. Type hints break after __slots__ is added. How do you use __slots__ with dataclasses and type checking?

@dataclass and __slots__ interact awkwardly: (1) @dataclass generates __init__, __repr__, and __eq__ from the class annotations but, by default, does not generate __slots__, (2) if you add __slots__ by hand, the tuple easily drifts out of sync with the annotations, and mypy may complain, (3) Python 3.10+ added @dataclass(slots=True), which generates __slots__ from the field annotations for you. Solutions: (1) on Python 3.10+, use `@dataclass(slots=True)`; type checkers understand it and the slots always match the fields, (2) on Python <3.10, declare fields normally and keep __slots__ in sync manually: `@dataclass` over `class User: __slots__ = ('id', 'name'); id: int; name: str` (note: this combination breaks if fields have defaults, because the slot descriptor conflicts with the class-level default value), (3) add a test asserting `set(cls.__slots__) == set(cls.__dataclass_fields__)` so drift is caught in CI, (4) alternatively use attrs: `@attrs.define` generates __slots__ by default, works on older Pythons, and has good type-checker support, (5) or use a pydantic model if you also need validation and serialization. Verify with `mypy --strict` that all slot attributes are visible and no attr-defined errors remain.

Follow-up: Does @dataclass(slots=True) automatically generate __slots__ from annotations? How does attrs compare to @dataclass for __slots__? Can you use both @dataclass and manual __slots__?

Your data model grows over time. You need to add a new attribute to User. With __slots__, you need to modify __slots__ tuple, redeploy, and reload all servers. This breaks canary deployments. How do you evolve __slots__ without downtime?

Adding to __slots__ requires a code change and redeploy; it is not hot-patchable. Issues: (1) __slots__ is consumed at class creation time: the descriptors and instance layout are fixed when the class body executes, (2) assigning a new __slots__ tuple to an existing class at runtime changes nothing about the layout, (3) subclassing can add slots, but that doesn't help when the base class itself needs the new attribute, (4) canary deployments break when old and new server versions disagree about which attributes exist. Solutions: (1) plan ahead: reserve spare slots for future growth, `__slots__ = (..., '_reserved_1', '_reserved_2')`, and repurpose them when needed, (2) include an extensions dict in __slots__: `__slots__ = (..., '_extensions')`, initialize it to {} in __init__, and store new attributes there during the transition, (3) for zero-downtime upgrades, version the schema rather than the class layout, with an explicit migration path, (4) use __getattr__/__setattr__ to route unknown names into the extensions dict transparently, so callers never see the difference, (5) if hot evolution matters more than memory, don't use __slots__ at all: a normal __dict__ absorbs new attributes freely, (6) during the transition, old servers ignore the new attribute while new servers read and write it through the extensions dict; once every server is upgraded, promote it to a real slot in the next release. Best practice: if you adopt __slots__, document the evolution plan up front, and test canary deployments with deliberate version skew to catch mismatches early.
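The extensions-dict transition pattern from points (2) and (4) can be sketched like this; `plan` stands in for a hypothetical attribute that this class version never declared:

```python
class User:
    __slots__ = ('id', 'name', '_extensions')

    def __init__(self, id, name):
        self._extensions = {}    # must be set first; it is itself a slot
        self.id = id
        self.name = name

    def __setattr__(self, name, value):
        if name in User.__slots__:
            object.__setattr__(self, name, value)
        else:
            self._extensions[name] = value   # attributes from newer versions go here

    def __getattr__(self, name):
        try:
            return self._extensions[name]
        except KeyError:
            raise AttributeError(name) from None

u = User(1, 'alice')
u.plan = 'pro'        # an attribute this class version never declared
print(u.plan)         # 'pro', served from _extensions
print(u._extensions)  # {'plan': 'pro'}
```

Old servers that don't know about `plan` simply never read it; new servers get it transparently. Once every server is upgraded, `'plan'` can be promoted into the __slots__ tuple in the next release.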

Follow-up: Can you add slots at runtime to a class? How do you handle class versioning with __slots__? What's the clean migration path from __slots__ to non-slots?

Your metrics system tracks 100K event instances, each with __slots__ for memory efficiency. Profiling shows 20% of CPU time in __setattr__ calls, more than the memory saved is worth. The bottleneck appears to be descriptor-protocol overhead. Is __slots__ worth it?

__slots__ isn't always worth the complexity. Cost-benefit analysis: (1) memory savings: roughly 200-300 bytes per instance, (2) CPU cost: on most CPython versions slot access is comparable to, and often faster than, __dict__ access, so 20% of CPU in __setattr__ usually points at something else, typically a custom __setattr__ or property on the hot path, (3) ongoing complexity: maintaining __slots__ across inheritance, serialization, and schema evolution. Solutions: (1) profile carefully: cProfile for the __setattr__ time, tracemalloc for memory, (2) calculate the break-even: with N instances and M attribute writes per instance at a measured per-write overhead, CPU cost ≈ N × M × overhead, versus memory savings ≈ N × 280 bytes; if CPU cost dominates, drop __slots__, (3) reserve __slots__ for large, write-once datasets (millions of instances) rather than hot mutable objects, (4) re-benchmark on Python 3.11+, whose adaptive specialization made attribute access substantially faster, (5) go hybrid: apply __slots__ only to the few classes with massive instance counts, not to frequently-mutated intermediate objects. For your case: 100K instances with heavy __setattr__ traffic may well be better off without __slots__. Test: measure CPU time with and without __slots__ and compare against the memory saved; if CPU regresses more than ~10% for a modest memory win, abandon it. Document the decision: "Used __slots__ because profiling showed X% memory savings at Y% CPU cost; the trade-off is justified for this workload."
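The break-even arithmetic in point (2) as a small sketch; both default constants are assumptions to replace with numbers from your own profiling:

```python
def slots_break_even(n_instances, writes_per_instance,
                     bytes_saved=280,        # assumed per-instance __dict__ saving
                     overhead_ns=50):        # assumed per-write overhead (measure it!)
    """Rough cost/benefit: MB of memory saved vs seconds of extra CPU."""
    saved_mb = n_instances * bytes_saved / 1e6
    cpu_s = n_instances * writes_per_instance * overhead_ns / 1e9
    return saved_mb, cpu_s

saved, cpu = slots_break_even(n_instances=100_000, writes_per_instance=1_000)
print(f'memory saved: {saved:.0f} MB, extra CPU: {cpu:.2f} s')
# → memory saved: 28 MB, extra CPU: 5.00 s
```

For the scenario above, 28 MB saved against 5 seconds of hypothetical per-write overhead makes the trade-off look poor, which is exactly the kind of conclusion this back-of-envelope check is for. If real measurement shows the overhead is near zero (common on modern CPython), the conclusion flips.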

Follow-up: How much does descriptor overhead vary by Python version? Does Python 3.11+ inline descriptor access? When should you avoid __slots__ despite memory pressure?
