PostgreSQL Interview — Table Partitioning Strategies

Orders table is 2TB, growing 50GB/month. Queries slow, vacuum takes 12 hours. Design a partitioning strategy: range, list, or hash. Trade-offs and implementation.

2TB table benefits from partitioning: Partition by range (time), list (region), or hash (distributor). For time-series orders, range partitioning by month/week is most common. Strategy: CREATE RANGE PARTITION BY created_at (one partition per month). Benefits: (1) Vacuum each partition independently (faster). (2) Query prune: WHERE created_at >= '2026-04' only scans that partition. (3) Retention: DROP partition to archive old data. Implementation: CREATE TABLE orders (id SERIAL, created_at DATE, amount DECIMAL) PARTITION BY RANGE (created_at). Then create partitions: CREATE TABLE orders_202604 PARTITION OF orders FOR VALUES FROM ('2026-04-01') TO ('2026-05-01'); CREATE TABLE orders_202603 PARTITION OF orders FOR VALUES FROM ('2026-03-01') TO ('2026-04-01'). Setup automatic partition creation (PG11+): Use a trigger or pg_partman extension: CREATE EXTENSION pg_partman; SELECT create_parent('public.orders', 'created_at', 'range', 'monthly'). Trade-offs: (1) Range: Ideal for time-series; queries must include partition key for prune. (2) List: For discrete categories (region, customer_type). (3) Hash: For even distribution; less queryable. Maintenance: Monitor partition sizes: SELECT tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) FROM pg_tables WHERE tablename LIKE 'orders_%' ORDER BY pg_total_relation_size DESC. Drop old partitions: DROP TABLE IF EXISTS orders_202601. Caution: Existing unpartitioned table requires downtime to convert; use pg_partman or pg_repack for online migration.

Follow-up: Explain partition pruning and constraint exclusion. How does the planner skip partitions?

Partitioned orders table queries slow when accessing multiple partitions (cross-partition joins). Design an optimization: indexes, partial indexes, or sub-partitioning.

Cross-partition queries require merging results from multiple partitions; no single index covers all. Query: SELECT * FROM orders WHERE status='pending' AND created_at >= '2026-03-01' AND created_at < '2026-04-07' scans 2 partitions (March + April), needs index on (status, created_at) on each. Optimization: (1) Local indexes: Create index on each partition: CREATE INDEX orders_202604_status ON orders_202604(status, created_at). Repeat per partition. Query planner uses partition indexes; concatenates results. (2) Global index (slower): Single index on parent table; covers all partitions but defeats partitioning benefit. (3) Partial indexes: CREATE INDEX orders_202604_pending ON orders_202604(created_at) WHERE status='pending'; smaller, faster. (4) Sub-partitioning: If dataset huge, partition orders by month, then by region (2-level): CREATE TABLE orders PARTITION BY RANGE (created_at) THEN PARTITION BY HASH (region_id). Sub-partitions allow finer granularity. (5) Materialized view for hot queries: CREATE MATERIALIZED VIEW pending_orders_recent AS SELECT * FROM orders WHERE status='pending' AND created_at >= NOW() - INTERVAL '30 days'; CREATE INDEX ON pending_orders_recent(created_at). Refresh regularly. Tuning: Monitor query plans: EXPLAIN SELECT * FROM orders WHERE status='pending' should show Append + Seqscan on each partition or Index Scan if index exists. If Seqscan on all partitions, create indexes. Recommended: Local indexes on common filters per partition.

Follow-up: Explain subpartitioning (nested partitions). When would you use it?

After partitioning, parent table queries mysteriously return no rows. Partitions have data but parent SELECT empty. Why and what went wrong?

Root cause: INSERT into parent table is failing, or partitions are not attached. PostgreSQL allows INSERT into parent; it routes to correct child. If INSERT fails (no matching partition), data silently doesn't go anywhere (constraint_exclusion). Check: SHOW constraint_exclusion (should be 'partition'). If 'off', planner doesn't prune and may not find partition match. Set: ALTER SYSTEM SET constraint_exclusion = 'partition'; SELECT pg_reload_conf(). Also check partition attachment: SELECT tablename, schemaname FROM pg_tables WHERE tablename LIKE 'orders_%' should show all partitions. Verify attachment: SELECT inhrelid::regclass FROM pg_inherits WHERE inhparent='orders'::regclass; if empty, partitions not attached. Reattach: ALTER TABLE orders_202604 INHERIT orders. For newly created partitions, ensure range covers inserted dates. If INSERT date > partition upper bound, it's rejected. Example: INSERT INTO orders (created_at, ...) VALUES ('2026-05-10', ...) fails if latest partition is FOR VALUES TO ('2026-05-01'). Fix: Create partition for May: CREATE TABLE orders_202605 PARTITION OF orders FOR VALUES FROM ('2026-05-01') TO ('2026-06-01'). Best practice: Auto-create partitions with pg_partman to avoid gaps. Verify: SELECT COUNT(*) FROM orders_202604; SELECT COUNT(*) FROM orders should return same total.

Follow-up: Explain constraint_exclusion settings: off, on, partition. When is each used?

Partition strategy: Range by month. One partition grows to 500GB (hot data). Others are 50GB (cold). Uneven partition sizes cause skew. Redesign for balance.

Time-based partitioning can lead to skewed sizes if data volume varies (e.g., busy season vs. slow season). Uneven partitions cause: (1) Vacuum on large partition takes much longer. (2) Index scans on large partition slower. (3) Storage planning difficult. Redesign: (1) Sub-partition hot data: Current: range by month. New: range by month for cold data (3 months old), range by week for hot data (< 3 months). Use trigger or pg_partman: SELECT partman.create_parent('public.orders', 'created_at', 'range', 'monthly', p_premake := 2); SELECT partman.set_partition_column_default('public.orders', 'created_at'); SELECT partman.run_maintenance('public.orders') with custom interval per age. (2) Hash sub-partition hot partition: CREATE TABLE orders_recent PARTITION BY HASH (customer_id) (PARTITION orders_recent_0 FOR VALUES WITH (MODULUS 4, REMAINDER 0), ...). Distributes 500GB across 4 partitions (125GB each). (3) Archive old data: Move cold partitions to separate table or storage: CREATE TABLE orders_archive (LIKE orders); ALTER TABLE orders_202603 INHERIT orders_archive; ALTER TABLE orders_202603 NO INHERIT orders. Compress: ALTER TABLE orders_202603 SET (toast_tuple_target = 128). (4) Monitor: SELECT tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) FROM pg_tables WHERE tablename LIKE 'orders%' ORDER BY pg_total_relation_size DESC to track sizes. Rebalance if variance > 2x.

Follow-up: Explain combining partitioning strategies (e.g., range + hash). When is hybrid partitioning useful?

Converting non-partitioned 1TB table to partitioned. Schema same, data needs migration. Design zero-downtime migration (minimal read/write impact).

Converting in-place is difficult (table structure changes). Zero-downtime migration requires logical replication or double-writing. Strategy: (1) Create new partitioned table (empty): CREATE TABLE orders_new PARTITION BY RANGE (created_at) ... (CREATE TABLE orders_new_202604 PARTITION OF orders_new FOR VALUES FROM ... ). (2) Pre-populate historical data during maintenance window (brief downtime). OR use logical replication: Setup replication stream into orders_new while orders gets writes. (3) Use pg_repack (online rebuild without full downtime): pg_repack -d mydb -t orders --tablespace tmp_space (requires extra disk space = table size). Downsides: Slower than offline migration, extended locking. (4) Alternative: Dual-write pattern: (a) App writes to both old (orders) and new (orders_new). (b) Backfill orders_new from orders: INSERT INTO orders_new SELECT * FROM orders WHERE created_at >= '2026-01-01' ORDER BY created_at (chunked: batch 1M rows at a time). (c) Verify counts: SELECT COUNT(*) FROM orders; SELECT COUNT(*) FROM orders_new should match. (d) Switchover: Rename: ALTER TABLE orders RENAME TO orders_old; ALTER TABLE orders_new RENAME TO orders. (5) For truly zero downtime, use foreign data wrapper (FDW) to shadow read-only copy, or Citus distributed table. Recommendation: Schedule brief maintenance window for final switchover; most production uses this. Pre-migration: Test on staging; measure copy time, rehearse rollback.

Follow-up: Explain pg_repack. How does it rebuild a table without full downtime?

Partition inheritance tree broken after maintenance. Parent table queries return partial results; child partition missing from inheritance. Recovery and prevention.

Partitions use PostgreSQL table inheritance; child is "inherited" from parent. If inheritance broken, child no longer receives parent queries. Root causes: (1) Accidental DROP ... INHERIT: ALTER TABLE orders_202603 NO INHERIT orders (intended?) removes from inheritance. (2) Corrupted pg_inherit system table (rare; indicates serious DB problem). (3) Partition detached for maintenance, not reattached. Recovery: (1) Verify inheritance: SELECT tablename FROM pg_tables WHERE tablename LIKE 'orders_%' AND schemaname='public' lists partitions. Check inheritance: SELECT inhrelid::regclass FROM pg_inherits WHERE inhparent='orders'::regclass. Missing partitions need re-attachment. (2) Re-attach: ALTER TABLE orders_202603 INHERIT orders. (3) Verify: Re-run above query; partition should appear in inheritance list. (4) Validate data consistency: SELECT COUNT(*) FROM orders; SELECT SUM(c) FROM (SELECT COUNT(*) c FROM orders_202604 UNION ALL SELECT COUNT(*) FROM orders_202603 ...) UNION_COUNTS should match. Prevention: (1) Never manually ALTER TABLE ... INHERIT without reason. (2) Use pg_partman for automation; handles attachment/detachment. (3) Backup inheritance config: SELECT * FROM pg_inherits WHERE inhparent='orders'::regclass regularly; store in version control. (4) Monitor: Automated checks post-maintenance: assert COUNT(*) from orders == SUM(COUNT(*)) from all partitions.

Follow-up: Explain pg_inherits system table. How does PostgreSQL track partition hierarchy?

Queries on partitioned table with UNION across partitions are slower than single partition. Why is parallel append not working? Diagnose and fix.

Parallel APPEND (scanning multiple partitions in parallel) was added PG11; older versions use sequential append. Issue: Parallel disabled or not effective for query. Check: EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE created_at >= '2026-03-01' AND created_at < '2026-04-07'. Plan shows: "Append" (sequential) vs. "Parallel Append" (parallel). If Append (sequential), parallel is disabled. Enable: ALTER SYSTEM SET max_parallel_workers_per_gather = 4; ALTER SYSTEM SET min_parallel_table_scan_size = '10MB'; reload. Check PG version: Parallel Append requires PG11+. If older, upgrade. Also verify parallel is enabled globally: SHOW max_parallel_workers (should be > 0). For your query to use parallel append: (1) Query must touch 2+ partitions. (2) Each partition must be large enough to warrant parallelism (min_parallel_table_scan_size). (3) Parallel workers available. Tuning: ALTER SYSTEM SET parallel_setup_cost = 1000; ALTER SYSTEM SET parallel_tuple_cost = 0.01 (lower costs encourage parallelism). Then: SELECT pg_reload_conf(). Rerun EXPLAIN; should show Parallel Append. Verify with ANALYZE: EXPLAIN (ANALYZE) SELECT ... and check execution time improves. Staff note: Parallel append overhead (~few ms) may not justify for small dataset; most useful for 1B+ row scans.

Follow-up: Explain parallel append concept. Why is it faster for multi-partition scans?