PostgreSQL Interview Questions

Statistics and Cost Estimation


The planner chose a nested loop over a hash join. The nested loop scans the inner table N times; the hash join would be faster. Why did the planner get it wrong? Statistics issue, or cost-model miscalibration?

A nested loop is chosen when the planner estimates the inner scan is cheap or the outer side is small. Start with EXPLAIN SELECT a.*, b.* FROM a JOIN b ON a.id = b.a_id WHERE a.status = 'active'; the plan shows the join method and its estimated cost. Root causes:
(1) Outdated statistics: run ANALYZE a; ANALYZE b; then re-plan.
(2) Outer row count underestimated: if the planner thinks a.status = 'active' matches 100 rows but it actually matches 1M, a nested loop looks cheap. Fix: ALTER TABLE a ALTER COLUMN status SET STATISTICS 1000; ANALYZE a.
(3) Inner scan cost overestimated: random_page_cost may be too high (default 4.0; for SSDs use ~1.1): ALTER SYSTEM SET random_page_cost = 1.1; SELECT pg_reload_conf().
(4) Join selectivity wrong: the planner thinks the join returns few rows. Extended statistics can help, but note they are defined on a single table, e.g. CREATE STATISTICS stat_a (dependencies) ON status, id FROM a; they capture correlation between columns of one table, not across tables.
(5) Memory constraints: a hash join builds its hash table within work_mem. If work_mem is too small the hash join must split into batches, raising its estimated cost, so the planner may prefer the nested loop. Check: SHOW work_mem (default 4MB). Raise it for the session rather than cluster-wide: SET work_mem = '256MB' (work_mem applies per sort/hash node per connection, so large cluster-wide values are risky).
(6) Force join order: SET join_collapse_limit = 1; EXPLAIN ... keeps the textual join order. If a specific order is better, a materialized CTE works (since PG12 CTEs are inlined by default, so spell out AS MATERIALIZED): WITH a_filtered AS MATERIALIZED (SELECT * FROM a WHERE status='active') SELECT a_filtered.*, b.* FROM a_filtered JOIN b ON a_filtered.id = b.a_id (the CTE is evaluated first, so the planner knows its result size).
Verify the improvement: EXPLAIN (ANALYZE) SELECT ... should show "Hash Join" with lower actual time than the original nested loop. Tune systematically: (1) ANALYZE. (2) Lower random_page_cost. (3) Increase work_mem. (4) Re-run EXPLAIN. Most cases are fixed by ANALYZE or cost tuning.
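The checklist above can be sketched as a single psql session. This is a diagnostic sketch using the example's table names; enable_nestloop = off is a planner-debugging switch only (it shows what the alternative plan would cost), not a production setting:

```sql
-- 1. Refresh statistics, then compare estimated vs. actual row counts.
ANALYZE a;
ANALYZE b;
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.*, b.*
FROM a JOIN b ON a.id = b.a_id
WHERE a.status = 'active';

-- 2. Test whether the hash join really wins, without touching the cluster:
BEGIN;
SET LOCAL enable_nestloop = off;   -- diagnostic only: force the alternative
SET LOCAL work_mem = '256MB';      -- let the hash table fit in memory
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.*, b.*
FROM a JOIN b ON a.id = b.a_id
WHERE a.status = 'active';
ROLLBACK;                          -- SET LOCAL dies with the transaction

-- 3. If the hash join wins, fix the cost model instead of forcing plans:
ALTER SYSTEM SET random_page_cost = 1.1;  -- assumes SSD storage
SELECT pg_reload_conf();
```

SET LOCAL keeps every override scoped to the transaction, so the experiment cannot leak into other sessions.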

Follow-up: Explain join_collapse_limit and constraint_exclusion. How do they affect planning?

Statistics are outdated after a large INSERT (1M rows); the planner's estimates are stale and queries are suddenly slow. When does autovacuum run ANALYZE, and how do you force a refresh?

Autovacuum runs ANALYZE periodically: by default it triggers once changed rows exceed autovacuum_analyze_threshold (50) plus autovacuum_analyze_scale_factor (0.1) times the table size, i.e. roughly 10% growth. For a 1M-row insert into a 10M-row table, autoanalyze should fire. But:
(1) It may be delayed: the launcher wakes every autovacuum_naptime (default 60s), and a batch insert can land between runs.
(2) Autovacuum may be disabled: SHOW autovacuum.
(3) autovacuum_max_workers may be too low, creating a backlog. Check: SHOW autovacuum_max_workers (default 3); increase it if many tables need attention.
Force an immediate refresh: ANALYZE table_name (takes only a light lock; use ANALYZE VERBOSE to see progress). For a specific column: ANALYZE table_name (column_name). Note that ANALYZE has no PARALLEL option; VACUUM (PARALLEL 4) (PG13+) parallelizes index vacuuming only. After ANALYZE, new plans pick up the fresh estimates: EXPLAIN SELECT ... FROM table WHERE ... will show them.
If estimates are still wrong: (1) Increase the statistics target: ALTER TABLE table ALTER COLUMN col SET STATISTICS 1000; ANALYZE. (2) Create multivariate statistics: CREATE STATISTICS stat_name (dependencies) ON col1, col2 FROM table; then ANALYZE. (3) Verify the stats: SELECT avg_width, n_distinct FROM pg_stats WHERE tablename='table' AND attname='col'. n_distinct = -1 means every value is distinct (negative values are fractions of the row count, not counts); re-analyze with a higher target if it looks wrong.
Prevention: (1) Set autovacuum parameters per table: ALTER TABLE table SET (autovacuum_vacuum_scale_factor=0.01, autovacuum_analyze_scale_factor=0.005) (analyze after 0.5% growth). (2) Schedule an explicit ANALYZE big_table nightly during the maintenance window. (3) Monitor: ALTER SYSTEM SET log_autovacuum_min_duration = 0; SELECT pg_reload_conf() (logs every autovacuum run).
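The staleness check and per-table fix above can be sketched as follows; big_table is the hypothetical table name from the scenario:

```sql
-- How stale are the statistics? n_mod_since_analyze counts rows
-- changed since the last ANALYZE of this table.
SELECT relname,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'big_table';

-- Force a refresh now, with progress output:
ANALYZE VERBOSE big_table;

-- Make autoanalyze fire earlier for this hot table (0.5% growth):
ALTER TABLE big_table
  SET (autovacuum_analyze_scale_factor = 0.005,
       autovacuum_analyze_threshold   = 1000);

-- Log every autovacuum/autoanalyze run so delays are visible:
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
```

Watching n_mod_since_analyze after batch loads is usually the quickest way to prove that stale statistics, not the query, caused the regression.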

Follow-up: Explain STATISTICS and how to interpret pg_stats output.

A column has a skewed distribution (95% of rows have value X, 5% have values Y-Z). The planner's estimates are wrong. Design a histogram strategy.

With too small a statistics target, the MCV list and histogram can miss the skew. Example: a status column (active=95%, inactive=5%). WHERE status='inactive' should match few rows, but with incomplete statistics the planner's estimate can be far off. Fix:
(1) Increase the statistics target so more MCV slots and histogram buckets are kept: ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000; ANALYZE orders. Higher STATISTICS = more buckets (max 10000).
(2) Inspect the histogram: SELECT histogram_bounds FROM pg_stats WHERE tablename='orders' AND attname='status'. The bounds are bucket edges covering the non-MCV values.
(3) For heavy skew, the most-common-values list does the work: SELECT most_common_vals, most_common_freqs FROM pg_stats WHERE tablename='orders' AND attname='status'. MCVs list the top values with their frequencies (e.g. active=0.95).
(4) If the MCV list is incomplete, increase the statistics target and re-ANALYZE.
(5) Extended statistics capture correlation with other columns (note: they require at least two columns): CREATE STATISTICS stat_status (dependencies) ON status, region FROM orders; ANALYZE (region is an illustrative second column).
(6) For extreme skew (one value at 99.9%), consider a separate table, e.g. CREATE TABLE orders_active AS SELECT * FROM orders WHERE status='active'; CREATE INDEX idx_active ON orders_active(id), or a partial index on the rare value: CREATE INDEX idx_inactive ON orders(id) WHERE status='inactive'. Queries on the hot subset then hit a structure the planner can estimate accurately.
Verification: EXPLAIN (ANALYZE) SELECT * FROM orders WHERE status='inactive' should show estimated rows close to actual rows in the plan output. If estimates are still wrong despite a higher STATISTICS target: PostgreSQL has no built-in index hints; use the pg_hint_plan extension or a materialized view to pin a specific plan. Best practice: for production tables with skewed columns, set a high STATISTICS target and ANALYZE regularly.
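A sketch of the inspection loop above, using the orders.status example; the expected MCV contents are illustrative, not guaranteed output:

```sql
-- Widen the per-column sample, then refresh statistics.
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;
ANALYZE orders;

-- What does the planner actually know about the skewed column?
SELECT null_frac,
       n_distinct,
       most_common_vals,    -- e.g. {active,inactive}
       most_common_freqs,   -- e.g. {0.95,0.05}
       histogram_bounds     -- only non-MCV values; NULL if MCVs cover all
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- Confirm the estimate now matches reality for the rare value:
EXPLAIN (ANALYZE)
SELECT * FROM orders WHERE status = 'inactive';
```

If most_common_freqs already lists the rare value with its true frequency, the estimate for WHERE status='inactive' comes straight from the MCV list and no histogram is involved.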

Follow-up: Explain MCV (most-common-values). How does PostgreSQL use MCV for selectivity estimation?

Columns are correlated: age and income (older customers earn more). The planner treats them as independent, so filters on both are underestimated. Use extended statistics to capture the correlation.

Extended statistics capture correlations between columns of one table, improving estimates for multi-column filters. Example: SELECT * FROM customers WHERE age > 50 AND income > 100000. If age and income were independent, the planner could multiply the two selectivities: rows = (fraction with age>50) × (fraction with income>100k) × total. With positive correlation, more rows satisfy both predicates than the product suggests (high earners are concentrated among older customers), so the planner underestimates. Extended statistics fix this:
(1) Create: CREATE STATISTICS stat_age_income (dependencies) ON age, income FROM customers. The dependencies kind records functional-dependency strength between the columns.
(2) Analyze: ANALYZE customers.
(3) View: SELECT statistics_name, dependencies FROM pg_stats_ext WHERE statistics_name = 'stat_age_income'. A dependency degree of e.g. 0.85 means age strongly determines income.
(4) Verify the benefit: EXPLAIN (ANALYZE) SELECT * FROM customers WHERE age > 50 AND income > 100000. The estimate should now be much closer to the actual row count.
The other statistics kinds:
(1) ndistinct: CREATE STATISTICS stat_ndist (ndistinct) ON col1, col2 FROM table. Captures the number of distinct value combinations (improves GROUP BY cardinality estimates).
(2) mcv (PG12+): CREATE STATISTICS stat_mcv (mcv) ON col1, col2 FROM table. A multivariate most-common-values list for frequent value combinations.
Tuning: (1) Combine kinds: CREATE STATISTICS stat_multi (dependencies, ndistinct) ON col1, col2, col3 FROM table. (2) Inspect the raw collected data: SELECT * FROM pg_statistic_ext_data WHERE stxoid = (SELECT oid FROM pg_statistic_ext WHERE stxname='stat_age_income'). (3) Maintain: re-ANALYZE periodically if the data distribution drifts. Benefit: queries with multi-column filters plan more accurately, and join plans improve. Recommended for OLAP queries (aggregations over multiple columns) or complex filters.
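The steps above condense to a short session; customers is the hypothetical table from the example, and the mcv kind assumes PG12 or later:

```sql
-- One statistics object can carry several kinds at once.
CREATE STATISTICS stat_age_income (dependencies, ndistinct, mcv)
  ON age, income FROM customers;
ANALYZE customers;   -- extended stats are only populated by ANALYZE

-- Inspect what was collected (human-readable view):
SELECT statistics_name, dependencies, n_distinct
FROM pg_stats_ext
WHERE statistics_name = 'stat_age_income';

-- Verify the multi-column estimate improved:
EXPLAIN (ANALYZE)
SELECT * FROM customers
WHERE age > 50 AND income > 100000;
```

The pg_stats_ext view is the readable counterpart of the pg_statistic_ext / pg_statistic_ext_data catalogs, so it is usually the first place to check whether ANALYZE has populated the object.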

Follow-up: Explain the ndistinct, dependencies, and mcv statistics kinds. When should you create each?

Cost model miscalibration: seq_page_cost=1, random_page_cost=4 (defaults). Your SSD serves random and sequential reads at similar speed. You lower random_page_cost to 1.1, but now nearly all queries use indexes, even where a sequential scan would be faster. Tune the cost model to balance.

Cost-model trade-offs: a low random_page_cost encourages index scans; a high one favors sequential scans. Tuning depends on workload and storage.
Measurement: (1) Baseline with defaults: ALTER SYSTEM SET seq_page_cost = 1; ALTER SYSTEM SET random_page_cost = 4; SELECT pg_reload_conf(). Run the slow queries: EXPLAIN (ANALYZE) SELECT ... FROM big_table WHERE col=val; record times. (2) Test index preference: ALTER SYSTEM SET random_page_cost = 1.1; SELECT pg_reload_conf(); rerun the same queries. If faster, the SSD benefits from index scans; if slower (index overhead), revert.
(3) Workload-specific: OLTP (point lookups): lower random_page_cost (1.0-1.5) encourages indexes. OLAP (full scans): higher random_page_cost (3.0-4.0) encourages sequential scans. Mixed: a compromise around 2.0.
(4) Storage type: HDD: random_page_cost=4 (seeks are expensive). SSD: 1.1-1.5 (random reads cost about the same as sequential). NVMe: ~1.0 (all access fast).
Calibration: (1) Measure actual I/O speed with fio on a scratch file on the same filesystem (do not point fio at live PostgreSQL data files): fio --name=random --ioengine=psync --rw=randread --bs=8k --numjobs=1 --runtime=10 --size=1g --filename=/var/lib/postgresql/fio_test (8KB random reads) vs. fio --name=seq --ioengine=psync --rw=read --bs=1m --numjobs=1 --runtime=10 --size=1g --filename=/var/lib/postgresql/fio_test (1MB sequential reads). The per-page cost ratio between the two runs is a rough starting point for random_page_cost; in practice the value should be lower than the raw ratio because much of the data is cached. (2) Apply: ALTER SYSTEM SET random_page_cost = <calculated ratio>; SELECT pg_reload_conf().
(3) Other costs: cpu_tuple_cost (default 0.01, per-row CPU), cpu_index_tuple_cost (default 0.005, per index row), cpu_operator_cost (default 0.0025, per operator). Raise them if the system is CPU-bound, so CPU-expensive plans such as large sorts are penalized. (4) Memory: work_mem affects sort/hash plan costs; with low work_mem, memory-hungry plans are penalized.
Tuning process: (1) Profile the workload with EXPLAIN (ANALYZE, BUFFERS) on slow queries. (2) Adjust one parameter at a time. (3) Measure total workload latency, not a single query. (4) Re-run EXPLAIN after cost changes (the statistics are unchanged, but plans are recomputed with the new costs; no ANALYZE is needed).
Recommendation: start at random_page_cost=1.5 for SSD workloads and 3.0 for OLAP on spinning disks. Measure and iterate.
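Candidate cost settings can be A/B-tested per session before touching the cluster. A sketch, with a hypothetical big_table query standing in for the workload:

```sql
-- Candidate A: SSD-style costing, scoped to one transaction.
BEGIN;
SET LOCAL random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE col = 42;   -- hypothetical slow query
ROLLBACK;

-- Candidate B: the HDD-style default, same query.
BEGIN;
SET LOCAL random_page_cost = 4.0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE col = 42;
ROLLBACK;

-- Apply the winner cluster-wide only after measuring the whole workload:
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```

Because SET LOCAL expires at transaction end, both candidates can be compared on a live system without affecting any other session's plans.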

Follow-up: Explain all the cost parameters: seq_page_cost, random_page_cost, and the cpu_* family. How do they interact?