PostgreSQL Interview Questions

Vacuum, Autovacuum, and Bloat Management


Your largest table grows 500MB/day but disk usage grew 2TB in a month. Bloat is 60%. Autovacuum logs show frequent, aborted runs. Design a recovery plan: which vacuum options, when to run, how to prevent recurrence.

Bloat (dead tuples occupying disk) indicates autovacuum is losing the race. First, measure bloat properly—the heap-to-total-size ratio from pg_tables is not a bloat measure. Use pg_stat_user_tables for a quick estimate: SELECT schemaname, relname, n_live_tup, n_dead_tup, ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct FROM pg_stat_user_tables ORDER BY n_dead_tup DESC. Check autovacuum logs: set log_autovacuum_min_duration = 0 (log all runs). Aborted runs usually mean a conflicting lock request (DDL, REINDEX, etc.) arrived mid-run—autovacuum cancels itself to yield, except for anti-wraparound vacuums. Recovery: (1) Make autovacuum more aggressive for this table (note autovacuum_naptime is a global setting, not a per-table option): ALTER TABLE large_table SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_insert_scale_factor = 0.01). (2) If downtime is acceptable: VACUUM FULL large_table (takes an ACCESS EXCLUSIVE lock, rewrites the table, returns space to the OS). (3) If downtime is not acceptable: use the pg_repack extension to rebuild online with only brief locks. (4) Long-term: set aggressive autovacuum parameters globally or per-table based on churn rates, and keep DDL out of peak write windows so autovacuum isn't repeatedly cancelled.
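For an exact (rather than estimated) bloat figure, a sketch assuming the contrib extension pgstattuple is installable and using an illustrative table name:

```sql
-- Install once per database (ships with PostgreSQL contrib).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Exact dead-tuple and free-space percentages for one table.
-- pgstattuple reads the entire table, so run it off-peak.
SELECT tuple_count,
       dead_tuple_count,
       round(dead_tuple_percent::numeric, 2) AS dead_pct,
       round(free_percent::numeric, 2)       AS free_pct
FROM pgstattuple('public.large_table');
```

dead_pct plus free_pct approximates the reclaimable share; comparing it against the estimate from pg_stat_user_tables tells you whether the statistics-based numbers can be trusted for this table.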

Follow-up: Why does VACUUM FULL use 2x disk temporarily? How would you VACUUM FULL on a 500GB table without running out of disk?

Autovacuum runs constantly on one table but never finishes. Progress: 10% after 2 hours. Analyze why it's slow, then propose either tuning or an alternative strategy (e.g., partitioning).

Slow vacuum indicates either: (1) a large table with many dead tuples, (2) an I/O bottleneck, (3) maintenance_work_mem too small, forcing repeated passes over the indexes, or (4) concurrent queries holding back the xmin horizon so dead tuples can't be removed. Diagnose: SELECT pid, query, state, query_start FROM pg_stat_activity WHERE state = 'active' AND query ILIKE '%vacuum%'. Check progress (PG 9.6+): SELECT * FROM pg_stat_progress_vacuum. Tuning: increase maintenance_work_mem—it holds the dead-tuple TIDs; when it fills, vacuum must scan every index, discard the batch, and resume, over and over. ALTER SYSTEM SET maintenance_work_mem = '1GB'; SELECT pg_reload_conf(). (Before PG 17, vacuum's dead-TID storage is capped at 1GB per worker regardless.) For a blocked xmin, find long-running transactions: SELECT pid, usename, age(backend_xmin) AS xmin_age FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY xmin_age DESC; kill if idle: SELECT pg_terminate_backend(pid). Alternative: partition the table by date or hash and vacuum each partition independently, allowing parallelization; on PG 13+, VACUUM (PARALLEL n) also vacuums indexes in parallel. Use pg_stat_statements to find queries doing heavy sequential I/O and add indexes to reduce table scans. Staff note: slow vacuum on massive tables (>1TB) is a common production pattern; partition, or lean on parallel index vacuuming.
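Why maintenance_work_mem bounds the number of index passes: before PG 17, each dead-tuple TID occupies 6 bytes in a flat array. The figures below are arithmetic under that assumption, not measurements:

```sql
-- Dead tuples that fit in one index-vacuum pass at 1GB,
-- and passes needed for 500M dead tuples at that setting.
SELECT (1024::bigint * 1024 * 1024) / 6 AS tids_per_gb,
       ceil(500000000 / ((1024::bigint * 1024 * 1024) / 6.0))
           AS index_passes_at_1gb;
```

Roughly 178M TIDs fit per GB, so a 500M-dead-tuple vacuum at 1GB needs three full sweeps of every index—which is why raising the memory (or reducing dead tuples per run via more frequent vacuums) speeds things up disproportionately.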

Follow-up: Explain VACUUM FULL vs. aggressive (anti-wraparound) autovacuum vs. VACUUM FREEZE. When is each appropriate?

Your monitoring shows that the oldest entry in pg_replication_slots is holding back xmin, preventing vacuum from removing dead tuples. Explain the trade-off, then show how to recover without losing replication.

Replication slots force WAL retention (and, for logical slots or standbys using hot_standby_feedback, hold back xmin) because the consumer may still need the retained data. If a slot is inactive (standby disconnected), WAL accumulates until the disk fills. Check: SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots. Recovery without losing replication: (1) If the standby is permanent, ensure it's running and caught up: SELECT * FROM pg_stat_replication. (2) Restart the standby if it disconnected: systemctl restart postgresql@12-main on the standby. (3) If the slot is truly abandoned, drop it: SELECT pg_drop_replication_slot('slot_name'). (4) Cap slot retention so one dead slot can't fill the disk (PG 13+): ALTER SYSTEM SET max_slot_wal_keep_size = '100GB'; SELECT pg_reload_conf()—slots that fall behind this limit are invalidated instead of retaining WAL indefinitely. (5) Monitor WAL disk: du -sh pg_wal/; alert if it exceeds your budget (e.g., 100GB). After recovery: VACUUM ANALYZE the database and recheck bloat.
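A sketch of watching each slot's WAL debt before it becomes an emergency; the 100GB cap is an illustrative budget and wal_status requires PG 13+:

```sql
-- Cap how much WAL any one slot may retain; a lagging slot is
-- marked 'lost' instead of filling pg_wal.
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
SELECT pg_reload_conf();

-- Per-slot retained WAL and safety status.
SELECT slot_name, active, wal_status,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

Alerting on retained_wal growth catches a disconnected standby hours before the disk-full page does.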

Follow-up: If a slot is months old and inactive, and xmin is frozen, can vacuum reclaim space? What's the recovery path if you can't drop the slot?

You DELETEd 2M rows from a 10M-row table. Table size didn't shrink post-vacuum. Bloat is now 20%. Autovacuum ran but didn't reclaim. Diagnose and fix.

Deletes don't return disk to the OS; they mark tuples dead. Plain VACUUM makes the dead space reusable for future INSERTs/UPDATEs but doesn't shrink the file; only VACUUM FULL (or a rewrite tool) does. Check whether vacuum actually ran: SELECT schemaname, relname, last_vacuum, last_autovacuum, vacuum_count FROM pg_stat_user_tables WHERE relname = 'big_table'. If last_autovacuum is recent, vacuum ran but the file stays large because: (1) concurrent writes are re-filling the freed space, (2) plain VACUUM only truncates empty pages at the end of the file—it never compacts interior free space (that's VACUUM FULL's job), (3) the indexes are bloated too (VACUUM FULL rebuilds them; plain VACUUM doesn't). Fix: (1) VACUUM FULL big_table (ACCESS EXCLUSIVE lock—plan downtime). (2) Rebuild indexes online (PG 12+): REINDEX TABLE CONCURRENTLY big_table. (3) For online compaction without downtime, use pg_repack: pg_repack -d dbname -t big_table. (4) Prevention: increase autovacuum frequency for this table, size maintenance_work_mem generously, and monitor DELETE patterns.
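A sketch of verifying what each step actually reclaims, measuring heap and indexes separately (table name illustrative):

```sql
-- Heap vs. index size before cleanup: plain VACUUM shrinks
-- neither file; VACUUM FULL / pg_repack shrink both.
SELECT pg_size_pretty(pg_table_size('big_table'))   AS heap_size,
       pg_size_pretty(pg_indexes_size('big_table')) AS index_size;

VACUUM FULL big_table;   -- ACCESS EXCLUSIVE lock: plan downtime

-- Re-run the size query; both numbers should drop if the
-- deleted rows' space was genuinely returned to the OS.
SELECT pg_size_pretty(pg_table_size('big_table'))   AS heap_size,
       pg_size_pretty(pg_indexes_size('big_table')) AS index_size;
```

Tracking the two numbers separately matters: after heavy churn it is common for index bloat to exceed heap bloat, in which case REINDEX CONCURRENTLY alone may recover most of the space without any table lock.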

Follow-up: Why doesn't plain VACUUM return disk to the OS? How do you measure index bloat separately from heap bloat?

Your analytics cluster vacuums during business hours and blocks queries. Redesign: propose a vacuum strategy that doesn't block reads or writes.

Start by correcting the premise: plain VACUUM takes only a SHARE UPDATE EXCLUSIVE lock—it blocks neither reads nor writes, only DDL and other vacuums. What blocks queries is VACUUM FULL (ACCESS EXCLUSIVE) or vacuum I/O starving the workload. Strategies: (1) Throttle vacuum with cost-based delay so its I/O coexists with queries: ALTER TABLE big_table SET (autovacuum_vacuum_cost_delay = 2, autovacuum_vacuum_cost_limit = 1000)—tune until vacuum's share of I/O is tolerable. (2) Schedule manual VACUUM (ANALYZE) off-hours via cron (e.g., a 2 AM psql job), leaving only light autovacuum activity during the day. (3) Raise per-table thresholds so autovacuum fires less often during peaks (autovacuum_naptime is global-only, not per-table): ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.05). (4) Partition the table and vacuum one partition at a time; PG 13+ can also vacuum a table's indexes in parallel with VACUUM (PARALLEL n). (5) For bloat removal without long locks, pg_repack rebuilds online with only brief locks at the start and end. (6) Never run VACUUM FULL on the live analytics cluster during business hours. Best practice: throttled autovacuum running continuously + scheduled manual vacuum in the maintenance window + pg_repack when space must actually be returned to the OS.
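A sketch of the cost-based throttling from point (1); the values are starting points to tune against your I/O headroom, and the table name is illustrative:

```sql
-- Per-table throttle: after spending ~1000 cost units of page
-- reads/dirties, the autovacuum worker sleeps 2ms, spreading
-- its I/O out instead of bursting it against query traffic.
ALTER TABLE analytics_events SET (
    autovacuum_vacuum_cost_delay = 2,    -- milliseconds of sleep
    autovacuum_vacuum_cost_limit = 1000  -- cost budget per sleep
);

-- Confirm the per-table overrides took effect.
SELECT reloptions FROM pg_class WHERE relname = 'analytics_events';
```

Lower delay or higher limit makes vacuum finish sooner at the cost of more I/O pressure; the budget is also shared across concurrent autovacuum workers, so adding workers doesn't multiply total vacuum I/O.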

Follow-up: Compare VACUUM vs. VACUUM FULL vs. pg_repack in terms of lock mode, speed, and disk usage. Which would you use for each scenario?

Autovacuum is disabled on a critical table for performance reasons. Manual vacuum happens weekly. One day, monitoring flags runaway XID age (wraparound risk). Walk me through the root cause and the emergency recovery without data loss.

Disabled autovacuum + infrequent manual vacuum lets the table's XID age grow unchecked, and long-running transactions (backups, analytics queries) hold back the xmin horizon so even manual vacuums can't freeze. PostgreSQL forces anti-wraparound vacuums once age(relfrozenxid) passes autovacuum_freeze_max_age (default 200M), emits warnings as the limit nears, and shuts down rather than permit wraparound at ~2 billion. Check: SELECT age(datfrozenxid) FROM pg_database WHERE datname = current_database(). Root cause: look for idle-in-transaction sessions: SELECT pid, usename, state, query_start, backend_xmin FROM pg_stat_activity WHERE state = 'idle in transaction'. Emergency recovery: (1) Kill the idle-in-transaction sessions: SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle in transaction'. (2) Immediately run VACUUM (FREEZE, ANALYZE) big_table. (3) Force datfrozenxid forward by freezing all tables in the DB: vacuumdb --freeze --all. (4) Re-enable autovacuum on the table: ALTER TABLE big_table SET (autovacuum_enabled = true). (5) Auto-kill idle transactions going forward: ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min'; SELECT pg_reload_conf(). Prevention: autovacuum should never be disabled; if it must be, run manual vacuum every 12-24h plus XID-age monitoring.
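A sketch of the age checks worth alerting on, surfacing which tables an emergency VACUUM FREEZE must visit first:

```sql
-- Database-wide XID age vs. the anti-wraparound trigger
-- (autovacuum_freeze_max_age, default 200M).
SELECT datname,
       age(datfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int
           AS freeze_max_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;

-- Worst tables first: datfrozenxid can only advance once
-- every one of these has been frozen.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```

The second query matters because a single neglected table pins the whole database's datfrozenxid—freezing it alone often resolves the alert.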

Follow-up: If XID age keeps climbing toward 2 billion despite VACUUM FREEZE, what's the last-resort recovery? How do you tell how close to wraparound you actually are?

You're tuning autovacuum for a write-heavy table: 100k inserts/sec, constant small UPDATEs. Current autovacuum_vacuum_scale_factor = 0.2 (the default). Design tuning parameters and explain the trade-offs between vacuum frequency and I/O overhead.

A high write rate generates dead tuples quickly; the default autovacuum_vacuum_scale_factor (0.2) triggers vacuum only after ~20% of the table is dead—far too infrequent here. Per-table tuning: ALTER TABLE write_heavy_table SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_insert_scale_factor = 0.005); this fires vacuum at ~1% dead tuples and ~0.5% inserts. Global knobs (these are not per-table options): ALTER SYSTEM SET autovacuum_naptime = '10s'; SELECT pg_reload_conf(); raising autovacuum_max_workers requires a server restart. Trade-offs: (1) more frequent vacuum = higher I/O and CPU overhead, but prevents bloat and keeps the visibility map fresh; (2) a larger maintenance_work_mem speeds each vacuum (fewer index passes) but uses more RAM per worker; (3) more workers parallelize across tables but share a common cost budget and contend for I/O. Monitoring: track vacuum latency and frequency; set log_autovacuum_min_duration = 0 to log all runs. SELECT schemaname, relname, last_autovacuum, autovacuum_count, analyze_count FROM pg_stat_user_tables WHERE relname LIKE '%write_heavy%'. Target: vacuum runs every 1-5 minutes and completes in seconds; if a run takes >30s, lower the scale factor further or raise maintenance_work_mem. Staff pattern: measure vacuum's share of total I/O; if it stays under ~5%, aggressive tuning is safe.
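Autovacuum fires when n_dead_tup exceeds autovacuum_vacuum_threshold + scale_factor × reltuples (the threshold defaults to 50). Arithmetic sketch for the settings above, using an illustrative 100M-row table:

```sql
-- Trigger point at scale_factor = 0.01 vs. the 0.2 default:
-- vacuum starts at ~1M dead tuples instead of ~20M.
SELECT 50 + 0.01 * 100000000 AS tuned_trigger,    -- 1000050
       50 + 0.20 * 100000000 AS default_trigger;  -- 20000050
```

At 100k writes/sec the tuned threshold is reached in roughly 10 seconds of pure churn, which is why the naptime and worker settings must keep up—otherwise the table simply queues behind other vacuum work.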

Follow-up: What does adding ANALYZE to VACUUM buy you, and how does autovacuum_analyze_scale_factor interact with autovacuum_vacuum_scale_factor?
