CPU at 90% with no visibility into which queries are responsible. Set up pg_stat_statements to identify expensive queries: create the extension, monitor it, and rank top queries by CPU/time/calls.
pg_stat_statements tracks query execution stats. Setup: (1) Enable the module: ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements'; then restart PostgreSQL (pg_reload_conf() is not enough; shared_preload_libraries only takes effect at server start). (2) Install the extension: CREATE EXTENSION pg_stat_statements. (3) Verify: SELECT * FROM pg_stat_statements LIMIT 1. (4) Top queries by total time (a CPU proxy; total_exec_time is elapsed time and includes I/O and lock waits): SELECT query, calls, mean_exec_time, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10. Shows queries consuming the most time overall. (Column names are PG13+; older versions use total_time/mean_time.) (5) Top by mean_exec_time (slowest individual calls): SELECT query, calls, mean_exec_time, max_exec_time FROM pg_stat_statements WHERE calls > 100 ORDER BY mean_exec_time DESC LIMIT 10 (the calls filter excludes rarely-run outliers). (6) Top by calls (most frequent): SELECT query, calls, mean_exec_time FROM pg_stat_statements ORDER BY calls DESC LIMIT 10. Even fast queries are expensive in aggregate when called often. (7) Top by rows returned (large scans): SELECT query, rows, total_exec_time FROM pg_stat_statements WHERE rows > 10000 ORDER BY rows DESC LIMIT 10; for actual disk I/O, order by shared_blks_read instead. Understanding output: (1) query: Normalized query text (constants replaced with $1, $2 for grouping). (2) calls: Number of executions. (3) total_exec_time: Sum of all execution times (ms). (4) mean_exec_time: Average time per call (ms). (5) max_exec_time: Longest single execution (ms). (6) rows: Total rows returned. (7) blk_read_time, blk_write_time: I/O time (only populated if track_io_timing = on). Tuning: (1) ALTER SYSTEM SET track_io_timing = on; SELECT pg_reload_conf(). Enables I/O timing stats (slight overhead). (2) ALTER SYSTEM SET pg_stat_statements.max = 10000 (max distinct statements tracked; default 5000; changing it requires a restart). (3) Retention: with pg_stat_statements.save = on (the default) stats survive clean restarts, but they are lost on crash or reset; persist history by exporting periodically: COPY (SELECT * FROM pg_stat_statements) TO '/tmp/pg_stat_dump.csv' WITH CSV HEADER. Recommended actions: (1) Find the slowest query, EXPLAIN it, optimize (add index, rewrite). (2) Find the most-called query, optimize (even 1ms saved × 10k calls = 10s total saved). 
(3) Monitor the ratio total_exec_time / calls (i.e. mean_exec_time) against a reasonable, SLA-dependent threshold. Staff practice: Check pg_stat_statements daily in production; alert if the top query exceeds its threshold.
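The three rankings above can also be computed offline from an exported snapshot. A minimal sketch, assuming rows exported from pg_stat_statements (queries and numbers are illustrative sample data, not real stats):

```python
# Offline ranking of exported pg_stat_statements rows (sample data, hypothetical).
# Mirrors the SQL above: rank by total_exec_time, by mean_exec_time (calls > 100),
# and by calls.

rows = [
    # (query, calls, total_exec_time_ms)
    ("SELECT * FROM orders WHERE id = $1", 50000, 75000.0),
    ("SELECT * FROM reports WHERE day = $1", 12, 48000.0),
    ("UPDATE carts SET qty = $1 WHERE id = $2", 60000, 21000.0),
]

def mean_exec_time(row):
    query, calls, total = row
    return total / calls  # same ratio pg_stat_statements exposes as mean_exec_time

by_total = sorted(rows, key=lambda r: r[2], reverse=True)
# Filter calls > 100 first, as in the SQL, so rare one-off queries don't dominate.
by_mean = sorted((r for r in rows if r[1] > 100), key=mean_exec_time, reverse=True)
by_calls = sorted(rows, key=lambda r: r[1], reverse=True)

print(by_total[0][0])   # biggest aggregate time consumer
print(by_mean[0][0])    # slowest per call among frequently-run queries
print(by_calls[0][0])   # most frequently executed
```

Note how the three rankings disagree: the most-called query is not the biggest total consumer, which is exactly why all three views are worth checking.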
Follow-up: Explain queryid and normalized queries. Why use queryid instead of full query text?
pg_stat_statements reset on restart (data lost). Design monitoring pipeline: export stats periodically, store in time-series DB (InfluxDB/Prometheus), alert on anomalies.
Stats survive clean restarts when pg_stat_statements.save = on (the default), but are lost on crash, on reset, and when entries are evicted, so external persistence is needed for history. Pipeline: (1) Export script (cron job, every 10 min): */10 * * * * psql -c "COPY (SELECT now(), * FROM pg_stat_statements) TO STDOUT WITH CSV" >> /var/spool/pg_stats/export.csv, then ship the CSV into the time-series store (influx write, or a custom exporter). (2) InfluxDB (time-series): stores CPU/calls/latency over time. (3) Query history (TimescaleDB-style syntax; pg_stats_history is the history table the exports feed): SELECT time_bucket('1h', time), query, AVG(mean_exec_time) FROM pg_stats_history GROUP BY 1, 2 ORDER BY 1 DESC (hourly averages). (4) Anomaly detection: alert if query latency spikes 2x over its historical baseline: SELECT s.query, s.mean_exec_time FROM pg_stat_statements s WHERE s.mean_exec_time > 2 * (SELECT AVG(h.mean_exec_time) FROM pg_stats_history h WHERE h.query = s.query). (5) Alert rules (Prometheus/Grafana; metric name illustrative): pg_stat_statements_mean_exec_time{query="SELECT * FROM orders"} > 100 for 5m (alert if query >100ms for 5 min). (6) Dashboard: visualize the top 10 queries over time (Grafana). Setup: (1) Install pg_stat_statements on all servers. (2) Centralized export: a cron job exports from each server to the central InfluxDB. (3) Scrape metrics: Prometheus scrapes an exporter (e.g. postgres_exporter). (4) Alert: Grafana rules fire alerts. Alternative (simpler): use managed monitoring (Datadog, New Relic), which auto-collects pg_stat_statements. Data retention: (1) Keep raw stats for 7 days (high resolution). (2) Aggregate to hourly for 30 days. (3) Archive to S3 for compliance. Analysis: (1) Compare week-over-week: SELECT query, time, mean_exec_time FROM pg_stats_history WHERE EXTRACT(dow FROM time) = 5 ORDER BY time DESC (Fridays; comparing the same day-of-week reduces noise). (2) Regression detection: if the top 5 queries changed, it might indicate schema/data changes or an app bug; investigate and alert the dev team. Recommendation: for production, external monitoring is mandatory; on-DB stats are only for ad-hoc investigation.
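The "alert if latency spikes 2x over baseline" rule from step (4) can be sketched as a small script over exported history. A minimal sketch, assuming per-query latency samples already pulled from the history store (query texts and numbers are illustrative):

```python
# Sketch of a 2x-baseline anomaly check over exported pg_stat_statements history.
# history: past mean_exec_time samples per query; current: latest sample per query.
from statistics import mean

history = {
    "SELECT * FROM orders WHERE id = $1": [1.4, 1.6, 1.5, 1.5],   # ms
    "SELECT * FROM users WHERE email = $1": [3.0, 2.8, 3.1, 3.2],  # ms
}

current = {
    "SELECT * FROM orders WHERE id = $1": 1.6,
    "SELECT * FROM users WHERE email = $1": 9.5,  # regression
}

def anomalies(history, current, factor=2.0):
    """Flag queries whose current latency exceeds factor x historical average."""
    flagged = []
    for query, latency in current.items():
        # A query with no history is its own baseline (never flagged on first sight).
        baseline = mean(history.get(query, [latency]))
        if latency > factor * baseline:
            flagged.append((query, latency, baseline))
    return flagged

for query, latency, baseline in anomalies(history, current):
    print(f"ALERT {query!r}: {latency}ms vs baseline {baseline:.1f}ms")
```

In production this would run on the time-series store rather than in-process dicts, but the thresholding logic is the same.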
Follow-up: Design anomaly detection for query performance. How to distinguish normal variation from real regression?
pg_stat_statements memory full: "SELECT * FROM pg_stat_statements" returns thousands of queries (parameter variations). pg_stat_statements.max reached. Cleanup strategy.
pg_stat_statements hits max capacity when there are too many distinct normalized queries; once full, least-executed entries are evicted (PG14+ exposes the eviction count in pg_stat_statements_info.dealloc). Root causes: (1) Dynamically generated SQL whose structure varies per call. Note pg_stat_statements already normalizes plain constants, so SELECT * FROM orders WHERE id=123 and id=456 share one entry (WHERE id=$1); what explodes the count is structural variation, e.g. IN-lists of varying length (WHERE id IN (1,2) vs IN (1,2,3)), per-tenant table names, or string-concatenated SQL fragments. (2) Many users/databases: entries are tracked per (userid, dbid, queryid), multiplying the count. (3) pg_stat_statements.max too low. Solutions: (1) Increase max: ALTER SYSTEM SET pg_stat_statements.max = 20000; then restart PostgreSQL (the parameter only takes effect at server start; pg_reload_conf() is not enough). (2) Reset outdated entries: SELECT pg_stat_statements_reset() resets all (aggressive; loses history). Selective reset (PG12+): SELECT pg_stat_statements_reset(userid, dbid, queryid) resets a specific query. (3) Stabilize query shape app-side: use bind parameters and fixed-shape queries, e.g. WHERE id = ANY($1::int[]) instead of variable-length IN-lists. (4) Aggregate old stats before reset: COPY (SELECT * FROM pg_stat_statements) TO '/tmp/stats_backup.csv' WITH CSV HEADER; SELECT pg_stat_statements_reset(). (5) Query fingerprinting: use an external tool (pgBadger, or libpg_query-based fingerprinting) to normalize query text and aggregate offline. Monitoring: (1) Alert when near capacity: SELECT COUNT(*) FROM pg_stat_statements; -- alert if > 90% of pg_stat_statements.max. (2) Track growth: export the count periodically to identify the trend. Prevention: (1) Enforce bind parameters / prepared statements in the app (ORM configs). (2) Generate structurally stable SQL (no variable-length IN-lists, no interpolated identifiers). (3) Set pg_stat_statements.max generously and monitor capacity. (4) Schedule a periodic reset if needed (trading history for fresh stats). Recommendation: for production, aim for <5k distinct queries; if exceeded, it is likely an app-side SQL generation issue. Fix the app, not the symptom.
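The offline fingerprinting idea from step (5) can be sketched with a crude regex normalizer. This is a minimal sketch only; real fingerprinting should use an actual SQL parser (e.g. libpg_query), since regexes miss many cases:

```python
# Crude offline fingerprinter: collapse literals and variable-length IN-lists so
# structurally similar queries aggregate into one bucket. Regex-based sketch only;
# a real tool should parse the SQL instead.
import re

def normalize(sql: str) -> str:
    sql = re.sub(r"'[^']*'", "?", sql)           # string literals -> ?
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)   # numeric literals -> ?
    sql = re.sub(r"IN \((\?,?\s*)+\)", "IN (?)", sql)  # IN (?, ?, ?) -> IN (?)
    return sql

# Three structurally-varying queries that would be three pg_stat_statements entries:
queries = [
    "SELECT * FROM orders WHERE id IN (1, 2)",
    "SELECT * FROM orders WHERE id IN (1, 2, 3)",
    "SELECT * FROM orders WHERE id IN (4)",
]

fingerprints = {normalize(q) for q in queries}
print(len(queries), "raw variants ->", len(fingerprints), "fingerprint")
```

The same collapsing is what `WHERE id = ANY($1::int[])` achieves at the source, which is why fixing the app is the better long-term answer.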
Follow-up: Explain prepared statements and why they help pg_stat_statements. Security and performance benefits?
Query latency varies: the same query takes 5ms one day, 50ms the next. pg_stat_statements shows averages, which mask the variation. Design monitoring for outliers/variance, not just averages.
pg_stat_statements shows mean_exec_time and max_exec_time, but not the distribution (variance, percentiles), so latency variance and outliers stay hidden. Example: a query averages 10ms but its p99 is 100ms (users see slowness 1% of the time). Improve monitoring: (1) Track percentiles (p50, p95, p99): this needs raw or bucketed timings from an external collector (or an extension such as pg_stat_monitor, which records response-time histograms); from pg_stat_statements itself, SELECT query, max_exec_time FROM pg_stat_statements ORDER BY max_exec_time DESC only gives the ceiling, not the distribution. (2) Time-series aggregation: export timings to Prometheus as a histogram and compute histogram_quantile(0.99, rate(query_latency_seconds_bucket[5m])) (99th percentile over a 5min window; histogram_quantile requires bucketed series, not a mean gauge; metric name illustrative). (3) Bucket histogram over exported history: CREATE TABLE query_latency_buckets (query_id BIGINT, bucket_ms INT, count INT); INSERT INTO query_latency_buckets SELECT queryid, (FLOOR(mean_exec_time/10)*10)::int, COUNT(*) FROM pg_stat_statements_history GROUP BY 1, 2 (groups sampled latencies into 10ms buckets to see the distribution). (4) Outlier detection: SELECT query, max_exec_time FROM pg_stat_statements WHERE max_exec_time > 10 * (SELECT PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY mean_exec_time) FROM pg_stat_statements) ORDER BY max_exec_time DESC (queries whose max exceeds 10x the 95th percentile of means; likely outliers). Root causes of variance: (1) Cache misses: buffer pool eviction; first queries after restart are slow. (2) Locks: contention with other queries makes latency unpredictable. (3) I/O: busy disks; sequential scans slow down. (4) CPU: system load and context switches increase latency. (5) Plan instability: stale statistics or bloat flip the plan; sometimes fast, sometimes slow. Remediation: (1) Increase shared_buffers to keep the working set in memory. (2) Reduce lock contention (see locking topic). (3) Monitor iostat; alert on sustained high utilization. (4) Keep CPU allocation consistent (cgroups, NUMA pinning). (5) Maintain statistics and indexes (ANALYZE; REINDEX CONCURRENTLY when bloated). Recommended: for SLAs, track p99/p95 latency, not just the mean. Alert if p99 exceeds threshold.
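The "average masks outliers" point is easy to demonstrate numerically. A minimal sketch, using a nearest-rank percentile over illustrative raw latency samples (pg_stat_statements cannot produce this; it only keeps mean and max):

```python
# Why averages mask outliers: compute p50/p99 from raw latency samples.
# Values are illustrative; nearest-rank percentile, no interpolation.

def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) over a list of latencies."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 98 fast calls at 10ms, 2 slow calls at 1000ms.
latencies = [10.0] * 98 + [1000.0] * 2

avg = sum(latencies) / len(latencies)
print(f"mean={avg:.1f}ms p50={percentile(latencies, 50)}ms "
      f"p99={percentile(latencies, 99)}ms")
# The mean (29.8ms) looks healthy while p99 shows users waiting a full second.
```

This is why SLA alerting should key on p95/p99 series (e.g. via Prometheus histograms) rather than on mean_exec_time.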
Follow-up: Explain percentile-based latency monitoring. How to set SLAs based on p99?
pg_stat_statements overhead: Measuring stats adds 5% CPU overhead. Selective measurement: enable for slow queries only, disable for fast queries.
pg_stat_statements overhead is usually small (commonly cited at <2%), but at extreme scale (100k+ queries/sec) even a few percent is significant. Selective tracking: (1) Measure the overhead first: ALTER SYSTEM SET shared_preload_libraries = ''; restart (disables pg_stat_statements; this parameter only changes at server start). Run the workload 10 min, record CPU; re-enable, restart, repeat. If the CPU difference is >2%, investigate. (2) Sampling approach: not built into pg_stat_statements, but: (a) Server-side threshold logging: log_min_duration_statement = 10 logs only statements slower than 10ms; analyze the log with pgBadger. (b) Application-level sampling: the app times only a fraction of queries, e.g. if (rand() < 0.1) record_timing(query). (3) Alternative extension: pg_stat_monitor offers finer-grained controls (e.g. bucketed response-time histograms) at the cost of its own overhead. (4) Disable outside production if unneeded: ALTER SYSTEM SET shared_preload_libraries = ''; restart required; -- production keeps: ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements'. (5) Tune overhead: (a) Reduce pg_stat_statements.max (fewer entries = less shared memory). (b) Keep pg_stat_statements.track = top (skips nested statements), leave pg_stat_statements.track_planning off (PG13+; off by default), and set pg_stat_statements.track_utility = off if utility commands are noise. (c) Disable track_io_timing on busy servers: ALTER SYSTEM SET track_io_timing = off. Acceptable overhead: most teams accept <2% for the monitoring benefit. If >5%, re-evaluate: use a lighter alternative (pg_stat_kcache, pg_stat_monitor extensions) or an external profiler (perf, eBPF tools). Recommendation: enable pg_stat_statements by default in production; re-check its overhead periodically. If it becomes a problem, fall back to sampling or external tools.
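The application-level sampling from step (2b) can be sketched as a wrapper around query execution. A minimal sketch with assumed names (run_query, recorder, sample_rate are all hypothetical, and the actual query execution is stubbed out):

```python
# Application-level sampling sketch: time/record only ~10% of queries to cap
# measurement overhead. run_query/recorder/sample_rate are illustrative names.
import random

def run_query(sql, sample_rate=0.1, recorder=None, rng=random.random):
    """Execute sql (stubbed here) and record it for a sampled fraction of calls."""
    sampled = rng() < sample_rate
    # ... execute the query against the database here ...
    if sampled and recorder is not None:
        recorder(sql)  # e.g. push (sql, elapsed_ms) to a metrics pipeline
    return sampled

random.seed(42)  # seeded so the demo is reproducible
recorded = []
for _ in range(10000):
    run_query("SELECT 1", recorder=recorded.append)

print(f"sampled fraction: {len(recorded) / 10000:.3f}")  # close to 0.10
```

Sampling trades resolution for overhead: rare slow queries may be missed, which is why it pairs well with a server-side log_min_duration_statement safety net that always catches the worst cases.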
Follow-up: Explain alternatives to pg_stat_statements: pg_stat_kcache, pg_stat_monitor. Differences?