PostgreSQL Interview Questions

Connection Pooling and PgBouncer


App opens 500 connections but PostgreSQL's max_connections is 100. Connections sit idle, eating memory. Design a pooling strategy: PgBouncer vs. application pooling vs. database-side pooling. Which and why?

Too many connections cause memory bloat (each backend process costs several MB) and context-switch overhead. Options:

(1) Application pooling (HikariCP for Java, SQLAlchemy's pool for Python, pgx's pool for Go): the app maintains a connection pool and reuses connections. Downside: pools are per-app; if 5 apps each need 100 connections, the total doesn't shrink.
(2) PgBouncer: a lightweight multiplexing layer between app and DB. Apps connect to PgBouncer (default port 6432); PgBouncer keeps a small set of DB connections (e.g. 100) and multiplexes clients onto them. A single point of failure, but highly configurable.
(3) Database-side pooling: built-in pooling, which PostgreSQL doesn't have (Oracle's DRCP is the closest analogue).

Recommendation: for most production setups, PgBouncer in front of the DB plus modest application-side pooling is the hybrid best practice.

Setup:
(1) Install PgBouncer: apt-get install pgbouncer
(2) Configure /etc/pgbouncer/pgbouncer.ini:
    [databases]
    mydb = host=localhost port=5432 dbname=mydb
    [pgbouncer]
    pool_mode = transaction
    max_client_conn = 500
    default_pool_size = 20
    reserve_pool_size = 5
    reserve_pool_timeout = 3
(3) Start: systemctl start pgbouncer
(4) Test: psql -h 127.0.0.1 -p 6432 -U user -d mydb

Tuning: pool_mode = transaction (server connection released at commit/rollback) vs. session (held for the client's entire session) vs. statement (released after every statement; most aggressive). Transaction mode fits most apps. Monitor client connections via the admin console: psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "SHOW CLIENTS".
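As a reference point for option (1), a minimal sketch of what an application-side pool does; a real app would use HikariCP, SQLAlchemy's pool, etc., and connect_fn here is a stand-in for a real driver call:

```python
import queue

class MiniPool:
    """Toy connection pool: pre-opens N connections and hands them out."""
    def __init__(self, connect_fn, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(connect_fn())   # open all connections up front

    def acquire(self, timeout=None):
        # Blocks (bounded by timeout) instead of opening a new connection:
        # this is the backpressure that caps total connections at `size`.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)               # return for reuse, not close

# Usage with a fake "driver" that counts how many connections were opened:
counter = {"opened": 0}
def fake_connect():
    counter["opened"] += 1
    return object()

pool = MiniPool(fake_connect, size=3)
conns = [pool.acquire() for _ in range(3)]
for c in conns:
    pool.release(c)
conn = pool.acquire()
# However many acquire/release cycles run, only 3 connections ever open.
```

The key design point: acquire() blocks rather than opening a new connection, which is exactly the behavior that caps the total at pool size instead of 500.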

Follow-up: Explain pool_mode values: session, transaction, statement. Trade-offs in terms of latency and correctness?

PgBouncer in transaction mode. App suddenly sees "prepared statement already exists" errors on failover. Why and how to handle?

Transaction mode assigns a server connection to a client only for the duration of one transaction; prepared statements are server-side objects tied to a specific connection. If the app prepares a statement on server conn1, commits, and its next transaction is routed to conn2, the statement doesn't exist there ("prepared statement does not exist"); conversely, re-preparing the same name on a connection where a previous client already created it yields "prepared statement ... already exists". Solutions:
(1) Deallocate before the transaction ends: DEALLOCATE stmt_name (or DEALLOCATE ALL) before commit, so no server connection carries stale state into the next handoff.
(2) Switch to session mode (pool_mode = session): each client sticks to one server connection. Downside: far less connection reuse.
(3) Use unnamed prepared statements: most drivers can use the extended query protocol's unnamed statement, which is overwritten by the next parse and carries no cross-transaction state.
(4) Disable server-side prepared statements in the driver (e.g. prepareThreshold=0 for the JDBC driver, prepare_threshold=None in psycopg 3; classic psycopg2 never uses server-side prepared statements).
(5) On recent PgBouncer (1.21+), enable protocol-level prepared statement tracking via max_prepared_statements, so PgBouncer itself re-prepares statements on whichever server connection it assigns.
Best practice: in transaction mode, either disable named server-side prepared statements in the driver or rely on PgBouncer's max_prepared_statements support; always DEALLOCATE before the transaction ends if you prepare manually. For apps that genuinely need prepared state across transactions, use session mode.
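One way to make the DEALLOCATE-before-commit advice systematic is to generate the whole statement sequence for a transaction in one place. A sketch; the statement name and SQL are illustrative, and a real driver would execute each string in order:

```python
def prepared_txn_sql(name, stmt_sql, execute_sql):
    """Return the statement sequence for one transaction-mode-safe
    prepared-statement round trip: prepare, execute, deallocate,
    all before COMMIT, so no server connection keeps stale state."""
    return [
        "BEGIN",
        f"PREPARE {name} AS {stmt_sql}",
        execute_sql,
        f"DEALLOCATE {name}",  # cleanup inside the txn: next txn may get another conn
        "COMMIT",
    ]

stmts = prepared_txn_sql(
    "get_user",
    "SELECT * FROM users WHERE id = $1",
    "EXECUTE get_user(42)",
)
```

Because the DEALLOCATE runs before COMMIT, it executes on the same server connection as the PREPARE, which is the invariant transaction mode requires.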

Follow-up: Explain prepared statement lifecycle in transaction vs. session pool mode. How does pgbouncer cache statements?

PgBouncer pool is saturated; all 20 connections in use. New client connects but waits indefinitely in PgBouncer queue. Design timeout and backpressure strategy.

A saturated pool causes queuing: clients wait for a server connection to free up, and an unbounded wait leads to client-side timeouts and cascading failures. Strategy:
(1) client_idle_timeout = 300: disconnect clients idle longer than 300s.
(2) query_timeout = 60: kill queries running longer than 60s (blunt; the server-side statement_timeout is usually preferable).
(3) query_wait_timeout = 10: error out clients that have waited more than 10s in the queue instead of letting them wait forever.
(4) Monitor queue depth with SHOW POOLS in the admin console (psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer); the cl_waiting column is the queue. If it keeps growing, scale: raise default_pool_size (e.g. to 30), or raise the server limit (ALTER SYSTEM SET max_connections = 200; requires a restart).
(5) Backpressure in the app: a circuit breaker (Hystrix/resilience4j-style) to fail fast when connection acquisition times out.
(6) Monitoring: track pool utilization; alert above ~80%.
Server-side diagnostics: SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Client' AND wait_event = 'ClientRead' shows backends idling while waiting on their clients.
Best practice: size the pool so peak load uses 70-80% of it, leaving headroom. A useful starting point is Little's law: needed connections ≈ transaction arrival rate × average transaction duration, plus a safety margin.
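The sizing rule at the end can be made concrete with Little's law; the 0.75 target utilization below is an assumed safety margin, not a PgBouncer default:

```python
import math

def pool_size(txn_per_sec, avg_txn_ms, target_util=0.75):
    """Little's law: concurrent transactions = arrival rate x duration.
    Dividing by target utilization leaves headroom for bursts."""
    in_flight = txn_per_sec * avg_txn_ms / 1000.0
    return math.ceil(in_flight / target_util)

# 1000 txn/s at 15 ms per txn -> 15 in flight -> pool of 20 at 75% utilization
size = pool_size(1000, 15)
```

If the computed size exceeds what the server can afford, that is the signal to shorten transactions (the duration term) rather than keep growing the pool.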

Follow-up: Explain client_idle_timeout vs. query_timeout vs. query_wait_timeout. When would each fire?

PgBouncer in statement mode. Transaction context lost (autocommit, work_mem). App fails. Diagnose and fix without losing multiplexing benefits.

Statement mode releases the server connection after every statement; each statement is its own transaction, and session variables (work_mem, timezone, search_path) set on one server connection won't be there for the next statement. Example: SET work_mem = '256MB' succeeds, but the following SELECT * FROM big_table may land on a different server connection where work_mem is back to the default. Root cause: statement mode is stateless by design; it suits single-statement workloads, not multi-statement sessions. Fixes:
(1) Switch to transaction mode (pool_mode = transaction), which most production deployments use; inside a transaction, SET LOCAL scopes a setting to exactly that transaction.
(2) If statement mode is truly required (extreme multiplexing): send the SET and the query together as one multi-statement simple query ("SET work_mem='256MB'; SELECT ..."), which travels as a unit, or avoid session-dependent settings entirely.
(3) Reset state between clients: server_reset_query = DISCARD ALL, with server_reset_query_always = 1 so the reset also runs outside session mode (slower, but safe).
Best practice: transaction mode for standard OLTP; reserve statement mode for high-throughput workloads with no session state (e.g. simple single-statement reads).
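If statement mode must stay, the reset-query fix (3) looks roughly like this in pgbouncer.ini; DISCARD ALL is PgBouncer's default reset query, and server_reset_query_always extends it beyond session mode:

```ini
[pgbouncer]
pool_mode = statement
; DISCARD ALL clears session vars, prepared statements, and temp tables
server_reset_query = DISCARD ALL
; By default the reset query runs only in session mode; force it here too.
; Safe, but adds a round trip on every connection handoff.
server_reset_query_always = 1
```

The trade-off is explicit: the extra reset round trip buys correctness, eating into exactly the throughput statement mode was chosen for.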

Follow-up: Explain transaction mode vs. statement mode overhead. Why would you sacrifice session state for statement mode?

PgBouncer behind a load balancer. Multiple PgBouncer instances pooling to same DB. Connection storms during deployments (rapid reconnects). DB connection limit exceeded. Design resilience.

Multiple PgBouncers connecting to the same DB during an app deployment cause a thundering herd: all instances reconnect simultaneously, exhausting the server's connections. Resilience design:
(1) Budget connections per instance: with max_connections = 100 on the server, give each of 5 PgBouncers default_pool_size = 15 (5 × 15 = 75, leaving headroom); max_db_connections caps each instance's total toward the DB.
(2) Staggered restarts: restart one PgBouncer at a time and let connections stabilize before the next, e.g. (assuming templated units pgbouncer@1..5): for i in 1 2 3 4 5; do systemctl restart pgbouncer@$i; sleep 30; done
(3) Connection draining: in the admin console, PAUSE waits for in-flight transactions to finish and releases server connections; after maintenance, RESUME lets traffic flow again (psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "PAUSE", then later -c "RESUME").
(4) Client retry logic: retry on "connection refused" with exponential backoff and jitter, so clients don't all reconnect in lockstep.
(5) Monitor during the deploy: SELECT count(*) FROM pg_stat_activity; alert if the count exceeds ~80% of max_connections.
(6) Capacity changes: raising max_connections requires a server restart, so plan it before the deploy window, not during it.
Rule of thumb: leave room for superusers and system processes (superuser_reserved_connections, default 3); never plan to saturate max_connections.
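For point (4), a sketch of retry with exponential backoff and full jitter; the connect callable and limits are illustrative, and the jitter is what spreads a fleet's reconnects out after a restart:

```python
import random
import time

def connect_with_backoff(connect, attempts=5, base=0.1, cap=5.0):
    """Retry `connect` on ConnectionError, sleeping up to base*2^n seconds
    (full jitter), so many clients don't reconnect at the same instant."""
    for n in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if n == attempts - 1:
                raise                      # out of attempts: surface the error
            time.sleep(random.uniform(0, min(cap, base * 2 ** n)))

# Usage with a stand-in connection that fails twice, then succeeds:
calls = {"n": 0}
def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection refused")
    return "conn"

result = connect_with_backoff(flaky_connect)
```

Full jitter (uniform over [0, backoff]) is deliberate: deterministic backoff would keep the herd synchronized, just on a delay.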

Follow-up: Explain PAUSE/RESUME in pgbouncer. What happens to queued clients during PAUSE?

PgBouncer auth_type=md5 but app requires password (no peer auth in container). Client sends password, PgBouncer rejects with "wrong password". Root cause and fix.

With auth_type = md5, PgBouncer authenticates the client against its own auth_file (conventionally /etc/pgbouncer/userlist.txt) and then logs in to PostgreSQL, so the credentials must line up end to end. Root causes:
(1) User doesn't exist in the DB: CREATE USER app_user WITH PASSWORD 'password'.
(2) User missing from, or stale in, userlist.txt: entries look like "app_user" "md5<hash>", where the hash is md5(password || username) prefixed with the literal "md5".
(3) Password mismatch between userlist.txt and the DB, or between a user=/password= pinned in the [databases] line and the DB.
Fix:
(1) Verify the user can log in: SELECT rolname, rolcanlogin FROM pg_roles WHERE rolname = 'app_user'.
(2) Regenerate the userlist entry from the DB: SELECT usename, passwd FROM pg_shadow WHERE usename = 'app_user' (for md5-stored passwords, passwd is already the "md5..." string) and copy it into userlist.txt.
(3) Check auth_type = md5 in pgbouncer.ini and that pg_hba.conf on the server allows md5 from PgBouncer's host: host all all 0.0.0.0/0 md5.
(4) In container environments without peer auth, auth_type = plain also works but sends the password in clear text; use it only over TLS or a local socket.
(5) Prefer auth_query over a static userlist: set auth_user plus auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename = $1, so PgBouncer looks passwords up in the DB instead.
Alternative: SCRAM. PostgreSQL 10+ and PgBouncer 1.14+ support auth_type = scram-sha-256, which is stronger hashing than md5.
Best practice: never put clear-text passwords in pgbouncer.ini; keep userlist.txt at mode 0600 or use auth_query / a secrets manager. Note that PgBouncer does not read ~/.pgpass; that file is for libpq clients like psql.
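PgBouncer's auth_file (userlist.txt) stores md5 entries of the form "user" "md5<md5(password || username)>", matching what PostgreSQL stores in pg_shadow for md5 passwords. A small generator; the username and password below are placeholders:

```python
import hashlib

def pg_md5_entry(username, password):
    """Build a userlist.txt line: "user" "md5<hex digest>".
    The digest is md5 of password concatenated with username,
    the same salted form PostgreSQL uses for md5 auth."""
    digest = hashlib.md5((password + username).encode()).hexdigest()
    return f'"{username}" "md5{digest}"'

line = pg_md5_entry("app_user", "s3cret")
```

Because the username is folded into the hash, two users with the same password get different userlist entries.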

Follow-up: Explain auth_type=plain vs. md5 vs. scram-sha-256. Security trade-offs?

PgBouncer performs well but queries are slightly slower than direct DB connection. App latency up 5ms. Is pooling the culprit? How to measure overhead?

PgBouncer adds a network hop plus queuing and connection-assignment work, so a small per-query overhead is expected; a few milliseconds is typical and measurable. Diagnose:
(1) Benchmark direct vs. pooled:
time for i in {1..1000}; do psql -h localhost -U user -d mydb -c "SELECT 1" > /dev/null; done   (direct)
time for i in {1..1000}; do psql -h 127.0.0.1 -p 6432 -U user -d mydb -c "SELECT 1" > /dev/null; done   (pooled)
The difference is the pooling overhead.
(2) Check PgBouncer stats: psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "SHOW STATS"; look at avg_query_time and avg_wait_time (microseconds). avg_wait_time above ~1 ms means the queue is backing up.
(3) Measure saturation: SHOW POOLS; if sv_active sits near default_pool_size, enlarge the pool.
Optimization:
(1) Use transaction mode rather than statement mode to cut per-query bookkeeping.
(2) Co-locate: run PgBouncer on the same host as the app and connect over a Unix socket (unix_socket_dir = /var/run/postgresql, then psql -h /var/run/postgresql -p 6432 ...), removing a TCP hop.
(3) Keep connections warm: min_pool_size > 0 avoids paying connection-establishment cost on the first queries after an idle period.
For most apps a few milliseconds is an acceptable trade-off for multiplexing 500 client connections onto ~100 server connections. If it isn't, re-evaluate the architecture; application-level pooling over a local socket may be leaner.
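Step (1)'s before/after timing is best summarized by comparing medians rather than means, since one slow outlier (a checkpoint, a GC pause) otherwise dominates the average. A helper, assuming you've collected two per-query latency samples in milliseconds:

```python
import statistics

def pooling_overhead_ms(direct_ms, pooled_ms):
    """Median pooled latency minus median direct latency.
    Medians resist outliers, so one slow query doesn't skew the estimate."""
    return statistics.median(pooled_ms) - statistics.median(direct_ms)

direct = [1.1, 1.0, 1.2, 9.0, 1.1]   # one outlier query
pooled = [1.6, 1.5, 1.7, 1.6, 12.0]  # one outlier query
overhead = pooling_overhead_ms(direct, pooled)  # ~0.5 ms despite the outliers
```

A mean-based comparison on the same samples would be swung by the two outliers; the median comparison isolates the steady-state hop cost.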

Follow-up: How do Unix-domain sockets reduce latency compared to TCP loopback? When would you co-locate PgBouncer with the app versus with the database?
