PostgreSQL Interview Questions

Logical Replication and CDC Patterns


Replicate 3 specific tables (orders, customers, products) to a data warehouse (Snowflake) in near-real-time. Physical replication copies the entire cluster (wasteful). Design a logical replication strategy: publisher/subscriber, table filters, initial snapshot.

Logical replication streams row-level changes for selected tables, unlike physical replication, which copies the entire cluster. Note: a native subscriber must itself be PostgreSQL; to feed Snowflake, replicate into an intermediate Postgres instance or use a CDC tool such as Debezium. Setup:
(1) Publisher (primary): enable logical decoding: ALTER SYSTEM SET wal_level = logical; ALTER SYSTEM SET max_replication_slots = 10; ALTER SYSTEM SET max_wal_senders = 10; then restart PostgreSQL (wal_level cannot be changed by pg_reload_conf() alone).
(2) Create the publication: CREATE PUBLICATION dw_pub FOR TABLE orders, customers, products;
(3) Subscriber: CREATE SUBSCRIPTION dw_sub CONNECTION 'dbname=mydb host=primary user=replicator password=pwd' PUBLICATION dw_pub WITH (copy_data = true, synchronous_commit = local); copy_data = true performs the initial snapshot.
(4) Monitor: SELECT * FROM pg_replication_slots; should show a logical slot for the subscription; SELECT * FROM pg_stat_replication; shows the subscriber connected.
(5) Verify data is flowing: INSERT INTO orders VALUES (...); on the publisher and confirm the row appears on the subscriber; compare row counts after the initial sync.
Alternative: use Debezium (Kafka Connect) for CDC: the publisher streams changes to Kafka and Snowflake consumes them. Better for complex pipelines.
Limitations: (1) Logical replication is slower than physical (every change must be decoded). (2) DDL and sequences are not replicated (TRUNCATE is replicated by default since PG11). (3) The initial sync (copy_data = true) can be slow for large tables. Optimization: pre-populate the subscriber manually, then create the subscription with copy_data = false.
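A minimal sketch of the setup above, assuming a separate PostgreSQL staging instance acts as the native subscriber (hostnames, credentials, and the dw_* names are placeholders):

```sql
-- Publisher: enable logical decoding (changing wal_level requires a restart)
ALTER SYSTEM SET wal_level = logical;
ALTER SYSTEM SET max_replication_slots = 10;
ALTER SYSTEM SET max_wal_senders = 10;
-- restart PostgreSQL here

-- Publisher: publish only the three tables
CREATE PUBLICATION dw_pub FOR TABLE orders, customers, products;

-- Subscriber (a separate PostgreSQL instance with matching table definitions)
CREATE SUBSCRIPTION dw_sub
    CONNECTION 'host=primary dbname=mydb user=replicator password=pwd'
    PUBLICATION dw_pub
    WITH (copy_data = true, synchronous_commit = 'local');

-- Verify on the publisher: the slot and walsender should appear
SELECT slot_name, plugin, active FROM pg_replication_slots;
SELECT application_name, state, replay_lsn FROM pg_stat_replication;
```

The subscriber then forwards to Snowflake by batch export, or Debezium replaces the subscriber entirely for a direct Kafka pipeline.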

Follow-up: Explain wal_level=logical. Why can't you use logical replication with wal_level=replica?

Logical replication lag growing: Publisher ahead, subscriber (Snowflake) behind. WAL accumulating. Root cause: Slow initial COPY during copy_data=true. Optimize without downtime.

The initial data copy (copy_data = true) can be slow for large tables, especially across networks (e.g., publisher in AWS, warehouse in another cloud or region). Root causes: (1) large table (>100 GB): copying across the network is I/O-bound; (2) an undersized target (small Snowflake warehouse); (3) network latency/bandwidth. Solutions:
(1) Disable copy_data and pre-populate the target manually: (a) export from the publisher: COPY orders TO '/tmp/orders.csv' WITH CSV; (or use AWS RDS export to S3); (b) load into Snowflake: COPY INTO orders FROM @~/orders.csv; (c) create the subscription with copy_data = false; (d) before exporting, create the replication slot to anchor an LSN, so changes made between the export and the subscription are retained: SELECT * FROM pg_create_logical_replication_slot('dw_slot', 'pgoutput');
(2) Incremental copy for large tables: partition the initial copy by PK ranges: CREATE TABLE orders_snapshot AS SELECT * FROM orders WHERE id >= 1 AND id < 1000000; COPY orders_snapshot TO '/tmp/orders_1.csv'; then load the batches to Snowflake in parallel.
(3) Speed up the network: within AWS, use VPC endpoints; cross-region, reduce batch sizes.
(4) Monitor progress: pg_replication_slots shows confirmed_flush_lsn; compare it with the current WAL position (SELECT pg_current_wal_lsn();) — the difference is the replication lag in bytes.
(5) Increase subscriber capacity: resize the Snowflake warehouse.
Prevention: for tables >50 GB, pre-load outside replication and use copy_data = false.
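The lag check in step (4) can be written with pg_wal_lsn_diff, which returns the byte distance between two LSNs (the slot name dw_slot is a placeholder):

```sql
-- Byte lag per logical slot: how far the publisher's WAL is ahead of
-- what the subscriber has confirmed as flushed
SELECT slot_name,
       confirmed_flush_lsn,
       pg_current_wal_lsn() AS current_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
       ) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

A lag that grows monotonically while the slot is active usually means the initial COPY is still running or the apply side cannot keep up.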

Follow-up: Explain pg_create_logical_replication_slot. Why anchor to LSN before copy?

Logical replication broke after schema change on publisher: ALTER TABLE orders ADD COLUMN discount DECIMAL. Subscriber error: "column discount not found". Recovery without restart.

Logical replication does not replicate DDL; only DML (INSERT, UPDATE, DELETE, TRUNCATE) flows. After a schema change on the publisher, the subscriber's apply worker errors and retries in a loop, so replication stalls until the schemas match. Recovery:
(1) Apply the same DDL on the subscriber: ALTER TABLE orders ADD COLUMN discount DECIMAL; once the column exists, the apply worker's next retry succeeds and replication resumes; no restart required.
(2) Check per-table sync state: SELECT * FROM pg_subscription_rel WHERE srsubid = (SELECT oid FROM pg_subscription WHERE subname = 'dw_sub'); srsubstate = 'r' means ready, 'd' means initial data copy in progress.
(3) If the subscription was disabled during the incident: ALTER SUBSCRIPTION dw_sub ENABLE; to pick up tables newly added to the publication, run ALTER SUBSCRIPTION dw_sub REFRESH PUBLICATION (available since PG10; it syncs newly added tables, it does not re-copy existing ones). For a full re-sync, drop and recreate the subscription with copy_data = true.
(4) No writes are lost while the apply worker retries: the replication slot retains WAL on the publisher until the change applies. The cost is WAL accumulating on disk, so fix the schema promptly.
(5) Alternative: use Debezium/Kafka for CDC; its schema history and registry track schema evolution in message metadata.
Best practice: (1) coordinate schema changes in a maintenance window; (2) for additive changes, apply DDL to the subscriber before the publisher; (3) use an event trigger (pg_event_trigger_ddl_commands) to capture DDL for custom propagation; (4) document the manual DDL sync procedure. Prevention: schema versioning; test DDL on both publisher and subscriber in dev before prod.
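The recovery sequence above, sketched for a PostgreSQL subscriber (dw_sub and the column name are the example's placeholders):

```sql
-- Subscriber: apply the same DDL the publisher ran; the apply worker's
-- next retry then succeeds and replication resumes automatically
ALTER TABLE orders ADD COLUMN discount DECIMAL;

-- Inspect per-table sync state ('r' = ready, 'd' = data copy in progress)
SELECT srrelid::regclass AS tbl, srsubstate
FROM pg_subscription_rel
WHERE srsubid = (SELECT oid FROM pg_subscription WHERE subname = 'dw_sub');

-- If the subscription was disabled during the incident:
ALTER SUBSCRIPTION dw_sub ENABLE;
-- Sync tables newly added to the publication (does not re-copy existing ones):
ALTER SUBSCRIPTION dw_sub REFRESH PUBLICATION;
```

Watching srsubstate return to 'r' for every table confirms the subscription is healthy again.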

Follow-up: Explain why logical replication doesn't replicate DDL. Design custom trigger to propagate schema changes.

Conflict during logical replication: Publisher inserts row id=123 into orders; subscriber already has id=123 (from external source). Replication halts with conflict. Resolution strategy.

Conflicts occur when the subscriber is writable. Logical replication assumes replicated tables on the subscriber receive no other writes; local writes cause divergence, and a unique-key conflict makes the apply worker error out and retry the same transaction in a loop, halting replication. Root cause: the subscriber accepts writes from external sources (ETL, app) that collide with replicated rows.
Prevention: (1) Make replicated tables read-only for everyone except the apply process: GRANT SELECT ON orders TO reporter_role; REVOKE INSERT, UPDATE, DELETE ON orders FROM reporter_role. (2) If the subscriber must be writable: bidirectional replication with conflict resolution is not natively supported by core PostgreSQL; use Debezium + Kafka with conflict handlers, or an extension such as pglogical. (3) Conflict detection via application-level versioning (e.g., a last_modified timestamp): INSERT INTO orders (...) VALUES (...) ON CONFLICT (id) DO UPDATE SET ... WHERE orders.last_modified < EXCLUDED.last_modified.
Resolution strategies: (a) publisher wins (discard the subscriber's row); (b) subscriber wins (skip the replicated transaction); (c) last-write-wins (compare timestamps); (d) custom merge logic.
Recovery from the halted subscription: (1) Inspect the subscriber: SELECT * FROM orders WHERE id=123. (2) Decide which version to keep. (3) Publisher wins: DELETE FROM orders WHERE id=123; the apply worker's next retry then succeeds. Subscriber wins: skip the failing transaction: on PG15+, ALTER SUBSCRIPTION dw_sub SKIP (lsn = ...), using the LSN from the subscriber's error log; on older versions, advance the origin with pg_replication_origin_advance(). (4) Verify sync: compare row counts on publisher and subscriber.
Best practice: treat the subscriber's replicated tables as a read-only copy or cache; never write to them directly.
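The two resolution paths, sketched under the assumption the subscriber is PostgreSQL 15 or later (the LSN below is a placeholder; the real value appears in the subscriber's apply-worker error message):

```sql
-- Option A: publisher wins — remove the conflicting local row; the
-- retried replicated INSERT then applies cleanly
DELETE FROM orders WHERE id = 123;

-- Option B: subscriber wins (PG15+) — skip the failing remote
-- transaction at the LSN reported in the subscriber log
ALTER SUBSCRIPTION dw_sub SKIP (lsn = '0/14C0378');  -- placeholder LSN

-- Optional (PG15+): disable the subscription on error instead of
-- retrying the same transaction in a loop
ALTER SUBSCRIPTION dw_sub SET (disable_on_error = true);
```

On versions before 15, the equivalent of option B is pg_replication_origin_advance(), which is riskier because it can silently skip unrelated changes.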

Follow-up: Explain two-way replication and conflict resolution strategies. When would you use each?

Replication slot not advancing: publisher WAL accumulates, disk fills. Slot confirmed_flush_lsn is stuck at old value. Subscriber appears connected but no progress. Root cause and fix.

A stuck slot means the subscriber is not sending flush confirmations or replication has broken. Check:
(1) Slot status: SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_type='logical'; if active=true but confirmed_flush_lsn is old, the subscriber is stalled.
(2) Subscriber status: on the subscriber, SELECT subname, subenabled FROM pg_subscription WHERE subname='dw_sub'; if subenabled=false, replication is paused: ALTER SUBSCRIPTION dw_sub ENABLE.
(3) Connection: on the publisher, SELECT * FROM pg_stat_replication; if there is no row for the subscription, the subscriber is disconnected (network, auth, crashed worker).
(4) Subscriber log: look for apply errors (schema mismatch, constraint violation) that make the worker retry forever without ever confirming a flush.
(5) WAL accumulation: check disk usage: du -sh /var/lib/postgresql/data/pg_wal/. WAL retained by a slot cannot safely be deleted by hand (pg_archivecleanup operates on the archive directory, not pg_wal, and removing retained segments breaks the slot); the only remedies are to get the slot advancing again or drop it.
Fix: (1) Subscriber stalled: restart the subscriber. (2) Schema mismatch: fix the subscriber schema; the apply worker's retry then succeeds. (3) Slot broken beyond repair: SELECT pg_drop_replication_slot('dw_slot'); and recreate the subscription (re-copies data). (4) Prevention: ALTER SYSTEM SET max_slot_wal_keep_size = '10GB'; (PG13+): a slot retaining more WAL than this is invalidated (wal_status='lost') instead of filling the disk; the subscription must then be rebuilt. Monitor: alert if confirmed_flush_lsn has not advanced in 5 minutes.
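A slot health check covering the points above, assuming PG13+ (wal_status and max_slot_wal_keep_size were added in 13; dw_slot is a placeholder):

```sql
-- Slot health: activity, invalidation status, and WAL retained per slot
SELECT slot_name,
       active,
       wal_status,   -- 'reserved' | 'extended' | 'unreserved' | 'lost'
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- Cap how much WAL a stalled slot may retain; beyond the cap the slot
-- is invalidated (wal_status = 'lost') rather than filling the disk
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();

-- Last resort for a broken slot: drop it and recreate the subscription
SELECT pg_drop_replication_slot('dw_slot');
```

pg_drop_replication_slot fails while the slot is active, so disable the subscription (or terminate the walsender) first.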

Follow-up: Explain replication slot concepts: restart_lsn, confirmed_flush_lsn, inactive slots. What happens if slot drops?

Design CDC (Change Data Capture) pipeline: PostgreSQL publisher -> Kafka (via Debezium) -> Snowflake. Compare to native logical replication. Trade-offs: complexity, latency, fault-tolerance.

Native logical replication vs. Debezium + Kafka:
Native logical replication: Pros: simple, low latency (~milliseconds), built in. Cons: tightly coupled subscriber, single destination, manual schema changes.
Debezium + Kafka: Pros: decoupled (Kafka buffers), multi-destination (any Kafka consumer), schema registry tracks schema evolution, fault-tolerant (Kafka persistence), scalable. Cons: complex (3+ components), higher latency (Kafka batching), more operational overhead.
Architecture (Debezium): (1) PostgreSQL with logical decoding (wal_level=logical). (2) The Debezium PostgreSQL connector (Kafka Connect) reads the logical WAL stream and publishes change events to Kafka topics. (3) A Kafka consumer (e.g., the Snowflake Kafka Connector) consumes and writes to Snowflake.
Setup: (1) Kafka cluster (Docker or cloud). (2) Kafka Connect with the Debezium plugin: docker-compose up. (3) Register the connector: curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" -d '{"name":"postgres-connector", "config":{"connector.class":"io.debezium.connector.postgresql.PostgresConnector", "database.hostname":"localhost", "database.port":5432, "database.user":"replicator", "database.password":"pwd", "database.dbname":"mydb", "database.server.name":"postgres-prod", "table.include.list":"public.orders,public.customers", "plugin.name":"pgoutput"}}'. (4) The Snowflake Kafka Connector consumes from Kafka and writes to Snowflake. (5) Monitor: curl http://localhost:8083/connectors/postgres-connector/status.
Trade-offs: (1) Latency: native ~5 ms; Debezium ~50-500 ms (Kafka batching). (2) Complexity: native is simple; Debezium requires Kafka operations. (3) Destinations: Debezium fans out to many sinks (Elasticsearch, S3, Snowflake, etc.); native supports a PostgreSQL subscriber only. (4) Schema evolution: Debezium captures schema changes automatically (schema history + registry); native requires manual DDL sync.
Recommendation: use native logical replication for simple Postgres-to-Postgres replication (low ops); use Debezium for complex pipelines, multiple destinations, or when buffering and fault tolerance are critical (finance, analytics).
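Whichever sink consumes the events, Debezium's footprint on the source database is visible from plain SQL; a sketch for confirming it from the Postgres side (slot and publication names depend on the connector config; 'debezium' is the connector's default slot name):

```sql
-- Debezium creates a logical replication slot on the source; confirm it
-- is active and not retaining unbounded WAL
SELECT slot_name, plugin, active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;

-- With plugin.name = 'pgoutput', Debezium also uses a publication
SELECT pubname, puballtables FROM pg_publication;
```

This matters operationally: a stopped connector leaves its slot behind, and the slot keeps retaining WAL exactly as in the stuck-slot scenario above.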

Follow-up: Explain Debezium architecture: connector, Kafka, Avro schema registry. How does it handle DDL?
