Location queries slow: "Find all restaurants within 5km of lat/lon". Currently, app fetches all restaurants, calculates distance in Python. 50k restaurants = 50k calcs. Evaluate PostGIS: setup, indexes, query performance.
PostGIS provides spatial data types and functions. Setup:
(1) Install: CREATE EXTENSION postgis (add CREATE EXTENSION postgis_topology only if topology features are needed).
(2) Add a spatial column: ALTER TABLE restaurants ADD COLUMN location GEOGRAPHY(Point, 4326). GEOGRAPHY stores longitude/latitude (SRID 4326 = WGS 84) and measures distances in meters on the spheroid.
(3) Populate: UPDATE restaurants SET location = ST_GeogFromText('POINT(' || lon || ' ' || lat || ')'). Note WKT order is lon lat, not lat lon.
(4) Create a spatial index: CREATE INDEX idx_restaurants_geom ON restaurants USING GIST (location) (GiST = Generalized Search Tree).
(5) Query: SELECT name, location FROM restaurants WHERE ST_DWithin(location, ST_GeogFromText('POINT(-122.4 37.8)'), 5000) ORDER BY ST_Distance(location, ST_GeogFromText('POINT(-122.4 37.8)')) returns restaurants within 5km (5000 m), sorted by distance; ST_DWithin is index-aware.
Performance: the GiST index turns the scan from O(N) into roughly O(log N). Benchmark with EXPLAIN (ANALYZE) SELECT ... WHERE ST_DWithin(..., 5000); the plan should show an index scan on idx_restaurants_geom with low cost. Typical: <50ms for 50k restaurants vs. ~500ms computing distances app-side.
Tuning: (1) Adjust GiST fillfactor for update-heavy tables: CREATE INDEX ... USING GIST (location) WITH (fillfactor=70). (2) For very large datasets (hundreds of millions of rows), consider a BRIN index: CREATE INDEX ... USING BRIN (location) (much smaller index, but a coarser filter, so queries recheck more rows). (3) Extract coordinates in SQL for hot queries: SELECT ST_X(location::geometry) AS lon, ST_Y(location::geometry) AS lat (ST_X/ST_Y take a geometry, hence the cast).
Alternatives: Google Maps API or app-side geohashing (more flexible, but slower). Best practice: use PostGIS for location queries when there is a tight latency SLA.
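For contrast, the O(N) app-side approach that the ST_DWithin query replaces looks like this — a minimal pure-Python sketch (haversine great-circle distance; function names are illustrative):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km on a spherical Earth (radius ~6371 km).
    rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
    dlat = rlat2 - rlat1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def within_radius(points, lat, lon, radius_km):
    # O(N): one distance computation per row -- exactly the cost that
    # ST_DWithin + a GiST index avoids by pruning candidates first.
    return [p for p in points
            if haversine_km(lat, lon, p[0], p[1]) <= radius_km]
```

With 50k rows this loop runs 50k trig-heavy calculations per request, which is why pushing the filter into an indexed ST_DWithin is the win.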
Follow-up: Explain GEOGRAPHY vs. GEOMETRY in PostGIS. When use each?
Time-series data (metrics): 100k metrics/sec (CPU, memory, temperature) from 10k sensors. Vanilla PostgreSQL storage is 10TB/month. TimescaleDB reduces to 1TB/month (90% compression). Implement hypertable, evaluate compression.
TimescaleDB: extension for time-series optimization. Setup:
(1) Install: CREATE EXTENSION timescaledb.
(2) Create a plain table, then convert it to a hypertable (TimescaleDB handles the time partitioning itself; no PARTITION BY clause): CREATE TABLE metrics (time TIMESTAMPTZ NOT NULL, sensor_id INTEGER, temperature FLOAT, PRIMARY KEY (time, sensor_id)); then SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE).
(3) Enable compression: ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'sensor_id', timescaledb.compress_orderby = 'time DESC'); compress existing chunks with SELECT compress_chunk(c) FROM show_chunks('metrics', older_than => INTERVAL '7 days') c.
(4) Ingest with bulk inserts: INSERT INTO metrics VALUES (now(), 123, 45.5), (now(), 456, 46.2), ...
(5) Query: SELECT * FROM metrics WHERE time >= NOW() - INTERVAL '1 day' AND sensor_id = 123 ORDER BY time DESC LIMIT 10 (fast: the planner prunes chunks outside the time range).
Performance: vanilla PG sequentially scanning 10TB takes minutes; TimescaleDB with compression scans ~1TB in seconds. Compression trades CPU for storage (chunks are decompressed on the fly).
Tuning: (1) Chunk interval: default is 7 days; at 100k rows/sec, shrink to 1 day (chunk_time_interval above, or set_chunk_time_interval for an existing hypertable) so each chunk and its indexes fit in memory. (2) Compression codecs are chosen per column automatically (delta-of-delta for timestamps, Gorilla for floats, dictionary/LZ for text); the user-facing knobs are compress_segmentby and compress_orderby, not a codec setting. (3) Compress only old chunks, never ones still being written: SELECT add_compression_policy('metrics', INTERVAL '7 days'). (4) Retention: auto-drop old chunks: SELECT add_retention_policy('metrics', INTERVAL '30 days') (drops chunks >30 days old).
Monitoring: SELECT * FROM timescaledb_information.hypertables; SELECT * FROM timescaledb_information.chunks for hypertable/chunk stats.
Staff note: TimescaleDB is essential for high-ingest-rate time-series; vanilla PG struggles above ~100k rows/sec sustained.
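Chunk pruning is easy to picture outside the database — a toy pure-Python model (all names hypothetical) of why a time-range query only touches a few chunks instead of the whole table:

```python
from collections import defaultdict
from datetime import datetime, timedelta

class ToyHypertable:
    """Rows bucketed into per-day 'chunks'; range queries skip irrelevant buckets."""
    def __init__(self):
        self.chunks = defaultdict(list)  # chunk key (a date) -> list of rows

    def insert(self, ts, sensor_id, value):
        self.chunks[ts.date()].append((ts, sensor_id, value))

    def query_since(self, since):
        # Pruning: only scan chunks whose day could overlap [since, now];
        # everything older is excluded without reading a single row.
        hit = [d for d in self.chunks if d >= since.date()]
        rows = [r for d in hit for r in self.chunks[d] if r[0] >= since]
        return len(hit), rows
```

A 30-day table queried for the last 2 days scans 2 chunks, not 30 — the same effect that makes the hypertable query above fast.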
Follow-up: Explain hypertables and chunks in TimescaleDB. How does time-based partitioning improve performance?
Database has grown to 10TB; a single node is maxed out. Citus enables horizontal scaling: distribute across 5 nodes. Design the architecture: coordinator, workers, sharding key, distributed transactions.
Citus: extension for distributed PostgreSQL. Architecture: 1 coordinator (routes queries), N workers (store shards). Setup:
(1) Install Citus on all nodes: CREATE EXTENSION citus.
(2) On the coordinator, register workers: SELECT * FROM citus_add_node('worker-1', 5432); SELECT * FROM citus_add_node('worker-2', 5432); etc.
(3) Create a distributed table: SELECT create_distributed_table('orders', 'customer_id') (shard by customer_id). The coordinator places rows on workers by hash(customer_id).
(4) Query routing: SELECT * FROM orders WHERE customer_id=123 routes to a single worker (fast). SELECT COUNT(*) FROM orders fans out to all workers and aggregates on the coordinator (slower).
(5) Transactions: single-shard transactions (filtered on the shard key) commit on one worker (fast). Multi-shard transactions (no shard-key filter, or touching multiple shards) use two-phase commit (slower, more failure modes). Design around single-shard transactions.
Sharding strategy: (1) Shard by customer_id for multi-tenant apps (each tenant's data colocated on one shard). (2) Shard by region for geographic locality. (3) Avoid shard skew (one shard holding >10x the data of others).
Rebalancing: rebalance_table_shards() moves shards between workers; the hash ranges move with the shards, so keys are not re-hashed, and recent Citus versions rebalance online. Small tables joined everywhere should be reference tables (created with create_reference_table(), replicated to all workers) to avoid slow cross-shard distributed joins.
Trade-offs: (1) Horizontal scale-out (~10x capacity). (2) Operational complexity (5+ nodes). (3) The coordinator is a single point of failure (run it with HA). (4) Multi-shard transactions are slow (2PC).
Recommendation: use Citus only if the dataset exceeds a single node (>10TB here) and the workload is mostly single-shard queries or simple aggregations. For complex joins across shards, vanilla PG + read replicas may serve better.
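The routing decision can be sketched in a few lines of Python (hypothetical helper; real Citus hashes the shard key into fixed hash ranges, but the shape of the decision is the same):

```python
import zlib

NUM_SHARDS = 32
WORKERS = ["worker-1", "worker-2", "worker-3", "worker-4", "worker-5"]

def shard_for(customer_id: int) -> int:
    # Deterministic hash of the shard key -> shard number.
    return zlib.crc32(str(customer_id).encode()) % NUM_SHARDS

def route(customer_id=None):
    # With a shard-key filter: exactly one worker answers (single-shard, fast).
    # Without one: the coordinator must fan out to every worker (multi-shard).
    if customer_id is None:
        return list(WORKERS)
    return [WORKERS[shard_for(customer_id) % len(WORKERS)]]
```

The fan-out case is also why multi-shard transactions need 2PC: several workers must all commit or all abort.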
Follow-up: Explain single-shard vs. multi-shard transactions in Citus. Why are multi-shard transactions slow?
Mixed workload: Transactional (orders, payments - small, consistent throughput) + Analytical (reports, aggregations - large, batch). One DB not optimal. Design separate OLTP/OLAP with data pipeline.
OLTP (Online Transaction Processing): many small, fast queries; needs high concurrency and strong consistency. OLAP (Online Analytical Processing): large batch scans and aggregations; tolerates latency. Architecture:
(1) OLTP cluster (primary + standby + read replicas) for orders, payments, customer data; size RAM for the working set and front it with a connection pooler.
(2) OLAP cluster (standalone or distributed) for analytics queries (Citus, TimescaleDB, or a separate warehouse such as Snowflake).
(3) Data pipeline: ETL or CDC from OLTP -> OLAP.
Setup: (1) OLTP: standard PG with replication + PgBouncer connection pooling. (2) OLAP: separate cluster or Snowflake/BigQuery; load via periodic COPY from OLTP (nightly) or via CDC. (3) ETL: use dbt/Airflow/Fivetran to transform and load. (4) Reference data: small tables (products, customers) replicated to both; large fact tables (orders) copied into OLAP.
Queries: (1) Transactional: SELECT * FROM orders WHERE id=123; INSERT INTO orders ... on the OLTP cluster. (2) Analytical: SELECT DATE(order_date), SUM(amount) FROM orders GROUP BY 1 on OLAP (or Snowflake).
Tuning: (1) OLTP: optimize for point lookups (indexes, small result sets). (2) OLAP: optimize for full scans (partitioning, compression, columnar storage). (3) Replication lag: OLAP may lag OLTP by minutes to hours; acceptable for analytics. (4) Query routing: the app sends transactional queries to the OLTP cluster and analytics queries to OLAP.
Trade-offs: (1) Complexity (two clusters). (2) Data consistency (OLAP lags). (3) Cost (~2x infrastructure). Benefit: each workload runs on a system tuned for it. Recommended architecture for mature products.
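The app-side routing in (4) can be as simple as mapping a workload class to a connection string — a sketch with made-up DSNs (in practice these come from config):

```python
# Hypothetical connection strings for the two clusters.
OLTP_DSN = "postgresql://oltp-primary:5432/app"
OLAP_DSN = "postgresql://olap-warehouse:5432/analytics"

def dsn_for(workload: str) -> str:
    """Route transactional work to the OLTP cluster, analytical work to OLAP."""
    if workload == "transactional":
        return OLTP_DSN
    if workload == "analytical":
        return OLAP_DSN
    raise ValueError(f"unknown workload: {workload}")
```

Keeping the routing explicit in one place makes it easy to later add rules (e.g., long-running report queries forced onto OLAP even if they touch OLTP tables).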
Follow-up: Explain ETL vs. CDC for OLTP->OLAP pipeline. Latency and consistency trade-offs?
PostGIS queries become slow as dataset grows (1B geometries). Spatial indexes bloat. Design optimization: partitioning, denormalization, or alternative (Elasticsearch geo, H3 indexing).
1B geometries strain PostGIS: GiST indexes bloat and queries slow down. Optimizations:
(1) Spatial partitioning: divide by geography (e.g., country or region) into separate tables; the app routes on the query bounding box: SELECT * FROM restaurants_usa WHERE ST_DWithin(...) UNION ALL SELECT * FROM restaurants_ca WHERE ST_DWithin(...).
(2) Denormalize with a precomputed grid cell (H3 or S2): ALTER TABLE restaurants ADD COLUMN h3_cell BIGINT, populated from lat/lon via the H3 bindings (the exact function name, e.g. h3_latlng_to_cell in h3-pg, varies by binding and version). Query by filtering on candidate cells first (h3-pg's h3_grid_disk(cell, k) returns the cell plus its k rings of neighbors), then apply the precise ST_DWithin check. The grid filter cuts the candidate set from 1B to ~100k.
(3) Separate geo service: Elasticsearch (or Solr, Meilisearch) handles "restaurants near me" and returns document IDs; PG keeps the business data (name, rating).
(4) Tile-based caching: precompute and cache results per map tile (e.g., zoom level 12, ~1M tiles globally); queries return the cached tile.
(5) Approximate pre-filter: BRIN indexes on lat and lon separately (not a true spatial index, but tiny and fast to build): CREATE INDEX idx_lat_brin ON restaurants USING BRIN(lat); CREATE INDEX idx_lon_brin ON restaurants USING BRIN(lon). Filter to a lat/lon bounding box, then run the exact distance check.
Trade-offs: (1) Partitioning adds multi-table routing logic. (2) H3/grid adds denormalization maintenance. (3) A separate geo service adds operational overhead. (4) The bbox pre-filter is approximate until the final distance check runs.
Recommended: H3 grid + PostGIS for point lookups; at truly global scale, Elasticsearch geo sharded by region.
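The grid-first/precise-second pattern from (2) can be sketched without H3 at all — a plain lat/lon grid stands in for H3 cells (cell size and names are illustrative), and the neighbor lookup plays the role of h3_grid_disk:

```python
import math
from collections import defaultdict

CELL_DEG = 0.05  # ~5.5 km of latitude per cell; stand-in for an H3 resolution

def cell_of(lat, lon):
    # Quantize a coordinate into a coarse grid cell.
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))

def build_grid(points):
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p[0], p[1])].append(p)
    return grid

def candidates(grid, lat, lon, rings=1):
    # Coarse filter: the query cell plus its neighbor rings.
    ci, cj = cell_of(lat, lon)
    out = []
    for di in range(-rings, rings + 1):
        for dj in range(-rings, rings + 1):
            out.extend(grid.get((ci + di, cj + dj), ()))
    return out  # still needs the precise ST_DWithin-style distance check
```

The candidate list is tiny relative to the full table, so the expensive exact distance check only runs on a handful of rows.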
Follow-up: Explain H3 indexing and grid cells. Advantages over pure spatial indexes?
TimescaleDB compression caused queries to slow (decompression overhead). Disable compression for hot data (recent days). Design retention policy: hot (uncompressed, quick), warm (compressed), cold (archived/S3).
3-tier retention strategy: hot (recent, fast), warm (compressed, medium), cold (archived, slow). Setup:
(1) Hot tier (0-7 days): uncompressed hypertable chunks in PostgreSQL; queries scan them directly.
(2) Warm tier (7-30 days): compressed. SELECT add_compression_policy('metrics', INTERVAL '7 days') compresses chunks older than 7 days. Queries on warm data still work (transparent decompression) but are slower.
(3) Cold tier (>30 days): move out of PG before retention drops it. SELECT add_retention_policy('metrics', INTERVAL '30 days') drops chunks >30 days old; export them to S3 first (e.g., COPY old chunks out via a scheduled job, or use Timescale's managed tiered-storage/S3 feature where available).
Query behavior: (1) Hot: SELECT * FROM metrics WHERE time >= NOW() - INTERVAL '1 day' scans uncompressed chunks; fast. (2) Warm: SELECT * FROM metrics WHERE time >= NOW() - INTERVAL '14 days' decompresses chunks; slower but cacheable. (3) Cold: a range query like SELECT * FROM metrics WHERE time >= '2026-02-01' against archived data requires restoring it from S3 first.
Tuning: (1) Adjust the hot/warm boundary to the query mix: if many queries hit the 7-14 day range, widen the hot tier. (2) Compression trade-off: TimescaleDB picks per-column codecs automatically; the practical knobs are compress_segmentby/compress_orderby and how old a chunk must be before compression (CPU vs. I/O). (3) Monitor: SELECT * FROM timescaledb_information.chunks shows is_compressed per chunk. Manual alternative to the policy: SELECT compress_chunk(c) FROM show_chunks('metrics', older_than => INTERVAL '7 days') c via cron.
Downside: three tiers to operate and monitor. Upside: 10-100x storage cost savings for time-series data.
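The tiering decision itself is just an age check — a pure-Python sketch using the boundaries above (7 and 30 days; function name hypothetical):

```python
from datetime import datetime, timedelta

HOT_DAYS, WARM_DAYS = 7, 30  # tier boundaries from the policy above

def tier_for(chunk_end: datetime, now: datetime) -> str:
    """Classify a chunk by age: hot (uncompressed), warm (compressed),
    or cold (archive to S3 before the retention policy drops it)."""
    age = now - chunk_end
    if age < timedelta(days=HOT_DAYS):
        return "hot"
    if age < timedelta(days=WARM_DAYS):
        return "warm"
    return "cold"
```

A job running this over timescaledb_information.chunks would compress the "warm" set and export the "cold" set to S3 before retention deletes it.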
Follow-up: Explain TimescaleDB S3 backend (cold storage). Cost and latency trade-offs?