Table with 10KB average row size: Scan takes 5x longer than expected. TOAST (Out-of-line storage) suspected. Diagnose: What's being toasted? How to measure and optimize?
TOAST (The Oversized-Attribute Storage Technique): PostgreSQL compresses and/or moves large variable-length values out of line when a row exceeds ~2KB, leaving a pointer in the tuple. Slow scans suggest heavy TOAST access (extra seeks plus decompression). Diagnose: (1) Find TOAST-able columns: SELECT attname FROM pg_attribute WHERE attrelid='table_name'::regclass AND attnum > 0 AND NOT attisdropped AND attlen = -1 (attlen = -1 means variable-length, i.e., a TOAST candidate). (2) Measure TOAST size: SELECT pg_size_pretty(pg_relation_size(oid)) AS heap_size, pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size FROM pg_class WHERE relname='my_table'. If toast_size is a large fraction of heap_size (say >10%), TOAST dominates I/O. (3) Find the heavy columns directly: SELECT avg(pg_column_size(col1)), avg(pg_column_size(col2)) FROM table_name (pg_column_size reports the stored, post-compression size). Root causes: (1) Oversized TEXT/BYTEA fields, e.g. a description TEXT column holding 100KB per row. (2) Large JSON/JSONB documents, e.g. a metadata JSONB column storing 50KB per event. Fix: (1) Reduce column size: truncate or summarize: UPDATE products SET description = LEFT(description, 5000) (keep the first 5K). (2) Externalize: move the column to a separate table: CREATE TABLE product_descriptions (product_id INT, description TEXT); ALTER TABLE products DROP COLUMN description; JOIN only when the value is needed. (3) Ensure compression: ALTER TABLE products ALTER COLUMN description SET STORAGE EXTENDED (EXTENDED, the default for TEXT, compresses before storing out of line). (4) Skip compression for hot columns: ALTER TABLE products ALTER COLUMN description SET STORAGE EXTERNAL (stores uncompressed out of line; avoids decompression cost on frequent access). Measurement: EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM table_name LIMIT 100. Compare: With TOAST decompression: 1s. 
Without (narrower columns or JOIN): 100ms. Speedup often 5-10x for TOAST-heavy tables.
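The size checks above can be combined into a short diagnostic; a sketch assuming a placeholder table my_table with description and metadata columns:

```sql
-- Heap vs. TOAST size for one table ('my_table' is a placeholder)
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))           AS heap_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
FROM pg_class c
WHERE c.relname = 'my_table' AND c.reltoastrelid <> 0;

-- Per-column stored size: pg_column_size reports bytes as stored,
-- i.e. after TOAST compression, so it exposes the heavy columns directly.
SELECT pg_size_pretty(avg(pg_column_size(description))::bigint) AS avg_description,
       pg_size_pretty(avg(pg_column_size(metadata))::bigint)    AS avg_metadata
FROM my_table;
```

If toast_size rivals heap_size, most of the scan's I/O and CPU is going to TOAST fetches rather than the main heap.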
Follow-up: Explain STORAGE strategies: PLAIN, EXTENDED, EXTERNAL, MAIN. When use each?
TOAST compression enabled but queries still slow: Decompression overhead on every scan. Design strategy: compress once, cache decompressed, or denormalize to avoid TOAST entirely.
TOAST compression trades CPU for storage: every access decompresses. For hot data read frequently, decompression cost can exceed the storage benefit. Solutions: (1) Disable compression for hot columns: ALTER TABLE products ALTER COLUMN description SET STORAGE EXTERNAL (uncompressed out-of-line storage, faster access; note this affects only newly written values until the table is rewritten). Keep compression for cold, rarely-accessed data. (2) Denormalize to avoid TOAST: instead of a 100KB description in products, store only a short summary: ALTER TABLE products ADD COLUMN description_summary TEXT; UPDATE products SET description_summary = LEFT(description, 100). Move the full text to a separate table accessed only on demand: CREATE TABLE product_full_descriptions (product_id INT PRIMARY KEY, description TEXT); INSERT INTO product_full_descriptions SELECT id, description FROM products. Query: SELECT p.* FROM products p WHERE p.description_summary LIKE '%word%' (fast, no TOAST). For the full text: SELECT f.description FROM product_full_descriptions f WHERE f.product_id = $1 (fetch only if needed). (3) Caching layer: cache decompressed values in the application (e.g. Redis): look up 'product_description:<id>' first, fall back to the database on a miss, then populate the cache. (4) Use Large Objects (LO) instead of TOAST: LOs live in separate storage and are fetched explicitly: CREATE TABLE products (id SERIAL, lo_id OID); INSERT INTO products VALUES (1, lo_from_bytea(0, data)); SELECT lo_get(lo_id) FROM products WHERE id = 1. Nothing is detoasted during ordinary scans; the blob is read only when requested. (5) Archive cold data: move old, rarely-accessed rows to a separate table: CREATE TABLE products_archive (LIKE products INCLUDING ALL); INSERT INTO products_archive SELECT * FROM products WHERE last_accessed < NOW() - INTERVAL '1 year'; DELETE FROM products WHERE last_accessed < NOW() - INTERVAL '1 year'. The hot table stays small and fast. Trade-off: denormalization adds complexity (extra JOINs, sync issues). Recommend: For analytics (infrequent access), compress. 
For OLTP (frequent access), denormalize or use EXTERNAL storage.
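A minimal sketch of the EXTERNAL-storage fix. SET STORAGE only affects newly written values, so a rewrite is needed for existing rows; the SET COMPRESSION line applies only on PostgreSQL 14+:

```sql
-- Stop compressing a hot column (stored out of line, uncompressed)
ALTER TABLE products ALTER COLUMN description SET STORAGE EXTERNAL;

-- SET STORAGE applies only to values written from now on;
-- VACUUM FULL rewrites the table so existing rows are converted too.
VACUUM FULL products;

-- Alternative on PostgreSQL 14+: keep compression but switch to LZ4,
-- whose decompression is much cheaper than the default pglz.
ALTER TABLE products ALTER COLUMN description SET COMPRESSION lz4;
```

LZ4 is often the best middle ground for hot TOASTed columns: most of the storage savings of pglz at a fraction of the decompression cost.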
Follow-up: Explain Large Objects (LO) vs. TOAST. When use each?
TOAST table corrupted (or accidentally deleted): SELECT * FROM table_name fails with errors such as "missing chunk number 0 for toast value ..." or "could not open relation". Recovery and prevention.
Corrupted TOAST indicates: (1) Accidental DROP of the pg_toast table (never do this). (2) Filesystem corruption (rare). (3) pg_upgrade bug (rare). Recovery: (1) PITR (restore from backup) is the only reliable fix; if the backup is hours old, expect data loss for that window. (2) Partial salvage (risky): identify the rows whose TOAST data is unreadable, then copy everything else: CREATE TABLE table_name_new (LIKE table_name INCLUDING ALL); INSERT INTO table_name_new SELECT * FROM table_name WHERE id NOT IN (<bad ids>). pg_dump -t table_name --data-only mydb also works for the intact rows, but it aborts on the first corrupted row, so exclude those first. (3) PITR if a backup is available: restore to a point before the corruption: recovery_target_time = '2026-04-07 10:00:00' (in recovery.conf before PG12; in postgresql.conf plus a recovery.signal file from PG12 on). (4) Forensics: check pg_class for dangling TOAST references: SELECT relname, reltoastrelid FROM pg_class WHERE reltoastrelid <> 0 AND reltoastrelid NOT IN (SELECT oid FROM pg_class). Nonempty output means a table points at a TOAST relation that no longer exists. Prevention: (1) Never DROP pg_toast tables manually; PostgreSQL auto-manages them. (2) Backup/PITR: test recovery quarterly. (3) Filesystem monitoring: alert on unexpected file deletions. (4) Access control: restrict ALTER TABLE / DROP to a DBA role only. (5) Keep a delayed replica for sensitive data; physical replication will faithfully replay a bad DROP, so the delay is what buys a recovery window. Recommendation: TOAST corruption is severe; the real defense is a backup/PITR strategy with recovery drills on a schedule.
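When only some rows' TOAST chunks are missing, the intact rows can usually be salvaged first. A hedged sketch, assuming a placeholder table_name with an id primary key and a large description column:

```sql
-- Probe each row, forcing detoast of the large column; rows whose TOAST
-- chunks are missing raise an error we catch and report, so everything
-- else can be copied out before restoring the damaged rows from backup.
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT id FROM table_name LOOP
        BEGIN
            PERFORM length(description) FROM table_name WHERE id = r.id;
        EXCEPTION WHEN OTHERS THEN
            RAISE NOTICE 'unreadable TOAST data at id=%: %', r.id, SQLERRM;
        END;
    END LOOP;
END $$;
```

The NOTICE output gives the id list to exclude when copying surviving rows into a fresh table.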
Follow-up: Explain TOAST relation structure. How does PostgreSQL manage TOAST internally?
Performance regression after upgrade: Table with TOAST now slower. Root cause: TOAST COMPRESSION disabled (or compression defaults changed). Investigate and fix.
Post-upgrade TOAST behavior can change. The STORAGE default itself is stable (EXTENDED has long been the default for TEXT/JSONB in PG12 and PG14 alike), but PG14 added a per-column compression method (pglz vs. lz4, controlled by default_toast_compression), so compression characteristics can shift across versions. Check: (1) Upgrade notes: review release notes for TOAST/compression changes. (2) Verify STORAGE: SELECT attname, attstorage FROM pg_attribute WHERE attrelid='table_name'::regclass AND attnum > 0 AND NOT attisdropped. 'p'=PLAIN, 'x'=EXTENDED (compress + out-of-line), 'e'=EXTERNAL (out-of-line, no compression), 'm'=MAIN (compress, keep inline). (3) If a column changed to EXTERNAL (no compression), re-enable: ALTER TABLE table_name ALTER COLUMN col SET STORAGE EXTENDED; VACUUM FULL table_name. (4) Measure before/after: SELECT pg_size_pretty(pg_total_relation_size('table_name')). The table should shrink after VACUUM FULL once compression is applied. Root cause: (1) An explicit ALTER TABLE ... SET STORAGE EXTERNAL accidentally applied during the migration. (2) A schema restored from a dump that carried old per-column storage settings. Fix: (1) Re-apply compression: ALTER TABLE table_name ALTER COLUMN col SET STORAGE EXTENDED. (2) Rebuild: VACUUM FULL table_name (rewrites the table, re-storing existing values under the new setting; SET STORAGE alone affects only new rows). (3) Monitor improvement: compare scan times before/after the rebuild. (4) To reset many columns at once, loop over pg_attribute rows with attstorage='e' in a PL/pgSQL DO block that EXECUTEs one ALTER TABLE per column. (5) One-off VACUUM FULL, then ongoing ANALYZE for stats refresh. Prevention: (1) Test on staging after upgrade. (2) Snapshot STORAGE settings pre/post upgrade with the pg_attribute query above and diff them. (3) Document TOAST config in the runbook. (4) Set storage explicitly in migrations (ALTER TABLE ... SET STORAGE EXTENDED) rather than relying on version defaults. Recommendation: After a major version upgrade, validate TOAST compression config for large-data tables; this regression is easy to overlook.
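The bulk reset in fix step (4) needs dynamic SQL; a sketch assuming the placeholder table_name (attstorage 'e' marks EXTERNAL, i.e. uncompressed, columns):

```sql
-- Flip every EXTERNAL column of table_name back to EXTENDED (compressed).
DO $$
DECLARE
    col name;
BEGIN
    FOR col IN
        SELECT attname FROM pg_attribute
        WHERE attrelid = 'table_name'::regclass
          AND attnum > 0 AND NOT attisdropped
          AND attstorage = 'e'          -- 'e' = EXTERNAL (no compression)
    LOOP
        EXECUTE format(
            'ALTER TABLE table_name ALTER COLUMN %I SET STORAGE EXTENDED',
            col);
    END LOOP;
END $$;

-- Rewrite the table so existing values are re-stored compressed:
VACUUM FULL table_name;
```

format('%I', col) quotes each column identifier safely, so oddly named columns don't break the dynamic ALTER.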
Follow-up: Explain VACUUM FULL's impact on TOAST. Why required after STORAGE change?
Large Object (LO) usage: 100GB database, 90GB is LO blobs (document images, PDFs). Query performance and storage optimization strategy.
Large Objects (LOs): PostgreSQL's streaming storage for large unstructured data (images, PDFs, videos). Unlike TOAST (which detoasts values transparently with the row), LOs are referenced by OID and fetched explicitly. Diagnosis: (1) Per-LO size: SELECT loid, pg_size_pretty(SUM(octet_length(data))) FROM pg_largeobject GROUP BY loid ORDER BY SUM(octet_length(data)) DESC LIMIT 20. (2) Total LO size: SELECT pg_size_pretty(SUM(octet_length(data))) FROM pg_largeobject. (3) Orphaned LOs (no row references them): SELECT l.oid FROM pg_largeobject_metadata l WHERE l.oid NOT IN (SELECT lo_column FROM table_name). Delete with SELECT lo_unlink(oid), or run the bundled vacuumlo utility, which automates exactly this. Optimization: (1) Move LOs to external storage (S3, GCS): (a) store a key/URL in the DB, upload the blob to cloud storage. (b) Query: fetch the key, retrieve from S3. (c) Benefit: smaller DB, cheaper storage, faster backups. (2) Compress LOs in the application: gzip the bytes before lo_from_bytea(0, data); gunzip after lo_get(lo_id). (3) Prune old LOs: PostgreSQL does not track LO access times, so record last_accessed in the referencing table and archive blobs older than a year to cold storage (S3 Glacier). (4) Audit LO access: log_statement = 'all' is heavyweight; pg_stat_statements is usually enough to find hot LO queries. (5) Vacuum: VACUUM ANALYZE pg_largeobject removes dead tuples left by lo_unlink. Monitoring: (1) Alert if total LO size exceeds a threshold. (2) Track the orphaned-LO count with the query above. (3) Performance: reading many LOs is inherently row-by-row; keep the referencing table indexed on its lookup keys. Recommendation: Beyond ~10GB of blobs, move them to external storage (S3) and keep only references in the DB. This shrinks the database, speeds backups, and enables CDN delivery.
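The orphan cleanup can be sketched directly against the LO catalogs (the referencing table products and its lo_column are placeholders; the bundled vacuumlo utility does the same job from the command line):

```sql
-- Unlink every large object no longer referenced by any row.
SELECT lo_unlink(l.oid)
FROM pg_largeobject_metadata l
WHERE NOT EXISTS (
    SELECT 1 FROM products p WHERE p.lo_column = l.oid
);

-- Reclaim the space afterwards.
VACUUM ANALYZE pg_largeobject;
```

If several tables reference LOs, the NOT EXISTS must cover all of them, which is why vacuumlo (it scans every OID/lo column in the database) is the safer default.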
Follow-up: Design architecture for storing large files: PostgreSQL LOs vs. external storage (S3) vs. object database. Trade-offs?