Your e-commerce platform reports a 3.2-second TTFB for users in Southeast Asia, while us-east-1 users see 140ms. Your origin is in us-east-1. Product catalog updates happen every 10 minutes, and you cannot tolerate stale reads longer than 5 minutes. Current CDN is CloudFront with 50% cache hit ratio. What's your strategy?
Diagnosis: The 3.2s TTFB is dominated by network latency—SE Asia is 8,000+ miles from us-east-1, and TCP/TLS handshakes multiply that round trip. The 5-minute staleness window allows aggressive edge caching. A 50% hit ratio suggests either poor TTL tuning or cache-busting request patterns.
Solution Architecture: (1) Add Cloudflare or Akamai edge POPs in Singapore/Tokyo for lower latency (~100ms instead of 400ms). (2) Set aggressive TTLs: product static content (images, CSS) = 7 days, product metadata = 5 minutes, checkout flows = 1 minute. (3) Implement cache versioning—append a version hash to URLs (product-catalog-v2.json). (4) Use Cache-Control headers with ETag/Last-Modified for conditional requests. (5) Enable Origin Shield (CloudFront feature) to reduce origin load and improve hit ratio. (6) Segment caches: static assets → 7d TTL, product listings → 5m TTL, user-specific → 0s TTL (never edge-cached).
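The TTL tiering above can be sketched as a path-to-header mapping; the path prefixes and values here are illustrative, not a real CloudFront cache-policy API.

```python
# Hypothetical TTL tiers mirroring the segmentation above; first prefix wins.
TTL_TIERS = [
    ("/static/",   "public, max-age=604800"),  # images/CSS/JS: 7 days
    ("/catalog/",  "public, max-age=300"),     # product metadata: 5 minutes
    ("/checkout/", "public, max-age=60"),      # checkout flows: 1 minute
    ("/account/",  "private, no-store"),       # user-specific: never edge-cached
]

def cache_control_for(path: str) -> str:
    """Pick the Cache-Control header for a request path."""
    for prefix, header in TTL_TIERS:
        if path.startswith(prefix):
            return header
    return "public, max-age=0, must-revalidate"  # default: always revalidate
```

The default branch forces conditional revalidation for anything unclassified, so a missed path pattern fails safe rather than serving stale content.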
Expected Outcome: TTFB drops to 200-300ms for SE Asia. Cache hit ratio climbs to 75-80%. Origin load cut by 40%.
Follow-up: Your product metadata is in DynamoDB. Cache misses at edge currently trigger an origin request that reads DynamoDB + renders JSON (400ms). How would you handle a cache stampede if Origin Shield suddenly expires and 50K concurrent users hit the same product listing?
During Black Friday, you see 2 million requests/sec globally. CloudFront bills jump 40%. Your director asks: why not just use S3 + Lambda@Edge for static content? Your current architecture uses CloudFront → ALB → EC2. What's your answer?
The Trap: S3 + Lambda@Edge sounds cheaper, but it's a false economy. Lambda@Edge charges per request ($0.60/1M) plus data transfer out ($0.085/GB). At 2M req/sec with a 50KB average response, that's 100GB/sec of egress = $8.50/sec (~$735K/day at sustained peak), plus another ~$1.20/sec in request charges. CloudFront's regional caches avoid most of those invocations: at an 80% hit ratio, 1.6M req/sec are served from cache (billed at CloudFront's far cheaper request tiers) and only 400K req/sec reach the origin (backend EC2).
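A back-of-envelope check of that estimate; note that adding the per-request charge on top of egress pushes the daily figure past $800K. Rates are the assumed list prices, and real bills vary by region and tier.

```python
# Sanity check of the Lambda@Edge cost math above (assumed list prices).
REQ_PER_SEC = 2_000_000     # peak global request rate
AVG_RESP_KB = 50            # average response size
PRICE_PER_M_REQ = 0.60      # Lambda@Edge, $ per 1M requests
PRICE_PER_GB_OUT = 0.085    # data transfer out, $ per GB

gb_per_sec = REQ_PER_SEC * AVG_RESP_KB / 1_000_000       # KB/sec -> GB/sec
transfer_cost_per_sec = gb_per_sec * PRICE_PER_GB_OUT    # egress, $/sec
request_cost_per_sec = REQ_PER_SEC / 1_000_000 * PRICE_PER_M_REQ
per_day = (transfer_cost_per_sec + request_cost_per_sec) * 86_400
```

Egress alone is $8.50/sec (~$734K/day); with request charges the sustained-peak total is ~$838K/day.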
Real Strategy: (1) Keep CloudFront but upgrade to Shield Advanced ($3K/month flat fee) to defeat DDoS without throttling. (2) Split origins: static assets → S3 CloudFront origin, dynamic content → ALB. (3) Use CloudFront's price class (US/Europe only, not all edges) during non-peak. (4) Implement request coalescing at origin shield—if 1000 concurrent cache misses for same object hit origin shield simultaneously, coalesce to 1 origin request. (5) Cache static assets with long TTLs, version via filenames.
Cost Reality: Optimized CloudFront setup costs $200K-300K for peak. Switching to Lambda@Edge costs $500K+. The origin cost is identical either way.
Follow-up: You have 15M+ SKUs, with product images stored in S3. Cold products (rarely viewed) don't stay cached at the edge. On first request, the image fetch takes 800ms. How would you reduce that first-fetch latency without bloating origin costs?
Your media platform serves video streaming to 50M users daily. Bandwidth bill is $2M/month (Netflix-level cost). You're considering bringing in a second CDN (Akamai) for traffic shaping. What's the architecture and trade-offs?
Multi-CDN Strategy: (1) CloudFront handles 60% of traffic (US/EU, better cache hit). Akamai handles 40% (APAC, better latency profile). (2) Use GeoDNS (Route53 with geo-proximity routing) to steer traffic: if user in Japan, send to Akamai edge; if in US, send to CloudFront. (3) Origin setup: both CDNs pull from same S3 bucket, but Akamai gets prioritized S3 VPC endpoint + dedicated bandwidth reservation.
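The GeoDNS steering decision in (2) reduces to a continent-to-CDN lookup; in production this lives in Route53 geoproximity records, not application code. The hostnames and mapping here are illustrative.

```python
# Sketch of the GeoDNS steering table: which CDN hostname a resolver in a
# given continent should receive. Hostnames are made up.
CDN_BY_CONTINENT = {
    "NA": "d111.cloudfront.example.net",  # North America -> CloudFront
    "EU": "d111.cloudfront.example.net",  # Europe -> CloudFront
    "AS": "media.akamaized.example.net",  # Asia-Pacific -> Akamai
    "OC": "media.akamaized.example.net",  # Oceania -> Akamai
}
DEFAULT_CDN = "d111.cloudfront.example.net"

def edge_hostname(continent_code: str) -> str:
    """Resolve a continent code to the CDN that should serve it."""
    return CDN_BY_CONTINENT.get(continent_code, DEFAULT_CDN)
```

The explicit default matters: Route53 geolocation routing requires a catch-all record, or resolvers from unmapped locations get no answer at all.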
Cost-Benefit: Adding Akamai costs $300K/month but reduces bandwidth bills by $500K/month (they negotiate rates based on volume). Net savings: $200K/month. Trade-off: operational complexity—you now manage two CDN configurations, origin behavior differs slightly between providers, cache eviction policies differ.
Clever Optimization: Use Akamai's Adaptive Media Delivery to step users down to lower-res video during congestion (NetSession is Akamai's client-side download technology; the bitrate adaptation comes from the media delivery product). CloudFront has no built-in equivalent. Video bitrates drop 20% in Asia during peak hours → -10% bandwidth bill in that region.
Follow-up: Your Akamai and CloudFront caches diverge—some users get outdated manifest files causing playback errors. How would you maintain consistency without a distributed cache consensus protocol?
An API endpoint returns JSON with TTL: 60 seconds. During peak, this endpoint serves 500K req/sec. Staleness is acceptable, but you notice 50% of traffic misses the edge cache after 60 seconds expire. Why? And how do you fix it without changing the TTL?
Root Cause: Each edge location caches independently, and every copy of the object expires exactly 60 seconds after it was filled—if your origin returns Cache-Control: max-age=60, CloudFront respects it. When the copies were filled at roughly the same moment (after a deploy, an invalidation, or a traffic spike), expirations synchronize into a "cache wave": requests cluster around the 60-second boundary when objects expire simultaneously, and the traffic that was hitting cache becomes a thundering herd hitting origin within milliseconds.
Solution: (1) Use CloudFront's cache-policy query string allowlist so irrelevant query parameters don't fragment the cache into needless key variants. (2) Implement "stale-while-revalidate": Cache-Control: max-age=60, stale-while-revalidate=300. Serve stale copies for up to 5 minutes while a background refresh happens. (3) Shard the cache key by arrival time: bucket requests into 10-second windows within the minute (arrival time modulo 60, rounded down to the window)—a request arriving at :45 gets key "endpoint-40"—so all requests in a window share one cached object and expirations spread into six small waves per minute instead of one big one. (4) Use CloudFront's cache behaviors with custom cache policies—separate "fresh" (60s max-age) from "stale" (300s stale-while-revalidate) buckets.
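The time-bucketed cache key in (3) can be sketched in a few lines; the function name and key format are illustrative (in practice this would run in an edge function).

```python
# Sketch of cache-key sharding by arrival time: requests within the same
# 10-second window of the minute share one cache entry, so expirations are
# spread across six small waves per minute instead of one synchronized wave.
def bucketed_cache_key(path: str, epoch_seconds: int, window: int = 10) -> str:
    """Append the request's 10-second window within the minute to the key."""
    offset = (epoch_seconds % 60) // window * window
    return f"{path}#w{offset:02d}"
```

A request landing at second :45 maps to the `#w40` bucket; the next bucket starts cold at :50, so misses trickle in every 10 seconds rather than all at once.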
Result: Cache hit ratio climbs from 50% to 88%. Origin load cut by 75%.
Follow-up: You implement stale-while-revalidate. Now your origin receives 200K simultaneous revalidation requests (background refreshes) at the 60-second mark. Origin can't handle this spike (CPU spikes 95%). How do you smooth this out?
You have a SPA (Single Page Application) served by a CDN. HTML file itself is cached (max-age=3600, 1 hour). A critical bug in the app is discovered—you need to roll back immediately. But users with cached HTML won't see the fix for 1 hour. What's your strategy?
The Problem: Long-lived HTML caches are production disasters. Any bug or security fix is blocked by cache TTL. You cannot instantly invalidate 50M cached copies globally.
Solution: (1) Never cache HTML with long TTLs. Set max-age=0 with ETag validation: Cache-Control: max-age=0, must-revalidate. CloudFront still holds the object, but on every request, it validates with origin via conditional request (If-None-Match: etag). If etag matches, origin returns 304 Not Modified in 50ms. (2) Use CloudFront cache invalidation API for emergency pushes: POST to /invalidate with path pattern "/*.html". Invalidates all HTML in ~5 minutes. (3) For instant deployment: use version hash in JS bundle filenames (app-v2.a1b2c3d4.js). New HTML references new bundle URL, so even cached HTML loads new JS. (4) Implement "immutable" caching for versioned assets: js/app-v2.a1b2c3d4.js gets max-age=31536000 (1 year) because filename contains hash.
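Items (3) and (4) hinge on content-addressed filenames; a minimal sketch of the versioning step, with an illustrative helper name:

```python
# Derive an immutable, content-addressed bundle name: because the hash is in
# the filename, the file can be cached for a year—any content change produces
# a new URL, so stale copies are never served.
import hashlib

def versioned_name(filename: str, content: bytes) -> str:
    """app.js + bytes -> app.<8-char sha256 prefix>.js"""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"
```

The build emits the hashed name, the HTML references it, and the asset ships with Cache-Control: max-age=31536000, immutable.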
Result: Bug fix deploys instantly. Users get new HTML (via 304 revalidation) within 100ms. New JavaScript loads immediately due to filename versioning.
Follow-up: Your ETag validation adds 50ms per request to origin. At 500K req/sec for HTML, that's 25M origin requests/day for 304 responses (wasted origin capacity). How would you shift this cost to the edge?
Your origin is behind a WAF (Web Application Firewall) that blocks requests with POST bodies > 10MB. But CDN edge nodes cache GET responses just fine. An attacker notices they can craft a large POST request that reaches origin, triggering a WAF block, then the 403 response gets cached at edge. Legitimate users then see 403 for 5 minutes. What's happening and how do you fix it?
The Attack: This is a cache poisoning attack. Attacker sends a POST with a 15MB body → WAF blocks it (403) → the edge caches the 403 under that URL's cache key → all subsequent requests, even legitimate GETs, hit the cached 403 for the TTL duration. (CloudFront shouldn't cache responses to POST at all, so this also implies a misconfigured cache behavior or an error-caching TTL that ignores the request method.)
Defense Layers: (1) Configure CloudFront to never cache non-2xx responses: set cache behaviors to "Only cache responses with 200, 301, 302, 404, 405, 410, 414, 501 status codes." This prevents 403/500 from being cached. (2) Use Origin Shield with request filtering—if a request looks malformed (huge POST), block it at Origin Shield instead of forwarding to origin. (3) Set error page TTL separate from success TTL: unsuccessful responses get max-age=0 (never cached), while 200 responses get max-age=300. (4) Implement rate limiting at edge: if a user sends >5 requests/sec with abnormal headers, CloudFront drops the request before reaching origin.
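The status-based TTL split in (1) and (3) boils down to a simple policy function; the values and helper name are illustrative, mirroring CloudFront's separate error-caching TTL.

```python
# Sketch of a status-aware edge TTL policy: only known-benign statuses are
# cacheable at all, errors like 403/5xx are never held at the edge.
CACHEABLE_STATUSES = {200, 301, 302, 404, 410}

def edge_ttl(status: int) -> int:
    """Seconds the edge may cache a response with this status code."""
    if status == 200:
        return 300  # success: 5 minutes
    if status in CACHEABLE_STATUSES:
        return 60   # redirects / not-found: short TTL only
    return 0        # 403, 500, etc.: never cached, can't be poisoned
```

With this policy a WAF-generated 403 expires instantly, so an attacker gains nothing by forcing one into the cache.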
Result: Cache poisoning eliminated. WAF blocks don't propagate. Edge shields origin from malicious probing.
Follow-up: What if the attacker sends a legitimate-looking GET request that takes 30 seconds to process (algorithmic complexity attack), and that 200 response gets cached for 5 minutes? Now the 30-second delay propagates to all users hitting that edge.
You're designing a CDN strategy for a gaming company. Players download game assets (textures, models, 2-50GB per client). Download must complete in <15 minutes to avoid player churn. Your origin bandwidth is limited to 10Gbps globally. Players are in 150+ countries. How would you handle this scale?
Challenge: A 10Gbps origin is ~1.25GB/s—only about 150 simultaneous downloads at 8MB/s each. Peak hours might have 50K concurrent players. Traditional CDN won't work—cost alone ($100K+/month for 50Gbps+ capacity) is prohibitive.
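A quick check of the origin arithmetic (watch the bits-vs-bytes conversion: 8MB/s per client is 64Mbit/s, so 10Gbps supports only ~156 concurrent downloads, nowhere near thousands):

```python
# Origin capacity in concurrent downloads: careful with bits vs bytes.
ORIGIN_GBPS = 10
PER_DOWNLOAD_MBIT = 8 * 8  # 8 MB/s per client = 64 Mbit/s

concurrent_downloads = ORIGIN_GBPS * 1000 // PER_DOWNLOAD_MBIT  # -> 156
```

Against 50K concurrent players, the origin covers well under 1% of demand by itself, which is what forces the P2P hybrid below.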
Solution: P2P + CDN Hybrid: (1) Use a P2P overlay network (Akamai NetSession or similar) for peer-to-peer sharing. A player in Brazil downloads the first chunks from the CDN, then becomes a seed; other players in Brazil download the remaining chunks from that peer (free for origin). (2) Implement BitTorrent-like chunking: break each asset into 100MB chunks and hash each chunk. Players download chunks in parallel from multiple peers. (3) CDN strategy: keep only "hot" popular assets in edge caches, tier the rest. New game release → all chunks cached globally for the first week (3x cost), then aggressive eviction. (4) Origin serves as the "minimum viable seed"—if the P2P graph is sparse (new regions), origin provides the fallback.
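The chunk-and-hash step in (2) can be sketched as a manifest builder; names are illustrative, and a real client would hash streaming reads rather than in-memory bytes.

```python
# Sketch of BitTorrent-like chunking: split an asset into fixed-size chunks
# and record the sha256 of each, so a client can verify every piece it
# receives from an untrusted peer before accepting it.
import hashlib

CHUNK_SIZE = 100 * 1024 * 1024  # 100MB, as in the plan above

def chunk_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return the sha256 hex digest of every chunk, in order."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
```

The manifest itself is tiny (a 50GB asset is ~500 hashes) and is served from the CDN with origin-signed integrity, which is also the hook for the corrupted-chunk follow-up below.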
Scale Math: 50K players downloading 10GB each in 100MB chunks (~100 chunks per player): peer-to-peer cuts origin load by ~80%. Origin serves only the "first request" per region plus fallback. Bandwidth cost drops to $15K-20K/month instead of $100K.
Follow-up: Your P2P network has a flawed trust model—players share corrupted chunks (accidentally from disk errors, or deliberately). How would you detect and quarantine corrupted assets without hashing every chunk on every peer?
You operate a news platform with 100M monthly readers. Viral stories spike from 0 to 1M requests/minute in seconds (like "Breaking: Major Incident"). Your origin can handle 10K req/min before collapsing. CDN cache starts empty, so first 100K requests hit origin before cache fills. How do you survive the spike without pre-warming?
The Stampede Problem: Cache stampede happens when a suddenly-popular object has zero cache fills. First request hits origin, origin takes 200ms to render, returns object with max-age=60. Meanwhile 50K requests queue for the same object. If origin can only handle 10K/min, you get request timeouts and 503 errors.
Solution: Graduated Response: (1) Implement "request coalescing" at edge: if 10 simultaneous cache misses hit CloudFront for the same story URL, CloudFront makes only 1 request to origin, then serves the response to all 10 waiting requests. (2) Use "probabilistic early expiration" (TTL prefetch): set max-age=60 but trigger background revalidation at :45 seconds. This spreads revalidation load across 15 seconds instead of hammering origin at :60. (3) Add a load-shedding layer: if origin latency exceeds 500ms, CloudFront serves the stale cached copy (the last version cached for that URL) with a banner noting the story may be briefly out of date. (4) For critical stories, use Lambda@Edge to pre-render a lightweight version (headline + link to full story) while the full article renders in the background.
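The coalescing in (1) is the "singleflight" pattern; a minimal in-process sketch (a real edge does this per POP, and class/method names here are illustrative):

```python
# Singleflight sketch: concurrent misses for one key trigger exactly one
# origin fetch; followers block until the leader's result is ready.
import threading

class Singleflight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done event, shared result box)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        event, box = entry
        if leader:
            try:
                box["value"] = fetch()  # only the leader hits origin
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()             # wake followers even on failure
        else:
            event.wait()                # followers reuse the leader's result
        return box.get("value")         # None if the leader's fetch raised
```

Note the failure mode in the last line: if the leader's fetch fails, every follower sees the failure too, which is exactly the cascading-failure follow-up below.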
Result: Story survives 1M req/min with <1% error rate. Origin stays under capacity.
Follow-up: You implement request coalescing. Now you have a different problem: if the origin request fails (timeout, 500 error), all 10 coalesced requests fail simultaneously. How do you prevent cascading failures?