You have 50 microservices across US, EU, AP regions. API Gateway must authenticate requests (OAuth), rate limit (100K RPS peak), and route to the right service. Your current setup: AWS API Gateway + Lambda authorizers. During a traffic spike last week, authorizer Lambdas cold-started (~200ms of overhead each, multiplied across 10K concurrent requests, produced seconds of queuing). P99 latency spiked to 5s. What's a better architecture?
Lambda authorizers are not suited to high-throughput auth: cold starts at scale, plus a per-request invocation tax. The fix:

(1) Move auth into a stateless gateway layer (nginx/Envoy/Kong) and validate JWTs locally, with no network call: the gateway parses the JWT and validates the signature against the issuer's public key (cached in memory, 1-hour TTL). Latency: <1ms.

(2) Replace API Gateway + Lambda with a service-mesh ingress (Istio ingress gateway) or an ALB fronting the gateway tier. One layer then handles auth (JWT validation), rate limiting (token bucket per IP/user), request routing, and request logging — with no Lambda invocation per request.

(3) Rate limiting: use a distributed rate limiter backed by a Redis cluster. Each gateway instance atomically increments a per-user counter; atomicity comes from a Redis Lua script of the form `INCR user:123:minute`, allowing the request only while the count stays under 100000.

Cost: API Gateway + Lambda ≈ $3.5K/month (at 10K req/min). An nginx ingress tier ≈ $1K/month self-hosted or $2K/month managed, plus $300/month for Redis — roughly $1.3K-$2.3K/month total, 30-60% cheaper.

Architecture: (a) clients hit a regional ALB; (b) ALB → nginx instances (stateless, scaled horizontally); (c) nginx validates the JWT locally, checks the rate limit (one Redis call), then routes to the service or returns 429 if rate-limited; (d) service-specific auth (OAuth scopes, API keys) stays in the services where needed (defense in depth).

Expected result: P99 latency <100ms (auth check <1ms + rate-limit check <5ms + service call), and cold-start queuing is eliminated.
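The counter check in step (3) can be sketched as a fixed-window limiter; here is a minimal in-process Python analog (in a real deployment the counter lives in Redis and the increment-and-compare runs as a Lua script so it is atomic across gateway instances — the class name and the 100K/minute limit here are just illustrations of that scheme):

```python
import time

class FixedWindowRateLimiter:
    """Fixed-window counter: allow up to `limit` requests per `window_s` seconds.
    In production the counter is a Redis key (INCR + EXPIRE via a Lua script)
    shared by all gateway instances; a local dict stands in for Redis here."""
    def __init__(self, limit=100_000, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.counters = {}  # (key, window_number) -> count

    def allow(self, user_id, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_s)             # e.g. which minute we're in
        k = (user_id, window)
        self.counters[k] = self.counters.get(k, 0) + 1  # "INCR user:<id>:<minute>"
        return self.counters[k] <= self.limit           # allow while under the limit

limiter = FixedWindowRateLimiter(limit=3, window_s=60)
results = [limiter.allow("user:123", now=10) for _ in range(4)]
# first three requests in the window are allowed, the fourth is denied
```

A new window (the next minute) starts a fresh count, which is why fixed windows are cheap but allow short bursts at window boundaries — the token-bucket variant smooths that out.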
Follow-up: Your JWT key rotation policy requires updating every 7 days. Nginx instances cache keys for 1 hour. During rotation, how do you prevent rejecting valid JWTs with old key IDs?
You're implementing canary deployments for your microservices. Your Payments service v2 has a bug that only manifests under load (race condition). You set up traffic splitting: 5% to v2, 95% to v1. But you only caught the bug after 2 hours of canary (affected 0.05% of transactions, ~1500 payments). Your SLA is <0.01% error rate. How do you set up automated canary failure detection?
Your problem: it took two hours to detect the canary failure. You need automated, real-time canary validation:

(1) Error-rate detection: trigger if the v2 error rate exceeds 2× the v1 baseline (v1 runs at 0.001%, so the v2 threshold is 0.002%). Pull metrics from Prometheus every 10 seconds; if the threshold is crossed, roll back v2 automatically.

(2) Latency detection: if v2 P99 latency exceeds v1 P99 by more than 50ms, auto-rollback.

(3) Business-metric detection: for Payments, track `payment_success_rate`. If it drops more than 1% during the canary, roll back.

(4) Manual override: the on-call engineer can roll back with one button.

Automation: Flagger (which orchestrates canaries on Istio, Linkerd, or App Mesh) drives the process. Example configuration: 5% traffic to v2, metrics evaluated every 10s, rollback when error_rate > 2× baseline or latency_p99 > 150ms. A hard failure is detected in under a minute; your race condition, which only trips the error-rate threshold as load accumulates, would have been caught at roughly 5 minutes (~300 payments affected) instead of 2 hours (~1500 affected) — a 5x reduction in blast radius.

Implementation: (a) Prometheus scrapes the service mesh (Istio's Envoy proxies export metrics). (b) The Flagger controller watches Prometheus and makes canary decisions. (c) Istio updates traffic weights: v1 95% / v2 5% → on detected error → v1 100% immediately.

Cost: Prometheus + Flagger ≈ $500/month, plus ~2 weeks of engineering time to set up. Trade-off: this requires investing in observability — if your service mesh doesn't export metrics, it fails. Prerequisite: structured logging (every payment logs success/failure with latency). With unstructured logs, even manual detection is hard.
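The analysis rule described above can be sketched as a pure decision function evaluated once per metrics interval (illustrative Python — Flagger expresses the same thresholds declaratively in its Canary resource, not in code, and the names here are ours):

```python
def should_rollback(canary, baseline,
                    error_rate_factor=2.0, p99_ceiling_ms=150.0):
    """Evaluate one canary analysis interval (e.g. every 10s).
    `canary` and `baseline` are dicts of observed metrics for v2 and v1:
    `error_rate` as a fraction, `p99_ms` in milliseconds."""
    if canary["error_rate"] > error_rate_factor * baseline["error_rate"]:
        return True   # error rate more than 2x the baseline -> roll back
    if canary["p99_ms"] > p99_ceiling_ms:
        return True   # absolute latency ceiling breached -> roll back
    return False

# Healthy interval: v2 tracks v1, keep the canary running.
assert not should_rollback({"error_rate": 0.00001, "p99_ms": 90},
                           {"error_rate": 0.00001, "p99_ms": 90})
# Failing interval: v2 errors at 3x baseline -> immediate rollback.
assert should_rollback({"error_rate": 0.00003, "p99_ms": 90},
                       {"error_rate": 0.00001, "p99_ms": 90})
```

The business-metric check (payment_success_rate drop > 1%) slots in as one more clause of the same shape.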
Follow-up: Your v2 canary fails, but v2 already processed 100 payments in-flight. You roll back. How do you ensure those 100 in-flight v2 requests finish cleanly without double-charging?
Your API Gateway is the single entry point for 50 services. You want to implement circuit breaker pattern: if a service (Orders) is degraded (P99 latency >2s), temporarily route traffic to a fallback (cached response or degraded UI). Currently, no circuit breaker, so a slow Orders service backs up the entire API Gateway. How do you implement this?
A circuit breaker has 3 states: Closed (normal), Open (failing), Half-Open (testing recovery). Implementation:

(1) A service mesh (Istio or Linkerd) handles circuit breaking natively. Configure consecutive_errors=5, timeout=2s: if the Orders service returns 5 errors in a row or times out after 2s, open the circuit. For 30 seconds, route all Orders requests to the fallback. Then go Half-Open: send one test request — if it succeeds, close the circuit; if it fails, reopen.

(2) Without a service mesh, use a client library (Resilience4j, or the older Hystrix) in the API Gateway, wrapping each call to the Orders service: construct the breaker with `timeout=2s, failure_threshold=5`, execute `orders_service.get_order(order_id)` through it, return the response on success and `fallback(order_id)` on failure or open circuit.

Fallback strategy depends on the use case: (a) reads (get order): return a cached response (Redis, 10-minute TTL); (b) writes (create order): fail fast with 503 Service Unavailable and tell the client to retry; (c) non-critical reads (recommendations): return an empty list and degrade gracefully.

(3) Metrics: monitor circuit-breaker state per service. Alert if any circuit stays Open for >5 minutes — that indicates a real outage, not a transient blip.

Cost: a service mesh adds roughly $5K/month of operational overhead (Istio is complex); client libraries are free but require code changes in the gateway. Recommendation: service mesh if you operate 50+ services (you'll want it for observability anyway); with fewer than ~15 services, use a client library.

Expected outcome: Orders degradation no longer propagates to the entire platform. Users see cached order data (up to 10 minutes stale, per the cache TTL) instead of a 30-second timeout. Availability improves from 99.9% to 99.95%.
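The three-state machine in (1)-(2) fits in a few lines of Python (a hand-rolled sketch of the pattern, not the Resilience4j API; class and method names are ours):

```python
import time

class CircuitBreaker:
    """Closed -> Open after `failure_threshold` consecutive failures;
    Open -> Half-Open once `reset_timeout_s` has elapsed; Half-Open
    closes on one success and reopens on one failure."""
    def __init__(self, failure_threshold=5, reset_timeout_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, fallback, now=None):
        now = time.time() if now is None else now
        if self.state == "open":
            if now - self.opened_at >= self.reset_timeout_s:
                self.state = "half-open"        # allow a single probe request
            else:
                return fallback()               # fail fast: no backend call at all
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state, self.opened_at = "open", now
            return fallback()
        self.failures = 0
        self.state = "closed"                   # any success closes the circuit
        return result

cb = CircuitBreaker(failure_threshold=2, reset_timeout_s=30)
def boom():
    raise RuntimeError("orders down")
cb.call(boom, lambda: "cached", now=0)          # failure 1: circuit stays closed
cb.call(boom, lambda: "cached", now=1)          # failure 2: circuit opens
assert cb.state == "open"
assert cb.call(lambda: "live", lambda: "cached", now=2) == "cached"   # fast fallback
assert cb.call(lambda: "live", lambda: "cached", now=40) == "live"    # half-open probe succeeds
assert cb.state == "closed"
```

The mesh version adds per-request timeouts (a 2s timeout counting as a failure), which this sketch leaves to the caller.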
Follow-up: Your fallback cache (Redis) fails during Circuit Breaker fallback. Now you've lost both real-time Orders data and cache. What's your recovery strategy?
You need to route API requests based on request content (not just hostname). For example: `GET /api/recommendations` with header `X-AB-Test: v2` routes to Recommendations v2 service, but `X-AB-Test: v1` routes to v1 service. Your current API Gateway (AWS API Gateway + Lambda) can't do content-based routing efficiently (all traffic goes through same Lambda). What's a better approach?
Content-based routing is better handled by a dedicated reverse proxy (nginx, Envoy, Kong) than by Lambda: Lambda is stateless per-request compute, expensive for simple routing logic. Better architecture:

(1) Deploy nginx (or Envoy) as the ingress gateway with routing rules of the form: if header `X-AB-Test == v2`, route to `recommendations-v2:8080`, else to `recommendations-v1:8080`. With static header values this is a hash-table lookup in nginx — sub-millisecond.

(2) Use the nginx map module for efficient header routing: `map $http_x_ab_test $backend { v1 recommendations-v1:8080; v2 recommendations-v2:8080; default recommendations-v1:8080; }`, then `proxy_pass http://$backend;`.

(3) For more complex routing (multiple headers, regex patterns), use a service mesh (Istio VirtualService): `match: [headers: {X-AB-Test: {exact: v2}}] → recommendations-v2`.

Cost: self-hosted nginx is free (OSS; you pay only for compute). Managed (Kong Cloud) ≈ $500/month. Istio ≈ $2K/month. AWS API Gateway + Lambda ≈ $3.5K/month. Recommendation: nginx for simple header-based routing (A/B tests); Istio if you need advanced routing plus observability across all 50 services.

Expected latency: nginx routing <1ms, versus ~50-100ms for Lambda routing (cold start + handler execution). Scale: nginx handles ~10K req/sec per instance and scales linearly — for 100K req/sec, run 10 instances behind a load balancer.

Caveat: nginx route changes require a config update and reload. For frequently changing A/B tests, add dynamic routing: have the gateway poll a config service for updates (say every 10 seconds) and apply them with a zero-downtime config reload.
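The header-to-backend lookup in (2), including the dynamic-refresh caveat, can be sketched in Python (an illustration of the mechanism, not nginx itself; the fetch function passed to `refresh` is a stand-in for your config service):

```python
class HeaderRouter:
    """Mirror of the nginx `map` block: exact-match header value -> backend,
    with a default. Refreshing swaps the whole dict atomically, so in-flight
    lookups see either the old table or the new one, never a partial update."""
    def __init__(self, table, default):
        self.table = dict(table)
        self.default = default

    def route(self, headers):
        # O(1) hash lookup, like nginx `map` with static string keys
        return self.table.get(headers.get("X-AB-Test"), self.default)

    def refresh(self, fetch_table):
        # called periodically (e.g. every 10s) by a background poller
        self.table = dict(fetch_table())

router = HeaderRouter({"v1": "recommendations-v1:8080",
                       "v2": "recommendations-v2:8080"},
                      default="recommendations-v1:8080")
assert router.route({"X-AB-Test": "v2"}) == "recommendations-v2:8080"
assert router.route({}) == "recommendations-v1:8080"    # no header -> default
router.refresh(lambda: {"v2": "recommendations-v3:8080"})
assert router.route({"X-AB-Test": "v2"}) == "recommendations-v3:8080"
```

The atomic-swap detail is what makes the reload zero-downtime: requests never observe a half-applied routing table.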
Follow-up: Your A/B test configuration lives in a database. Nginx needs to query it for every request to determine routing. This adds latency. How do you optimize?
You have a GraphQL API gateway that normalizes requests from 30 backend services into a unified schema. During peak traffic (10K concurrent users, 50K RPS of GraphQL queries), your gateway processes queries like `query { user { orders { items { product { name } } } } }`. This recursively fetches data from 5 services, causing cascading calls and high latency. How do you optimize this?
This is the N+1 query problem at scale. The nested GraphQL query triggers 1 user lookup + 5 order lookups + 20 item lookups + 20 product lookups = 46 backend calls for a single query. Solutions:

(1) DataLoader pattern: batch and deduplicate requests. A user DataLoader collects every user ID that needs fetching and sends one batch RPC to the user service instead of individual calls. DataLoader batches within a request context (a single GraphQL query). Result: 46 calls → roughly 3-5 batched calls, one per service.

(2) Depth limiting: reject queries deeper than 5 levels (e.g. `query { user { orders { items { product { variants { options } } } } } }`). Deep queries are where N+1 problems hide.

(3) Cache aggressively at the gateway: user profiles (1-hour TTL), product catalog (6-hour TTL), orders (5-minute TTL). This cuts backend calls by ~70%.

(4) Query timeout: cap gateway queries at 500ms. If a query runs longer, return a partial response (null for unfetched fields). Trade-off: clients get incomplete data, but the gateway stays healthy.

(5) Monitor slow queries (P99 >500ms), record them in a slow-query log, and optimize them retroactively with batching.

Cost: DataLoader is a free library (~2 days of integration work). Cache layer (Redis) ≈ $300/month. Monitoring ≈ $200/month. Expected result: P99 latency at 50K RPS drops from 2s to 200ms (10x improvement).

Implementation: DataLoader is the standard batching library in the JavaScript GraphQL ecosystem, and Apollo Server integrates with it directly: `const userLoader = new DataLoader(batchLoadUsers); const userFieldResolver = () => userLoader.load(userId);`. DataLoader deduplicates and batches automatically.
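To make the batching mechanics concrete, here is a minimal DataLoader-style batcher in Python (a from-scratch sketch of the pattern, not the JS `dataloader` package: it coalesces all `load()` calls issued in the same event-loop tick into one batch call, and dedupes repeated keys):

```python
import asyncio

class MiniDataLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn   # async fn: list of keys -> list of values
        self._pending = {}         # key -> Future (dedupes repeated keys)
        self._scheduled = False

    def load(self, key):
        if key in self._pending:
            return self._pending[key]   # repeat key: reuse the pending future
        loop = asyncio.get_running_loop()
        self._pending[key] = loop.create_future()
        if not self._scheduled:
            self._scheduled = True
            # dispatch once, after the current tick has queued all loads
            loop.call_soon(lambda: loop.create_task(self._dispatch()))
        return self._pending[key]

    async def _dispatch(self):
        pending, self._pending, self._scheduled = self._pending, {}, False
        keys = list(pending)
        for key, value in zip(keys, await self.batch_fn(keys)):
            pending[key].set_result(value)   # one backend call served them all

batch_calls = []
async def batch_load_users(ids):
    batch_calls.append(ids)                  # record each backend round trip
    return [{"id": i} for i in ids]

async def main():
    loader = MiniDataLoader(batch_load_users)
    # three field resolutions, two distinct users -> one batched backend call
    return await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))

users = asyncio.run(main())
```

Resolvers call `load()` independently, yet the backend sees a single `[1, 2]` batch — the same collapsing that takes the 46-call query down to a handful of per-service batches.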
Follow-up: A client requests 50K items in a single GraphQL query (depth=2). DataLoader batches 50K calls into 1 backend request. Backend crashes due to overload. How do you protect yourself?
Your API Gateway serves both public API (external customers) and internal API (internal tools). Public API has SLA of 99.9%, internal API is best-effort. But they share the same gateway infrastructure. A spike in internal tool usage once consumed all gateway resources, causing public API to degrade. How do you isolate them?
Resource isolation between public and internal APIs requires either separate infrastructure or dedicated resource pools.

Option 1 (simplest): separate gateways. Public API Gateway (AWS API Gateway in a dedicated VPC) plus an internal gateway (nginx in a private subnet), each with its own autoscaling group, database connections, and caches, so the public API scales independently. You run two gateways, but the isolation is worth it.

Option 2 (medium complexity): a single gateway cluster with resource quotas via Kubernetes QoS classes. Run the internal-gateway pods as BestEffort (no resource requests) and the public-gateway pods as Guaranteed (requests equal to limits). Under node resource pressure, Kubernetes evicts BestEffort pods first, keeping the public API running. Downside: requires Kubernetes and its operational complexity.

Option 3 (advanced): a shared gateway with weighted rate limits — internal API capped at 5K req/sec, public at 50K req/sec, so the public API always has bandwidth. Implementation: a distributed rate limiter (Redis) with per-class token buckets: the public bucket refills at 50K tokens/sec, the internal bucket at 5K tokens/sec, and each request draws from its own class's bucket. Cost: Redis ≈ $300/month. Trade-off: the internal API sees higher latency during public spikes (expected, acceptable).

Recommendation: Option 1 (separate gateways) if you can afford it — it's the cleanest separation. Option 2 if you're already on Kubernetes. Option 3 for the lowest cost, if internal degradation is acceptable. For your case (public SLA is critical), I'd recommend Option 1: public gateway ≈ $2K/month (managed), internal gateway ≈ $500/month, $2.5K/month total. The split cost is cheap insurance against internal tools taking down the public API.
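The weighted scheme in Option 3 comes down to two independent token buckets (an in-process Python illustration; in production the buckets live in Redis so every gateway instance shares them, and the 50K/5K rates follow the example above):

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, 0.0

    def allow(self, now):
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Separate buckets per API class: internal traffic draws only from its own
# (smaller) bucket, so it can never starve the public API.
buckets = {
    "public":   TokenBucket(rate=50_000, capacity=50_000),
    "internal": TokenBucket(rate=5_000,  capacity=5_000),
}

def admit(api_class, now):
    return buckets[api_class].allow(now)

# Burn the internal bucket dry: public requests still get through.
internal = [admit("internal", now=0.0) for _ in range(5_001)]
assert internal[-1] is False            # internal bucket exhausted
assert admit("public", now=0.0) is True # public unaffected
```

This is exactly the isolation property Option 3 buys: the classes share the gateway but never share a bucket.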
Follow-up: Your internal and public gateways need to share user authentication logic. If you replicate the auth code, you risk divergence. How do you share auth without tight coupling?
You're running a multinational SaaS platform. You have API Gateway in US (primary), EU (secondary). During US outage, you want to failover to EU gateway automatically. But your clients have hardcoded US endpoint in DNS. How do you route them to EU without requiring client changes?
This requires global load balancing plus health checks. Solution:

(1) Route 53 (AWS) or Cloudflare: create a failover policy. Primary: the US API endpoint, health-checked every 5 seconds; if unhealthy, fail over to the EU endpoint.

(2) Health check: a simple HTTP GET to the US gateway's `/health` endpoint. A 5xx response or a timeout counts as a failure; 3 consecutive failures mark US unhealthy.

(3) DNS TTL: set it low (30 seconds). When US fails and DNS switches to EU, clients re-resolve within ~30s and get the EU IP.

(4) Data consistency: the US and EU gateways must have replicated data, especially for stateful services like sessions. Use active-active replication (every US write syncs to EU in <1 second) or a distributed session store (a Redis cluster spanning US-EU with replication).

Cost: Route 53 ≈ $0.50 per million queries (~$20/month here); health checks cost on the order of a dollar a month for the two endpoints; cross-region replication cost depends on your database (DynamoDB global tables add roughly $2K/month).

Expected failover time: roughly 30-60 seconds (3 health-check failures at 5-second intervals, plus up to 30s of DNS TTL). Caveat: this is not true zero-downtime — in-flight requests from US clients fail if US is completely down, but new requests start routing to EU within the failover window. For critical paths (payments), also implement client-side retry logic: if the US endpoint fails, the client retries against the EU endpoint (manually or automatically).

Alternative (better UX, more complex): geolocation routing. Route 53 geolocation policies send US clients to the US endpoint and EU clients to the EU endpoint (based on resolver IP geolocation), with failover policies layered on top so that on a US outage all traffic reroutes to the nearest healthy endpoint. More expensive, but users don't sit through the DNS-TTL delay for their normal traffic.
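The failure-counting logic in (2) is simple enough to sketch directly (illustrative Python standing in for what the DNS provider's health checker does internally; the threshold of 3 matches the policy above):

```python
class HealthTracker:
    """Mark an endpoint unhealthy after `threshold` consecutive failed probes;
    any successful probe resets the count (typical DNS health-check semantics,
    which avoids flapping on a single dropped probe)."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, probe_ok):
        self.consecutive_failures = 0 if probe_ok else self.consecutive_failures + 1
        return self.healthy()

    def healthy(self):
        return self.consecutive_failures < self.threshold

us = HealthTracker(threshold=3)
assert us.record(False)            # 1 failure: still healthy
assert us.record(False)            # 2 failures: still healthy
assert not us.record(False)        # 3rd consecutive failure: fail over to EU
assert us.record(True)             # one good probe resets the counter
```

With 5-second probe intervals, three consecutive failures account for ~15s of the 30-60s failover budget; the DNS TTL accounts for the rest.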
Follow-up: Your EU gateway receives 10K RPS from US failover traffic, but EU gateway is only provisioned for 2K RPS. It gets overwhelmed and returns 502 errors. How do you handle traffic surge from failover?
You're designing a cost-optimized API Gateway for a price-sensitive startup. You have 20 microservices, 5K RPS peak traffic, and limited budget ($500/month). Your current AWS API Gateway costs $3.5K/month. You need auth, rate limiting, request logging, and request routing. How do you redesign to reduce cost while maintaining feature parity?
AWS API Gateway is expensive at scale ($3.50 per million calls for REST APIs at the first pricing tier). For cost optimization, move to self-hosted nginx/Envoy on EC2 or Kubernetes. Architecture:

(1) Deploy nginx on 2 EC2 t3.medium instances (~$30/month each, $60/month total), in an auto-scaling group keyed on CPU/memory (scale 2-4 instances during peak).

(2) Configure nginx for: (a) authentication — local JWT validation (<1ms, no network call), with Auth0 or similar issuing tokens; (b) rate limiting — the nginx limit_req module (token bucket, local state), with a rate per upstream service (e.g. 500 req/sec for user-service, 1000 req/sec for product-service); (c) request logging — nginx logs all requests to file, shipped to CloudWatch or a self-hosted ELK stack (a few dollars a month at this volume); (d) routing — one location block per service: `location /api/users { proxy_pass http://users-service:8080; }`; (e) canary deployments — nginx weighted routing (e.g. a split_clients block) sending 10% of traffic to the new version, 90% to the old.

(3) Cost breakdown: 2 EC2 instances ($60/month) + DNS ($0.50/month) + load balancer ($16/month) + data transfer ($20/month) ≈ $96.50/month — comfortably under the $500/month budget.

(4) Trade-off: nginx requires ops knowledge (configuration, debugging); AWS API Gateway is managed (updates and scaling automatic). For a startup with ops capacity, self-hosted is acceptable.

(5) Hybrid fallback: keep the AWS API Gateway free tier (1M API calls free per month) for low-volume APIs and run nginx on EC2 for the rest.

Expected result: cost drops to $100-200/month (vs $3.5K) with feature parity maintained. Scaling to 100K RPS just means more nginx instances — cost grows linearly and stays far below API Gateway. Downside: operational overhead (monitoring nginx health, debugging config issues), but saving ~$3.3K/month is worth it for a startup.
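The canary split in (2e) can be sketched as a deterministic hash split (a Python illustration of what nginx's split_clients does: the same user always lands in the same bucket, so nobody flips between versions mid-session; the backend names are placeholders):

```python
import hashlib

def canary_backend(user_id, v2_percent=10):
    """Stable 10/90 split: hash the user id into a bucket in [0, 100)."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "service-v2:8080" if bucket < v2_percent else "service-v1:8080"

# Deterministic: the same user always routes to the same version.
assert canary_backend("user-42") == canary_backend("user-42")

# Roughly 10% of a large user population lands on v2.
share = sum(canary_backend(f"user-{i}") == "service-v2:8080"
            for i in range(10_000)) / 10_000
assert 0.05 < share < 0.15
```

Raising `v2_percent` widens the canary without reshuffling users already on v2 — the low buckets stay on v2 as the threshold moves up.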
Follow-up: Your nginx instance crashes during peak traffic. There's only 1 instance alive (1 of 2), and it's overwhelmed (CPU >90%). New requests get 502 errors. Your auto-scaler is too slow (takes 2 minutes to spin new instance). How do you recover faster?