Your DynamoDB table is throttling with `ProvisionedThroughputExceededException` despite having 10,000 Write Capacity Units (WCU) provisioned. CloudWatch metrics show WCU consumption at 200/10000 on average, but spikes every 10 seconds hit throttle limits. You have a single partition key: `userId`. Explain the root cause and fix.
The issue is a hot partition key. DynamoDB distributes traffic across partitions based on the partition key. If all requests target a small set of userIds (e.g., 5 users out of millions), those requests map to the same partitions, and each partition has its own throughput limit (typically 1,000 WCU or 3,000 RCU per partition). Even though you provisioned 10,000 WCU table-wide, individual partitions are capped. If one partition serves 5 hot users consuming 5,000 writes/second against its 1,000 WCU limit, those requests are throttled, while partitions serving other users sit underutilized.

To diagnose: (1) Enable DynamoDB Contributor Insights — `aws dynamodb update-contributor-insights --table-name my-table --contributor-insights-action ENABLE`. This surfaces the most-accessed partition keys. (2) Note that per-partition CloudWatch metrics are not built in; Contributor Insights is the practical tool for finding hot keys (CloudTrail data events can log item-level requests, but analyzing them by partition key is far more work).

Fix strategies: (a) Add a sort key or secondary attribute to distribute writes — instead of just userId, use `userId#timestamp` or `userId#requestId` to create multiple items per user, spreading writes across many partition keys. However, this complicates queries. (b) Use DynamoDB Streams + Lambda to fan writes out to multiple tables with different partition keys, or use DAX (DynamoDB Accelerator) to absorb hot reads (DAX does not relieve write throttling). (c) Implement write sharding — append a random shard suffix to the partition key: `userId#shard-0`, `userId#shard-1`, ..., `userId#shard-99`. This spreads a hot key's traffic across 100 partition keys. On reads, query all shards (100 queries) and aggregate the results. (d) Switching the table to on-demand (pay-per-request) mode removes capacity planning, but per-partition limits still apply, so on-demand alone does not fix a hot key.

Most practical: option (c), write sharding. Implement it in your application: append a random shard (0-99) to the partition key when writing, and query all 100 shards on reads.
Python example: `pk = f"userId#{user_id}#shard-{random.randint(0, 99)}"`. This spreads a hot user's 5,000 writes/second across 100 partition keys (~50 WCU each), well below per-partition limits.
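Expanding the one-liner above into both halves of the pattern — the random shard suffix on writes and the 100-query fan-out on reads — a minimal sketch (an in-memory dict stands in for the DynamoDB table; in production `write_event` and `read_all_events` would call `put_item` and `query`):

```python
import random

NUM_SHARDS = 100  # shard count from the example above; tune to your write rate

def sharded_pk(user_id: str) -> str:
    """Build a write-side partition key with a random shard suffix."""
    return f"userId#{user_id}#shard-{random.randint(0, NUM_SHARDS - 1)}"

def all_shard_pks(user_id: str) -> list:
    """Enumerate every shard key for the read-side fan-out (100 queries)."""
    return [f"userId#{user_id}#shard-{s}" for s in range(NUM_SHARDS)]

# Simulated table: maps partition key -> list of items.
table = {}

def write_event(user_id: str, event: dict) -> None:
    table.setdefault(sharded_pk(user_id), []).append(event)

def read_all_events(user_id: str) -> list:
    # Fan out across all shards and aggregate the results.
    items = []
    for pk in all_shard_pks(user_id):
        items.extend(table.get(pk, []))
    return items

for i in range(500):
    write_event("u1", {"seq": i})
print(len(read_all_events("u1")))  # all 500 events recovered across the shards
```

The trade-off is visible here: writes stay O(1), but every read becomes NUM_SHARDS queries that must be aggregated (and re-sorted, if order matters).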
Follow-up: You implement write sharding (userId#shard-0 to shard-99). Throttling stops. But queries are now 100x slower (you must query all 100 shards). Users complain. How do you fix query performance?
You have a DynamoDB table with `companyId` as partition key and `timestamp` as sort key. You store telemetry events. You want to query all events for a company in the past hour. Scans are slow (returning 100K+ items, but you only need 1K). What's the optimal query strategy?
The issue is using a generic Scan when a Query with a sort key condition fits the access pattern. The difference: (1) Scan — reads every item in the table, applies filters, and returns matches. If the table has 1M items and you want 1K, Scan reads (and bills) all 1M. (2) Query — uses the partition key to locate the partition, then the sort key condition to read only the matching range. Much faster. For your use case: `aws dynamodb query --table-name telemetry --key-condition-expression "companyId = :cid AND #ts BETWEEN :start AND :end" --expression-attribute-names '{"#ts": "timestamp"}' --expression-attribute-values '{":cid": {"S": "company-123"}, ":start": {"N": "1700000000"}, ":end": {"N": "1700003600"}}' --limit 1000` (`timestamp` is a DynamoDB reserved word, hence the `#ts` placeholder). This seeks directly to the timestamp range within the companyId partition and stops at 1,000 items, so RCU is consumed only for the items actually read, not the 100K+ in the table. To optimize further: (1) Use a Global Secondary Index (GSI) if you need different query patterns — e.g., if you also query by userId, create a GSI with `userId` as partition key. (2) Add a TTL attribute to automatically delete old events, keeping the table small: `aws dynamodb update-time-to-live --table-name telemetry --time-to-live-specification AttributeName=expirationTime,Enabled=true`. (3) Use DynamoDB Streams to export old events to S3 for long-term storage, keeping the table lean for frequent queries. (4) Partition by time — instead of just companyId, use a composite key like `companyId#date` (e.g., "company-123#2024-11-20"). This creates separate partition keys per day, bounding partition size, spreading write load, and allowing efficient archival of old days. For telemetry/time-series data, the composite key approach (companyId#date) is most common in production.
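For the `companyId#date` composite-key design, the application has to work out which date partitions a time window touches before issuing its Queries. A small sketch (the helper name `hour_range_keys` is hypothetical, not an AWS API):

```python
from datetime import datetime, timedelta, timezone

def hour_range_keys(company_id, now):
    """Partition keys plus sort-key bounds for a 'past hour' query.
    With a companyId#date composite key, an hour window can straddle
    midnight, so it may need Queries against two date partitions."""
    start = now - timedelta(hours=1)
    dates = {start.strftime("%Y-%m-%d"), now.strftime("%Y-%m-%d")}
    pks = [f"{company_id}#{d}" for d in sorted(dates)]
    return pks, int(start.timestamp()), int(now.timestamp())

# 00:30 UTC: the past hour crosses midnight, so two partitions are needed.
now = datetime(2024, 11, 20, 0, 30, tzinfo=timezone.utc)
pks, lo, hi = hour_range_keys("company-123", now)
print(pks)  # ['company-123#2024-11-19', 'company-123#2024-11-20']
```

One Query per returned partition key, each with `timestamp BETWEEN lo AND hi`, replaces the Scan; most of the day only one partition is touched.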
Follow-up: You use composite key companyId#date for partitions. Queries for single-day data are fast. But when querying across multiple days (7 days of data), you must query 7 separate partitions. Is there a way to query across days without 7 separate Query calls?
You have a DynamoDB table storing user sessions with `sessionId` as partition key. Sessions are read frequently but expire after 24 hours. You enabled TTL to auto-delete expired sessions. However, you're still seeing expired sessions being returned in queries. Why, and how do you fix it?
DynamoDB TTL is eventual — items are deleted in the background, typically within 48 hours of expiration. Queries can still return expired items because: (1) TTL deletion is asynchronous — a background process scans the table and deletes expired items, but not immediately. (2) Between expiration and deletion, the item remains readable. To handle this: (1) On read, check the TTL attribute in your application and discard expired items — `if item.expirationTime <= current_time: skip_item`. (2) Use DynamoDB Streams to capture TTL deletions and propagate them to a cache (e.g., ElastiCache) if you need faster consistency. (3) Filter expired items server-side in the query (a Query still needs its key condition): `aws dynamodb query --table-name sessions --key-condition-expression "sessionId = :sid" --filter-expression "expirationTime > :now" --expression-attribute-values '{":sid": {"S": "sess-123"}, ":now": {"N": "1700000000"}}'`. This removes expired items before they are returned, but RCU is still consumed for every item read, expired or not, so it saves bandwidth rather than capacity. (4) Combine: TTL for eventual cleanup + application-side filtering for immediate correctness. Expired items are eventually cleaned up (saving storage), but applications never see them. Implementation: (1) Set TTL on the table: `aws dynamodb update-time-to-live --table-name sessions --time-to-live-specification AttributeName=expirationTime,Enabled=true`. (2) In the application, after reading: `if int(item.get("expirationTime", 0)) <= time.time(): skip item`. (3) For a session cache, prefer ElastiCache/Redis, where TTL is enforced strictly on access, over DynamoDB for session storage. Most production setups use Redis for sessions (strict TTL) and DynamoDB for long-term data (eventual TTL).
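The application-side check from step (2) above, pulled into a reusable helper (a minimal sketch; `filter_live` and the item shape are illustrative, assuming the TTL attribute is named `expirationTime` as in this answer):

```python
import time

def filter_live(items, now=None):
    """Drop items whose TTL attribute has already passed.
    DynamoDB's TTL sweeper may lag expiration by up to ~48 hours,
    so the application must re-check on every read."""
    now = time.time() if now is None else now
    return [it for it in items if int(it.get("expirationTime", 0)) > now]

sessions = [
    {"sessionId": "a", "expirationTime": 1700000000},  # already expired
    {"sessionId": "b", "expirationTime": 1700090000},  # still live
]
live = filter_live(sessions, now=1700003600)
print([s["sessionId"] for s in live])  # ['b']
```

Items missing the attribute default to 0 and are treated as expired; flip that default if sessions without a TTL should be considered permanent.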
Follow-up: You implement application-side filtering for expired sessions. Queries are correct now, but you're doing an extra check on every item. For a table with 1M sessions and 10K queries/second, this adds latency. How do you optimize?
Your DynamoDB table uses `customerId` as partition key and `orderId` as sort key. You want to find all orders for a customer where the order total is > $1000. A Query returns all orders for the customer, then you filter in application code. This Query consumes 10,000 RCU even though only 100 matching orders exist. How do you reduce RCU consumption?
The issue is filtering after the query. A Query returns (and bills RCU for) every item matching the partition/sort key condition, even if your application discards most of them. To reduce RCU: (1) Use a FilterExpression on the Query to filter server-side — `aws dynamodb query --table-name orders --key-condition-expression "customerId = :cid" --filter-expression "orderTotal > :amt" --expression-attribute-values '{":cid": {"S": "cust-123"}, ":amt": {"N": "1000"}}'`. DynamoDB applies the filter before returning results, reducing network I/O — but RCU is still consumed for every item read, not just the items returned, so this does not cut capacity costs. (2) Make `orderTotal` a key attribute by creating a Global Secondary Index (GSI) with `customerId` as partition key and `orderTotal` as sort key — range conditions are only efficient on key attributes. GSIs are added via UpdateTable: `aws dynamodb update-table --table-name orders --attribute-definitions AttributeName=customerId,AttributeType=S AttributeName=orderTotal,AttributeType=N --global-secondary-index-updates '[{"Create": {"IndexName": "OrdersByAmount", "KeySchema": [{"AttributeName": "customerId", "KeyType": "HASH"}, {"AttributeName": "orderTotal", "KeyType": "RANGE"}], "Projection": {"ProjectionType": "ALL"}, "ProvisionedThroughput": {"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000}}}]'`. Then query the GSI: `aws dynamodb query --table-name orders --index-name OrdersByAmount --key-condition-expression "customerId = :cid AND orderTotal > :amt" --expression-attribute-values '{":cid": {"S": "cust-123"}, ":amt": {"N": "1000"}}'`. This reads only the items matching both key conditions, so RCU is consumed for the ~100 matching orders instead of 10,000. (3) Re-architect to store high-value orders separately — e.g., a `customerHiValueOrders` table or item collection containing only high-value orders, avoiding the scan entirely. (4) Use DynamoDB Streams + Lambda to maintain a pre-computed cache in ElastiCache for frequent queries. For most cases, option (2), the GSI with `orderTotal` as sort key, is the right approach.
The GSI does require its own write throughput — every base-table write that changes a projected attribute is replicated to the index — but query efficiency improves dramatically.
Follow-up: You create a GSI with orderTotal as sort key. Queries are now efficient. But when you update an order's total (a frequently occurring event), writes to the GSI are slow. Why?
Your DynamoDB table uses on-demand (pay-per-request) billing. Traffic is bursty — most hours you have 100 requests/second, but during flash sales, 10,000 requests/second for 2 minutes. Average daily spend is $200, but on flash sale days, spend spikes to $5000. You want predictable costs. Should you switch to provisioned throughput?
On-demand is convenient but can be expensive for bursty workloads because every request is billed at on-demand rates. Let's do the math (approximate us-east-1 prices): (1) On-demand pricing: ~$1.25 per million write request units, ~$0.25 per million read request units. (2) Provisioned pricing: ~$0.00065 per WCU-hour, ~$0.00013 per RCU-hour. For this workload: (1) On-demand — baseline: 100 writes/sec * 86,400 sec = 8.64M writes * $1.25/1M = $10.80/day, plus the spike: 10,000 writes/sec * 120 sec = 1.2M writes * $1.25/1M = $1.50. Total ~$12/day for writes (reads and items over 1 KB add to the real bill). (2) Provisioned for peak — holding 10,000 WCU to absorb flash sales costs 10,000 WCU * 24 hours * $0.00065 = $156/day. Statically provisioning for a 2-minute peak is far more expensive than on-demand here. However, if the spike is short-lived and predictable, consider: (1) Provisioned mode with DynamoDB Auto Scaling — it adjusts capacity based on CloudWatch metrics: `aws application-autoscaling register-scalable-target --service-namespace dynamodb --resource-id table/my-table --scalable-dimension dynamodb:table:WriteCapacityUnits --min-capacity 100 --max-capacity 40000`. Note that auto-scaling reacts with minutes of lag, so it cannot catch a 2-minute burst on its own. (2) Lambda + SQS to buffer writes — incoming requests queue in SQS and Lambda consumers write to DynamoDB at a controlled rate, smoothing the burst. (3) For known flash-sale times, pre-scale provisioned throughput minutes before with a scheduled EventBridge (CloudWatch Events) rule, then scale back down after. Best approach for your scenario: keep on-demand, but cut request volume: (1) aggregate writes — instead of writing every event immediately, queue and batch-write every 100ms, coalescing duplicate updates; (2) use DAX to cache hot reads and reduce read request costs (DAX is a read-through cache; it does not reduce writes).
Finally, consider switching to provisioned if flash sales are truly predictable and recur regularly; with scheduled pre-scaling you pay for peak capacity only during the sale window, which can make costs both lower and predictable.
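The on-demand versus peak-provisioned comparison above reduces to a few lines of arithmetic. A sketch using the approximate us-east-1 prices assumed in this answer (real bills also include reads, storage, and items larger than 1 KB):

```python
ON_DEMAND_PER_M_WRITES = 1.25    # USD per million write request units (approx.)
PROVISIONED_WCU_HOUR = 0.00065   # USD per WCU-hour, us-east-1 (approx.)

def on_demand_daily(base_wps, spike_wps, spike_secs):
    """Daily write cost in on-demand mode: pay only for requests made."""
    writes = base_wps * 86_400 + spike_wps * spike_secs
    return writes / 1e6 * ON_DEMAND_PER_M_WRITES

def provisioned_daily(peak_wcu):
    """Daily cost of statically provisioning for peak, 24 hours a day."""
    return peak_wcu * 24 * PROVISIONED_WCU_HOUR

# 100 writes/sec baseline + a 2-minute flash sale at 10,000 writes/sec:
print(round(on_demand_daily(100, 10_000, 120), 2))  # 12.3
print(round(provisioned_daily(10_000), 2))          # 156.0
```

The 13x gap is why static peak provisioning loses here; scheduled scaling narrows it by paying peak rates only during the sale window.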
Follow-up: You implement auto-scaling with provisioned throughput (100-40,000 WCU). During flash sales, auto-scaling triggers and scales up to 40,000 WCU. But requests are still throttled. Why isn't auto-scaling preventing throttling?
You store user preference data in DynamoDB with `userId` as partition key. Preferences are small (< 1 KB). Users update preferences frequently (10 updates/second per active user). You have 100,000 active users. Cost is high (~$5000/month). How do you reduce costs while maintaining performance?
Cost is high because every write of up to 1 KB is billed as a full WCU, and the write rate is enormous relative to the data size. Rough sizing: not all 100K users are active at the same instant, but even ~1,000 concurrently active users * 10 updates/sec = 10,000 writes/sec. Provisioning 10,000 WCU at ~$0.00065 per WCU-hour costs 10,000 * 24 * $0.00065 ≈ $156/day, roughly $4,700/month — consistent with the ~$5,000/month bill. Cost reduction strategies: (1) Coalesce and batch writes — instead of writing each update immediately, accumulate updates in memory and flush every ~100ms. Multiple updates to the same user collapse into a single write (this is where the WCU savings come from), and `BatchWriteItem` packs up to 25 items per request, cutting API round trips: `aws dynamodb batch-write-item --request-items '{"my-table": [{"PutRequest": {...}}, ...]}'`. Note that batching alone does not reduce WCU — each written item still costs its own capacity; coalescing does. (2) Switch to on-demand billing and compare — on-demand can be cheaper for bursty patterns but more expensive for constant load. (3) Cache in application memory or ElastiCache — load preferences on login, serve and mutate them in the cache for the session, and write back on logout or periodically. This cuts write frequency from 10 updates/sec per user to 1-2 writes per session. (4) Use a separate fast write layer — write preference updates to a stream (e.g., Kinesis) with Lambda consumers batching into DynamoDB asynchronously; users read from the cache, and eventual consistency is acceptable. (5) Combine: batching + ElastiCache + async write-back. Most effective: option (3), cache the preferences. Write back every 5-10 minutes or on session end: 100,000 users / 600 sec ≈ 170 sustained writes/sec ≈ 170 WCU, about $2.70/day — tens of dollars a month, down from ~$5,000/month. Trade-off: eventual consistency (if a user logs in from another session, cached preferences may be stale).
For user preferences, eventual consistency is typically acceptable.
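The coalesce-then-batch flush described above can be sketched in a few lines: merge a burst of per-user updates down to one final state per user, then chunk the survivors into `BatchWriteItem`-sized requests (in production each batch would go to `batch_write_item`; here the batches are just returned):

```python
from collections import defaultdict

BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests

def coalesce(updates):
    """Merge a burst of (user_id, prefs) updates: only the final merged
    state per user survives, so 10 updates/sec become 1 item per flush."""
    merged = defaultdict(dict)
    for user_id, prefs in updates:
        merged[user_id].update(prefs)
    return dict(merged)

def to_batches(merged):
    """Chunk the coalesced items into BatchWriteItem-sized requests."""
    items = [{"userId": uid, **prefs} for uid, prefs in merged.items()]
    return [items[i:i + BATCH_LIMIT] for i in range(0, len(items), BATCH_LIMIT)]

# 1,000 raw updates from 40 users collapse to 40 items in 2 batch calls.
raw = [(f"user-{i % 40}", {"theme": "dark", "seq": i}) for i in range(1000)]
batches = to_batches(coalesce(raw))
print(len(batches), sum(len(b) for b in batches))  # 2 40
```

The WCU savings come entirely from `coalesce` (1,000 logical writes → 40 physical ones); `to_batches` only reduces API round trips.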
Follow-up: You cache preferences in Redis and batch writes to DynamoDB every 10 minutes. Cost drops to $50/month. But you notice users in one session don't see updates made in another session for up to 10 minutes. Users complain about the staleness. How do you improve consistency without going back to expensive real-time writes?
You're designing a DynamoDB table for a recommendation engine. You need to store user-item interactions (userId, itemId, score, timestamp). You want to query: "Get top 10 items for a user" and "Get top 10 users who interacted with an item". With a single table design, partition key userId and sort key itemId, the first query is efficient but the second requires scanning all partitions. How do you optimize both queries?
The single-table design with (userId, itemId) allows efficient queries for "get items for a user" but not "get users for an item". To support both query patterns: (1) Create a Global Secondary Index (GSI) with reversed keys — partition key itemId, sort key userId. GSIs are added via UpdateTable: `aws dynamodb update-table --table-name interactions --attribute-definitions AttributeName=itemId,AttributeType=S AttributeName=userId,AttributeType=S --global-secondary-index-updates '[{"Create": {"IndexName": "ByItem", "KeySchema": [{"AttributeName": "itemId", "KeyType": "HASH"}, {"AttributeName": "userId", "KeyType": "RANGE"}], "Projection": {"ProjectionType": "ALL"}, "ProvisionedThroughput": {"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000}}}]'`. (2) Query the main table for a user's interactions: `aws dynamodb query --table-name interactions --key-condition-expression "userId = :uid" --expression-attribute-values '{":uid": {"S": "user-1"}}' --limit 10` (note this orders by the sort key itemId, not by score). (3) Query the GSI for an item's users: `aws dynamodb query --table-name interactions --index-name ByItem --key-condition-expression "itemId = :iid" --expression-attribute-values '{":iid": {"S": "item-123"}}' --limit 10`. (4) To get top-N *by score*, score must itself be a sort key: create a GSI with userId as partition key and score (Number type) as sort key — Number sort keys sort numerically, so `--scan-index-forward false --limit 10` returns the highest scores first. If you instead embed score in a String sort key (e.g., `itemId#score`), sorting is lexicographic, so zero-pad the score to a fixed width or sort application-side. (5) For top-N across all users/items, pre-compute aggregates — use Lambda + DynamoDB Streams to maintain a separate top-10-items and top-10-users table, updated in near real-time. This trades freshness for query efficiency. (6) For production recommendation engines, consider specialized services like Amazon Personalize (managed recommendations) or Elasticsearch/OpenSearch (fast sorting and filtering) instead of DynamoDB. If staying with DynamoDB, the reversed-key GSI is standard; add a score-keyed GSI or pre-computed aggregates for top-N by score.
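The zero-padding caveat from point (4) in a runnable form — only relevant when score is embedded in a String sort key, since a Number sort key already sorts numerically (the helper name `score_sort_key` is illustrative):

```python
def score_sort_key(score):
    """Zero-pad a score to fixed width so lexicographic order matches
    numeric order (unpadded, '95.5' > '100.0' lexicographically).
    Assumes non-negative scores below 100000 with one decimal place."""
    return f"{score:08.1f}"

scores = [95.5, 7.2, 100.0, 42.0]
keys = sorted((score_sort_key(s) for s in scores), reverse=True)
print(keys[0])  # '000100.0' -- the highest score now sorts first
```

Reverse-sorted padded strings behave like `--scan-index-forward false` on a Number sort key; without the padding, "95.5" would incorrectly outrank "100.0".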
Follow-up: You create the GSI and query both patterns efficiently. But during a surge in recommendations (traffic spikes to 10K QPS), the GSI provisioned throughput is exhausted first (you scaled it to 2000 RCU), while the main table still has capacity. Why is the GSI hitting limit first?