Your application writes an object to S3, then immediately reads it back in the same request, but gets `NoSuchKey`. 50ms later, the read succeeds. This is S3 standard storage (not S3 Intelligent-Tiering). Walk through S3's consistency model to explain why.
S3 has provided strong read-after-write consistency for all PUT, GET, LIST, and HEAD operations, in all regions, since December 2020 — this covers new objects, overwrites, and even GETs issued before the object existed. So for a same-bucket, same-key PUT-then-GET, `NoSuchKey` should be impossible, and the 50ms recovery points to something outside the bucket itself: (1) Your application is reading a different key than it wrote (prefix, trailing slash, or encoding mismatch). (2) The read goes through a caching layer (CloudFront, a reverse proxy, or an application-level cache) rather than directly to S3. (3) The read targets a different bucket than the write — a cross-region replica, or a Multi-Region Access Point routing the GET to a region the write hasn't reached yet. Replication is asynchronous and eventually consistent, so replicas lag the source by milliseconds to minutes; reads against the source bucket itself remain strongly consistent regardless of versioning or replication settings. To debug: (1) Enable S3 server access logging — `aws s3api put-bucket-logging --bucket my-bucket --bucket-logging-status file://logging.json`. (2) Check the logs for the exact PUT and GET keys and timestamps. (3) Verify you're writing and reading the same key — `aws s3api head-object --bucket my-bucket --key foo`. (4) Check whether replication is configured — `aws s3api get-bucket-replication --bucket my-bucket` — and whether the read path resolves to the replica rather than the source. (5) Temporarily disable replication to confirm: `aws s3api delete-bucket-replication --bucket my-bucket`. If the 50ms delay disappears, the read path was hitting the replica. Most S3 usage never sees this thanks to strong consistency, but high-frequency, low-latency patterns routed through caches or replicas can expose these windows.
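When the failure is intermittent, it helps to separate "S3 returned NoSuchKey" from "my read path is flaky". A minimal retry wrapper — a hypothetical helper, with a boto3-style GET callable injected so it can be exercised without AWS — makes the anomaly measurable: with strong consistency, a same-key PUT-then-GET should never need a retry, so any logged retry points at a cache, replica, or key mismatch.

```python
import time

def get_with_retry(s3_get, bucket, key, attempts=5, base_delay=0.05):
    """Call an S3-style GET, retrying with linear backoff on failure.

    s3_get is a callable shaped like boto3's client.get_object, injected
    so the helper can be tested with a fake. Returns the first successful
    response; re-raises the last error if all attempts fail.
    """
    last_err = None
    for attempt in range(attempts):
        try:
            return s3_get(Bucket=bucket, Key=key)
        except Exception as err:  # boto3 raises ClientError with code NoSuchKey
            last_err = err
            time.sleep(base_delay * (attempt + 1))
    raise last_err
```

Wire it as `get_with_retry(boto3.client("s3").get_object, bucket, key)` and log every attempt beyond the first — those log lines are the evidence that something between the application and the bucket is eventually consistent.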
Follow-up: You disable replication and strong consistency is maintained. You re-enable replication to the same region and the issue returns. Why does same-region replication cause eventual consistency?
You have S3 replication enabled from bucket-primary (us-east-1) to bucket-replica (eu-west-1). You write an object to bucket-primary. Seconds later, you list objects in bucket-replica and don't see it. Your application depends on this replica being immediately available. How do you fix this?
S3 replication is asynchronous and eventually consistent, not synchronous. Replication latency is typically seconds but can stretch to minutes under heavy load. If your application depends on immediate replica availability, you have several options: (1) Change your application to read from bucket-primary first, then fall back to bucket-replica — the primary is strongly consistent for its own writes, so this sidesteps the race at the cost of cross-region read latency. (2) Enable S3 Replication Time Control (RTC), which replicates 99.99% of objects within 15 minutes, backed by an SLA: `aws s3api put-bucket-replication --bucket bucket-primary --replication-configuration file://rtc-config.json`. RTC is billed per GB of data replicated (on the order of $0.015/GB — check current pricing) on top of normal replication transfer costs. (3) Enabling RTC also enables S3 Replication Metrics automatically; monitor `ReplicationLatency` and `BytesPendingReplication` in CloudWatch and set alarms for when replication falls behind. (4) Pre-populate the replica — if the data is predictable, write to both buckets in application code (dual-write pattern), but this violates the single-source-of-truth principle and risks divergence on partial failures. (5) Use DynamoDB to track replication status — after PUTting to bucket-primary, add an entry to a DynamoDB table; the read path checks whether the object is marked replicated before reading bucket-replica. Config example for RTC (note that RTC requires the `Metrics` element, and a V2 rule with `Filter` also requires `DeleteMarkerReplication`): `{"Role": "arn:aws:iam::ACCOUNT:role/s3-replication-role", "Rules": [{"Status": "Enabled", "Priority": 1, "Filter": {"Prefix": ""}, "DeleteMarkerReplication": {"Status": "Disabled"}, "Destination": {"Bucket": "arn:aws:s3:::bucket-replica", "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}}, "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}}}]}`. This ensures 99.99% of replicas are available within 15 minutes, with an SLA backing it.
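Option (1) can be sketched as a small helper. This is a hypothetical wrapper (the client calls are injected, shaped like boto3's `get_object`), not an AWS API: read the strongly consistent primary first, and consult the replica only if the primary read fails.

```python
def read_primary_first(primary_get, replica_get, key,
                       bucket_primary="bucket-primary",
                       bucket_replica="bucket-replica"):
    """Read from the primary bucket, falling back to the replica.

    The primary is strongly consistent for its own writes, so a
    just-written object is always visible there; the replica is only
    consulted if the primary read fails (e.g. a regional outage).
    primary_get / replica_get are callables shaped like boto3 get_object.
    """
    try:
        return primary_get(Bucket=bucket_primary, Key=key)
    except Exception:
        return replica_get(Bucket=bucket_replica, Key=key)
```

The design trade-off: every read pays cross-region latency to us-east-1, which is the opposite of why the replica exists — so this pattern fits correctness-critical reads, while latency-critical reads should tolerate replica lag instead.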
Follow-up: You enable RTC, but replication latency varies wildly (1 second to 30 seconds). You've seen objects replicate in 1 second, then later ones take 30 seconds with identical size. What's the cause?
Your S3 bucket has a lifecycle policy that moves objects from Standard to Glacier after 90 days. You archived an object 90 days ago. It was moved to Glacier. Today, you need that object urgently. When can you read it, and how do you restore it?
Objects in Glacier are archived and cannot be read directly — you must restore them first. Restoration is asynchronous, with three retrieval tiers for S3 Glacier Flexible Retrieval (prices are indicative per GB and change — check current pricing): (1) Expedited (1-5 minutes) — ~$0.03 per GB, (2) Standard (3-5 hours) — ~$0.01 per GB, (3) Bulk (5-12 hours) — cheapest, and free in most regions. Steps: (1) Initiate a restore using `aws s3api restore-object --bucket my-bucket --key my-object --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Expedited"}}'`. This creates a temporary readable copy that lives for 7 days; the archived original never leaves Glacier. (2) Check restore status with `aws s3api head-object --bucket my-bucket --key my-object | jq '.Restore'`. You'll see `ongoing-request="true"` until the restore completes, then `ongoing-request="false"` with an `expiry-date`. (3) Once the restore is complete (1-5 minutes for Expedited), GET the object normally: `aws s3api get-object --bucket my-bucket --key my-object output-file`. (4) The temporary copy exists for the specified duration (7 days above), then is deleted, leaving only the Glacier archive. During this window, you're charged for both the Glacier archive and the temporary copy. Cost consideration: Expedited is expensive. If you can wait, Standard (3-5 hours) costs a fraction as much and usually completes overnight. Lifecycle best practice: set expiration rules to prevent objects languishing in Glacier indefinitely — e.g., delete objects after 1 year in Glacier. This prevents surprise restoration costs. For the lifecycle policy: `aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json` with a rule like `{"ID": "archive-then-expire", "Status": "Enabled", "Filter": {}, "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}], "Expiration": {"Days": 365}}`.
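The `Restore` field returned by head-object is a small string format worth parsing explicitly rather than eyeballing. A sketch (hypothetical helper; the field values — `ongoing-request="true"` and `ongoing-request="false", expiry-date="..."` — follow S3's documented shape):

```python
def restore_state(head_response):
    """Classify a head-object response for a Glacier-class object.

    Returns one of:
      'not-requested' - no Restore field: no restore has been initiated
      'in-progress'   - restore running; a GET still fails
      'ready'         - temporary copy readable until its expiry date
    """
    restore = head_response.get("Restore")
    if restore is None:
        return "not-requested"
    if 'ongoing-request="true"' in restore:
        return "in-progress"
    return "ready"
```

A polling loop would call head-object, feed the response dict to `restore_state`, and attempt the GET only once the state is `'ready'`.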
Follow-up: You restore an object from Glacier using Expedited tier. It's ready for download within 2 minutes. You download it, verify the data, and 1 minute later, your application tries to read it again and gets an error. Why?
You're using S3 Intelligent-Tiering for cost optimization. Objects move automatically through the Frequent Access, Infrequent Access (after 30 days with no access), and opt-in Archive Access (after 90 days with no access) tiers. One object is in the Archive Access tier, but you need to read it urgently today. Is it immediately readable, or do you need to restore it like Glacier?
S3 Intelligent-Tiering's opt-in Archive Access and Deep Archive Access tiers behave like the Glacier storage classes — a direct GET on an archived object fails (with an `InvalidObjectState` error), and you must restore it first. The restore differs from Glacier in two ways: there is no retrieval fee, and instead of creating a temporary copy, the object moves back to the Frequent Access tier — so no `Days` parameter is needed. For urgent access to an Intelligent-Tiering archived object: (1) Initiate restore with `aws s3api restore-object --bucket my-bucket --key my-object --restore-request '{"GlacierJobParameters": {"Tier": "Expedited"}}'`. Expedited (minutes) and Standard (3-5 hours) are available for the Archive Access tier; Deep Archive Access restores via Standard (within 12 hours) or Bulk (within 48 hours). (2) Poll `head-object` until the restore completes. (3) Read the object normally. Alternatively, for data you know is rarely accessed, use the Glacier Deep Archive storage class directly instead of Intelligent-Tiering. Intelligent-Tiering is best for unpredictable access patterns where you don't know if an object will be accessed frequently or rarely — the system tiers based on actual access. For predictable access patterns (e.g., "quarterly reports" always accessed within 30 days), explicit Standard + Glacier lifecycle transitions avoid the Intelligent-Tiering monitoring overhead. That overhead is about $0.0025 per 1,000 monitored objects per month (objects under 128 KB are not monitored or auto-tiered), plus storage, so for large catalogs of unpredictable data it's cost-effective; for stable workloads, explicit transitions are cheaper. Example enabling the Archive Access tier: `aws s3api put-bucket-intelligent-tiering-configuration --bucket my-bucket --id config1 --intelligent-tiering-configuration '{"Id": "config1", "Filter": {"Prefix": ""}, "Status": "Enabled", "Tierings": [{"Days": 90, "AccessTier": "ARCHIVE_ACCESS"}]}'`.
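The monitoring-fee trade-off above is easy to estimate with back-of-envelope arithmetic. A sketch (the $0.0025 per 1,000 objects figure is the commonly published rate and the 128 KB cutoff is per the S3 docs — verify both against current pricing):

```python
def monthly_monitoring_fee(object_count, small_object_count=0,
                           fee_per_1000=0.0025):
    """Estimate the monthly Intelligent-Tiering monitoring charge in USD.

    Objects under 128 KB are not monitored (and not auto-tiered), so
    pass their count as small_object_count to exclude them.
    fee_per_1000 is the per-1,000-objects monthly rate.
    """
    monitored = object_count - small_object_count
    return monitored / 1000 * fee_per_1000

# e.g. a 10-million-object catalog where 1 million objects are tiny
fee = monthly_monitoring_fee(10_000_000, small_object_count=1_000_000)
```

At roughly $22.50/month for nine million monitored objects, the fee is usually dwarfed by the storage savings of auto-archiving — but for a catalog with a known, stable access pattern, explicit lifecycle transitions deliver the same savings without it.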
Follow-up: You configure Intelligent-Tiering with the Archive Access tier. You monitor an object and see it move to Archive Access after 90 days with no access, as expected. You then request a restore, and S3 begins retrieving the object in the background. While the retrieval is in progress, what happens if you try to GET the object?
Your S3 bucket has versioning enabled. You have 1000 versions of the same object stored. You want to clean up old versions to save storage costs. You set up a lifecycle policy to delete noncurrent versions after 30 days, but the versions aren't being deleted. What's wrong?
Lifecycle policies for noncurrent versions only apply to versioned buckets, and the rule must explicitly target noncurrent versions. Debug: (1) Verify versioning is enabled: `aws s3api get-bucket-versioning --bucket my-bucket`. You should see `Status: Enabled`. (2) Check the lifecycle policy with `aws s3api get-bucket-lifecycle-configuration --bucket my-bucket`. Look for a rule with `NoncurrentVersionExpiration` (or `NoncurrentVersionTransitions`). A plain `Expiration` rule only affects current versions — it creates delete markers rather than removing old versions. (3) Correct configuration example: `{"Rules": [{"ID": "delete-old-versions", "Status": "Enabled", "Filter": {}, "NoncurrentVersionExpiration": {"NoncurrentDays": 30}}]}`. This deletes noncurrent (superseded) versions 30 days after they became noncurrent. (4) Apply the policy: `aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json`. (5) Note: lifecycle rules are evaluated roughly once a day (around midnight UTC), and actual deletion can lag further behind eligibility — though billing stops once an object becomes eligible for expiration. If you just created the rule, wait 24-48 hours before checking. (6) Check for MFA Delete — lifecycle configuration is not supported on buckets with MFA Delete enabled. `aws s3api get-bucket-versioning --bucket my-bucket | grep MFADelete`. If MFA Delete is on, disable it as the root user with the bucket owner's MFA device: `aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled,MFADelete=Disabled --mfa "device-serial-arn mfa-code"` (only the root account can change MFA Delete). After fixing the policy and waiting a day or two, old versions should begin deleting automatically.
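Step (2) of the debugging above can be automated. A sketch (hypothetical checker over the parsed JSON that `get-bucket-lifecycle-configuration` returns):

```python
def noncurrent_expiration_days(lifecycle_config):
    """Return the NoncurrentDays of the first enabled rule that has a
    NoncurrentVersionExpiration action, or None if no rule targets
    noncurrent versions - the usual reason old versions survive.
    """
    for rule in lifecycle_config.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue
        action = rule.get("NoncurrentVersionExpiration")
        if action and "NoncurrentDays" in action:
            return action["NoncurrentDays"]
    return None
```

Feed it the output of `aws s3api get-bucket-lifecycle-configuration` (parsed with `json.loads`): a `None` result means the policy only has current-version `Expiration` rules, is disabled, or is missing the noncurrent action entirely.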
Follow-up: You confirm versioning is enabled, the lifecycle policy is correct with NoncurrentVersionExpiration set to 30 days, and MFA Delete is off. Still no deletions after 48 hours. You manually list versions with `aws s3api list-object-versions --bucket my-bucket --prefix my-object` and see 1000 versions, the oldest from 60 days ago. What's preventing deletion?
You replicate objects from bucket-source (us-east-1) to bucket-destination (us-west-2) using S3 replication. The source bucket has versioning and a lifecycle policy that deletes old versions. After replication is set up, old versions in the source are deleted by the lifecycle policy. Are corresponding versions deleted from the destination?
No. S3 replication copies objects and their versions, but lifecycle actions are never replicated. Here's the behavior: (1) When you PUT a new version to bucket-source, replication copies it to bucket-destination. (2) When the lifecycle policy deletes an old version from bucket-source, that deletion is NOT replicated — the old version remains in bucket-destination, so the destination grows larger than the source over time (more version history). To handle this: (1) Apply the same lifecycle policy to bucket-destination independently, so both buckets expire old versions on the same schedule: `aws s3api put-bucket-lifecycle-configuration --bucket bucket-destination --lifecycle-configuration file://same-lifecycle.json`. This is the most reliable approach. (2) Delete marker replication — `"DeleteMarkerReplication": {"Status": "Enabled"}` in the replication configuration — only replicates delete markers created by user DELETE requests; it explicitly excludes delete markers created by lifecycle rules, and permanent version deletions are never replicated in any configuration. (Replica Modification Sync, despite the name, replicates metadata changes made on replicas for two-way replication — it does not replicate deletions either.) Important caveat: if you're using replication for disaster recovery, consider making the destination bucket's lifecycle policy intentionally more conservative (expire slower, or not at all) to retain backups longer than the source. This adds cost but prevents permanent data loss if the source is corrupted and the corruption propagates.
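Since approach (1) depends on keeping two lifecycle configurations in lockstep, a drift check is worth automating. A sketch (a hypothetical comparison of the two parsed configurations, ignoring rule IDs and ordering, which legitimately differ between buckets):

```python
import json

def lifecycle_rules_match(source_cfg, dest_cfg):
    """Compare two parsed lifecycle configurations for rule parity.

    Rule IDs and ordering are ignored; everything else (actions, days,
    filters, status) must match exactly. Run periodically to catch
    drift between source and destination buckets.
    """
    def normalize(cfg):
        rules = [{k: v for k, v in rule.items() if k != "ID"}
                 for rule in cfg.get("Rules", [])]
        return sorted(json.dumps(rule, sort_keys=True) for rule in rules)
    return normalize(source_cfg) == normalize(dest_cfg)
```

In a scheduled job, fetch both configurations with `get-bucket-lifecycle-configuration`, parse them, and alert when `lifecycle_rules_match` returns `False` — note that for the intentional-DR-divergence caveat above, the alert is informational rather than an error.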
Follow-up: You apply the same lifecycle policy to both source and destination buckets. After 30 days, versions are deleted from both. But you discover that bucket-destination still has 50 GB more data than bucket-source. Why?
You're implementing a cost optimization strategy: store object metadata in DynamoDB and the actual data in S3 Standard for 30 days, then transition to Glacier for cold storage. You transition an object to Glacier, but when you query DynamoDB for that object's metadata, your application tries to read it from S3 before checking if it's in Glacier. This fails and causes application errors. How do you fix the architecture?
The issue is that your application doesn't account for Glacier objects being unreadable without restoration. Fix: (1) Store the object's storage class in the DynamoDB metadata — add a field like `storage_class: "GLACIER"` when the object transitions. (2) Before reading from S3, check the storage_class: if "GLACIER", initiate a restore and return a "pending" status to the user, or queue it for restoration. (3) Proactively detect transitions with S3 event notifications: (a) Enable EventBridge notifications on the bucket. (b) When an object transitions to Glacier, S3 emits an event (the "Object Storage Class Changed" EventBridge event, or the `s3:LifecycleTransition` event type for classic notifications). (c) Your Lambda/SQS consumer updates the DynamoDB record with storage_class: "GLACIER". (d) Your read path checks this field before attempting a GET. (4) For urgent retrievals, use expedited restore (1-5 minutes); the temporary restored copy is readable from S3 for the duration you request. (5) For genuinely unpredictable access patterns, consider S3 Intelligent-Tiering instead of manual transitions — it tiers automatically based on observed access. Implementation example: (1) Create the S3 lifecycle policy transitioning to Glacier after 30 days. (2) Enable EventBridge on the bucket: `aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration '{"EventBridgeConfiguration": {}}'`. (3) Keep a storage_class field in the DynamoDB record: `{"pk": "object-123", "storage_class": "STANDARD", "created": "2024-01-01"}`. After 30 days, the transition event fires and the consumer updates it to `{"storage_class": "GLACIER"}`. (4) In your read handler: if storage_class is GLACIER, initiate a restore and return a "restoring" status; otherwise, GET from S3. This separates concerns and makes your application resilient to storage class transitions.
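The read-handler dispatch in step (4) is the piece worth unit-testing in isolation. A sketch (the record shape mirrors the example above; the function and constant names are hypothetical):

```python
# Storage classes that cannot be GET directly and need a restore first.
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}

def plan_read(record):
    """Decide how to serve a read from the DynamoDB metadata record.

    Returns ('get', key) when the object is directly readable, or
    ('restore', key) when a restore must be initiated and a
    'restoring' status returned to the caller.
    """
    key = record["pk"]
    storage_class = record.get("storage_class", "STANDARD")
    if storage_class in ARCHIVED_CLASSES:
        return ("restore", key)
    return ("get", key)
```

Defaulting a missing `storage_class` to "STANDARD" matches records written before the field existed; if that's unsafe in your system, a HEAD on the S3 object (its `StorageClass` field) is the authoritative fallback.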
Follow-up: You implement the architecture above with DynamoDB tracking storage_class. But your EventBridge rule doesn't update DynamoDB reliably — sometimes the update is delayed or missed. What's a more robust solution?