Your organization uses CloudFormation for infrastructure: 500+ AWS resources across 50 templates. You want to migrate to Terraform. Manual refactoring would take months. Design an efficient migration strategy.
Use a phased, tool-assisted migration: 1) Tool-assisted generation: use CloudFormation-to-Terraform converters (AWS's own tooling here is limited; third-party `cloudformation-to-terraform` scripts exist). They generate boilerplate HCL. Quality varies - expect 60-80% automation and 20-40% manual refinement. 2) Phased migration: don't convert all 50 templates at once. Start with 5 non-critical templates as a pilot. 3) For each template: a) use the converter to generate .tf files; b) review the generated HCL for quality (naming, structure); c) refactor into modules; d) test: `terraform plan` should match the CloudFormation-managed state; e) migrate state: `terraform import` each resource; f) validate: full deploy in a staging environment; g) remove the CloudFormation template after validation. 4) Parallelization: 5 teams each migrate 10 templates simultaneously. 5) Timeline: roughly 1 week per template; with teams working in parallel on 10 templates each, about 10 weeks total. 6) Validation: after each team's migration, run a full state validation: actual resources must match Terraform state. 7) Rollback: keep CloudFormation templates for 4 weeks after migration (quick rollback if needed). 8) Documentation: capture lessons learned. Document which templates migrated cleanly and which had issues.
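Step 3e above can be scripted rather than run by hand. A minimal sketch, assuming a hypothetical `resources.txt` mapping Terraform addresses to AWS physical IDs, one pair per line:

```shell
# Hypothetical mapping produced during template review:
# "<terraform_address> <aws_physical_id>" per line.
cat > resources.txt <<'EOF'
aws_instance.web i-0123456789abcdef0
aws_s3_bucket.logs my-logs-bucket
EOF

# Generate (not execute) the import commands so the batch can be
# reviewed before it touches Terraform state.
while read -r address aws_id; do
  printf 'terraform import %s %s\n' "$address" "$aws_id"
done < resources.txt > import.sh

cat import.sh
```

After running the generated imports, `terraform plan` should report no changes; any diff means the HCL doesn't yet match the real resource.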
Follow-up: How would you handle CloudFormation drift during migration?
CloudFormation template uses DependsOn extensively. Terraform imports each resource independently. After import, Terraform doesn't know about CF dependencies. Missing dependency causes incorrect apply order. How do you preserve/reconstruct dependencies?
Reconstruct the dependencies in Terraform: 1) Analyze the CF template: identify all DependsOn declarations. Build a dependency matrix: ResourceA depends on ResourceB. 2) In Terraform, add explicit dependencies where needed: `depends_on = [aws_resource.previous]`. Terraform honors this during apply. 3) Prefer implicit dependencies: if Terraform can infer the dependency from a reference (resource A references B's ID), explicit depends_on is unnecessary. 4) Example CF: `DependsOn: ["SecurityGroup"]`. Example TF: `vpc_security_group_ids = [aws_security_group.main.id]` (implicit dep). 5) For complex chains: map CF dependencies onto the TF module structure. One CF template = one module; cross-template dependencies become module inputs. 6) Inspect the order: `terraform graph | dot -Tsvg > graph.svg`. Visualize the dependency chain and verify it matches the matrix from step 1. 7) Validate: `terraform plan` should show no unexpected changes; the actual ordering is derived from the graph at apply time. 8) Compare with CF: original CloudFormation create order vs Terraform's. They should be similar (Terraform may parallelize independent resources). 9) Document: migration notes should list the reconstructed dependencies for future maintainers.
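The implicit-vs-explicit distinction from steps 2-3 in HCL (resource names and AMI hypothetical):

```hcl
# Implicit dependency: referencing the security group's attribute is
# enough; Terraform orders the creates automatically.
resource "aws_instance" "web" {
  ami                    = "ami-12345678"
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.main.id]
}

# Explicit dependency: needed when nothing is referenced directly,
# e.g. the CF template used DependsOn purely for ordering side effects.
resource "aws_instance" "worker" {
  ami           = "ami-12345678"
  instance_type = "t3.micro"
  depends_on    = [aws_iam_role_policy.worker_permissions]
}
```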
Follow-up: How would you test dependency ordering without full deployment?
You're migrating from Pulumi (Python) to Terraform. Pulumi code is procedural (loops, conditionals, functions). Terraform is declarative. A Pulumi app creating 100 resources with dynamic logic becomes thousands of lines of TF code. How do you handle this?
Map Pulumi's procedural logic onto Terraform declarations: 1) Pulumi loop: `for i in range(10): ec2.Instance(f"server-{i}", ...)` becomes `for_each = { for i in range(10) : "server-${i}" => {...} }` in TF. 2) Pulumi conditional: `if environment == "prod": create_replica = True` becomes `count = var.environment == "prod" ? 1 : 0` in TF. 3) Pulumi functions: `def create_resource(config): ...` becomes a TF module that encapsulates the logic. 4) Dynamic outputs: Pulumi can compute outputs mid-apply; TF must express them declaratively up front (locals, data sources). 5) State management: Pulumi uses a different state backend architecture. Migrate state by exporting the Pulumi state, converting it to Terraform's state format, and uploading via `terraform state push`. 6) Code size: if the Pulumi code is 5,000 lines, the TF may approach 10,000 (HCL is more verbose). Accept this, and modularize. 7) Testing: Pulumi tests typically use pytest; Terraform commonly uses Terratest. Rewrite the tests. 8) Resource mapping: map Pulumi resource types to Terraform resources. Some Pulumi resources may lack a direct TF equivalent. 9) Timeline: expect 1.5-2x the complexity. Allow extra time. 10) Pilot: migrate one app first, learn lessons, apply to the rest.
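The loop translation from step 1, spelled out in HCL (instance type and AMI hypothetical):

```hcl
# Pulumi: for i in range(10): ec2.Instance(f"server-{i}", ...)
# Terraform: build a map keyed by name, then for_each over it.
locals {
  servers = { for i in range(10) : "server-${i}" => { instance_type = "t3.micro" } }
}

resource "aws_instance" "server" {
  for_each      = local.servers
  ami           = "ami-12345678"
  instance_type = each.value.instance_type

  tags = {
    Name = each.key   # "server-0" ... "server-9"
  }
}
```

Keying `for_each` by name (rather than using `count`) keeps resource addresses stable if a server is later removed from the middle of the set.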
Follow-up: How would you handle Pulumi code that creates resources non-deterministically?
During CloudFormation to Terraform migration, you discover 30 resources in CF that were created manually and added to stacks later (not original CF definition). Migrating these requires import. How do you identify and import orphaned resources?
Identify and import orphaned resources: 1) Audit CF-managed resources: `aws cloudformation list-stack-resources --stack-name prod | jq '.StackResourceSummaries[].PhysicalResourceId'` lists what CF actually manages. 2) Query AWS directly: `aws ec2 describe-instances | jq '.Reservations[].Instances[].InstanceId'` lists all instances. 3) Diff the two: resources present in AWS but absent from CF are the manually-added ones. 4) For each orphaned resource, decide: a) delete it (not needed)? b) import it into Terraform? c) leave it unmanaged by IaC? 5) For import: create a resource block in TF: `resource "aws_instance" "orphaned_1" { }`. 6) Import: `terraform import aws_instance.orphaned_1 i-12345`. 7) Verify: `terraform state show aws_instance.orphaned_1` confirms the import. 8) Commit the resource definition to git. 9) Document: comment why the resource was orphaned, when it was added, and whether it should have been in CF originally. 10) Prevention: a CF stack policy can deny updates to protected resources, e.g. `{"Statement": [{"Effect": "Deny", "Principal": "*", "Action": "Update:Modify", "Resource": "LogicalResourceId/..."}]}` - but stack policies only constrain stack updates; out-of-band manual changes need IAM guardrails and drift detection.
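The diff in step 3 reduces to comparing two sorted ID lists with `comm`. A sketch with hypothetical sample IDs standing in for the real CLI output:

```shell
# cf_ids.txt: IDs CloudFormation manages (from list-stack-resources);
# aws_ids.txt: every instance ID in the account (from describe-instances).
# Hypothetical sample IDs stand in for real CLI output here.
printf 'i-aaa\ni-ccc\n' | sort > cf_ids.txt
printf 'i-aaa\ni-bbb\ni-ccc\n' | sort > aws_ids.txt

# comm -13 suppresses lines unique to the first file and lines common
# to both, leaving IDs that exist in AWS but not in CF: orphan candidates.
comm -13 cf_ids.txt aws_ids.txt > orphans.txt
cat orphans.txt   # -> i-bbb
```

Both inputs must be sorted, since `comm` compares line by line.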
Follow-up: How would you detect orphaned resources automatically?
CloudFormation migration in progress. Old CF stacks still manage some resources; new Terraform manages others. Temporary hybrid setup. Two tools try to manage the same VPC, and conflicts arise. How do you safely transition?
Gradual hybrid migration with clear boundaries: 1) Split responsibility: CF manages layer 1 (networking: VPC, subnets); Terraform manages layer 2 (compute: EC2, RDS). Clear boundary. 2) Terraform reads CF outputs via a data source: `data "aws_cloudformation_stack" "network" { name = "prod-network" }`. Terraform gets the VPC ID from CF. 3) Update the CF template: add an Outputs section if missing: `Outputs: { VpcId: { Value: !Ref VPC } }`. 4) In Terraform: `vpc_id = data.aws_cloudformation_stack.network.outputs["VpcId"]`. 5) Phase transition: CF manages the network for 4 weeks while Terraform manages the compute and data layers independently. 6) Validation: the Terraform plan must cover all compute resources; the CF template must cover all network resources. No overlap. 7) Gradual migration: after 4 weeks, if stable, migrate the CF network layer to Terraform by importing the CF-managed resources. 8) Final state: everything in Terraform. Delete the CF stacks. 9) Rollback: if the network migration fails, revert and keep the hybrid setup longer. 10) Communication: the team must understand the hybrid state is temporary, not permanent. The goal is full TF.
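Steps 2-4 combined in HCL (stack name and resource names hypothetical):

```hcl
# Read outputs from the still-CF-managed network stack.
data "aws_cloudformation_stack" "network" {
  name = "prod-network"
}

# Terraform-managed compute layer consumes the CF output; this requires
# the CF template to export VpcId in its Outputs section.
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.aws_cloudformation_stack.network.outputs["VpcId"]
}
```

Because the boundary is expressed as a read-only data source, Terraform can never mutate the CF-owned network resources during the hybrid phase.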
Follow-up: How would you prevent race conditions between CF and Terraform apply?
Post-migration to Terraform, your organization wants to prevent regression: make sure CF (deprecated) doesn't get accidentally modified. CF templates are in git but should be read-only. How do you enforce this?
Enforce the migration and prevent CF regression: 1) Git controls: CF template files in the `cloudformation/` directory should be effectively read-only. Branch protection alone can't freeze specific paths, so add a CODEOWNERS entry requiring explicit sign-off on anything under `cloudformation/`. 2) PR checks: if a PR touches a CF file, CI rejects it: `if git diff --name-only origin/main...HEAD | grep -q '^cloudformation/'; then exit 1; fi`. 3) Deprecation warnings: add comments to CF templates: `# DEPRECATED: This stack has been migrated to Terraform. See terraform/prod/ for the current source of truth`. 4) Archive: move CF templates to `deprecated/cloudformation/` to signal they're archived. 5) Documentation: update runbooks to point at the Terraform docs instead of CF. 6) Team training: show the team how to make infrastructure changes via Terraform, not CF. Discourage CF usage. 7) Audit: monitor CF stack updates with a CloudTrail alert if anyone modifies a CF stack. Review weekly. 8) Automated cleanup: 12 weeks post-migration, delete the old CF stacks from AWS (after validating the TF replacements, and with `DeletionPolicy: Retain` set so the now-Terraform-managed resources survive stack deletion). 9) Capability lock: for 6 months, restrict CF updates via IAM policy (read-only). After 6 months, remove the permissions entirely. 10) Incident review: if someone accidentally modifies CF, discuss why and improve the process.
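The CI check in step 2 as a small gate script. A sketch: `changed_files.txt` here stands in for the output of `git diff --name-only origin/main...HEAD` in a real pipeline:

```shell
# Hypothetical changed-file list for this PR (no CF files touched):
printf 'terraform/prod/main.tf\ndocs/runbook.md\n' > changed_files.txt

# Fail the build if any changed file lives under the frozen
# cloudformation/ directory.
if grep -q '^cloudformation/' changed_files.txt; then
  echo "ERROR: CloudFormation is deprecated; make the change in terraform/ instead." >&2
  echo fail > gate_result.txt
  exit 1
fi
echo pass > gate_result.txt
echo "OK: no deprecated CloudFormation files touched."
```

The anchored `^cloudformation/` pattern avoids false positives on paths that merely contain the word elsewhere.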
Follow-up: How would you handle emergency CF updates if they're needed during deprecation period?
CloudFormation stack update fails mid-way due to an API error. The stack enters UPDATE_ROLLBACK_FAILED state. Resources are partially updated. Migrating this stack to Terraform is now risky. How do you handle it?
Recover the failed CF stack before migrating: 1) Investigate the failure: CloudFormation events show which resource failed; review CloudTrail for the API error. 2) Manual fix: if a resource creation failed, create it manually in AWS and retry the update: `aws cloudformation update-stack --stack-name prod ...`. 3) Continue the rollback: if the update is hopeless, resume the rollback: `aws cloudformation continue-update-rollback --stack-name prod`. This finishes rolling back the changes from the failed update. 4) Cleanup: manually delete stuck resources if necessary. 5) Validate state: compare the CF template against the actual AWS resources for consistency. 6) Once stable: migrate to Terraform and import the resources. 7) Alternative: if the stack is too broken, start fresh: set `DeletionPolicy: Retain` on the resources, delete the CF stack so the resources survive, then import them into Terraform. 8) Prevention: CF drift detection (or AWS Config) alerts on stack divergence. Monitor regularly. 9) Post-migration: Terraform handles partial failures more gracefully (idempotent applies), reducing stuck states. 10) Incident review: what caused the CF failure? Network issue? API limit? Improve the process.
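The decision in steps 2-3 hinges on the stack's status string. A sketch with a stubbed status (in practice it comes from `aws cloudformation describe-stacks --stack-name prod --query 'Stacks[0].StackStatus' --output text`):

```shell
STATUS="UPDATE_ROLLBACK_FAILED"   # stubbed for illustration

case "$STATUS" in
  UPDATE_ROLLBACK_FAILED)
    # Resume the stuck rollback; --resources-to-skip can bypass
    # resources that cannot be rolled back automatically.
    echo "aws cloudformation continue-update-rollback --stack-name prod" > next_step.txt
    ;;
  UPDATE_ROLLBACK_COMPLETE|UPDATE_COMPLETE)
    echo "stack stable: safe to begin terraform import" > next_step.txt
    ;;
  *)
    echo "investigate stack events before acting" > next_step.txt
    ;;
esac
cat next_step.txt
```

Gating the migration on a stable status string keeps `terraform import` from ever running against a half-rolled-back stack.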
Follow-up: How would you ensure Terraform handles the recovery scenario more gracefully?