You're setting up Terraform for three environments: dev, staging, and production. A team member suggests using workspaces: `terraform workspace new dev`, `terraform workspace new prod`. Another suggests separate directories. Your 15-person team has mixed skill levels. Which approach and why?
Use separate directories for environments in production. Workspaces have serious drawbacks: all state files live in the same backend, so an accidental delete is catastrophic; it is easy to switch workspaces by mistake (`terraform workspace select prod` while intending dev); IAM permissions are hard to isolate because every environment shares the same backend bucket; and changes cannot be tested in parallel across environments. A directory structure (`terraform/dev/`, `terraform/staging/`, `terraform/prod/`) allows separate backends, different IAM roles, independent CI/CD pipelines, and clear visual separation that prevents mistakes. Each directory has identical `main.tf`, `variables.tf`, and `outputs.tf` files with an environment-specific `terraform.tfvars`. Use workspaces ONLY for temporary feature branches or non-production feature testing within the same environment. For a 15-person team with mixed skill levels, the directory structure scales better and reduces human error.
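For example, each environment directory can carry its own backend block; a minimal sketch, where the bucket, region, and lock-table names are assumptions, not prescribed by the answer:

```hcl
# terraform/prod/backend.tf -- prod-only backend. dev/ and staging/
# carry their own copies pointing at tf-state-dev / tf-state-staging.
terraform {
  backend "s3" {
    bucket         = "tf-state-prod"          # assumed bucket name
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"              # illustrative region
    dynamodb_table = "tf-locks-prod"          # assumed lock table
    encrypt        = true
  }
}
```

Because the bucket name is hardcoded per directory, a `terraform apply` in `terraform/dev/` physically cannot touch prod state.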
Follow-up: How would you structure shared modules to be used across these environment directories?
A team member accidentally ran `terraform destroy` in the wrong workspace and destroyed production networking. Rollback was difficult. How do you prevent this with directory structure?
Directory structure + IAM provides defense-in-depth: 1) Separate backend buckets per environment: prod uses a `tf-state-prod` bucket accessible only to the prod IAM role; dev engineers don't have credentials for it. 2) Run directory-specific CI/CD: the dev pipeline only touches `terraform/dev/`, and prod requires two approvals before applying. 3) Add destroy restrictions: `resource "aws_s3_bucket" "critical" { lifecycle { prevent_destroy = true } }` in prod, nowhere else. 4) Use Terraform Cloud at the organization level: separate TFC organizations for prod and dev, with different API tokens. 5) Add confirmation gates: apply scripts require explicit human confirmation for prod. 6) Add pre-apply hooks: a script scans the plan for destroy actions (e.g. `terraform plan | grep destroy`) and posts to Slack for review before allowing the apply. 7) Use `terraform workspace` within a directory for temporary testing if needed, but never for environments.
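The bucket isolation in point 1 can be expressed as an explicit deny attached to dev principals; a sketch in HCL, where the policy name and bucket ARN are assumptions consistent with the names above:

```hcl
# Attach to the dev engineers' role or group. An explicit Deny wins over
# any Allow, so a mis-targeted plan/apply fails before touching prod state.
resource "aws_iam_policy" "deny_prod_state" {
  name = "deny-prod-terraform-state"   # assumed policy name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Deny"
      Action = "s3:*"
      Resource = [
        "arn:aws:s3:::tf-state-prod",    # assumed bucket from point 1
        "arn:aws:s3:::tf-state-prod/*",
      ]
    }]
  })
}
```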
Follow-up: How would you design IAM roles to prevent cross-environment access completely?
Your organization uses workspaces for multiple environments but now needs to: run parallel `terraform plan` in dev and staging, have different CI/CD for each, and isolate state completely. How do you migrate?
Migration strategy: 1) Create a new backend configuration for each environment in a separate directory. 2) For each workspace, select it and export its state: `terraform workspace select dev && terraform state pull > dev.tfstate`. 3) Create the directory structure: `mkdir -p terraform/{dev,staging,prod}` and `cp main.tf terraform/dev/` (likewise for the others). 4) In each new directory, initialize with the new backend: `cd terraform/dev && terraform init -backend-config="bucket=tf-state-dev"`. 5) Migrate the state: `terraform state push ../dev.tfstate` (`state push` takes a file path argument). 6) Run `terraform plan` to verify zero changes. 7) Set up independent CI/CD pipelines per environment with different approval requirements. 8) Update documentation and scripts to point at the new structure. 9) Remove the old workspaces only after verification: `terraform workspace delete dev` once the new directory is confirmed working.
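Step 4's `-backend-config` flag can also read a per-environment file, which keeps the backend block in `main.tf` generic; a sketch with assumed bucket and table names:

```hcl
# terraform/dev/backend.hcl -- partial backend config, consumed via:
#   terraform init -backend-config=backend.hcl
# main.tf then declares only an empty block:  terraform { backend "s3" {} }
bucket         = "tf-state-dev"          # assumed bucket name
key            = "dev/terraform.tfstate"
region         = "us-east-1"             # illustrative region
dynamodb_table = "tf-locks-dev"          # assumed lock table
```

This way the HCL is identical across directories and only the small `backend.hcl` file differs per environment.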
Follow-up: How would you handle this migration with active deployments running continuously?
You use directories for environments. An engineer creates new Terraform configuration in `terraform/staging/` but forgets to create corresponding backend configuration. They run `terraform init` which initializes local state. How do you prevent and detect this?
Prevent with a template + validation: 1) Create a `terraform/.terraform-setup` template with the required structure: `backend.hcl`, `terraform.tfvars`, `variables.tf`. 2) Add a pre-commit hook: `pre-commit run --all-files` validates that each directory has the required files: `test -f backend.hcl && test -f terraform.tfvars`. 3) In CI, run `terraform init` with the backend enabled; `-backend=false` skips backend initialization and would mask a missing backend config. 4) Add a Makefile target: `make terraform-validate` runs `terraform validate` and checks for local state files: `test ! -f terraform.tfstate` (with no backend configured, state lands in the working directory). 5) Add a CI/CD check: `terraform init -backend-config backend.hcl -upgrade` must complete without errors. 6) Add a directory template: `scaffold.sh` creates new environment directories with all required files. 7) Monitor for local state: `find . -name terraform.tfstate` and alert if any state files appear in VCS (and keep them in `.gitignore`).
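Points 2 and 4 can be combined into one small guard script; a sketch, assuming the file-name convention above (`backend.hcl`, `terraform.tfvars`, and no local `terraform.tfstate`):

```shell
#!/bin/sh
# Pre-commit guard: validate one environment directory.
# Returns 0 and prints "ok" only if the required files exist and
# no local state file has been created by a backend-less init.
check_env_dir() {
  dir="$1"
  [ -f "$dir/backend.hcl" ]      || { echo "$dir: missing backend.hcl"; return 1; }
  [ -f "$dir/terraform.tfvars" ] || { echo "$dir: missing terraform.tfvars"; return 1; }
  # A state file here means someone ran `terraform init` without a backend.
  [ ! -f "$dir/terraform.tfstate" ] || { echo "$dir: local state present"; return 1; }
  echo "$dir: ok"
}

# In the hook itself:
#   for d in terraform/*/; do check_env_dir "$d" || exit 1; done
```

Wired into pre-commit or the Makefile, this fails the commit or build the moment an environment directory is missing its backend configuration.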
Follow-up: How would you automate scaffolding new environment directories?
Your team uses directories but shares the same VPC across dev and prod for cost savings. Both `terraform/dev/` and `terraform/prod/` manage the same VPC resource. How do you handle this without state conflicts?
Use data sources and careful resource ownership: 1) In prod: `resource "aws_vpc" "shared" { cidr_block = var.vpc_cidr }` creates and manages the VPC. 2) In dev, use a data source: `data "aws_vpc" "shared" { id = var.shared_vpc_id }` reads the existing VPC without managing it. 3) Pass the VPC ID via a variable: dev's `terraform.tfvars` contains `shared_vpc_id = "vpc-prod-id"`. 4) Design principle: prod manages the shared infrastructure; dev consumes what prod creates. 5) Add IAM restrictions: dev can read the VPC but not modify it. 6) Document the architecture: prod owns shared resources, dev is a consumer. 7) Prevent accidents: add a Sentinel policy in Terraform Cloud denying creation of duplicate shared resources from dev. 8) Better than a hardcoded variable, use a remote state data source: `data "terraform_remote_state" "prod" { backend = "s3" config = { bucket = "tf-state-prod" } }` then `shared_vpc_id = data.terraform_remote_state.prod.outputs.vpc_id`.
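Point 8 expanded into readable HCL; the bucket, key, and region values are assumptions consistent with the names used above:

```hcl
# In terraform/prod/outputs.tf -- prod publishes the VPC id it owns.
output "vpc_id" {
  value = aws_vpc.shared.id
}

# In terraform/dev/main.tf -- dev reads prod's state, never the VPC itself.
data "terraform_remote_state" "prod" {
  backend = "s3"
  config = {
    bucket = "tf-state-prod"            # assumed bucket name
    key    = "prod/terraform.tfstate"   # assumed state key
    region = "us-east-1"                # illustrative region
  }
}

data "aws_vpc" "shared" {
  id = data.terraform_remote_state.prod.outputs.vpc_id
}
```

Dev now tracks prod automatically: if prod ever recreates the VPC, dev picks up the new ID on the next plan with no tfvars edit.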
Follow-up: How would you enforce this ownership model so accidental duplication is impossible?
You manage 50 environments with directory structure. Most share 95% of code, differing only in a few variables. Currently each directory duplicates all HCL. How do you reduce duplication?
Use a module-based structure: 1) Create canonical modules in `modules/platform/` containing all shared logic, configurable via variables. 2) Each environment directory (`prod/`, `dev/`, etc.) is minimal: just a `main.tf` calling the modules, a `variables.tf` defining environment inputs, and a `terraform.tfvars` with values. 3) Example: `prod/main.tf` is ~20 lines: `module "platform" { source = "../modules/platform" environment = var.environment instance_count = var.instance_count }`. 4) Pin the module version: reference modules by a tagged source (a git `?ref=v2.1` or a registry `version` constraint) so every environment uses the same module version. 5) Test the module once; all environments get the same logic. 6) Add validation: run `terraform validate` in each directory to catch misconfigurations. 7) Use `terraform console` to test logic before deploying. Document a minimal example for new-environment setup.
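A minimal environment directory from point 3 might look like this; the module inputs and values are illustrative:

```hcl
# prod/main.tf -- the entire environment is one module call.
module "platform" {
  source = "../modules/platform"   # or a tagged git/registry source (point 4)

  environment    = "prod"
  instance_count = 6               # illustrative value; the only real
                                   # difference between environments
}

# prod/terraform.tfvars carries any remaining per-environment values;
# dev/ and staging/ differ only in these inputs, not in logic.
```

Standing up environment number 51 then means copying a three-file directory and editing a handful of values, not duplicating the full HCL.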
Follow-up: How would you roll out module updates across all 50 environments safely?
You have environments in directories but need to test a feature (new monitoring system) in dev and staging in parallel without affecting each other, and optionally roll back to baseline. Workspaces seem ideal but you've moved away from them. How do you approach this?
Use git branches + directories + temporary workspaces: 1) Create a git branch: `git checkout -b feature/monitoring`. 2) In this branch, modify only `terraform/dev/` and `terraform/staging/` to add the new monitoring. 3) Deploy: `cd terraform/dev && terraform apply` with the new monitoring enabled. 4) Test in dev, then staging. 5) Use a git tag to mark the baseline: `git tag baseline-stable` on the current state. 6) If the feature fails in staging, roll back: `git checkout baseline-stable -- terraform/staging/` reverts the staging files; then run `terraform apply` in staging so the infrastructure itself returns to baseline. 7) For parallel testing, use temporary Terraform Cloud workspaces within the branch for isolated runs if you're on TFC. 8) After testing succeeds, merge the branch to main. 9) This keeps the directory structure intact and uses git as the control plane: branches provide the isolation that workspaces would have, while each directory still manages its own environment state.
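The tag-and-restore flow of steps 5 and 6 as plain git, in a throwaway repo; the file name `feature.tfvars` and its contents are illustrative, and the final `terraform apply` is what actually reverts the infrastructure:

```shell
# Throwaway demo of steps 5-6: tag a baseline, change staging, restore it.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

mkdir -p terraform/staging
echo 'monitoring = false' > terraform/staging/feature.tfvars
git add -A && git commit -qm "baseline"
git tag baseline-stable                      # step 5: mark the baseline

echo 'monitoring = true' > terraform/staging/feature.tfvars
git add -A && git commit -qm "enable monitoring in staging"

# Step 6: the feature failed -- restore staging's files from the tag.
# `git checkout <tag> -- <path>` updates both the index and working tree.
git checkout baseline-stable -- terraform/staging/
git commit -qm "revert staging to baseline-stable"
# feature.tfvars is back to monitoring = false; a `terraform apply` in
# terraform/staging/ would now revert the real infrastructure.
```

Note that the revert lands as a new commit on the branch, so the audit trail of the failed experiment is preserved.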
Follow-up: How would you prevent merge conflicts when multiple teams test features in parallel branches?
Your organization runs a shared Terraform setup where dev, staging, and prod use a monolithic `terraform/` directory with workspaces. Compliance now requires audit logging showing exactly which environment changes were applied when, and by whom. Current setup makes auditing difficult. How do you fix this?
Migrate from workspaces to directories for auditability: 1) Move to directory structure: `terraform/{dev,staging,prod}/` each with own backend and CI/CD pipeline. 2) Configure Terraform Cloud: separate organizations for each environment allow granular audit logs: TFC shows `Organization: TerraformProd > Workspace: prod-us-east > Apply by: sarah@company.com at 2024-03-15 10:30 UTC`. 3) Enable CloudTrail logging: log all S3 state bucket access, DynamoDB lock table access. 4) Use Git as source of truth: all changes go through PRs, Git history shows who changed what. 5) Tag all resources: `changed_by = "sarah@company.com"`, `changed_at = "2024-03-15"`. 6) Add approval gates: prod changes require review by 2 people minimum. 7) Generate compliance reports: script queries Terraform Cloud audit logs for date range, formats for compliance review. Each directory's pipeline makes clear which environment is being modified.
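The tagging in point 5 is easier to enforce centrally than per resource; a sketch using the AWS provider's `default_tags` block, where the CI-injected variables are assumptions:

```hcl
# Applied automatically to every taggable resource this provider creates.
# The CI pipeline is assumed to populate these variables from commit metadata.
provider "aws" {
  region = var.region

  default_tags {
    tags = {
      environment = "prod"
      changed_by  = var.ci_commit_author   # e.g. "sarah@company.com"
      changed_at  = var.ci_commit_date     # e.g. "2024-03-15"
      git_sha     = var.ci_commit_sha      # ties the resource to a PR
    }
  }
}
```

Auditors can then correlate any resource in the AWS console back to a git commit and the PR review trail around it.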
Follow-up: How would you prove to auditors that no environment changes happened outside the tracked CI/CD pipeline?