Terraform Interview Questions

Provider Version Constraints and Lock Files


Your team uses AWS provider v4.x. Yesterday, v5.0 was released with breaking changes. Your CI/CD pipeline auto-updates provider versions. Production plan shows 150 unexpected changes. How do you proceed?

Use version constraints to prevent auto-upgrades:
1. Do not apply the plan. The pipeline's auto-update (typically `terraform init -upgrade`) is what pulled in v5.0; remove that flag from CI so `terraform init` honors the committed `.terraform.lock.hcl`, which pins exact versions.
2. In `required_providers`, add a constraint: `required_providers { aws = { source = "hashicorp/aws" version = "~> 4.0" } }` prevents 5.x upgrades until tested. If the lock file already records v5.0, re-run `terraform init -upgrade` (or `terraform providers lock`) so it re-selects the latest 4.x under the new constraint.
3. In non-prod: test v5.0 with the same configuration and confirm the plan matches expectations.
4. Review breaking changes: read the AWS provider changelog and understand what changed.
5. If changes are necessary: update the HCL for v5.0 compatibility in a separate PR.
6. Stage the upgrade: use a separate branch, test in dev, then staging, then prod.
7. Commit the lock file: `git add .terraform.lock.hcl` ensures all team members and CI use the same provider version.
8. Communicate: notify the team about the upgrade plan and timeline.
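The pin-and-revert step can be sketched as a small script. The file name and version numbers are illustrative, and the second `terraform init -upgrade` call (which here moves *within* `~> 4.0`, i.e. back off 5.x) is left as a comment so the sketch runs standalone:

```shell
#!/bin/sh
# Hypothetical rollback sketch: pin the AWS provider back to the 4.x line
# so provider selection can no longer drift onto v5.
set -eu
cat > versions.tf <<'EOF'
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"   # blocks 5.x until the upgrade is tested
    }
  }
}
EOF
# Re-select providers under the tightened constraint and rewrite the lock file:
#   terraform init -upgrade
grep 'version = "~> 4.0"' versions.tf && echo "constraint pinned"
```

Committing both `versions.tf` and the regenerated `.terraform.lock.hcl` is what makes the rollback stick for every teammate and CI runner.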

Follow-up: How would you automate testing of provider upgrades in CI/CD before they hit production?

Your organization requires that all infrastructure deploys use identical provider versions across all environments. Currently dev uses v4.50, staging uses v4.52, prod uses v4.48. How do you enforce consistency?

Enforce via version constraints and lock files:
1. Set a strict version constraint in a shared `versions.tf`: `required_providers { aws = { source = "hashicorp/aws" version = "= 4.50.0" } }`.
2. Commit the lock file to git, generating hashes for every platform your team and CI use: `terraform providers lock -platform=linux_amd64 -platform=darwin_amd64 -platform=windows_amd64` writes `.terraform.lock.hcl`.
3. All developers run `terraform init`, which installs exactly what the lock file records.
4. CI/CD verifies the lock file exists (and is unchanged after init) before planning or applying.
5. To upgrade: a) propose: re-run `terraform providers lock` under the new constraint; b) PR review: discuss the version bump; c) merge: all environments pick up the new version; d) deploy: run Terraform in dev, then staging, then prod.
6. Add a pre-commit or CI check that the lock file is current: run `terraform init -backend=false` and fail if `.terraform.lock.hcl` has uncommitted changes.
7. Document the policy: provider upgrades require approval and go through test environments first.
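The CI guard described above might look like the following sketch. The lock-file contents are a stub standing in for the committed file, and the real `terraform`/`git` calls are commented out so the sketch runs standalone:

```shell
#!/bin/sh
# CI guard (sketch): fail the pipeline if the committed lock file is missing
# or stale. The stub below stands in for a real committed lock file.
set -eu
printf 'provider "registry.terraform.io/hashicorp/aws" {\n  version = "4.50.0"\n}\n' \
  > .terraform.lock.hcl
# Fail fast if the lock file is not committed:
[ -f .terraform.lock.hcl ] || { echo "ERROR: .terraform.lock.hcl missing" >&2; exit 1; }
# In a real pipeline, also verify init would not change it:
#   terraform init -backend=false -input=false
#   git diff --exit-code -- .terraform.lock.hcl
echo "lock file present"
```

The `git diff --exit-code` step is what catches the "someone bumped the constraint but forgot to regenerate the lock file" case.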

Follow-up: How would you handle providers that break compatibility between patch versions?

You manage terraform modules published to a private registry. Each module needs to declare AWS provider requirements (>= 4.0, < 6.0). But root modules might use different constraints. How do you ensure compatibility?

Use flexible version constraints in modules:
1. In the module's `versions.tf`, use a wide constraint: `required_providers { aws = { source = "hashicorp/aws" version = ">= 4.0, < 6.0" } }`. This lets root modules pin tighter.
2. Avoid exact versions in modules: a module should accept a range of versions.
3. In root modules, pin exactly for reproducibility: `version = "= 4.50.0"`.
4. Test the module against all supported versions: in CI, for each candidate version, regenerate a test fixture that pins that exact version, then run `terraform init -upgrade && terraform validate` (a loop that never rewrites the constraint would test the same version every time).
5. Document compatibility: the README shows "Requires AWS provider >= 4.0, tested with 4.50-5.5".
6. Update the module when a new provider release breaks it: publish a new major version of the module for each new major AWS provider version.
7. Publish to the registry with version tags: `git tag v2.0.0` maps to the module version compatible with AWS provider 5.x.
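The multi-version test matrix could be scripted roughly as below. Version numbers are illustrative and the terraform calls are commented out so the sketch runs standalone; the key point is that each iteration rewrites the fixture's pin before init runs:

```shell
#!/bin/sh
# Compatibility-matrix sketch: pin the test fixture to each candidate
# provider version in turn, then init + validate against it.
set -eu
for v in 4.50.0 4.67.0 5.0.0; do
  cat > versions.tf <<EOF
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "= ${v}"
    }
  }
}
EOF
  echo "would test against aws provider ${v}"
  # terraform init -upgrade -backend=false && terraform validate
done
```

Recording which versions pass gives you the "tested with 4.50-5.5" line for the README automatically.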

Follow-up: How would you handle deprecated provider attributes that your module depends on?

A critical AWS provider bug affects your production infrastructure (resources created with bug will fail if recreated). AWS releases patch v4.50.1. But updating triggers `terraform apply` to recreate resources. How do you safely apply the patch?

Use targeted patching with zero downtime:
1. Upgrade the provider: `required_providers { aws = { version = "4.50.1" } }`, then re-run `terraform init -upgrade`.
2. Run `terraform plan` and inspect it for resources marked for replacement.
3. If the patch forces recreation, that is risky. Options: a) delay the patch if it is not critical; b) use `lifecycle { create_before_destroy = true }` for stateless resources (ALBs, security groups).
4. For stateful resources (databases), coordinate with the team: schedule a maintenance window.
5. Use `terraform apply -target` to patch one resource at a time during the window.
6. For each resource: `terraform apply -target=aws_db_instance.main`, then monitor health before moving on.
7. Once validated, run a full `terraform apply` to reconcile everything.
8. Rollback plan: if the patch causes issues, pin the constraint back to 4.50.0 and re-run `terraform init -upgrade` to regenerate the lock file.
9. Communicate the timeline to stakeholders before patching.
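One way to see which resources a plan would replace, sketched with a stubbed plan JSON. In a real run you would produce the JSON with `terraform plan -out` plus `terraform show -json` and filter it with a JSON processor such as jq; a replace shows up as `["delete","create"]` (or `["create","delete"]`) in a resource's `actions`:

```shell
#!/bin/sh
# Sketch: list resources a provider patch would recreate, before applying.
set -eu
# Real pipeline (commented so the sketch runs standalone):
#   terraform plan -out=patch.tfplan
#   terraform show -json patch.tfplan > plan.json
cat > plan.json <<'EOF'
{"resource_changes":[
  {"address":"aws_db_instance.main","change":{"actions":["delete","create"]}},
  {"address":"aws_security_group.web","change":{"actions":["update"]}}
]}
EOF
# Crude line-based filter (jq would be more robust on real plan output):
grep '"delete","create"' plan.json | sed 's/.*"address":"\([^"]*\)".*/\1/'
```

On the stub above this prints only `aws_db_instance.main`, the one resource slated for recreation.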

Follow-up: How would you detect which of your resources would be recreated before actually applying the patch?

Your organization uses 20 different Terraform modules across 50 projects. Each module declares provider requirements independently. Two projects can't work together because module A requires provider v4.x, module B requires provider v5.x. How do you resolve this?

Implement a version negotiation strategy:
1. Audit all modules: document the provider requirements for each of the 20 modules.
2. Find the common version range, e.g. all modules support >= 4.50, < 5.5.
3. Update module constraints: in each module, set `version = ">= 4.50, < 5.5"`.
4. Update root modules: pin an exact version compatible with all, e.g. `version = "= 4.80.0"`.
5. Test the integration: deploy root modules using all 20 modules together on v4.80.0 and verify there are no conflicts.
6. Document a compatibility matrix showing which modules work with which provider versions.
7. For incompatible modules, schedule a migration: either a) update module A to support v5.x (refactoring needed), or b) defer project B until module A is upgraded.
8. Add a CI check: when a new module version is released, verify it still supports the required provider range.
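The audit in step 1 can start as a one-screen report of every module's declared constraint. The module layout and constraints below are illustrative stubs:

```shell
#!/bin/sh
# Sketch: collect each module's AWS provider constraint into a quick report.
set -eu
# Stub module tree standing in for a real checkout of the 20 modules:
mkdir -p modules/vpc modules/eks
printf 'version = ">= 4.50, < 5.5"\n' > modules/vpc/versions.tf
printf 'version = ">= 5.0, < 6.0"\n'  > modules/eks/versions.tf
# One line per module: path plus its version constraint.
for f in modules/*/versions.tf; do
  printf '%s: %s\n' "$(dirname "$f")" "$(grep 'version' "$f")"
done
```

Reading the report makes conflicts like the module A/module B case obvious: any module whose lower bound sits above another module's upper bound cannot share a root module with it.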

Follow-up: How would you automate detection of version conflicts across your module ecosystem?

You run `terraform init` and notice lock file is 2 weeks old. A new provider patch was released last week with a critical feature. Should you update? How do you decide and execute safely?

Use an informed update process:
1. Evaluate: is the feature critical? Will your infrastructure benefit?
2. Check: read the provider changelog and understand what changed in the patch.
3. Test in non-prod: run `terraform init -upgrade` in the dev environment only, then `terraform plan`, and confirm there are no unexpected changes.
4. If the plan is clean, proceed to staging: commit the lock file change on a branch and deploy to staging.
5. Monitor staging: check that the new feature works correctly.
6. If all is well, merge to main and deploy to prod.
7. If the feature causes issues, revert the lock file and stay on the old version.
8. Document the decision: why you upgraded, or why you didn't.
9. For features you want but don't urgently need, create a ticket and schedule a regular (e.g. monthly) evaluation.
10. For security patches, always upgrade ASAP via an expedited process (a single approval instead of two).
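The "is the plan clean?" gate can lean on `terraform plan -detailed-exitcode`, which exits 0 for no changes, 1 for an error, and 2 for pending changes. A sketch with the terraform call stubbed out so it runs standalone:

```shell
#!/bin/sh
# Sketch: gate a provider patch on a clean plan in dev.
set -u
plan_exit=0   # stand-in for: terraform plan -detailed-exitcode; plan_exit=$?
case "$plan_exit" in
  0) result="clean: safe to commit the updated lock file" ;;
  2) result="changes pending: review before upgrading" ;;
  *) result="plan failed" ;;
esac
echo "$result" > plan-gate.txt
cat plan-gate.txt
```

In CI, exit code 2 is the signal to stop the pipeline and route the diff to a human instead of auto-merging the lock file bump.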

Follow-up: How would you automate security patch alerts and fast-track deployments?

A team member accidentally committed `.terraform` directory (which contains provider binaries) to git along with lock file. Repository size exploded. How do you fix this and prevent recurrence?

Clean the repository and prevent recurrence:
1. Remove the directory from tracking: `git rm -r --cached .terraform/` and commit. If the binaries were only added in the most recent, unpushed commit, `git commit --amend` suffices; if they are buried in history, rewrite it with a tool such as `git filter-repo` or BFG.
2. Force-push carefully: `git push --force-with-lease`, and only after coordinating with the team; everyone must re-clone or hard-reset after a history rewrite.
3. Add to `.gitignore`: `echo '.terraform/' >> .gitignore`.
4. Keep the lock file in git: `.terraform.lock.hcl` lives at the repository root, not inside `.terraform/`, so ignoring the directory does not affect it.
5. Clean up: `git gc --aggressive --prune=now` reclaims space after the history rewrite.
6. Educate: `terraform init` creates `.terraform/` as a local working directory (provider binaries, module cache) that should never be committed.
7. Add a pre-commit hook: verify `.terraform/` is gitignored and reject commits containing it.
8. Document: add a CONTRIBUTING.md explaining what to commit (`.terraform.lock.hcl`, `.tf` files) versus what to ignore (`.terraform/`, `.tfvars` files with secrets).
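A minimal pre-commit-style check might look like this; the git call is shown only as a comment so the sketch runs standalone, and the `.gitignore` handling is illustrative:

```shell
#!/bin/sh
# Pre-commit hook sketch: ensure .terraform/ is ignored and never staged.
set -eu
# Demo setup: make sure the ignore entry exists (a real hook would only check).
printf '.terraform/\n' >> .gitignore
grep -qx '\.terraform/' .gitignore \
  || { echo "add .terraform/ to .gitignore" >&2; exit 1; }
# In the real hook, also reject anything already staged under .terraform/:
#   git diff --cached --name-only | grep -q '^\.terraform/' && {
#     echo "refusing to commit .terraform/" >&2; exit 1; }
echo "ok: .terraform/ ignored"
```

Wiring this into `.git/hooks/pre-commit` (or a framework like pre-commit) stops the accident at commit time instead of at review time.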

Follow-up: How would you recover if this accident happened and is discovered weeks later?

You maintain Terraform across 5 AWS regions. Each region uses different AWS endpoint URLs due to compliance (e.g., GovCloud). Provider configurations must be region-specific. How do you version control this?

Use provider aliases and version constraints per region:
1. In `providers.tf`, declare multiple provider blocks with aliases: `provider "aws" { alias = "us_east" region = "us-east-1" endpoints { ...standard... } }` and `provider "aws" { alias = "govcloud" region = "us-gov-west-1" endpoints { ...compliance... } }`.
2. Set the version constraint once: `required_providers { aws = { source = "hashicorp/aws" version = "= 4.50.0" } }` applies to all aliases of that provider.
3. The lock file captures every provider used.
4. In resources, select the alias explicitly: `resource "aws_instance" "govcloud" { provider = aws.govcloud }`.
5. Split the configurations: `terraform/us-east/` and `terraform/govcloud/`, each with its own region-specific backend and lock file.
6. Document which regions use which provider versions and endpoints.
7. Test each region's configuration independently before deploying together.
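The alias layout above, written out as a sketch. The endpoint example and AMI value are placeholders, and the heredoc simply generates the HCL so the sketch runs standalone:

```shell
#!/bin/sh
# Sketch of a per-region provider alias layout.
set -eu
cat > providers.tf <<'EOF'
provider "aws" {
  alias  = "us_east"
  region = "us-east-1"
}

provider "aws" {
  alias  = "govcloud"
  region = "us-gov-west-1"
  # Compliance-specific endpoint overrides would go here, e.g.:
  # endpoints { sts = "https://sts.us-gov-west-1.amazonaws.com" }
}

resource "aws_instance" "audit" {
  provider      = aws.govcloud
  ami           = "ami-00000000"   # placeholder
  instance_type = "t3.micro"
}
EOF
grep -c '^provider "aws"' providers.tf   # prints 2
```

Both aliases share one `required_providers` constraint and one lock file entry; only the configuration (region, endpoints) differs per alias.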

Follow-up: How would you handle a security vulnerability in a provider that only affects specific regions?
