You maintain a 50-service platform. Each service needs VPC, security groups, databases, and monitoring. Currently each team writes their own HCL (massive duplication). Design a composable module hierarchy for maximum reuse and minimal coupling.
Create a three-tier module hierarchy: 1) Foundation modules: `vpc`, `security-groups`, `database` with sensible defaults. 2) Composition modules: `service-platform` wraps the foundations and wires them together. 3) Root modules per environment. Structure: `modules/foundation/vpc/`, `modules/foundation/database/`, `modules/composition/service-platform/`. In the composition module (source paths are relative to the calling module's directory): `module "vpc" { source = "../../foundation/vpc" }`, `module "database" { source = "../../foundation/database" }` — passing `module.vpc` outputs into the database module creates the dependency implicitly; `depends_on` is only needed when no output reference exists. Expose a limited output surface: `output "vpc_id"`, not internal subnet details. Use `required_providers` to lock provider versions. Test each foundation module independently with `terratest`. The composition module becomes the single source of truth for platform topology.
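A minimal sketch of the composition module wiring (variable and output names like `vpc_cidr` and `private_subnet_ids` are illustrative, not fixed by the design):

```hcl
# modules/composition/service-platform/main.tf
module "vpc" {
  source = "../../foundation/vpc"  # relative to this module's directory
  cidr   = var.vpc_cidr
  tags   = var.tags
}

module "database" {
  source     = "../../foundation/database"
  subnet_ids = module.vpc.private_subnet_ids  # implicit dependency on the VPC
  tags       = var.tags
}

# Deliberately narrow output surface: callers get the VPC ID,
# never the internal subnet layout.
output "vpc_id" {
  value = module.vpc.vpc_id
}
```

Because the database module consumes a `module.vpc` output, Terraform orders the two automatically; no explicit `depends_on` is required.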
Follow-up: How would you version these modules for safe updates across 50 services without coordinating deployments?
Your `network` module has been copied into 12 different root modules with local modifications. Now security requirements change and you need to update all 12. How do you consolidate and enforce consistency?
Consolidate with `terraform state mv` (reserve `terraform state rm` for resources a root module should stop managing entirely): 1) Identify shared resources across projects. 2) Create a canonical module: `modules/network/variables.tf` covering the options found across all 12 versions. 3) For each root module, move state: `terraform state mv module.network.aws_vpc.main module.shared_network.aws_vpc.main` (adjust addresses per module). 4) Update the root module to use the shared module: `source = "../modules/network"`. 5) Run `terraform plan` and confirm zero changes before applying. 6) Turn divergent customizations into variables, e.g. `var.custom_nacls`, instead of keeping module copies. 7) Document the migration: which root module uses which module version. 8) Add a CI check that runs `terraform validate -json` on all root modules to catch breaking changes.
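Step 6, turning a per-team divergence into a variable, might look like this sketch (the `custom_nacls` name and rule shape are illustrative):

```hcl
# modules/network/variables.tf
variable "custom_nacls" {
  description = "Extra NACL rules that previously lived in per-team module copies."
  type = list(object({
    rule_number = number
    protocol    = string
    action      = string  # "allow" or "deny"
    cidr_block  = string
  }))
  default = []  # teams with no customization pass nothing
}
```

```hcl
# Root module: swap the local copy for the shared source.
module "network" {
  source = "../modules/network"
  custom_nacls = [
    { rule_number = 200, protocol = "tcp", action = "deny", cidr_block = "10.9.0.0/16" }
  ]
}
```

The default of `[]` means the 11 other root modules can adopt the shared module without passing the new variable at all.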
Follow-up: How would you detect these duplicated modules automatically in your codebase?
You're designing a `kubernetes-cluster` module that deploys EKS with monitoring, logging, and networking. It needs to work for both dev (minimal) and production (high-availability). How do you structure this without code duplication?
Use variable-driven configuration with conditional logic. Create `variables.tf` with an environment flag: `variable "environment" { type = string }`. Drive sizing from it, noting that node counts belong to `aws_eks_node_group` (via its `scaling_config` block), not to `aws_eks_cluster`: `desired_size = var.environment == "prod" ? 3 : 1`. Cluster addons are likewise separate `aws_eks_addon` resources, so enable them conditionally: `for_each = var.environment == "prod" ? toset(["aws-ebs-csi-driver", "vpc-cni"]) : toset(["vpc-cni"])`. Separate concerns with locals: `locals { ha_config = var.environment == "prod" ? var.prod_config : var.dev_config }`. Define environment presets in tfvars files: `terraform/dev.tfvars` vs `terraform/prod.tfvars`. Test both paths with `terratest` matrix tests.
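A sketch of the environment-driven sizing and addons (role ARNs, subnet variables, and the exact addon list are illustrative; addon names must match the EKS addon catalog, e.g. `aws-ebs-csi-driver`):

```hcl
locals {
  is_prod = var.environment == "prod"
  addons  = local.is_prod ? ["vpc-cni", "coredns", "aws-ebs-csi-driver"] : ["vpc-cni"]
}

# Node sizing lives on the node group, not the cluster resource.
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "default"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = local.is_prod ? 3 : 1
    min_size     = local.is_prod ? 3 : 1
    max_size     = local.is_prod ? 6 : 2
  }
}

# Addons are separate resources; for_each enables them per environment.
resource "aws_eks_addon" "this" {
  for_each     = toset(local.addons)
  cluster_name = aws_eks_cluster.main.name
  addon_name   = each.value
}
```

Because both branches flow through the same resources, dev and prod stay structurally identical and differ only in data.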
Follow-up: How would you test this module matrix without expensive multi-region deployments?
Your module outputs 45 different values that root modules need. After a refactor, 3 outputs change names, and multiple teams' configurations break. How do you version outputs and maintain backward compatibility?
Implement an output versioning strategy: 1) Keep old outputs but mark them deprecated: `output "old_vpc_id" { value = aws_vpc.main.id description = "DEPRECATED: use vpc_id instead" }`. 2) Add the new output alongside: `output "vpc_id" { value = aws_vpc.main.id }`. 3) Add a migration guide to the README. 4) Terraform has no native output aliases, so document the old-to-new mapping in code comments for gradual migration. 5) Have each root module record exactly which outputs it consumes (e.g. a `required_outputs` section in its docs) so impact analysis is mechanical. 6) Use `terraform console` to confirm outputs exist before applying. 7) Add a CI check that flags root modules still referencing deprecated outputs. 8) Version the module in a registry with a changelog: `github.com/org/terraform-aws-vpc/releases/tag/v2.0.0` lists the breaking changes.
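Steps 1 and 2 side by side, with a removal deadline in the description so the deprecation is self-documenting (the version numbers are illustrative):

```hcl
# New canonical output.
output "vpc_id" {
  value       = aws_vpc.main.id
  description = "ID of the platform VPC."
}

# Deprecated alias: same value, kept for one major version.
output "old_vpc_id" {
  value       = aws_vpc.main.id
  description = "DEPRECATED: use vpc_id instead. Will be removed in v3.0.0."
}
```

Since both outputs read the same attribute, maintaining the alias costs nothing until the scheduled removal.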
Follow-up: How would you handle a situation where a module must change its core output structure?
A module deeply nests other modules: module A calls module B calls module C calls module D (4 levels). Debugging which module sets a value becomes impossible. Each level adds context-specific variables. How do you refactor for clarity?
Flatten the hierarchy to two levels maximum: foundation modules, then a single composition layer. Remove intermediate wrapper modules; in the root module, call foundation modules directly with explicit variable passing instead of chaining values through intermediaries. Use locals to compute shared values: `locals { network_tags = merge(var.tags, { ModuleSource = "root/network" }) }`. Document the call chain with comments: `# Network module: foundation/vpc + foundation/security_groups`. Visualize with `terraform graph | dot -Tpng > graph.png`. Give each module a README explaining its inputs and outputs, and use the Terraform Cloud UI to navigate module relationships. Pass only required variables, not everything cascaded through intermediate modules. Validate with `terraform validate` plus a custom CI check that flags deep nesting.
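The flattened root module might look like this sketch (module paths and variable names are illustrative):

```hcl
# Root module: call foundation modules directly instead of
# chaining A -> B -> C -> D.
locals {
  network_tags = merge(var.tags, { ModuleSource = "root/network" })
}

# Network module: foundation/vpc + foundation/security_groups
module "vpc" {
  source = "./modules/foundation/vpc"
  cidr   = var.vpc_cidr
  tags   = local.network_tags
}

module "security_groups" {
  source = "./modules/foundation/security_groups"
  vpc_id = module.vpc.vpc_id  # explicit wiring: one hop, easy to trace
  tags   = local.network_tags
}
```

Every value a module receives is now visible in one file, so "which module set this?" is answered by reading the root module rather than walking four levels of indirection.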
Follow-up: How would you automatically detect and flag deeply nested modules in CI/CD?
You're building a module for AWS Lambda deployment. Different teams need different runtimes, memory, timeout, VPC settings, environment variables, and layers. Some need 1 parameter, others need 20. How do you design this module without becoming unmaintainable?
Use variable grouping and object types: `variable "function_config" { type = object({ runtime = string memory = number timeout = number }) }` and `variable "optional_config" { type = object({ vpc_config = optional(object({ subnet_ids = list(string) })) layers = optional(list(string)) }) default = {} }` (the `optional()` type modifier requires Terraform 1.3+). Provide preset configurations: `locals { presets = { python_basic = { runtime = "python3.11" memory = 256 timeout = 60 } node_advanced = { runtime = "nodejs18.x" memory = 1024 timeout = 300 vpc_config = { subnet_ids = var.private_subnets } } } }`. The root module selects one: `module "lambda" { source = "../lambda" config = local.presets[var.preset_name] }`. Validate inputs inside the variable block: `validation { condition = contains(["python3.11", "nodejs18.x"], var.function_config.runtime) error_message = "Unsupported runtime." }`. Document each variable with examples.
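The grouped variables with validation, laid out as full blocks (runtime list and field names are illustrative; `optional()` needs Terraform 1.3+):

```hcl
variable "function_config" {
  description = "Required Lambda settings every team must supply."
  type = object({
    runtime = string
    memory  = number
    timeout = number
  })

  validation {
    condition     = contains(["python3.11", "nodejs18.x"], var.function_config.runtime)
    error_message = "Unsupported runtime."
  }
}

variable "optional_config" {
  description = "Advanced settings; teams with simple needs omit this entirely."
  type = object({
    vpc_config = optional(object({ subnet_ids = list(string) }))
    layers     = optional(list(string), [])  # second argument is the default
  })
  default = {}
}
```

The split keeps the 1-parameter team's call site to three lines while the 20-parameter team opts into `optional_config`, so neither audience pays for the other's complexity.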
Follow-up: How would you test all preset combinations without quadratic test explosion?
You're refactoring: moving a 100-line module that manages RDS into a smaller foundational module, but two root modules already use the old module with custom logic inside. How do you migrate without creating duplicated resource definitions?
Use `terraform state mv` for zero-downtime refactoring: 1) Create the new canonical module `modules/foundation/rds`. 2) In the first root module, move resources: `terraform state mv module.old_rds.aws_db_instance.main module.new_rds.aws_db_instance.main`. 3) Update the root module HCL to reference the new module. 4) Run `terraform plan` and verify it shows zero changes. 5) For custom logic that diverged, extract variables: identify what differs between the two uses and add a variable per divergence. 6) If the logic is fundamentally different, keep separate modules: `modules/foundation/rds-basic` vs `modules/foundation/rds-enhanced`. 7) Run the migrations sequentially, not in parallel. 8) Keep a rollback path: retain the old module code for two weeks.
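On Terraform 1.1+, a `moved` block is a declarative alternative to the CLI `state mv` step: it lives in version control, gets code review, and is applied automatically on the next plan for every workspace using the configuration. A sketch, assuming the module names above:

```hcl
# In the root module, after switching the HCL to the new module source.
moved {
  from = module.old_rds.aws_db_instance.main
  to   = module.new_rds.aws_db_instance.main
}
```

This is especially useful here because both root modules pick up the move on their own next plan, removing the need to run manual state surgery twice.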
Follow-up: How would you safely test this refactoring end-to-end with production data?
Your platform module has grown to 2000 lines. Adding features requires understanding the entire file, and new maintainers spend 2 weeks learning it. How do you break it into coherent sub-modules that still work as a single logical unit?
Decompose by concern: split into `networking`, `compute`, `data`, and `observability` sub-modules, all called from a parent. The parent orchestrates: `module "networking" { source = "./networking" vpc_cidr = var.vpc_cidr }`, then `module "compute" { source = "./compute" vpc_id = module.networking.vpc_id }`. Keep each sub-module to 200-300 lines. Add a `README.md` per sub-module documenting inputs/outputs. Use the parent's `outputs.tf` to re-export the important sub-module outputs. Version management: a major change in any sub-module bumps the parent version. Document the architecture in `ARCHITECTURE.md` with a diagram of module relationships. Test each sub-module in isolation with `terratest`, and test the integration in the parent module.
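The parent-as-orchestrator pattern in sketch form (sub-module names follow the split above; output names are illustrative):

```hcl
# modules/platform/main.tf — the parent orchestrates, owns no resources.
module "networking" {
  source   = "./networking"
  vpc_cidr = var.vpc_cidr
}

module "compute" {
  source = "./compute"
  vpc_id = module.networking.vpc_id  # cross-concern wiring stays in one file
}
```

```hcl
# modules/platform/outputs.tf — re-export so callers never reach
# into sub-modules directly.
output "vpc_id" {
  value = module.networking.vpc_id
}
```

Callers still see one module with one interface, so the decomposition is an internal detail that can evolve without breaking the 50 services consuming it.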
Follow-up: How would you organize monitoring and alerting across these sub-modules without repeating code?