Terraform Interview Questions

Custom Providers and Provisioners


Your company has a custom infrastructure platform (a proprietary SaaS with an API). Terraform has no provider for it. Currently you call the API manually via scripts. You want Terraform to manage this infrastructure too. Is a custom provider justified?

Evaluate custom provider ROI:
1) Frequency: how often do you manage this resource type? If fewer than ~5 times/year, a script might suffice. If weekly, the ROI of a custom provider is clear.
2) Complexity: a custom provider covering 3 resource types (service, config, user) is justified. For 1 simple resource, a script or the `http` provider may be enough.
3) Team size: a 1-2 person team might not have the bandwidth; a 10+ person team can own a custom provider.
4) Maintenance: a custom provider requires ongoing maintenance (API changes, Terraform Plugin SDK/Framework updates).
5) Building: if you decide to build, use the Terraform Plugin Framework (Go). A typical provider runs roughly 500-2000 lines per resource type.
6) Example usage: `resource "company_service" "app" { name = "..." }`, `resource "company_config" "app" { service_id = "..." value = "..." }` (note that resource blocks need both a type label and a name label).
7) Publishing: if broadly useful, publish to the Terraform Registry; others might use it.
8) Alternative: if the platform has a REST API, the `http` provider can call it, but it only offers a data source (`data "http"`), not managed resources, so it cannot track resource lifecycles. Less elegant, but it avoids a custom provider for read-only needs.
9) Decision: build a custom provider if you'll manage 10+ resources regularly. Otherwise use the `http` data source or scripts.
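A minimal sketch of how the hypothetical `company` provider might be consumed, alongside the read-only `http` data-source alternative (the provider source address, resource types, and attribute names are all assumptions for illustration):

```hcl
terraform {
  required_providers {
    company = {
      source  = "example.com/myorg/company" # hypothetical registry address
      version = "~> 0.1"
    }
  }
}

# Hypothetical custom-provider resources: full lifecycle management.
resource "company_service" "app" {
  name = "billing"
}

resource "company_config" "app_tier" {
  service_id = company_service.app.id
  value      = "production"
}

# Alternative without a custom provider: the hashicorp/http provider
# exposes only a data source, so this reads state but cannot manage it.
data "http" "service_status" {
  url = "https://platform.example.com/api/v1/services/billing"

  request_headers = {
    Authorization = "Bearer ${var.api_key}"
  }
}
```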

Follow-up: How would you version and maintain a custom provider over time?

You've built a custom provider for the company platform. The provider works locally in dev but fails in Terraform Cloud CI/CD with authentication errors. Local dev has credentials in env vars; CI/CD doesn't. How do you fix this?

Handle provider authentication securely:
1) Local dev: the provider reads credentials from a variable: `provider "company" { api_key = var.api_key }`; a local `.tfvars` file supplies the key.
2) Terraform Cloud: pass credentials via workspace variables (TFC UI -> Workspace -> Variables). Add `api_key` as a Terraform variable and mark it sensitive.
3) Declare the variable as sensitive in code: `variable "api_key" { sensitive = true }`.
4) Best practice: the provider should support multiple auth methods: env vars (local dev), Terraform variables (TFC), and cloud-native auth such as AWS assume-role (production). The env-var fallback belongs in the provider's Go code (e.g. a schema `DefaultFunc` reading `COMPANY_API_KEY`), not in HCL.
5) Test authentication: `terraform plan` fails fast if credentials are wrong (`terraform init` only downloads the provider and never calls the platform API). The error message should hint at an auth issue.
6) Documentation: show the team how to set credentials for each environment.
7) Rotation: if a key is exposed, regenerate it in the company platform and update the TFC variable immediately.
8) Audit: have the provider log all API calls for compliance.
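The variable and provider wiring described above can be sketched as follows (the `company` provider and its env-var fallback are assumptions; the pattern mirrors common provider conventions):

```hcl
# Sensitive credential, supplied by a TFC workspace variable in CI/CD
# or a local .tfvars file in dev. Never commit the value.
variable "api_key" {
  type      = string
  sensitive = true
  default   = null # when unset, the provider may fall back to an env var
}

provider "company" {
  # Assumption: the provider's Go schema defines a DefaultFunc that
  # reads COMPANY_API_KEY from the environment when api_key is null,
  # so local dev works without any .tfvars file at all.
  api_key = var.api_key
}
```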

Follow-up: How would you handle provider credentials that expire and need rotation?

You're considering provisioners for post-deployment configuration: installing a monitoring agent, configuring firewalls, running initialization scripts. The team disagrees about whether provisioners are "anti-Terraform". When are provisioners justified?

Provisioners are a last resort. Use them only when necessary:
1) Why they're an anti-pattern: provisioners run outside Terraform's resource model. If a creation-time provisioner fails, Terraform taints the resource and recreates it on the next apply, which is risky. They're not idempotent by default, so rerunning might break things, and their effects aren't tracked in state.
2) Justified use cases:
   2a) Legacy systems requiring custom setup.
   2b) Third-party applications with no Terraform support.
   2c) Temporary workarounds while a provider adds support.
3) Better alternatives:
   3a) User data: `user_data = file("init.sh")` on an EC2 instance (launch templates expect base64: `base64encode(file("init.sh"))`). Terraform then manages the script content.
   3b) cloud-init: EC2 user_data can carry a cloud-init config, which is idempotent.
   3c) Configuration management: run Ansible, Puppet, or Chef after Terraform creates the VMs.
   3d) Terraform providers: if the app has a REST API, use (or write) a provider instead.
4) If you must use a provisioner: `provisioner "local-exec" { command = "script.sh" on_failure = continue }`. `on_failure = continue` allows resource creation despite script failure.
5) Logging: the provisioner script should log its output for debugging.
6) Idempotency: make the script safe to run multiple times.
7) Example of an acceptable use: `provisioner "remote-exec" { inline = ["sudo systemctl restart application"] }` with `on_failure = continue` in case the restart is already in progress.
8) Document why the provisioner is needed and plan to remove it.
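The preferred alternative and the "if you must" escape hatch above can be sketched together (resource names, variables, and script paths are illustrative assumptions):

```hcl
# Preferred: pass the init script as user_data so Terraform tracks
# the script content and no provisioner is needed.
resource "aws_instance" "app" {
  ami           = var.ami_id # assumed variable
  instance_type = "t3.micro"
  user_data     = file("${path.module}/init.sh")
}

# If a provisioner is unavoidable, isolate it on a null_resource and
# scope the failure behavior so a flaky script can't block the apply.
resource "null_resource" "install_agent" {
  triggers = {
    instance_id = aws_instance.app.id # rerun when the instance changes
  }

  provisioner "local-exec" {
    command    = "./install-agent.sh ${aws_instance.app.private_ip}"
    on_failure = continue # instance still counts as created on failure
  }
}
```

Keeping the provisioner on a `null_resource` (rather than on `aws_instance` itself) means a provisioner failure never taints the instance.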

Follow-up: How would you prevent provisioner failures from blocking infrastructure deployment?

You use a `local-exec` provisioner to run a post-deployment script. The script fails on Windows (it uses bash syntax). Developers use both Mac and Windows. How do you write cross-platform provisioners?

Implement cross-platform provisioner logic:
1) Detect the OS: Terraform has no built-in OS function. A common heuristic is `locals { is_windows = substr(pathexpand("~"), 0, 1) != "/" }` (on Windows the home path starts with a drive letter), or pass the OS explicitly via a variable.
2) Branch the command, not the block: provisioner blocks do not support `count`, so use a conditional expression instead: `command = local.is_windows ? "powershell -File script.ps1" : "bash script.sh"`.
3) Set the interpreter explicitly for Windows: `provisioner "local-exec" { interpreter = ["PowerShell", "-Command"] command = ". .\\script.ps1" }`.
4) Better: use a platform-neutral Python script: `provisioner "local-exec" { command = "python3 script.py" }`. Python runs on Mac, Windows, and Linux (on Windows the executable may be `python`).
5) For `remote-exec`, the shell is determined by the target machine, not the operator's OS: `provisioner "remote-exec" { inline = ["set -e", "script command"] }` for Linux targets, or PowerShell commands for Windows targets.
6) Pick an OS-specific script file with a conditional: `locals { deploy_script = local.is_windows ? "deploy.ps1" : "deploy.sh" }`, then invoke that file from the provisioner command (don't pass the file *contents* as the command).
7) Test: run Terraform on both operating systems to verify the provisioners work.
8) Document setup instructions for both Mac and Windows.
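Putting the OS heuristic, the conditional command, and the explicit interpreter together (script paths are assumptions; the `pathexpand` trick is a common community heuristic, not an official API):

```hcl
locals {
  # Heuristic: the expanded home path starts with "/" on Mac/Linux
  # and with a drive letter (e.g. "C") on Windows.
  is_windows = substr(pathexpand("~"), 0, 1) != "/"
}

resource "null_resource" "post_deploy" {
  provisioner "local-exec" {
    # Provisioner blocks don't support count, so branch on the
    # interpreter and command instead of conditionally creating them.
    interpreter = local.is_windows ? ["PowerShell", "-Command"] : ["/bin/bash", "-c"]
    command     = local.is_windows ? ".\\scripts\\deploy.ps1" : "./scripts/deploy.sh"
  }
}
```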

Follow-up: How would you test provisioners on both platforms in CI/CD?

You've created custom provider for company platform. API changes frequently. Provider tests are manual and fragile. How do you build robust testing for custom providers?

Implement comprehensive provider testing:
1) Unit tests: mock API responses and test provider logic in isolation, e.g. in Go: `func TestCreateService(t *testing.T) { mockAPI := NewMockAPI(); provider := NewProvider(mockAPI); resource := provider.CreateService(...); assert.NotEmpty(t, resource.ID) }`.
2) Integration tests: use a staging API environment and deploy real resources: `terraform apply` with a test config.
3) Acceptance tests: use the Terraform plugin testing framework (`terraform-plugin-testing`). Given a config such as `resource "company_service" "test" { name = "test-service" }`, the harness runs apply, verification checks, and destroy.
4) Test scenarios:
   4a) Happy path: create a resource, update it, delete it.
   4b) Errors: invalid input, API errors, timeouts.
   4c) Edge cases: empty values, special characters, concurrency.
5) API mocking: use Go's `httptest` package to serve mock API responses so unit tests run without a real API.
6) Fixtures: store test data under `testdata/` (e.g. `testdata/service_response.json`) and have the mock API return the fixture.
7) CI/CD: run unit tests on every PR, integration tests nightly against staging, and acceptance tests weekly.
8) Documentation: the provider README should show how to run tests locally, e.g. `make test`.
9) Versioning: track the provider version and test against multiple API versions to ensure compatibility.
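A sketch of the HCL configuration an acceptance test might apply against staging (the `endpoint` attribute and variable names are assumptions; in practice the test framework injects this config from Go test code):

```hcl
# Point the hypothetical provider at staging so acceptance tests
# never touch production resources.
provider "company" {
  endpoint = "https://staging.platform.example.com" # assumed attribute
  api_key  = var.test_api_key
}

variable "test_api_key" {
  type      = string
  sensitive = true
}

# The resource under test; the harness applies it, asserts on the
# resulting attributes (e.g. a non-empty id), then destroys it.
resource "company_service" "test" {
  name = "acc-test-service"
}
```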

Follow-up: How would you handle provider testing if the API is rate-limited?
