Your Ansible playbook modifies the /etc/hosts file on 200 servers to apply a DNS override. Running the playbook twice creates duplicate entries, breaking applications. You need the playbook to be fully idempotent so it can run multiple times safely.
Use the lineinfile module with state: present, line: '<ip> <hostname>', and a regexp that matches any existing entry for that hostname, so reruns update the line in place instead of appending duplicates. For multi-line overrides, use blockinfile with marker comments so the managed block is replaced wholesale on each run. Avoid templating the entire /etc/hosts file unless Ansible owns it completely, since other tools may also write to it. Verify idempotency by running the playbook twice and confirming the second run reports zero changes; add a Molecule idempotence test to enforce this in CI.
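A minimal sketch of the lineinfile pattern; the hostname, IP, and task name are illustrative, not taken from the original playbook:

```yaml
- name: Ensure DNS override entry is present exactly once
  ansible.builtin.lineinfile:
    path: /etc/hosts
    # Match any existing entry for this hostname so reruns replace, not append
    regexp: '\sapp\.internal\.example\.com\s*$'
    line: '10.0.0.50 app.internal.example.com'
    state: present
```

Because regexp matches the old entry regardless of its IP, a changed override rewrites the line rather than adding a second one, and an unchanged override reports no change.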
Follow-up: If your compliance system requires archival of all /etc/hosts changes with timestamps, but the idempotency check must still report 'no changes', how would you separate the audit logging from the idempotency status?
Your team uses the shell module to run custom scripts in playbooks. The scripts create temporary files, download data, and perform calculations. After 3 playbook runs on the same day, the system has 300 temporary files consuming 10GB, causing disk space alerts. The scripts are non-idempotent by nature.
Wrap script execution in block + always to ensure cleanup: block: [run script], always: [remove the registered temp directory]. Use the tempfile module to generate unique temp directories: register: tmp_dir, then reference path: '{{ tmp_dir.path }}', so Ansible tracks and cleans temporary artifacts. Replace shell scripts with native Ansible modules where possible: get_url, unarchive, and similar modules are idempotent out of the box; where a raw command is unavoidable, prefer command over shell and pair it with creates:/removes: guards. For unavoidable scripts, implement idempotency markers: have each script check a lock file before executing: if [ ! -f .idempotent_flag ]; then run; touch .idempotent_flag; fi. Use changed_when: false for read-only scripts that don't modify state. Register script output and use when: result.stdout == '' to skip on repeat runs. Implement state files: scripts write JSON output to /var/lib/ansible/state and check it before re-running. Create a Molecule test that runs the playbook 5 times and verifies no disk space growth after cleanup.
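The tempfile-plus-block/always pattern above might look like the following sketch; the script path is hypothetical:

```yaml
- name: Run a non-idempotent script with guaranteed cleanup
  block:
    - name: Create a unique, tracked temp directory
      ansible.builtin.tempfile:
        state: directory
        prefix: ansible-
      register: tmp_dir

    - name: Run the script inside the temp directory
      ansible.builtin.command:
        cmd: /opt/scripts/process.sh   # hypothetical script
        chdir: "{{ tmp_dir.path }}"
  always:
    - name: Remove the temp directory whether or not the script succeeded
      ansible.builtin.file:
        path: "{{ tmp_dir.path }}"
        state: absent
      when: tmp_dir.path is defined
```

The always section runs even when the script task fails, so repeated runs cannot accumulate temp directories.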
Follow-up: If your scripts are parameterized (different inputs on each run), how would you implement idempotency markers that account for parameter variations without duplicating logic across 50+ scripts?
Your database backup playbook uses mysql_query to run backup commands, but the backup table already exists from the previous run. The module fails with 'table already exists' on second run, causing playbook failure even though backups succeed.
Prefer CREATE TABLE IF NOT EXISTS in the backup query so re-runs are harmless, rather than cycling state: absent then state: present, which destroys previous state. Use check_mode: false explicitly for tasks that must execute even when the playbook runs with --check. Implement pre-flight validation: use mysql_query with a SELECT to check whether the backup table exists: query: 'SHOW TABLES LIKE "backup"', register the result, and conditionally create the table only if the result is empty. Use mysql_db with state: present, which is naturally idempotent: if the database exists, no action is taken. Where an error message must be tolerated, prefer failed_when: result.failed and 'already exists' not in result.msg over a blanket ignore_errors: true, so genuine failures still stop the play. Implement transaction management: wrap the backup in BEGIN/COMMIT/ROLLBACK so operations are atomic and safe for re-runs. Test with Molecule: spin up a MySQL container, run the playbook twice, and verify both runs succeed and backup integrity holds. Log backup metadata with a timestamp to verify multiple runs create separate backup artifacts, not duplicates.
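The pre-flight check could be sketched like this; the database name and schema are placeholders, and the query_result indexing assumes the community.mysql collection's list-of-result-sets return shape:

```yaml
- name: Check whether the backup table already exists
  community.mysql.mysql_query:
    login_db: appdb                  # placeholder database name
    query: SHOW TABLES LIKE 'backup'
  register: table_check
  changed_when: false                # a read-only query never changes state

- name: Create the backup table only when it is missing
  community.mysql.mysql_query:
    login_db: appdb
    query: >-
      CREATE TABLE IF NOT EXISTS backup (
        id INT PRIMARY KEY AUTO_INCREMENT,
        taken_at DATETIME NOT NULL
      )
  when: table_check.query_result[0] | length == 0
```

With IF NOT EXISTS as a belt-and-braces guard, the second task is safe even if the conditional were skipped.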
Follow-up: If your backup table schema changes monthly, and you need backwards compatibility with old backups while ensuring idempotency, how would you handle schema migrations safely?
Your package installation playbook uses apt: name=nginx state=present, which is idempotent, but post-install tasks like systemctl enable nginx cause issues when nginx is already enabled. Team members report that the playbook shows 'changed' on every run even though the packages are already installed.
Use the systemd module with enabled: yes and daemon_reload: true, which is idempotent: running it when nginx is already enabled reports no changes. Check task ordering: ensure the daemon-reload happens before enabled: yes so unit file changes are picked up. If you must shell out to verify current state, register the check: shell: 'systemctl is-enabled nginx', register the result, and set changed_when: false, since querying state never changes it. Use native modules instead of shell: the systemd module is fully idempotent across all its flags. For multi-step installations, break into separate plays: a discovery play checks current state, a second play applies only the needed changes. Use handlers for restart/reload so they run at most once even if multiple tasks trigger them. Test idempotency with Molecule's built-in idempotence step, which fails the run if the second pass reports changes. Enable diff mode: diff: true shows what changed and what was skipped.
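A sketch of the idempotent service task plus a read-only state check that never pollutes change reporting:

```yaml
- name: Ensure nginx is enabled and running (no change when already so)
  ansible.builtin.systemd:
    name: nginx
    enabled: true
    state: started
    daemon_reload: true

- name: Inspect enablement state without affecting change reporting
  ansible.builtin.command: systemctl is-enabled nginx
  register: enable_state
  changed_when: false   # querying state is never a change
  failed_when: false    # rc is non-zero when disabled; that's not a failure
```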
Follow-up: If your organization has services that must be in specific state (running vs stopped) at specific times (maintenance windows), how would you implement time-based idempotency that respects maintenance schedules?
Your infrastructure uses Ansible to configure load balancers with health check endpoints. The health check script is deployed via playbook, but the script is updated weekly. Current playbook always restarts the health check service, causing brief downtime. You need to restart only when the script actually changes.
Use the copy module with src pointing to the updated script and register the result: register: script_update. Rely on the changed status to trigger the handler: notify: 'restart health check' fires only when the file content actually changed. Define handlers properly in the handlers: section; they run only when a notifying task reports a change. In the handler, use systemctl restart (or reload, if the service supports it) so the service shuts down gracefully rather than being killed. Implement a health check bypass during restart: the endpoint returns 503 Service Unavailable during the 10-second restart window, and the load balancer removes the node from rotation before the restart. Add pre-restart validation: the handler first runs a validation pass to confirm the new script is syntactically correct before allowing the restart. Use meta: flush_handlers to restart immediately rather than waiting for play completion, reducing downtime to 1-2 seconds. Monitor with a callback plugin: log every handler execution with a timestamp to an audit trail. Implement rollback: if post-restart health checks fail, the handler restores the previous script version automatically.
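A sketch of the copy-plus-handler wiring; the script path, destination, and service unit name are assumptions:

```yaml
# tasks section
- name: Deploy the health check script
  ansible.builtin.copy:
    src: files/healthcheck.sh           # hypothetical source file
    dest: /usr/local/bin/healthcheck.sh
    mode: '0755'
  notify: restart health check          # fires only if the file changed

- name: Apply any pending restart now instead of at end of play
  ansible.builtin.meta: flush_handlers

# handlers section of the same play or role
handlers:
  - name: restart health check
    ansible.builtin.systemd:
      name: healthcheck                 # hypothetical unit name
      state: restarted
```

Because copy computes a content checksum, redeploying an identical script reports no change and the handler never fires.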
Follow-up: If your health check script must validate 50 different endpoints and can fail on any one, how would you implement a validation layer that allows partial failures without triggering an automatic rollback?
Your playbook configures Nginx with SSL certificates from Let's Encrypt. Certificate renewal happens automatically via cron, but the playbook doesn't know when renewal occurred. Running the playbook reloads Nginx config unnecessarily, causing brief connection drops. You need to reload Nginx only when certificate actually changed.
Track certificate state with the stat module: stat: path=/etc/letsencrypt/live/domain/cert.pem, register the result, and store result.stat.checksum (more reliable than mtime, which can change without the content changing) in a fact file at /var/lib/ansible/cert_checksum.txt. On subsequent runs, stat again and compare against the stored value: if it differs, the certificate changed, so reload Nginx. Use changed_when: cert_checksum_changed to track changes accurately. Validate before reloading: nginx -t checks syntax without touching the running process. Use command: 'nginx -s reload', which reloads gracefully without dropping connections (unlike restart). For renewal tracking, use local facts: store JSON with certificate metadata in /etc/ansible/facts.d/nginx_facts.fact, which Ansible loads automatically as ansible_local facts. Implement monitoring: alert when a certificate is 30 days from expiry but the playbook hasn't observed a renewal. Create separate handlers for certificate reload versus config reload to avoid unnecessary restarts. Test with Molecule: use faketime to simulate certificate expiration and verify the playbook reloads only when needed.
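The compare-and-reload flow might be sketched as below, using a stored file checksum as the marker (a robust variant of comparing mtime); the marker path is an assumption:

```yaml
- name: Record the current certificate checksum
  ansible.builtin.stat:
    path: /etc/letsencrypt/live/domain/cert.pem
    checksum_algorithm: sha256
  register: cert_stat

- name: Read the checksum stored on the previous run (absent on first run)
  ansible.builtin.slurp:
    src: /var/lib/ansible/cert_checksum.txt
  register: stored_sum
  failed_when: false

- name: Reload Nginx only when the certificate content actually changed
  ansible.builtin.command: nginx -s reload
  when: >-
    stored_sum.content is not defined or
    (stored_sum.content | b64decode | trim) != cert_stat.stat.checksum

- name: Persist the current checksum for the next run
  ansible.builtin.copy:
    content: "{{ cert_stat.stat.checksum }}"
    dest: /var/lib/ansible/cert_checksum.txt
```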
Follow-up: If your organization uses multiple certificate providers (Let's Encrypt, Digicert, self-signed) with different renewal windows, how would you implement a unified idempotency check that works across all providers?
Your team manages DNS records via Ansible using the route53 module. A daily playbook adds A records for auto-scaled instances, but instances scale down at night. The current playbook leaves orphaned DNS records that consume quota. Running the playbook multiple times creates duplicates, and removing records by name causes false 'changed' reports.
Use route53 with state: present and overwrite: true so records are idempotent: if the record already exists with the same value, no change is reported. Implement cleanup in a separate play: use dynamic inventory to fetch current instance IPs, use the difference filter to identify orphaned DNS entries, and remove them with state: absent. Use alias records instead of A records where possible; alias records follow Auto Scaling group changes automatically and are naturally idempotent. Register route53 operations and tune changed_when so informational status messages aren't reported as changes. Implement a dry run: use check mode to preview DNS changes before applying. Use callback plugins to audit all DNS changes: log each route53 call to CloudTrail or syslog. For quota management, implement guardrails: before adding records, check the current record count and fail if approaching the limit. Test idempotency in Molecule using moto (mock AWS): run the playbook, scale instances, run it again, and verify only changed records are updated.
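An idempotent record task plus the orphan computation might look like this sketch; the zone, record names, and variable names are placeholders:

```yaml
- name: Ensure the instance A record exists (no change when value matches)
  amazon.aws.route53:
    state: present
    zone: example.com
    record: "{{ inventory_hostname }}.example.com"
    type: A
    ttl: 300
    value: "{{ ansible_host }}"
    overwrite: true

- name: Identify DNS entries with no backing instance
  ansible.builtin.set_fact:
    # current_dns_names / live_instance_names are assumed to be built
    # earlier from a zone listing and the dynamic inventory, respectively
    orphaned_records: "{{ current_dns_names | difference(live_instance_names) }}"
```

A follow-up loop over orphaned_records with state: absent then removes only the stale entries, keeping 'changed' reports truthful.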
Follow-up: If your DNS TTL is 300 seconds but cached resolvers ignore TTL and hold records for hours, how would you implement idempotency validation that accounts for resolver cache inconsistency?
Your Ansible playbook configures Kubernetes manifests stored in Git. Developers commit manifest changes daily. Playbook applies manifests with kubectl apply, but running playbook twice on same day reports 'changed' even though manifests are identical, causing unnecessary reconciliation cycles.
Use kubectl diff, which performs a server-side dry run, to preview changes before applying: report changed only if the diff output is non-empty (kubectl diff exits non-zero when differences exist). Use kubectl rollout history to track changes and compare between runs. Use server-side apply (kubectl apply --server-side), which handles concurrent modifications better and distinguishes real changes from no-ops. Register kubectl output and use changed_when: "'configured' in result.stdout or 'created' in result.stdout" so reconciliation runs that report 'unchanged' aren't counted as changes. Use Kustomize or Helm for templating: render manifests locally with kustomize build, compare a hash of the rendered output between runs, and apply only if the hash changed. Validate manifests before applying with kubectl apply --validate=true --dry-run=server. Monitor controller-manager logs to verify apply operations aren't triggering unnecessary reconciliation. Test with a Kind cluster in Molecule: deploy the manifests, run the playbook three times, and verify only the first run reports changes.
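The changed_when filtering on kubectl apply output could be sketched as follows; the manifest path is illustrative:

```yaml
- name: Apply manifests, reporting change only on create/configure
  ansible.builtin.command: kubectl apply -f /srv/manifests/
  register: kubectl_out
  changed_when: >-
    'configured' in kubectl_out.stdout or
    'created' in kubectl_out.stdout
  # Output lines ending in "unchanged" mean the live state already matched,
  # so identical reruns report ok rather than changed
```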
Follow-up: If your Kubernetes manifests include auto-generated fields (metadata.managedFields, status) that change on each apply, how would you filter those fields to achieve true idempotency?