Ansible Interview Questions

Tags, Conditionals, and Flow Control


Your Ansible playbook has 50 tasks. Teams only want to run specific groups of tasks: "run only security tasks", "skip database migrations", "run only if deploying to production". Currently all tasks run every time. How do you implement granular task execution control using tags?

Implement a comprehensive tagging strategy. Add tags to every task: `- name: Install packages` with `tags: [packages, base]`. Create tag groups: `base`, `security`, `packages`, `config`, `database`, `monitoring`. Teams run specific tags: `ansible-playbook site.yml --tags security` runs only security tasks. Use `--skip-tags` to exclude: `--skip-tags database` skips all database tasks. Ansible tags are flat, but a dotted naming convention such as `security.firewall` and `security.selinux` gives nested-looking organization without real hierarchy. Use the built-in `always` tag for critical tasks that should run even under a `--tags` filter (they can still be excluded with `--skip-tags always`). Maintain documentation: a list of all tags and their purpose. Use naming conventions like `service_taskgroup` for clarity: `nginx_config`, `nginx_restart`. Support selective runs per environment: dev tags may differ from prod. Define playbook variables for tag defaults, e.g. `deploy_tags: ['base', 'packages']`, and override them with the `--tags` flag. Test tagging: verify `--tags security` actually runs only security tasks. Monitor tag usage: track which teams use which tags and plan infrastructure accordingly. Use tag inheritance: tags applied to a role import or include propagate to every task in that role, e.g. a `web_deployment` role.
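A minimal sketch of this strategy (task names, variables, and tag values are illustrative):

```yaml
# site.yml fragment — tags let teams select or skip task groups
- name: Install base packages
  ansible.builtin.package:
    name: "{{ base_packages }}"
    state: present
  tags: [packages, base]

- name: Apply SELinux policy
  ansible.builtin.command: /usr/local/bin/apply-selinux.sh
  tags: [security]

- name: Record deployment metadata
  ansible.builtin.debug:
    msg: "Deploying {{ app_version | default('unknown') }}"
  tags: [always]        # built-in tag: runs even under a --tags filter
```

With this layout, `ansible-playbook site.yml --tags security` runs only the SELinux task (plus `always`-tagged tasks), and `--skip-tags packages` excludes package work.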

Follow-up: How would you implement tag dependencies where running one tag automatically runs prerequisites?

Your Ansible playbook uses complex conditionals: `when: ansible_os_family == "RedHat" and inventory_hostname.startswith("prod") and deployment_env == "production" and enable_feature_flag == true`. These conditionals are duplicated across tasks. Conditionals are hard to test and maintain. How do you architect cleaner conditionals?

Refactor complex conditionals into pre-computed facts. Create `pre_tasks` that compute derived facts: `run_on_prod: "{{ inventory_hostname.startswith('prod') }}"`. Create boolean facts encapsulating complex logic: `enable_security_config: "{{ is_prod and security_enabled }}"`. Then tasks use simple conditionals: `when: enable_security_config` instead of the full expression. Create a conditional matrix as a fact: a dict mapping condition combinations to booleans, e.g. `condition_matrix: { prod_feature_enabled: true, prod_feature_disabled: false, ...}`. Refactor duplicated conditionals into a shared variables section. Use `include_tasks` with a conditional: conditionally include a task file based on one simple condition. Document conditions: create a table showing condition combinations and expected behavior. Use the `assert` module to validate assumptions at runtime. Add a custom ansible-lint rule that fails if a conditional has more than three terms, forcing simplification. Keep the rationale for complex conditions in documentation rather than buried in inline expressions. Use inventory variables to customize behavior per host, reducing conditional complexity. Test conditionals: create a test playbook validating all condition combinations. Use the `debug` module to print computed facts, aiding debugging.
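A sketch of the pre-computed-facts pattern, assuming variables like `deployment_env` and `enable_feature_flag` exist in inventory:

```yaml
- hosts: all
  pre_tasks:
    - name: Compute derived condition facts once
      ansible.builtin.set_fact:
        is_prod: "{{ inventory_hostname.startswith('prod') and deployment_env == 'production' }}"
        is_redhat: "{{ ansible_os_family == 'RedHat' }}"

    - name: Combine into a single flag later tasks can test
      ansible.builtin.set_fact:
        enable_security_config: "{{ is_prod and is_redhat and (enable_feature_flag | bool) }}"

  tasks:
    - name: Apply security configuration
      ansible.builtin.include_tasks: security.yml   # hypothetical task file
      when: enable_security_config
```

Each later task reads like a sentence (`when: enable_security_config`), and the logic lives in exactly one place.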

Follow-up: How would you implement policy engine to evaluate complex conditionals consistently?

Your Ansible playbook uses `when:` conditionals on 30 tasks. It's unclear why some tasks are skipped—conditions are opaque. Teams debug by adding `debug` modules to print condition values. This creates debugging clutter. How do you make conditional behavior transparent?

Implement conditional transparency using descriptive naming and documentation. Use clear conditional names: `when: is_production` instead of `when: env == 'prod'`. Add comments above conditionals explaining the logic: `# Only run on production servers with security enabled`. Use the `debug` module strategically: `debug: msg="Condition X is {{ condition_value }}"` at the start of the playbook to show conditional values up front. Create conditional reference documentation: a table showing all conditionals and their meanings. Implement conditional analysis: a playbook that iterates through all condition combinations and shows which tasks run for each. Use `assert` to validate conditional assumptions: `assert: that: is_production or not security_required`. Note that `changed_when` only adjusts a task's changed status; it does not explain skips, so give tasks descriptive names so a `skipping:` line in the output is self-explanatory. Implement a callback plugin that logs conditional evaluations: `Task X: condition 'is_prod=true' evaluated to True, running task`. Enable verbosity: `ansible-playbook -vvv` shows conditional evaluation. Use `--check` mode to test conditionals without actual execution. Store conditional mappings in an external YAML file for clarity. Create a visualization of conditional flow: show which tasks run for which condition combinations. Document edge cases: when conditionals behave unexpectedly.
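The debug-at-start and assert patterns above might look like this (the fact names `is_production` and `security_required` are assumptions):

```yaml
pre_tasks:
  - name: Show the conditionals this run will use
    ansible.builtin.debug:
      msg: "is_production={{ is_production }}, security_required={{ security_required }}"

  - name: Fail fast on an invalid condition combination
    ansible.builtin.assert:
      that:
        - is_production or not security_required
      fail_msg: "security_required is only meaningful on production hosts"
```

One `debug` task at the top replaces the ad-hoc `debug` clutter scattered through the playbook, and the `assert` turns a silent misconfiguration into an immediate, explained failure.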

Follow-up: How would you implement conditional testing to verify all condition branches execute correctly?

Your playbook uses `block/rescue/always` for error handling on 20 plays. Some blocks handle errors gracefully, others allow errors to propagate. Error handling logic is fragmented across playbook. Teams don't know what happens if task fails in their play. How do you standardize error handling?

Implement standardized error handling patterns. Create block templates for common scenarios: 1) Critical tasks (fail immediately), 2) Important tasks (retry 3 times), 3) Optional tasks (ignore failures). Store in `includes/error_handlers.yml`. All plays use consistent blocks: `block: [tasks] rescue: [recovery] always: [cleanup]`. Implement role-based error handling: security-critical plays fail on any error, routine plays tolerate some failures. Create error handling matrix: task type vs. acceptable failure modes. Use `register` + `when` for selective error handling: retry only on transient errors (timeout), fail on permanent errors (auth). Implement error classification: tasks classify errors as `transient`, `permanent`, `recoverable`. Error handler responds appropriately per classification. Use `failed_when` to define what constitutes failure (not all non-zero exits are failures). Use `ignore_errors: true` selectively—document why error is acceptable. Implement error handlers that log details before proceeding: always record failure context. Create `always` block that runs cleanup regardless of success/failure. Document error handling strategy: which plays fail-fast, which tolerate errors? Test error scenarios: simulate failures and verify error handling activates.
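One reusable template for the "important task" tier (retry transient errors, log context, always clean up); module choices and paths are illustrative:

```yaml
- name: Important task group — retried, logged, cleaned up
  block:
    - name: Deploy application archive
      ansible.builtin.unarchive:
        src: app.tar.gz
        dest: /opt/app
      register: deploy_result
      retries: 3
      delay: 10
      until: deploy_result is succeeded
  rescue:
    - name: Record failure context before failing the play
      ansible.builtin.debug:
        msg: "Deploy failed: {{ ansible_failed_result | default({}) }}"

    - name: Re-raise the failure so the play stops
      ansible.builtin.fail:
        msg: "Deployment failed after 3 retries"
  always:
    - name: Remove temporary files regardless of outcome
      ansible.builtin.file:
        path: /tmp/deploy
        state: absent
```

The "critical" tier drops the `retries`, and the "optional" tier replaces the `fail` in `rescue` with a logged warning, so every play in the repository reads the same way.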

Follow-up: How would you implement error recovery that automatically tries alternate approaches?

Your Ansible playbook controls workflow: task A runs conditionally, if successful task B runs, else task C. Workflow is sequential: if condition X, do step 1 then 2 then 3, else do step 4 then 5. Playbook is becoming increasingly complex with nested conditionals. How do you manage complex workflows?

Implement workflow patterns using plays and handlers. Instead of nested conditionals within a play, create separate plays for each workflow branch. Play 1: `when: is_production` runs production tasks. Play 2: `when: not is_production` runs dev tasks. This separates the logic clearly. Use handlers for sequential workflows: task A notifies handler B, and handlers can in turn notify further handlers. Use `meta: flush_handlers` to control handler timing. Implement playbook composition: break the monolithic playbook into smaller focused plays. The main playbook imports focused playbooks: `import_playbook: deploy_base.yml` then `import_playbook: deploy_app.yml`. Apply conditions to imports: `import_playbook: production_tasks.yml` with `when: is_prod` (there is no `include_playbook`; the condition on `import_playbook` is applied to every imported play and task). Implement the workflow as a Tower workflow rather than a playbook: Tower workflows can branch and route based on outcomes, e.g. on-success run the next job template, on-failure run a recovery template. Create a state machine pattern: use `set_fact` to track state, with conditionals on the state value. Use `block` to group related tasks together. Document the workflow: a diagram showing which tasks run under which condition. Use ansible-lint to identify complex conditionals and refactor them. Test the workflow: run with different conditions and verify the correct path executes.
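The composition-with-conditional-imports idea, sketched as a top-level playbook (file names are hypothetical):

```yaml
# site.yml — each branch lives in its own focused playbook
- import_playbook: deploy_base.yml

- import_playbook: production_tasks.yml
  when: is_prod              # applied to every play and task imported here

- import_playbook: dev_tasks.yml
  when: not is_prod
```

Each branch file stays free of branching logic, so the "if X do steps 1-3, else steps 4-5" workflow is visible in one short file instead of being spread across nested conditionals.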

Follow-up: How would you implement feature flags as conditionals for gradual rollout?

Your team uses `--tags` to run subsets of playbook, but teams frequently make mistakes: "I ran `--tags packages` but database role was included and caused issue." Tags aren't enforced—developers need to know which tags to use. How do you make tags discoverable and enforce correct usage?

Implement tag discovery and enforcement. Create a tag registry: document all available tags, their purpose, and their implications. A `tags.yml` lists: `deploy_base`: update base packages, `deploy_app`: deploy application code, etc. Document tags in playbook comments: at the top of the playbook, list available tags and when to use them. Use the built-in listing flag: `ansible-playbook site.yml --list-tags` outputs all tags used in the playbook without running it. Implement tag validation: lint the playbook to verify all tags are documented, e.g. a custom `ansible-lint` rule that fails if a task has an undocumented tag. Implement tag defaults: if no tags are specified, run sensible defaults. Example: `--tags deploy` runs all deployment tasks (depends on explicit tag definition). Provide tag help: documentation showing "run `--tags deploy` to deploy, `--tags rollback` to rollback". Use Tower job templates with tag pre-selection: a dropdown menu lets users choose tags and prevents wrong selections. Support tag combinations: `--tags deploy,config` runs both deploy and config tags. Document interactions: does running one tag implicitly run others? Use the `always` tag for critical tasks that must run regardless. Test tag combinations: verify `--tags X` produces the expected result. Create a runbook: teams reference it for which tags to use for their tasks.
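The tag registry could be a plain YAML file next to the playbook (file name and entries are assumptions), which a lint step can cross-check against `ansible-playbook site.yml --list-tags`:

```yaml
# tags.yml — hypothetical registry of every approved tag
deploy_base: "Update base OS packages and common configuration"
deploy_app: "Deploy application code and restart services"
database: "Run database migrations — do NOT combine with packages-only runs"
rollback: "Roll back the most recent application deployment"
```

Any tag that appears in `--list-tags` output but not in this file fails CI, so undocumented tags never reach the teams running the playbook.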

Follow-up: How would you implement tag versioning where old tag names are deprecated gradually?

Your Ansible playbook has a `when: not skip_this_entire_play` conditional that skips entire play based on flag. However, if flag accidentally left true, entire play is skipped in production. How do you prevent unintended skips?

Implement skip prevention mechanisms. Use explicit conditions instead of negation: `when: run_this_play` (positive) instead of `when: not skip_this`. Positive conditions are easier to verify. Implement validation: assert that skip flags are in expected state before playbook runs. Example: `assert: that: skip_this_play == false or deployment_env != 'production'`. Implement warning alerts: if condition would skip important plays, alert and require confirmation. Use Tower approval nodes: before running plays with skip conditions, require approval. Implement naming: use clear flag names: `skip_database_migrations: true` is explicit. Document flags: maintain list of all skip flags and their purpose. Implement defaults: skip flags default to `false` (don't skip). Require explicit `true` to skip. Use Tower variables with validation: GUI prevents invalid flag values. Implement monitoring: track which plays are skipped, alert if skip condition evaluates unexpectedly. Use `debug` module to log flag values at playbook start. Create playbook test: verify no plays are unintentionally skipped. Use version control: document when skip flags were added and why. Implement expiration: skip flags have expiration date, must be explicitly renewed. Implement audit logging: log whenever condition causes play to skip.
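The positive-flag-plus-assert pattern, sketched (flag and variable names are illustrative):

```yaml
- hosts: all
  vars:
    run_database_migrations: true     # positive flag; default is to run
  pre_tasks:
    - name: Refuse to silently skip migrations in production
      ansible.builtin.assert:
        that:
          - run_database_migrations or deployment_env != 'production'
        fail_msg: "Migrations may not be skipped in production without an explicit override"

    - name: Log flag values for the audit trail
      ansible.builtin.debug:
        var: run_database_migrations

  tasks:
    - name: Run database migrations
      ansible.builtin.command: /opt/app/migrate.sh
      when: run_database_migrations
```

Because the flag is positive and defaults to `true`, a forgotten variable means the play runs; skipping requires someone to set `run_database_migrations: false` deliberately, and the assert blocks that in production.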

Follow-up: How would you implement A/B testing where playbook behavior differs based on experiment ID?

Your Ansible playbook executes tasks in strict order: task 1, 2, 3, 4, 5. However, tasks 2 and 3 are independent—they could run in parallel to save time. Task 3 takes 10 minutes, task 2 takes 5 minutes. Currently tasks run sequentially (15 min total). How do you optimize task ordering with conditionals and parallel execution?

Implement parallel execution using `async` and `poll`. Make tasks 2 and 3 async: `async: 600 poll: 0` (fire-and-forget). Tasks 2 and 3 start in parallel without waiting for each other. Use `async_status` to collect results later: `register` each job, then poll for completion after both have started. Note that `forks` only increases parallelism across hosts, not across tasks on a single host, so it helps multi-host runs but not this case. Implement task grouping: group independent tasks into separate plays or Tower workflow nodes that run in parallel. Use Ansible Tower workflows: create parallel job templates for tasks 2 and 3, and run task 4 only after both complete. Reorder tasks: if task 5 doesn't depend on tasks 2-3, move it up to overlap with them. Analyze task dependencies: trace which tasks consume facts or `register`ed results from earlier tasks. Create a dependency graph showing parallelizable vs. sequential tasks. Test parallel execution: verify there are no race conditions or resource conflicts. Monitor parallel task execution: track the time savings from parallelization. Document the task ordering rationale: why are tasks in the current order, and can they be reordered?
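The async fire-and-forget pattern, sketched with hypothetical scripts for tasks 2 and 3:

```yaml
- name: Start task 2 in the background (10-minute budget)
  ansible.builtin.command: /opt/scripts/task2.sh
  async: 600
  poll: 0
  register: task2_job

- name: Start task 3 in the background
  ansible.builtin.command: /opt/scripts/task3.sh
  async: 600
  poll: 0
  register: task3_job

- name: Wait for both background jobs to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop:
    - "{{ task2_job }}"
    - "{{ task3_job }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 10
```

Both jobs start immediately, so wall-clock time is roughly the longer of the two (about 10 minutes) instead of their sum (15 minutes); task 4 follows the wait task as before.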

Follow-up: How would you implement dynamic task scheduling based on system load?
