Ansible Interview Questions

Callback Plugins and Custom Output


Your Ansible playbooks emit the verbose default output format, which is hard to parse. Teams need custom output: compact, structured (JSON), with only relevant information. A Tower dashboard needs to consume playbook output programmatically. How do you implement custom output plugins?

Implement a custom callback plugin in an Ansible collection. Create `plugins/callback/custom_output.py` containing a `CallbackModule` class that extends `CallbackBase` (from `ansible.plugins.callback`). Override the callback methods: `v2_runner_on_ok` (task succeeded), `v2_runner_on_failed` (task failed), `v2_runner_on_skipped` (task skipped). In each method, format output as desired: JSON, CSV, or custom text. Register the callback in `ansible.cfg` under `[defaults]`: `callback_plugins = ./plugins/callback`, and make it the default stdout plugin: `stdout_callback = custom_output`. Implement structured JSON output: `json.dumps({'host': host, 'task': task_name, 'status': 'ok'})`. Add custom fields: execution time, resource changes, error details. Implement log rotation: output to a file that is rotated on size/date. For Tower integration, output JSON that Tower can parse via its API. Implement a callback that sends events to a monitoring system: each task completion triggers an event. Test the callback with `ansible-playbook --verbose` to verify the output format. Document callback behavior: what fields are included, what's omitted. Implement callback versioning: maintain compatibility with old Tower versions.
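A minimal sketch of such a stdout callback: the class must be named `CallbackModule`, and in current Ansible it extends `CallbackBase` from `ansible.plugins.callback`. The plugin name `custom_output` and the `format_event` helper are illustrative; the import guard lets the formatting logic be exercised without Ansible installed.

```python
# Hypothetical plugins/callback/custom_output.py -- a minimal JSON stdout callback.
import json

try:
    from ansible.plugins.callback import CallbackBase
except ImportError:  # lets format_event() be tested without Ansible installed
    CallbackBase = object


def format_event(host, task, status, changed=False):
    """Render one task result as a single JSON line."""
    return json.dumps({'host': host, 'task': task,
                       'status': status, 'changed': changed})


class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'stdout'          # replaces the default output plugin
    CALLBACK_NAME = 'custom_output'   # must match the stdout_callback setting

    def v2_runner_on_ok(self, result):
        self._display.display(format_event(
            result._host.get_name(), result._task.get_name(), 'ok',
            result._result.get('changed', False)))

    def v2_runner_on_failed(self, result, ignore_errors=False):
        self._display.display(format_event(
            result._host.get_name(), result._task.get_name(), 'failed'))

    def v2_runner_on_skipped(self, result):
        self._display.display(format_event(
            result._host.get_name(), result._task.get_name(), 'skipped'))
```

Because each event is one JSON line, a dashboard can consume the stream with any line-oriented JSON parser.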

Follow-up: How would you implement callback plugin chaining where multiple callbacks process output sequentially?

Your Ansible playbooks run critical production deployments. You need detailed audit trail: every task execution must be logged with timestamp, user, host, result, and changes made. Standard Ansible logging isn't sufficient. How do you implement comprehensive audit logging?

Implement custom callback plugin for audit logging. In callback methods, log full context: playbook name, play name, task name, host, user, timestamp, exit code, stdout, stderr. Write to centralized logging system (Splunk, ELK, Datadog). Structure log as JSON for parsing: `{"timestamp": "2026-04-07T10:00:00Z", "user": "deployer", "playbook": "deploy.yml", "host": "prod-1", "task": "restart service", "status": "ok", "changed": true}`. Implement callback that logs to syslog for long-term retention. Use syslog forwarding to external system (CloudWatch, Papertrail). Implement immutable audit: logs must be append-only, no modification/deletion. Use callback to validate log integrity: calculate hash of logs, verify hash unchanged. Implement real-time alerting: callback sends events to alerting system, alert on production changes. Use callback to correlate logs: add deployment ID to all logs, enabling trace of entire deployment. Implement callback that stores logs in database for querying. For compliance, implement audit retention policy: logs must be kept for 7 years. Use callback to encrypt sensitive data in logs (passwords, keys).
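The append-only, integrity-checked audit requirement can be sketched as a hash chain: each record embeds the previous record's hash, so modifying or deleting any earlier entry breaks verification. The field names here are illustrative.

```python
import hashlib
import json


def audit_event(prev_hash, **fields):
    """Build an audit record chained to the previous record's hash."""
    record = dict(fields)
    record['prev_hash'] = prev_hash
    payload = json.dumps(record, sort_keys=True)
    record['hash'] = hashlib.sha256(payload.encode()).hexdigest()
    return record


def verify_chain(records):
    """Recompute every hash and confirm each record links to its predecessor."""
    prev = records[0]['prev_hash']
    for rec in records:
        body = {k: v for k, v in rec.items() if k != 'hash'}
        if body['prev_hash'] != prev:
            return False
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != rec['hash']:
            return False
        prev = rec['hash']
    return True
```

A callback would emit one `audit_event` per task, carrying the deployment ID for correlation; a periodic compliance job replays the log through `verify_chain`.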

Follow-up: How would you implement callback plugin that correlates playbook execution with application metrics?

Your teams use Ansible Tower across 20 different tools (Slack, PagerDuty, Jira, ServiceNow, Datadog, etc.). Each tool needs different notifications from playbook: Slack gets summary, PagerDuty gets escalation alerts, Datadog gets metrics. Managing multiple notification plugins is complex. How do you centralize notifications?

Implement a universal callback plugin that acts as notification router. In callback methods, extract relevant data and route to appropriate destinations. Implement routing rules: `if status == 'failed' and severity == 'critical': send_to(['pagerduty', 'slack', 'servicenow'])`. Create notification handler that formats data per destination. Use HTTP webhooks to send notifications to external systems. Implement retry logic with exponential backoff for failed notifications. Structure notifications per tool requirements: Slack uses formatted text with buttons, PagerDuty uses incidents with severity, Jira uses issue creation. Implement notification templates: define what information goes to each tool. Store webhook URLs in Tower credentials vault. Implement notification throttling: don't spam destinations with duplicate notifications. Use callback to deduplicate notifications: same failure repeated 5 times = single notification, not 5. Implement notification enrichment: add context (change ticket, deployment environment) before sending. Test notifications: simulate failures and verify destinations receive correct notifications. Implement callback metrics: track notification delivery success rates per destination.

Follow-up: How would you implement callback plugin that creates incidents in response to deployment failures?

Your custom callback plugins are written in Python by different teams. One plugin crashes silently, breaking Tower job output. Another plugin is slow (takes 30 seconds per task). Callback plugin reliability is critical. How do you ensure plugin resilience?

Implement callback plugin framework with error handling and monitoring. Wrap all callback methods in try-catch to prevent crashes: `try: format_output() except Exception: log_error()`. Implement plugin isolation: use subprocess or container to run plugin in separate process, preventing crashes from affecting Ansible. Implement plugin timeouts: if callback doesn't complete in 5 seconds, timeout and continue. Use plugin health monitoring: track callback execution times per plugin. Alert if callback takes >1 second (indicates problem). Implement plugin version validation: verify plugin is compatible with Ansible version. Test plugins in CI/CD: run callback tests that verify: 1) handles all event types, 2) completes within timeout, 3) doesn't crash on exceptions. Implement plugin logging: callbacks log their execution for debugging. Use callback to monitor other callbacks: meta-callback tracks callback health. Implement fallback callbacks: if custom callback fails, fallback to default Ansible callback. Store plugin metrics: track success rate, latency per callback per plugin method. Implement plugin rate limiting: prevent plugin from overwhelming downstream systems.
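The try/except wrapping and slow-callback alerting can be centralized in a decorator applied to every callback method, as in this sketch (the 1-second threshold comes from the answer; the class and method names are illustrative):

```python
import functools
import logging
import time

SLOW_THRESHOLD = 1.0  # seconds before a callback is flagged as slow


def resilient(method):
    """Swallow and log exceptions; flag slow callback executions."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        start = time.monotonic()
        try:
            return method(self, *args, **kwargs)
        except Exception:
            # A buggy plugin must never abort the Ansible run.
            logging.exception("callback %s failed; continuing", method.__name__)
            return None
        finally:
            elapsed = time.monotonic() - start
            if elapsed > SLOW_THRESHOLD:
                logging.warning("callback %s took %.2fs", method.__name__, elapsed)
    return wrapper


class SafeCallback:
    @resilient
    def v2_runner_on_ok(self, result):
        raise RuntimeError("simulated plugin bug")  # stand-in for real formatting
```

A subprocess- or container-based isolation layer would go one step further, but the decorator alone already converts silent crashes into logged, non-fatal events.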

Follow-up: How would you implement callback plugin testing framework to verify plugin behavior?

Your Ansible playbooks process sensitive data (PII, credentials). Standard callback output logs this sensitive data in plaintext, violating compliance. You need callback plugins that automatically redact/mask sensitive information. How do you implement secure output filtering?

Implement callback plugin with sensitive data redaction. In callback methods, scan output for sensitive patterns: credit card numbers, SSNs, passwords. Use regex to identify patterns: `\d{4}-\d{4}-\d{4}-\d{4}` for card numbers. Redact matching data: replace with `[REDACTED]` or `***`. Use configuration file to specify sensitive keys: `['password', 'api_key', 'secret']`. In output, replace values of these keys. Implement context-aware redaction: `password: mypassword123` → `password: [REDACTED]`. Use callback to mark sensitive tasks with `no_log: true`: don't log output at all. Implement callback that queries external service for sensitive patterns: if system knows which variables are secrets, callback redacts them. Use callback to audit redaction: log what data was redacted for compliance verification. Implement callback that stores unredacted logs separately (encrypted) for authorized debugging. Test redaction effectiveness: verify no sensitive data leaks in logs. Implement compliance checking: scan logs automatically for unredacted sensitive data, alert on issues. Store redaction rules in vault to prevent unauthorized modification.
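A recursive redaction helper combining both approaches above (key-based and pattern-based) might look like this sketch; the key list and the card/SSN regexes are the illustrative ones from the answer and would come from a protected configuration file in practice:

```python
import re

SENSITIVE_KEYS = {'password', 'api_key', 'secret'}
PATTERNS = [
    re.compile(r'\b\d{4}-\d{4}-\d{4}-\d{4}\b'),  # card-number-like
    re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),        # SSN-like
]


def redact(obj):
    """Recursively mask sensitive keys and pattern matches in callback output."""
    if isinstance(obj, dict):
        return {k: '[REDACTED]' if k.lower() in SENSITIVE_KEYS else redact(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    if isinstance(obj, str):
        for pattern in PATTERNS:
            obj = pattern.sub('[REDACTED]', obj)
        return obj
    return obj
```

The callback applies `redact` to each `result._result` dict before any formatting or logging, so no downstream sink ever sees the raw values.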

Follow-up: How would you implement callback plugin that correlates changes with application logs?

Your callback plugins output structured JSON, but teams need different formats: some want JSON, others want CSV or Prometheus metrics format. Supporting multiple output formats in single callback is complex. How do you architect multi-format callbacks?

Implement plugin architecture that separates data collection from formatting. Create base callback that collects data from Ansible events. Then implement separate formatters for each output format: JSON formatter, CSV formatter, Prometheus formatter. Use factory pattern: `get_formatter(format_type)` returns appropriate formatter. In ansible.cfg, specify output format: `callback_format: json` or `callback_format: prometheus`. Implement formatter interface that each format implements: methods like `format_task_result()`, `format_summary()`. Base callback collects data, then iterates through registered formatters, calling each. For multiple output simultaneously: register multiple formatters, base callback outputs to all. Use configuration per formatter: JSON callback uses different fields than CSV callback. Implement streaming vs. batching: JSON can stream event-by-event, CSV batches for proper format. Test each formatter independently. Implement formatter versioning: JSON schema version 1.0, version 1.1 adds new fields. Document each format's schema for consumers.
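The factory pattern described above can be sketched as follows; the formatter names, the `format_task_result` interface, and the CSV column set are assumptions:

```python
import csv
import io
import json


class JSONFormatter:
    def format_task_result(self, event):
        return json.dumps(event, sort_keys=True)


class CSVFormatter:
    FIELDS = ['host', 'task', 'status']   # hypothetical fixed column order

    def format_task_result(self, event):
        buf = io.StringIO()
        csv.DictWriter(buf, fieldnames=self.FIELDS).writerow(
            {k: event.get(k, '') for k in self.FIELDS})
        return buf.getvalue().strip()


FORMATTERS = {'json': JSONFormatter, 'csv': CSVFormatter}


def get_formatter(format_type):
    """Factory: map the configured callback_format name to a formatter."""
    return FORMATTERS[format_type]()
```

The base callback reads the configured format once, calls `get_formatter`, and passes every collected event through `format_task_result`; registering several formatters gives simultaneous multi-format output.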

Follow-up: How would you implement callback plugin that aggregates results from multiple playbook runs?

Your Ansible Tower environment runs 1000 jobs daily. Each job generates callback output that consumes storage. After one month, 30 GB of callback logs have accumulated, and storage costs are significant. Callback log retention is 90 days. How do you implement efficient callback log storage?

Implement tiered callback log storage. Recent logs (7 days): store in hot storage (fast SSD) for quick analysis. Warm logs (8-30 days): compress and move to slower storage. Cold logs (31-90 days): archive to cheapest storage (S3 Glacier). Implement callback compression: compress logs to 10-20% of original size. Use gzip or brotli for better compression. Implement log sampling: store every task execution, but sample callback details (e.g., store every 10th task's full output). Implement log rotation: daily rotate callback logs, compress previous day's logs. Use TTL policy: automatically delete logs older than 90 days. Implement callback filtering: don't store verbose output, only structured JSON summary. Use Tower's built-in log retention settings: configure retention policy. Implement callback to store only failures (not successes) for verbose output, reducing storage 90%. Store success summaries (counters) not detailed logs. Use object storage (S3) with lifecycle policies: auto-transition to Glacier after 30 days. Implement log deduplication: identical callback outputs merged to single copy. Query old logs from Athena/Glue to analyze trends without storing full logs.

Follow-up: How would you implement callback plugin for real-time dashboard showing live job progress?

Your organization standardized on custom callback plugins for all playbook output. However, teams sometimes run playbooks with `ANSIBLE_CALLBACK_PLUGINS` pointed at ad-hoc directories (e.g. `ANSIBLE_CALLBACK_PLUGINS=/tmp` with `ANSIBLE_STDOUT_CALLBACK=test_plugin`) for debugging, bypassing standard plugins. This causes inconsistent logging and audit gaps. How do you enforce callback plugin standardization?

Implement Ansible configuration enforcement. Set the `ANSIBLE_CALLBACKS_ENABLED` environment variable (named `ANSIBLE_CALLBACK_WHITELIST` before Ansible 2.11) to allow only approved callbacks. In ansible.cfg, disable plugin loading from arbitrary paths: use `callback_plugins` to specify only approved directories. Use Ansible Tower to enforce: Tower doesn't allow custom plugin specification at job launch. Implement pre-flight checks: the Tower job template validates that the playbook uses approved callbacks. Use ansible-lint to detect unauthorized callback usage: create a custom rule that fails if a playbook specifies non-approved callbacks. Implement a callback plugin registry: maintain a list of approved plugins. Implement a policy engine that blocks execution of playbooks with unapproved callbacks. Use RBAC: only admins can approve new callback plugins. Document approved callbacks in a runbook. Implement monitoring: alert if a playbook runs with a non-standard callback. Use Kubernetes pod security policies (if running in K8s) to prevent loading arbitrary plugins. Implement a Git pre-commit hook to prevent committing playbooks with unauthorized callbacks. For debugging, provide an approved debug callback that logs verbosely but still respects standardization.
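The pre-flight check can be sketched as a function the Tower job template (or a CI gate) runs against the job's resolved callback configuration; the approved-plugin names are hypothetical:

```python
# Hypothetical registry of approved callback plugin names.
APPROVED_CALLBACKS = {'custom_output', 'audit_log', 'notification_router'}


def validate_callback_config(stdout_callback, enabled_callbacks):
    """Pre-flight check: list every configured callback that is not approved.

    An empty return value means the job configuration passes; anything else
    should block the launch and raise an audit alert.
    """
    return [cb for cb in [stdout_callback, *enabled_callbacks]
            if cb not in APPROVED_CALLBACKS]
```

Running this against the effective `stdout_callback` and `callbacks_enabled` values catches both ansible.cfg overrides and environment-variable bypasses before the playbook executes.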

Follow-up: How would you implement callback plugin that tracks infrastructure changes for compliance auditing?
