Your static inventory file has grown to 500 servers, but your cloud infrastructure auto-scales to 2000 servers during peak traffic and shrinks to 200 at night. Manual inventory updates are causing deployment failures. How do you implement dynamic inventory to track these changes automatically?
Implement cloud-native dynamic inventory plugins. For AWS, use the `amazon.aws.aws_ec2` inventory plugin to query instances by tag, VPC, or security group in real time. Configure it in a file ending in `aws_ec2.yml` and narrow the query with `filters`, a mapping of EC2 filter names to values (e.g. `tag:Environment: production`). For multi-cloud, use each provider's native inventory plugin where one exists, or write a custom Python inventory script that queries the provider's API. Set up caching with `cache_plugin: jsonfile` and `cache_timeout: 300` to avoid constant API calls during large deployments. Refresh inventory mid-run where needed with a `meta: refresh_inventory` task. Monitor inventory plugin latency and alert when inventory queries exceed their expected duration, which usually indicates API problems.
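A minimal `aws_ec2` plugin configuration along these lines might look as follows; it is a sketch, and the region, tag values, and cache path are assumptions:

```yaml
# inventory/prod.aws_ec2.yml -- sketch; region, tags, and cache path are examples
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  instance-state-name: running      # only running instances
  tag:Environment: production       # hypothetical tagging scheme
keyed_groups:
  - key: tags.Role                  # builds groups like role_webserver
    prefix: role
cache: true
cache_plugin: jsonfile
cache_timeout: 300                  # seconds before the next live API query
cache_connection: /tmp/ansible_aws_inventory
```

Running `ansible-inventory -i inventory/prod.aws_ec2.yml --graph` then shows the tag-derived groups without touching the API once the cache is warm.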
Follow-up: How would you handle inventory refresh during a large cluster deployment where nodes are being added mid-playbook?
Your Ansible Tower deployment needs to sync inventory from multiple sources: AWS EC2 for compute, Kubernetes API for container hosts, and a custom CMDB database for legacy systems. These sources have different update frequencies and formats. How do you unify them?
Create a meta-inventory that aggregates multiple sources. Implement a custom Python inventory script that: 1) queries the AWS EC2 API with caching, 2) pulls Kubernetes nodes via the `kubernetes.core.k8s` inventory plugin, 3) fetches legacy systems from the CMDB with fallback handling. Use Ansible's inventory merging to combine results. Set different cache TTLs per source (AWS: 5 min, K8s: 30 sec, CMDB: 30 min). Implement source-specific grouping with the `compose` and `keyed_groups` options to attach group metadata from each source. Handle conflicts when the same hostname appears in multiple sources by implementing priority rules (Kubernetes nodes take precedence over legacy systems). Use Ansible Tower's inventory sync job templates to refresh sources at staggered intervals, avoiding API rate limits.
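The merging core of such a script can be sketched as below, assuming the per-source fetch results are already available as `{hostname: hostvars}` maps and that sources are listed highest-priority first; the source names and hosts are made up for illustration:

```python
#!/usr/bin/env python3
"""Sketch of a meta-inventory merge; not a complete inventory script."""
import json


def merge_sources(sources):
    """Merge (group, {host: hostvars}) pairs into inventory JSON.

    Earlier sources win hostname conflicts, implementing the priority
    rule (e.g. Kubernetes before AWS before CMDB).
    """
    inventory = {"_meta": {"hostvars": {}}}
    for group, hosts in sources:
        inventory.setdefault(group, {"hosts": []})
        for host, hostvars in hosts.items():
            if host in inventory["_meta"]["hostvars"]:
                continue  # a higher-priority source already claimed this name
            inventory[group]["hosts"].append(host)
            inventory["_meta"]["hostvars"][host] = hostvars
    return inventory


if __name__ == "__main__":
    # Highest-priority source first; hostnames are illustrative.
    sources = [
        ("k8s_nodes", {"node-1": {"source": "k8s"}}),
        ("aws_ec2",   {"node-1": {"source": "aws"}, "web-1": {"source": "aws"}}),
        ("legacy",    {"db-old": {"source": "cmdb"}}),
    ]
    print(json.dumps(merge_sources(sources), indent=2))
```

Because `node-1` appears in both Kubernetes and AWS, only the Kubernetes record survives, which is exactly the precedence rule described above.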
Follow-up: How would you debug inventory inconsistencies when a server appears in AWS but not in playbook execution?
Your inventory plugin queries AWS EC2 API for 5000 instances, but the inventory build takes 45 seconds before any plays execute. This delays deployments critically. How do you optimize inventory performance?
Implement aggressive caching: set `cache_plugin: jsonfile` with `cache_timeout: 600`. Use filtering in the plugin configuration to reduce scope, querying only relevant instances instead of all 5000, e.g. `filters: { instance-state-name: running, tag:Ansible: "true" }`. Paginate API queries to avoid timeouts. Use `compose` and `keyed_groups` to pre-compute groups at inventory build time rather than during task execution. Where the plugin supports it (the OpenStack plugin's `expand_hostvars: false`, for example), avoid loading all host variables initially. Split large inventories by environment using separate inventory files per region/environment, executed in parallel. Warm the inventory cache in a background job that pre-builds inventory before peak usage. Monitor inventory query latency and add CloudWatch alarms for slow API responses.
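A performance-oriented plugin configuration under these assumptions could look like the sketch below; the tag name, region split, and timeout values are illustrative, not prescriptive:

```yaml
# inventory/us-east-1.aws_ec2.yml -- one file per region, run in parallel
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  instance-state-name: running
  tag:Ansible: "true"                # only instances opted in to Ansible
hostnames:
  - private-ip-address               # deterministic names, no DNS lookups
keyed_groups:
  - key: tags.Environment            # groups computed once, at build time
    prefix: env
cache: true
cache_plugin: jsonfile
cache_timeout: 600
cache_connection: /var/cache/ansible/aws
```

A cron-driven `ansible-inventory --list > /dev/null` against this file keeps the jsonfile cache warm so playbook runs skip the 45-second build.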
Follow-up: How would you implement inventory-as-code so team members can modify inventory through pull requests with validation?
Your dynamic inventory queries AWS, but during an AWS API outage, Ansible playbooks fail immediately because the inventory build fails. How do you implement resilience for inventory sourcing?
Implement a multi-layered fallback strategy: 1) primary: AWS API query with a 30-second timeout, 2) secondary: cached inventory file with timestamp validation, 3) tertiary: last-known-good inventory snapshot. Configure this in the inventory plugin with error handling. Set `strict: false` in the plugin configuration so partial failures don't abort inventory parsing. Implement a circuit breaker: if the AWS API fails 3 consecutive times, automatically switch to cached inventory for 10 minutes before retrying. Monitor inventory source health and alert on failures. Pre-populate the cache during maintenance windows, before outages can strike. For critical deployments, run a `validate_inventory.yml` pre-flight check that verifies connectivity to inventory sources and confirms the required hosts are present before the playbook proceeds.
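The three-layer fallback can be sketched as follows; `query_aws`, `load_cache`, and `load_snapshot` are hypothetical callables standing in for the real API client, cache reader, and snapshot reader:

```python
"""Layered inventory fallback -- a sketch, not a drop-in plugin."""


def build_inventory(query_aws, load_cache, load_snapshot, cache_max_age=600):
    """Try the live API, then a fresh-enough cache, then the snapshot.

    Returns (inventory, source_label) so callers can log which layer
    actually answered.
    """
    try:
        return query_aws(), "live"            # primary: live AWS query
    except Exception:
        cached = load_cache(cache_max_age)    # secondary: recent cache or None
        if cached is not None:
            return cached, "cache"
        return load_snapshot(), "snapshot"    # tertiary: last known good


if __name__ == "__main__":
    def failing_aws():
        raise ConnectionError("simulated AWS API outage")

    inv, source = build_inventory(
        failing_aws,
        lambda max_age: {"web": {"hosts": ["web-1"]}},  # stub: cache hit
        lambda: {},                                     # stub: empty snapshot
    )
    print(source)  # the cache layer answered this request
```

A real implementation would also persist a failure counter to drive the circuit breaker, so three consecutive `live` failures pin the source to `cache` for ten minutes.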
Follow-up: How would you implement read-only access to inventory APIs while still allowing Ansible to update server state?
Your team manages Ansible across multiple AWS accounts, each with hundreds of servers. Implementing separate inventory files per account creates maintenance overhead. How do you implement cross-account inventory with least privilege?
Implement a hierarchical inventory strategy using AWS Organizations and cross-account IAM roles. Create one inventory source per member account that assumes a role there to query EC2; the `amazon.aws.aws_ec2` plugin supports this via its `iam_role_arn` option. Store cross-account role ARNs in a central configuration. Group inventory by account and environment with `keyed_groups` and `compose`, e.g. a keyed group on the account ID. Implement least privilege by creating minimal IAM roles for Ansible that permit only the `ec2:DescribeInstances` action. Centralize inventory in a master account where Tower/AWX runs, assuming roles into the member accounts. Use inventory sync job templates in Tower to refresh all accounts on a schedule. Monitor cross-account API calls and alert on failure.
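One per-account source file might look like this sketch, using the plugin's `iam_role_arn` option for cross-account assumption; the account ID, role name, and tag key are placeholders:

```yaml
# inventory/acct_123456789012.aws_ec2.yml -- sketch; ARN and tags are placeholders
plugin: amazon.aws.aws_ec2
iam_role_arn: "arn:aws:iam::123456789012:role/AnsibleInventoryReadOnly"
regions:
  - us-east-1
keyed_groups:
  - key: tags.Environment             # e.g. builds env_production
    prefix: env
compose:
  account_id: "'123456789012'"        # stamp every host with its account
```

The assumed role needs only `sts:AssumeRole` trust from the master account plus `ec2:DescribeInstances`, which keeps the inventory path read-only by construction.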
Follow-up: How would you handle inventory conflicts when instances in multiple accounts have the same hostname?
Your inventory contains sensitive information: server roles, IP addresses, application versions. When developers run `ansible-inventory --list`, they see the entire infrastructure. How do you implement inventory access control?
Use Ansible Tower's RBAC system, which provides inventory-level access control. Create inventory objects in Tower and assign team/user permissions at the inventory, playbook, or host level, so developers see only the hosts their team manages. Alternatively, wrap ansible-inventory in a script that filters output based on the user's identity and permissions from an external system (LDAP, OAuth2). Use Tower's organizations and teams to segment inventory access. Filter inventory at the source with IAM policies: developers' AWS credentials permit describing only the instances in their environment. For dynamic inventories, apply query filters in the plugin based on user context. Encrypt sensitive host_vars with Ansible Vault, using a separate vault ID per environment. Audit inventory access by logging ansible-inventory invocations to CloudWatch/ELK.
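The wrapper-script idea reduces to a pure filtering step over the JSON that `ansible-inventory --list` emits; how the caller's allowed groups are determined (LDAP lookup, OAuth2 claims) is left out here, and the group and host names are examples:

```python
#!/usr/bin/env python3
"""Sketch: filter ansible-inventory JSON down to a user's allowed groups."""
import json


def filter_inventory(full_inventory, allowed_groups):
    """Return a copy containing only allowed groups and their hosts.

    _meta.hostvars is trimmed to match, so no variables leak for hosts
    the user cannot see.
    """
    allowed_hosts = set()
    filtered = {"_meta": {"hostvars": {}}}
    for group in allowed_groups:
        if group in full_inventory:
            filtered[group] = full_inventory[group]
            allowed_hosts.update(full_inventory[group].get("hosts", []))
    hostvars = full_inventory.get("_meta", {}).get("hostvars", {})
    filtered["_meta"]["hostvars"] = {
        h: v for h, v in hostvars.items() if h in allowed_hosts
    }
    return filtered


if __name__ == "__main__":
    full = {
        "web": {"hosts": ["web-1"]},
        "db": {"hosts": ["db-1"]},
        "_meta": {"hostvars": {"web-1": {"ip": "10.0.0.1"},
                               "db-1": {"ip": "10.0.0.2"}}},
    }
    # A web-team member sees only the web group and its hostvars.
    print(json.dumps(filter_inventory(full, ["web"]), indent=2))
```

In practice the wrapper would shell out to `ansible-inventory --list`, parse its stdout, and print the filtered result in the same format so downstream tooling is unaffected.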
Follow-up: How would you implement inventory-as-data for disaster recovery, where you can rebuild your infrastructure from inventory metadata?
Your Kubernetes cluster runs 500 pods that are constantly being created and destroyed. You need Ansible to monitor and manage these ephemeral workloads. The `kubernetes.core.k8s` inventory plugin works but doesn't update frequently enough: pods disappear before Ansible notices. How do you solve this?
Implement event-driven inventory updates using Kubernetes watches instead of polling. Create a custom Python inventory script that uses the Kubernetes client library's watch feature to receive pod events in real time. Alternatively, run `kubectl get pods --watch` in the background and maintain a real-time inventory cache. For Tower, configure continuous inventory refresh with short intervals (10-30 seconds) specifically for Kubernetes. Use Kubernetes labels and selectors to group pods dynamically; these become Ansible groups. Add health checks to the inventory plugin so pod connectivity is verified before a pod is included in playbook targets. Combine `serial` batching with `meta: refresh_inventory` so each batch of operations starts from a fresh pod list. Consider driving deployments through Helm alongside Ansible, which integrates better with cluster events.
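The core of the watch-based approach is folding each event into a live cache. This stdlib-only sketch simulates the event stream; in production the events would come from the Kubernetes Python client's `watch.Watch().stream(...)`, and the pod names and IPs here are made up:

```python
"""Sketch: fold Kubernetes watch events into a live inventory cache."""


def apply_event(cache, event):
    """Apply one watch event (ADDED / MODIFIED / DELETED) to the cache.

    The cache maps pod name -> hostvars and is what a custom inventory
    script would serialize for Ansible.
    """
    name, etype = event["name"], event["type"]
    if etype == "DELETED":
        cache.pop(name, None)              # pod is gone; drop it immediately
    else:                                  # ADDED or MODIFIED
        cache[name] = {"ansible_host": event["ip"]}
    return cache


if __name__ == "__main__":
    cache = {}
    simulated_stream = [
        {"type": "ADDED",   "name": "web-abc", "ip": "10.0.0.5"},
        {"type": "ADDED",   "name": "web-def", "ip": "10.0.0.6"},
        {"type": "DELETED", "name": "web-abc", "ip": None},
    ]
    for event in simulated_stream:
        apply_event(cache, event)
    print(sorted(cache))  # only the surviving pod remains
```

Because deletions are applied as soon as the event arrives, the cache never advertises a pod that the cluster has already reaped, which is exactly the gap polling leaves open.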
Follow-up: How would you implement blue-green deployment using dynamic inventory where you switch traffic between two pod groups?
Your inventory includes 10,000 hosts across global regions. Playbooks that loop over the entire inventory time out because of network latency. You need better performance without losing inventory coverage. What's your strategy?
Implement inventory sharding and parallel execution. Split inventory by region using group filters and run separate playbook executions per region in parallel. Use Ansible Tower to execute playbooks against regional groups concurrently. Run tasks locally where possible: delegate to regional agents rather than the central Ansible controller. Enable fact caching with the Redis backend to avoid redundant fact gathering. Use `gather_subset: min` to collect only required facts initially. Batch execution with `serial: 100` to process hosts in chunks, allowing progress monitoring. Use asynchronous tasks with `async` and `poll` to avoid blocking on slow hosts. Monitor network latency to inventory sources and consider adding regional or edge cache layers for inventory metadata.
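A play combining the batching, fact-reduction, and async ideas might look like this sketch; the group name, batch size, and script path are assumptions:

```yaml
# Sketch -- regional shard with batching, minimal facts, and async tasks
- hosts: region_us_east            # hypothetical regional group
  serial: 100                      # process 100 hosts per batch
  gather_facts: false
  tasks:
    - name: Gather only the minimal fact subset
      ansible.builtin.setup:
        gather_subset:
          - min

    - name: Run the long step asynchronously so slow hosts don't block
      ansible.builtin.command: /usr/local/bin/rolling_update.sh  # placeholder
      async: 600                   # allow up to 10 minutes
      poll: 10                     # check back every 10 seconds
```

One such play per regional group, launched concurrently from Tower or separate controller processes, keeps each shard's wall-clock time bounded by its own slowest batch rather than by the global inventory.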
Follow-up: How would you implement self-healing inventory where unreachable hosts are automatically removed and re-queried?