Your company's infrastructure uses proprietary APIs not supported by Ansible's built-in modules. You need to manage these systems via Ansible. Options: use raw shell commands, develop custom modules, or work around the gap with existing modules. What's your decision framework for custom module development?
Evaluate three options: 1) Shell/raw commands: quick but unmaintainable, error-prone, and not idempotent. 2) Workarounds with existing modules: cumbersome, and they don't fully exploit Ansible patterns. 3) Custom module: more effort up front, but fully idempotent and reusable. Choose a custom module if the API is complex (more than ~5 API calls), the logic will be reused in more than ~3 playbooks, or idempotency is required. Implement the module in Python inside a collection: `collections/ansible_collections/myorg/custom/plugins/modules/api_service.py`. The module should follow Ansible conventions: accept parameters as module arguments and return a result dict with a 'changed' flag. Implement idempotency: check whether the resource exists before creating it, and return a diff on changes. Document the module with examples. Use the AnsibleModule class for boilerplate, and test the module with Molecule. For simpler needs (a single API call), use the built-in uri module instead. For really complex cases, consider composing existing modules as building blocks. Community contribution: if the module is generally useful, contribute it to the Ansible community. Document the decision-making criteria in the architecture docs.
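Setting the AnsibleModule argument-parsing boilerplate aside, the core idempotency pattern described above can be sketched as a small helper (the function name and state shape here are hypothetical, for illustration only):

```python
# Sketch of the idempotency core of a custom module: only report
# changed=True when the remote state actually differs from the
# desired state, and include a diff so `--diff` output is useful.

def ensure_resource(current, desired):
    """Return an Ansible-style result dict for a desired-state change."""
    if current == desired:
        # Already in the desired state: a second run is a no-op.
        return {"changed": False}
    return {
        "changed": True,
        "diff": {"before": current, "after": desired},
    }
```

In a real module the `current` dict would come from querying the API before acting, and the result dict would be passed to `module.exit_json(**result)`.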
Follow-up: How would you design a custom module API that's intuitive and backwards-compatible?
You developed a custom Ansible module that manages cloud resources. The module works for simple cases but fails on edge cases: network timeouts, partial failures, race conditions. The module's error handling is minimal. Production deployments fail mysteriously. How do you architect robust custom modules?
Implement comprehensive error handling in custom modules. Wrap all API calls in try/except. Distinguish error types: transient (network timeout) → retry; permanent (auth failed) → fail fast. Implement retry logic with exponential backoff: retry a bounded number of times, sleeping longer between each attempt. Use the module's `fail_json()` for errors so failures carry clear messages. Implement timeouts: don't hang forever waiting for the API; set reasonable limits (e.g. 10s for connection, 30s per API call). Implement idempotency checks: query current state before acting, and report no change if the system is already in the desired state. Support check mode (`supports_check_mode=True`) so `--check` validates without making changes. Implement change detection: diff desired vs. current state and show what will change, using the module's `diff` support to report it. Implement logging: record API calls and responses for debugging, with a debug mode that outputs raw API responses. Validate module parameters before making any API calls. Test edge cases: timeouts, partial failures, race conditions. Test the module against multiple API versions to ensure compatibility. Document expected behavior and error codes.
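The retry-with-backoff logic above can be sketched as follows (a minimal sketch; `TransientError` and the injectable `sleep` parameter are illustrative, not a real Ansible API):

```python
import time

class TransientError(Exception):
    """A retryable failure, e.g. a network timeout."""

def call_with_retry(api_call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry a flaky API call with exponential backoff (1s, 2s, 4s, ...).

    Permanent errors (anything other than TransientError) propagate
    immediately; after the final failed attempt the transient error is
    re-raised so the module can report it via fail_json().
    """
    for attempt in range(attempts):
        try:
            return api_call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries
            sleep(base_delay * (2 ** attempt))
```

Injecting `sleep` keeps the helper unit-testable without real delays, which matters when testing the timeout and partial-failure edge cases mentioned above.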
Follow-up: How would you implement testing for custom modules that interact with external APIs?
Your custom module needs to run on both Linux and Windows managed nodes. The module code is Python and should work on both. Currently the module is Linux-only because it relies on Linux-specific assumptions (hardcoded POSIX paths, shelling out to Linux tools). How do you write cross-platform modules?
Use pathlib for cross-platform path handling instead of hardcoded path strings. Avoid platform-specific code: don't use `os.system()` or shell commands; use `module.run_command()` for portability instead of spawning subprocesses directly. Import platform-specific modules conditionally: `if sys.platform == 'win32': import windows_specific`. Use Ansible facts such as `ansible_os_family` to detect the OS, and branch on it in module code. Document platform support: the module documentation should specify supported OSes. Test the module on both Linux and Windows: use Molecule with a Windows image; Docker images work for Linux testing, Vagrant or cloud VMs for Windows. For Windows-specific operations, use Ansible's win_* modules as a reference, and use PSGallery rather than pip for Windows PowerShell dependencies. Avoid hardcoded paths: discover them from facts, e.g. `ansible_user_dir` instead of hardcoding `/home/user`. Implement conditional behavior: `changed_when` and `failed_when` may differ between platforms; document platform-specific behavior. Beware concurrency: Windows and Linux handle concurrent operations differently. Test thoroughly on both platforms before release.
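The pathlib advice can be sketched like this (a hedged sketch: the file locations and the idea that `user_home` comes from a gathered fact are illustrative assumptions):

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath

def config_path(user_home, platform=None):
    """Build a per-user config path portably instead of hardcoding
    '/home/user/...'.

    `user_home` would come from a gathered fact (e.g. ansible_user_dir)
    rather than a literal. Pure* path classes let both branches be
    exercised in tests on any OS.
    """
    platform = platform or sys.platform
    if platform.startswith("win"):
        return str(PureWindowsPath(user_home) / "AppData" / "myapp.conf")
    return str(PurePosixPath(user_home) / ".config" / "myapp.conf")
```

Using `PurePosixPath`/`PureWindowsPath` (rather than the concrete `Path`) is what makes the cross-platform branches testable from a single CI runner.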
Follow-up: How would you implement module parameter validation that works across different Ansible versions?
Your team developed 20 custom modules over 2 years. Ansible updated, introducing breaking changes. Some modules broke. Updating all modules manually is time-consuming. Teams are afraid to upgrade Ansible. How do you manage custom module compatibility?
Implement version compatibility testing. Create a CI/CD matrix that tests modules against multiple Ansible versions: 2.9, 2.10, 2.11, etc. Automated testing catches breaking changes immediately. Implement module versioning: track which module versions work with which Ansible versions, and use the `version_added` documentation field to record when each module and option was introduced. For breaking Ansible changes, implement a compatibility layer: detect the Ansible version and pick the appropriate code path. Example: `if ansible_version >= (2, 10): use_new_api() else: use_old_api()` (compare version tuples, not floats). Update modules on each Ansible release: test against the new version and update the compatibility layer if needed. Publish a compatibility matrix documenting which module versions work with which Ansible versions. For critical modules, maintain multiple versions: v1.0 for Ansible 2.9, v2.0 for Ansible 2.10. Use semantic versioning: major bump for breaking changes, minor for additions. Deprecate old module versions with an announced end-of-support date, and provide a migration guide for breaking changes. Automate compatibility testing in CI: on each Ansible release, run the full module test suite and alert on failures. For teams hesitant to upgrade Ansible: provide a six-month compatibility guarantee and backport critical fixes to old Ansible versions.
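The version-gate example is worth spelling out, because comparing versions as floats is a classic bug (`2.10 < 2.9` as floats). A minimal sketch, with hypothetical function names:

```python
def parse_ansible_version(version_string):
    """'2.10.4' -> (2, 10, 4). Tuples compare element-wise, so
    (2, 10) > (2, 9), unlike the float comparison 2.10 < 2.9."""
    return tuple(int(part) for part in version_string.split("."))

def pick_code_path(version_string):
    """Compatibility layer: choose the code path per detected version."""
    if parse_ansible_version(version_string) >= (2, 10):
        return "new_api"
    return "old_api"
```

In a real module the version string would be detected at runtime; the two return values stand in for calling `use_new_api()` or `use_old_api()`.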
Follow-up: How would you implement module deprecation lifecycle to guide users to newer versions?
Your custom module makes API calls to external service. Module performance degrades when service is slow: playbook waits 30 seconds for each module call. With 1000-host deployment, this adds 8 hours. How do you optimize module performance?
Implement async support in the custom module. Use Ansible's `async` keyword (`async: 300` with `poll: 0`) to dispatch the module call without waiting: the task returns immediately with an async job ID, the playbook continues with other tasks, and `async_status` polls for completion later. Implement module-level caching: cache API responses with a TTL, and if the same call is made within the TTL, return the cached result instead of calling the API. Use connection pooling: reuse HTTP connections to the API instead of creating a new connection per call. Batch requests: instead of 1000 individual API calls, combine them into 100 bulk calls. Run hosts in parallel: `forks = 50` lets 50 hosts execute the module concurrently. Enforce a module timeout: if an API call takes more than ~10 seconds, time out and fall back. Coalesce requests: if multiple playbook tasks call the same module with the same parameters, call the API once and share the result. Use CDN or edge caching so API results are cached closer to the execution location. Add a circuit breaker: if the API fails repeatedly, stop attempting and fall back. Profile module execution to identify the bottleneck (API latency, network, processing), and test with realistic load to verify performance with 1000 concurrent requests.
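The TTL cache described above is small enough to sketch in full (a sketch only; the injectable `clock` parameter is an illustrative testing convenience, not part of any Ansible API):

```python
import time

class TTLCache:
    """Tiny response cache: repeated identical API queries within `ttl`
    seconds return the stored result instead of hitting the service."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (timestamp, value)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]              # fresh: skip the API call
        value = fetch()                  # stale or missing: call the API
        self._store[key] = (now, value)
        return value
```

The `key` would typically be built from the module parameters, which is also the natural hook for the request-coalescing idea: identical parameters map to the same cache entry.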
Follow-up: How would you implement module result caching with invalidation strategy?
Your custom module handles resource deletion. A playbook bug causes mass deletion: 10,000 resources deleted unintentionally. Module was too permissive. How do you implement safety measures in destructive modules?
Implement multiple safety layers for destructive operations. Layer 1: require an explicit `state: absent` parameter; never default to delete. Layer 2: require a force flag (`force: true`) for deletion to prevent accidents. Layer 3: support check mode, so `--check` previews changes without making them; the module validates without actually deleting. Layer 4: implement a dry-run parameter: `dry_run: true` simulates the deletion and reports what would be deleted. Layer 5: implement confirmation: prompt before deletion; for automation this can be a parameter, `confirm: true`. Layer 6: audit: log every deletion with timestamp, user, and reason. Layer 7: retention: keep deleted resources for a grace period (e.g. 24 hours) before permanent deletion. Layer 8: limits: refuse to delete more than 100 resources in a single call, forcing batched operations. Layer 9: protection flags: resources can be marked protected, and deletion fails if any target is protected. Layer 10: approval workflow: critical deletions require approval in Tower. Document deletion behavior clearly and test it carefully with test data. Make deletion atomic where possible: either all targeted resources are deleted or none are, rather than partially succeeding.
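Several of these layers (force flag, batch limit, protection flags, dry-run) can be sketched as a single guard that runs before the destructive API call; the function and field names below are hypothetical:

```python
class DeletionRefused(Exception):
    """Raised when a safety layer blocks the destructive operation."""

def guard_deletion(resources, force=False, max_batch=100, dry_run=False):
    """Apply safety layers before deleting; refuse rather than guess."""
    if not force:
        raise DeletionRefused("force=true is required for deletion")
    if len(resources) > max_batch:
        raise DeletionRefused(
            f"refusing to delete {len(resources)} resources "
            f"(limit {max_batch}); split into batches")
    protected = [r["name"] for r in resources if r.get("protected")]
    if protected:
        raise DeletionRefused(f"protected resources: {protected}")
    names = [r["name"] for r in resources]
    if dry_run:
        # Report what would happen without calling the delete API.
        return {"changed": False, "would_delete": names}
    # ... real API deletion would go here ...
    return {"changed": True, "deleted": names}
```

Note the guard fails closed: any doubt (missing force, oversized batch, protected target) blocks the whole call rather than deleting a subset.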
Follow-up: How would you implement module documentation automatically from code docstrings?
Your custom modules are used by 100 engineers. Module behavior is undocumented. Engineers make incorrect assumptions causing deployment failures. Module implementation is opaque. How do you make modules maintainable and understandable?
Treat module documentation as a first-class artifact. Use the module's DOCUMENTATION block (a YAML string, per Ansible convention) to document all parameters, examples, and return values: each parameter with its type, description, default, and required flag. `ansible-doc` renders these docs inline. Document examples showing common usage patterns; document limitations explaining what the module can't do; document error codes explaining what each error means. Add inline comments for complex logic, and document assumptions: what does the module assume about system state? Maintain a changelog documenting changes with each release, using semantic versioning (major.minor.patch). Expose the module version in results so playbooks can check which version ran. Document backwards compatibility: which module versions work with which Ansible versions. Treat tests as documentation: test cases show expected usage. Document performance characteristics (typical execution time), security considerations (does the module handle secrets?), and integration points (which systems the module interacts with). Use autodoc tooling, e.g. something like ansible-doc-generator, to generate HTML docs from the module YAML and host them on the wiki. Ship realistic example playbooks showing usage.
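A minimal sketch of the DOCUMENTATION and EXAMPLES blocks for the `api_service` module mentioned earlier (the parameter set here is illustrative, not the real module's interface):

```python
# Ansible convention: YAML embedded in module-level Python strings.
DOCUMENTATION = r"""
module: api_service
short_description: Manage resources in the internal API (hypothetical)
options:
  name:
    description: Name of the resource to manage.
    type: str
    required: true
  state:
    description: Desired state of the resource.
    type: str
    choices: [present, absent]
    default: present
"""

EXAMPLES = r"""
- name: Ensure the resource exists
  myorg.custom.api_service:
    name: demo
    state: present
"""

RETURN = r"""
changed:
  description: Whether the resource was modified.
  type: bool
  returned: always
"""
```

Because these are plain YAML strings, `ansible-doc` and autodoc tooling can parse them, and a lint step can verify every parameter in the argument spec is documented.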
Follow-up: How would you implement module linting to enforce code quality standards?
Your company shares custom modules with partners via GitHub. A critical security vulnerability discovered in module: it doesn't validate API responses, allowing injection attacks. Partners deployed vulnerable module without knowing. How do you implement security in module development and distribution?
Build security into module design and distribution. Design: 1) validate all inputs and API responses (never trust external data), 2) sanitize output to prevent injection, 3) use HTTPS for all communication, 4) verify authentication, 5) encrypt sensitive data in transit and at rest. Testing: 1) security-focused unit tests that exercise input validation, 2) fuzz testing with invalid inputs, 3) OWASP vulnerability scanning, 4) dependency scanning for vulnerable libraries, 5) security test cases in Molecule, 6) penetration testing by an external team, 7) threat modeling during design. Implementation: 1) run the bandit security scanner in CI/CD, 2) require security code review before release, 3) apply static analysis tools (flake8, pylint). Distribution: 1) sign modules with GPG, 2) publish checksums for integrity verification, 3) document security properties in the module docs, 4) support version pinning: don't auto-upgrade to latest (users must choose). Security response: 1) an incident response plan for vulnerabilities, 2) a security advisory channel, 3) automated patch notifications, 4) a coordinated disclosure process (notify users before public announcement). Documentation: 1) security considerations in the module docs, 2) known vulnerabilities and workarounds.
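The "never trust external data" rule for API responses can be sketched as an allowlist validator run on every response before it influences module behavior (field names and types here are hypothetical):

```python
def validate_api_response(payload):
    """Allowlist-validate an API response before using it.

    Unknown fields are dropped rather than passed through, missing or
    mistyped fields fail loudly; this is what blocks injected content
    from reaching templates, shell commands, or module output.
    """
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    allowed = {"id": int, "name": str, "status": str}
    clean = {}
    for key, expected_type in allowed.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        if not isinstance(payload[key], expected_type):
            raise ValueError(f"unexpected type for field: {key}")
        clean[key] = payload[key]
    return clean
```

Validating with an allowlist (rather than rejecting known-bad patterns) is what the fuzz tests above would exercise: any response shape the module has not explicitly anticipated is an error, not a pass-through.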
Follow-up: How would you implement module versioning strategy that doesn't break existing deployments?