Ansible Interview — Windows and WinRM Management

Your organization manages 500 Windows servers via Ansible using WinRM. Connection failures are common: WinRM listeners not responding, SSL certificate validation failures, invalid credentials. Troubleshooting WinRM issues is time-consuming. How do you diagnose and resolve WinRM connection problems?

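Implement systematic WinRM troubleshooting, working from the target outward.

First, verify WinRM is running on the target Windows server: `Get-Service WinRM`. If it is stopped, start it: `Start-Service WinRM`.

Check that listeners are configured: `winrm enumerate winrm/config/listener` should show an HTTP listener on port 5985 and/or an HTTPS listener on port 5986.

Check that the firewall allows the WinRM ports: `Get-NetFirewallRule -DisplayGroup "Windows Remote Management"` lists the built-in rules. These typically cover only HTTP on 5985, so an HTTPS listener on 5986 needs its own rule.

For SSL certificate issues, verify the certificate exists and is still valid: `Get-ChildItem Cert:\LocalMachine\My | Where-Object Subject -like "*HOSTNAME*" | Select-Object Subject, NotAfter, Thumbprint`. For thumbprint mismatches after a certificate renewal, point the HTTPS listener at the new certificate: `Set-WSManInstance -ResourceURI winrm/config/Listener -SelectorSet @{Address="*"; Transport="HTTPS"} -ValueSet @{CertificateThumbprint="NEWTHUMBPRINT"}`.

Test connectivity from the Ansible controller in two ways:
- `ansible TARGET -m ansible.windows.win_ping` exercises the full stack: network, listener and authentication.
- `curl -v http://TARGET:5985/wsman` separates network problems from authentication problems. Any HTTP response (typically 401 Unauthorized) means the listener is reachable. A refused connection or a timeout points at the service or the firewall.

Verify credentials by logging on locally on the server with the same account. For Kerberos authentication, verify the HTTP SPN is registered with `setspn -L HOSTNAME`, and confirm the controller can obtain a ticket (`kinit user@REALM`, then `klist`). For encryption issues, `winrm get winrm/config/service` should show AllowUnencrypted = false in production.

Create a playbook that validates WinRM configuration and runs these pre-flight checks before real deployments. Document common WinRM issues and their fixes in a runbook.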
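The network half of this triage can be automated on the controller. The sketch below is a hypothetical helper, not part of Ansible or pywinrm. It uses only Python's standard `socket` module to classify why a WinRM port is unreachable. The mapping from symptom to likely cause is a rule of thumb that points to the next check, not a diagnosis.

```python
import socket

# Rule-of-thumb mapping from TCP symptom to the check to run next.
LIKELY_CAUSE = {
    "open": "port reachable; move on to authentication and certificate checks",
    "refused": "host answered but nothing listens: WinRM service stopped or listener missing",
    "timeout": "no answer: a firewall is probably dropping the traffic",
    "dns_failure": "name does not resolve: check DNS or the inventory hostname",
    "error": "other network error: inspect routing and the exception text",
}


def classify_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:  # name resolution failed
        return "dns_failure"
    except socket.timeout:  # connect attempt got no reply in time
        return "timeout"
    except ConnectionRefusedError:  # RST received: no listener on the port
        return "refused"
    except OSError:  # anything else (unreachable network, reset, ...)
        return "error"


def preflight(host: str, ports=(5985, 5986), timeout: float = 3.0) -> dict:
    """Classify each WinRM port (HTTP 5985, HTTPS 5986) on one host."""
    return {port: classify_port(host, port, timeout) for port in ports}
```

Calling `preflight("winsrv01.example.com")` (a hypothetical hostname) returns a status per port. `LIKELY_CAUSE[status]` then tells you which of the manual checks to run next.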

Follow-up: How would you implement automated WinRM health checks to catch issues before deployment?

Your Ansible WinRM connections to 500 Windows servers often time out or hang, particularly during mass deployments. Connection pool exhaustion causes 10% of jobs to fail. How do you optimize WinRM connection performance and reliability?

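Implement WinRM connection optimization on both the controller and the targets. Ansible's WinRM connection plugin has no shared connection pool to tune. What you can tune is timeouts, concurrency and batching.

Timeouts should be short enough that unreachable hosts fail fast and can be retried, and long enough for legitimately slow work:
- Raise `ansible_winrm_operation_timeout_sec` (default 20 s) for slow operations, for example to 60.
- Always set `ansible_winrm_read_timeout_sec` (default 30 s) higher than it, for example to 70. pywinrm rejects a read timeout that does not exceed the operation timeout.

HTTP Keep-Alive within a host's session reduces reconnect overhead. Configuring both an HTTP listener on 5985 and an HTTPS listener on 5986 gives a fallback path. Choose HTTPS for security, not speed: TLS adds a handshake, and NTLM or Kerberos already encrypt the message payload over HTTP, so the performance difference is usually small.

On the targets, raise WinRM quotas for heavy parallel work:
- `Set-Item -Path WSMan:\localhost\Shell\MaxShellsPerUser -Value 100` (default 30).
- `Set-Item -Path WSMan:\localhost\Service\MaxConcurrentOperationsPerUser -Value 4000` (default 1500).

The older `MaxConcurrentOperations` setting is deprecated and already shows 4294967295.

Use `forks = 20` (or `-f 20`) to cap how many hosts the controller works on at once. This protects the controller and shared dependencies such as domain controllers and DNS; each target only sees one Ansible connection at a time. SSH multiplexing (ControlPersist) does not apply to WinRM. If connection reuse matters, evaluate Ansible's SSH connection against Windows OpenSSH instead.

For large deployments, use `serial: 100` to deploy in batches and avoid a connection surge. Cache DNS on the controller to cut lookup overhead. Benchmark by timing a simple task such as `win_ping` under each configuration. Monitor connection errors and alert when they exceed a threshold.

Implement a circuit breaker: set `max_fail_percentage` so a batch aborts when too many hosts fail. Move hosts that fail repeatedly into a quarantine group for investigation instead of retrying them on every run.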
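Timeout variables are easy to get wrong across hundreds of host_vars files. pywinrm requires the read timeout to be strictly greater than the operation timeout; its defaults are 20 s (operation) and 30 s (read). The sketch below is a hypothetical checker in plain Python, not an Ansible API. It applies that rule using those defaults when a variable is unset. It also shows how a fixed `serial:` value splits the fleet into consecutive batches.

```python
from typing import Iterable

# pywinrm defaults, assumed when a host does not override them.
DEFAULT_OPERATION_TIMEOUT = 20
DEFAULT_READ_TIMEOUT = 30


def check_winrm_timeouts(hostvars: dict) -> list:
    """Return human-readable problems with one host's WinRM timeout variables."""
    op = hostvars.get("ansible_winrm_operation_timeout_sec", DEFAULT_OPERATION_TIMEOUT)
    read = hostvars.get("ansible_winrm_read_timeout_sec", DEFAULT_READ_TIMEOUT)
    problems = []
    if op <= 0 or read <= 0:
        problems.append("timeouts must be positive")
    if read <= op:
        problems.append(f"read timeout {read}s must exceed operation timeout {op}s")
    return problems


def serial_batches(hosts: Iterable[str], serial: int) -> list:
    """Split hosts into consecutive batches of at most `serial` hosts, in order."""
    hosts = list(hosts)
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]
```

Raising only the operation timeout to 60 s trips the check, because the unset read timeout stays at 30 s. Adding `"ansible_winrm_read_timeout_sec": 70` clears it. With `serial=100`, 500 hosts become 5 batches of 100.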

Follow-up: How would you implement WinRM request prioritization where critical jobs get priority?

Your Windows servers require specific PowerShell execution policies. Some servers have "Restricted" policy (disables all scripts), preventing Ansible from running PS scripts. Changing execution policy requires admin privileges and registry modifications. How do you handle PowerShell execution policies in Ansible?

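Implement PowerShell execution policy management via Ansible.

First, know when policy actually matters. Ansible's own Windows modules, including `ansible.windows.win_powershell` with inline `script:` content, are not blocked by the execution policy. What the policy does affect is `.ps1` files invoked from disk (for example `win_shell: C:\Scripts\deploy.ps1`), scheduled tasks, and scripts called by your applications.

Create a baseline playbook that runs first and sets the policy with administrative rights (`become: true`, `become_method: runas`): `Set-ExecutionPolicy RemoteSigned -Scope LocalMachine -Force`. Check the result with `Get-ExecutionPolicy -List`. A policy set by Group Policy (MachinePolicy or UserPolicy scope) takes precedence over LocalMachine, so if GPO enforces Restricted, fix the GPO rather than the host.

Understand the policy levels:
- `Restricted`: no scripts.
- `AllSigned`: every script must be signed by a trusted publisher.
- `RemoteSigned`: local scripts run; scripts downloaded from the internet must be signed.
- `Unrestricted`: any script runs, with a prompt for downloaded ones.
- `Bypass`: no checks.

Use `RemoteSigned` as the production standard, or `AllSigned` where policy requires it. For security, sign all organizational scripts with an internal code-signing certificate so only trusted scripts execute. For one-off testing, run a single script with `powershell.exe -ExecutionPolicy Bypass -File script.ps1`; this affects only that process and is not a production setting.

Do not switch to `win_shell` expecting it to avoid the policy. Inline commands are not subject to the policy in either module, and neither can run an unsigned script file under `AllSigned`.

For compliance:
- Audit the effective execution policy regularly and alert if it is relaxed.
- Store the execution policy baseline in code, documenting which policy each server should have.
- Add pre-flight checks that verify the effective policy supports the operations you are about to run.
- Create a remediation playbook that detects non-compliant policy and fixes it.

Document the execution policy strategy: why the policy was chosen and its security implications. Execution policy is a safety feature, not a security boundary.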
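Auditing needs the effective policy, not just the LocalMachine value. `Get-ExecutionPolicy -List` prints scopes in precedence order: MachinePolicy, UserPolicy, Process, CurrentUser, LocalMachine. The first scope that is not `Undefined` wins. If every scope is `Undefined`, Windows uses its default: `Restricted` on client editions, `RemoteSigned` on Windows Server.

The sketch below is a hypothetical compliance check. It parses that command's default table output (for example, registered from a `win_shell` task) and compares the result to a baseline. It makes two assumptions:
- The table layout is parsed as-is, which is fragile.
- The Process scope is skipped by default. When the command runs under Ansible, that scope reflects how the remote PowerShell session was launched rather than the machine's configuration.

```python
# Scopes in the order Get-ExecutionPolicy -List prints them, highest precedence first.
PRECEDENCE = ["MachinePolicy", "UserPolicy", "Process", "CurrentUser", "LocalMachine"]
POLICIES = {"Restricted", "AllSigned", "RemoteSigned", "Unrestricted", "Bypass", "Undefined"}


def parse_policy_list(text: str) -> dict:
    """Parse the default table output of `Get-ExecutionPolicy -List` into {scope: policy}."""
    scopes = {}
    for line in text.splitlines():
        parts = line.split()
        # Header and separator lines fail these checks and are skipped.
        if len(parts) == 2 and parts[0] in PRECEDENCE and parts[1] in POLICIES:
            scopes[parts[0]] = parts[1]
    return scopes


def effective_policy(scopes: dict, is_server: bool = True, ignore_process: bool = True) -> str:
    """Return the first defined policy in precedence order, else the OS default."""
    for scope in PRECEDENCE:
        if ignore_process and scope == "Process":
            continue  # session-level override, not the machine baseline
        policy = scopes.get(scope, "Undefined")
        if policy != "Undefined":
            return policy
    return "RemoteSigned" if is_server else "Restricted"


def is_compliant(text: str, allowed=("RemoteSigned", "AllSigned"), is_server: bool = True) -> bool:
    """True when the effective machine policy is one of the allowed baseline values."""
    return effective_policy(parse_policy_list(text), is_server) in allowed
```

A GPO-enforced `Restricted` in the MachinePolicy row makes `effective_policy` return `Restricted` even when LocalMachine says `RemoteSigned`. That is exactly the case a host-level remediation cannot fix.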

Follow-up: How would you implement code signing for PowerShell scripts in Ansible?

Your Ansible Windows playbook runs PowerShell scripts that need to interact with remote systems: query AD, connect to SQL databases, call APIs. Scripts need credentials but hardcoding them is a security risk. How do you securely pass credentials to Windows PowerShell scripts?

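Implement secure credential handling for PowerShell. Never hardcode credentials in scripts.

Store credentials in Ansible Vault and reference them in the playbook: `{{ vault_db_password }}`. Templating the secret directly into script text (`$dbPassword = "{{ vault_db_password }}"`) works but is risky. The value becomes part of the script body, which can surface in verbose output, PowerShell script-block logging, and any error message that echoes the script. Quotes or `$` characters in the password also break the string. Prefer passing the secret as a module parameter (the `parameters` option of `ansible.windows.win_powershell`) so it never becomes code.

Better still, avoid shipping secrets at all:
- Windows Credential Manager: pre-store credentials there (for example with `community.windows.win_credential`) and retrieve them from a helper script. WinRM network logons cannot access a user's credential vault, so these tasks need `become`.
- Group managed service accounts (gMSA): Windows services run under accounts whose passwords Windows rotates, so scripts need no credentials.
- Active Directory integration: scripts authenticate with Windows authentication (Kerberos), so no explicit credentials are needed. Watch for the double-hop problem when a script on the target must reach a second server.
- API calls: use OAuth tokens stored in Vault, not passwords.

Protect credentials in transit and in output:
- Ensure WinRM traffic is encrypted (HTTPS, or Kerberos/NTLM message encryption).
- Set `no_log: true` on tasks that handle credentials to keep them out of logs.
- Create a secure credential-retrieval function: a PowerShell function that reads credentials from Credential Manager and is called by other scripts.
- Test: verify that credentials do not appear in logs or playbook output.

Document the credential handling strategy. Implement credential rotation: rotate passwords periodically and update Vault. For long-lived credentials (service accounts), prefer certificates or gMSA over static passwords where possible.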
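The "verify credentials don't appear in logs" test is easy to automate in CI. Below is a hypothetical check, using only the Python standard library (it is not an Ansible feature). It scans captured output, such as an `ansible-playbook -vvv` log saved to a file, for any known secret value. It only detects verbatim occurrences, so it complements `no_log: true` rather than replacing it.

```python
from pathlib import Path


def find_leaks(text: str, secrets: dict, min_length: int = 6) -> list:
    """Return sorted names of secrets whose values appear verbatim in text.

    Values shorter than min_length are skipped to avoid false positives
    on common substrings such as short PINs.
    """
    return sorted(
        name
        for name, value in secrets.items()
        if value and len(value) >= min_length and value in text
    )


def scan_log_file(path: str, secrets: dict) -> list:
    """Read a log file (tolerating bad bytes) and scan it for leaked secrets."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return find_leaks(text, secrets)
```

A CI job would load the decrypted Vault values, call `scan_log_file` on the run's log, and fail the pipeline if the returned list is non-empty.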

Follow-up: How would you implement service account provisioning via Ansible?

Your Windows servers run diverse applications with different requirements: some need .NET Framework 4.7, others need 4.8. Some servers have conflicting requirements. The Ansible playbook needs to install the right .NET version on each server without conflicts. How do you manage complex Windows dependencies?

Implement dependency management via inventory groups and conditionals, starting from how .NET Framework versions actually behave. Versions 4.5 through 4.8.x are in-place updates to the same 4.x runtime. So 4.7 and 4.8 cannot be installed side by side, and applications built for 4.7 normally run on 4.8. Only the 3.5 runtime coexists with 4.x. A server that needs both "4.7" and "4.8" therefore needs 4.8. A real conflict exists only when an application is known to be incompatible with the newer runtime.

Structure the playbook around that:
- Create inventory groups per .NET requirement, `dotnet_47_required` and `dotnet_48_required`, and assign servers based on application needs.
- Create separate plays per group: Play 1 targets `dotnet_47_required` and ensures at least 4.7 is present; Play 2 targets `dotnet_48_required` and installs 4.8.
- Use conflict detection: if a server is in both groups, resolve it to the higher version. If an application is flagged as incompatible, fail the playbook and alert for manual intervention.
- Check versions before the plays run: read the `Release` value under `HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full` on each server and use `set_fact` to record the installed and required versions.
- Keep it idempotent: do not reinstall if the required version, or a newer one, is already present.
- Use `community.windows.win_dotnet_ngen` to run the native-image update after installing .NET.

For complex dependency trees, use role dependencies: a `dotnet_base` role that `app_specific` roles depend on. Test per server type and validate dependency installation before production. Maintain a dependency matrix documenting which application requires which .NET version. Use Chocolatey (`chocolatey.chocolatey.win_chocolatey`) for consistent package management. Implement monitoring: alert if required dependencies are missing.
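The version check needs a mapping from the registry `Release` DWORD to a product version. The thresholds below are the minimum `Release` values Microsoft documents for .NET Framework 4.5 and later.

The helper functions are a hypothetical sketch, written in Python for clarity. In a playbook, the same logic would live in a filter plugin or `set_fact` expressions. They resolve a server's requirements with "highest requirement wins", which holds because 4.x versions are in-place updates.

```python
# Minimum documented Release value for each .NET Framework 4.x version, ascending.
RELEASE_THRESHOLDS = [
    (378389, "4.5"),
    (378675, "4.5.1"),
    (379893, "4.5.2"),
    (393295, "4.6"),
    (394254, "4.6.1"),
    (394802, "4.6.2"),
    (460798, "4.7"),
    (461308, "4.7.1"),
    (461808, "4.7.2"),
    (528040, "4.8"),
    (533320, "4.8.1"),
]


def version_key(version: str) -> tuple:
    """Turn '4.7.2' into (4, 7, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def release_to_version(release: int):
    """Map a Release DWORD to the newest version it satisfies, or None if below 4.5."""
    installed = None
    for minimum, version in RELEASE_THRESHOLDS:
        if release >= minimum:
            installed = version
    return installed


def required_version(requirements: list) -> str:
    """4.x updates are in place, so the highest requirement satisfies them all."""
    return max(requirements, key=version_key)


def needs_install(release, requirements: list) -> bool:
    """True if 4.x is missing (release is None or too old) or older than required."""
    installed = release_to_version(release) if release is not None else None
    if installed is None:
        return True
    return version_key(installed) < version_key(required_version(requirements))
```

Two examples:
- A host in both `dotnet_47_required` and `dotnet_48_required` resolves to 4.8.
- A host reporting Release 461808 (4.7.2) is therefore flagged for installation.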

Follow-up: How would you implement Windows package updates while maintaining application compatibility?

Your Windows Ansible playbook uses `win_dsc` (Desired State Configuration) modules for complex system configuration. DSC is powerful but opaque: when DSC configuration fails, error messages are unclear. Debugging DSC issues is difficult. How do you troubleshoot Windows DSC configurations?

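Implement DSC troubleshooting systematically.

Start with the logs and status:
- Enable the detailed DSC event logs, which are off by default: `wevtutil.exe set-log "Microsoft-Windows-Dsc/Analytic" /q:true /e:true` (and likewise for `Microsoft-Windows-Dsc/Debug`).
- Check the DSC event logs: Event Viewer → Applications and Services Logs → Microsoft → Windows → Desired State Configuration → Operational.
- `Get-DscConfigurationStatus` shows the status of the last LCM run. `Get-DscLocalConfigurationManager` shows the LCM's own settings, such as refresh mode and reboot behavior.

Keep in mind that `win_dsc` does not push a configuration through the LCM; it invokes a single resource through `Invoke-DscResource`. The most direct way to reproduce a `win_dsc` failure is to call the same resource on the server with the same properties: `Invoke-DscResource -Name <Resource> -ModuleName <Module> -Method Test -Property @{...} -Verbose`. Then run `-Method Set`, and `-Method Get` to compare actual with desired state.

For full configurations applied through the LCM, use `Test-DscConfiguration -Detailed` and `Get-DscConfiguration`, and run `Start-DscConfiguration -Wait -Verbose` to see verbose output. Run the playbook with `-vvv` so Ansible surfaces the module's verbose output. Ensure the required DSC resource modules are installed on the target in the expected version (`Get-DscResource -Module <Module>`). For complex DSC, test locally first before Ansible integration.

Build validation and resilience into the playbook:
- After the apply run, rerun the same `win_dsc` tasks in check mode (which calls only each resource's Test method) and expect no changes.
- For DSC failures, use `block`/`rescue` to capture detailed error state.
- Use Ansible loops to apply DSC to multiple resources, catching per-resource failures.
- If DSC fails, fall back to manual configuration with `win_powershell`.

Document common DSC issues and resolutions. Implement monitoring: alert if the configuration drifts (unintended changes).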
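When `win_dsc` runs in a loop with `register`, Ansible stores one entry per item under `results`. Each entry carries `changed`, `failed`, and the `item` it came from. The hypothetical summarizer below (plain Python; the same logic could live in a custom filter plugin) turns such a registered result into per-resource failures and drift.

If the same tasks are rerun in check mode after the apply, any item that still reports `changed` is a resource not in its desired state. The `name` key on each loop item is an assumed convention.

```python
def summarize_dsc_results(registered: dict) -> dict:
    """Split a registered loop result into failed, drifted, and compliant items."""
    summary = {"failed": [], "drifted": [], "compliant": []}
    for result in registered.get("results", []):
        item = result.get("item", {})
        # Use the assumed 'name' key when present, else the raw item as a label.
        label = item.get("name", str(item)) if isinstance(item, dict) else str(item)
        if result.get("failed"):
            summary["failed"].append((label, result.get("msg", "")))
        elif result.get("changed"):
            summary["drifted"].append(label)
        else:
            summary["compliant"].append(label)
    return summary


def idempotency_ok(check_mode_run: dict) -> bool:
    """A check-mode pass after the apply should report no failures and no changes."""
    summary = summarize_dsc_results(check_mode_run)
    return not summary["failed"] and not summary["drifted"]
```

Reporting failed and drifted resources separately keeps "the resource module is broken" distinct from "the resource works but does not converge". These two problems need different fixes.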

Follow-up: How would you implement DSC idempotency validation?

Your Ansible deployment includes Windows servers that need certificates updated. Certificate installation, verification, and binding to services is complex. Ansible doesn't have built-in Windows certificate management for complex scenarios. How do you implement secure certificate management on Windows?

Implement Windows certificate management using a combination of modules and PowerShell.

Modules for the common cases:
- `ansible.windows.win_certificate_store` for basic imports: it copies a certificate (PFX, PEM or DER) from the control node into a Windows certificate store.
- `community.windows.win_iis_webbinding` with `certificate_hash` to bind a certificate to an IIS site.

For complex certificate operations, use custom PowerShell via the `win_powershell` module. The script:
1. Checks whether the cert exists by thumbprint.
2. If it exists, compares its expiry.
3. If it is expired or missing, imports the new cert.
4. Binds it to the IIS site or service.

Implement a renewal process: pre-generate the new cert, import it to the server, test, then activate it by swapping the binding. For automatic renewal with Let's Encrypt or another ACME CA, either run an ACME client on the Windows host, or request the certificate on the controller with `community.crypto.acme_certificate` and distribute it with `win_certificate_store`.

Validate certificates before use: not expired, correct subject/SAN, and a complete chain to a trusted root. Protect private keys: import into the LocalMachine store with `key_exportable: false` where possible, and restrict the private key ACL to the accounts that need it (for example SYSTEM and the service account).

Plan for failure and scale:
- Back up the existing cert before replacement so you can recover if the new cert fails.
- Alert when a cert expires within 30 days, triggering renewal.
- Use Tower workflows to orchestrate cert updates across the fleet.
- Test cert updates in staging first.
- Keep the previous cert so you can switch back quickly if the new one causes issues.

Document the certificate management procedure and renewal schedule.
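For the 30-day expiry alert, Python's standard `ssl` module is enough when the certificate is served over TLS, such as an IIS site or a WinRM HTTPS listener on 5986. This is a hypothetical monitoring helper, not an Ansible module.
- `fetch_not_after` completes a verified handshake, so it also fails loudly on a broken chain or a hostname mismatch. Pass `cafile` for certificates issued by an internal CA.
- `days_until_expiry` parses the `notAfter` string format that `getpeercert()` returns.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, cafile=None, timeout: float = 5.0) -> str:
    """Return the peer certificate's notAfter string after a verified TLS handshake."""
    context = ssl.create_default_context(cafile=cafile)  # verifies chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now=None) -> float:
    """Days from `now` (default: current UTC time) until a notAfter like 'Jan  5 09:34:43 2018 GMT'."""
    now = now or datetime.now(timezone.utc)
    return (ssl.cert_time_to_seconds(not_after) - now.timestamp()) / 86400


def needs_renewal(not_after: str, threshold_days: int = 30, now=None) -> bool:
    """True when the certificate expires within the threshold, or already has."""
    return days_until_expiry(not_after, now) <= threshold_days
```

A scheduled job could call `needs_renewal(fetch_not_after("web01.example.com", 443))` for each endpoint (the hostname is hypothetical) and trigger the renewal workflow on `True`.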

Follow-up: How would you implement multi-domain certificate (SAN) management at scale?

Your organization manages Windows and Linux servers via Ansible. Some playbooks run on both Windows and Linux (e.g., creating users, setting firewall rules). Platform-specific tasks are interspersed with shared tasks. Playbooks are hard to maintain, duplicating logic. How do you write multi-platform playbooks?

Implement platform-agnostic playbook architecture.

Create roles per platform: `roles/user_create_linux/` and `roles/user_create_windows/`. The main playbook includes the appropriate role based on platform. `ansible_os_family` returns the distribution family on Linux (`RedHat`, `Debian`, ...), so `user_create_{{ ansible_os_family | lower }}` would look for `user_create_redhat`. Map explicitly instead: `include_role` with `name: "user_create_{{ 'windows' if ansible_os_family == 'Windows' else 'linux' }}"`.

Share data, not modules:
- Use shared data structures: the user list has a common format, but the tasks that consume it differ per platform.
- Add per-task conditionals where needed: `when: ansible_os_family == "Windows"` or `when: ansible_os_family == "RedHat"`.
- Detect the platform from facts: `ansible_os_family`, or `ansible_system` (`Linux` vs `Win32NT`).
- Create fallback roles: if the platform is not recognized, use a generic role or fail with a clear message.
- Do not expect generic modules to span both platforms. `ansible.builtin.service` and `ansible.builtin.user` do not manage Windows targets, so Windows roles use `ansible.windows.win_service` and `ansible.windows.win_user`. The shared abstraction is the role interface and data format, not the module.

Create a multi-platform test matrix: Molecule tests on both Windows and Linux images, with the CI/CD pipeline running the matrix across all platforms. Document platform-specific behavior in a matrix showing which playbook, role and task targets which platform. Implement platform-specific inventories: `inventory/windows/` and `inventory/linux/`. Implement a wrapper playbook that detects the platform and runs the appropriate sub-playbook. Test thoroughly on all platforms: behavior should be equivalent cross-platform.
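Role routing needs an explicit table, because `ansible_os_family` reports the Linux distribution family rather than "Linux". The sketch below is a hypothetical Python stand-in for that routing logic. It could serve as a unit test for the table or as the basis of a custom filter. It picks a role with a generic fallback and groups hosts into a platform test matrix. The family names are common `ansible_os_family` values, not an exhaustive list, and `user_create_generic` is an illustrative role name.

```python
from collections import defaultdict

# Common ansible_os_family values mapped to the role that handles them.
ROLE_BY_FAMILY = {
    "Windows": "user_create_windows",
    "RedHat": "user_create_linux",
    "Debian": "user_create_linux",
    "Suse": "user_create_linux",
    "Archlinux": "user_create_linux",
}
FALLBACK_ROLE = "user_create_generic"  # hypothetical catch-all role


def pick_role(facts: dict) -> str:
    """Choose the role for one host from its gathered facts."""
    return ROLE_BY_FAMILY.get(facts.get("ansible_os_family"), FALLBACK_ROLE)


def build_test_matrix(hosts: dict) -> dict:
    """Group hostnames by the role they would run, e.g. to drive CI jobs."""
    matrix = defaultdict(list)
    for hostname, facts in sorted(hosts.items()):
        matrix[pick_role(facts)].append(hostname)
    return dict(matrix)
```

Any host that lands in the fallback bucket is a platform nobody has explicitly supported yet. That makes it a useful signal to surface in CI before it surfaces in production.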

Follow-up: How would you implement Windows Update automation with Ansible while maintaining uptime?