GitHub Actions Interview Questions

Secrets, Environments, and Deployment Rules


Your team has production API keys stored in GitHub Secrets. A junior engineer accidentally pushed a workflow file that logs all environment variables. The API key is now visible in a public repo's Actions logs. You have 10 minutes to contain the breach.

Immediate steps:

1. Revoke the exposed API key immediately. GitHub Actions masks registered secrets in logs, but values that reach the environment by other paths aren't masked — an `echo` of all environment variables can leak them.
2. Rotate every credential that appeared in the logs.
3. Check CloudTrail (or your provider's audit logs) for unauthorized access using the old key.
4. Implement secrets best practices going forward:
   - Never log environment variables wholesale; mask custom values with `echo "::add-mask::[value]"`.
   - Keep sensitive data in GitHub Secrets; they're stored encrypted and automatically masked in logs.
   - Adopt a policy: no `echo` or `print` statements for secrets.
   - Use static analysis: scan workflow files in PRs for risky patterns (env-var dumps, hardcoded secrets).
5. Longer term, migrate to OIDC/federated auth instead of static API keys — tokens are short-lived and issued per job, limiting the blast radius.
6. Enable branch protection: require code review before workflow changes merge.
7. Restrict who can modify workflows: only senior engineers should touch CI/CD (e.g., via a CODEOWNERS rule on `.github/workflows/`).
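The masking step above can be sketched as a workflow fragment (the step names and token-generation command are illustrative):

```yaml
name: mask-demo
on: workflow_dispatch

jobs:
  fetch:
    runs-on: ubuntu-latest
    steps:
      - name: Fetch a token and mask it before anything can log it
        id: fetch-token
        run: |
          # Hypothetical stand-in for fetching a sensitive value
          TOKEN=$(head -c 16 /dev/urandom | base64)
          # Register the value with the runner's masker FIRST...
          echo "::add-mask::$TOKEN"
          # ...then any accidental echo in later steps shows *** instead
          echo "token=$TOKEN" >> "$GITHUB_OUTPUT"
```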

Follow-up: How would you implement a GitHub Action that prevents workflows from logging secrets?

Your organization has 5 production environments: dev, staging, pre-prod, prod-us, prod-eu. Each has different API keys, database credentials, and deployment targets. You want to ensure that only the staging service can deploy to staging, and prod deployments require a specific team's approval. How do you structure this?

Use GitHub Environments with protection rules:

1. Create an environment for each deployment target (dev, staging, pre-prod, prod-us, prod-eu). Each environment gets its own secrets, scoped to that environment.
2. Environment secrets override repository secrets: if `API_KEY` exists at both repo and environment scope, jobs targeting the environment get the environment value.
3. Configure protection rules on the prod environments (in environment settings or via the REST API): required reviewers (e.g., the senior-engineers team) and a wait timer (e.g., a 15-minute delay before deployment, allowing review).
4. Use deployment branch restrictions: allow prod deployments only from `main` via environment settings.
5. In your workflow, target the environment at the job level: `environment: prod-us`.
6. Combine with OIDC using environment-scoped cloud roles: `prod-deployment-role-us` for prod-us, `staging-deployment-role` for staging. This enforces least-privilege access per environment.
7. Audit: GitHub records all deployments and approvals; review monthly who deployed what and who approved.
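A minimal sketch of point 5, combined with the OIDC role from point 6 (the account ID, role ARN, and deploy script are placeholders):

```yaml
name: deploy
on:
  push:
    branches: [main]

permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read

jobs:
  deploy-prod-us:
    runs-on: ubuntu-latest
    # Targeting the environment applies its protection rules (required
    # reviewers, wait timer, branch restrictions) and scopes its secrets
    # to this job.
    environment: prod-us
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/prod-deployment-role-us
          aws-region: us-east-1
      - run: ./deploy.sh prod-us
        env:
          API_KEY: ${{ secrets.API_KEY }}  # environment-scoped value
```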

Follow-up: Design a deployment approval workflow that requires two different people to approve production deployments.

You set up environment secrets and protection rules. A workflow deployment to prod requires approval. An engineer triggers the deployment, and a reviewer approves it. But when the deployment actually runs, it fails because the environment secret isn't populated. The secret shows as configured, but the job can't access it. Why?

Possible issues:

1. Secret scope mismatch: the secret is defined at the repo level, not the environment level (or vice versa). Environment secrets are only injected when the job targets that environment — confirm the job declares `environment: prod-us`.
2. The workflow ran from a branch the environment's deployment-branch rules don't allow. If the environment is restricted to `main` and the run came from a feature branch, the job can't target the environment and its secrets aren't available.
3. Name mismatch: the name in the workflow doesn't exactly match the configured secret, so `${{ secrets.API_KEY }}` silently expands to an empty string.
4. Fork pull requests: for `pull_request` workflows triggered from a fork, secrets (including environment secrets) aren't passed to the runner for security — even if the PR targets `main`. Use `workflow_run` or a manual `workflow_dispatch` when the job genuinely needs environment secrets.
5. Check the job's context and org-level restrictions: self-hosted runner-group policies or org policies limiting which repositories and branches may use the environment can also block secret access.
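A quick way to debug point 1 without leaking the value — test for presence, never print it (the environment and secret names are from the scenario):

```yaml
jobs:
  debug-secrets:
    runs-on: ubuntu-latest
    environment: prod-us   # comment this out to compare repo vs. environment scope
    steps:
      - name: Check secret presence without printing it
        env:
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          if [ -z "$API_KEY" ]; then
            echo "API_KEY is EMPTY in this context"
            exit 1
          fi
          # Length is safe to log; the value itself never is
          echo "API_KEY is set (length: ${#API_KEY})"
```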

Follow-up: How would you debug environment secret availability during a workflow run?

Your team has 100+ secrets across dev and prod environments. Rotating all of them manually every quarter takes a full day. You want to automate secret rotation, but GitHub doesn't have a built-in way to update secrets via API in workflows. How do you automate this?

GitHub's REST API allows updating secrets programmatically, but it requires elevated credentials (a bot token or GitHub App). Strategy:

1. Create a GitHub App with write access to repository secrets (the fine-grained "Secrets" permission), installed on your org. This app can update secrets.
2. In a workflow, authenticate as the app: an app-token action (e.g., `actions/create-github-app-token`) mints an installation token.
3. Use the REST API to update secrets. The value must first be encrypted with the repo's public key (fetched from `GET /repos/org/repo/actions/secrets/public-key`, sealed with libsodium), then: `curl -X PUT https://api.github.com/repos/org/repo/actions/secrets/API_KEY -H "Authorization: Bearer [app-token]" -d '{"encrypted_value": "[sealed-value]", "key_id": "[key-id]"}'`.
4. For rotation, integrate with your secrets manager (AWS Secrets Manager, HashiCorp Vault). The workflow: call your secrets-manager API → generate new credentials → push to GitHub via the API → test the new credentials → confirm rotation.
5. Alternatively, don't store long-lived secrets in GitHub at all. Use OIDC to assume cloud roles (AWS, GCP, Azure) whose credentials are short-lived by design.
6. For dev environments, use temporary credentials with automatic expiry (e.g., 24 hours). They rotate themselves; no manual action needed.
7. Implement a schedule: `on: schedule: - cron: '0 2 15 * *'` (2 AM on the 15th of each month) triggers the rotation workflow.
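Steps 1–3 and 7 sketched as a workflow (the app-credential secret names and the rotation script are assumptions; the script would fetch the repo public key, seal the new value, and PUT it as in step 3):

```yaml
name: rotate-secrets
on:
  schedule:
    - cron: '0 2 15 * *'   # 02:00 on the 15th of each month
  workflow_dispatch:        # allow ad-hoc rotations too

jobs:
  rotate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Mint an installation token for the rotation app
        id: app-token
        uses: actions/create-github-app-token@v1
        with:
          app-id: ${{ secrets.ROTATION_APP_ID }}
          private-key: ${{ secrets.ROTATION_APP_PRIVATE_KEY }}
      - name: Generate a new credential and push it to GitHub
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          # Hypothetical script: pulls a fresh value from the secrets
          # manager, encrypts it with the repo public key, and PUTs it
          # via the Actions secrets REST API.
          ./scripts/rotate.sh API_KEY
```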

Follow-up: Design a secret rotation system that uses a GitHub App and AWS Secrets Manager to keep credentials up-to-date automatically.

A contractor worked on your project and had access to production secrets through a GitHub environment. Their contract ended last week, but they still have push access to the repo. They could potentially create a workflow that exfiltrates production credentials. You need to immediately revoke their access to secrets.

Immediate actions:

1. Remove the contractor from the GitHub org/team. This revokes all repo access.
2. Rotate all production secrets they had access to — API keys, database passwords, deployment tokens.
3. Audit their commits and workflow runs: did they create any suspicious workflows or deploy anything unusual? Check CloudTrail for actions taken with credentials they held.
4. Their local clones keep full git history. If secrets were ever committed before rotation, those clones still contain them — and rewriting the remote history can't claw back their copy, which is why rotation, not a force-push, is the real containment step. Rewrite history (and have the team re-clone) only to stop future leaks from the repo itself.
5. Going forward, implement: environment protection rules requiring approval for prod deployments (so a rogue workflow can't run unreviewed), IP restrictions on self-hosted runners, and deployment-branch restrictions (only specific branches can deploy to prod).
6. Use OIDC instead of static credentials: even with push access, nobody can deploy to prod without satisfying the cloud role's trust policy, which can restrict the repo, branch, and environment allowed to assume it.
7. Implement JIT (just-in-time) access: production secrets are only materialized during an approved deployment window, not stored permanently in GitHub.

Follow-up: Design a JIT access system where production secrets are only available for 30 minutes after approval, then auto-revoke.

Your team uses GitHub Secrets for production database credentials. You have 50 services deploying to prod. Each service needs the same database password. Updating the password requires updating 50 secrets in 50 repos manually. A better approach?

Don't store the same secret in 50 places. Use a centralized secrets manager:

1. Store the database password once, in AWS Secrets Manager or HashiCorp Vault.
2. Each service's workflow authenticates to AWS/Vault via OIDC and fetches the credential at runtime.
3. In the workflow: `aws secretsmanager get-secret-value --secret-id prod/db-password`. The secret is fetched fresh each run, so rotating it in Secrets Manager means all 50 services pick up the new value on their next run automatically.
4. A reusable workflow can centralize the deployment logic, but secrets still come from the caller (e.g., `secrets: inherit`), so by itself it doesn't solve distribution.
5. Alternative: use GitHub's organization-level secrets (available to orgs generally, though check your plan's limits for private repositories). Org secrets can be shared with all or selected repos, so a single update propagates everywhere.
6. For finer control, use a bot or GitHub App that stores the value once and fans it out: on update, it calls the GitHub API to push the new encrypted value to all 50 repos.
7. Best practice: minimize credential sharing. Instead of all services sharing one database user, create a service-specific database user per service with minimal permissions, and rotate each independently.
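Points 2–3 as a sketch (the role ARN and secret id are placeholders):

```yaml
permissions:
  id-token: write   # OIDC token exchange
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/prod-service-deploy
          aws-region: us-east-1
      - name: Fetch the shared credential at runtime
        run: |
          DB_PASSWORD=$(aws secretsmanager get-secret-value \
            --secret-id prod/db-password \
            --query SecretString --output text)
          echo "::add-mask::$DB_PASSWORD"   # mask before any later logging
          echo "DB_PASSWORD=$DB_PASSWORD" >> "$GITHUB_ENV"
```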

Follow-up: Design a credential distribution system where a central team manages secrets and all services fetch them at runtime.

Your organization recently enabled "GitHub Secret Scanning" which detects when secrets are accidentally committed. A scan detected a leaked API key in a commit from 3 months ago. The key has been rotated, but you want to understand: what's the detection accuracy? Are there false positives? How many secrets might have been missed?

GitHub's secret scanning has high precision (few false positives) but only detects patterns it knows about:

1. Audit the findings: confirm each detected secret was actually sensitive (false-positive rates are low for well-known formats like AWS keys and GitHub PATs).
2. Account for misses: GitHub detects known provider patterns (AWS, GitHub, Stripe, etc.). Internal API keys and custom auth tokens won't match; define custom patterns (available with GitHub Advanced Security) for your org's formats.
3. Extend detection: run a tool like `git-secrets`, TruffleHog, or Gitleaks across your entire history for additional patterns (internal auth tokens, database credentials).
4. For each secret found, validate: was it ever used? Check audit logs/CloudTrail for activity under that key. Never used → low risk; used → rotate with high priority.
5. Remediate: rotate all found secrets, purge them from history (e.g., with `git filter-repo`), and implement pre-commit hooks to prevent future leaks.
6. Going forward: require all engineers to install client-side scanning (e.g., a `git-secrets` pre-commit hook). GitHub's server-side scanning is a safety net, not the primary defense.
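Point 3 can also run in CI — a sketch using the community Gitleaks action (the action name/version is an assumption; check its docs for current inputs):

```yaml
name: secret-scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history so old commits are scanned too
      - name: Scan history for secret-like strings
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```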

Follow-up: How would you implement a custom GitHub secret scanning pattern for your organization's internal API keys?

You implement environment protection rules requiring approval before deploying to prod. An urgent production bug needs to be fixed NOW. The on-call engineer triggers a prod deployment but is waiting for approval (requires a second person, unavailable for 20 minutes). The business is losing $10K/minute. The engineer asks: "Can I just skip approval this once?" What's your answer?

No. Approval rules exist for good reasons — to prevent mistakes that cost more than the current outage. However, you can optimize for urgency without sacrificing safety:

1. Build an escalation path: instead of waiting 20 minutes for any available person, designate an always-reachable on-call reviewer who can approve in under 2 minutes.
2. Offer a break-glass approval path gated by stronger authentication (e.g., a hardware security key or second factor) — faster than finding a second person, while still leaving an audit trail.
3. Use a wait timer instead of a required reviewer: a 5-minute wait timer delays the deployment, giving anyone a window to cancel the run, then proceeds automatically. This allows review without blocking forever.
4. Implement progressive deployment: deploy to a canary environment first (no approval), monitor for errors, then promote to production automatically after 15 minutes if the canary is healthy. If the canary fails, a human must approve or fix before prod.
5. For true emergencies, have a documented breakglass procedure: deploy manually outside CI/CD, with a mandatory post-mortem afterward.
6. Root cause: if prod bugs are routinely this urgent, your dev/staging process is broken. Invest in better testing and staging so bugs don't reach prod.
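Point 4 as a sketch — two jobs where prod promotion is gated on the canary plus the prod environment's own protection rules (the environment names and health-check script are illustrative):

```yaml
name: progressive-deploy
on:
  push:
    branches: [main]

jobs:
  canary:
    runs-on: ubuntu-latest
    environment: canary          # no reviewers required on this environment
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh canary
      - name: Watch canary metrics before promoting
        run: ./scripts/health-check.sh --wait 15m   # fails the job on errors

  prod:
    needs: canary                # runs only if the canary stayed healthy
    runs-on: ubuntu-latest
    environment: prod-us         # wait timer / reviewers still apply here
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh prod-us
```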

Follow-up: Design a deployment approval system that balances security with on-call responsiveness.
