Your team stores AWS credentials in GitHub Secrets to deploy applications. During a routine audit, you realize 80+ workflows have access to the same AWS IAM user credentials, all with AdministratorAccess. A junior engineer accidentally leaks those credentials in a GitHub issue. You have 15 minutes to contain the blast radius.
Static credentials are a single point of failure. Immediately deactivate the leaked access keys (then delete and recreate them if the IAM user must survive) and revoke any active STS sessions, e.g., by attaching a deny policy conditioned on `aws:TokenIssueTime`. Then migrate to OIDC (OpenID Connect): GitHub Actions can request short-lived STS credentials from AWS without storing anything. Configure GitHub's OIDC provider in AWS IAM, create a role with the minimum permissions each workflow needs (e.g., `ec2:DescribeInstances` for deployment checks, `s3:PutObject` for artifact uploads), and use role assumption in your workflow. Each job receives a unique, short-lived token (the OIDC token itself expires within minutes; the assumed-role session duration is configurable), scoped by repository and branch. Cost: $0. Security: credentials are never stored, rotate automatically, and every role assumption is auditable in CloudTrail. To migrate: in `aws-actions/configure-aws-credentials@v4`, replace the static `aws-access-key-id`/`aws-secret-access-key` inputs with `role-to-assume`, grant the job `id-token: write` permission, and delete all static secrets. This shrinks your attack surface from "anyone with repo access to AWS" to "anyone with workflow modification access to specific repos."
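The migrated workflow step might look like this sketch (the role ARN, account ID, and region are placeholders):

```yaml
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/backend-deploy  # placeholder ARN
          aws-region: us-east-1
      # Subsequent steps use short-lived STS credentials; no secrets are stored.
```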
Follow-up: How would you prevent a compromised workflow from assuming a high-privilege AWS role?
You've set up OIDC for AWS deployments. A workflow tries to assume an AWS role but fails with "not authorized to perform: sts:AssumeRoleWithWebIdentity". The workflow was working yesterday. You didn't change the trust policy, and OIDC is still configured in AWS. What's the issue?
Several possibilities: (1) The GitHub OIDC thumbprint changed—AWS stores a thumbprint for the OIDC provider's certificate chain. GitHub rotates its certificates periodically; if the stored thumbprint is stale, token validation can fail (AWS now also validates `token.actions.githubusercontent.com` against its own trusted CA library, which mitigates this for many accounts, but check anyway). Inspect the OIDC provider in IAM and update the thumbprint if it is outdated. (2) The trust policy doesn't match the token claims. The token's `sub` claim has the form `repo:[org]/[repo]:ref:refs/heads/[branch]`—a trust policy that lists only the bare branch name, or a different ref, will never match. Check the policy's conditions: use `StringLike` with wildcards for pattern matching and `StringEquals` only when you want exact matching. (3) The audience (`aud`) claim is wrong. `configure-aws-credentials` requests `sts.amazonaws.com` by default; if you overrode `audience:` in the workflow, the role's trust policy must require that exact value in its `aud` condition.
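A trust policy consistent with those claims might look like the following sketch (the account ID, org, and repo names are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
      },
      "StringLike": {
        "token.actions.githubusercontent.com:sub": "repo:myorg/myrepo:ref:refs/heads/*"
      }
    }
  }]
}
```

Note the full `refs/heads/` path in the `sub` pattern: a condition on just the branch name is the mismatch described in case (2).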
Follow-up: How would you automate updating GitHub's OIDC thumbprint in AWS before it expires?
Your organization runs 500+ microservices across AWS, GCP, and Azure. Each service team creates their own AWS roles, GCP service accounts, and Azure managed identities for their deployments. You now have 2,000+ credentials to manage. Auditing which service has access to which cloud resources is a nightmare. The security team demands a unified, auditable solution.
Use OIDC for all three clouds—each supports federating GitHub as a trusted identity provider. Configure GitHub's OIDC issuer in each cloud (AWS IAM, GCP Workload Identity Federation, Azure AD). For each microservice, create a minimal role/service account scoped to only what that service needs: e.g., `service-auth-svc` gets `s3:GetObject` on its specific bucket, `service-api-svc` gets `compute.instances.get` on its instances. In each workflow, use the cloud-specific action to assume the identity (`aws-actions/configure-aws-credentials`, `google-github-actions/auth`, `azure/login`). This gives you: (1) zero stored credentials, (2) an automatic audit trail in each cloud's logs (who, when, from which repo), (3) fine-grained RBAC per service, (4) $0 running cost once configured. Downside: initial setup per cloud (1-2 days), but it pays off after about 6 months in reduced credential-rotation overhead.
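For GCP, the equivalent workflow step uses `google-github-actions/auth` with Workload Identity Federation; this sketch uses placeholder project, pool, provider, and service-account names:

```yaml
permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: service-api-svc@my-project.iam.gserviceaccount.com
      # Later steps (gcloud, gsutil, terraform) pick up the federated
      # short-lived credentials automatically.
```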
Follow-up: Design a policy framework where only approved repos can deploy to production AWS accounts via OIDC.
You've implemented OIDC for AWS, GCP, and Azure across your monorepo. A security audit finds that ANY workflow in the repo can assume ANY role in all three clouds. An engineer in the frontend team accidentally deployed frontend code to the production database cluster. You need to retrofit least-privilege access retroactively without disrupting existing CI/CD.
The issue: your OIDC trust policies are too permissive (e.g., a `sub` condition like `repo:myorg/*:*` matches every repo and branch). Implement fine-grained conditions: (1) Repository filtering: trust only specific repos, e.g., `repo:myorg/backend:*` for backend roles. (2) Branch/ref filtering: production roles trust only the main branch, e.g., `ref:refs/heads/main`. (3) Actor filtering: restrict the `actor` claim to approved deployers, e.g., `actor:[deployment-bot]`. (4) Workflow filtering: GitHub's OIDC token carries `workflow` and `job_workflow_ref` claims—trust only a specific workflow file, e.g., pin `job_workflow_ref` to your `deploy.yml` on `main`. Update each role's trust policy in AWS/GCP/Azure. Migrate incrementally: create new roles with strict conditions, route new workflows to them, then deprecate the old permissive roles after 30 days. For immediate containment, use GitHub's environment protection rules: require approval for prod deployments and allow only specific branches—this gates access at the workflow level before OIDC is ever invoked.
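For a production role, the `Condition` block of the AWS trust policy could be hardened like this sketch (org and repo names are placeholders), switching to `StringEquals` so only one repo and branch match:

```json
"Condition": {
  "StringEquals": {
    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
    "token.actions.githubusercontent.com:sub": "repo:myorg/backend:ref:refs/heads/main"
  }
}
```

GCP and Azure express the same idea through attribute conditions on the Workload Identity Federation provider and federated credential subject filters, respectively.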
Follow-up: How would you audit which teams have deployed to production in the last 30 days using cloud provider logs?
Your organization recently migrated from static AWS credentials to OIDC. During the migration, a developer added OIDC token handling to a custom Python script that runs in GitHub Actions. The script logs the entire environment (for debugging) which includes `GITHUB_TOKEN`, `ACTIONS_ID_TOKEN_REQUEST_TOKEN`, and other sensitive values. The logs are public because the repo is public.
This is an information disclosure vulnerability. `ACTIONS_ID_TOKEN_REQUEST_TOKEN` is a short-lived bearer token (valid for minutes) that lets whoever holds it request OIDC tokens from the `ACTIONS_ID_TOKEN_REQUEST_URL` endpoint. If exposed while still valid, an attacker could mint fresh OIDC tokens and assume your AWS roles. Immediate mitigation: (1) Delete the affected workflow run logs and disable debug logging for this workflow—GitHub Actions masks registered secrets in logs, but arbitrary environment variables are not masked. (2) Use explicit masking: emit `::add-mask::<value>` workflow commands for any sensitive value before it can appear in output. (3) Never echo the environment wholesale; pass tokens only where they're needed (AWS, etc.). Permanent fix: (1) audit all workflows for environment-variable logging, (2) enforce a security policy: "no debug output in public repos," (3) keep credential lifetimes far shorter than log exposure—the request token expires in minutes, but a leaked log lives for the 90-day retention period, so shorten assumed-role session durations (a 1-hour session is much safer than a 24-hour one).
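For the debugging script itself, redacting before logging is a safer pattern than dumping the raw environment. A minimal sketch (the name patterns are an assumption; extend them for your environment):

```python
import os
import re

# Env var name fragments that usually indicate secrets (assumption; extend as needed).
SENSITIVE = re.compile(r"TOKEN|SECRET|KEY|PASSWORD|CREDENTIAL", re.IGNORECASE)

def redacted_environ(env=None):
    """Return a copy of the environment with sensitive values replaced by ***."""
    env = dict(os.environ if env is None else env)
    return {k: ("***" if SENSITIVE.search(k) else v) for k, v in env.items()}

if __name__ == "__main__":
    # Safe to log: sensitive values never reach stdout.
    for key, value in sorted(redacted_environ().items()):
        print(f"{key}={value}")
```

This keeps `ACTIONS_ID_TOKEN_REQUEST_TOKEN` and friends out of public logs even when debug dumps are left in by mistake.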
Follow-up: How would you implement a GitHub Action that validates OIDC tokens before accepting them?
Your team enabled OIDC for GCP and created a Workload Identity Federation pool. The first 100 deployments worked. On day 3, new workflows start failing with "PERMISSION_DENIED: The caller does not have permission" even though the code didn't change. GCP support says the service account is correct. What's happening?
Unlike AWS, GCP Workload Identity Federation doesn't pin a certificate thumbprint—it fetches GitHub's current signing keys dynamically from the issuer's JWKS endpoint (discoverable at `https://token.actions.githubusercontent.com/.well-known/openid-configuration`), so a routine certificate rotation alone shouldn't break validation. The likelier cause: the new workflows present tokens whose claims don't satisfy your provider's attribute condition or the service account's IAM binding. "Worked for 100 deployments, fails for new ones" usually means the new workflows run from different repos, branches, or trigger events than the originals. For example, if the provider maps `attribute.repository` from `assertion.repository` and the service account's `roles/iam.workloadIdentityUser` binding only grants a `principalSet` for the original repositories, tokens from any new repo are rejected with `PERMISSION_DENIED`—even though the service account itself is correctly configured, which matches what GCP support told you. Check three things: the provider's attribute mapping and condition (`gcloud iam workload-identity-pools providers describe`), the `principalSet` members bound to the service account, and the `aud` value the new workflows request versus what the provider expects. Fix by extending the attribute condition or adding a binding that covers the new repositories.
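To see which claims a failing workflow actually presents, you can decode the OIDC token's payload—it is just base64url-encoded JSON. This is a diagnostic sketch only (no signature verification; `satisfies_binding` is a hypothetical stand-in for your pool's attribute condition):

```python
import base64
import json

def decode_claims(jwt: str) -> dict:
    """Decode the (unverified) payload segment of a JWT for inspection."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def satisfies_binding(claims: dict, allowed_repos: set) -> bool:
    """Mimic an attribute condition on assertion.repository (assumption)."""
    return claims.get("repository") in allowed_repos
```

Comparing the decoded `repository`, `ref`, and `aud` claims against the provider's attribute mapping and the service account's binding members usually pinpoints which condition the new workflows fail.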
Follow-up: Design a monitoring solution that alerts if a pinned OIDC thumbprint (e.g., on an AWS IAM identity provider) hasn't been updated in >6 months.
Your organization uses OIDC for AWS, but you need to deploy to an on-premises Kubernetes cluster that doesn't support OIDC. The cluster uses static bearer tokens stored in GitHub Secrets. You want to eliminate the static token but also can't modify the cluster's auth mechanism. What's your approach?
OIDC doesn't solve this—you still need a token somewhere. However, minimize the blast radius: (1) Move the Kubernetes bearer token into a secrets manager (AWS Secrets Manager or HashiCorp Vault; leaving it in GitHub Secrets is exactly what you're trying to escape), then use OIDC to authenticate to the secrets manager rather than directly to Kubernetes. Your workflow: authenticate to AWS via OIDC → assume a role → fetch the Kubernetes token from Secrets Manager → use it once per job, then discard it. (2) Rotate the token frequently (weekly, not annually). (3) Implement audit logging: wrap Kubernetes API calls in a proxy that logs all deployments. (4) Scope the token: if the cluster supports RBAC, bind it to a service account with minimal permissions (deploy to specific namespaces only). (5) Use a temporary token service: if your on-prem Kubernetes supports webhook authentication, build a bridge that converts OIDC tokens into Kubernetes tokens on the fly—eliminating persistent secrets entirely. Long-term: migrate the cluster to support OIDC (via a reverse proxy with OIDC support, like Dex or oauth2-proxy).
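The fetch-and-discard pattern from step (1) might look like this sketch (the role ARN, secret name, and cluster endpoint are placeholders):

```yaml
permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/k8s-token-reader  # placeholder
          aws-region: us-east-1
      - name: Deploy with a freshly fetched token
        run: |
          K8S_TOKEN=$(aws secretsmanager get-secret-value \
            --secret-id onprem/k8s-deploy-token \
            --query SecretString --output text)
          echo "::add-mask::$K8S_TOKEN"   # keep the token out of logs
          kubectl --server=https://k8s.internal.example.com \
            --token="$K8S_TOKEN" apply -f manifests/
```

The token only ever exists inside the job's shell; nothing is stored in GitHub Secrets.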
Follow-up: How would you build an OIDC-to-Kubernetes token bridge that converts GitHub tokens to Kubernetes bearer tokens?
You've been using OIDC for 6 months. An incident report shows that a GitHub token was used to assume an AWS role, but the `actor` claim didn't match the expected service account. Investigation reveals: the token was issued to a workflow run, not a user. Someone modified the workflow to assume a different AWS role than intended. Your RBAC didn't prevent the role switch. How do you fix this?
The issue: the OIDC trust policies conditioned on `sub`, `repo`, and `ref`, but nothing bound a given workflow to a specific role. If several roles all trust "any token from repo X," an engineer can edit a workflow to assume whichever of those roles they like. Enforce at multiple levels: (1) Hard-code the role ARN in the workflow action, not as a variable: `role-to-assume: arn:aws:iam::123456789012:role/backend-deploy` (never `${{ vars.ROLE_NAME }}`, which anyone with write access can repoint). (2) Use GitHub's environment protection rules: require approval from a specific team for prod deployments, limiting who can trigger them. (3) Pin each role's trust policy to a single workflow identity via the `job_workflow_ref` claim, so a token minted for a modified or different workflow file cannot assume it (an `sts:ExternalId` condition doesn't help here—`AssumeRoleWithWebIdentity` carries no external ID, only token claims). (4) Add automated checks: after assuming the role, verify the caller identity matches the expected role ARN and fail the workflow if not. (5) Audit CloudTrail monthly for unexpected role assumptions.
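Steps (1) and (4) together might look like this sketch (the role name and account ID are placeholders):

```yaml
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/backend-deploy  # hard-coded, not ${{ vars.* }}
      aws-region: us-east-1
  - name: Verify the assumed role before deploying
    run: |
      ACTUAL=$(aws sts get-caller-identity --query Arn --output text)
      # Assumed-role ARNs look like arn:aws:sts::...:assumed-role/backend-deploy/<session>
      case "$ACTUAL" in
        *assumed-role/backend-deploy/*) echo "role verified" ;;
        *) echo "unexpected identity: $ACTUAL" >&2; exit 1 ;;
      esac
```

The verification step turns a silent role mix-up into a hard workflow failure, which is also an easy signal to alert on.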
Follow-up: Design an automated audit workflow that compares intended role assumptions (from workflow code) against actual assumptions (from CloudTrail) weekly.