Trivy scans your Docker image and reports 47 CVEs. Your base image is ubuntu:22.04, which has outdated packages. You need to triage these CVEs, decide which are actually exploitable in your container context, and fix them without breaking the build. Walk through your triage and remediation strategy.
CVE triage requires understanding severity, exploitability, and context. Process:
(1) Re-run Trivy filtered to the findings that matter: trivy image --severity HIGH,CRITICAL ubuntu:22.04. Of the 47 CVEs, many are likely low or medium severity and not urgent; adding --ignore-unfixed narrows the list further to CVEs that actually have a patch available.
(2) For each remaining CVE, evaluate context: is the vulnerable package actually installed? Is it used by your app? Is it reachable in the container? Example: an OpenSSL CVE might only affect SSH, but your app doesn't run an SSH server, so it's lower priority.
(3) Group CVEs by fixability: some are fixed by upgrading the package (a patched version exists), some require a newer base image (no fix in 22.04 yet), and some are unfixable (the upstream package is no longer maintained).
(4) Fix high-priority CVEs: RUN apt-get update && apt-get install -y --only-upgrade openssl curl (upgrade specific packages), and RUN apt-get remove -y gcc (remove unnecessary packages that add attack surface).
(5) For unfixable CVEs, accept the risk with documentation, e.g. a Dockerfile comment: # CVE-2024-1234 is unfixable in ubuntu:22.04; we accept this risk because the vulnerability requires X and our app does Y.
(6) Switch to a minimal base image where possible: alpine (smaller attack surface) or distroless (no shell, no package manager at runtime).
(7) Scan every build in CI and fail it when critical CVEs are introduced. Example Dockerfile: FROM ubuntu:22.04, then RUN apt-get update && apt-get install -y curl openssl && apt-get clean. Then gate the build: trivy image --exit-code 1 --severity CRITICAL myimage. This ensures you only deploy images with an acceptable CVE risk profile.
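The remediation steps above can be sketched as a single Dockerfile (the package list and the accepted-risk comment are illustrative, not a definitive fix set):

```dockerfile
FROM ubuntu:22.04

# Upgrade only the packages with known high/critical fixes, drop an
# unneeded toolchain package, and trim the apt cache so the layer stays small.
RUN apt-get update \
 && apt-get install -y --only-upgrade openssl curl \
 && apt-get remove -y gcc \
 && apt-get clean && rm -rf /var/lib/apt/lists/*

# CVE-2024-1234 (hypothetical): unfixable in ubuntu:22.04; accepted risk
# because exploitation requires a local interactive shell and this container
# runs a single non-interactive process.
```

After building, gate on the result: trivy image --exit-code 1 --severity CRITICAL myimage.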
Follow-up: How do you distinguish between vulnerabilities that are exploitable in containers vs. the host? What's the actual risk?
You're using node:18-alpine as your base image. Trivy reports 5 CVEs in npm dependencies (from package.json). None are in the OS layer—all are in npm packages. You can't fix them with OS updates. Your CI pipeline is blocking deployments. How do you handle CVEs in application dependencies?
Application dependency CVEs are separate from OS CVEs and require different remediation. Approach:
(1) Scan dependencies with npm audit; it reports CVEs and suggests fixes.
(2) Update vulnerable packages: npm update package-name picks up the latest semver-compatible version, and npm audit fix attempts this automatically.
(3) If no fix is available, assess exploitability: does your app actually use the vulnerable code path? Example: a SQL injection CVE in a database library only matters if your app passes untrusted input to that library.
(4) Tune the audit gate rather than bypassing it: npm audit --omit=dev skips dev dependencies, and npm audit --audit-level=moderate only fails on moderate or higher findings.
(5) Pin transitive dependencies: if the vulnerable package is a transitive dependency, you can't upgrade it directly. Use the overrides field in package.json (npm 8+) to force a patched version, keep package-lock.json committed so the pin holds, and meanwhile file an issue with the direct dependency's maintainers or find an alternative.
(6) Implement SBOM (Software Bill of Materials) scanning: use Trivy or Syft to generate an SBOM, then scan it with Grype. This gives visibility into all dependencies and their vulnerabilities.
(7) For unfixable CVEs, document the risk and get security approval. Add to EXCEPTIONS.json: {"cve": "CVE-2024-1234", "package": "lodash", "reason": "this CVE requires admin access; our app runs as non-root and doesn't expose this", "expires": "2026-06-07"}. This lets CI pass while tracking known risks.
Best practice: fix every vulnerable package you can; for the rest, document and get security sign-off.
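For step (5), npm's overrides field forces a transitive dependency to a patched version without waiting for the direct dependency to update. A minimal sketch (package names and versions are illustrative):

```json
{
  "name": "my-app",
  "dependencies": {
    "some-framework": "^2.0.0"
  },
  "overrides": {
    "lodash": "^4.17.21"
  }
}
```

After editing, run npm install to regenerate package-lock.json, then npm audit to confirm the finding is gone. Verify the forced version is actually compatible with the packages that consume it; overrides bypasses their declared version ranges.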
Follow-up: How do you determine if a CVE in a dependency is actually exploitable by your application? What's the CVSS score and how does it relate to real risk?
You've fixed all CVEs in your base image and dependencies. You build the image, push to registry, and within hours, a new CVE is discovered in a package you included. Your deployed containers are now vulnerable. How do you handle zero-day vulnerabilities and continuously update already-deployed images?
Zero-day vulnerabilities appear after deployment. Handle them with:
(1) Continuous scanning of deployed images: periodically (hourly or daily) re-scan every image in your registry, e.g. run trivy image --severity HIGH,CRITICAL for each deployed tag, and alert when new CVEs appear.
(2) Auto-remediation: when a new CVE is detected in a base image, automatically trigger rebuilds of all dependent images. Use a CI trigger: on CVE detection, rebuild from scratch, test, and redeploy.
(3) Digest pinning with periodic refreshes: instead of pinning to ubuntu:22.04 (a moving tag), pin to a digest like ubuntu@sha256:abc123 for reproducibility, and update that digest on a schedule to pick up base-image patches.
(4) Staged rollouts: when you detect a CVE and rebuild, deploy the new image to dev/staging first, validate, then promote to production. Never deploy straight to production.
(5) For emergency zero-days (critical, actively exploited), have a fast-track process: rebuild within 1 hour, test in staging, deploy to production within 3 hours.
(6) Use a tool like Snyk or Dependabot that watches your dependencies and automatically opens PRs to fix CVEs.
(7) For unfixable CVEs, apply compensating controls: for a kernel CVE, run containers with a seccomp profile that blocks the affected syscall; for an unpatchable library CVE, restrict network access or file permissions.
Example CI pipeline: daily scan → detect new CVEs → auto-rebuild images → redeploy. This ensures new vulnerability information is quickly incorporated into deployments. For production, prefer registries that rescan stored images automatically (Docker Hub, ECR, Artifactory) and alert you to new findings.
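The daily re-scan step can be sketched with GitHub Actions and the Trivy action; the image reference, schedule, and severity gate are assumptions to adapt (and you would pin the action to a vetted release in practice):

```yaml
name: daily-image-rescan
on:
  schedule:
    - cron: "0 6 * * *"   # every day at 06:00 UTC
jobs:
  rescan:
    runs-on: ubuntu-latest
    steps:
      # Fails the job (and fires the workflow's failure alert) if new
      # fixable HIGH/CRITICAL CVEs have appeared since the last run.
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: registry.example.com/my-app:prod
          severity: HIGH,CRITICAL
          exit-code: "1"
          ignore-unfixed: true
```

Wire the failure notification into the same channel as your incident alerts so a red nightly run triggers the rebuild workflow rather than sitting unnoticed.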
Follow-up: What's the difference between image scanning at build time vs. runtime? When should you use each?
Your company uses private npm packages from a private registry. Trivy can't scan these packages because it doesn't have access to the registry (missing credentials). Your SBOM is incomplete, and you're blind to CVEs in private dependencies. How do you scan private dependencies?
Scanners need access to private registries to scan private dependencies. Solutions:
(1) Give the scanner credentials: for a private container registry, authenticate before scanning (docker login, or the TRIVY_USERNAME/TRIVY_PASSWORD environment variables) so Trivy can pull the image. For npm packages, ensure the lockfile and installed node_modules are present at scan time so the scanner never needs registry access at all.
(2) Use a private vulnerability database: subscribe to a service that tracks CVEs in your private packages, or maintain one internally.
(3) Scan at build time, not runtime: during the build, CI has credentials for every registry. Run npm install (pulling from the private registry), then npm audit or a filesystem scan (trivy fs .) over the installed tree, with full credentials available.
(4) Use Docker BuildKit secrets to pass credentials securely so they never land in an image layer: docker build --secret id=npm_token,src=.npm_token ., then in the Dockerfile: RUN --mount=type=secret,id=npm_token npm install.
(5) For private build-time dependencies more generally, use multi-stage builds so anything credentialed in the builder stage is excluded from the final image. Avoid plain ARG for secrets: build args persist in image history.
(6) Generate the SBOM locally with full credentials, then scan it independently: npm install && syft . -o spdx-json > sbom.json. Commit sbom.json (it contains no credentials) to your repo and scan it anywhere: trivy sbom sbom.json. This way, scanning tools don't need registry access.
(7) Work with your security team to allowlist trusted private packages: if a private package is vetted, get approval to exclude it from scanning.
Example: in CI, after installing private packages, generate the SBOM with full credentials, scan before deployment, and store the SBOM in an artifact registry for auditing.
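A sketch of the BuildKit-secret approach in step (4): the token is mounted only for the duration of one RUN instruction and never written to a layer. The secret id, registry URL, and file names are assumptions:

```dockerfile
FROM node:18-alpine
WORKDIR /app

# .npmrc references the token via an env var instead of embedding it:
#   //registry.example.com/:_authToken=${NPM_TOKEN}
COPY package*.json .npmrc ./

# The secret file exists only while this RUN instruction executes.
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci

COPY . .
CMD ["node", "server.js"]
```

Build with: DOCKER_BUILDKIT=1 docker build --secret id=npm_token,src=.npm_token -t my-app . (where .npm_token is a local file holding the token, kept out of version control).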
Follow-up: How do you balance security scanning with build time? Scanning can add minutes to CI. Is it worth it?
Your Kubernetes cluster is running 50 images. You've implemented vulnerability scanning in your CI pipeline, but images in production are running stale versions that had CVEs at deployment time. New images are scanned, but old ones are never re-scanned. You discover that 10 deployed images have now-critical CVEs. How do you implement runtime image scanning to catch stale images?
Runtime image scanning detects vulnerabilities in already-running containers. Implement:
(1) Kubernetes admission control: deploy a policy engine (Kyverno, OPA/Gatekeeper) that intercepts pod creation and verifies the image has passed scanning; if critical CVEs exist, or no scan attestation is present, reject the pod.
(2) Runtime security tools that hook into the container runtime: Falco, Aqua, or Prisma Cloud watch running containers and alert on vulnerable images in real time.
(3) Scheduled re-scanning of deployed images: daily, extract the image references from running pods and scan each; alert and open an incident when vulnerabilities are found. Example: kubectl get pods -A -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u | xargs -n1 trivy image --severity HIGH,CRITICAL.
(4) Image signing and attestations: attach a signed attestation proving the image was scanned clean at a given time. At admission, verify the attestation and reject images whose last clean scan is older than your security window.
(5) Automatic image replacement: if a deployed image is found vulnerable, rebuild and redeploy it. With GitOps, the rebuild updates the Deployment manifest in Git and the CD system rolls it out.
(6) Image retention policies: don't keep images running longer than 90 days without a re-scan; expire old images to force rebuilds and re-scans.
Implementation: deploy an admission policy that requires scanned images, plus a daily batch job that re-scans everything currently deployed and reports results. This covers both new images at admission time and stale images already in the cluster.
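One way to sketch the admission-control idea is a Kyverno ClusterPolicy using its verifyImages rule, admitting pods only when the image carries a cosign-signed vulnerability-scan attestation. Treat this as a sketch: the field layout follows Kyverno's verifyImages API, but the registry pattern, attestation type, and key material are assumptions to adapt to your Kyverno version:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-vuln-scan
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-scan-attestation
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "registry.example.com/*"
          attestations:
            # Admit only images carrying a signed vulnerability report.
            - type: https://cosign.sigstore.dev/attestation/vuln/v1
              attestors:
                - entries:
                    - keys:
                        publicKeys: |-
                          -----BEGIN PUBLIC KEY-----
                          ...
                          -----END PUBLIC KEY-----
```

The daily batch re-scan then catches images that were clean at admission time but have since accumulated CVEs.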
Follow-up: What's the performance impact of admission controller scanning? Does it slow down pod startup?
You're in a regulated industry (healthcare, finance) and need to maintain compliance with strict vulnerability policies. Your policy requires: zero critical CVEs in any deployed image, all images signed, and full audit trail of who approved each image. Current scanning is ad-hoc. Design a compliant vulnerability management process.
Regulated environments require strict processes and audit trails. Implement:
(1) Centralized image scanning and approval: all images are scanned by a central system (e.g., a Trivy server), with results stored in a database with full audit logs.
(2) Automated enforcement: CI rejects images with critical CVEs automatically; no manual workarounds. Exceptions require explicit security-team approval, documented with an expiration date.
(3) Image signing: all images are signed with a private key held by the security team. A Kubernetes admission controller verifies signatures before running images, preventing unsigned (unapproved) images from running.
(4) Audit logging: every image scan, approval, deployment, and exception is logged with timestamp, user, reason, and outcome.
(5) Regular compliance reports: show all deployed images with their CVE status, age, and approval status; these satisfy audit requirements.
(6) Secure build process: build and scan on isolated CI workers (air-gapped if required); sign images immediately after a passing scan.
(7) Runtime enforcement: the admission controller rejects unsigned images and images with unapproved or expired exceptions.
(8) Incident response: if a CVE is discovered in a deployed image, trigger an incident: stop accepting new traffic (or shift to a backup), rebuild the image, re-sign, redeploy, all logged.
Implementation: CI pipeline → Trivy scan → on critical CVE, fail and alert the security team; on pass, sign the image. The admission controller verifies signatures before pod creation, and all logs go to a secure audit system (SIEM). This meets regulated-industry requirements and gives regulators a complete audit trail.
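The scan-then-sign gate can be sketched as a two-stage CI job (GitLab-CI-style syntax; the variable names and key handling are illustrative). The ordering is the control: signing can only happen after a clean scan, so a verified signature at admission time implies a passing scan on record:

```yaml
stages:
  - scan
  - sign

scan-image:
  stage: scan
  script:
    # Non-zero exit on any critical CVE stops the pipeline here,
    # so the sign stage is never reached for a dirty image.
    - trivy image --exit-code 1 --severity CRITICAL "$IMAGE"

sign-image:
  stage: sign
  script:
    # The private key lives in protected CI secrets, held by the security team.
    - cosign sign --key "$COSIGN_KEY" "$IMAGE"
```

Both jobs' logs and the signature event feed the audit system, covering requirements (3), (4), and (6).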
Follow-up: How do you handle emergency patches when a zero-day CVE is discovered in a critical package? What's the fast-track process?
You're using distroless base images (e.g., distroless/cc) which have minimal OS footprint and fewer CVEs. However, you still need to install a few application dependencies at build time. The Dockerfile uses a multi-stage build, but you accidentally included the build stage packages in the final image. Now the distroless image has unnecessary CVEs from build-only dependencies. Fix this without removing necessary runtime dependencies.
Multi-stage builds help avoid this, but the stages must be kept cleanly separated. Fix:
(1) Use a proper multi-stage Dockerfile: FROM golang:1.18 AS builder ... RUN go build ..., then FROM a distroless base and COPY --from=builder only the binary. Build dependencies (Go compiler, git, etc.) never reach the final image.
(2) Keep each stage focused: in the builder, don't worry about size; in the final stage, install only runtime dependencies. Example final stage: RUN apt-get update && apt-get install -y --no-install-recommends runtime-deps && apt-get clean && rm -rf /var/lib/apt/lists/*.
(3) Use a minimal final base image: gcr.io/distroless/cc instead of ubuntu:22.04 (smaller, far fewer CVEs).
(4) Scan the stages separately: build the intermediate stage as its own tag (docker build --target builder -t myimage:builder .) and run Trivy on both it and the final image. If the builder stage has CVEs but the final image doesn't, there's no deployment risk.
(5) To verify the separation, inspect the image layers with docker history myimage, then use dive to visualize layer contents: dive myimage.
(6) Document the intent in the Dockerfile: # Build stage: compiler, git, etc. (not included in final image). # Final stage: runtime dependencies only.
Finally, run Trivy on the final image (trivy image myimage) to confirm the deployed artifact is clean of build-time CVEs.
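The multi-stage layout written out in full (the module layout and binary name are illustrative):

```dockerfile
# Build stage: compiler and toolchain live only here.
FROM golang:1.18 AS builder
WORKDIR /build
COPY . .
RUN go build -o app .

# Final stage: no shell, no package manager, just the binary.
FROM gcr.io/distroless/cc
COPY --from=builder /build/app /app
ENTRYPOINT ["/app"]
```

To scan both halves: docker build --target builder -t myimage:builder . && trivy image myimage:builder for the build tools, and trivy image myimage for what actually ships.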
Follow-up: What's the trade-off between using distroless images (minimal CVEs, small size) and full OS base images (more tooling available)?
Your vulnerability scanning process is working, but teams are frequently creating exceptions for CVEs they claim are "not exploitable in our context." The exceptions pile up, and soon you have 100+ exceptions with weak justifications. The security team can't audit all of them. Design a process to prevent exception creep while still allowing legitimate exceptions.
Exceptions are sometimes necessary, but unchecked exceptions create blind spots. Process:
(1) Make exceptions explicit and traceable: each exception must be documented in a file (exceptions.json) committed to the repo, forcing visibility and code review. Format: {"cve": "CVE-2024-1234", "package": "lodash", "reason": "this CVE requires admin access; our app is non-root", "expires": "2026-06-07", "approver": "security-team"}.
(2) Require security-team approval: exceptions aren't self-service. A security engineer must review and approve via code review before an exception is valid.
(3) Set expiration dates: an exception is valid for at most 90 days. After that, reassess; if the CVE is still unfixed and unaddressed, the exception must be renewed or the image fails scanning.
(4) Distinguish exception types: a) unfixable (no patch available yet: 30-day expiration), b) false positive (tooling error: 90-day expiration, marked low priority), c) contextually safe (exploitation requires X and we mitigate X: 6-month expiration).
(5) Track metrics: report total exceptions, upcoming expirations, and the reason breakdown; alert when exceptions exceed a threshold (e.g., >50).
(6) Audit regularly: once a quarter, the security team reviews all active exceptions, checks whether the CVEs are now fixed, and removes obsolete entries.
(7) For contextually safe exceptions, implement the compensating control, not just the claim: if a CVE is safe only because the container runs as non-root, enforce that in the Dockerfile and admission policy, and record it in the exception: {"control": "runs as non-root: RUN useradd -u 1000 app; USER app"}.
Implementation: exceptions.json in the repo, security approval required for changes, and automated validation in CI that rejects expired exceptions. This prevents exception creep while preserving flexibility.
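The CI validation in the implementation note can be sketched as a small bash check. It assumes each exception carries an "expires": "YYYY-MM-DD" field on its own entry; ISO dates compare correctly as plain strings, so no date arithmetic is needed:

```shell
# Print each expired exception date found in the given exceptions file
# and return non-zero if any exist, so the CI step fails.
check_exceptions() {
  local file=$1 today expiry expired=0
  today=$(date +%F)
  while IFS= read -r expiry; do
    if [[ "$expiry" < "$today" ]]; then
      echo "expired exception: $expiry"
      expired=1
    fi
  done < <(grep -o '"expires"[: ]*"[0-9-]*"' "$file" | grep -o '[0-9][0-9-]*')
  return $expired
}
```

Run it as a CI step after scanning: check_exceptions exceptions.json || exit 1. A real implementation would parse the JSON properly (e.g., with jq) and report the CVE id alongside each expired date; this sketch keeps the dependency footprint at plain bash and grep.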
Follow-up: How do you balance security (fixing all CVEs) with pragmatism (some CVEs are genuinely unfixable)? What's the right threshold?