Your Go service image is 2.3GB despite a tiny binary. You run `docker history myimage:latest` and see the builder stage is 1.8GB, then the final stage copies just the binary (15MB). Builds take 8 minutes and registry storage costs $340/month across 50 services. How do you architect multistage builds to eliminate bloat?
Multistage builds separate compile, test, and runtime layers. In your Dockerfile, use two or more `FROM` statements. The builder stage installs compilers, dependencies, and build tools—all discarded in the final image. Use `COPY --from=builder /app/binary /usr/local/bin/` to copy only the artifact. For Go: stage 1 uses `golang:1.21-alpine` (350MB), compiles, then stage 2 uses `alpine:latest` (7MB), copies binary and runs. Result: 2.3GB → 25MB. Registry: $340 → $12/month. Verify with `docker images` and `docker history myimage:latest` to confirm no bloat layers remain.
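A minimal sketch of the two-stage layout described above (module path and binary name are illustrative):

```dockerfile
# Stage 1: builder — compilers and module cache live here, discarded later
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download            # cached unless go.mod/go.sum change
COPY . .
# CGO_ENABLED=0 yields a static binary, so the runtime stage needs no libc
RUN CGO_ENABLED=0 go build -o /app/server .

# Stage 2: runtime — ships only the small binary
FROM alpine:latest
COPY --from=builder /app/server /usr/local/bin/server
ENTRYPOINT ["/usr/local/bin/server"]
```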
Follow-up: What if your final stage needs git history for version info? How do you avoid embedding 150MB of .git in the production image?
Node.js app: 890MB image (node:18, npm deps, then app code). You run CI 40 times/day. Docker BuildKit reports cache miss on `RUN npm install` even though package.json hasn't changed. Builds stall at 7+ minutes. What layer order saves the cache?
Docker's layer cache key is the instruction plus the state of all preceding layers; for `COPY`/`ADD`, it also includes a checksum of the copied files. If you `COPY . .` (app code) before `RUN npm install`, any code change busts the cache from that point down—npm reinstalls from zero. Reverse the order: `COPY package.json package-lock.json ./`, `RUN npm ci`, then `COPY . .`. If the package files are unchanged, the npm layer is a cache hit even when application code changes; only the final `COPY` re-runs. With BuildKit, use `docker buildx build --cache-from type=local,src=/tmp/buildcache --cache-to type=local,dest=/tmp/buildcache .` to persist cache across CI runs. Test: touch a comment in app.js and rebuild—the npm step should come from cache (0m2s vs 3m30s).
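The reordered Dockerfile, sketched (the app entrypoint name is illustrative):

```dockerfile
FROM node:18
WORKDIR /app
# Copy only the dependency manifests first: this layer's cache key
# is the checksum of these two files, not the whole source tree
COPY package.json package-lock.json ./
RUN npm ci
# App-code changes bust only this layer and below
COPY . .
CMD ["node", "app.js"]
```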
Follow-up: Your base image node:18 gets a security patch. How do you invalidate just that layer cache without invalidating npm, and what's the performance cost?
Python web service: base image `python:3.11` (920MB). You add 47 system packages with `apt-get install`, then `pip install -r requirements.txt` (150 packages). Final image: 1.6GB. Your CI publishes to ECR 50 times/day—upload costs spike in peak hours. Optimize for both size and speed.
Use Alpine Linux as the base: `FROM python:3.11-alpine` is ~50MB vs 920MB. Alpine doesn't have apt-get; use `apk add`. Combine RUN commands into a single layer with `&&` chaining and clean package caches in that same layer, so cache files never land in a committed layer: `RUN apk add --no-cache gcc musl-dev && pip install --no-cache-dir -r requirements.txt && rm -rf /root/.cache`. Skip docs/man pages. In multistage: a builder stage (alpine + build tools) compiles C extensions; the final stage copies wheels/binaries only. Result: 1.6GB → 180MB. ECR bandwidth: 50 images × 1.6GB = 80GB/day → 50 images × 180MB = 9GB/day. Test locally with `docker build -t test . && docker images` to confirm the final size.
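A sketch of the alpine multistage pattern (package names and paths are illustrative; your requirements.txt dictates which `-dev` packages the builder actually needs):

```dockerfile
# Builder: compilers exist only here, to build wheels for C extensions
FROM python:3.11-alpine AS builder
RUN apk add --no-cache gcc musl-dev libffi-dev
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

# Runtime: install the prebuilt wheels; no compilers in the final image
FROM python:3.11-alpine
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/* \
    && rm -rf /wheels
COPY . /app
CMD ["python", "/app/main.py"]
```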
Follow-up: Alpine uses musl instead of glibc; your C extension fails to load. How do you debug this in the image, and what's the production trade-off?
Your microservices use a shared base image (450MB) that you rebuild weekly. All 12 services inherit FROM it. You publish all 12 at once—12 × 450MB = 5.4GB of pushes, saturating your 100Mbps pipe (~7 minutes of solid transfer). How do you parallelize and compress pushes without changing application code?
Use Docker layer sharing and push parallelization. Each service layers only its changes atop the base, so ECR stores one 450MB base copy; the 12 service layers (each 5-50MB) are separate blobs, and pushes of layers the registry already holds are skipped ("Layer already exists"). Serial pushes still serialize the unique layers, so parallelize: `docker buildx build --platform linux/amd64,linux/arm64 -t myrepo/svc1 . --push` parallelizes architecture builds, and `--cache-to type=registry` shares builder cache across CI agents. In CI (GitHub Actions, GitLab), use a matrix strategy to build several images concurrently instead of one at a time. Note that `docker build --compress` only compresses the build context sent to the daemon—pushed layers are already gzip-compressed; for better ratios use BuildKit's zstd image output. Result: after the first push, only changed service layers (5-50MB each) cross the wire instead of 5.4GB; pipe utilization drops 90% → 40%. Verify: `docker buildx du` shows cache efficiency.
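A hedged GitHub Actions sketch of the matrix approach (service names, context paths, and cache tags are placeholders):

```yaml
jobs:
  build:
    strategy:
      matrix:
        service: [svc1, svc2, svc3, svc4]   # extend to all 12 services
      max-parallel: 4                        # 4 concurrent builds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          context: services/${{ matrix.service }}
          push: true
          tags: myrepo/${{ matrix.service }}:latest
          cache-from: type=registry,ref=myrepo/${{ matrix.service }}:buildcache
          cache-to: type=registry,ref=myrepo/${{ matrix.service }}:buildcache,mode=max
```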
Follow-up: You deploy to both x86 K8s cluster and ARM Raspberry Pi edge devices. Does multiarch BuildKit slow down your hotpath builds?
Your Java Spring Boot app Dockerfile: `FROM openjdk:17-jdk (650MB) → Maven compile/test (RUN mvn clean package) → COPY target/app.jar → CMD ["java", "-jar", "app.jar"]`. Total: 1.2GB image. You repeat this 8 times/day in staging. Compile step takes 6m30s, slowing deployments. Optimize the build without changing Spring code.
Multistage: stage 1 `FROM maven:3.9-eclipse-temurin-17` (800MB) copies pom.xml and runs `mvn dependency:resolve` to cache dependencies in a layer separate from source code; then copy src and compile. Stage 2: `FROM eclipse-temurin:17-jre` (300MB, JRE not JDK) copies the jar from stage 1. The JRE is roughly half the size—no javac/tools needed at runtime. Result: 1.2GB → 450MB. For faster rebuilds, use `.dockerignore` to skip test output and .git, and add a BuildKit cache mount (`RUN --mount=type=cache,target=/root/.m2 mvn package`) so the local Maven repository survives across builds.
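A sketch of the split-dependency Maven build (jar name and paths are illustrative):

```dockerfile
# Stage 1: resolve dependencies against pom.xml only, so this layer
# is reused until pom.xml itself changes
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /build
COPY pom.xml .
RUN mvn -B dependency:resolve
COPY src ./src
RUN mvn -B clean package -DskipTests

# Stage 2: JRE only — no javac, no Maven
FROM eclipse-temurin:17-jre
COPY --from=builder /build/target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]
```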
Follow-up: Your CI agent has limited disk (5GB). Maven cache + two image builds exceed capacity. How do you persist Maven cache without mounting the Docker socket?
Rust service: the release build generates a 180MB binary (debug metadata, symbols). The Dockerfile copies it into `FROM debian:bookworm-slim` (80MB). Total: 260MB. You strip the binary locally (`strip` command) and size becomes 45MB—but you need debug symbols for prod crashes. Design an image that ships the stripped binary + separate debug info.
Use `objcopy` to split out the debug symbols: build normally, then `objcopy --only-keep-debug binary binary.debug && objcopy --strip-all binary && objcopy --add-gnu-debuglink=binary.debug binary`. The binary is now 45MB with a `.gnu_debuglink` section pointing at binary.debug (140MB), so debuggers can locate the split symbols. In Docker multistage: stage 1 builds and strips; stage 2 `FROM debian:bookworm-slim` copies only the 45MB binary. Store binary.debug in a separate image tag `myrepo/rust-svc:latest-debug` (165MB) or in your artifact store (S3, Artifactory). When prod crashes, analyze the core against the debug image: `docker run -v /var/log/crash:/mnt myrepo/rust-svc:latest-debug gdb /usr/local/bin/binary /mnt/crash.core -batch -ex bt`. Production image: 125MB vs 260MB. Test: `docker build -t app . && docker images`, then `objdump -h binary | grep debug` inside the image should show only `.gnu_debuglink`, confirming the `.debug_*` sections are stripped.
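A sketch of the split-and-strip flow as a multistage Dockerfile (binary name and paths are illustrative; build the debug variant with `--target debug`):

```dockerfile
FROM rust:1.75 AS builder
WORKDIR /src
COPY . .
# Build, split the debug info into a side file, strip the shipping
# binary, and record a .gnu_debuglink so gdb can find the symbols
RUN cargo build --release \
    && objcopy --only-keep-debug target/release/svc svc.debug \
    && objcopy --strip-all target/release/svc \
    && objcopy --add-gnu-debuglink=svc.debug target/release/svc

# Debug variant: stripped binary + symbols (tag as :latest-debug)
FROM debian:bookworm-slim AS debug
COPY --from=builder /src/target/release/svc /usr/local/bin/svc
COPY --from=builder /src/svc.debug /usr/local/bin/svc.debug

# Production (default final stage): stripped binary only
FROM debian:bookworm-slim
COPY --from=builder /src/target/release/svc /usr/local/bin/svc
CMD ["/usr/local/bin/svc"]
```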
Follow-up: Your CI pipelines run on GitHub Actions (15GB image cache) and self-hosted runners (2GB). How do you sync debug images across both without doubling push time?
Your microservices mono-repo: 60 services, shared Dockerfile template. Each service adds 50-200MB of unique dependencies. You publish all 60 to a private registry in 12 minutes (peak hour push spike). Bandwidth budget: 500GB/month. You're at 480GB and growing. Design an image compression and deduplication strategy for registry.
Three strategies. (1) Deduplication: publish common layers as a shared base image—e.g., `myrepo/base-python:3.11` (180MB) used by 30 services. Each service layers atop it, and the registry's content-addressable storage keeps a single copy of the shared layers. (2) Aggressive multistage pruning: remove build artifacts, pip caches, and apt caches in the same layer that creates them. (3) Better compression: push layers as zstd instead of gzip via BuildKit's image output (e.g. `docker buildx build --output type=image,name=myrepo/svc,push=true,compression=zstd .`), noting that your registry and runtimes must support OCI zstd layers. Result: 60 services × 800MB = 48GB (naive) → 30GB (shared base) → 18GB (zstd + pruning). Verify: `docker system df -v` on the build machine shows layer reuse. In the registry (Docker Hub, ECR, Harbor): enable garbage collection to remove orphaned layers after 30 days.
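A sketch of what each service Dockerfile reduces to under the shared-base strategy (base tag and paths are illustrative):

```dockerfile
# All shared system packages and common pip deps live in the base,
# whose layers the registry stores exactly once across all services
FROM myrepo/base-python:3.11
COPY requirements-extra.txt .
RUN pip install --no-cache-dir -r requirements-extra.txt
COPY . /app
CMD ["python", "/app/main.py"]
```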
Follow-up: Your security scan finds 47 images with vulnerable glibc 2.31. Do you rebuild all 60 or use a base-image-as-artifact strategy?
Your C++ service compiles to an 85MB static binary (no libc dependency). Dockerfile: `FROM alpine:latest (7MB) + COPY binary (85MB) = 92MB total`. At scale (1000 replicas), K8s pulls this image up to 1000 times in parallel during a rolling update. Network saturation: 92GB of transfer in 60s → network spike → node readiness probes time out. How do you handle image pull parallelization and caching at the edge?
K8s pulls images per node, not per pod. First, make each node pull once: set `imagePullPolicy: IfNotPresent` so a locally cached image is reused, and pre-pull ahead of the rollout with a DaemonSet that references the image, so the rolling update never waits on the network. Second, tune pull concurrency: Docker's `--max-concurrent-downloads` (default 3) in `/etc/docker/daemon.json` controls parallel layer downloads per pull (e.g. `"max-concurrent-downloads": 10`); on containerd, per-registry mirrors are configured via `hosts.toml` files under `/etc/containerd/certs.d/`. Third, put a pull-through cache or registry mirror between nodes and origin (3-5 read replicas, or a CDN front-end) so the origin serves each layer a handful of times, not 1000. Result: 1000 replicas on ~100 nodes means ~100 pulls (≈9.2GB) spread across mirrors instead of a 92GB thundering herd against one registry. Verify: watch the parallel layer downloads in `docker pull` output, and monitor for pull failures with `kubectl get events -A | grep -i image`.
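A sketch of containerd's per-registry mirror configuration (registry hostname and mirror URL are placeholders):

```toml
# /etc/containerd/certs.d/myrepo.example.com/hosts.toml
# Pulls try the in-cluster pull-through cache first and fall back
# to the upstream registry if the mirror is unavailable.
server = "https://myrepo.example.com"

[host."https://mirror.internal:5000"]
  capabilities = ["pull", "resolve"]
```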
Follow-up: Your K8s cluster spans 3 geographic regions (US, EU, Asia). How do you minimize image pull time without triplicating storage cost?