Docker Interview Questions

PID 1 and Signal Handling in Containers


Your Node.js app container ignores SIGTERM and takes 30 seconds to force-kill. Your orchestrator timeout is 20 seconds, causing unclean shutdowns and data corruption. You traced the issue: a shell wrapper script (not your app) is PID 1 and doesn't forward signals to the Node process. How do you fix this immediately and prevent it in future deployments?

This is a classic PID 1 signal-handling problem. A shell script running as PID 1 doesn't forward signals to its children—only a signal-aware init (tini, dumb-init, systemd) does that—and the kernel also ignores signals with default dispositions when the target is PID 1. Fix it in three ways: (1) Use exec in your entrypoint script (exec node app.js) so the shell replaces itself with the Node process, making your app PID 1 directly. (2) Add a minimal init system in the Dockerfile—install dumb-init and set ENTRYPOINT ["dumb-init", "--", "node", "app.js"]—so something signal-aware sits at PID 1 and forwards SIGTERM. (3) Implement graceful shutdown in the app itself: process.on('SIGTERM', () => { server.close(() => process.exit(0)); }), closing the server and database connections before exiting. Combine them: exec (or dumb-init) in the image, raise the orchestrator timeout to 35-45 seconds, and keep an explicit STOPSIGNAL SIGTERM (it's the default, but stating it documents intent). This ensures signals propagate correctly and your app has time to clean up connections before forceful termination.
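A minimal sketch of the handler-registration pattern—written in Python rather than Node so it's self-contained and runnable—with the orchestrator's SIGTERM simulated by the process signaling itself:

```python
import os
import signal

shutdown = {"clean": False}

def handle_sigterm(signum, frame):
    # Graceful cleanup would go here: close the server, disconnect the DB.
    shutdown["clean"] = True

# Equivalent to Node's process.on('SIGTERM', ...): without this line,
# the default action (or, for PID 1, nothing at all) happens instead.
signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the orchestrator delivering SIGTERM to this process.
os.kill(os.getpid(), signal.SIGTERM)
```

The same process receives and handles the signal, which is exactly what exec buys you: the app itself, not a shell wrapper, is the signal target.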

Follow-up: What happens if your app receives SIGKILL instead of SIGTERM? Can you catch it? How do orchestrators use SIGTERM vs SIGKILL?

Your Python FastAPI service receives SIGTERM but continues processing requests for 45 seconds before closing connections. Meanwhile, the load balancer has already removed the container from the service pool. Some in-flight requests hit the closing container and get 503 errors. Design a pre-shutdown sequence.

This requires a multi-stage graceful shutdown: (1) On SIGTERM, immediately set a shutdown flag and stop accepting new connections without killing active ones. (2) Have your health-check endpoint return 503 once the flag is set, so the load balancer marks the instance unhealthy and drains traffic within a few probe intervals. (3) Implement a drain period (e.g., 5-15 seconds) during which existing requests complete but no new ones start. (4) After the drain period, force-close remaining connections and exit. In Python, register the handler with signal.signal(signal.SIGTERM, lambda signum, frame: shutdown_flag.set())—handlers take two arguments—then return 503 from the health route while the flag is set. Track in-flight requests (a counter or an async context manager) and wait for them to complete. Set the orchestrator's terminationGracePeriodSeconds to 35+ to cover your max request time plus the drain period. This way the LB sees the 503s and pulls traffic before the container is killed.
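The flag-and-drain idea above, sketched without FastAPI (the status-code returns and the shutting_down name are stand-ins for the real app's routing and state):

```python
import signal
import threading

shutting_down = threading.Event()

def health():
    # LB health probe: report 503 as soon as drain starts so traffic is pulled.
    return 503 if shutting_down.is_set() else 200

def handle_request():
    # New work is refused during drain; already-accepted requests keep running.
    if shutting_down.is_set():
        return 503
    return 200

def on_sigterm(signum, frame):
    shutting_down.set()  # step 1: flip the flag; kill nothing yet

signal.signal(signal.SIGTERM, on_sigterm)
```

In a real service the drain period follows: sleep long enough for probes to observe the 503, wait for the in-flight counter to hit zero, then exit.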

Follow-up: How do you measure if your graceful shutdown is actually working? What metrics should you emit?

You're running a Kafka consumer in a Docker container. It caches 10,000 messages in memory before batch-processing them. On SIGTERM, it gets 5 seconds before the orchestrator sends SIGKILL. The consumer crashes without committing offsets, causing message reprocessing. How do you guarantee offset safety during shutdown?

Kafka consumers need explicit shutdown logic to avoid reprocessing. On SIGTERM: (1) Stop polling for new messages immediately. (2) Process the buffered batch and commit offsets for completed messages only. (3) Close the consumer, which leaves the group cleanly and triggers a prompt rebalance. (4) Exit before SIGKILL arrives. With kafka-python, create the consumer with enable_auto_commit=False, call consumer.commit() explicitly only after processing completes, and finish with consumer.close(). Configure the orchestrator termination grace period to at least message_buffer_max_time + batch_processing_time (typically 30-45 seconds)—the 5 seconds in this scenario is far too short. Emit metrics for shutdown events, messages still buffered at shutdown, and time-to-graceful-exit. This prevents offset rollback and double-processing.
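A framework-free sketch of the drain-then-commit logic; the (offset, payload) buffer stands in for messages a kafka-python consumer (created with enable_auto_commit=False) would already have fetched:

```python
import signal

class OffsetSafeConsumer:
    """Sketch: on SIGTERM stop fetching, finish the buffered batch, commit."""

    def __init__(self, buffered):
        self.buffered = list(buffered)  # (offset, payload) pairs in memory
        self.committed = None           # last committed position
        self.running = True
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.running = False            # step 1: stop consuming new messages

    def drain_and_commit(self):
        # Steps 2-3: process what is buffered, commit only completed work.
        for offset, _payload in self.buffered:
            # real message processing happens here
            self.committed = offset + 1  # Kafka commits point at the NEXT offset
        self.buffered.clear()
        return self.committed
```

The main loop would check self.running each poll iteration and call drain_and_commit() once it flips, followed by the real consumer's commit() and close().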

Follow-up: What's the difference between autocommit and manual offset commits? When should each be used?

Your Go gRPC service receives SIGTERM and initiates graceful shutdown via GracefulStop(). But your Python client has timeouts set to 60 seconds. After 10 seconds, the gRPC server closes and returns UNAVAILABLE to the client, which then retries. Design a coordinated shutdown between gRPC server and clients.

Go's GracefulStop() stops accepting new RPCs immediately and blocks until all in-flight RPCs finish—there is no built-in deadline, so you add one yourself: run GracefulStop() in a goroutine and call Stop() when a timer expires. Coordinated shutdown requires: (1) When the server gets SIGTERM, start draining: new RPCs are rejected with UNAVAILABLE, which well-behaved clients treat as a cue to fail over to another replica. (2) Bound the drain (e.g., 15 seconds) so long-lived streams can't block shutdown indefinitely. (3) On the client side, implement exponential backoff with a retry cap, and on UNAVAILABLE rotate to a healthy peer instead of retrying the same address—a 60-second client timeout against a 10-second server drain guarantees failures otherwise. (4) Set the orchestrator's termination grace period to 30+ seconds. Python gRPC channels reconnect automatically, but failover across replicas needs client-side load balancing (e.g., round_robin) or a proxy in front. This gives clients a clear signal to switch replicas rather than endlessly retrying a dying server.
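The client side of step (3), sketched in Python with a hypothetical Unavailable exception standing in for gRPC's UNAVAILABLE status code:

```python
import time

class Unavailable(Exception):
    """Stand-in for an RPC failing with gRPC status UNAVAILABLE."""

def call_with_failover(replicas, rpc, max_attempts=4, base_delay=0.01):
    """On UNAVAILABLE, back off exponentially and rotate to the next replica
    instead of hammering a draining server."""
    delay = base_delay
    last_error = None
    for attempt in range(max_attempts):
        target = replicas[attempt % len(replicas)]  # rotate through peers
        try:
            return rpc(target)
        except Unavailable as exc:
            last_error = exc
            time.sleep(delay)  # jitter would be added in production
            delay *= 2
    raise last_error
```

Real gRPC clients get much of this from a round_robin load-balancing policy plus a retry service config; the sketch just makes the rotate-and-back-off behavior explicit.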

Follow-up: How do you implement health-aware load balancing in gRPC to ensure clients detect server shutdown before sending requests?

Your Rails app is running Puma (multi-worker process). On SIGTERM, Puma sends SIGTERM to workers, but some background jobs hang in cleanup callbacks (Rack finalizers), blocking the entire shutdown for 120 seconds. Your orchestrator timeout is 30 seconds. Every deploy causes cascading restarts. How do you unblock this?

Puma's graceful shutdown is signal-driven, and a hanging finalizer blocks the whole sequence. Fix it: (1) Wrap your finalizers in timeout blocks—if cleanup exceeds ~5 seconds, abandon it and move on. Ruby's Timeout module works, with the usual caveat that it can interrupt code at unsafe points. (2) Move long-running jobs into a separate worker container; don't run them inside Puma. (3) Bound worker shutdown in config/puma.rb with worker_shutdown_timeout (e.g., workers 4; threads 0, 5; worker_shutdown_timeout 15), so a stuck worker is force-killed after 15 seconds instead of blocking for 120. (4) On SIGTERM, Puma waits for in-flight requests up to that timeout, then shuts down. (5) Set the orchestrator grace period to 40-45 seconds. Handle job draining separately: when the job worker gets SIGTERM, finish the current job but reject new ones. Decoupling the web request lifecycle from background work is what stops the cascading restarts.
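The timeout-wrapped finalizer idea from step (1), sketched in Python (a daemon thread is abandoned rather than killed, so the cleanup code must tolerate being dropped mid-way—the same caveat applies to Ruby's Timeout):

```python
import threading

def run_finalizer_with_budget(cleanup, seconds):
    """Run a cleanup callback, but give up if it exceeds its time budget,
    so one stuck finalizer can't stall the whole shutdown."""
    t = threading.Thread(target=cleanup, daemon=True)
    t.start()
    t.join(timeout=seconds)
    return not t.is_alive()  # True if the finalizer finished in time
```

Shutdown code would log (and emit a metric for) every finalizer that returns False, since abandoned cleanup usually means a leaked resource to investigate.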

Follow-up: How do you test graceful shutdown locally? What tooling can simulate orchestrator timeouts?

You have a Spring Boot app that uses @PreDestroy bean lifecycle hooks for database connection pooling cleanup. On SIGTERM, Spring's graceful shutdown waits 30 seconds for in-flight requests to complete, but your database driver timeout is only 5 seconds. When the grace period expires, Spring forcefully kills the app mid-cleanup. Design a proper shutdown sequence accounting for db driver behavior.

Spring Boot's graceful shutdown (enabled via server.shutdown=graceful) waits for active requests, but the JDBC driver and connection pool keep their own timeouts. Solution: (1) Make the pool's timeouts compatible with the shutdown window—in Hikari, hikariConfig.setConnectionTimeout(45000) so threads waiting on the pool don't fail mid-drain. (2) In your @PreDestroy method, drain the pool explicitly: dataSource.getHikariPoolMXBean().softEvictConnections(), then wait briefly for active connections to be returned before closing the datasource. (3) Bound the drain with spring.lifecycle.timeout-per-shutdown-phase=35s—this property, not a server.shutdown.* one, controls how long Spring waits per shutdown phase. (4) Register a shutdown hook that stops handing out new connections before @PreDestroy fires. The sequence: SIGTERM → Spring stops accepting requests → in-flight requests complete → @PreDestroy runs → datasource closes cleanly → app exits. Set the orchestrator timeout to 50+ seconds to cover the whole chain. This releases database resources without the driver timing out mid-cleanup.
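The SIGTERM → drain → close-pool ordering, sketched in Python with hypothetical callables standing in for the Spring/Hikari pieces described above:

```python
import time

def graceful_shutdown(stop_accepting, in_flight_count, close_pool, grace_seconds):
    """Sketch of the sequence: stop intake -> drain in-flight -> close the pool."""
    stop_accepting()                    # step 1: reject new requests
    deadline = time.monotonic() + grace_seconds
    while in_flight_count() > 0 and time.monotonic() < deadline:
        time.sleep(0.01)                # poll until requests finish or time runs out
    close_pool()                        # pool teardown runs even if the drain expired
    return in_flight_count() == 0       # clean exit only if everything drained
```

The key property is that close_pool() runs strictly after the drain window, so no request loses its connection mid-flight—the same invariant the @PreDestroy ordering is meant to guarantee.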

Follow-up: How do connection pool exhaustion and graceful shutdown interact? What happens if all pool connections are in-use when SIGTERM arrives?

You're using init=true in docker-compose to run systemd inside your container. The container receives SIGTERM but systemd inside doesn't propagate it to your main application process because systemd itself is PID 1. The service inside systemd doesn't receive the signal. Explain what's happening and fix it.

With init: true (or docker run --init), Docker inserts a minimal init (tini) as PID 1, which starts your command as a child and forwards signals to it. The catch here is systemd: as PID 1, systemd does not shut down on SIGTERM—inside a container it expects SIGRTMIN+3 to begin an orderly halt—so the chain SIGTERM → init (PID 1) → systemd stalls at systemd, and your service never sees the signal. Fixes: (1) Don't run systemd inside containers—it's designed for full systems; run your service directly as the main process. (2) If you must run systemd, send it the signal it honors: set STOPSIGNAL SIGRTMIN+3 in the Dockerfile, and tune KillMode= and KillSignal=SIGTERM in the service's unit file so systemd terminates your app cleanly during the halt. (3) For non-systemd images, --init (tini) plus exec in the entrypoint gives correct forwarding. Best practice: app as PID 1 (using exec in the entrypoint), --init for zombie reaping and signal forwarding, and proper signal handlers in the app code.
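What an init like tini does for you, sketched as a minimal Python wrapper (run_as_init is a hypothetical name, and a real init also reaps orphaned zombies, which this sketch omits):

```python
import signal
import subprocess
import sys

def run_as_init(child_argv):
    """Minimal tini-style wrapper: start the real workload as a child,
    forward SIGTERM to it, and mirror its exit code."""
    child = subprocess.Popen(child_argv)
    # When this wrapper (PID 1 in a container) gets SIGTERM, pass it on.
    signal.signal(signal.SIGTERM, lambda signum, frame: child.terminate())
    return child.wait()

if __name__ == "__main__":
    sys.exit(run_as_init(sys.argv[1:]))
```

Without the signal.signal line, a SIGTERM to the wrapper would never reach the child—exactly the stall described above, just with a shell or systemd in the wrapper's place.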

Follow-up: When should you use --init vs. dumb-init vs. tini? What's the minimal viable init system?

You're running a container with a multi-stage entrypoint: it starts a supervisor process that manages three child processes (web server, cache warmer, metrics exporter). On SIGTERM to the container, the supervisor receives the signal but the three children keep running for another 20 seconds, all exiting uncleanly. The supervisor itself doesn't properly forward signals. How do you design the entrypoint to handle this?

Supervisors and process managers need deliberate signal handling. When the supervisor is PID 1, it receives SIGTERM and must immediately propagate it to its children, then wait for their graceful exit. Implementation: (1) Use a signal-aware process manager—supervisord forwards SIGTERM to its programs and honors per-program stopsignal/stopwaitsecs settings—or write a small shell supervisor that traps SIGTERM and kills the children itself. (2) If you write the shell version, don't exec another process after setting a trap: exec replaces the shell, and the trap with it. Start children in the background, record their PIDs, then trap 'kill -TERM $child_pids; wait' TERM and end the script with wait. (3) Give each child its own shutdown budget (e.g., 10 seconds), escalating to SIGKILL afterwards. (4) The supervisor exits once all children have exited or timed out. (5) Set the orchestrator grace period to supervisor overhead + the largest child timeout + buffer (typically 40-50 seconds). The trap propagates the signal through the process tree, preventing orphaned children and uncoordinated shutdown.
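The fan-out-then-reap behavior, sketched in Python (Supervisor is a hypothetical stand-in for what supervisord or the shell trap does):

```python
import signal
import subprocess

class Supervisor:
    """Sketch of a PID-1 supervisor: fan SIGTERM out to every child, then reap."""

    def __init__(self, commands):
        self.children = [subprocess.Popen(cmd) for cmd in commands]
        signal.signal(signal.SIGTERM, self._forward)

    def _forward(self, signum, frame):
        for child in self.children:
            child.terminate()            # SIGTERM each child; cleanup is theirs to do

    def wait_all(self, per_child_timeout=10):
        """Give each child a bounded shutdown window, then escalate to SIGKILL."""
        codes = []
        for child in self.children:
            try:
                codes.append(child.wait(timeout=per_child_timeout))
            except subprocess.TimeoutExpired:
                child.kill()             # budget exhausted: force it down
                codes.append(child.wait())
        return codes
```

The per-child timeout is what keeps one slow child (the 20-second stragglers in the scenario) from consuming the entire orchestrator grace period.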

Follow-up: What are process groups, and how do they differ from killing individual PIDs? When should you use killpg vs. kill?
