You run `docker-compose up` with a service using a bind mount (host directory). After `docker-compose down` you re-run compose expecting to resume, but the data under the mounted path is gone. Why can a bind mount lead to data loss, and how does it differ from a Docker volume?
Bind mounts and volumes store data differently. Bind mount: `-v /host/path:/container/path` mounts a host directory directly into the container. Data lives on the host filesystem, so if `/host/path` is a temporary directory (e.g., under `/tmp`) or gets deleted by a cleanup job, the data is lost. Docker volume: `docker volume create mydata`, then `docker run -v mydata:/container/path ...` attaches a managed volume stored in Docker's data directory (usually `/var/lib/docker/volumes/mydata/_data`). Docker manages its lifecycle: after `down`, named volumes persist by default (they're removed only if you pass `down -v`), and after `up` they're re-attached. To verify: (1) bind mount: `ls /host/path` on the host shows the files; if the host directory was deleted, the data is gone. (2) volume: `docker volume ls` lists volumes, and `docker volume inspect mydata` shows the mountpoint `/var/lib/docker/volumes/mydata/_data`. After `down && up`, the volume is reattached and the data is still there. Test scenario: (1) `docker run -v /tmp/test:/data alpine:latest sh -c 'echo hello > /data/file.txt'`. (2) Exit the container. (3) Check `/tmp/test/file.txt` on the host: the file exists. (4) Later, a cleanup script deletes `/tmp/test` → data lost. With a volume: (1) `docker volume create mydata && docker run -v mydata:/data alpine:latest sh -c 'echo hello > /data/file.txt'`. (2) Exit the container. (3) `docker volume inspect mydata` shows where the data persists. (4) The data disappears only if you explicitly run `docker volume rm mydata`; container restarts don't touch it. For production: use volumes for persistent data (databases, configs). Use bind mounts for development or temporary shared access (logs, temp caches).
Follow-up: You need persistent data AND temporary host file access. How do you use both bind mounts and volumes in a single container?
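One common pattern, sketched as a compose file (service name, image, and paths are hypothetical): a named volume for durable state plus a bind mount for files the host needs to see:

```yaml
services:
  app:
    image: myapp:latest          # hypothetical image
    volumes:
      - appdata:/var/lib/app     # named volume: persistent data, survives `down`
      - ./logs:/app/logs         # bind mount: host-visible logs for tailing/rotation

volumes:
  appdata:                       # managed by Docker under /var/lib/docker/volumes
```

The named volume survives `docker-compose down`; the `./logs` bind mount gives the host direct access without putting persistent state at the mercy of the host directory's lifecycle.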
PostgreSQL container with `-v /var/lib/postgresql/data:/dbdata` (bind mount from host). Host filesystem is ext4. After 3 months, the bind-mounted data directory grows to 500GB: `du -sh /var/lib/postgresql/data` on the host confirms it. How do you safely migrate the data to a larger storage location without stopping the database?
Direct migration (stop container → move mount → start) causes downtime. For live migration: (1) Set up a Docker volume on the new storage location (or new filesystem). (2) Use a temporary container for the bulk copy: `docker run --rm -v /var/lib/postgresql/data:/source -v new-volume:/dest busybox cp -a /source/. /dest/` (`cp -a` preserves ownership, permissions, and dotfiles). (3) Keep source and destination in sync while the DB runs: repeated `rsync -a` passes copy only the delta from each pass. (4) Briefly pause writes (put the application in maintenance mode or stop accepting connections; simply locking tables does not freeze all on-disk activity). (5) Run a final `rsync -a` pass: it copies only the last delta, so the write pause is seconds rather than hours. (6) Switch the container to the new volume: stop it and change docker-compose to use `new-volume:/dbdata` instead of the old bind mount. (7) Restart the container on the new volume. (8) Verify: `docker exec db du -sh /dbdata` confirms the size; run sanity checks (SELECT COUNT(*) on key tables). For zero downtime: use PostgreSQL replication (primary-replica), build the replica on the new storage, let it catch up, then promote it and redirect clients. For large migrations (100GB+): consider tablespaces or external storage (cloud block storage) to avoid local filesystem constraints. Verify: `docker volume inspect -f '{{ .Mountpoint }}' new-volume` shows the new data location on the filesystem.
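The bulk-copy-then-delta pattern in steps (2)-(5) can be sketched with plain coreutils (in practice `rsync -a` is the better tool; `/tmp/mig-src` and `/tmp/mig-dst` are stand-ins for the old bind mount and the new volume's mountpoint):

```shell
# Stand-in directories for the old bind mount and the new volume.
SRC=/tmp/mig-src
DST=/tmp/mig-dst
rm -rf "$SRC" "$DST" && mkdir -p "$SRC" "$DST"
echo v1 > "$SRC/a.txt"

# Pass 1: bulk copy while the database keeps running (and writing).
cp -a "$SRC/." "$DST/"

# A write that lands mid-migration:
echo v2 > "$SRC/b.txt"

# Pass 2, during the brief write pause: copy only new/updated files.
cp -au "$SRC/." "$DST/"

ls "$DST"   # both a.txt and b.txt are now present
```

The second pass only touches files missing or newer in the destination, which is why the final write pause is short regardless of total data size.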
Follow-up: The bind mount is on NFS (network filesystem, 500ms latency). Your DB queries are slow. How do you identify if it's the mount layer causing slowness, and can you improve it without moving data?
Container runs with `--tmpfs /tmp:size=64m` (a tmpfs mount: in-memory, here capped at 64MB; with no size option the kernel default is half of the host's RAM). App writes temp files to /tmp. After restart, /tmp is wiped (normal). But you need to raise the tmpfs cap to 256MB for a specific workload. How do you resize it, and what's the impact on the container?
Resize tmpfs: use `--tmpfs /tmp:size=256m` instead of `--tmpfs /tmp:size=64m`. This caps the in-memory tmpfs at 256MB. Test: `docker run --name t1 --tmpfs /tmp:size=256m -d alpine:latest sleep 1000 && docker exec t1 df -h /tmp` shows 256MB capacity. If you already have a container with a smaller tmpfs, you must recreate it (a restart alone doesn't change tmpfs size). Impact: (1) The size is a ceiling, not a reservation: tmpfs consumes RAM only for pages actually written, up to 256MB per container. (2) At scale (100 containers all filling their tmpfs), that's up to 25.6GB of RAM. (3) Tmpfs pages count against the container's cgroup memory limit, so a full tmpfs plus a large heap can trigger the OOM killer. (4) Tmpfs is fast (in-memory), but the budget is shared: if tmpfs holds 256MB and the app allocates 512MB of heap, the total is 768MB and must fit within the container's memory limit. Example: `docker run -m 1g --tmpfs /tmp:size=256m myapp` means a full tmpfs takes 256MB of the 1GB limit, leaving 768MB for app heap/OS. Verify: `docker stats` shows memory usage, `docker exec container df -h /tmp` shows tmpfs size. Avoid excessive tmpfs sizes; use volumes for large temp storage that should outlive the container or spill to disk. Tmpfs is ideal for: session data, temporary caches, build artifacts (small, short-lived).
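In compose, the same cap can be set with the long mount syntax (image name and memory limit are illustrative; `size` is in bytes):

```yaml
services:
  app:
    image: myapp:latest
    mem_limit: 1g              # tmpfs usage counts against this limit
    volumes:
      - type: tmpfs
        target: /tmp
        tmpfs:
          size: 268435456      # 256 MiB cap for /tmp
```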
Follow-up: tmpfs is fast but limited by RAM. For a large ETL workload needing 2GB temp space, is tmpfs viable, or should you use a persistent volume?
Two containers share a Docker volume: Container A (writer) writes data, Container B (reader) reads. Volumes support `ro` (read-only) mounts. How do you mount the same volume as read-write for Container A and read-only for Container B, and what are the implications?
Use mount options: `docker run -v myvol:/data:rw container-a` (read-write), `docker run -v myvol:/data:ro container-b` (read-only). Both containers attach to the same volume (`myvol`), but permissions differ. Container A can write to `/data`; Container B cannot (the kernel mounts it read-only, so write attempts fail with EROFS). Data is shared: writes by A appear immediately in B's view of `/data`. Test: (1) start A: `docker run -d --name a -v myvol:/data alpine:latest sh -c 'echo hello > /data/test.txt; sleep 1000'`. (2) Start B: `docker run -d --name b -v myvol:/data:ro alpine:latest sleep 1000`. (3) `docker exec b cat /data/test.txt` shows "hello" (the file created by A is visible). (4) `docker exec b sh -c 'echo world > /data/test2.txt'` fails (EROFS, read-only filesystem). Implications: (1) Data consistency: if B caches data in memory and A concurrently modifies files, B doesn't auto-refresh its cache; B must re-read from the volume. (2) Write conflicts: if both A and B write, they can corrupt shared files; the kernel provides only advisory locks (flock/fcntl), so applications must coordinate explicitly. (3) Performance: read-only mounts are not meaningfully faster; their value is safety, not speed. For production: use read-only mounts for replicas/consumers (Elasticsearch replicas, cache readers) to prevent accidental writes. Use read-write for a single writer (primary DB) with multiple readers. Verify: `docker inspect b | jq '.[0].Mounts[] | select(.Name=="myvol")'` shows `"RW": false`.
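The same writer/reader split expressed in compose (images and commands are illustrative):

```yaml
services:
  writer:
    image: alpine:latest
    command: sh -c 'while true; do date > /data/heartbeat; sleep 1; done'
    volumes:
      - myvol:/data            # read-write (the default)
  reader:
    image: alpine:latest
    command: sleep infinity
    volumes:
      - myvol:/data:ro         # read-only: writes here fail with EROFS

volumes:
  myvol:
```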
Follow-up: Container A crashes and data is partially written (corrupted state). Container B (read-only) is already serving stale data. How do you ensure consistency in read-only mounts?
Host has an NFS share mounted at `/mnt/nfs` (network storage). You bind-mount it into the container: `-v /mnt/nfs:/app/data`. Container writes 1000 files/second. NFS latency spikes (RTT 500ms). The container blocks on write calls. Other containers on the same host are unaffected. How do you diagnose NFS latency and improve throughput?
NFS latency is measured client-side; high latency shows up as slow write syscalls. Diagnose: (1) Inside the container: `strace -c -e trace=write /app/myapp` summarizes time spent in write syscalls. (2) On the host: `nfsstat -m` shows the options for each NFS mount, `nfsstat -c` shows client RPC counts and retransmissions, and `nfsiostat` reports per-mount round-trip and queue times. (3) `rpcinfo -s <nfs-server>` confirms the server's RPC services are registered and reachable. Improve throughput without moving the data (standard NFS tuning; measure before and after): larger transfer sizes (`-o wsize=1048576,rsize=1048576`), `noatime` to cut metadata traffic, client-side caching with FS-Cache (`-o fsc` plus a running `cachefilesd`), and batching many small writes into fewer large ones at the application level. Server-side, an `async` export trades durability for speed (see the follow-up).
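To put a number on the mount-layer cost, a quick micro-benchmark can be run once against a local path and once against the NFS-backed path; a minimal sketch (the directory and file count are arbitrary, and `date +%s%N` assumes GNU coreutils):

```shell
# Time N small file writes and report the average latency per write.
DIR=/tmp/writetest    # replace with the NFS-backed path to compare
N=200
rm -rf "$DIR" && mkdir -p "$DIR"

start=$(date +%s%N)
i=0
while [ "$i" -lt "$N" ]; do
  echo "payload $i" > "$DIR/f$i"
  i=$((i + 1))
done
end=$(date +%s%N)

# Nanoseconds -> microseconds per write.
echo "avg write latency: $(( (end - start) / N / 1000 )) us"
```

If the local path reports microseconds per write while the NFS path reports hundreds of milliseconds, the mount layer, not the application, is the bottleneck.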
Follow-up: You set `-o async` for speed but NFS server crashes, corrupting data. How do you balance performance with durability guarantees?
Container mounts a volume at `/app/data`. Inside, the app creates files with permissions 0600 (owner read/write only). After container restart, a different container (or host user) tries to read those files but gets EACCES (permission denied). The volume's `_data` directory on host is owned by root:root. Explain the permission problem and fix.
Volume permissions issue: the volume is stored on the host at `/var/lib/docker/volumes/myvol/_data`, owned by root:root by default. When Container A creates files, they're created with the container user's UID (often root inside the container, which is host UID 0 too, since there's no user namespace remapping by default). File permissions: files created with mode 0600 are readable only by their owner. When Container B (or a host user) tries to read, the kernel checks permissions: Container B's user (e.g., UID 1000) doesn't own the file → EACCES. Fix: (1) Set volume ownership at creation: `docker volume create myvol && docker volume inspect myvol` to get the host path, then `chown 1000:1000 /var/lib/docker/volumes/myvol/_data` and `chmod 755` it. (2) Inside the container, have the app create files with a permissive mode: set `umask 022` so new files come out 0644 (world-readable), or `chmod 0644` after writing. (3) On SELinux systems, use the relabel flag in compose/stack: `volumes: - myvol:/data:Z` (`Z` triggers an SELinux relabel; not applicable on all systems). (4) Run the container as non-root: `docker run -u 1000:1000 -v myvol:/data container` → the app writes files as UID 1000, which is host UID 1000 (absent userns remapping), so other containers running with the same UID can read them. Verify: `ls -la /var/lib/docker/volumes/myvol/_data` on the host shows permissions; inside the container, `ls -la /data` matches. Test: (1) create the volume, (2) Container A writes a file, (3) Container B reads it (should succeed). For production: use an explicit UID/GID (avoid root), set volume ownership to match the app user, and use 0644/0755 permissions for shared volumes.
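The umask fix from point (2) is easy to demonstrate locally (`/tmp/voldemo` is an illustrative stand-in for the volume path):

```shell
DATA=/tmp/voldemo
rm -rf "$DATA" && mkdir -p "$DATA"

umask 077                       # restrictive default: new files are 0600
touch "$DATA/private.txt"

umask 022                       # permissive: new files are 0644, world-readable
touch "$DATA/shared.txt"

stat -c '%a %n' "$DATA"/*.txt   # 600 for private.txt, 644 for shared.txt
```

A file created under `umask 077` is unreadable to any other UID, which is exactly the EACCES scenario above; `umask 022` makes new files readable across containers sharing the volume.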
Follow-up: User namespace remapping (--userns remap) changes UID mapping. How does this interact with volume permissions, and do you need to rechown the volume data?
Docker volume stored on a host filesystem with encryption at rest (LUKS). Volume driver is `local`. Container restarts. Does the volume remain encrypted? If you migrate the volume to a different host without encryption, what happens to security?
Docker volumes inherit encryption from the underlying host filesystem. If the host filesystem is LUKS-encrypted, volumes are encrypted at the storage layer (transparent to Docker). When the container reads/writes, the kernel decrypts on the fly. Container restarts: the volume remounts and the kernel decrypts automatically (assuming the LUKS key/passphrase is available). No data loss; encryption is persistent. However, the `local` volume driver itself adds no encryption (it's just a mount/storage interface). Verify: `docker volume inspect -f '{{ .Mountpoint }}' myvol` shows the host path (e.g., `/var/lib/docker/volumes/myvol/_data`). On the host: `mount | grep /var/lib/docker` shows the mount; if the source is a `/dev/mapper/...` LUKS device, encryption is active. Migration security: if you copy volume data (e.g., `docker run --rm -v source:/src -v dest:/dst busybox cp -a /src/. /dst/`) to a non-encrypted filesystem, the data lands in plaintext; copied files don't carry the encryption with them. Solution: (1) Encrypt the destination filesystem before migration. (2) Keep the volume on the encrypted source until the destination is ready, then move it. (3) Encrypt sensitive data at the application level so it's protected regardless of the filesystem. For production: (1) encrypt the host filesystem (LUKS, with the passphrase supplied at boot). (2) Use a volume plugin that supports encryption, if your platform offers one. (3) Use cloud provider encryption (AWS EBS encryption, GCP persistent disk encryption). (4) Audit volume access: monitor who can read `/var/lib/docker/volumes`. Test: reading through the mounted path always shows plaintext, because LUKS decrypts transparently; to confirm encryption at rest, hexdump the underlying block device on the host (e.g., `hexdump -C /dev/sdb1 | head`): it should be unreadable ciphertext, not recognizable file contents.
Follow-up: You need to rotate LUKS keys without downtime. How do you do this for active Docker volumes?
A container mounts two volumes: volume A (read-write, 100GB) and volume B (read-only, 50GB). Both are on NFS. The container writes 1000 files/sec to A; volume B serves reads. After 1 hour, the NFS server loses network (brief network partition). Writes to A hang (awaiting NFS ACK), and reads from B hang too. How do you prevent one volume's network issue from blocking the other?
Both volumes point at the same NFS server, and NFS defaults to hard mounts (retry indefinitely), so during the partition both reads and writes hang until the network recovers. Solution: (1) Use separate NFS mounts with different policies. Soft mount for B (read-only): `-o soft,timeo=10,retrans=3` (`timeo` is in tenths of a second, so ~1s per attempt, 3 retransmits, then the syscall fails with EIO). Hard mount for A (write-critical): `-o hard,timeo=30,retrans=5` (keeps retrying indefinitely, which is what you want for writes that must not be silently dropped). Result: reads from B fail fast and the app handles the error gracefully; writes to A keep retrying. (2) NFSv4.1 sessions give the client better retransmit and recovery semantics than NFSv3 (worth evaluating, though not a cure for a partition). (3) Use separate NFS servers: mount A from nfs-server-1, B from nfs-server-2; if one server is down, only that mount blocks. (4) Implement client-side retry logic: the application catches EIO from the soft mount and retries with backoff. (5) Lean on caching for B: `-o ro,rsize=1048576` with the page cache (or FS-Cache via `-o fsc`) lets recently read data be served locally if NFS is briefly unavailable. For production: separate mounts for critical (write) vs. non-critical (read) workloads; soft for read-only, hard for write-critical. Verify: `mount | grep nfs` shows the mount options. `strace` the app during an NFS partition: the write path hangs, the read path returns EIO or is served from cache. After network recovery, both mounts resume.
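With the `local` driver's built-in NFS support, the hard/soft split from points (1) and (3) can be declared per volume in compose (server addresses, exports, and timeouts are hypothetical):

```yaml
volumes:
  writes:                      # write-critical: hard mount, retries forever
    driver: local
    driver_opts:
      type: nfs
      o: "addr=nfs-a.example.com,rw,hard,timeo=30,retrans=5"
      device: ":/export/writes"
  reads:                       # read-only: soft mount, fails fast with EIO
    driver: local
    driver_opts:
      type: nfs
      o: "addr=nfs-b.example.com,ro,soft,timeo=10,retrans=3"
      device: ":/export/reads"
```

Pointing the two volumes at different servers means a partition on one leaves the other mount fully functional.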
Follow-up: Your app can't tolerate read timeouts (SLA broken). How do you cache reads aggressively without risking stale data?