Containers provide process isolation, not security isolation. This distinction matters enormously. A virtual machine runs its own kernel, providing a strong boundary between the guest and the host. A container shares the host kernel, relying on Linux namespaces and cgroups for isolation. When that isolation breaks down, an attacker inside a container gains access to the host system.
Container escapes are not theoretical. They have been demonstrated repeatedly through CVEs, misconfigurations, and creative exploitation of Linux kernel features.
Escape Technique 1: Privileged Containers
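The most common container escape is not a vulnerability at all; it is a misconfiguration. Running a container with --privileged grants it every capability, exposes all host devices, and turns off seccomp and AppArmor/SELinux confinement, removing nearly all of its isolation: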
# This is essentially running directly on the host
docker run --privileged -it ubuntu bash
A privileged container can:
- Mount the host filesystem
- Load kernel modules
- Access all host devices
- Modify cgroups and namespaces
Escaping from a privileged container is trivial:
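# Inside a privileged container
# (the host's root device name varies, for example /dev/sda1, /dev/nvme0n1p1, /dev/vda1)
mkdir -p /mnt/host
mount /dev/sda1 /mnt/host
chroot /mnt/host
# You now have a root shell with full read/write access to the host filesystem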
Defense: Never run privileged containers in production. If a process needs specific capabilities, grant them individually with --cap-add rather than enabling everything with --privileged.
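To find privileged containers that are already running on a Docker host, you can ask `docker inspect` for each container's `HostConfig.Privileged` field. The sketch below assumes bash. `flag_privileged` is a hypothetical helper name, and it reads `<name> <true|false>` lines on stdin so that it can be exercised without a running daemon.

```shell
# Hypothetical helper: report containers whose HostConfig.Privileged is true.
# Input: one "<name> <privileged>" pair per line, for example from:
#   docker ps -q | xargs -r docker inspect \
#     --format '{{.Name}} {{.HostConfig.Privileged}}'
flag_privileged() {
  local name privileged
  while read -r name privileged; do
    if [ "$privileged" = "true" ]; then
      # docker inspect prefixes container names with "/"; strip it.
      echo "PRIVILEGED: ${name#/}"
    fi
  done
}
```

Piping the commented `docker ps ... docker inspect` pipeline into `flag_privileged` prints one line per privileged container. An empty result means none were found.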
Escape Technique 2: Docker Socket Mounting
Mounting the Docker socket (/var/run/docker.sock) inside a container gives that container full control over the Docker daemon — including the ability to create new privileged containers:
# A container with the Docker socket mounted
docker run -v /var/run/docker.sock:/var/run/docker.sock -it docker sh
# Inside the container: the daemon runs on the host, so -v /:/host mounts the HOST root filesystem
docker run --privileged -v /:/host -it ubuntu chroot /host
This pattern is disturbingly common in CI/CD setups where containers need to build other containers.
Defense: Use alternatives to Docker socket mounting for CI/CD. Kaniko, buildah, and BuildKit can build container images without Docker daemon access.
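Auditing for this pattern works much like the privileged check: list each container's host-side mount sources and look for the socket path. The sketch below assumes bash. `flag_socket_mounts` is a hypothetical helper, and it matches both `/var/run/docker.sock` and `/run/docker.sock` because `/var/run` is commonly a symlink to `/run`.

```shell
# Hypothetical helper: report containers that bind-mount the Docker socket.
# Input: "<name> <mount source> <mount source> ..." per line, for example:
#   docker ps -q | xargs -r docker inspect \
#     --format '{{.Name}}{{range .Mounts}} {{.Source}}{{end}}'
flag_socket_mounts() {
  local -a fields
  local src
  while read -r -a fields; do
    # fields[0] is the container name; the rest are host-side mount sources.
    for src in "${fields[@]:1}"; do
      case "$src" in
        /var/run/docker.sock|/run/docker.sock)
          echo "DOCKER SOCKET: ${fields[0]#/}"
          break
          ;;
      esac
    done
  done
}
```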
Escape Technique 3: Kernel Exploits
Because containers share the host kernel, kernel vulnerabilities can be exploited from inside a container to gain host access. Notable examples include:
Dirty Pipe (CVE-2022-0847)
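The Dirty Pipe vulnerability allowed an unprivileged process to overwrite the cached contents of files it could only open for reading, by abusing how pipe buffers share pages with the page cache. From inside a container, an attacker could overwrite host files in the page cache if those files were readable through bind mounts, even read-only ones.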
Dirty COW (CVE-2016-5195)
A race condition in the kernel's copy-on-write mechanism allowed writing to read-only memory mappings. This was exploitable from inside containers to modify host binaries.
CVE-2022-0185
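A heap overflow in the kernel's filesystem context code (legacy_parse_param) allowed a process holding CAP_SYS_ADMIN in its own user namespace to corrupt kernel memory and escape the container. It was particularly concerning because an unprivileged container could obtain that capability simply by creating a new user namespace, unless unprivileged user namespaces were disabled or a seccomp profile blocked unshare.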
Defense: Keep the host kernel updated. This is the single most important defense against kernel-based container escapes. Consider using kernel-hardened distributions or minimal host operating systems like Bottlerocket or Flatcar.
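A quick way to spot hosts running a kernel older than a given release is to compare `uname -r` against a minimum version. The sketch below assumes bash and GNU `sort -V`. The helper name and the minimum in the example are placeholders. Distributions backport fixes without changing the upstream version number, so a failed check means "consult your vendor's advisory", not "definitely vulnerable".

```shell
# Hypothetical helper: succeed if kernel release $1 is at least version $2.
# Distribution suffixes such as "-91-generic" are dropped before comparing.
kernel_at_least() {
  local current="${1%%-*}" minimum="$2"
  # sort -V orders version strings component by component; if the minimum
  # sorts first (or the two are equal), then current >= minimum.
  [ "$(printf '%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example (the minimum here is a placeholder, not a recommendation):
#   kernel_at_least "$(uname -r)" 5.10.0 || echo "kernel is older than 5.10.0"
```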
Escape Technique 4: Dangerous Capabilities
Even without --privileged, individual Linux capabilities can enable escapes:
CAP_SYS_ADMIN
The Swiss Army knife of capabilities. With CAP_SYS_ADMIN, a container process can mount filesystems, configure namespaces, and perform dozens of operations that break isolation.
CAP_SYS_PTRACE
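Allows a process to trace and inspect other processes. If the container also shares the host PID namespace (--pid=host in Docker, hostPID: true in Kubernetes), it can be used to attach to any process visible on the host, including other containers' processes, and inject code into them.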
CAP_NET_ADMIN
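Allows modification of network configuration within the container's network namespace. If the container shares the host network namespace (--network host in Docker, hostNetwork: true in Kubernetes), it can alter host interfaces, routes, and firewall rules, and intercept traffic from other containers or the host.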
Defense: Drop all capabilities that your application does not need:
# Kubernetes container securityContext (capabilities are set per container, not at pod level)
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE # Only if needed
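To check which of these capabilities a running container actually holds, decode the `CapEff` bitmask from `/proc/self/status` inside it. If libcap's tools are installed, `capsh --decode=<mask>` performs the full decoding. The sketch below assumes bash's 64-bit arithmetic and tests only the three bits discussed above: CAP_NET_ADMIN (12), CAP_SYS_PTRACE (19), and CAP_SYS_ADMIN (21). `dangerous_caps` is a hypothetical helper name.

```shell
# Hypothetical helper: given a CapEff hex mask, print which of the three
# capabilities discussed above are set (bit numbers from linux/capability.h).
dangerous_caps() {
  local mask=$(( 16#${1#0x} ))
  local bits=(12 19 21)
  local names=(CAP_NET_ADMIN CAP_SYS_PTRACE CAP_SYS_ADMIN)
  local i
  for i in "${!bits[@]}"; do
    if (( (mask >> bits[i]) & 1 )); then
      echo "${names[i]}"
    fi
  done
}

# Inside a container:
#   dangerous_caps "$(awk '/^CapEff:/ {print $2}' /proc/self/status)"
```

Root in a container with Docker's default capability set has `CapEff` of `00000000a80425fb`, for which this prints nothing. A privileged container prints all three names.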
Escape Technique 5: Sensitive Host Path Mounts
Mounting sensitive host paths into containers creates direct escape paths:
# Dangerous volume mounts
volumes:
  - /:/host            # Entire host filesystem
  - /etc:/etc          # Host configuration
  - /proc:/host-proc   # Host process information
  - /sys:/host-sys     # Host system information
Even read-only mounts can leak sensitive information (credentials, configuration, process details) that aids further exploitation.
Defense: Minimize volume mounts. Never mount /, /etc, /proc, or /sys from the host. Use named volumes or tmpfs for data that does not need to persist.
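A pre-deployment or periodic check can compare each container's mount sources against the paths listed above. The sketch below assumes bash. `risky_mounts` is a hypothetical helper, and `CONTAINER` in the comment is a placeholder for a container name or ID. It flags `/` only as an exact match, but flags anything under `/etc`, `/proc`, or `/sys`.

```shell
# Hypothetical helper: flag mount sources that expose sensitive host paths.
# Input: one host-side mount source per line, for example from:
#   docker inspect --format '{{range .Mounts}}{{println .Source}}{{end}}' CONTAINER
risky_mounts() {
  local src
  while IFS= read -r src; do
    # Match the whole root filesystem, or /etc, /proc, /sys and anything beneath them.
    case "$src" in
      /|/etc|/etc/*|/proc|/proc/*|/sys|/sys/*)
        echo "RISKY: $src"
        ;;
    esac
  done
}
```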
Escape Technique 6: Container Runtime Vulnerabilities
The container runtime itself (runc, containerd, CRI-O) can have vulnerabilities:
runc CVE-2019-5736
A vulnerability in runc allowed a malicious container to overwrite the host runc binary, achieving code execution on the host the next time any container was started.
containerd CVE-2022-23648
A flaw in containerd's CRI plugin allowed containers launched from a specially crafted image configuration to gain read-only access to arbitrary files and directories on the host.
Defense: Keep your container runtime updated. Monitor for CVEs in runc, containerd, and your chosen runtime.
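Tracking runtime versions across a fleet usually starts with parsing `runc --version` or `containerd --version`. The sketch below assumes bash, `grep -oE`, and GNU `sort -V`. `extract_version` and `version_at_least` are hypothetical helpers, and the minimum in the example is a placeholder. Take real fixed versions from each runtime's security advisories.

```shell
# Hypothetical helper: print the first X.Y.Z version found in a runtime's
# --version output (read from stdin).
extract_version() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeed if version $1 >= minimum $2 (sort -V ordering).
version_at_least() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (placeholder minimum):
#   v="$(runc --version | extract_version)"
#   version_at_least "$v" 1.1.0 || echo "runc $v is older than 1.1.0"
```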
Defense in Depth
No single defense prevents all container escapes. Layer your defenses:
1. Use Rootless Containers
Run the Docker daemon and containers as a non-root user. Even if an escape occurs, the attacker lands in an unprivileged user context on the host.
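# Set up the Docker daemon in rootless mode for the current (non-root) user
dockerd-rootless-setuptool.sh install
# Point the Docker CLI at the rootless daemon's socket
export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock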
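One way to confirm that user-namespace isolation is in effect is to read `/proc/self/uid_map` from inside a container. Each line maps a range of UIDs inside the namespace (first column) to UIDs outside it (second column), followed by the range length. If UID 0 maps to a nonzero UID, root in the container is not root outside it. Under rootless mode or `userns-remap` you would typically expect a nonzero value, while `0 0 4294967295` means no remapping. `host_uid_of_root` is a hypothetical helper that takes an optional file path so it can be tested.

```shell
# Hypothetical helper: print the outside UID that UID 0 maps to.
# uid_map columns: <ID inside namespace> <ID outside namespace> <range length>
host_uid_of_root() {
  awk '$1 == 0 { print $2; exit }' "${1:-/proc/self/uid_map}"
}

# Inside a container:
#   host_uid_of_root
```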
2. Enable Seccomp Profiles
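Seccomp profiles restrict which system calls a container can make. Docker's default profile blocks approximately 44 of the 300+ syscalls, but custom profiles can be much more restrictive. The skeleton below denies every syscall by default and allows only a handful. It shows the structure only, because a real container also needs syscalls such as execve, openat, and mmap just to start: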
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "stat", "fstat"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
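A custom profile is applied per container with `docker run --security-opt seccomp=<path to profile.json>`. In Kubernetes, `securityContext.seccompProfile` (type `RuntimeDefault` or `Localhost`) plays the same role. To confirm that a filter is actually active, read the `Seccomp:` line of `/proc/self/status`, which reports 0 (disabled), 1 (strict), or 2 (filter). The sketch below assumes bash and awk. `seccomp_mode` is a hypothetical helper that accepts an alternate file path for testing.

```shell
# Hypothetical helper: report the seccomp mode of the current process, read
# from the "Seccomp:" line of a status file (default: /proc/self/status).
seccomp_mode() {
  awk '/^Seccomp:/ {
    if ($2 == 0) print "disabled"
    else if ($2 == 1) print "strict"
    else if ($2 == 2) print "filter"
    exit
  }' "${1:-/proc/self/status}"
}

# Inside a container:
#   seccomp_mode
```

A result of `disabled` inside a container usually means it was started with `--security-opt seccomp=unconfined` or as privileged.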
3. Use AppArmor or SELinux
Mandatory access control systems provide an additional layer of restriction that applies even to root processes inside containers.
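The profile or label a container runs under can be read from `/proc/self/attr/current`. Under AppArmor it shows something like `docker-default (enforce)` or `unconfined`. Under SELinux it shows a context such as `system_u:system_r:container_t:s0:c123,c456`, while privileged containers typically run as the `spc_t` type. The sketch below assumes bash. `mac_label` is a hypothetical helper, and its list of "unconfined" markers is a simplification rather than a complete policy check.

```shell
# Hypothetical helper: classify the LSM label of the current process.
# Reads /proc/self/attr/current by default; pass another path to test.
mac_label() {
  local file="${1:-/proc/self/attr/current}" label=""
  if [ -r "$file" ]; then
    # The kernel may terminate the label with a NUL byte; strip it.
    label="$(tr -d '\0' < "$file" 2>/dev/null)"
  fi
  case "$label" in
    ""|unconfined|*unconfined_t*|*spc_t*)
      echo "UNCONFINED: ${label:-none}"
      ;;
    *)
      echo "CONFINED: $label"
      ;;
  esac
}
```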
4. Implement Runtime Monitoring
Use tools like Falco or Sysdig to detect escape attempts in real time:
- Unexpected mount operations
- Process execution outside expected patterns
- Network connections to unexpected destinations
- Capability usage anomalies
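Falco and Sysdig observe system calls as they happen. As a much cruder illustration of the first signal in the list, unexpected mount operations, the sketch below compares two snapshots of a process's mount table and prints mount points that appeared in between. The mount table is `/proc/<pid>/mountinfo`, where the fifth field is the mount point. `new_mounts` is a hypothetical helper that assumes bash (for process substitution) and `comm`. Polling like this misses short-lived mounts and is no substitute for a syscall-level monitor.

```shell
# Hypothetical helper: print mount points present in snapshot $2 but not $1.
# Both arguments are copies of a /proc/<pid>/mountinfo file.
new_mounts() {
  comm -13 <(awk '{ print $5 }' "$1" | sort -u) \
           <(awk '{ print $5 }' "$2" | sort -u)
}

# Example polling step (PID is a placeholder for a containerized process):
#   cp /proc/PID/mountinfo /tmp/mounts.before
#   sleep 10
#   new_mounts /tmp/mounts.before /proc/PID/mountinfo
```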
5. Use Read-Only Root Filesystems
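Containers that cannot write to their root filesystem are significantly harder to exploit: an attacker cannot modify binaries or drop tools into the container's filesystem, and writable paths are limited to explicitly mounted volumes or tmpfs: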
# Kubernetes container securityContext
securityContext:
  readOnlyRootFilesystem: true
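Docker's equivalent is `docker run --read-only`, usually paired with `--tmpfs /tmp` for scratch space. To verify from inside a container that the root filesystem really is read-only, check the per-mount options (sixth field) of the `/` entry in `/proc/self/mountinfo`. `root_is_readonly` is a hypothetical helper, assuming bash and awk. If `/` appears more than once, it treats the last entry as the effective one.

```shell
# Hypothetical helper: succeed if "/" is mounted read-only.
# mountinfo fields: 5 = mount point, 6 = per-mount options (e.g. "ro,relatime").
root_is_readonly() {
  awk '$5 == "/" { ro = ($6 ~ /(^|,)ro(,|$)/) } END { exit (ro ? 0 : 1) }' \
    "${1:-/proc/self/mountinfo}"
}

# Inside a container:
#   root_is_readonly && echo "root filesystem is read-only"
```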
How Safeguard.sh Helps
Safeguard.sh scans container images for misconfigurations, excessive capabilities, and known vulnerabilities in base images and runtimes. Our platform identifies dangerous patterns like privileged containers, Docker socket mounts, and excessive host path exposure before deployment. Combined with continuous runtime monitoring, Safeguard.sh helps you maintain container isolation across your entire Kubernetes estate.