Container Security

Container Image Supply Chain: From Dockerfile to Production

Every container pulled in production is a trust decision. Here's how to secure the chain from base image selection through Dockerfile to admission control.

Shadab Khan
Security Engineer
8 min read

A container running in production is the end of a supply chain that started with someone else's compiler, pulled somebody's base image, layered your application on top, and got admitted to your cluster on the basis of whatever policy was in force at the registry and admission layers. If any link is weak, the rest doesn't matter. The 2023 3CX compromise, the 2024 XZ Utils backdoor (CVE-2024-3094) that rode the upstream path into containers, and the late-2024 Ultralytics PyPI hijack that carried through into Docker images all exploited gaps in this chain. Locking it down takes work at every stage, but the stages are well-defined and the controls have matured considerably.

Which base image should you actually use?

Use the smallest base image that satisfies your runtime requirements, and choose it from a distribution that publishes signed provenance. In practice that narrows to four serious options: Chainguard Images (minimal, FIPS-capable, near-zero CVE count at release because they're continuously rebuilt), distroless (Google's gcr.io/distroless/*, no shell, no package manager, only the runtime), Alpine (musl-based, small, but occasional libc surprises), and minimal Red Hat UBI variants (good for Red Hat shops with support requirements).

The single biggest security win is moving off the default ubuntu:latest or debian:bookworm-slim images. A fresh Ubuntu 24.04 base carries 80-plus open CVEs against its installed packages on any given day, most of them in packages your application doesn't touch. Chainguard's cgr.dev/chainguard/python:latest typically ships with zero CVEs, a non-root default user, and a SLSA L3 attestation. Distroless images have no shell at all, which cripples a large class of post-exploitation techniques — an attacker who gets RCE in your Node.js app can't exec('/bin/sh') because /bin/sh doesn't exist. The cost is that debugging requires kubectl debug with an ephemeral container, which is fine for modern teams but occasionally surprises operators used to kubectl exec -it.
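For teams new to that workflow, the debugging story is one command. A sketch, with hypothetical pod and container names:

    # Attach an ephemeral debug container to a running distroless pod.
    # --target shares the app container's process namespace, so its
    # processes and filesystem are inspectable via /proc.
    kubectl debug -it checkout-7d4b9 --image=busybox:1.36 --target=app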

What Dockerfile patterns actually reduce risk?

Pin every base image by digest, not tag, run as a non-root user, use multi-stage builds to keep build tooling out of the final image, and declare a HEALTHCHECK so orchestrators can detect silent failures. A tag like node:22-alpine can be republished at any time; a digest like node:22-alpine@sha256:abc123... is immutable. This is the difference between your pipeline producing the same artifact tomorrow as it did today and producing something subtly different because Alpine pushed a point release overnight.
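To get the digest to pin, resolve the tag once at build-setup time. A sketch using crane from go-containerregistry (docker buildx imagetools inspect prints the same information):

    # Print the current digest behind the tag, then pin it in the FROM line.
    crane digest node:22-alpine
    # => sha256:...  (copy into node:22-alpine@sha256:...)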

Multi-stage builds matter because most supply chain attacks on containers exploit leftover build tooling. If your final image contains gcc, curl, git, and a package manager, an attacker who gets code execution has a full toolchain to pivot from. A multi-stage build with FROM ... AS builder for compilation and FROM gcr.io/distroless/static for the final stage leaves the attacker with a statically linked binary and nothing else. Adding USER 10001:10001 (any non-root UID) blunts the class of kernel exploits that need root inside the container before they can escape it. And HEALTHCHECK CMD gives Docker and Swarm a way to distinguish a running process from a healthy one; Kubernetes ignores the Dockerfile HEALTHCHECK, so declare equivalent liveness and readiness probes in the Pod spec. The distinction matters when an attacker's payload keeps the process alive but non-functional.
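Put together, the pattern looks like this for a Go service. A minimal sketch: the digests are placeholders to substitute with real ones, and the healthcheck subcommand is a hypothetical flag your binary would implement.

    # Build stage: full toolchain, discarded after the build.
    FROM golang:1.22@sha256:... AS builder
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /app ./cmd/server

    # Final stage: one static binary, no shell, no package manager.
    FROM gcr.io/distroless/static@sha256:...
    COPY --from=builder /app /app
    USER 10001:10001
    # Honored by Docker/Swarm; Kubernetes needs a liveness probe instead.
    HEALTHCHECK CMD ["/app", "healthcheck"]
    ENTRYPOINT ["/app"]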

How should build-time scanning fit into CI?

Run SCA and image scanning as gates in the pull request, not as advisory warnings, and prioritize by reachability rather than raw CVE count. A typical pipeline runs Trivy, Grype, or Snyk Container against the final image, generates an SBOM (Syft is the common choice), and signs both the image and the SBOM with Cosign. The scanning step should fail the build on critical CVEs that are reachable from your application's entry points, and emit a report for everything else.
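A sketch of that gate in shell, assuming Trivy, Syft, and Cosign 2.x on the runner (the image reference is a placeholder):

    IMAGE="ghcr.io/yourorg/app@sha256:..."   # digest from the build step

    # Gate: fail the job on fixable critical/high CVEs.
    trivy image --exit-code 1 --severity CRITICAL,HIGH --ignore-unfixed "$IMAGE"

    # Generate an SBOM and attach it to the image as a signed attestation.
    syft "$IMAGE" -o cyclonedx-json > sbom.json
    cosign attest --yes --type cyclonedx --predicate sbom.json "$IMAGE"

    # Keyless signature tied to the CI workflow's OIDC identity.
    cosign sign --yes "$IMAGE"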

The key word is reachable. A Python base image with a vulnerability in libxml2 doesn't matter if your application never touches XML. Treating every CVE as equally critical is how teams end up with 2,000-row SARIF reports that nobody reads. Reachability analysis — tracing which functions in your image are actually invoked from your application code — typically cuts 60-80% of the noise, leaving a focused list that humans can review. The build should also attest to the scan result itself, producing a VEX statement alongside the SBOM so downstream consumers know what was checked and which CVEs were ruled out as non-exploitable.
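In one concrete format, OpenVEX, a ruled-out CVE looks roughly like this (a sketch; the CVE identifier and product reference are placeholders):

    {
      "@context": "https://openvex.dev/ns/v0.2.0",
      "author": "yourorg security team",
      "statements": [
        {
          "vulnerability": { "name": "CVE-2024-XXXXX" },
          "products": [ { "@id": "pkg:oci/app@sha256:..." } ],
          "status": "not_affected",
          "justification": "vulnerable_code_not_in_execute_path"
        }
      ]
    }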

What does registry-side signing add?

Registry-side signing proves that the image bytes in your production cluster match the bytes your CI signed, even if the registry itself is compromised. The mechanics are Sigstore Cosign plus a Rekor transparency log entry; the practical value is cutting off an entire attack vector. A malicious registry (or a malicious insider with registry write access) cannot replace ghcr.io/yourorg/app:v1.2.3 with a trojaned version, because the replacement won't carry a valid Sigstore signature for the expected identity.

Sign at build time with the CI workflow's OIDC identity, not a long-lived key. GitHub Actions, GitLab CI, and Buildkite all issue OIDC tokens that Fulcio accepts, so the signature ties the image to the exact workflow and commit. A verifier downstream can then say only images signed by the workflow at github.com/yourorg/app/.github/workflows/release.yml on a release tag are acceptable. Hardware Security Modules and air-gapped key ceremonies are still relevant for offline artifacts, but for cloud-native workloads the keyless approach is strictly better because there's no long-lived key to steal.
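Verification then pins that identity rather than a key. A sketch with Cosign 2.x, following the repo and issuer from the example above:

    # Reject anything not signed by the release workflow on a tag.
    cosign verify \
      --certificate-oidc-issuer https://token.actions.githubusercontent.com \
      --certificate-identity-regexp '^https://github\.com/yourorg/app/\.github/workflows/release\.yml@refs/tags/.*$' \
      ghcr.io/yourorg/app@sha256:...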

How do you enforce policy at admission time?

Enforce policy at admission time with Kyverno or Gatekeeper, and treat unsigned or unscanned images as undeployable. Kyverno's verifyImages rule, Gatekeeper's Ratify integration, and the Sigstore policy-controller all implement the same pattern: when the Kubernetes API server receives a Pod creation request, an admission webhook inspects the images referenced, fetches their signatures and attestations, evaluates policy, and either admits or rejects.

A production-grade policy does more than check for signature presence. It verifies the signing identity matches an expected OIDC subject (pinning the specific workflow or account allowed to sign), checks that the attached SLSA provenance comes from an approved builder, confirms the SBOM attestation is present, and can additionally enforce that the image has been scanned within the last N hours. The policy runs against digest-referenced images only; allowing image:latest in production is effectively opting out of admission control because the digest can change between admission and pull. The 2024 Sisense breach, in which attackers who compromised a code repository harvested the credentials stored inside it, shows where this bites: a stolen CI or registry token lets an attacker push a malicious image, but an identity-pinned admission policy blocks it, because a stolen token does not let the attacker produce a signature from the pinned workflow.
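In Kyverno the shape is roughly this (a sketch; the image references and identities follow the earlier example, and the schema should be checked against your Kyverno version):

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-signed-images
    spec:
      validationFailureAction: Enforce
      webhookTimeoutSeconds: 30
      rules:
        - name: verify-signature-identity
          match:
            any:
              - resources:
                  kinds: [Pod]
          verifyImages:
            - imageReferences:
                - "ghcr.io/yourorg/*"
              attestors:
                - entries:
                    - keyless:
                        subject: "https://github.com/yourorg/app/.github/workflows/release.yml@refs/tags/*"
                        issuer: "https://token.actions.githubusercontent.com"
              # Resolve tags to digests and re-verify the digest at admission.
              mutateDigest: true
              verifyDigest: true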

How do you detect drift after deployment?

Detect drift after deployment by capturing a baseline of the running image's filesystem and process tree, then alerting on deviation. eBPF-based runtime security tools (Falco, Tetragon, Tracee) observe process executions, file writes, and network connections in near-real time, and emit events when behavior diverges from the baseline. A container that started as a read-only Go binary and is now writing to /tmp/payload.so and connecting to an unexpected IP is either a bug or a compromise; either way you want to know.
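The /tmp example above, expressed as a Falco rule. A sketch that leans on Falco's default open_write macro; the image name is hypothetical:

    - rule: Write inside read-only app container
      desc: >
        The app image ships a read-only static binary; any file write
        under /tmp from it is drift worth alerting on.
      condition: >
        open_write and fd.name startswith /tmp
        and container.image.repository = "ghcr.io/yourorg/app"
      output: >
        Drift: unexpected write (file=%fd.name process=%proc.name
        image=%container.image.repository)
      priority: WARNING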

The Trojaned base image class of attack — where a legitimate-looking alpine:3.19 or python:3.12 image in a third-party registry contains extra binaries — shows up at runtime even if it slipped past build-time scanning. In one 2025 incident, an attacker uploaded images to Docker Hub under names mimicking popular projects, with a cryptominer embedded in the entrypoint. Thorough build-time scanning would have flagged the miner binaries, but several users pulled the images before the takedown because their pipelines ran only shallow checks. Runtime observability caught the connections to mining pools before serious damage, but only for users who had instrumented their clusters. Drift detection is the last defensive layer and the one that most closely mirrors what an attacker is actually trying to do.

How Safeguard.sh Helps

Safeguard.sh covers the container supply chain end-to-end and ties together the signals that usually live in separate tools. Our SBOM generation and ingestion module produces and consumes CycloneDX and SPDX attestations for every image you build, and correlates them with the SLSA provenance and Cosign signatures your CI emits. Reachability analysis cuts 60-80% of the CVE noise from image scans by tracing which vulnerable code paths are actually called by your application, so the admission-time policy and the security review both focus on exploitable risk. Griffin AI autonomous remediation rebuilds vulnerable images against patched base images and opens PRs with the updated Dockerfiles, frequently closing out critical CVEs the same day they're published. The TPRM module evaluates third-party images the same way — vendor SBOMs, attestations, and signature posture — so images pulled from partners get the same scrutiny as your own. Combined with dependency scanning to 100 levels of depth and container self-healing that responds to runtime drift, Safeguard keeps the chain intact from Dockerfile to kubelet.
