The phrase "zero-CVE base image" used to draw eye-rolls in security review meetings. In 2026 it has become a procurement requirement, a regulatory talking point, and an engineering project plan with hundreds of line items. The reason is plain. A single Debian-slim base in production typically ships with one hundred and sixty unfixed CVEs. A hardened, minimal base ships with zero or near-zero, dramatically shrinking the surface that triage teams have to argue about every Monday morning.
Rolling out zero-CVE base images org-wide is not a tooling exercise. It is a migration that touches build pipelines, registry policy, deployment manifests, on-call documentation, and team identity. This post describes how we phased the rollout across roughly four hundred services, what broke, and what survived contact with reality.
Why The Old Base Image Strategy Stopped Working
Most engineering organisations inherit a base image strategy from whichever team published the first internal Dockerfile. That strategy usually looks like a slim Debian or Ubuntu base, a curl-and-untar dance to install language runtimes, and a final stage that copies the application binary in. It worked for years. It now fails in three specific ways.
First, scanning produces noise that no one can act on. The base layer alone contributes hundreds of unfixed and unfixable findings, drowning out the few real exposures that exist higher in the layer stack.
Second, regulators have caught up. NIST SP 800-218A and the European Cyber Resilience Act both expect demonstrable evidence that producers are not shipping software with known critical vulnerabilities at release time. "We are aware of them" is no longer an answer.
Third, customer security questionnaires have started asking for image SBOMs and unfixed CVE counts at quarter close. Sales engineering does not want to spend its week explaining glibc.
Choosing The Pilot
The pilot has to teach you things, not validate things you already believe. We picked three services with deliberately mismatched profiles. A latency-critical Go API on the edge, a Java batch job with native dependencies, and a Python data-science image with a long pip install graph.
Pilot scope had a hard rule: no service entered the pilot unless its team had bandwidth to actually rebuild and re-test in the same sprint. Pilots are for learning, and a pilot that languishes for six weeks teaches you nothing except that calendars are hard.
For each pilot service we tracked four numbers before and after migration. The unfixed CVE count, the image size, the cold start time, and the build duration. Three of those numbers improved by margins large enough to silence objections. Build duration was a wash. We expected that.
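For illustration only, the shape of the per-service comparison we kept looked roughly like this. The field names mirror the four numbers above; the values are made up, not our pilot data.

```python
from dataclasses import dataclass, fields

@dataclass
class PilotMetrics:
    """The four numbers tracked per pilot service, before and after migration."""
    unfixed_cves: int
    image_size_mb: float
    cold_start_ms: float
    build_duration_s: float

# Illustrative values only, not the actual pilot data.
before = PilotMetrics(unfixed_cves=162, image_size_mb=410.0, cold_start_ms=900.0, build_duration_s=340.0)
after = PilotMetrics(unfixed_cves=0, image_size_mb=60.0, cold_start_ms=430.0, build_duration_s=345.0)

for f in fields(PilotMetrics):
    b, a = getattr(before, f.name), getattr(after, f.name)
    change = 100.0 * (a - b) / b
    print(f"{f.name}: {b} -> {a} ({change:+.1f}%)")
```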
Registry Mirroring And Provenance
Pulling hardened images directly from a vendor registry on every build worked for the pilot and immediately fell over at scale. Registry rate limits, transient network issues, and the sudden visibility of supply chain dependencies on third-party uptime made mirroring non-negotiable.
We stood up an internal mirror with the following rules. Every hardened image was pulled, verified, re-tagged with an internal digest reference, and stored. Pulls from the upstream vendor were allowed only from the mirror sync job. All other clusters and CI runners pulled from the mirror.
Provenance verification ran at sync time. Each image had to come with a Sigstore signature chain that traced back to the publisher, an in-toto attestation of build steps, and an SBOM. If any of those were missing or unverifiable, sync failed and an incident was created. Roughly two percent of sync attempts in the first month failed, almost all because of upstream tag races where a tag was re-pointed to a new digest mid-pull.
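A stripped-down sketch of that sync gate, assuming the cosign and crane CLIs are on the path. The registry names, signer identity, and OIDC issuer below are placeholders rather than our real configuration, and incident creation is left out.

```python
import subprocess
import sys

UPSTREAM = "registry.vendor.example/hardened/python:3.12"    # placeholder upstream reference
MIRROR = "registry.internal.example/mirror/hardened-python"  # placeholder internal repository
IDENTITY = [
    "--certificate-identity-regexp", ".*@vendor.example",    # placeholder signer identity
    "--certificate-oidc-issuer", "https://token.actions.githubusercontent.com",  # placeholder issuer
]

def run(*cmd: str) -> str:
    """Run a CLI command, return stdout, raise on non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

def sync_one(upstream: str, mirror_repo: str) -> str:
    # Resolve the tag to an immutable digest up front, so an upstream tag
    # re-point mid-sync fails verification instead of silently swapping content.
    digest = run("crane", "digest", upstream)
    pinned = f"{upstream.rsplit(':', 1)[0]}@{digest}"

    # Sigstore signature chain back to the publisher.
    run("cosign", "verify", *IDENTITY, pinned)
    # in-toto attestation of the build steps, plus an attached SBOM attestation.
    run("cosign", "verify-attestation", "--type", "slsaprovenance", *IDENTITY, pinned)
    run("cosign", "verify-attestation", "--type", "spdxjson", *IDENTITY, pinned)

    # Only a fully verified image enters the mirror, stored under a digest-derived tag.
    run("crane", "copy", pinned, f"{mirror_repo}:{digest.replace('sha256:', 'sha256-')}")
    return digest

if __name__ == "__main__":
    try:
        print("mirrored", sync_one(UPSTREAM, MIRROR))
    except subprocess.CalledProcessError as err:
        # Any failed check fails the whole sync; incident creation is omitted from this sketch.
        print("sync failed:", err.stderr, file=sys.stderr)
        sys.exit(1)
```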
Rewiring The Build Pipeline
Most Dockerfiles in our estate started with a FROM line pointing at a mutable tag, usually a slim Debian or Alpine variant, sometimes pinned by digest, more often not. The new policy was simple. Every FROM line had to reference an image from the internal mirror, pinned by digest, with an inline comment naming the upstream source.
Enforcement happened at three points. A pre-commit hook in template repositories caught obvious violations. A CI step ran a structured parse over every Dockerfile in the repo and refused to proceed if a non-mirrored or non-digested base appeared. The cluster admission controller refused to schedule any pod whose image digest had not been seen by our mirror sync.
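The CI gate is the easiest to show. What follows is a simplified, regex-based stand-in for the structured parse described above; registry.internal.example is a placeholder for the internal mirror hostname, and ARG-based FROM lines are not handled here.

```python
import pathlib
import re
import sys

MIRROR_HOST = "registry.internal.example"  # placeholder for the internal mirror hostname
# Matches e.g.: FROM registry.internal.example/mirror/base@sha256:<digest> AS build
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
ALLOWED_SCRATCH = {"scratch"}  # multi-stage builds may still start a stage FROM scratch

def check_dockerfile(path: pathlib.Path) -> list[str]:
    errors: list[str] = []
    stage_names: set[str] = set()
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        m = FROM_RE.match(line)
        if not m:
            continue
        ref = m.group(1)
        # Track stage aliases so a later "FROM <stage>" is not flagged.
        alias = re.search(r"\s+AS\s+(\S+)", line, re.IGNORECASE)
        if alias:
            stage_names.add(alias.group(1).lower())
        if ref.lower() in ALLOWED_SCRATCH or ref.lower() in stage_names:
            continue
        if not ref.startswith(MIRROR_HOST + "/"):
            errors.append(f"{path}:{lineno}: base {ref!r} is not pulled from the internal mirror")
        if "@sha256:" not in ref:
            errors.append(f"{path}:{lineno}: base {ref!r} is not pinned by digest")
    return errors

if __name__ == "__main__":
    failures: list[str] = []
    for dockerfile in pathlib.Path(".").rglob("Dockerfile*"):
        failures.extend(check_dockerfile(dockerfile))
    for failure in failures:
        print(failure, file=sys.stderr)
    sys.exit(1 if failures else 0)
```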
The third gate caught the most interesting violations. Engineers who had figured out how to bypass CI by building locally and pushing manually still got stopped by admission. That gate also caught the case where a vendor image was pulled into the mirror, found to be missing its attestation, and quarantined.
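For illustration, the core of that admission decision reduces to a function over the AdmissionReview payload. The webhook plumbing and the mirror-seen digest set are stubbed out here; this is a sketch of the shape of the check, not the controller we run.

```python
import json

def load_mirror_digests() -> set[str]:
    """Stub: in a real controller this comes from the mirror sync's record of digests."""
    return {"sha256:" + "0" * 64}  # placeholder digest

def review_pod(admission_review: dict, approved: set[str]) -> dict:
    """Build an AdmissionReview response that rejects pods whose images are not
    digest-pinned to something the mirror sync has already seen."""
    request = admission_review["request"]
    spec = request["object"]["spec"]
    containers = spec.get("containers", []) + spec.get("initContainers", [])

    denied = []
    for container in containers:
        _, _, digest = container["image"].partition("@")
        if not digest or digest not in approved:
            denied.append(container["image"])

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],
            "allowed": not denied,
            "status": {"message": f"images not seen by mirror sync: {denied}"} if denied else {},
        },
    }

if __name__ == "__main__":
    # Minimal smoke test with a hand-built AdmissionReview payload.
    review = {
        "request": {
            "uid": "test-uid",
            "object": {"spec": {"containers": [
                {"image": "registry.internal.example/mirror/base@sha256:" + "0" * 64},
            ]}},
        }
    }
    print(json.dumps(review_pod(review, load_mirror_digests()), indent=2))
```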
The Distroless Question
A natural question during the rollout was whether to standardise on distroless or chiseled images instead of merely "hardened" ones with a shell. The answer came out differently for different runtimes.
For statically linked Go binaries with no shell requirement, the answer was distroless from day one. It was strictly better in every metric. For Java, the JVM still wanted a libc and a small set of system utilities for diagnostics, so we used a minimal hardened base. For Python, the calculus was harder. Many libraries still link against system packages at runtime, and we ended up running two parallel base images for a quarter while we sorted the dependency mess.
The lesson was not that distroless is universal. The lesson was that "minimal" has to be defined in terms of what your runtime actually needs at runtime, not what your build process needs at build time.
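One way to ground that definition for Python images, sketched here as an illustration rather than our exact tooling: walk the installed extension modules and ask ldd which shared libraries they actually load. This assumes a Linux build container with ldd on the path.

```python
import pathlib
import re
import subprocess
import sysconfig

def required_shared_libs(site_packages: pathlib.Path) -> set[str]:
    """Collect the shared libraries that installed extension modules link against,
    as reported by ldd. Assumes a Linux environment with ldd available."""
    needed: set[str] = set()
    for so_file in site_packages.rglob("*.so*"):
        result = subprocess.run(["ldd", str(so_file)], capture_output=True, text=True)
        if result.returncode != 0:
            continue  # statically linked, or not a dynamic object
        for line in result.stdout.splitlines():
            m = re.match(r"\s*(\S+)\s*=>", line)
            if m:
                needed.add(m.group(1))
    return needed

if __name__ == "__main__":
    site_packages = pathlib.Path(sysconfig.get_paths()["platlib"])
    for lib in sorted(required_shared_libs(site_packages)):
        print(lib)
```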
Drift Watch And On-Call
The hardest part of the rollout was not migrating images. It was preventing drift back. Engineers under deadline pressure will reach for a familiar base and a quick install of the package they need. Without continuous enforcement the estate slides backwards within a quarter.
We added two controls. A nightly job ran across all running workloads, captured their image digests, and compared them to the approved list. Any drift produced a finding routed to the owning team. A second job tracked which approved images had not been refreshed against their upstream in fourteen days, and queued a refresh build automatically.
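A sketch of the first job's core, assuming the official kubernetes Python client. The approved-digest lookup and the routing of findings to owning teams are stubbed out, and the fourteen-day refresh check is omitted.

```python
from kubernetes import client, config

def approved_digests() -> set[str]:
    """Stub: in practice this comes from the mirror sync's record of verified digests."""
    return {"sha256:" + "0" * 64}  # placeholder

def find_drift() -> list[tuple[str, str, str]]:
    """Return (namespace, pod, image) triples whose running image digest
    is not in the approved set."""
    config.load_incluster_config()  # or config.load_kube_config() when run outside the cluster
    approved = approved_digests()
    drift = []
    for pod in client.CoreV1Api().list_pod_for_all_namespaces().items:
        for status in pod.status.container_statuses or []:
            # image_id normally looks like "registry/repo@sha256:<hex>" for pulled images.
            _, _, digest = status.image_id.partition("@")
            if digest and digest not in approved:
                drift.append((pod.metadata.namespace, pod.metadata.name, status.image))
    return drift

if __name__ == "__main__":
    for namespace, pod_name, image in find_drift():
        # In the real job each finding is routed to the owning team; here we just print it.
        print(f"drift: {namespace}/{pod_name} running {image}")
```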
On-call documentation needed updates too. The hardened images do not include a shell in many cases, so the standard incident playbook of "kubectl exec into the pod and look around" stopped working. We rewrote the runbooks around ephemeral debug containers and structured log queries. After two months, on-call engineers actually preferred the new flow because it forced cleaner observability.
How Safeguard Helps
Safeguard plays four concrete roles in this rollout. The image scanner ingests every internal mirror sync and compares the SBOM and CVE state to the policy gate, refusing to admit images that regress on the unfixed CVE count or that arrive without verifiable provenance. The continuous monitor tracks drift across deployed workloads and flags any pod whose base layer hash falls outside the approved set. The remediation engine generates pull requests against application repositories when an upstream hardened image gets a security refresh, so teams do not have to chase updates. And the executive dashboard reports the org-wide unfixed CVE count week over week, giving leadership a single number to track the migration against, rather than a swarm of per-team JIRA queries. The result is a rollout that survives its first quarter and keeps drifting in the right direction.