
Self-Healing Containers Now Generally Available

Self-healing containers detect, remediate, and rebuild images when CVEs appear in their dependency closure. Here is how the GA feature works in practice.

Shadab Khan
Security Engineer

Self-healing containers are generally available across all Safeguard.sh regions. A self-healing container, in our terminology, is one where the image's underlying representation — Dockerfile, lockfile, base image pin, build config — is continuously monitored against the Gold Registry and the CVE feed, and where Safeguard will produce a rebuilt, attested image automatically when a safe fix exists.

The feature has been in private preview with about 80 design-partner teams for the last nine months. Going to GA means the behavior is stable, the limits are well understood, and the feature is in scope for FedRAMP HIGH, IL7, and SOC 2 Type II environments. This post walks through what "self-healing" actually does, what it will not do, and how the loop fits into a normal CI/CD pipeline.

What does "self-healing" mean in practice?

Answer first: Safeguard watches the image's inputs, and when a CVE becomes reachable in the closure and a safe version exists, Griffin opens a PR with the minimum change required to fix it.

The inputs Safeguard watches are the Dockerfile, any files it references (package.json, pyproject.toml, Cargo.toml, go.mod, pom.xml, and the equivalent lockfiles), the base image digest, and any multi-stage builder images. When a new CVE matches something in the closure, Safeguard re-runs Eagle classification and reachability for the image. If the CVE is reachable and there is a safe fix — a patched version in the same minor line, a Gold Registry alternative, a newer base image — Griffin produces a plan and opens a PR against the repo that owns the Dockerfile.

"Minimum change" is the guiding principle for the generated PR. If a package bump to the next patch version fixes the CVE, that is the PR. If the patch line is not fixed and a minor bump is required, the PR bumps to the minor and notes it. Major-version bumps are not attempted unless the remediation workflow explicitly allows them (the same behavior as Griffin 3.0 elsewhere). Base-image changes are a separate PR from package changes, by default, so they can be reviewed and merged independently.
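
The preference order above can be sketched in a few lines of Python. This is an illustration only, not Griffin's implementation: the function name `pick_minimum_fix`, the plain MAJOR.MINOR.PATCH parsing, and the idea that the caller already has a list of fixed versions are all assumptions made for the example.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (pre-release tags not handled)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def fmt(version: Version) -> str:
    return ".".join(str(part) for part in version)


def pick_minimum_fix(
    current: str,
    fixed_versions: Sequence[str],
    allow_major_bump: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (version, bump_kind) for the smallest upgrade that fixes the CVE.

    Hypothetical helper: prefers a patch bump in the same minor line, then a
    minor bump in the same major line, then a major bump only when allowed.
    Returns None when no acceptable fix exists.
    """
    cur = parse(current)
    # Only versions newer than the current one are upgrade candidates.
    candidates = sorted(v for v in map(parse, fixed_versions) if v > cur)

    for v in candidates:
        if (v.major, v.minor) == (cur.major, cur.minor):
            return fmt(v), "patch"
    for v in candidates:
        if v.major == cur.major:
            return fmt(v), "minor"
    if allow_major_bump and candidates:
        return fmt(candidates[0]), "major"
    return None
```

With a current version of 4.17.1 and fixes available in 4.17.3, 4.18.0, and 5.0.0, the sketch selects 4.17.3 as a patch bump. If only 5.0.0 carries the fix, it returns nothing unless `allow_major_bump` is set, mirroring the workflow flag of the same name.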

The loop is idempotent. If the same CVE fires twice because a merge was reverted, Griffin re-opens the existing PR rather than opening a duplicate. If a PR has been open for seven days without a merge decision, Griffin posts a reminder and can optionally route it to the image owner's on-call rotation via the workflow.
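
One common way to get this kind of idempotency is to key each PR on the finding it addresses, so a repeated event resolves to the same record. The sketch below assumes a key of image, CVE ID, and package; the class names and the in-memory store are hypothetical and say nothing about how Griffin tracks PRs internally.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PullRequest:
    key: str
    state: str = "open"  # "open", "merged", or "closed"
    reopen_count: int = 0


@dataclass
class PullRequestTracker:
    """In-memory stand-in for whatever store tracks remediation PRs."""

    by_key: Dict[str, PullRequest] = field(default_factory=dict)

    def ensure_pr(self, image: str, cve_id: str, package: str) -> PullRequest:
        # Assumed dedup key: the same finding always maps to the same PR.
        key = f"{image}:{cve_id}:{package}"
        pr = self.by_key.get(key)
        if pr is None:
            pr = PullRequest(key=key)
            self.by_key[key] = pr
        elif pr.state != "open":
            # The finding fired again (for example after a revert):
            # re-open the existing PR instead of creating a duplicate.
            pr.state = "open"
            pr.reopen_count += 1
        return pr
```

Calling `ensure_pr` twice for the same finding returns the same object, and calling it after the PR has been merged flips the state back to open and increments `reopen_count`.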

Where does the rebuild happen, and what is in the attestation?

The rebuild happens in a Safeguard-managed hermetic builder by default, or in your own CI if you prefer. Either way, the output is attested.

In managed mode, when the fix PR merges, a Safeguard builder pulls the repo at the merge commit, runs the container build in a hermetic sandbox, scans the output with Eagle 3.0, and publishes both the image and its attestation. The attestation bundle includes an SPDX SBOM for host OS packages, a CycloneDX SBOM for application packages, a VEX statement for any known CVEs that are present but unreachable, a SLSA provenance statement, and a Lino evidence item mapped to the relevant compliance frameworks. Everything is signed with Sigstore, and the Rekor log entry is public for commercial regions and internal for Gov.

In customer-CI mode, the builder runs in your pipeline (GitHub Actions, GitLab CI, Jenkins, Buildkite) and posts the resulting image plus inputs to Safeguard for attestation. The attestation set is the same; the build trust boundary is different. Most regulated customers choose managed mode for audit traceability; most commercial customers use customer-CI mode because they already have a mature build pipeline.

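# .safeguard/workflows/self-heal-image.yml
name: self-heal-image
triggers:
  # Fire when a CVE becomes reachable in this image's closure.
  - on: cve.reachable
    scope: images/api-service
actions:
  - remediate:
      model: griffin@3.0
      # Major-version bumps are not attempted unless explicitly allowed.
      allow_major_bump: false
      # Base-image changes go in their own PR, separate from package bumps.
      separate_base_image_pr: true
  - rebuild:
      # "managed" uses the Safeguard hermetic builder.
      mode: managed
      target: ghcr.io/acme/api-service
  - attest:
      framework: ["fedramp-high", "soc2"]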
How does it behave for production-facing images?

Conservatively, and with a gate you control.

By default, self-healing opens a PR and stops. It does not merge, it does not push to a registry, and it does not restart your workloads. The merge is a human decision and stays that way. Once the PR is merged, the rebuild produces a new image tag — usually a -sg.N suffix on your existing tag convention — but it does not promote the image to any environment. Promotion is your existing deploy pipeline's responsibility.

For teams that want a faster path, there is an auto-merge mode that will merge the PR when the test suite passes and the change is within a configured safety envelope (patch version bumps only, no base-image changes, no file outside the dependency manifests touched). Auto-merge is off by default and requires an approver to enable at the workspace level. We have seen it used well in staging-environment workloads and dev-tooling images; we do not recommend it for production-facing images without a canary step.
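
The safety envelope described above amounts to a conjunction of simple checks. Here is a minimal sketch, assuming the change has already been summarized into a small record; the `ProposedChange` fields and the exact set of manifest filenames are illustrative assumptions, not Safeguard's schema.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List

# Illustrative set of manifest and lockfile names; a real configuration
# would likely be broader and workspace-specific.
MANIFEST_NAMES = {
    "package.json", "package-lock.json",
    "pyproject.toml", "poetry.lock",
    "Cargo.toml", "Cargo.lock",
    "go.mod", "go.sum",
    "pom.xml",
}


@dataclass
class ProposedChange:
    bump_kind: str            # "patch", "minor", or "major"
    touches_base_image: bool
    changed_files: List[str]  # repo-relative paths touched by the PR
    tests_passed: bool


def within_safety_envelope(change: ProposedChange) -> bool:
    """Return True only if every auto-merge condition holds."""
    only_manifests = bool(change.changed_files) and all(
        PurePosixPath(path).name in MANIFEST_NAMES
        for path in change.changed_files
    )
    return (
        change.tests_passed
        and change.bump_kind == "patch"
        and not change.touches_base_image
        and only_manifests
    )
```

Any single failing condition, such as a minor bump or an edit to the Dockerfile, takes the PR out of the envelope and leaves the merge to a human.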

A related behavior worth calling out: the reachability check is re-run against the new image before the attestation is written. If the remediation accidentally introduces a different reachable CVE (rare, but it happens with packages that bundle their own vulnerable dependencies), the attestation is withheld and the PR is reopened with a note. This is the failure mode that bit us the most in private preview, and it is why reachability is part of the post-rebuild gate.
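
The post-rebuild gate can be thought of as a set comparison between the reachable CVEs before and after the rebuild. The sketch below is one possible reading, with two assumptions of its own: the gate also requires the targeted CVE to be gone, and reachable CVEs that already existed before the rebuild do not block the attestation by themselves. The function name and inputs are hypothetical.

```python
from typing import Iterable, Set, Tuple


def attestation_decision(
    reachable_before: Iterable[str],
    reachable_after: Iterable[str],
    target_cve: str,
) -> Tuple[bool, Set[str]]:
    """Decide whether to write the attestation for a rebuilt image.

    Returns (write_attestation, newly_introduced_cves). The attestation is
    withheld if the rebuild introduced any reachable CVE that was not
    reachable before, or if the targeted CVE is still reachable.
    """
    after = set(reachable_after)
    introduced = after - set(reachable_before)
    target_fixed = target_cve not in after
    return (target_fixed and not introduced), introduced
```

When the second element is non-empty, it is the natural content for the note attached to the reopened PR.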

What images are good candidates, and which are not?

Good candidates: language-runtime-based images built from a standard base, with a dependency manifest that Safeguard can parse. Node, Python, Java, .NET, Go, Rust, Ruby, PHP images all work well. Distroless variants work well. Alpine works, though apk-origin packages get patched via the base-image bump rather than in-place.

Weaker candidates, where you will still get value but with less automation: images built from custom base layers that Safeguard does not have visibility into, images with significant shell-script-driven build steps, and images with binary artifacts that are not represented in a manifest. For these, self-healing will open PRs for the parts of the closure it understands and skip the parts it does not. Partial remediation is still remediation; it just leaves more for a human.

Not good candidates: legacy images without a Dockerfile in version control, images with no test suite, and images with reproducibility problems (build-time network fetches of unpinned artifacts are the most common case). Self-healing works on top of the primitives — determinism, attestation, classification — and if those are broken upstream, the loop cannot close safely.
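
A quick way to spot one class of reproducibility problem before enabling self-healing is to check whether every base image in the Dockerfile is pinned by digest. The following is a rough sketch that only inspects FROM lines; it is not a Safeguard tool, the function name is made up for the example, and it does not catch unpinned downloads inside RUN steps.

```python
import re
from typing import List

# Matches "FROM [--platform=...] <image> [AS <stage>]" and captures the image.
FROM_LINE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
STAGE_ALIAS = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return base images referenced by tag or bare name instead of a digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image = match.group(1)
        is_earlier_stage = image.lower() in stage_names
        is_pinned = "@sha256:" in image
        if image.lower() != "scratch" and not is_earlier_stage and not is_pinned:
            unpinned.append(image)
        # Remember the stage alias so a later "FROM <alias>" is not flagged.
        alias = STAGE_ALIAS.search(line)
        if alias:
            stage_names.add(alias.group(1).lower())
    return unpinned
```

An empty result means every external base image is referenced by digest; anything returned is a candidate for pinning before the loop can close safely.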

The TPRM module extends self-healing to vendor-supplied images. If a vendor ships an image and provides an SBOM and reproducible build instructions, Safeguard can run the same loop and produce an attested rebuild; if they do not, Safeguard tracks the original image's posture and opens findings against the vendor rather than attempting a rebuild.

How Safeguard.sh Helps

Self-healing containers compose the other Safeguard primitives: Eagle 3.0 classifies, reachability analysis decides what matters, Griffin 3.0 remediates, the Gold Registry provides safe base images and packages to pin to, and Lino 2.0 writes compliance evidence into the audit trail. The managed-builder path is FedRAMP HIGH, IL7, and SOC 2 Type II validated so regulated workloads can run the same loop as commercial ones. You can drive self-healing from the web app at app.safeguard.sh, the desktop application, or the MCP Server — the workflow definitions are portable — and the Local Runner is the right tool for developers who want to test a remediation locally before it becomes a PR. In practice, teams adopting self-healing end up handling a fraction of the CVE volume they used to, and the volume they do handle is concentrated on the genuinely reachable, genuinely unfixed findings.
