Container Security

Cosign for Container Signing: A Production Setup

A working production setup for Cosign image signing across CI, registries, and Kubernetes admission, including the parts that break at scale and how to recover.

Shadab Khan
Security Engineer
7 min read

Cosign is the default answer to "how do we sign container images" and has been for about three years. The tutorials will get you a signed image in an afternoon. What they will not get you is a production setup that survives for a year without someone calling at 3 a.m. because a policy controller is rejecting every deploy.

This is the setup that has worked in two different organizations with three-digit engineer counts, with notes on the parts that are not obvious from the quickstart.

Should you use keyed or keyless signing?

Keyless, with almost no exceptions. This is the first real decision and it is the one teams get wrong.

Keyless signing with Sigstore uses OIDC identity at the moment of signing — your GitHub Actions runner, your GitLab pipeline, your Buildkite job proves its identity to Fulcio, which issues a short-lived certificate, and the signature is recorded in the Rekor transparency log along with the OIDC claims. Verification later checks that the signer identity matches what you expected.

Keyed signing means you generate a long-lived keypair, protect the private key, and sign with it. Every organization I have seen start with keyed signing has ended up with the same problems: the private key gets committed to a repo once, it lives in a KMS that costs more than expected, rotation is a quarterly fire drill, and the team cannot tell from a signature alone whether the key was compromised during the signing window.

Keyless avoids all of that. The "key" is the signer's identity — https://github.com/acme/api/.github/workflows/release.yml@refs/heads/main signs images produced by that workflow, and your verification policy says "only accept signatures from that identity." There is no secret to rotate, no KMS bill, and no recovery problem.
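Concretely, the keyless flow is two commands. This is a sketch assuming GitHub Actions and the public Sigstore instances; the image reference and workflow path are placeholders:

```shell
# Keyless sign: Cosign exchanges the CI job's OIDC token for a short-lived
# Fulcio certificate and records the signature in Rekor. Always sign by
# digest, not by tag, so the signature binds to immutable content.
cosign sign --yes "registry.example.com/acme/api@${DIGEST}"

# Keyless verify: accept only signatures produced by the expected workflow
# identity, issued by the expected OIDC provider.
cosign verify \
  --certificate-identity "https://github.com/acme/api/.github/workflows/release.yml@refs/heads/main" \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  "registry.example.com/acme/api@${DIGEST}"
```

Note that verification pins both the identity and the issuer; checking only one of the two leaves room for an attacker with a lookalike identity at a different provider.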

Use keyed signing only in air-gapped environments where Fulcio is unreachable, or when a regulator specifically demands a non-Sigstore-root PKI.
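For that air-gapped exception, the keyed flow looks like this; the flags below disable the transparency-log interaction that an offline environment cannot reach:

```shell
# Generate a long-lived keypair (writes cosign.key / cosign.pub).
# The private key is now a secret you must protect and rotate.
cosign generate-key-pair

# Sign offline, skipping the Rekor upload.
cosign sign --key cosign.key --tlog-upload=false "$IMAGE@$DIGEST"

# Verify with the public key, explicitly acknowledging there is no log entry.
cosign verify --key cosign.pub --insecure-ignore-tlog "$IMAGE@$DIGEST"
```

The --insecure-ignore-tlog flag is named that way for a reason: without the transparency log you lose the audit trail that makes Sigstore signatures attributable after the fact.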

How do you structure CI so signing does not become a bottleneck?

Sign in a dedicated job that runs after your tests pass, not inside the build job, and cache the Fulcio certificate for reuse across signatures in the same pipeline run.

The common mistake is to run cosign sign inline in the build step. This works for one image. It breaks down when your release pipeline builds an image, a multi-arch manifest, and four SBOM attestations, because each cosign sign call re-authenticates to Fulcio and each signature writes a new entry to Rekor. You end up with a five-minute release becoming a fifteen-minute release and intermittent rate-limit errors from the public Rekor instance.

The pattern that works: one signing job that takes the digests from the build job as inputs, signs them in parallel with a single OIDC token exchange, and pushes signatures, SBOMs, and attestations together. If you are at a scale where the public Sigstore instance is rate-limiting you, run a private Sigstore stack — Rekor and Fulcio, deployable via the sigstore scaffolding charts — inside your cluster. It is about a week of setup and it gets you off the public instance's best-effort availability permanently.
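The dedicated-job pattern can be sketched as a GitHub Actions fragment; job names, the build command, and the assumed test job are illustrative, not prescriptive:

```yaml
# Sketch: build emits digests as job outputs, a single sign job signs
# everything with one OIDC token exchange after tests pass.
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      digest: ${{ steps.build.outputs.digest }}
    steps:
      - id: build
        run: make release-image   # hypothetical target that pushes and records the digest
  sign:
    needs: [build, test]          # assumes a separate "test" job exists
    runs-on: ubuntu-latest
    permissions:
      id-token: write             # required for the keyless OIDC exchange
      packages: write             # push signatures back to the registry
    steps:
      - uses: sigstore/cosign-installer@v3
      - run: cosign sign --yes "registry.example.com/acme/api@${{ needs.build.outputs.digest }}"
```

Passing digests between jobs rather than re-resolving tags in the sign job also closes the window where a tag could move between build and sign.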

A small but important detail: Cosign writes signatures back to the registry as co-located OCI artifacts. Your registry must allow that. GitHub Container Registry does. Docker Hub does. Artifactory does if you are on a recent version. ECR supports it as of late 2024. Self-hosted Harbor works. If you are on an older private registry, the signing step will succeed and the signature push will then fail with a confusing registry error rather than a clear capability message.
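A quick way to see exactly what your registry needs to accept is cosign triangulate, which prints where Cosign will store the signature (the image reference is a placeholder):

```shell
# Prints the registry location of the signature artifact for this digest,
# typically a tag of the form sha256-<digest>.sig alongside the image.
# If your registry rejects pushes to that location, signing fails at push time.
cosign triangulate "registry.example.com/acme/api@${DIGEST}"
```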

What does a real verification policy look like?

A real verification policy checks four things, not one. First, that a signature exists. Second, that the signature comes from an identity you expect. Third, that the signature was recorded in Rekor and the log's integration timestamp falls within the short validity window of the signing certificate. Fourth, that the attached SBOM attestation matches the image being verified.

Most teams stop at step one and call themselves done. This is worse than no signing, because it produces a green dashboard with no security meaning. An attacker who can push to your registry can also create a valid Sigstore signature from any OIDC identity they control. The signature being present proves nothing. The signature being from your release pipeline proves something.

In shape, a Kyverno verifyImages policy that actually works specifies the expected Fulcio issuer, the expected subject pattern (typically a wildcard or regex matching your org's repo path and branch), and a Rekor log entry requirement. Additional layers verify that an in-toto attestation of type slsaprovenance is present, and that the attestation's builder identity matches. Layer on a requirement that the attestation's materials include a source commit matching the commit you expect for the release, and you have made it substantially harder for anyone to forge a deployable artifact.
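A policy of that shape looks roughly like the following. Field names match recent Kyverno releases but should be checked against your version's schema; the registry path, policy name, and identity patterns are placeholders:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-release-signatures
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-release-signature
      match:
        any:
          - resources:
              kinds: [Pod]
      verifyImages:
        - imageReferences:
            - "registry.example.com/acme/*"
          attestors:
            - entries:
                - keyless:
                    # Check 2: signer identity, pinned to issuer AND subject.
                    issuer: "https://token.actions.githubusercontent.com"
                    subject: "https://github.com/acme/*/.github/workflows/release.yml@refs/heads/main"
                    # Check 3: require a Rekor transparency log entry.
                    rekor:
                      url: "https://rekor.sigstore.dev"
          # Check 4: require a signed SLSA provenance attestation from
          # the same identity, not just any attestation.
          attestations:
            - type: https://slsa.dev/provenance/v0.2
              attestors:
                - entries:
                    - keyless:
                        issuer: "https://token.actions.githubusercontent.com"
                        subject: "https://github.com/acme/*/.github/workflows/release.yml@refs/heads/main"
```

The important structural point is that the attestation check carries its own attestor block: an unsigned or wrongly-signed provenance document should fail admission just like a missing image signature.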

How do you handle signature verification at scale?

Cache verification results at the admission controller level, not at the individual pod start level. Otherwise every pod restart does a fresh Rekor lookup and you will feel it during a cluster-wide rollout.

Cosign verification involves a network call to Rekor to confirm the transparency log entry. If you have 2,000 pods restarting during a node pool replacement, that is 2,000 Rekor lookups. Public Rekor will rate-limit you. Your own Rekor will get slow. Either way the cluster slows down.

The fix is that admission controllers like Kyverno and the Sigstore Policy Controller can cache verification results by image digest. Once an image is verified, subsequent admissions for the same digest are fast. Make sure this cache is enabled — Kyverno has it on by default in recent versions but double-check — and set the TTL appropriately. Too short and you will hit the upstream every hour. Too long and a signature revocation will not propagate. An hour is usually fine.
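In recent Kyverno releases the cache is controlled by admission controller flags; the names below match Kyverno 1.11-era releases, and the values are illustrative:

```yaml
# Fragment of the Kyverno admission controller container spec.
args:
  - --imageVerifyCacheEnabled=true     # on by default in recent releases; verify, don't assume
  - --imageVerifyCacheTTLDuration=1h   # trade re-verification cost against revocation lag
  - --imageVerifyCacheMaxSize=2000     # cache entries, keyed by image digest
```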

For very large clusters, run a separate signature verification service that pre-warms the cache ahead of a release, so the first deployment after a push is not the one paying the verification latency cost.

What do you do when a signature does not verify?

Do not panic, do not immediately disable the policy, and do look at whether Rekor is actually reachable. The most common cause of verification failure in production is not a signature problem — it is a network problem between your cluster and Rekor, or a certificate expiry on your private Rekor instance. I have walked into war rooms twice where the response was to remove the admission policy because "signatures are broken" when the actual problem was a DNS change that broke the connection to the transparency log.

Have a break-glass process documented before the incident happens. A named senior engineer can enable a 15-minute admission policy bypass with a Slack confirmation, and the policy reverts itself automatically. Every use logs a ticket. Without a documented break-glass, teams will disable the policy and forget to re-enable it, and your supply chain program quietly dies.
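Mechanically, a break-glass for a Kyverno policy can be as simple as flipping enforcement to audit mode and scheduling the revert. This is a sketch; the policy name is illustrative, and the 15-minute revert should run from automation, not a human's memory:

```shell
# Break glass: stop blocking admissions, keep logging violations.
kubectl patch clusterpolicy verify-release-signatures --type=merge \
  -p '{"spec":{"validationFailureAction":"Audit"}}'

# Revert (run from a scheduled Job 15 minutes later): resume enforcement.
kubectl patch clusterpolicy verify-release-signatures --type=merge \
  -p '{"spec":{"validationFailureAction":"Enforce"}}'
```

Audit mode is preferable to deleting the policy outright because the violation log keeps accumulating evidence of what would have been blocked during the bypass window.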

How Safeguard.sh Helps

Safeguard.sh consumes Cosign signatures and Rekor entries as part of its reachability graph, so you are not just verifying that an image is signed — you are verifying that the reachable code paths in that image come from the build provenance you expect, which combined with reachability analysis reduces alert noise by 60 to 80 percent. Griffin AI writes Kyverno and Sigstore Policy Controller policies directly from your CI configuration and keeps them synchronized as signing identities change, pulling in SBOMs to 100 levels of dependency depth and correlating them with signed attestations. TPRM tracks the signing identity for every vendor image entering your environment, and the container self-healing module automatically verifies and rolls out signed patch releases when upstream projects publish new signatures. Signature verification becomes both a gate and an accelerator, rather than just a gate.

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.