Cloud Security

AWS Signer with Notation: Designing a Trust Policy That Survives Contact

AWS Signer integrates with Notation for OCI image signing. The hard part is not signing — it is the trust policy that decides what gets to run. We walk through one that holds up.

Michael
Security Engineer
7 min read

Signing container images is the easy part. AWS Signer's profile-based signing model, plus Notation, the Notary Project's CNCF reference implementation, will produce a signed image with about four commands of setup. The hard part is the trust policy that decides which signatures are valid at deployment time. Trust policies are where signing programs that look healthy on a slide deck collide with the operational reality of dozens of teams, mergers, rotations, expired profiles, and incident-driven rebuilds. A trust policy that is too permissive is theater. A trust policy that is too strict breaks production at three in the morning when a profile rotation lands faster than the updated policy can be distributed. This is a defender's guide to writing one that survives.

What is the trust policy actually doing?

When notation verify runs against an image, it consults a trust policy document on the verifying host. The document maps registry-scope patterns (which images this policy applies to) to trust stores (which signing identities are trusted) and verification levels (how strict the verification is). The verifier then checks the image's signature manifest against the policy, confirms the signing certificate chains to a trusted root or trust anchor, validates timestamping, and checks revocation. The decision the policy encodes is essentially: "for images matching pattern X, the signature must come from one of the identities in store Y, signed within the validity window, with revocation checking at level Z." Get the pattern matching wrong and you trust signatures on images you did not intend to cover. Get the trust store wrong and you accept signatures from rotated identities or unintended teams. Get the verification level wrong and you either fail open during transient outages or fail closed during legitimate revocation events.
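That decision procedure can be sketched in a few lines of Python. This is a simplified model for illustration, not the Notation implementation; the policy names, repositories, and store names are invented.

```python
# Simplified model of trust policy evaluation: match the image's
# repository against registry scopes, then report which trust stores
# and verification level govern it. Illustrative only.
from fnmatch import fnmatchcase

TRUST_POLICIES = [
    {
        "name": "payments-prod",
        "registryScopes": ["registry.example.com/payments/payments-service"],
        "signatureVerification": {"level": "strict"},
        "trustStores": ["signingAuthority:payments-signer"],
    },
    {
        # Catch-all policy for everything else: observe, don't block.
        "name": "default-audit",
        "registryScopes": ["*"],
        "signatureVerification": {"level": "audit"},
        "trustStores": ["signingAuthority:platform-signer"],
    },
]


def applicable_policy(repository: str) -> dict:
    """Return the first policy whose scope covers the repository.

    Scopes here are fully qualified repository URIs or a lone "*",
    so fnmatchcase covers both cases.
    """
    for policy in TRUST_POLICIES:
        if any(fnmatchcase(repository, s) for s in policy["registryScopes"]):
            return policy
    raise LookupError(f"no trust policy covers {repository}")
```

Note the ordering: the specific scope is listed before the catch-all, so a payments image never silently falls through to the weaker audit-level policy.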

Why is multi-team ownership the most common failure?

The minimal example in the AWS Signer documentation shows a single profile and a single trust store. Real organizations have many teams, each with their own profile, each rotating on its own schedule. The naive scaling pattern is to add every team's signing identity to a shared trust store. This works until one team's identity is compromised, at which point removing it from the shared store revokes trust for every image signed by that team, including ones that need to keep running while remediation happens. The better pattern is one trust store per signing identity, one registry-scope rule per team, and a separate "shared infrastructure" store for organization-wide signing identities like base-image signers. The registry-scope patterns then encode the ownership map: payments-service images come from the payments team's store, observability sidecars come from the platform team's store, and a compromise in one stays contained.
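The containment property falls out of the layout mechanically. A sketch, with invented team and store names, of generating the one-store-per-identity shape and pruning a single compromised team without touching anyone else:

```python
# One trust store per signing identity, one policy per team: removing a
# compromised team's policy never alters another team's trust.
def build_trust_policy(teams: dict[str, str]) -> dict:
    """teams maps a team name to its registry repository."""
    return {
        "version": "1.0",
        "trustPolicies": [
            {
                "name": f"{team}-prod",
                "registryScopes": [repo],
                "signatureVerification": {"level": "strict"},
                "trustStores": [f"signingAuthority:{team}-signer"],
            }
            for team, repo in teams.items()
        ],
    }


def revoke_team(policy_doc: dict, team: str) -> dict:
    """Drop only the compromised team's policy; the rest keep running."""
    return {
        "version": policy_doc["version"],
        "trustPolicies": [
            p for p in policy_doc["trustPolicies"]
            if p["name"] != f"{team}-prod"
        ],
    }
```

With a shared trust store, the equivalent of revoke_team would be "delete one anchor from the store everyone uses," which is exactly the blast radius the per-team layout avoids.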

How does rotation actually work in practice?

AWS Signer profiles can be versioned, and the signing certificate has its own validity window inside the profile. Rotation produces a new certificate; old signatures remain valid for verification because they carry the certificate chain in the signature manifest, and the timestamping authority's countersignature proves they were created during the old certificate's validity period. The trust policy must accept both the old and new trust anchors during the transition. The mistake is to remove the old anchor on the same day the new one is rolled out. Production hosts pull the trust policy on a cache lifetime that is rarely shorter than fifteen minutes and often hours; a host that has not refreshed its policy will reject new signatures during the rotation window if the policy was flipped atomically. The right pattern is to add the new anchor first, wait at least one full cache lifetime plus a buffer, deploy a small fraction of newly signed images, verify, then remove the old anchor on a subsequent change.
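The two-phase shape can be enforced in tooling rather than remembered under pressure. A sketch (store names and the cache parameters are illustrative) that refuses to drop the old anchor before every verifier could plausibly have refreshed its policy:

```python
# Two-phase rotation: phase one adds the new trust anchor alongside the
# old; phase two removes the old anchor, but only after at least one
# full policy cache lifetime has elapsed since phase one.
def add_anchor(policy: dict, store: str) -> dict:
    stores = policy["trustStores"]
    if store not in stores:
        stores = stores + [store]
    return {**policy, "trustStores": stores}


def remove_anchor(policy: dict, store: str,
                  seconds_since_add: float, cache_ttl_s: float) -> dict:
    # Guard rail: a verifier that has not refreshed yet still holds the
    # pre-rotation policy, so flipping atomically would strand it.
    if seconds_since_add < cache_ttl_s:
        raise RuntimeError(
            "verifiers may still hold the old policy; wait out the cache TTL"
        )
    return {
        **policy,
        "trustStores": [s for s in policy["trustStores"] if s != store],
    }
```

The guard encodes the "wait at least one full cache lifetime plus a buffer" step as a hard precondition instead of a runbook bullet point.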

During the transition window, the payments team's policy carries both the outgoing and incoming trust stores. Note that Notation registry scopes must be fully qualified repository URIs or a lone "*"; partial wildcards like registry.example.com/payments/* are not supported, so multi-repository teams list each repository explicitly. Store and identity names below are illustrative.

{
  "version": "1.0",
  "trustPolicies": [
    {
      "name": "payments-prod",
      "registryScopes": ["registry.example.com/payments/payments-service"],
      "signatureVerification": {"level": "strict"},
      "trustStores": [
        "signingAuthority:payments-signer-2025",
        "signingAuthority:payments-signer-2026"
      ],
      "trustedIdentities": [
        "x509.subject: CN=AWS Signer, OU=Payments, O=Example, L=Seattle, ST=WA, C=US"
      ]
    }
  ]
}

What verification level should production actually use?

Notation defines four levels: strict, permissive, audit, and skip. Production almost always wants strict, which enforces signature validity, certificate trust, expiry, and revocation. The temptation during incidents is to flip to permissive or audit "just to get through this." That temptation is a trap. A trust policy that drops to permissive during an incident is a trust policy that is functionally always permissive, because incidents recur and the override eventually becomes default. The right pattern is to keep production strict and to define a separate trust policy file for emergency rebuild workflows that is signed off by an incident commander each time it is invoked, with a TTL that auto-expires. AWS Signer's profile model supports short-lived signing profiles that can be created during an incident and destroyed afterward, leaving the durable production policy unchanged. Use that mechanism instead of relaxing verification.

How do revocation and timestamping interact?

A signature without a trusted timestamp becomes unverifiable the moment its signing certificate expires, because the verifier cannot prove the signature was created during the certificate's validity window. AWS Signer countersigns with a public timestamping authority, which means signed images remain verifiable past certificate expiry as long as the trust policy still trusts the timestamping authority and the original trust anchor. Revocation is the inverse: if a signing certificate is revoked because of a compromise, signatures that chain to it should be rejected even if the timestamp predates the revocation, because the verifier cannot tell whether the compromise happened before or after the signature was generated. The trust policy's revocation-checking configuration must therefore be aligned with the organization's incident response: hard-fail on revocation during normal operations, with a documented and time-limited override path for the case where the revocation list itself is the cause of the outage.
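The interaction reduces to a small decision table, written out here as code. This is a simplification of what a real verifier does (it also validates the timestamping authority's own chain and the revocation data's freshness), but it captures the asymmetry the paragraph describes:

```python
# Revocation vs. timestamping, as a decision table: a trusted timestamp
# rescues a signature from certificate expiry, but revocation rejects it
# even when the timestamp predates the revocation, because the verifier
# cannot know when the compromise actually happened.
def signature_acceptable(cert_revoked: bool, cert_expired: bool,
                         timestamp_in_validity_window: bool) -> bool:
    if cert_revoked:
        return False  # fail closed: compromise time is unknowable
    if cert_expired:
        # The TSA countersignature proves the signature predates expiry.
        return timestamp_in_validity_window
    return True
```

Expiry is a scheduling fact the timestamp can speak to; revocation is a trust fact it cannot, which is why the two branches resolve differently.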

What should the deployment-time enforcement look like?

Signatures matter only if something checks them. The realistic enforcement points in AWS are: a verification step in the push pipeline (ECR stores signatures as OCI artifacts but does not verify them natively, so the push-time gate belongs in CI), EKS admission controllers such as Kyverno or Ratify that run Notation verification before pod creation, ECS task definitions that gate on a CodeBuild verification step, and Lambda container images verified at deploy time by the deployment pipeline. The strongest posture is multi-point enforcement: verify at push (so unsigned images cannot enter the registry), verify at promotion between environments (so a staging-signed image cannot be promoted to production without resigning), and verify at runtime admission (so a registry tampering event cannot bypass the earlier checks). Each layer's trust policy can be slightly different (staging may accept a broader set of identities than production), but they must be derived from a single source of truth, distributed through configuration management with provenance, and audited for drift weekly.
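The weekly drift audit is mechanical once "weaker than baseline" is defined. A sketch (the level ordering follows Notation's four levels; the comparison logic and field access assume the trust policy schema shown earlier):

```python
# Drift audit: flag any environment policy that is weaker than the
# baseline it was derived from: a lowered verification level, a
# broadened registry scope, or a trust anchor the baseline lacks.
LEVEL_RANK = {"skip": 0, "audit": 1, "permissive": 2, "strict": 3}


def drift(baseline: dict, env: dict) -> list[str]:
    findings = []
    base_level = baseline["signatureVerification"]["level"]
    env_level = env["signatureVerification"]["level"]
    if LEVEL_RANK[env_level] < LEVEL_RANK[base_level]:
        findings.append(f"verification lowered: {base_level} -> {env_level}")
    for scope in env["registryScopes"]:
        if scope not in baseline["registryScopes"]:
            findings.append(f"scope broadened: {scope}")
    for store in env["trustStores"]:
        if store not in baseline["trustStores"]:
            findings.append(f"unmanaged trust anchor: {store}")
    return findings
```

Each environment is compared against its own baseline from the single source of truth, so a deliberately broader staging policy is not a finding, while an unreviewed change to it is.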

How Safeguard Helps

Safeguard inventories every Notation trust policy across your AWS organization, normalizes them against a baseline, and flags drift between production and pre-production policies that could let an unsigned or wrongly-signed artifact reach a customer. Policy gates block infrastructure-as-code changes that lower verification levels, broaden registry scopes, or add unmanaged trust anchors. Griffin AI maps every running workload to the signing identity that produced its image, the date and trust path of the signature, and the timestamping authority that countersigned it, giving incident response a verified-by-default view of provenance when a profile compromise is suspected. Continuous monitoring of AWS Signer profiles, ECR signature manifests, and admission-controller decisions turns the trust policy from a static configuration file into a defended, audited control surface that survives team changes, rotations, and the inevitable late-night incident.
