
GCP Binary Authorization Policy Patterns

Policy design patterns for GCP Binary Authorization that hold up in production: attestor topology, exception handling, continuous validation, and the shapes that stop a deploy-time compromise without blocking legitimate rollouts.

Shadab Khan
Security Engineer

Binary Authorization has been generally available on GCP since 2018 for GKE and October 2021 for Cloud Run. The feature is well documented. The policy patterns that actually hold up in production are not. This post is a collection of the patterns I have seen work, the patterns I have seen fail, and the compromises I have made when the clean design met the reality of a 3 a.m. incident.

The "one attestor per concern" pattern

The most common policy shape I see in green-field deployments is a single attestor called production that signs images that have "passed all checks." This is a bad pattern. It conflates independent signals into a single signature, and it gives you no useful information when a deploy is rejected.

The better shape is one attestor per independent concern. A typical production policy has four attestors: built-by-cloudbuild signs that the image came from an approved Cloud Build trigger, vulnerability-scanned signs that the image passed Artifact Registry vulnerability scanning at the agreed severity threshold, sbom-generated signs that an SBOM was produced and stored, and integration-tested signs that the image passed the integration test suite. Each attestor has its own KMS-backed key, its own signer identity, and its own signing pipeline.

The policy then requires all four attestations. When a deploy fails, the audit log tells you exactly which attestation was missing, which turns a "why won't this deploy" conversation into a ten-second answer. Granularity also means you can change the policy for a subset of clusters, rolling out a new requirement without touching everything at once.
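
The "which attestation was missing" answer reduces to a set difference. The sketch below is a minimal local model of that reasoning, using the four attestor names from above; the helper names missing_attestations and is_admitted are hypothetical and are not part of any Binary Authorization API.

```python
# Attestor names come from the text. The functions are a hypothetical local
# model of the "require all four" rule, not a Binary Authorization API call.
REQUIRED_ATTESTORS = frozenset({
    "built-by-cloudbuild",
    "vulnerability-scanned",
    "sbom-generated",
    "integration-tested",
})


def missing_attestations(present, required=REQUIRED_ATTESTORS):
    """Return the required attestors that have not signed the image, sorted."""
    return sorted(set(required) - set(present))


def is_admitted(present, required=REQUIRED_ATTESTORS):
    """An image is admitted only when every required attestor has signed."""
    return not missing_attestations(present, required)
```

With a single production attestor the same question has only a yes or no answer; with four, the list returned by missing_attestations names the pipeline stage to look at.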

Attestor keys in Cloud KMS, with rotation

Each attestor is backed by a signing key. Use Cloud KMS asymmetric keys (EC P256 is my default), with the attestor's service account granted roles/cloudkms.signerVerifier on the specific key; Cloud KMS IAM bindings attach to keys and key rings, not to individual key versions. The verification side grants roles/cloudkms.publicKeyViewer to the Binary Authorization service agent, which is service-PROJECT_NUMBER@gcp-sa-binaryauthorization.iam.gserviceaccount.com.

Rotation matters for attestor keys because a compromised signer defeats the whole policy. My default rotation cadence is 180 days for production attestors, with an overlap pattern where the new key version is added to the attestor's public-key list 30 days before the old version is retired. During the overlap, the signer switches to the new key immediately but the verifier accepts both. This avoids a window in which freshly signed images are rejected because the attestor does not yet list the new public key.
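
The overlap schedule is simple date arithmetic. The following sketch assumes the old key is retired 180 days after it was introduced and that the new key is added 30 days before that; the RotationPlan type, the function names, and the "old"/"new" labels are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ROTATION_DAYS = 180  # cadence from the text
OVERLAP_DAYS = 30    # overlap from the text


@dataclass(frozen=True)
class RotationPlan:
    add_new_key: date     # new public key is listed on the attestor; signer switches
    retire_old_key: date  # old public key is removed from the attestor


def plan_rotation(old_key_created: date) -> RotationPlan:
    """Derive the two rotation dates from the old key's creation date."""
    retire = old_key_created + timedelta(days=ROTATION_DAYS)
    return RotationPlan(
        add_new_key=retire - timedelta(days=OVERLAP_DAYS),
        retire_old_key=retire,
    )


def accepted_keys(plan: RotationPlan, today: date) -> list[str]:
    """Which keys the verifier should accept on a given day."""
    if today < plan.add_new_key:
        return ["old"]
    if today < plan.retire_old_key:
        return ["old", "new"]  # overlap window: both keys verify
    return ["new"]
```

The point of modelling it this way is that there is never a day on which the verifier accepts no key that the signer is currently using.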

Emergency overrides, done correctly

The first objection every production team raises is "what if there is an incident at 3 a.m. and we need to ship a hotfix that has not been through the normal pipeline." Binary Authorization has two answers to this, and the difference matters.

The first is breakglass, which lets a deployer set a special break-glass flag on the workload to bypass the policy. Every use is recorded in Cloud Audit Logs, where a log-based alert can pick it up. I enable this path and route the audit events to PagerDuty so the security team is notified the moment it is used.

The second is a time-boxed exemption that relaxes the policy for a specific service or namespace for a specific window. I build this as a Cloud Function that takes a ticket ID, a cluster, a namespace, and a duration, and applies a policy update with a Cloud Scheduler job that reverts it. The ticket ID goes into the audit log so the paper trail is complete.
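
The core of the time-boxed exemption is a record that cannot exist without a ticket ID and an expiry. A minimal sketch of that validation follows; the Exemption type, the request_exemption function, and the eight-hour ceiling are assumptions for illustration, and the actual policy update and scheduled revert are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_DURATION = timedelta(hours=8)  # hypothetical ceiling; choose your own


@dataclass(frozen=True)
class Exemption:
    ticket_id: str
    cluster: str
    namespace: str
    starts_at: datetime
    expires_at: datetime

    def is_active(self, now: datetime) -> bool:
        """The exemption applies from starts_at up to, not including, expires_at."""
        return self.starts_at <= now < self.expires_at


def request_exemption(ticket_id, cluster, namespace, duration, now=None):
    """Validate the request and return the exemption record to apply."""
    if not ticket_id.strip():
        raise ValueError("a ticket ID is required for the audit trail")
    if duration <= timedelta(0) or duration > MAX_DURATION:
        raise ValueError("duration must be positive and at most MAX_DURATION")
    now = now or datetime.now(timezone.utc)
    return Exemption(ticket_id, cluster, namespace, now, now + duration)
```

Because expires_at is fixed at creation time, the revert job only has to compare it with the clock; nobody has to remember to close the exemption.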

The wrong pattern, which I see too often, is to have a permanent exception for a namespace called emergency that nobody cleans up. That is not a safety valve; it is a permanent hole.

Scope is the hard part

Binary Authorization policies attach to projects or to specific clusters. On GKE, the policy can be scoped at the cluster level and can set per-namespace rules with Kubernetes annotations. On Cloud Run, the scoping is per project with per-service exceptions.

The pattern that works is to set a strict default at the project level, and use cluster or service rules to explicitly relax for well-understood exceptions. Strict default means defaultAdmissionRule with evaluationMode: REQUIRE_ATTESTATION and enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG. The dry-run enforcement mode, DRYRUN_AUDIT_LOG_ONLY, is fine for the first two weeks of rollout, but it should never be the steady state. A dry-run policy is a policy that pretends to do work without doing it, and teams treat it accordingly.
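
One way to keep dry-run from becoming the steady state is to lint the policy document before it is applied. The sketch below represents the default rule as a Python dict using the policy field names mentioned above; PROJECT_ID is a placeholder, and lint_default_rule is a hypothetical helper, not something the service provides.

```python
# Field names follow the policy format named in the text. PROJECT_ID and the
# attestor list are placeholders; lint_default_rule is a hypothetical helper.
STRICT_POLICY = {
    "defaultAdmissionRule": {
        "evaluationMode": "REQUIRE_ATTESTATION",
        "enforcementMode": "ENFORCED_BLOCK_AND_AUDIT_LOG",
        "requireAttestationsBy": [
            "projects/PROJECT_ID/attestors/built-by-cloudbuild",
            "projects/PROJECT_ID/attestors/vulnerability-scanned",
            "projects/PROJECT_ID/attestors/sbom-generated",
            "projects/PROJECT_ID/attestors/integration-tested",
        ],
    },
}


def lint_default_rule(policy):
    """Return a list of human-readable problems with the default rule."""
    rule = policy.get("defaultAdmissionRule", {})
    problems = []
    if rule.get("evaluationMode") != "REQUIRE_ATTESTATION":
        problems.append("default rule does not require attestations")
    if rule.get("enforcementMode") != "ENFORCED_BLOCK_AND_AUDIT_LOG":
        problems.append("default rule is not enforced (dry run?)")
    if not rule.get("requireAttestationsBy"):
        problems.append("default rule lists no attestors")
    return problems
```

Run as a pull-request check, an empty list means the default is strict; anything else is a reason to ask why.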

For Kubernetes namespaces that legitimately run third-party images (a monitoring namespace running Grafana, for example), I use a separate attestor third-party-approved that signs images from an explicit allowlist of public registries. The signing pipeline is a Cloud Build trigger that pulls the upstream image, verifies the publisher's signature if one exists, scans it, and signs. This keeps the policy consistent (everything has to be signed) while accommodating the real world (you do not build Grafana yourself).
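
The allowlist check at the front of that signing pipeline can be very small. This is a sketch under assumptions: the registry prefixes are hypothetical examples, and the digest requirement is my addition, on the reasoning that a signature over a mutable tag says little about what will actually run.

```python
# Hypothetical allowlist. The trailing slashes matter: they stop
# "docker.io/grafana-evil/..." from matching the "docker.io/grafana/" entry.
ALLOWED_REGISTRIES = ("docker.io/grafana/", "quay.io/prometheus/")


def is_allowlisted(image_ref, allowed=ALLOWED_REGISTRIES):
    """True when the image reference starts with an allowlisted registry path."""
    return any(image_ref.startswith(prefix) for prefix in allowed)


def is_pinned_by_digest(image_ref):
    """True when the reference names an immutable digest rather than a tag."""
    return "@sha256:" in image_ref


def eligible_for_signing(image_ref, allowed=ALLOWED_REGISTRIES):
    """Only allowlisted, digest-pinned images go on to scanning and signing."""
    return is_allowlisted(image_ref, allowed) and is_pinned_by_digest(image_ref)
```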

Continuous validation catches what admission cannot

Admission control runs once, at the moment of deploy. A policy that was satisfied at deploy time may not be satisfied now: the attestor key may have been rotated and the old key revoked, the image may have been deleted from the registry, or a new vulnerability may have pushed the image over the severity threshold.

Binary Authorization's continuous validation feature, which reached general availability for GKE in January 2023, evaluates running pods against the current policy and emits Cloud Logging events when a pod no longer complies. Enable this on every production cluster. The log entries go to a security sink, and I have a Cloud Function that enriches them with the owning team and opens a Jira ticket automatically.

Continuous validation does not block the running workload, which is the right default. The job you are protecting is already running; killing it abruptly would be worse than the drift you just detected. The ticket and the paper trail let the owning team do the right thing on their schedule.
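
The enrichment step is mostly a lookup plus formatting. The sketch below assumes a simplified, flattened event dict (cluster, namespace, pod, image, reason); that shape is an illustration and not the actual Cloud Logging entry schema, and the owner mapping and ticket fields are likewise hypothetical.

```python
# Hypothetical namespace-to-team mapping; in practice this might come from
# namespace labels or a service catalog.
OWNERS = {"payments": "team-payments", "monitoring": "team-observability"}
FALLBACK_TEAM = "platform-security"


def to_ticket(event, owners=OWNERS):
    """Build a ticket payload from a simplified violation event.

    `event` is an assumed, flattened shape, not the real log entry schema.
    """
    location = f"{event['cluster']}/{event['namespace']}"
    return {
        "assignee_team": owners.get(event["namespace"], FALLBACK_TEAM),
        "summary": f"Continuous validation: {event['pod']} in {location} no longer complies",
        "description": f"Image {event['image']} failed: {event['reason']}",
        "blocking": False,  # the workload keeps running; the ticket drives the fix
    }
```

Unowned namespaces fall through to the fallback team rather than being dropped, so a missing mapping surfaces as a ticket instead of as silence.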

Write the policy in code

Binary Authorization policies can be edited through the Cloud Console. Do not edit them through the Cloud Console. Keep the policy in Terraform, with the attestor definitions, key references, and rule set under version control. Policy changes go through pull requests with a CODEOWNERS rule that requires platform-security approval.

The Terraform resource types are google_binary_authorization_policy and google_binary_authorization_attestor. The policy resource's admission_whitelist_patterns is where you list the images that are exempt from evaluation entirely (typically the gke-system images and Google-managed addons). Keep that list short and reviewed, because it is a trust anchor for the whole policy.

Test the policy, including the negative cases

Every policy should have a test suite. The positive test is "a compliant image deploys." The more important tests are the negative ones: an image with no attestations is rejected, an image with the wrong signer is rejected, an image signed by a revoked key is rejected. I keep a test harness in a dedicated project that deploys a known-bad image to a dedicated cluster and asserts the expected rejection.
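
The cases above fit naturally into a table-driven test. The sketch below runs them against a small local model of admission so that it is self-contained; in a real harness the admit function would be replaced by an actual deploy attempt against the dedicated cluster. The attestor name is from earlier in the post; the key IDs and function names are hypothetical.

```python
# attestor -> key IDs the verifier currently accepts. "key-v1" is treated as
# revoked simply by not being listed. All key IDs are hypothetical.
TRUSTED_KEYS = {"built-by-cloudbuild": {"key-v2"}}


def admit(attestations, trusted_keys=TRUSTED_KEYS):
    """Admit only if every attestor has an attestation made with an accepted key.

    `attestations` is a list of (attestor, key_id) pairs. This is a local
    stand-in for a real deploy attempt, not the service's evaluation logic.
    """
    signed = set(attestations)
    return all(
        any((attestor, key) in signed for key in keys)
        for attestor, keys in trusted_keys.items()
    )


CASES = [
    # (name, attestations on the image, expected admission result)
    ("compliant image", [("built-by-cloudbuild", "key-v2")], True),
    ("no attestations", [], False),
    ("wrong signer", [("someone-else", "key-x")], False),
    ("revoked key", [("built-by-cloudbuild", "key-v1")], False),
]


def run_cases(cases=CASES, trusted_keys=TRUSTED_KEYS):
    """Return the names of cases whose outcome differs from the expectation."""
    return [
        name for name, attestations, expected in cases
        if admit(attestations, trusted_keys) != expected
    ]
```

An empty list from run_cases means every case, positive and negative, behaved as expected; a non-empty list names exactly which guarantee has regressed.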

Run the test suite on every policy change and on a weekly cron. The cron catches drift, because Binary Authorization's behaviour is enforced by service agents that Google updates, and silent behavioural changes have happened. They are rare, but a passing weekly test is cheap insurance.

How Safeguard Helps

Safeguard tracks every Binary Authorization policy across your GCP organization, records the attestor topology and key rotation cadence, and correlates attestation events with the SBOMs and CVE findings they are supposed to reflect. Continuous validation events, breakglass uses, and policy changes all land in the same timeline, so you can see drift between intent and enforcement without scraping Cloud Audit Logs yourself. Policy gates evaluate attestor-key freshness, signing-identity drift, and continuous validation coverage, and the findings attach to the specific cluster or service they affect. The result is a Binary Authorization program where the policy you wrote is the policy that is actually running.
