DevSecOps

GCP Cloud Build Hardening in Production

Lessons from hardening Cloud Build pipelines in production environments: private pools, least-privilege service accounts, provenance, and the controls that actually stop lateral movement.

Shadab Khan
Security Engineer
7 min read

Cloud Build is one of those GCP services that quietly accumulates privilege over time. You start with a single trigger that builds a container image, and eighteen months later the default Cloud Build service account has roles/editor on three projects, reads from four Secret Manager secrets, and pushes into five Artifact Registry repositories. Nobody planned it that way. It is how pipelines grow.

I spent most of the first half of 2024 helping a payments company retrofit Cloud Build hardening across about 140 build triggers. The work was tedious, but the patterns that emerged were consistent enough that they generalize. This post walks through the configuration that actually mattered, the misconfigurations that repeatedly turned up, and the controls that paid for themselves the first time an auditor asked to see evidence.

Stop using the default Cloud Build service account

The legacy default service account, PROJECT_NUMBER@cloudbuild.gserviceaccount.com, carried broad roles out of the box for years. Google began tightening this in April 2024, but existing projects created before then still have the old grants, and customers who opted out of the new default behaviour retain them too. If you have not audited yours, assume it has roles/cloudbuild.builds.builder plus whatever it inherits from IAM policies higher in the resource hierarchy.

The first hardening step is to create a dedicated user-managed service account per trigger family, and point the trigger at that account using the serviceAccount field in the build configuration. A trigger that builds an internal tool image should not share an identity with a trigger that deploys to production Cloud Run. Grant each service account only the specific roles it needs, and attach conditions where possible. For an image-build trigger, that typically means roles/artifactregistry.writer scoped to one repository, roles/logging.logWriter, and roles/storage.objectViewer on the source bucket if you are using a GCS source. Nothing else.
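
As a sketch, assuming a placeholder project build-project and a single Artifact Registry repository tools-repo, the per-trigger identity looks like this:

    # All project, repository, and account names here are placeholders.
    gcloud iam service-accounts create img-build-tools \
      --project=build-project \
      --display-name="Cloud Build: internal tools images"

    # Push access to exactly one repository, nothing project-wide.
    gcloud artifacts repositories add-iam-policy-binding tools-repo \
      --project=build-project --location=us-central1 \
      --member="serviceAccount:img-build-tools@build-project.iam.gserviceaccount.com" \
      --role="roles/artifactregistry.writer"

    gcloud projects add-iam-policy-binding build-project \
      --member="serviceAccount:img-build-tools@build-project.iam.gserviceaccount.com" \
      --role="roles/logging.logWriter"

In the trigger's cloudbuild.yaml, the serviceAccount field binds the build to that identity; note that builds running as a user-managed account must also declare a logging mode:

    serviceAccount: 'projects/build-project/serviceAccounts/img-build-tools@build-project.iam.gserviceaccount.com'
    options:
      logging: CLOUD_LOGGING_ONLY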

The second step is to disable automatic role grants. In the organization policy, set constraints/iam.automaticIamGrantsForDefaultServiceAccounts to enforced. This prevents new projects from inheriting the broad default grants, and it is the single most effective organization-wide control I deployed in 2024.
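
Enforcement is one command at the organization level; the organization ID below is a placeholder:

    gcloud resource-manager org-policies enable-enforce \
      iam.automaticIamGrantsForDefaultServiceAccounts \
      --organization=123456789012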

Use private pools, and treat them like production compute

Public Cloud Build workers share a pool of ephemeral VMs that Google manages. They are fine for open-source builds and early prototypes, but they introduce two problems for production workloads. First, builds run on public IP ranges, which means your VPC firewalls see them as external traffic. Second, the path from the workers to your internal artifact registries and package mirrors originates outside your VPC, even though it stays inside Google's network. For regulated workloads that is enough to complicate the control narrative.

Private pools, generally available since 2021 and matured through 2023, solve this by running workers that attach to your VPC through a private services access peering connection. I now treat the private pool configuration as infrastructure code. Each pool has a fixed egress CIDR that I allowlist on upstream registries, a private service connection to Artifact Registry, and Cloud NAT for the rare outbound dependency that cannot be pulled from an internal mirror. I size the pool at e2-standard-4 for most workloads and reserve e2-highcpu-32 workers for image builds that run heavy static analysis.
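
A minimal pool definition, assuming a placeholder VPC named build-vpc that already has a private services access peering in place:

    # Workers get internal IPs on the peered network; with public egress
    # disabled, outbound traffic rides Cloud NAT in build-vpc instead.
    gcloud builds worker-pools create prod-pool \
      --project=build-project \
      --region=us-central1 \
      --peered-network=projects/network-project/global/networks/build-vpc \
      --worker-machine-type=e2-standard-4 \
      --worker-disk-size=100 \
      --no-public-egress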

The one configuration that surprised the security team was that private pools do not automatically inherit VPC Service Controls. You have to add the pool's project to the perimeter explicitly, otherwise builds can still egress to the public Cloud Storage API and exfiltrate source code. This was a finding in the first VPC-SC audit I ran.
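
The fix is explicit; the perimeter name and numeric IDs below are placeholders:

    # Pull the pool's project into the perimeter and restrict the storage API.
    gcloud access-context-manager perimeters update prod-perimeter \
      --policy=987654321012 \
      --add-resources=projects/123456789012 \
      --add-restricted-services=storage.googleapis.com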

Lock the build configuration itself

Cloud Build reads cloudbuild.yaml from the source repository. Anyone who can commit to the repository can change the build steps. That is the designed behaviour, but it means that branch protection on your repository is part of your build-security posture. I enforce signed commits on main, require pull request review for any change to cloudbuild.yaml or the Dockerfiles, and gate merges on a CODEOWNERS entry that names the platform-security team.
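
A minimal CODEOWNERS sketch, assuming GitHub conventions and a hypothetical @example-org/platform-security team:

    # .github/CODEOWNERS: route build-definition changes through security review.
    /cloudbuild.yaml   @example-org/platform-security
    Dockerfile*        @example-org/platform-security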

Inside the build configuration, pin every step image to a digest. gcr.io/cloud-builders/docker is not a version; gcr.io/cloud-builders/docker@sha256:... is. The convenience images Google publishes are rebuilt periodically, and while I have never seen one silently compromised, the provenance assumption is weaker than people realize. Digest pinning takes five minutes per trigger and removes an entire class of unpinned-image drift.
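
Resolving the digest takes one command, and the pinned step then references it; the digest below is an all-zeros placeholder, not a real value:

    # Look up the current digest for the convenience image.
    gcloud container images describe gcr.io/cloud-builders/docker \
      --format='value(image_summary.digest)'

In cloudbuild.yaml:

    steps:
      # Illustrative pin; substitute the digest the describe command returned.
      - name: 'gcr.io/cloud-builders/docker@sha256:0000000000000000000000000000000000000000000000000000000000000000'
        args: ['build', '-t', 'us-central1-docker.pkg.dev/build-project/tools-repo/app:$SHORT_SHA', '.']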

For secrets, use the secretEnv mechanism that pulls from Secret Manager at build time, and scope the Secret Manager IAM to the service account that runs the build. Never use --build-arg for secrets, because Docker records build args in image history by default, and I have watched a junior engineer ship a production API key into a public image that way.
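
A sketch of the shape, assuming a placeholder secret named registry-key and an internal registry:

    steps:
      - name: 'gcr.io/cloud-builders/docker'
        entrypoint: 'bash'
        secretEnv: ['API_KEY']
        # $$ escapes Cloud Build substitution, so bash reads the secret at
        # run time; it never lands in the image layers or the build log.
        args: ['-c', 'echo "$$API_KEY" | docker login -u _token --password-stdin registry.example.com']
    availableSecrets:
      secretManager:
        - versionName: projects/build-project/secrets/registry-key/versions/latest
          env: 'API_KEY'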

Generate and verify provenance

Cloud Build has native support for SLSA v1.0 provenance as of September 2023. When you enable it on a trigger, every build emits a signed in-toto attestation that records the source commit, the builder identity, and the resolved dependencies. The attestation is stored alongside the artifact in Artifact Registry and can be verified with gcloud artifacts docker images describe --show-provenance.
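
Checking a pushed image looks like this, with placeholder names:

    # Print the provenance attestation recorded for the image, if any.
    gcloud artifacts docker images describe \
      us-central1-docker.pkg.dev/build-project/tools-repo/app:latest \
      --show-provenance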

Provenance is only useful if something consumes it. I wire the attestation into the Binary Authorization policy so that Cloud Run and GKE only admit images with a valid Cloud Build attestation from the expected builder pool. The policy uses the attestor's public key, which is backed by a Cloud KMS key ring, with roles/cloudkms.publicKeyViewer granted to the Binary Authorization service agent so it can read the key it verifies against. The first time this policy fired in production, it caught an engineer trying to deploy a locally built image during an incident, which was exactly the case the control was designed to prevent.
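
A minimal policy sketch, assuming the automatically created built-by-cloud-build attestor in a placeholder project:

    # binauthz-policy.yaml: block and log anything without the attestation.
    globalPolicyEvaluationMode: ENABLE
    defaultAdmissionRule:
      evaluationMode: REQUIRE_ATTESTATION
      enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
      requireAttestationsBy:
        - projects/build-project/attestors/built-by-cloud-build

Import it with gcloud container binauthz policy import binauthz-policy.yaml.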

Audit, alert, and rehearse

Cloud Audit Logs capture every Cloud Build API call. The interesting events are google.devtools.cloudbuild.v1.CloudBuild.CreateBuild with a non-standard service account, UpdateBuildTrigger on a production trigger outside change windows, and any SetIamPolicy that touches a build-related resource. I route these to a Cloud Logging sink that writes to a BigQuery dataset in a separate security project, and I run a scheduled query every six hours that alerts on anomalies.
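
The sink itself is one command, assuming the security-project dataset already exists; remember to grant the sink's writer identity access to the dataset:

    gcloud logging sinks create cloudbuild-audit \
      bigquery.googleapis.com/projects/security-project/datasets/cloudbuild_audit \
      --project=build-project \
      --log-filter='protoPayload.serviceName="cloudbuild.googleapis.com"'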

For detection content that catches active abuse, the two high-value rules are: a build that runs with a service account it has never used before, and a build that egresses to a destination not in the private pool's allowlist. The first is trivial to write against the audit logs, as sketched below; the second requires VPC Flow Logs enabled on the pool's subnet.
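
A sketch of the first rule against the exported audit logs, with placeholder dataset and table names:

    # Service accounts whose first-ever CreateBuild landed in the last six hours.
    bq query --use_legacy_sql=false '
    SELECT
      protopayload_auditlog.authenticationInfo.principalEmail AS service_account,
      MIN(timestamp) AS first_seen
    FROM `security-project.cloudbuild_audit.cloudaudit_googleapis_com_activity_*`
    WHERE protopayload_auditlog.methodName LIKE "%CreateBuild"
    GROUP BY service_account
    HAVING first_seen > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)'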

Finally, rehearse the recovery path. If a build trigger is compromised, can you disable it in under five minutes? Do you know which images it produced in the last 30 days? Can you revoke the attestor key without breaking legitimate deployments? I ran a tabletop exercise in April and the answer to all three was slower than it should have been. Documenting the runbook and practising it was the cheapest part of the whole project.
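
For the five-minute question, the disable path I rehearse is export, flip, re-import; the trigger name and ID below are placeholders:

    # Export the trigger, set `disabled: true` in the file, and re-import it.
    gcloud beta builds triggers export prod-deploy --destination=/tmp/trigger.yaml
    gcloud beta builds triggers import --source=/tmp/trigger.yaml

    # Images the trigger produced in the last 30 days, from build records.
    gcloud builds list \
      --filter='buildTriggerId="TRIGGER_ID" AND createTime>-P30D' \
      --format='value(id, images)'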

How Safeguard Helps

Safeguard ingests Cloud Build attestations and correlates them with the SBOM and vulnerability data for every image you ship. When a build trigger, service account, or attestor key changes unexpectedly, Safeguard flags the deviation against your baseline before the artifact reaches production. Policy gates evaluate provenance freshness, signing-key rotation, and builder-identity drift, and the findings surface in the same dashboard your developers already use for CVE triage. The result is a single place to see whether your Cloud Build pipelines are producing artifacts that meet the supply-chain controls you have committed to on paper.
