
Software Provenance: An End-to-End Guide

Provenance answers where software came from and how it was built. Here is how to implement end-to-end provenance tracking from source to deployment.

Shadab Khan
Security Engineer

Software provenance is the ability to trace an artifact back to its origins -- the source code it was built from, the build system that produced it, the dependencies it includes, and every transformation it underwent along the way. It is the supply chain equivalent of a chain of custody.

In a world where software supply chain attacks are increasingly common, provenance is moving from "nice to have" to "essential." If you cannot prove where your software came from, you cannot prove it was not tampered with.

This guide covers end-to-end provenance implementation, from source code to deployed artifact.

What Provenance Proves

Good provenance answers four questions:

  1. What source code was used? The exact commit, repository, and branch.
  2. Who built it? The identity of the build system (not just the person who triggered the build).
  3. How was it built? The build configuration, environment, and steps.
  4. What went in? All inputs -- source code, dependencies, build tools, base images.

Each answer should be backed by cryptographic evidence, not just metadata that could be forged.

Layer 1: Source Provenance

Source provenance establishes that your code originated from your repository and was authored by authorized contributors.

Signed Commits

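Git commits are trivially forgeable by default. The --author flag lets anyone claim to be anyone. Signed commits add a cryptographic signature that binds the commit to the signer's key, and platforms such as GitHub then mark the commit as verified when that key is registered to the committer's account.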

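# Sign every commit (needed for GPG, SSH, and X.509 signing alike)
git config --global commit.gpgsign true

# GPG: tell git which key to sign with
git config --global user.signingkey <GPG_KEY_ID>

# Or use SSH signing instead (simpler key management; Git 2.34+)
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub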

For organizations, enforce signed commits through branch protection rules. GitHub, GitLab, and Bitbucket all support requiring signed commits on protected branches.

Gitsign (Sigstore for Git)

Gitsign uses Sigstore's keyless signing for git commits. Instead of managing GPG or SSH keys, developers authenticate through their identity provider (Google, GitHub, Microsoft) and receive a short-lived certificate for signing.

# Install gitsign
go install github.com/sigstore/gitsign@latest

# Configure
git config --global commit.gpgsign true
git config --global gpg.x509.program gitsign
git config --global gpg.format x509

The advantage is that signing is tied to organizational identity (your corporate email/SSO) rather than personal key management. The signing event is recorded in the Rekor transparency log, providing an append-only, publicly auditable trail.

Source Code Archival

For regulatory or compliance purposes, you may need to prove that the source code existed in a specific state at a specific time. Git commit hashes provide content-addressable identification, but they do not prove when the commit was created (timestamps can be forged).

Solutions:

  • RFC 3161 timestamps -- cryptographic timestamps from a trusted third party
  • Rekor transparency log -- record commit hashes in Sigstore's transparency log
  • Git signed tags -- sign release tags with a verifiable identity
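To make the point about commit timestamps concrete, here is a minimal sketch of how Git derives object IDs in its default SHA-1 object format. The identities, dates, and helper names are illustrative assumptions; real commits usually also carry a parent line, and signed commits carry a gpgsig header.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Git object ID in the default SHA-1 format: sha1(header + body)."""
    # Header is "<type> <size in bytes>" followed by a NUL byte
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, ident: str, unix_time: int, message: str) -> bytes:
    """Minimal commit object (no parent): the timestamp is whatever the author supplies."""
    stamp = f"{ident} {unix_time} +0000"
    return f"tree {tree}\nauthor {stamp}\ncommitter {stamp}\n\n{message}\n".encode()


# Blob IDs match `git hash-object` for the same content
print(git_object_id("blob", b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

empty_tree = git_object_id("tree", b"")
honest = commit_body(empty_tree, "Dev <dev@example.com>", 1700000000, "release")
backdated = commit_body(empty_tree, "Dev <dev@example.com>", 946684800, "release")

# Both IDs are equally well-formed: the hash covers the date but cannot vouch for it
print(git_object_id("commit", honest))
print(git_object_id("commit", backdated))
```

The hash makes the content tamper-evident after the fact, but anyone can pick the date before hashing. That is why an external timestamp or transparency log entry is needed to pin down when a commit existed.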

Layer 2: Dependency Provenance

Your software is not just your code. It includes every dependency, and each dependency's provenance matters.

Lock File Integrity

Lock files (package-lock.json, go.sum, Cargo.lock, poetry.lock) record the exact versions and integrity hashes of your dependencies. Protecting lock file integrity is critical:

  • Commit lock files to version control. Always.
  • Review lock file changes carefully. Automated tools can flag unexpected changes.
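  • Verify hashes during install. Use npm ci (not npm install, which may rewrite the lock file), pip install --require-hashes, cargo build --locked, or the equivalent strict mode for your package manager. Go checks go.sum automatically, and go mod verify re-checks the module cache.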
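npm records each package's hash in package-lock.json as a Subresource Integrity (SRI) string of the form sha512-<base64 digest>. The sketch below shows, in simplified form, the check a strict install performs against that string. The function names are hypothetical, and real clients handle details (option suffixes, algorithm preference) that are skipped here.

```python
import base64
import hashlib
import hmac

# Hash algorithms accepted in SRI strings
_SUPPORTED = {"sha256", "sha384", "sha512"}


def sri_for(data: bytes, alg: str = "sha512") -> str:
    """Return an SRI string ("<alg>-<base64 digest>") for the given bytes."""
    digest = hashlib.new(alg, data).digest()
    return f"{alg}-{base64.b64encode(digest).decode()}"


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Accept if any whitespace-separated SRI entry with a supported algorithm matches."""
    for entry in integrity.split():
        alg, _, expected = entry.partition("-")
        if alg not in _SUPPORTED:
            continue  # unknown algorithm: ignore this entry
        actual = base64.b64encode(hashlib.new(alg, data).digest()).decode()
        if hmac.compare_digest(actual, expected):  # constant-time comparison
            return True
    return False


tarball = b"stand-in for a downloaded .tgz"
recorded = sri_for(tarball)  # what the lock file would store
print(verify_integrity(tarball, recorded))                # True
print(verify_integrity(tarball + b"tampered", recorded))  # False
```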

Dependency SBOM

Generate an SBOM for your dependencies at build time, not just at release time. The build-time SBOM captures the exact dependency resolution that went into the build.

Include:

  • Package name and version
  • Package URL (PURL) for ecosystem-independent identification
  • SHA-256 hash of the actual package artifact (not just the version number)
  • Download location (registry URL)
  • License
  • SLSA provenance (if available from the package)
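As an illustration, here is what one dependency entry covering most of these fields might look like, assuming the CycloneDX 1.5 JSON field names. The package name, license, and tarball URL are hypothetical, only unscoped npm names are handled, and the SLSA provenance reference is left out for brevity.

```python
import hashlib
import json


def sbom_component(name, version, artifact: bytes, registry_url, license_id):
    """One CycloneDX-style component entry for an (unscoped) npm package."""
    return {
        "type": "library",
        "name": name,
        "version": version,
        # Package URL (PURL): pkg:<type>/<name>@<version>
        "purl": f"pkg:npm/{name}@{version}",
        # Hash of the artifact bytes actually installed, not just the version string
        "hashes": [{"alg": "SHA-256", "content": hashlib.sha256(artifact).hexdigest()}],
        "licenses": [{"license": {"id": license_id}}],
        "externalReferences": [{"type": "distribution", "url": registry_url}],
    }


bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        sbom_component(
            "example-lib",  # hypothetical package
            "1.2.3",
            b"<tarball bytes>",
            "https://registry.npmjs.org/example-lib/-/example-lib-1.2.3.tgz",
            "MIT",
        )
    ],
}
print(json.dumps(bom, indent=2))
```

In practice an SBOM generator fills these in from the lock file and the installed packages. The key design point is that the hash comes from the bytes you actually built with.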

Verifying Upstream Provenance

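Increasingly, packages publish their own provenance. npm and PyPI both support Sigstore-backed provenance attestations for published packages, and Go modules are checked against the sum.golang.org checksum database (a transparency log of module hashes rather than build provenance). When available, verify it: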

# Verify npm package provenance
npm audit signatures

# Verify container image with cosign
cosign verify --certificate-identity=... --certificate-oidc-issuer=... image:tag

For dependencies that do not publish provenance, hash verification against the hashes recorded in your lock file is your baseline. It proves you installed the same bytes that were recorded when the dependency was first resolved, though it does not prove the registry itself was not compromised.

Layer 3: Build Provenance

Build provenance is where SLSA comes in. It proves that your build system produced this specific artifact from these specific inputs.

Generating Build Provenance

For GitHub Actions (SLSA L2/L3):

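# .github/workflows/release.yml
jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      digests: ${{ steps.hash.outputs.digests }}
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      # Base64-encode the sha256sum output the generator expects as its subjects
      - id: hash
        run: echo "digests=$(sha256sum dist/app.js | base64 -w0)" >> "$GITHUB_OUTPUT"
  # The generator is a reusable workflow, so it runs as its own job, not as a step
  provenance:
    needs: [build]
    permissions:
      actions: read
      id-token: write
      contents: write
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.0.0
    with:
      base64-subjects: "${{ needs.build.outputs.digests }}"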
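The base64-subjects input is just base64-encoded sha256sum output: one "<hex digest>  <file name>" line per artifact. A small Python sketch of the same encoding, useful for checking locally what the generator will attest to; the function names are hypothetical.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def base64_subjects(paths):
    """Mimic `sha256sum FILES | base64 -w0`: "<hex>  <name>" lines, base64-encoded."""
    lines = "".join(
        f"{hashlib.sha256(Path(p).read_bytes()).hexdigest()}  {p}\n" for p in paths
    )
    return base64.b64encode(lines.encode()).decode()


def decode_subjects(encoded):
    """Reverse the encoding into {name: sha256 hex} for a quick sanity check."""
    out = {}
    for line in base64.b64decode(encoded).decode().splitlines():
        digest, name = line.split("  ", 1)  # sha256sum separates with two spaces
        out[name] = digest
    return out


with tempfile.TemporaryDirectory() as d:
    artifact = Path(d) / "app.js"
    artifact.write_bytes(b"console.log('hi');\n")
    encoded = base64_subjects([artifact])
    decoded = decode_subjects(encoded)
    print(decoded)
```

This ignores the escaping sha256sum applies to file names containing backslashes or newlines, which build outputs rarely need.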

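For Google Cloud Build, setting requestedVerifyOption: VERIFIED asks Cloud Build to generate verified provenance for the artifacts it uploads (for example, container images listed in the config's images field):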

steps:
  - name: 'gcr.io/cloud-builders/npm'
    args: ['ci']
  - name: 'gcr.io/cloud-builders/npm'
    args: ['run', 'build']
options:
  requestedVerifyOption: VERIFIED
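Both builders ultimately emit an in-toto attestation. As a rough picture of its contents, here is an unsigned statement using the SLSA v1.0 provenance predicate layout. The buildType URI, builder ID, repository, and tag are hypothetical placeholders, and wrapping the statement in a signed DSSE envelope (the builder's job) is omitted.

```python
import hashlib
import json


def slsa_statement(name, artifact: bytes, repo_uri, ref, commit_sha, builder_id):
    """Build an unsigned in-toto Statement carrying a SLSA v1.0 provenance predicate."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # What was built: the artifact name and its digest
        "subject": [{"name": name, "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                # Hypothetical build type; real builders publish their own URI
                "buildType": "https://example.com/npm-build/v1",
                "externalParameters": {"repository": repo_uri, "ref": ref},
                # What went in: the exact source commit
                "resolvedDependencies": [
                    {"uri": f"git+{repo_uri}@{ref}", "digest": {"gitCommit": commit_sha}}
                ],
            },
            # Who built it
            "runDetails": {"builder": {"id": builder_id}},
        },
    }


statement = slsa_statement(
    "app.js",
    b"console.log('hi');\n",
    "https://github.com/your-org/app",
    "refs/tags/v1.0.0",
    "0" * 40,  # placeholder commit SHA
    "https://example.com/builders/ci@v1",  # hypothetical builder ID
)
print(json.dumps(statement, indent=2))
```

The statement maps onto the questions from the start of this guide: the subject is the artifact, resolvedDependencies names the source, and the builder ID identifies who built it.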

Reproducible Builds

The gold standard for build provenance is a reproducible build -- given the same source code and build environment, any party can produce a bit-for-bit identical artifact. If your build is reproducible, provenance verification is trivial: rebuild and compare hashes.

Achieving reproducible builds requires:

  • Eliminating non-determinism (timestamps in artifacts, random orderings, etc.)
  • Pinning build tool versions
  • Using hermetic build environments (no network access during build)
  • Documenting the complete build environment

Full reproducibility is hard and not always worth the effort. But partial reproducibility -- ensuring builds are deterministic given the same inputs -- is achievable and valuable.
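Archive timestamps and file ordering are two of the most common sources of non-determinism. Below is a minimal sketch of a deterministic tar packer that sorts entries and pins metadata, using the SOURCE_DATE_EPOCH convention from reproducible-builds.org. The function names and sample files are hypothetical.

```python
import hashlib
import io
import os
import tarfile


def deterministic_tar(files, epoch=None):
    """Pack {path: bytes} into a tar whose bytes depend only on the inputs."""
    if epoch is None:
        # Honor SOURCE_DATE_EPOCH if set, else pin to 0 rather than "now"
        epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(files):  # fixed order, independent of filesystem/dict order
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = epoch          # no build-time timestamps
            info.mode = 0o644
            info.uid = info.gid = 0     # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob):
    return hashlib.sha256(blob).hexdigest()


build_a = deterministic_tar({"dist/app.js": b"console.log(1)\n", "README": b"hi\n"}, epoch=1700000000)
build_b = deterministic_tar({"README": b"hi\n", "dist/app.js": b"console.log(1)\n"}, epoch=1700000000)
print(digest(build_a) == digest(build_b))  # True: same inputs, different order, identical bytes
```

With output like this, "rebuild and compare hashes" becomes a one-line check.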

Layer 4: Deployment Provenance

The final layer ensures that what you deployed is what you built.

Artifact Signing

Sign your build artifacts before deployment:

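# Sign a container image by digest, so you sign exactly the image you built (a tag can be moved)
cosign sign --key cosign.key your-registry/app@sha256:<digest>

# Or keyless signing with Sigstore (recorded in Rekor)
cosign sign your-registry/app@sha256:<digest>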

Deployment Verification

Your deployment system should verify signatures and provenance before deploying:

# Kubernetes admission policy
apiVersion: policy.sigstore.dev/v1alpha1
kind: ClusterImagePolicy
metadata:
  name: require-signed-images
spec:
  images:
    - glob: "your-registry/*"
  authorities:
    - keyless:
        identities:
          - issuer: https://accounts.google.com
            subject: build@your-org.iam.gserviceaccount.com

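Kubernetes admission controllers (such as Sigstore's policy-controller or Kyverno) can enforce that only signed, verified artifacts are deployed to your cluster. Note that policy-controller only enforces policies in namespaces labeled policy.sigstore.dev/include=true.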
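The decision such a policy encodes can be sketched in a few lines. This assumes signature verification has already produced the set of (issuer, subject) identities that validly signed the image; the types and function are hypothetical, and fnmatchcase's "*" also matches "/", which may differ from a real controller's glob semantics.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Identity:
    issuer: str   # OIDC issuer from the signing certificate
    subject: str  # signer identity (email / service account) from the certificate


@dataclass(frozen=True)
class Policy:
    glob: str
    identities: tuple  # allowed Identity values


def admit(image: str, verified_signers: set, policies: list) -> bool:
    """Admit an image only if every policy whose glob matches it has an allowed verified signer."""
    matching = [p for p in policies if fnmatchcase(image, p.glob)]
    if not matching:
        return False  # this sketch fails closed for images no policy covers
    return all(any(s in p.identities for s in verified_signers) for p in matching)


ci = Identity("https://accounts.google.com", "build@your-org.iam.gserviceaccount.com")
policies = [Policy("your-registry/*", (ci,))]

print(admit("your-registry/app:v1.0.0", {ci}, policies))         # True
print(admit("your-registry/app:v1.0.0", set(), policies))        # False: no verified signature
print(admit("docker.io/library/nginx:latest", {ci}, policies))   # False: no matching policy
```

Whether unmatched images are denied, allowed, or merely warned about is a deployment choice. Failing closed is the stricter option shown here.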

Deployment Record

Record what was deployed, when, where, and by whom. This closes the provenance chain -- from source commit to production deployment, every step is documented and verifiable.
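One simple way to make a deployment record tamper-evident is to hash-chain it, with each entry committing to the hash of the previous one. The sketch below is illustrative and its field names are hypothetical. A local chain only detects edits if the latest hash is also stored somewhere the deployer cannot rewrite, which is the property transparency logs such as Rekor provide at scale.

```python
import hashlib
import json
from datetime import datetime, timezone


def _record_hash(record):
    # Canonical JSON (sorted keys, no extra whitespace) so the hash is stable
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def append_deployment(log, artifact_digest, environment, deployer, when=None):
    """Append a record of what was deployed, where, when, and by whom, chained to the previous one."""
    record = {
        "artifact_digest": artifact_digest,
        "environment": environment,
        "deployer": deployer,
        "timestamp": (when or datetime.now(timezone.utc)).isoformat(),
        "prev_hash": _record_hash(log[-1]) if log else None,
    }
    log.append(record)
    return record


def verify_log(log):
    """Return True if each record still points at the hash of its predecessor."""
    return all(log[i]["prev_hash"] == _record_hash(log[i - 1]) for i in range(1, len(log)))


log = []
append_deployment(log, "sha256:" + "a" * 64, "production", "deploy-bot")
append_deployment(log, "sha256:" + "b" * 64, "production", "deploy-bot")
print(verify_log(log))              # True
log[0]["environment"] = "staging"   # rewrite history
print(verify_log(log))              # False
```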

Putting It All Together

A complete provenance chain looks like this:

  1. Developer signs commit with gitsign -> recorded in Rekor
  2. CI/CD builds artifact with pinned dependencies -> SLSA provenance generated
  3. Build-time SBOM generated with dependency hashes
  4. Artifact signed with cosign -> recorded in Rekor
  5. Admission controller verifies signature and provenance
  6. Deployment recorded with artifact hash, timestamp, and deployer identity

At any point, you can trace the deployed artifact back through each step to the original source commit, verifying cryptographic integrity at every link.
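As a sketch of one link in that walk, the function below goes from a deployed artifact's SHA-256 digest to the source commit named in its SLSA v1.0 provenance. It assumes the statements have already been signature-verified, and the sample statement is trimmed to the fields the walk needs. The function name and sample values are hypothetical.

```python
import hashlib

SLSA_V1 = "https://slsa.dev/provenance/v1"


def trace_to_commit(deployed_sha256, statements):
    """Return the source commit named by provenance whose subject matches the deployed digest."""
    for st in statements:
        if st.get("predicateType") != SLSA_V1:
            continue
        # Step 1: does this provenance describe the artifact we deployed?
        if deployed_sha256 not in {s["digest"].get("sha256") for s in st.get("subject", [])}:
            continue
        # Step 2: which source commit went into the build?
        for dep in st["predicate"]["buildDefinition"].get("resolvedDependencies", []):
            commit = dep.get("digest", {}).get("gitCommit")
            if commit:
                return commit
    return None


artifact = b"built app bytes"
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{"name": "app.js", "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}],
    "predicateType": SLSA_V1,
    "predicate": {"buildDefinition": {"resolvedDependencies": [
        {"uri": "git+https://github.com/your-org/app@refs/tags/v1.0.0", "digest": {"gitCommit": "1" * 40}},
    ]}},
}
print(trace_to_commit(hashlib.sha256(artifact).hexdigest(), [statement]))
```

From there, the commit's signature (for example, a gitsign entry in Rekor) closes the loop back to the author.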

How Safeguard.sh Helps

Safeguard.sh provides the provenance verification and tracking layer for your software supply chain. Our platform verifies SLSA provenance for your dependencies, validates Sigstore signatures, and maintains a complete provenance record for every artifact in your inventory. Guardrails can enforce provenance requirements -- block unsigned artifacts, require minimum SLSA levels, and flag dependencies without verifiable provenance. When an auditor asks "where did this software come from," Safeguard provides the cryptographically verified answer.
