Build Reproducibility: A Verification Guide

If you cannot reproduce a build bit-for-bit, you cannot independently verify that it was not tampered with. This guide covers deterministic builds, reproducibility verification, and why both matter for supply chain trust.

The SolarWinds attack demonstrated a fundamental problem with modern software distribution: users had no way to independently verify that the binary they received corresponded to the source code the vendor published. The attackers injected malicious code during the build process, and the resulting binary was signed and distributed through normal update channels. The source code in version control was clean. The build artifact was compromised.

Reproducible builds address this problem. If the same source code, the same dependencies, and the same build environment always produce bit-for-bit identical output, then anyone can verify that a binary was built from a specific set of sources. If the outputs differ, something changed, and that something needs investigation.

What Makes Builds Non-Reproducible

Most build processes are not reproducible by default. The same source code built on two different machines, or even on the same machine at two different times, typically produces different binaries. The differences come from several sources:

Timestamps. Compilers, linkers, and archive tools embed build timestamps in their output. A build at 10:00 AM produces a different binary than the same build at 10:01 AM. This is the most common source of non-reproducibility and the easiest to fix.
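
As a small illustration of the timestamp problem, the Python sketch below compresses the same bytes with gzip, whose header stores a modification time. The helper names build_timestamp and compress are hypothetical; the only convention taken from outside the script is the SOURCE_DATE_EPOCH environment variable.

```python
import gzip
import os
import time


def build_timestamp() -> int:
    """Return SOURCE_DATE_EPOCH if it is set, otherwise the current time."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def compress(data: bytes, timestamp: int) -> bytes:
    # The gzip header carries a modification time. Passing mtime explicitly
    # keeps the wall clock out of the output bytes.
    return gzip.compress(data, mtime=timestamp)


payload = b"hello, reproducible world\n"
same_a = compress(payload, 0)
same_b = compress(payload, 0)
different = compress(payload, 1_700_000_000)
# same_a and same_b are byte-identical; different is not, although all
# three decompress to the same payload.
```

A build script would typically call compress(payload, build_timestamp()) so that the output is fixed whenever SOURCE_DATE_EPOCH is exported.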

File ordering. When a build tool processes the files in a directory, the order it sees depends on the filesystem. Directory listings carry no guaranteed order, and different filesystems (ext4 vs. NTFS vs. APFS) return entries in different orders, so a tool that consumes files in listing order can produce different output from identical inputs.
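
One way to take the filesystem out of the picture is to sort every listing before using it. The sketch below hashes a directory tree in a sorted traversal order; tree_digest is a hypothetical helper, not part of any build tool.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Hash every file under root using a sorted, deterministic traversal."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the order os.walk descends
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as handle:
                data = handle.read()
            # Hash the relative path, the length, then the content, so that
            # renames and moved bytes both change the digest.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

Python's sorted() compares strings by code point, so the result does not depend on the locale of the build machine.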

Randomization. Some compilers use randomized algorithms for optimization decisions, hash table layouts, or symbol ordering. These produce valid but non-identical outputs across runs.

Absolute paths. Many build tools embed the absolute path of source files in debug information, error messages, or metadata. Building in /home/alice/project produces different output than building in /home/bob/project.

Environment variables. Build tools may capture environment variables (locale, timezone, username) and embed them in output.
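
A common mitigation is to run the build with an explicit, minimal environment instead of inheriting the developer's shell. The following is a sketch under assumptions: clean_build_env and run_build are hypothetical names, the PATH value is a placeholder, and a real toolchain may need a few more variables (HOME, for instance).

```python
import subprocess


def clean_build_env(source_date_epoch: int, path: str = "/usr/bin:/bin") -> dict:
    """Build a fixed environment so locale, timezone, and user do not leak in."""
    return {
        "PATH": path,
        "LC_ALL": "C",  # fixed locale for sorting and message formatting
        "TZ": "UTC",  # fixed timezone for any date rendering
        "SOURCE_DATE_EPOCH": str(source_date_epoch),
    }


def run_build(command: list, source_date_epoch: int) -> subprocess.CompletedProcess:
    # env= replaces the inherited environment entirely, so USER, HOME, LANG
    # and similar variables are absent unless they are added explicitly.
    return subprocess.run(command, env=clean_build_env(source_date_epoch), check=True)
```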

Non-deterministic dependency resolution. If dependencies are not pinned to exact versions with integrity hashes, different builds might use different dependency versions.

Compiler version differences. Even minor compiler version changes can alter code generation, producing functionally identical but byte-different binaries.

Achieving Reproducibility by Language

C/C++

C and C++ builds are among the hardest to make reproducible because of the complexity of the toolchain.

Use -Werror=date-time to catch accidental use of the __DATE__ and __TIME__ macros. Where those macros cannot be avoided, set SOURCE_DATE_EPOCH: when this environment variable is set, GCC and recent versions of Clang use its value instead of the current time for __DATE__ and __TIME__.

Use -ffile-prefix-map=OLD=NEW to normalize source directory paths in debug info and in __FILE__ macro expansions. This replaces the absolute build path with a canonical prefix.

Stabilize link order. Ensure that object files are passed to the linker in a deterministic order. This usually means sorting file lists explicitly rather than relying on glob expansion.

Pin the toolchain. Use a containerized build environment with a specific compiler version, system libraries, and tool versions. Nix, Guix, and Docker all work for this purpose.
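
The flags and the link-order advice above can be combined in a small build driver. The Python sketch below only constructs the compiler command line; compile_command is a hypothetical helper, and the choice of gcc, the output name, and the "." replacement prefix are assumptions for illustration.

```python
def compile_command(sources, build_root, output="app"):
    """Return a gcc argv that does not depend on input order or build path."""
    return [
        "gcc",
        "-Werror=date-time",  # reject __DATE__ / __TIME__
        f"-ffile-prefix-map={build_root}=.",  # rewrite the absolute build path
        "-o",
        output,
        # Sort explicitly instead of trusting glob or directory order.
        *sorted(str(source) for source in sources),
    ]


argv = compile_command(["src/z.c", "src/a.c", "src/m.c"], "/home/alice/project")
```

The same sources passed in any order yield the same argv, so the order in which files reach the compiler and linker no longer depends on how the file list was gathered.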

Java

Java builds have their own reproducibility challenges, primarily around JAR file construction.

JAR file metadata. JAR files are ZIP archives, and ZIP entries carry timestamps and are stored in the order they were added. Use the --date option in recent JDK versions of jar to fix entry timestamps, or configure Maven/Gradle to normalize ZIP entries.

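Maven reproducible builds. Set the project.build.outputTimestamp property in your POM and use plugin versions that honor it (maven-jar-plugin 3.2.0 or later, for example). Gradle supports reproducibility through the reproducibleFileOrder and preserveFileTimestamps settings in archive tasks (set reproducibleFileOrder to true and preserveFileTimestamps to false).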
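
To make the idea of normalizing ZIP entries concrete, here is a generic sketch in Python that rewrites an archive with sorted entries and a fixed timestamp. It is not JAR-aware: normalize_zip is a hypothetical helper, it drops directory entries, and it does not enforce the JAR convention that META-INF/MANIFEST.MF comes first. The compressed bytes can also vary with the zlib version in use.

```python
import io
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format can store


def normalize_zip(data: bytes) -> bytes:
    """Rewrite a ZIP archive with sorted entries and fixed metadata."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for name in sorted(src.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries in this sketch
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # otherwise depends on the host OS
            info.external_attr = 0o644 << 16  # fixed file permissions
            dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives with the same file contents but different entry order or timestamps normalize to the same bytes when processed on the same machine.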

Annotation processors. Some annotation processors generate code with timestamps or random identifiers. Audit your annotation processors and configure them for deterministic output.

Go

Go is one of the more reproducible languages out of the box. Since Go 1.13, builds are mostly reproducible if you control for:

GOPATH and module cache location (use -trimpath to strip local paths)
Go toolchain version (pin it in go.mod with the toolchain directive, available since Go 1.21)
CGo usage (CGo invokes the system C compiler, reintroducing all C/C++ reproducibility issues)
Build tags and GOOS/GOARCH environment variables
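
The controls in this list map onto a handful of flags and environment variables. The sketch below assembles them in a Python build driver; go_build_invocation is a hypothetical helper, and disabling CGo is an assumption that only holds for pure-Go programs.

```python
def go_build_invocation(package, output, goos="linux", goarch="amd64"):
    """Return (argv, env overrides) for a path-independent go build."""
    argv = ["go", "build", "-trimpath", "-o", output, package]
    env = {
        "CGO_ENABLED": "0",  # keep the system C compiler out of the build
        "GOOS": goos,  # make the target explicit rather than inherited
        "GOARCH": goarch,
    }
    return argv, env


argv, env = go_build_invocation("./cmd/app", "app")
```

A caller would merge env over a controlled base environment and pass both to subprocess.run.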

JavaScript/Node.js

Node.js applications are typically distributed as source bundles rather than compiled binaries, which sidesteps many reproducibility issues. However:

Webpack/esbuild/Rollup output may include timestamps, chunk hashes based on content plus build metadata, or randomized module IDs. Configure your bundler for deterministic output.

node_modules layout. Different versions of npm, yarn, and pnpm produce different node_modules layouts from the same lock file. Pin your package manager version (e.g., via Corepack) and use npm ci or pnpm install --frozen-lockfile.
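
A lock file only pins what it records, so it can be worth checking that every installed package carries an integrity hash. The sketch below assumes the package-lock.json layout used by lockfileVersion 2 and 3 (a top-level "packages" object keyed by install path); unpinned_packages is a hypothetical helper.

```python
import json


def unpinned_packages(lockfile_text: str) -> list:
    """List installed packages in a package-lock.json without an integrity hash."""
    lock = json.loads(lockfile_text)
    missing = []
    for path, entry in sorted(lock.get("packages", {}).items()):
        # Only entries under node_modules are installed packages; the root
        # project ("") and workspace sources are skipped.
        if not path.startswith("node_modules/") and "/node_modules/" not in path:
            continue
        # Symlinked workspaces and bundled dependencies have no tarball hash.
        if entry.get("link") or entry.get("inBundle"):
            continue
        if "integrity" not in entry:
            missing.append(path)
    return missing
```

Entries that come back from this check are usually git or file dependencies, which are pinned (if at all) by something other than a registry tarball hash.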

Container Images

Container images are notoriously non-reproducible. Building the same Dockerfile at two different times almost always produces different images because:

apt-get update fetches different package lists
apt-get install installs different package versions (unless pinned)
Layer timestamps differ
Base image tags (like ubuntu:22.04) point to different digests over time

Mitigations:

Pin base images by digest, not tag: FROM ubuntu@sha256:abc123...
Pin all OS package versions: apt-get install libssl3=3.0.2-0ubuntu1.10
Set SOURCE_DATE_EPOCH for the build (with BuildKit, docker build --build-arg SOURCE_DATE_EPOCH=0) and use tools like crane to normalize timestamps
Consider Nix-based image builders (such as dockerTools in nixpkgs) or Bazel's rules_oci for reproducible container images
Chainguard images provide reproducible, minimal base images with pinned package versions
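
The first mitigation is easy to check mechanically. The following Python sketch scans a Dockerfile for FROM lines that are not pinned by digest. It is a simplified parser with a hypothetical helper name: it ignores line continuations and treats any image supplied through an ARG substitution as unpinned.

```python
import re

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<alias>\S+))?",
    re.IGNORECASE,
)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile: str) -> list:
    """Return base images referenced by tag (or no tag) instead of digest."""
    stages = set()
    unpinned = []
    for line in dockerfile.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image = match.group("image")
        is_stage_ref = image.lower() in stages  # FROM <earlier stage name>
        if image != "scratch" and not is_stage_ref and not DIGEST_RE.search(image):
            unpinned.append(image)
        if match.group("alias"):
            stages.add(match.group("alias").lower())
    return unpinned
```

Run in CI, a non-empty result can fail the build before a floating tag makes it into a release image.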

Verification Workflows

Reproducibility is only useful if someone actually verifies builds. There are several verification models:

Independent Rebuilding

A third party takes your published source code and build instructions, performs the build independently, and compares the output hash to your published artifact. If they match, the artifact is verified. The Reproducible Builds project (reproducible-builds.org) has driven this approach for Debian packages.
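
The comparison step of an independent rebuild can be as simple as hashing the rebuilt file and looking it up in the publisher's checksum list. The sketch below assumes the checksums are published in the two-column format written by the sha256sum tool; the function names are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def parse_sha256sums(text: str) -> dict:
    """Parse 'digest  name' lines into {name: digest}."""
    sums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        digest, name = parts
        sums[name.removeprefix("*")] = digest.lower()  # '*' marks binary mode
    return sums


def matches_published(path: str, name: str, sums_text: str) -> bool:
    """True if the rebuilt file at path has the digest published for name."""
    return parse_sha256sums(sums_text).get(name) == sha256_file(path)
```

A match only means the two artifacts are identical; the checksum list itself still has to be obtained through a channel the verifier trusts.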

CI-Based Verification

Run the build twice in your CI pipeline -- once to produce the release artifact, once to verify reproducibility. If both runs produce identical outputs, the build is reproducible. This does not prove the build was not tampered with (both runs could be compromised), but it establishes a baseline for third-party verification.
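
A double-build check can be sketched in a few lines of Python. Here build_twice is a hypothetical helper that runs a build command in two fresh directories and compares artifact digests. A real pipeline would check the sources out into each directory first; the one-line "builds" in the usage example stand in for a real build command.

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path


def build_twice(command: list, artifact: str):
    """Run command in two fresh directories; return (identical, digests)."""
    digests = []
    for _ in range(2):
        with tempfile.TemporaryDirectory() as workdir:
            subprocess.run(command, cwd=workdir, check=True)
            data = (Path(workdir) / artifact).read_bytes()
            digests.append(hashlib.sha256(data).hexdigest())
    return digests[0] == digests[1], digests


# A stand-in build that writes fixed bytes, and one that leaks its build path.
stable = [sys.executable, "-c", "open('out.bin', 'wb').write(b'fixed')"]
leaky = [sys.executable, "-c", "import os; open('out.bin', 'w').write(os.getcwd())"]
```

Because each run uses a different working directory, this layout also exercises the absolute-path problem: build_twice(leaky, "out.bin") reports a mismatch, while build_twice(stable, "out.bin") does not.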

Attestation-Based Verification

Rather than reproducing the entire build, verify attestations about the build process. SLSA (Supply-chain Levels for Software Artifacts) defines a framework for build provenance attestations. A SLSA Level 3 attestation cryptographically binds a build artifact to its source code, build process, and build environment. Tools like Sigstore and in-toto generate and verify these attestations.

This is a pragmatic middle ground. Full reproducibility is ideal but difficult to achieve universally. Attestations provide strong verification guarantees without requiring bit-for-bit identical builds.
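
One piece of attestation checking is confirming that the statement actually refers to the artifact in hand. The sketch below assumes an in-toto Statement in JSON form, with a "subject" list whose entries carry a "digest" map, and checks only that binding. It does not verify the signature on the envelope or inspect the provenance predicate, both of which a real verifier must do; subject_matches is a hypothetical helper.

```python
import hashlib
import json


def subject_matches(statement_json: str, artifact: bytes) -> bool:
    """True if any subject in the statement lists the artifact's SHA-256."""
    statement = json.loads(statement_json)
    digest = hashlib.sha256(artifact).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )
```

In practice this check runs after signature verification, using tooling from the Sigstore or in-toto projects rather than hand-written parsing.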

The SBOM Connection

Reproducible builds and SBOMs are complementary. An SBOM tells you what components are in a build artifact. A reproducible build proves that the artifact was produced from those specific components. Together, they create an auditable chain from source to deployment.

If your build is not reproducible, your SBOM is an assertion -- you claim these are the components, but you cannot prove it. If your build is reproducible, your SBOM is verifiable -- anyone can rebuild from the declared components and verify the output matches.
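
As a rough sketch of that verification step, the Python below compares the component list of a published SBOM with one generated from an independent rebuild. It assumes both are CycloneDX JSON documents with a top-level "components" array of objects that have "name" and "version" fields; the function names are hypothetical.

```python
import json


def sbom_components(cyclonedx_json: str) -> set:
    """Extract (name, version) pairs from a CycloneDX JSON document."""
    bom = json.loads(cyclonedx_json)
    return {(c["name"], c.get("version", "")) for c in bom.get("components", [])}


def sbom_drift(declared_json: str, rebuilt_json: str) -> dict:
    """Report components that appear in only one of the two SBOMs."""
    declared = sbom_components(declared_json)
    rebuilt = sbom_components(rebuilt_json)
    return {
        "missing_from_rebuild": sorted(declared - rebuilt),
        "undeclared": sorted(rebuilt - declared),
    }
```

Two empty lists mean the declared and rebuilt component sets agree; anything else points at the dependency that needs investigation.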

How Safeguard.sh Helps

Safeguard.sh integrates build reproducibility into your supply chain security posture. The platform tracks build provenance alongside SBOMs, verifying that declared components match actual build inputs. When SLSA attestations are available, Safeguard.sh validates them automatically and flags artifacts that lack provenance documentation. For organizations working toward reproducible builds, Safeguard.sh identifies the specific sources of non-determinism in your build pipeline by comparing SBOM-declared dependencies against actual artifact contents, giving you a concrete roadmap to full reproducibility.
