The SolarWinds attack demonstrated a fundamental problem with modern software distribution: users had no way to independently verify that the binary they received corresponded to the source code the vendor published. The attackers injected malicious code during the build process, and the resulting binary was signed and distributed through normal update channels. The source code in version control was clean. The build artifact was compromised.
Reproducible builds address this problem. If you can take the same source code, the same dependencies, and the same build environment and produce bit-for-bit identical output, then anyone can independently verify that a binary was built from a specific set of sources. If the outputs differ, something changed -- and that something needs investigation.
What Makes Builds Non-Reproducible
Most build processes are not reproducible by default. The same source code built on two different machines, or even on the same machine at two different times, typically produces different binaries. The differences come from several sources:
Timestamps. Compilers, linkers, and archive tools embed build timestamps in their output. A build at 10:00 AM produces a different binary than the same build at 10:01 AM. This is the most common source of non-reproducibility and the easiest to fix.
File ordering. A build tool that processes files in the order a directory listing returns them inherits that order from the filesystem. Different filesystems (ext4 vs. NTFS vs. APFS) return entries in different orders, and even two machines using the same filesystem type can disagree, so identical inputs produce different output.
Randomization. Some compilers use randomized algorithms for optimization decisions, hash table layouts, or symbol ordering. These produce valid but non-identical outputs across runs.
Absolute paths. Many build tools embed the absolute path of source files in debug information, error messages, or metadata. Building in /home/alice/project produces different output than building in /home/bob/project.
Environment variables. Build tools may capture environment variables (locale, timezone, username) and embed them in output.
Non-deterministic dependency resolution. If dependencies are not pinned to exact versions with integrity hashes, different builds might use different dependency versions.
Compiler version differences. Even minor compiler version changes can alter code generation, producing functionally identical but byte-different binaries.
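The two most common causes, timestamps and file ordering, are easy to demonstrate. The Python sketch below packs the same files into a tar archive in two ways. The naive way uses directory-listing order and the wall clock. The normalized way sorts the entries and pins the timestamp from SOURCE_DATE_EPOCH, the environment variable defined by the Reproducible Builds project. The file names and contents are placeholders, build_archive is a hypothetical helper, and the dict's iteration order stands in for whatever order the filesystem returns.

```python
import hashlib
import io
import os
import tarfile
import time


def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH when set; otherwise fall back to the wall clock."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    return int(epoch) if epoch is not None else int(time.time())


def build_archive(files: dict, mtime: int, sort_entries: bool) -> bytes:
    """Pack {archive_path: content} into an uncompressed tar and return its bytes.

    The dict's iteration order stands in for directory-listing order.
    """
    buf = io.BytesIO()
    names = sorted(files) if sort_entries else list(files)
    # USTAR keeps headers fixed-size (no PAX extended records) for this demo.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in names:
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            # Fixed mode and ownership so the builder's account does not leak in.
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


# The same hypothetical build outputs, seen in two different directory orders.
order_a = {"bin/app": b"\x7fELF-placeholder", "share/README": b"docs"}
order_b = {"share/README": b"docs", "bin/app": b"\x7fELF-placeholder"}

# Naive: listing order, and wall-clock times one minute apart.
naive_a = build_archive(order_a, mtime=1_700_000_000, sort_entries=False)
naive_b = build_archive(order_b, mtime=1_700_000_060, sort_entries=False)

# Normalized: sorted entries and one pinned timestamp.
# A real build system would export SOURCE_DATE_EPOCH; it is set here for the demo.
os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
pinned = build_timestamp()
fixed_a = build_archive(order_a, mtime=pinned, sort_entries=True)
fixed_b = build_archive(order_b, mtime=pinned, sort_entries=True)

print(sha256(naive_a) == sha256(naive_b))  # False
print(sha256(fixed_a) == sha256(fixed_b))  # True
```

Either fix alone is not enough. The naive pair would still differ with a pinned timestamp because the entry order differs, and it would still differ with sorted entries because the timestamps differ.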
Achieving Reproducibility by Language
C/C++
C and C++ builds are among the hardest to make reproducible because of the complexity of the toolchain.
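Use -Werror=date-time to turn any use of the __DATE__, __TIME__, and __TIMESTAMP__ macros into a build error. If the macros have to stay, set SOURCE_DATE_EPOCH, a Unix timestamp that is conventionally the time of the last commit. When this environment variable is set, recent versions of GCC and Clang use its value instead of the current time for __DATE__ and __TIME__.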
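Use -ffile-prefix-map=OLD=NEW (for example, -ffile-prefix-map=/home/alice/project=.) to rewrite absolute source paths in debug info and in __FILE__ expansions, such as those embedded in assertion messages. This replaces machine-specific paths with a canonical prefix.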
Stabilize link order. Ensure that object files are passed to the linker in a deterministic order. This usually means sorting file lists explicitly rather than relying on glob expansion.
Pin the toolchain. Use a containerized build environment with a specific compiler version, system libraries, and tool versions. Nix, Guix, and Docker all work for this purpose.
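Most of these rules come down to controlling the compiler command line and its environment. The Python sketch below assumes a GCC-style toolchain (recent Clang accepts the same two flags). The build_plan helper, the project layout, and the output name app are hypothetical. The sketch only assembles the commands and does not run them.

```python
import os
from pathlib import PurePosixPath


def build_plan(sources, src_root, build_dir, epoch, cc="gcc"):
    """Return (compile_cmds, link_cmd, env) for a hypothetical C project.

    sources: .c paths relative to src_root; build_dir: output directory.
    """
    root = os.path.abspath(src_root)
    flags = [
        "-O2",
        "-g",
        # Turn any use of __DATE__, __TIME__ or __TIMESTAMP__ into an error.
        "-Werror=date-time",
        # Rewrite the absolute checkout path in debug info and __FILE__ to ".".
        f"-ffile-prefix-map={root}=.",
    ]
    compile_cmds, objects = [], []
    # Sort explicitly so compile and link order never depend on readdir order.
    for src in sorted(sources):
        obj = str(PurePosixPath(build_dir) / PurePosixPath(src).with_suffix(".o"))
        compile_cmds.append([cc, "-c", *flags, src, "-o", obj])
        objects.append(obj)
    link_cmd = [cc, *objects, "-o", str(PurePosixPath(build_dir) / "app")]
    # Pin the time the toolchain sees, plus locale and timezone.
    env = dict(os.environ, SOURCE_DATE_EPOCH=str(epoch), LC_ALL="C", TZ="UTC")
    return compile_cmds, link_cmd, env


compile_cmds, link_cmd, env = build_plan(
    ["src/util.c", "src/main.c"], "/home/alice/project", "out", epoch=1700000000
)
print(link_cmd)  # ['gcc', 'out/src/main.o', 'out/src/util.o', '-o', 'out/app']
```

Each command would then be run from the source root, for example with subprocess.run(cmd, env=env, cwd=src_root, check=True). The sketch deliberately leaves out toolchain pinning, which belongs in the container or Nix/Guix environment that runs these commands.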
Java
Java builds have their own reproducibility challenges, primarily around JAR file construction.
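JAR file metadata. JAR files are ZIP archives. Every ZIP entry records a modification timestamp, and entries are stored in whatever order the tool added them. Recent JDK releases add a --date option to the jar tool that sets one fixed timestamp for all entries. Alternatively, configure Maven or Gradle to normalize ZIP entries.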
Maven reproducible builds. Set the project.build.outputTimestamp property in your POM. Plugin versions that support reproducible builds, such as recent maven-jar-plugin releases, use it as the timestamp for archive entries. Gradle supports reproducibility through the reproducibleFileOrder and preserveFileTimestamps settings on archive tasks.
Annotation processors. Some annotation processors generate code with timestamps or random identifiers. Audit your annotation processors and configure them for deterministic output.
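To make the JAR problem concrete, here is a Python sketch of the kind of normalization those Maven and Gradle settings perform: entries sorted by name, one fixed timestamp, and fixed permission bits. It is not how either tool implements it. The sketch also assumes that both sides use the same zlib, because different deflate implementations can compress the same data to different bytes.

```python
import io
import zipfile

# DOS timestamps in ZIP cannot represent dates before 1980-01-01.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)
MANIFEST_FIRST = ("META-INF/", "META-INF/MANIFEST.MF")


def entry_order(name: str):
    # JAR readers expect the manifest at the start, so keep it first.
    return (0 if name in MANIFEST_FIRST else 1, name)


def normalize_zip(data: bytes) -> bytes:
    """Rewrite a ZIP/JAR with sorted entries, fixed timestamps and fixed attributes."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src:
        with zipfile.ZipFile(out, "w") as dst:
            for name in sorted(src.namelist(), key=entry_order):
                info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
                # ZipInfo sets create_system from the host OS; pin it to Unix (3).
                info.create_system = 3
                if name.endswith("/"):
                    info.external_attr = (0o40755 << 16) | 0x10  # directory entry
                else:
                    info.external_attr = 0o644 << 16
                    info.compress_type = zipfile.ZIP_DEFLATED
                dst.writestr(info, src.read(name))
    return out.getvalue()


# Usage (hypothetical file name):
# with open("app.jar", "rb") as f:
#     normalized = normalize_zip(f.read())
```

The create_system line matters more than it looks. Without it, the same inputs produce different bytes on Windows and on Unix hosts.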
Go
Go is one of the more reproducible languages out of the box. Since Go 1.13, builds are mostly reproducible if you control for:
- GOPATH and module cache location (use -trimpath to strip local paths)
- Go toolchain version (pin it in go.mod with the toolchain directive)
- CGo usage (CGo invokes the system C compiler, reintroducing all C/C++ reproducibility issues)
- Build tags and GOOS/GOARCH environment variables
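These controls can be applied mechanically. The Python sketch below assembles a go build invocation with a minimal, explicit environment. The go_build_invocation helper, the package path, and the tag names are hypothetical. It assumes a POSIX host, where HOME locates the default build cache.

```python
import os

# Variables copied from the caller's environment; everything else is dropped
# so stray settings (locale, CC, CGO_CFLAGS, GOFLAGS, ...) cannot leak in.
PASSTHROUGH = ("PATH", "HOME", "GOCACHE", "GOMODCACHE", "GOPROXY")


def go_build_invocation(pkg, goos, goarch, output, tags=()):
    """Return (argv, env) for a path-independent, pure-Go build."""
    env = {k: os.environ[k] for k in PASSTHROUGH if k in os.environ}
    env.update(
        CGO_ENABLED="0",  # keep the system C toolchain out of the build
        GOOS=goos,  # explicit target instead of the host default
        GOARCH=goarch,
        GOFLAGS="-mod=readonly",  # the default since Go 1.16, stated explicitly
    )
    # -trimpath strips local filesystem paths (checkout, GOPATH, module cache).
    argv = ["go", "build", "-trimpath"]
    if tags:
        # Sorted so the same tag set always produces the same command line.
        argv += ["-tags", ",".join(sorted(tags))]
    argv += ["-o", output, pkg]
    return argv, env


argv, env = go_build_invocation(
    "./cmd/server", "linux", "amd64", "dist/server", tags=("osusergo", "netgo")
)
print(argv)
# ['go', 'build', '-trimpath', '-tags', 'netgo,osusergo', '-o', 'dist/server', './cmd/server']
# subprocess.run(argv, env=env, check=True) would run it (requires a Go toolchain).
```

Starting from an allowlist rather than from the full caller environment is a design choice. It makes every build input visible in one place, at the cost of having to add variables a particular setup genuinely needs, such as proxy credentials.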
JavaScript/Node.js
Node.js applications are typically distributed as source bundles rather than compiled binaries, which sidesteps many reproducibility issues. However:
Webpack/esbuild/Rollup output may include timestamps, chunk hashes based on content plus build metadata, or randomized module IDs. Configure your bundler for deterministic output.
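node_modules layout. npm, yarn, and pnpm each use their own lock file format, and different versions of the same package manager can lay out node_modules differently even from an unchanged lock file. Pin your package manager version (e.g., via Corepack and the packageManager field in package.json). Install with npm ci or pnpm install --frozen-lockfile, which fail instead of silently updating the lock file.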
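A pinned lock file only helps if installs actually check it. For each package, npm records a Subresource Integrity string in the integrity field of package-lock.json (typically sha512- followed by a base64 digest), and npm ci verifies downloads against it. The Python sketch below reproduces that comparison for illustration only. It assumes you already have the tarball bytes and the integrity value; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def sri_sha512(data: bytes) -> str:
    """Return an SRI string in the form package-lock.json uses: 'sha512-<base64>'."""
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode("ascii")


def matches_integrity(tarball: bytes, integrity: str) -> bool:
    """Check downloaded package bytes against a lock file 'integrity' value."""
    # An SRI value may list several space-separated hashes; accept any sha512 match.
    candidates = [h for h in integrity.split() if h.startswith("sha512-")]
    actual = sri_sha512(tarball)
    return any(hmac.compare_digest(actual, h) for h in candidates)
```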
Container Images
Container images are notoriously non-reproducible. Building the same Dockerfile at two different times almost always produces different images because:
- apt-get update fetches different package lists
- apt-get install installs different package versions (unless pinned)
- Layer timestamps differ
- Base image tags (like ubuntu:22.04) point to different digests over time
Mitigations:
- Pin base images by digest, not tag: FROM ubuntu@sha256:abc123...
- Pin all OS package versions: apt-get install libssl3=3.0.2-0ubuntu1.10
- Use docker build --build-arg SOURCE_DATE_EPOCH=0 and tools like crane to normalize timestamps
- Consider Nix-based image builders (such as Nix's dockerTools) or Bazel's rules_oci for fully reproducible container images
- Chainguard images provide reproducible, minimal base images with pinned package versions
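The first two mitigations can be enforced in CI. The Python sketch below is a deliberately simple, hypothetical Dockerfile linter. It flags base images without a sha256 digest and apt-get install arguments without an =version pin. Because it relies on regular expressions, it flags images supplied through build arguments (FROM ${BASE}) and may misread unusual shell syntax.

```python
import re

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")
APT_INSTALL_RE = re.compile(r"\bapt-get\s+install\b(.*)")


def lint_dockerfile(text: str) -> list:
    """Return findings for base images and apt packages that are not pinned."""
    findings = []
    stages = set()  # names from "FROM ... AS name"; these are not registry images
    # Join backslash continuations so multi-line RUN instructions are seen whole.
    for line in re.sub(r"\\\n", " ", text).splitlines():
        m = FROM_RE.match(line)
        if m:
            image, alias = m.group(1), m.group(2)
            if (image.lower() not in stages and image != "scratch"
                    and not DIGEST_RE.search(image)):
                findings.append(f"base image not pinned by digest: {image}")
            if alias:
                stages.add(alias.lower())
            continue
        m = APT_INSTALL_RE.search(line)
        if m:
            # Only inspect arguments up to the next shell separator.
            args = re.split(r"&&|;|\|", m.group(1))[0].split()
            for arg in args:
                if not arg.startswith("-") and "=" not in arg:
                    findings.append(f"apt package not version-pinned: {arg}")
    return findings
```

Digest pins alone do not fix apt-get update pulling newer package lists, so the package check matters as much as the image check.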
Verification Workflows
Reproducibility is only useful if someone actually verifies builds. There are several verification models:
Independent Rebuilding
A third party takes your published source code and build instructions, performs the build independently, and compares the output hash to your published artifact. If they match, the artifact is verified. The Reproducible Builds project (reproducible-builds.org) has driven this approach for Debian packages.
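An independent rebuild ends in a digest comparison. Here is a minimal Python sketch, assuming the vendor publishes a SHA-256 hex digest for the artifact and the rebuilt file is on local disk. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_file(path, chunk_size=1 << 20) -> str:
    """Stream a file through SHA-256 so large artifacts need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_rebuild(rebuilt_path, published_sha256: str) -> bool:
    """True if a locally rebuilt artifact matches the vendor's published digest."""
    return hmac.compare_digest(sha256_file(rebuilt_path), published_sha256.strip().lower())
```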
CI-Based Verification
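Run the build twice in your CI pipeline -- once to produce the release artifact, once to verify reproducibility. If both runs produce identical outputs, the build is deterministic in that environment. This does not rule out tampering, because a compromised build environment can corrupt both runs identically, but it establishes a baseline for third-party verification. Varying the environment between the two runs (build path, time, locale), as the reprotest tool from the Reproducible Builds project does, exposes more sources of non-determinism.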
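A CI reproducibility check can be sketched as a function that runs the same build into two fresh directories and compares the results. Here build is a hypothetical callable that writes its outputs into the directory it receives; in CI it would typically wrap a subprocess call. Each run gets a new temporary directory, so the absolute output path changes between runs, and a build that embeds its own path will be flagged.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root, keyed by relative path, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        h.update(rel.encode() + b"\0")  # NUL separates the path from the file hash
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def builds_are_reproducible(build, runs: int = 2) -> bool:
    """Run build(out_dir) in fresh temporary directories and compare outputs."""
    digests = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as tmp:
            out = Path(tmp)
            build(out)
            digests.append(tree_digest(out))
    return len(set(digests)) == 1
```

Hashing the whole output tree, rather than a single file, also catches builds whose main artifact is stable but whose side outputs are not.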
Attestation-Based Verification
Rather than reproducing the entire build, verify attestations about the build process. SLSA (Supply-chain Levels for Software Artifacts) defines a framework for build provenance attestations. At SLSA Build Level 3, a hardened build platform generates the provenance and signs it so that builds running on the platform cannot forge it. The provenance binds the artifact's digest to the source it was built from, the builder that produced it, and the build parameters. Tools like Sigstore and in-toto generate and verify these attestations.
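This is a pragmatic middle ground. Full reproducibility is ideal but difficult to achieve universally. Attestations provide strong verification guarantees without requiring bit-for-bit identical builds, though they shift trust onto the build platform that signs them rather than removing the need for it.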
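Once an attestation's signature has been verified, the remaining question is whether it actually describes the artifact in hand. The Python sketch below assumes an in-toto Statement v1 carrying SLSA provenance v1 that has already been extracted from its signed envelope. It deliberately skips signature verification, which in practice is done by tools such as slsa-verifier or cosign verify-attestation. A real policy would also check the builder identity and source repository in the predicate. The function name is hypothetical.

```python
import hashlib
import json

STATEMENT_V1 = "https://in-toto.io/Statement/v1"
SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def provenance_covers(statement_json: str, artifact: bytes) -> bool:
    """True if an already signature-verified SLSA statement names this artifact.

    Does NOT verify signatures; a real verifier must do that first and should
    also check predicate fields such as the builder identity and source repo.
    """
    stmt = json.loads(statement_json)
    if stmt.get("_type") != STATEMENT_V1 or stmt.get("predicateType") != SLSA_PROVENANCE_V1:
        return False
    digest = hashlib.sha256(artifact).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in stmt.get("subject", [])
    )
```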
The SBOM Connection
Reproducible builds and SBOMs are complementary. An SBOM tells you what components are in a build artifact. A reproducible build proves that the artifact was produced from those specific components. Together, they create an auditable chain from source to deployment.
If your build is not reproducible, your SBOM is an assertion -- you claim these are the components, but you cannot prove it. If your build is reproducible, your SBOM is verifiable -- anyone can rebuild from the declared components and verify the output matches.
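A first step toward a verifiable SBOM is checking that the components it declares match what the build actually consumed. The Python sketch below assumes a CycloneDX JSON SBOM and a list of resolved (name, version) inputs. Those inputs might be parsed from a lock file, but that parsing step is hypothetical and omitted, and nested components are ignored.

```python
import json


def sbom_components(cyclonedx_json: str) -> set:
    """(name, version) pairs from the top-level components of a CycloneDX JSON SBOM."""
    bom = json.loads(cyclonedx_json)
    return {(c["name"], c.get("version", "")) for c in bom.get("components", [])}


def compare_sbom_to_inputs(cyclonedx_json: str, resolved_inputs) -> dict:
    """Report inputs the build used but the SBOM omits, and vice versa."""
    declared = sbom_components(cyclonedx_json)
    actual = set(resolved_inputs)
    return {
        "undeclared": sorted(actual - declared),  # used by the build, absent from SBOM
        "missing": sorted(declared - actual),  # claimed by the SBOM, unused by the build
    }
```

An empty result on both sides does not make the SBOM verifiable by itself. It only shows that the declaration and the build inputs agree, and a rebuild is still needed to tie those inputs to the shipped bytes.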
How Safeguard.sh Helps
Safeguard.sh integrates build reproducibility into your supply chain security posture. The platform tracks build provenance alongside SBOMs, verifying that declared components match actual build inputs. When SLSA attestations are available, Safeguard.sh validates them automatically and flags artifacts that lack provenance documentation. For organizations working toward reproducible builds, Safeguard.sh identifies the specific sources of non-determinism in your build pipeline by comparing SBOM-declared dependencies against actual artifact contents, giving you a concrete roadmap to full reproducibility.