Most supply chain security discussions focus on source code -- reviewing commits, scanning dependencies, auditing build scripts. But the software you actually run consists of compiled binaries, and there is no guarantee that a binary matches the source code it claims to come from.
The SolarWinds attackers understood this. They injected their backdoor during the build process, so the source code was clean while the distributed binary was compromised. Source code review would have found nothing. Only binary analysis -- or reproducible builds -- could have detected the discrepancy.
The Source-to-Binary Gap
When source code is compiled, several transformations take place:
- The compiler translates high-level code to machine instructions.
- The linker combines object files and libraries.
- Build scripts may inject version strings, timestamps, or configuration values.
- Optimizations may restructure the code significantly.
- Debug symbols may be stripped.
Each of these steps is an opportunity for tampering. A compromised compiler, a modified linker, a malicious build script -- any of these can inject malicious functionality into the binary without modifying the source.
Ken Thompson's famous 1984 paper "Reflections on Trusting Trust" demonstrated this concept: a compromised compiler could inject backdoors into the programs it compiled -- including a fresh copy of the compiler built from clean source. The compiler's source code would be clean, but the binary would perpetuate the backdoor.
Binary Analysis Techniques
Static Binary Analysis
Static analysis examines the binary without executing it. Techniques include:
Disassembly: Converting machine code back to assembly language using tools like Ghidra, IDA Pro, or Radare2. Disassembly reveals the actual instructions that will execute, regardless of what the source code says.
Control flow analysis: Mapping the program's execution paths. Unexpected branches, unreachable code that's still present, or unusual call patterns can indicate tampering.
String analysis: Extracting strings from the binary. Malicious implants often contain hardcoded URLs, IP addresses, encryption keys, or command strings.
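String extraction reduces to finding runs of printable bytes, then filtering for indicators of interest. A minimal sketch in Python -- the six-character minimum mirrors the default of the Unix `strings` tool, and the indicator patterns (URLs, dotted-quad IP addresses) are an illustrative subset, not a complete triage rule set:

```python
import re

# Printable ASCII runs of at least 6 bytes, similar to the Unix `strings` default.
STRING_RE = re.compile(rb"[\x20-\x7e]{6,}")

# Indicators that often matter in supply chain triage (illustrative, not exhaustive):
# URLs and dotted-quad IP addresses.
SUSPICIOUS = re.compile(rb"https?://|\b\d{1,3}(?:\.\d{1,3}){3}\b")

def extract_strings(data: bytes) -> list[str]:
    """Return all printable-ASCII runs of 6+ characters found in the binary."""
    return [m.group().decode("ascii") for m in STRING_RE.finditer(data)]

def flag_indicators(data: bytes) -> list[str]:
    """Return only the extracted strings that contain a suspicious indicator."""
    return [s for s in extract_strings(data) if SUSPICIOUS.search(s.encode())]
```

In practice you would layer many more patterns on top (known C2 domains, registry keys, shell command fragments), but the extract-then-filter structure stays the same.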
Import/export analysis: Examining what system APIs and libraries the binary uses. A build tool that imports networking APIs when it shouldn't need network access is suspicious.
Entropy analysis: Measuring the randomness of binary sections. Encrypted or compressed payloads (common in malware) show high entropy compared to normal compiled code.
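Shannon entropy over fixed-size chunks is enough to surface such payloads. A minimal sketch -- the 4 KiB chunk size and the 7.5 bits/byte threshold are illustrative choices; typical compiled code sits well below 7, while encrypted or compressed data approaches the 8.0 maximum:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())

def scan_chunks(data: bytes, chunk_size: int = 4096, threshold: float = 7.5):
    """Yield (offset, entropy) for each chunk whose entropy exceeds the threshold."""
    for offset in range(0, len(data), chunk_size):
        e = shannon_entropy(data[offset:offset + chunk_size])
        if e > threshold:
            yield offset, e
```

High-entropy hits in sections that should contain plain code or data are a prompt for closer inspection, not proof of malice -- legitimate binaries embed compressed resources too.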
Signature matching: Comparing against known malware signatures or known-good patterns. YARA rules are commonly used for this purpose.
Dynamic Binary Analysis
Dynamic analysis executes the binary in a controlled environment and observes its behavior:
Sandboxed execution: Running the binary in a sandbox and monitoring system calls, file access, network connections, and process creation.
API monitoring: Hooking system APIs to log every call the binary makes. This reveals actual behavior that static analysis might miss, especially when the binary uses obfuscation.
Memory analysis: Examining the binary's memory at runtime. Self-modifying code, unpacked payloads, and runtime-decrypted strings become visible.
Network monitoring: Capturing all network traffic generated by the binary. Unexpected connections to external servers indicate potential backdoors or data exfiltration.
Differential testing: Running the binary with various inputs and comparing behavior against expected results.
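The differential-testing idea can be sketched as a small harness: feed identical inputs to two binaries (say, a vendor build and your own rebuild) and report any divergence in output or exit status. The command lines and timeout here are placeholders:

```python
import subprocess
import sys

def run(cmd: list, input_text: str) -> tuple:
    """Run a command with the given stdin; return (stdout, returncode)."""
    proc = subprocess.run(cmd, input=input_text, capture_output=True,
                          text=True, timeout=30)
    return proc.stdout, proc.returncode

def differential_test(cmd_a: list, cmd_b: list, inputs: list) -> list:
    """Feed identical inputs to two binaries and collect behavioral divergences."""
    mismatches = []
    for text in inputs:
        result_a = run(cmd_a, text)
        result_b = run(cmd_b, text)
        if result_a != result_b:
            mismatches.append((text, result_a, result_b))
    return mismatches
```

A real harness would also compare files written, network activity, and timing, but even stdout/exit-code comparison across a broad input corpus can expose behavior that exists in only one of the two builds.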
Reproducible Builds
The gold standard for binary verification is reproducible builds -- ensuring that compiling the same source code with the same toolchain produces a bit-for-bit identical binary.
If a binary is reproducibly built, anyone can verify it by:
- Obtaining the source code.
- Using the documented build environment and toolchain.
- Building the binary.
- Comparing the output hash against the distributed binary.
A mismatch indicates tampering somewhere in the chain -- compromised source, compromised toolchain, or compromised distribution.
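The final comparison step is deliberately simple. A sketch in Python -- `hmac.compare_digest` is used for a constant-time string comparison, and the paths and digest are placeholders:

```python
import hashlib
import hmac

def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_reproducible(built_path: str, published_sha256: str) -> bool:
    """Compare a locally rebuilt artifact against the published release digest."""
    return hmac.compare_digest(sha256_file(built_path), published_sha256.lower())
```

The hard part of reproducible builds is everything before this step: pinning the toolchain, normalizing timestamps and paths, and eliminating every other source of nondeterminism so that the hashes can match at all.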
Projects with reproducible builds include Debian, Tor Browser, Bitcoin Core, and an increasing number of others. The Reproducible Builds initiative (reproducible-builds.org) tracks progress across the ecosystem.
Differential Binary Analysis
When you have multiple versions of a binary, differential analysis identifies what changed. Tools like BinDiff and Diaphora compare two binaries and highlight differences.
For supply chain verification, this is useful for:
- Comparing a vendor's release against a self-built version.
- Analyzing what changed between two release versions.
- Detecting unexpected modifications in patched binaries.
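Tools like BinDiff and Diaphora diff at the function and control-flow level; a much cruder byte-level version still shows the shape of the technique. This sketch hashes fixed-size chunks of two binaries and reports the offsets that changed -- the 512-byte chunk size is arbitrary, and any insertion or deletion will shift everything after it, which is exactly the limitation that function-level diffing exists to solve:

```python
import hashlib

def chunk_digests(data: bytes, chunk_size: int = 512) -> list:
    """SHA-256 digest of each fixed-size chunk of the input."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]

def changed_regions(old: bytes, new: bytes, chunk_size: int = 512) -> list:
    """Return byte offsets of chunks that differ between two binaries."""
    a, b = chunk_digests(old, chunk_size), chunk_digests(new, chunk_size)
    offsets = [i * chunk_size for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):  # growth or truncation at the end also counts as a change
        offsets.extend(range(min(len(a), len(b)) * chunk_size,
                             max(len(a), len(b)) * chunk_size, chunk_size))
    return offsets
```

Even this crude pass answers a useful question: did a "security-only" patch touch regions of the binary it had no reason to touch?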
Practical Application to Supply Chain
Vendor Binary Verification
When you receive a binary from a vendor:
- Check signatures: Verify code signing signatures against the vendor's known public key.
- Compare against previous versions: Use differential analysis to see what changed.
- Scan for known malware: Run against antivirus and YARA rules.
- Analyze imports: Check if the binary uses APIs inconsistent with its stated purpose.
- Run in a sandbox: Execute in a controlled environment and monitor behavior.
- Verify reproducibility: If the vendor supports reproducible builds, rebuild and compare.
Container Image Analysis
Container images are binaries too. Analyzing container images involves:
- Examining each layer for unexpected files or modifications.
- Checking that base images match their expected hashes.
- Scanning all binaries within the image.
- Verifying that the image was built from the expected Dockerfile.
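The layer-hash check in particular is mechanical: an OCI image manifest lists each layer with a content digest, so verification is hashing the blob and comparing. A sketch, assuming the manifest JSON and the layer blobs are already fetched (how you obtain them depends on your registry tooling):

```python
import hashlib
import json

def verify_layers(manifest_json: str, layer_blobs: dict) -> list:
    """Check each layer digest in an OCI image manifest against actual blob bytes.

    `layer_blobs` maps digest strings (e.g. "sha256:<hex>") to layer contents.
    Returns the digests that are missing or fail verification.
    """
    manifest = json.loads(manifest_json)
    failures = []
    for layer in manifest.get("layers", []):
        digest = layer["digest"]  # format: "<algorithm>:<hex>"
        algo, _, expected = digest.partition(":")
        blob = layer_blobs.get(digest)
        if blob is None or hashlib.new(algo, blob).hexdigest() != expected:
            failures.append(digest)
    return failures
```

Because layers are content-addressed, a tampered layer cannot keep its digest -- the risk lives in whatever trusts an unverified digest in the first place, which is why the manifest itself must be signature-checked.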
Firmware Analysis
Firmware is often distributed as binary blobs. Binary analysis is essential for:
- Extracting and analyzing the firmware file system.
- Identifying embedded credentials or backdoors.
- Verifying that firmware updates match expected contents.
- Detecting unauthorized modifications.
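The first step -- finding what's embedded in an opaque blob -- usually starts with magic-byte scanning, which is what tools like binwalk automate. A minimal sketch with a deliberately tiny signature table (real scanners know hundreds of formats and validate header structure, not just the magic):

```python
# Magic bytes for formats commonly embedded in firmware images (illustrative subset).
MAGICS = {
    b"hsqs": "SquashFS filesystem (little-endian)",
    b"\x1f\x8b": "gzip stream",
    b"\x7fELF": "ELF executable",
}

def scan_magics(blob: bytes) -> list:
    """Return sorted (offset, description) hits for each known magic sequence."""
    hits = []
    for magic, desc in MAGICS.items():
        start = 0
        while (idx := blob.find(magic, start)) != -1:
            hits.append((idx, desc))
            start = idx + 1
    return sorted(hits)
```

Once an embedded filesystem is located and carved out, the earlier static techniques -- string analysis, import analysis, entropy scanning -- apply to every binary inside it.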
Challenges
Scale
Organizations consume thousands of binaries. Manual analysis of each one isn't feasible. Automated analysis tools help but require tuning to reduce false positives.
Obfuscation
Legitimate software may use obfuscation for intellectual property protection, making it harder to distinguish between intentional obfuscation and malicious hiding.
Toolchain Variability
Different compiler versions, optimization levels, and build environments produce different binaries from the same source. This complicates reproducible build verification.
Resource Requirements
Deep binary analysis requires specialized skills and expensive tools. Not every organization has a reverse engineering team.
Building a Binary Verification Program
Start with these steps:
- Inventory binaries: Know what compiled software you're running and where it comes from.
- Verify signatures: Check code signing on everything. Automate this.
- Implement reproducible builds: For software you build, make builds reproducible. For vendor software, request reproducible builds.
- Automated scanning: Use automated tools (Ghidra scripting, YARA rules, sandbox analysis) for initial triage.
- Prioritize analysis: Focus manual analysis on high-risk binaries -- those with broad access, those from less-trusted sources, and those that changed unexpectedly.
- Monitor behavior: Even after initial analysis, monitor binary behavior in production for anomalies.
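The inventory step is the least glamorous and the most foundational. A minimal sketch that walks a directory tree and records a SHA-256 digest for every executable file -- a real inventory would also capture signatures, versions, and provenance metadata, and would feed a database rather than return a dict:

```python
import hashlib
import os

def inventory_binaries(root: str) -> dict:
    """Map each executable file under `root` to its SHA-256 hex digest."""
    inventory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                with open(path, "rb") as f:
                    inventory[path] = hashlib.sha256(f.read()).hexdigest()
    return inventory
```

Rerunning the walk and diffing the digests against the previous snapshot is the simplest possible "changed unexpectedly" detector, and a natural input to the prioritization step above.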
How Safeguard.sh Helps
Safeguard.sh bridges the source-to-binary gap by providing comprehensive SBOM analysis that tracks components at both the source and artifact level. When binaries are produced by your build pipeline, Safeguard.sh catalogs their components and verifies them against expected manifests, catching discrepancies that indicate tampering. The platform's continuous vulnerability scanning covers compiled dependencies that might not be visible in source-level analysis, while policy gates ensure that only verified artifacts with complete provenance records reach production. By maintaining a clear chain from source through build to deployment, Safeguard.sh provides the verification framework that binary analysis alone can't sustain at scale.