
Build Server Compromise Investigation

A hands-on investigation guide for compromised build servers, from initial containment through rootkit checks and clean rebuild.

Nayan Dey
Senior Security Engineer
7 min read

A compromised build server is a nightmare with a long tail. Unlike a workstation, a build server holds signing keys, production deploy credentials, and the authority to ship code that your customers will trust by default. By the time you suspect a build server is compromised, the attacker has probably already used it to produce tainted artifacts. This post walks through the investigation sequence I run on a suspected build server compromise, with the practical commands that survive the stress of a 3 AM page.

First Move: Isolate Without Destroying Evidence

The instinct is to shut the server down. Do not. A running compromised server with the network cut is an investigator's gift. A shut-down server is a pile of questions you can no longer answer with memory forensics.

Isolate with a network policy that blocks all egress except to your IR jumpbox. On AWS, this is a single security group change:

aws ec2 modify-instance-attribute --instance-id i-0abc123 \
  --groups sg-ir-isolation

The sg-ir-isolation group allows inbound SSH only from your bastion and no egress except to your evidence bucket and your internal DNS resolver. From the server, verify with ss -tan that established connections have dropped. If the attacker has a persistent backdoor that reconnects, the isolation policy will show repeated outbound connection attempts; those logs are evidence in themselves.
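
A quick way to confirm from the isolated box:

ss -tan state established             # should show little beyond your own IR session
watch -n 5 'ss -tan state syn-sent'   # repeated SYN-SENT attempts suggest a backdoor trying to phone home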

Post an incident banner on the server's MOTD and job queue so that anyone logging in knows the machine is in IR. If it is a shared build server, pause the scheduler so no new jobs land on it.
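
One way to do both on a Jenkins-based build server; the jenkins-cli.jar location, credentials, and incident ID are placeholders:

echo "INCIDENT IR-2024-NNN: host under incident response; do not log in or schedule jobs" | sudo tee -a /etc/motd
java -jar jenkins-cli.jar -s http://localhost:8080/ -auth admin:API_TOKEN quiet-down   # stop new builds from starting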

Baseline Capture

Before you start poking, capture the baseline. This is the state that every subsequent action will be measured against. The capture has to cover five dimensions: running processes, open sockets, loaded kernel modules, filesystem snapshot, and configured cron/systemd units.

ps auxf > /ir/processes.txt
ss -tanp > /ir/sockets.txt
lsmod > /ir/modules.txt
systemctl list-units --all > /ir/units.txt
crontab -l -u root > /ir/root-cron.txt 2>/dev/null || true
for u in $(cut -d: -f1 /etc/passwd); do
  crontab -l -u $u > /ir/cron-$u.txt 2>/dev/null
done
find / -newer /etc/passwd -type f 2>/dev/null > /ir/newer-than-passwd.txt

The find query catches files modified more recently than /etc/passwd, a rough proxy for when the system was provisioned. On a well-managed build server that should be a small set: CI working directories and log files, mostly. Anything else needs explanation.
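
Hash each capture as you produce it so the chain of custody holds up later:

sha256sum /ir/*.txt > /ir/baseline.sha256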

Memory and Disk Imaging

Capture memory next. On Linux, LiME works if you have kernel headers available; AVML is a good fallback that needs no kernel module:

sudo ./avml /ir/memory.lime
sha256sum /ir/memory.lime > /ir/memory.lime.sha256

For cloud-hosted build servers, the EBS snapshot is your disk image:

aws ec2 create-snapshot --volume-id vol-0abc123 \
  --description "IR-build-prod-3 $(date -u +%FT%T)"

Tag the snapshot with the incident ID so it does not get garbage collected by a cost cleanup script. I have lost evidence to cost cleanup more than once; tag everything.
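
Tagging is one more command; the snapshot ID and tag keys here are examples, so match whatever your cleanup scripts actually filter on:

aws ec2 create-tags --resources snap-0abc123 \
  --tags Key=incident,Value=IR-2024-NNN Key=retention,Value=legal-hold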

Looking for Persistence

On a compromised build server, the attacker wants to maintain access even if the build job that planted the payload completes. Persistence mechanisms hide in a predictable set of places. Walk each one.

Systemd units. Look for recently created .service files, especially ones with ExecStart pointing at binaries in unusual paths:

find /etc/systemd /usr/lib/systemd -name '*.service' -newer /etc/os-release | \
  xargs -I{} sh -c 'echo "=== {} ==="; cat {}'

Cron and timers. Already captured in the baseline. Compare against a known-good build server if you have one.
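
If you have that known-good peer, a straight diff of the captures is enough (the peer's capture path is an assumption):

diff -u /ir-known-good/root-cron.txt /ir/root-cron.txt
diff -u /ir-known-good/units.txt /ir/units.txt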

SSH authorized_keys. Any key on a build server that is not part of your managed inventory is suspicious. Enumerate every user with a home directory and check:

for h in $(getent passwd | awk -F: '$3 >= 1000 {print $6}') /root; do
  if [ -f "$h/.ssh/authorized_keys" ]; then
    echo "=== $h/.ssh/authorized_keys ==="
    cat "$h/.ssh/authorized_keys"
  fi
done

Kernel modules. A well-written kernel-level rootkit will not show in lsmod, but the combined evidence from dmesg, /proc/modules, and /sys/module often still reveals the tampering. Compare against the initramfs contents in /boot for the current kernel.
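
Two cheap checks worth recording, even though a capable rootkit can defeat both:

cat /proc/sys/kernel/tainted   # nonzero taint bits mean out-of-tree or unsigned modules have loaded
# entries in /sys/module that lsmod does not list; built-in modules show up here too,
# so compare against a known-good server rather than treating every hit as a rootkit
comm -13 <(lsmod | awk 'NR>1 {print $1}' | sort) <(ls /sys/module | sort)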

Container runtime. If the build server runs containers, the attacker may have planted a malicious image or modified the runtime config. Check /etc/docker/daemon.json, /etc/containers/, and crictl images for anything that was not deployed by your config management.
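
A sketch for a Docker host; the same idea applies to containerd via crictl:

cat /etc/docker/daemon.json   # unexpected registries, proxies, or insecure-registries entries need explanation
docker images --digests --format '{{.Repository}}:{{.Tag}} {{.Digest}}' > /ir/images.txt
# compare /ir/images.txt against the image list your config management deploys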

Build Job Forensics

The most useful evidence on a build server is the history of build jobs. Jenkins, GitLab Runner, GitHub Actions runners, Buildkite — all of them leave job logs and workspaces on disk. Walk the logs for the last 30 days and look for anomalies.

What does anomalous look like? A job that took 10x longer than baseline. A job whose workspace contains files outside the expected build output. A job whose environment variables include secrets that should not be in that job. A job that was triggered by an unusual actor or at an unusual time.

# Jenkins example
find /var/lib/jenkins/jobs -name build.xml -newer /etc/os-release | \
  xargs grep -l -E '(curl|wget|nc|bash -i)' | head

For each suspicious job, pull the full log, the workspace archive, and the build number. These go into evidence and become part of the timeline you will reconstruct.
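
A minimal way to preserve one suspect Jenkins build; the job name, build number, and default Jenkins paths are assumptions, so adjust for your layout:

JOB=myapp-release   # hypothetical suspect job
BUILD=142           # hypothetical build number
tar czf /ir/jenkins-$JOB-$BUILD.tgz \
  /var/lib/jenkins/jobs/$JOB/builds/$BUILD \
  /var/lib/jenkins/workspace/$JOB
sha256sum /ir/jenkins-$JOB-$BUILD.tgz > /ir/jenkins-$JOB-$BUILD.tgz.sha256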

Signing Key Investigation

Build servers often hold or can access code signing keys. This is the highest-stakes part of the investigation. Questions to answer with evidence:

Was the signing key on disk? If yes, compute its hash and compare to what you have in your secure enclave. Any difference is bad.
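
For example, assuming the key sits at a path like /etc/signing/release.key (yours will differ) and a reference hash was recorded at provisioning time:

sha256sum /etc/signing/release.key   # compare against the reference hash; any difference means tampering
stat /etc/signing/release.key        # access and modify times are worth recording too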

Was the signing key accessed during the suspect window? If you had auditd watching the key file, audit.log tells you every process that opened it. If you did not, the build job logs that invoked the signing step are the next-best source.
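
If the watch was in place, something like this pulls every access in the suspect window; the key path and dates are placeholders:

ausearch -f /etc/signing/release.key -ts 06/01/2024 00:00:00 -te 06/03/2024 00:00:00 -i
auditctl -w /etc/signing/release.key -p rwa -k signing-key   # if the watch was missing, add it now for the rest of the investigation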

Were any artifacts signed that should not have been? Pull every signed artifact from the suspect window and match its signature timestamp to an authorized build job. Any signature without a matching authorized job is a candidate malicious signing event.

Rebuild Plan

A compromised build server cannot be cleaned; it has to be rebuilt. The rebuild plan includes: provisioning from a fresh golden image, re-enrolling in config management with a new machine identity, re-issuing all service account credentials, and re-verifying every artifact produced on the old server. The old server stays isolated as evidence until the investigation closes.

The hardest part of the rebuild is the build server's trust relationships. Every system that trusted the old server (registries, deploy targets, signing infrastructure) has to be updated to trust the new one and stop trusting the old one. Work that list before the cutover.

How Safeguard Helps

Safeguard watches build servers as first-class assets, correlating the artifacts they produce with the source commits and dependencies that went in. When a build server shows signs of compromise — unexpected artifact hashes, unauthorized signing events, or policy drift — Safeguard flags it and automatically enumerates every downstream artifact that was built in the suspect window. During investigation, the platform's immutable build provenance store gives you a verifiable record of what each build produced versus what was actually signed and shipped, which turns the signing key investigation from a forensic exercise into a query. For the rebuild phase, Safeguard's attestation tracking ensures the new server's trust relationships are established cleanly before the old one is decommissioned.
