When a package goes bad, the registry logs are the single best source of truth about what happened and when. The problem is that most teams have never pulled registry logs before the day they urgently need them, and the learning curve under incident pressure is brutal. This post is the cheat sheet I give new IR analysts on my team so that the first time they look at a packument history is not the morning of an active incident.
What a Registry Actually Logs
Every package registry records at least three classes of events: publish, yank (or unpublish), and download, which is the closest thing a registry sees to an install. The richness varies wildly. Public npm gives you publish metadata on the packument but no authoritative per-request download logs unless you work at GitHub. Private Artifactory gives you everything, including client IPs and request headers. PyPI gives you publish events via the JSON API and aggregated download stats via BigQuery. Maven Central gives you artifact upload times but not download logs.
The first step of any forensic analysis is to inventory what you actually have. I keep a small matrix per engagement:
- Registry: npm public
- Publish log: packument time map (authoritative)
- Yank log: packument deprecated field (approximate)
- Install log: not available publicly
- Account log: /-/user/org.couchdb.user:$user (limited)
That matrix tells me what questions I can answer with evidence versus what questions I can only infer.
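When the matrix points at a live endpoint, capture it immediately, even if it looks thin. For the npm account endpoint above, that is a single curl (the username is a placeholder):
curl -s https://registry.npmjs.org/-/user/org.couchdb.user:suspect-maintainer \
  > /evidence/maintainer-account.json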
Extracting Publish Metadata
The packument is the registry's canonical record of a package. For npm it lives at https://registry.npmjs.org/<package> and contains a time object mapping versions to ISO timestamps. During an incident I pull the full packument and diff it against a historical snapshot if I have one:
# snapshot the live packument, then isolate the version -> timestamp map
curl -sH 'Accept: application/json' \
  https://registry.npmjs.org/suspect-pkg > /evidence/pkg-now.json
jq '.time' /evidence/pkg-now.json > /evidence/time-now.json
# diff against a snapshot taken before the incident window, if you have one
diff <(jq -S . /evidence/time-last-week.json) \
     <(jq -S . /evidence/time-now.json)
If you do not have historical packuments, the Wayback Machine and the libraries.io historical API sometimes have them. Do not rely on the current packument alone: it reflects only the registry's present state, and unpublish and republish activity can leave it telling a different story from the one installers saw during the incident window.
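Packument URLs get archived often enough to be worth a quick check. The Wayback Machine's availability endpoint tells you whether a snapshot exists near a date of interest (the package name is a placeholder):
# ask for the archived snapshot closest to the start of the incident window
curl -s 'https://archive.org/wayback/available?url=registry.npmjs.org/suspect-pkg&timestamp=20240701' | \
  jq '.archived_snapshots.closest | {url, timestamp}'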
For PyPI, the JSON endpoint at /pypi/<project>/json includes upload_time_iso_8601 for each file. Pull every release and record those timestamps; uploader identity is not reliably exposed through the public API (the legacy XML-RPC interface is deprecated), so if you need to know who pushed a file, ask PyPI's admins early.
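A minimal pull looks like this (the project name is a placeholder, and the releases map has been slated for deprecation in the JSON API, so verify it still appears in your response):
# one line per file: version, filename, upload timestamp
curl -s https://pypi.org/pypi/suspect-pkg/json > /evidence/pypi-now.json
jq -r '.releases | to_entries[] | .key as $v | .value[] |
  [$v, .filename, .upload_time_iso_8601] | @tsv' /evidence/pypi-now.json | sort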
Account Activity
Publish events without account context are half a story. The question you are trying to answer is not "when was this version published" but "who published it and from where." Public registries expose very little here by default. If you are the maintainer or the registry operator, you have more.
For packages you maintain on npm, the npm owner ls <package> command plus npm profile get will tell you who currently has publish rights. For a real account compromise investigation, you will need npm support to pull account login history, IP addresses, and token usage. Start that request early, because it can take days.
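While you wait on support, the two commands above are worth capturing into evidence. Both assume an npm login with access to the package:
# record current publish rights and the state of the account you control
npm owner ls suspect-pkg | tee /evidence/owners-now.txt
npm profile get | tee /evidence/profile-now.txt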
For private registries, this data is usually sitting in your SSO logs. I pull the Okta System Log for the publisher account and correlate it to the publish timestamp (the org URL and API token below are placeholders for your tenant):
curl -sG -H "Authorization: SSWS ${OKTA_API_TOKEN}" \
  'https://yourorg.okta.com/api/v1/logs' \
  --data-urlencode 'filter=actor.alternateId eq "maintainer@example.com"' \
  --data-urlencode 'since=2024-07-20T12:00:00Z' | \
  jq '.[] | {time: .published, event: .eventType, ip: .client.ipAddress}'
A publish event at 14:22 UTC from a Lagos IP when your maintainer lives in Berlin is the kind of smoking gun that ends investigations quickly.
Download Patterns
Download logs tell you two things: how widely the compromise spread, and sometimes who else cared about this package in a suspicious way. For npm, the public download stats API is rate-limited but good enough for scope:
# monthly totals for the suspect window (range syntax is start:end)
curl -s 'https://api.npmjs.org/downloads/range/2024-06-01:2024-08-31/suspect-pkg' | \
  jq '.downloads | group_by(.day[0:7]) | map({month: .[0].day[0:7], total: map(.downloads) | add})'
For private registries, the ingress logs are your friend. ALB access logs, CloudFront logs, or the registry's own request logs contain per-request detail. Ship them to a SIEM and query with something like:
index=artifactory sourcetype=access path="*suspect-pkg*" \
| stats count by src_ip, user_agent, requested_version
| sort - count
Watch for two anomalies in particular. First, a sudden burst of fetches from an IP or ASN you do not recognize — that is often the attacker validating their malicious publish. Second, fetches that predate the public announcement of the compromise — those are often insiders testing their own malware.
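If you can export those request logs as NDJSON, a quick pass for the second anomaly looks like this (the file path, field names, and advisory timestamp are all placeholders for your environment):
# flag fetches of the suspect package that predate the public advisory
jq -r 'select(.path | test("suspect-pkg"))
  | select(.time < "2024-07-22T09:00:00Z")
  | [.time, .src_ip, .path] | @tsv' /evidence/registry-requests.ndjson | sort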
Yank and Republish Patterns
A clever attacker will sometimes yank the malicious version and, on registries that allow a version string to be reused after an unpublish, republish a clean artifact under the same version, hoping your caches have already been poisoned and your detections key off version numbers. The packument's deprecated field and the time.modified entry are the forensic hooks here. Also compare artifact hashes across every snapshot you have: if two snapshots report the same version with different tarball hashes, you have a republish event that is worth escalating.
# hash inside each snapshot directory so filenames, not full paths, line up in the diff
( cd /evidence/snapshot-day1 && shasum -a 256 *.tgz ) | sort -k2 > /tmp/day1.hash
( cd /evidence/snapshot-day2 && shasum -a 256 *.tgz ) | sort -k2 > /tmp/day2.hash
diff /tmp/day1.hash /tmp/day2.hash
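You can run the same comparison against the registry's own view without downloading tarballs, using the packument snapshots from earlier (the last-week file is whatever historical snapshot you recovered):
# compare registry-reported integrity values per version across snapshots
# older versions may expose only .dist.shasum; adjust the jq path accordingly
diff <(jq -S '.versions | map_values(.dist.integrity)' /evidence/pkg-last-week.json) \
     <(jq -S '.versions | map_values(.dist.integrity)' /evidence/pkg-now.json)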
Writing It Up
Registry log analysis produces a lot of data. Most of it will not make the final report. My rule is that every claim in the summary must be backed by a specific log line that I can copy verbatim into the appendix. If I cannot produce the log line, I cannot make the claim. That discipline keeps me honest during long engagements when memory starts to fog.
When writing the report, lead with the publish timeline, then the account activity, then the distribution scope. That is the order executives and regulators want to read it. Engineers will flip to the queries in the appendix so they can rerun them against their own environments — give them copy-pastable commands, not screenshots.
How Safeguard Helps
Safeguard continuously ingests registry metadata for every ecosystem your organization consumes, so the packument snapshots you wish you had taken last week are already in the platform's evidence store. During an investigation, Safeguard surfaces publish anomalies — unusual maintainer IPs, rapid version churn, hash drift between snapshots — without you having to write the queries by hand. The platform correlates registry publish events against your build ingestion logs, so the gap between "published" and "pulled into your pipeline" is always a visible number instead of a forensic exercise. For teams without a dedicated threat hunting practice, that correlation alone turns a two-day investigation into a two-hour one.