You get the advisory at 8:47 AM. A package in your stack — let's call it popular-parser@4.2.0-4.2.7 — shipped a post-install script that reads AWS credentials, GitHub tokens, and .env files, and exfiltrates them to an attacker-controlled endpoint. Every release in that version window is malicious. The legitimate maintainer regained control 40 minutes ago. You have 72 hours before the blast radius becomes a breach you have to notify customers about.
This is the playbook. Concrete steps, concrete timelines, concrete commands. It's written for the actual pattern we've seen across the ua-parser-js, event-stream, xz-utils, tj-actions/changed-files, and lottie-player incidents, not for a generic framework.
How do you confirm exposure in the first 30 minutes?
Inventory first, panic second. You need to answer one question: does this package, in this version range, exist anywhere in my organization — production, staging, developer laptops, CI caches, container registries, Nexus/Artifactory mirrors, customer-facing SDKs? An SBOM inventory makes this a query. Without one, you are grepping package-lock.json files across repositories and hoping nothing transitive slipped through.
If you have CycloneDX or SPDX SBOMs indexed, this is sbom-query --package popular-parser --version "4.2.0 - 4.2.7" and you get a list of every service, image, and build artifact within five minutes. If you don't, the fastest alternative is npm ls popular-parser across every repository and comparing the resolved versions — but this misses anything in containers you didn't build, vendor SDKs, and CI caches. Budget 20-30 minutes for the inventory step if it's automated; 4-6 hours if you're doing it by hand under pressure, which is exactly the wrong time to discover you needed an SBOM program.
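If you're stuck with the manual sweep, a minimal sketch looks like this. It assumes npm lockfile v2/v3 layout, jq installed, and a single source root; every path here is illustrative:

```bash
# Scan every package-lock.json under a source root for resolved
# popular-parser versions inside the malicious range (4.2.0 - 4.2.7).
# Assumes lockfile v2/v3 ("packages" map); v1 lockfiles need a different query.
find ~/src -name package-lock.json -not -path '*/node_modules/*' -print0 |
while IFS= read -r -d '' lock; do
  jq -r --arg lock "$lock" '
    .packages // {} | to_entries[]
    | select(.key | endswith("node_modules/popular-parser"))
    | "\($lock): \(.value.version)"' "$lock"
done | grep -E ': 4\.2\.[0-7]$'
```

Note what this misses, per the paragraph above: container images you didn't build, vendor SDKs, and CI caches all fall outside the lockfile sweep.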
The deliverable out of this step is an exposure list: every service that has the vulnerable version in its dependency closure, plus every build environment that resolved it during the compromise window. Note the window carefully — if the malicious versions were live between 2026-02-22T14:03:00Z and 2026-02-24T08:07:00Z, any build inside that window is suspect even if the lockfile now shows a clean version.
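One way to enumerate suspect builds, assuming GitHub Actions and the gh CLI; the org name is a placeholder, and the window is the one from the hypothetical advisory (the Actions API accepts a created date-time range):

```bash
# List every CI run that executed inside the compromise window, per repo.
# Any run in this list resolved dependencies while the bad versions were live.
WINDOW="2026-02-22T14:03:00Z..2026-02-24T08:07:00Z"
gh repo list your-org --limit 200 --json nameWithOwner --jq '.[].nameWithOwner' |
while read -r repo; do
  gh api "repos/$repo/actions/runs?created=$WINDOW" --paginate \
    --jq '.workflow_runs[] | "\(.repository.full_name) \(.id) \(.created_at)"'
done
```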
What does reachability tell you that a version check doesn't?
Reachability answers: "is the malicious code actually executed?" A version match says the vulnerable package is installed. Reachability says the call graph from your entry points reaches a function in the compromised version. For a post-install script compromise, reachability is less relevant — the malicious code ran at npm install time, not at application runtime. Assume every build environment that resolved a malicious version is compromised.
For a runtime-loaded compromise — the compromised function is called during request handling, not install — reachability is the difference between "rebuild everything" and "rebuild the three services that actually touch the affected code path." In the event-stream incident (2018), the malicious payload targeted a specific Bitcoin-wallet library and was a no-op for everyone else; reachability would have cut the remediation scope by more than 90%. For xz-utils (2024), the payload activated only in sshd contexts, making reachability the central triage question — a server without sshd linkage wasn't exploitable even with the malicious xz version installed.
Spend 30-60 minutes on reachability before declaring full rebuild scope. If you can't answer it quickly, default to the conservative "every affected service" scope and accept the overhead.
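If you have no call-graph tooling, a crude first pass still narrows scope in minutes. A sketch, assuming a layout with one service per top-level directory under services/ (path and regex are illustrative):

```bash
# Crude first-pass reachability: which services even import the package?
# This answers "who imports it", not "who calls the compromised function" --
# real call-graph tooling does that, but this shrinks the candidate list fast.
grep -rlE "require\(['\"]popular-parser|from ['\"]popular-parser" \
  --include='*.js' --include='*.ts' services/ |
  cut -d/ -f2 | sort -u
```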
How do you contain the blast radius in hour 2?
Three simultaneous actions. First, block the malicious versions at the registry proxy — Artifactory, Nexus, or the internal npm mirror — so no further builds can pull them. A blocklist entry of popular-parser@>=4.2.0 <=4.2.7 on the proxy prevents rebuilds from silently re-introducing the compromise while you're remediating. Second, quarantine every build artifact produced inside the compromise window — tag the container images, don't delete them yet (you need them for forensics), and stop any automated promotion from staging to production. Third, and this is the step that gets skipped under pressure, rotate every secret that was present in a build environment during the window.
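The quarantine step might look like the sketch below, assuming crane from go-containerregistry and a hypothetical suspect-images.txt produced by the inventory step (one image reference per line):

```bash
# Retag suspect images registry-side so promotion automation stops moving
# them, but keep them intact -- forensics needs the original layers.
while read -r img; do
  # crane tag adds a tag in the registry without pulling layers locally
  crane tag "$img" "quarantine-$(date +%Y%m%d)"
done < suspect-images.txt
```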
That third step is where the post-install-script compromise pattern hurts. If your GitHub Actions runners had GITHUB_TOKEN, NPM_TOKEN, AWS_* credentials, or Docker registry credentials in the environment during any build between the compromise start and your block, assume all of them are exfiltrated. Rotate the GitHub fine-grained PATs, rotate the npm tokens (use trusted-publisher OIDC for the replacement), rotate AWS access keys, and — easy to forget — rotate any service-account credentials that the runner had permission to kubectl get secret against. Budget 2-4 hours for secret rotation; it is the slowest containment step because every rotation has a blast radius of its own.
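For the AWS piece, one rotation might look like this sketch; ci-builder is a placeholder IAM user, and inactivate-then-delete is the pattern that makes a missed consumer fail loudly instead of irreversibly:

```bash
# Rotate one CI access key. IAM allows two keys per user, so old and new
# can coexist while the new key propagates to the runner's secret store.
OLD_KEY=$(aws iam list-access-keys --user-name ci-builder \
  --query 'AccessKeyMetadata[0].AccessKeyId' --output text)
aws iam create-access-key --user-name ci-builder > /tmp/new-key.json
# ...deploy the new key to the runners, confirm builds authenticate...
aws iam update-access-key --user-name ci-builder \
  --access-key-id "$OLD_KEY" --status Inactive
# Delete only after a quiet period with no auth failures:
# aws iam delete-access-key --user-name ci-builder --access-key-id "$OLD_KEY"
```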
When can you call the environment clean?
After remediation is deployed and you've validated from two independent signals that the malicious code is gone. Remediation means upgrading to a patched version if one exists — popular-parser@4.2.8 in our hypothetical — or pinning to the last known-good version below the window (4.1.9). If neither exists, forking is a legitimate option; in the xz-utils response, several distros temporarily reverted to pre-5.6 versions while the upstream project was still being triaged. Whatever the replacement is, pin it explicitly in lockfiles, verify the hash against an out-of-band source (the GitHub release page, not just the registry), and redeploy every affected service.
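A sketch of the pin-and-verify step with stock npm and openssl, using the hypothetical package throughout:

```bash
# Pin the patched version exactly so semver ranges can't drift back in.
npm install --save-exact popular-parser@4.2.8

# The registry's claim: npm integrity is "sha512-" plus a base64 digest.
npm view popular-parser@4.2.8 dist.integrity

# What you actually get: recompute the digest from the tarball itself.
curl -sL "$(npm view popular-parser@4.2.8 dist.tarball)" \
  | openssl dgst -sha512 -binary | openssl base64 -A
# Compare both values against the checksum on the project's release page,
# not just against each other -- the registry is the thing under suspicion.
```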
Two independent validation signals: an SBOM diff showing the bad version is absent from every running artifact, and a runtime check — network egress monitoring to confirm no connections to the known exfiltration endpoints, process listing to confirm no persistence mechanism from the post-install script. If the advisory includes IoCs (indicators of compromise) like specific file paths, scheduled tasks, or outbound IPs, script the check and run it across every host that built during the window, not just the ones you think were affected.
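A skeleton for that sweep; every IoC below is a placeholder (the IP is from the TEST-NET documentation range) to be replaced with the advisory's real values, and it should run on every host that built during the window via your fleet tooling:

```bash
# IoC sweep skeleton -- substitute the advisory's real paths, patterns, IPs.
IOC_PATHS=("/tmp/.node-cache" "$HOME/.config/.helper")   # hypothetical paths
IOC_IPS=("203.0.113.42")                                 # placeholder IP

for p in "${IOC_PATHS[@]}"; do
  [ -e "$p" ] && echo "HIT: artifact present: $p"
done
crontab -l 2>/dev/null | grep -E 'curl|wget' && echo "HIT: review cron entries"
for ip in "${IOC_IPS[@]}"; do
  ss -tn 2>/dev/null | grep -q "$ip" && echo "HIT: live connection to $ip"
done
```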
The realistic clean-environment declaration timeline is 24-48 hours after the advisory, not 4-6 hours. Teams that compress this aggressively tend to miss the secret-rotation step or the CI-cache purge, and those gaps become the next incident.
What should the retrospective actually produce?
A list of controls that would have shortened any phase of the response, ranked by cost. For most teams coming out of a first real supply chain IR, the top three are always: comprehensive SBOM coverage (so inventory is a query, not a grep), dependency pinning and lockfile commits across every repository (so "the compromise window" is a definable thing), and separation of build credentials from runtime credentials (so rotation scope is bounded). The next tier is reachability tooling, runtime egress monitoring, and a designated incident channel that doesn't require someone to remember the Slack name under pressure.
The retrospective output is not a document that gets filed. It is three to five specific engineering tickets with owners, deadlines, and "this would have saved us N hours in this incident" attached. If the retro doesn't produce those, the next incident will run on the same timeline as this one.
What should you do before the next advisory lands?
Run the playbook dry. Pick a real package in your stack — axios, lodash, requests, log4j-core, whichever — and answer: "if this shipped a malicious release in the next hour, how long would each phase take?" Measure, don't estimate. Teams that have never done the dry run consistently underestimate the inventory phase by an order of magnitude and overestimate their ability to coordinate secret rotation across teams. Thirty minutes of rehearsal saves six hours of improvisation the next time it's real.
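Even the crudest timing beats a guess. A sketch, with lodash and the source root as placeholders:

```bash
# Measure, don't estimate: time the manual inventory phase end to end.
time find ~/src -name package-lock.json -not -path '*/node_modules/*' \
  -exec grep -l '"lodash"' {} + | wc -l
```

If that number is minutes on your laptop, assume it is hours across the organization, because the hard part is the repositories, registries, and caches the find never touches.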
How Safeguard.sh Helps
Safeguard.sh collapses the inventory phase from hours to seconds — SBOM ingestion and generation to 100 levels of dependency depth means a query against every service, image, and build artifact resolves before the IR bridge is even fully joined. Reachability analysis cuts the remediation scope by 60-80% by identifying which services actually execute the compromised function path, so you rebuild three services instead of three hundred. Griffin AI autonomous remediation opens pull requests across every affected repository simultaneously — pinning the patched version, updating lockfiles, and re-running CI with a clean builder — while container self-healing rolls running workloads back to known-good images the moment the compromise signal fires. The TPRM module extends the same response across vendor-supplied components, so a compromised SDK in a third-party service shows up in the same incident view as your own dependencies.