Every AppSec team eventually realizes their dependency CVE response is a pile of tribal knowledge living in three people's heads. When one of those people leaves, the response time doubles for six months until the new muscle memory develops. Runbooks are the fix, but most runbooks we have reviewed are either too abstract to execute at 2am ("assess impact and remediate") or too narrow to survive contact with the next CVE ("run command X on file Y"). The runbooks below sit in the middle: they prescribe roles, timing, and the specific queries to run, but they are generalized across ecosystems so one runbook covers most npm disclosures, one covers Python, and one covers container base images. We run these at a logistics SaaS with 120 microservices across Go, Python, and TypeScript, and they held up through 23 disclosure events in the last 12 months with a median triage-to-close of 4.2 days for high-severity CVEs.
What belongs in every disclosure runbook regardless of ecosystem?
Every runbook has the same seven sections: trigger conditions, roles, 0-1 hour triage checklist, 1-24 hour remediation path, communications template, verification steps, and closure criteria. Standardizing the structure means any on-call engineer can pick up any runbook and know where to look; the ecosystem-specific content lives inside those sections.
Trigger conditions are explicit: "this runbook activates when a CVSS 7.0+ CVE is published against any package in the company's SBOM catalog." Roles name the on-call triage engineer, the affected product's tech lead, and the AppSec reviewer. The 0-1 hour checklist is 5-8 concrete steps; the 1-24 hour path branches based on whether a fixed version exists. Communications templates are pre-written; the engineer fills in blanks rather than drafting from scratch at midnight.
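One way to keep the seven-section structure honest is to check it mechanically. The following is a minimal sketch, not the format we claim anyone must use: the section identifiers, the `Runbook` class, and `should_activate` are hypothetical names chosen for illustration, and only the section list and the CVSS 7.0 threshold come from the text above.

```python
from dataclasses import dataclass, field

# The seven sections every runbook carries. The identifiers are hypothetical;
# only the list itself comes from the runbook structure described above.
REQUIRED_SECTIONS = (
    "trigger_conditions",
    "roles",
    "triage_0_1h",
    "remediation_1_24h",
    "communications_template",
    "verification_steps",
    "closure_criteria",
)


@dataclass
class Runbook:
    name: str
    # Maps a section name to its list of steps or entries.
    sections: dict = field(default_factory=dict)

    def missing_sections(self):
        """Return required sections that are absent or empty."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s)]


def should_activate(cvss_score, package, sbom_packages, threshold=7.0):
    """Trigger rule: CVSS at or above the threshold, and the package is in the SBOM catalog."""
    return cvss_score >= threshold and package in sbom_packages
```

A check like `missing_sections()` could run in CI on every runbook change, so a runbook with an empty communications template fails review before it fails at 2am.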
What does the npm disclosure runbook look like?
The npm runbook triggers on publication of a GitHub Security Advisory or a Snyk vulnerability database entry for a package in the lockfile inventory. The 0-1 hour checklist: confirm the advisory source, pull the current lockfile occurrences (target: under 60 seconds via SBOM query), identify which services use the affected version range, run a reachability check to separate direct usage from transitive-only usage, determine whether a fixed version is published, and assign a product owner for each affected service.
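The lockfile-occurrence and version-range steps can be sketched as below. This assumes each service's `package-lock.json` (lockfileVersion 2 or 3, which keys installs under `packages` by their `node_modules/...` path) has already been loaded into a dict; the function names are hypothetical, the range is simplified to "introduced <= version < fixed", and pre-release tags are ignored. It only reads the dependency graph, so it does not replace a call-graph reachability check.

```python
def parse_version(version):
    """Parse 'X.Y.Z' into a tuple of ints; pre-release and build tags are dropped."""
    core = version.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def in_affected_range(version, introduced, fixed):
    """Simplified range check: introduced <= version < fixed."""
    return parse_version(introduced) <= parse_version(version) < parse_version(fixed)


def lockfile_occurrences(lock, package):
    """Return (install_path, version) for every install of `package` in one lockfile."""
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # The package name is whatever follows the last 'node_modules/' segment.
        if path.rpartition("node_modules/")[2] == package and "version" in meta:
            hits.append((path, meta["version"]))
    return hits


def affected_services(lockfiles, package, introduced, fixed):
    """Map service name -> affected versions and whether the package is a direct dependency."""
    report = {}
    for service, lock in lockfiles.items():
        versions = sorted(
            {v for _, v in lockfile_occurrences(lock, package)
             if in_affected_range(v, introduced, fixed)},
            key=parse_version,
        )
        if not versions:
            continue
        root = lock.get("packages", {}).get("", {})
        direct = (package in root.get("dependencies", {})
                  or package in root.get("devDependencies", {}))
        report[service] = {"versions": versions, "direct": direct}
    return report
```

The `direct` flag gives the on-call engineer a first cut at prioritization: services that declare the package themselves usually have a shorter upgrade path than those that inherit it through another dependency.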
The 1-24 hour remediation path branches on whether a fixed version exists. If one does, the runbook prescribes npm update with the specific range, test suite execution, and PR creation with a pre-filled template referencing the CVE. If no fixed version exists, the options are: override to a forked build (rare, requires AppSec approval), accept risk with a documented compensating control (WAF rule or feature flag), or take the feature offline temporarily. Time-box each decision to 4 hours; escalate to the AppSec lead at that mark if unresolved.
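The branch and the time-box are simple enough to encode directly. A minimal sketch, with hypothetical option names standing in for the three no-fix paths described above:

```python
from datetime import datetime, timedelta, timezone

TIMEBOX = timedelta(hours=4)  # per-decision time-box from the runbook


def remediation_options(fixed_version_exists):
    """Return the paths the runbook allows for this disclosure."""
    if fixed_version_exists:
        return ["upgrade_to_fixed_version"]
    return [
        "override_to_forked_build",               # rare
        "accept_risk_with_compensating_control",  # WAF rule or feature flag
        "take_feature_offline",
    ]


def requires_appsec_approval(option):
    """Only the forked-build override is called out as needing AppSec approval."""
    return option == "override_to_forked_build"


def must_escalate(opened_at, now, timebox=TIMEBOX):
    """True once an unresolved decision has been open for the full time-box."""
    return now - opened_at >= timebox
```

A scheduled job could call `must_escalate` on every open decision and page the AppSec lead, which removes the need for anyone to watch the clock during an incident.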
How does the Python runbook differ?
The Python runbook is structurally identical but has three ecosystem-specific wrinkles. First, pip's resolution is less reproducible than npm's when no lockfile is in use, so the first triage step is to pin the current resolved versions (via pip freeze) before attempting remediation; this prevents accidental upgrades of unrelated packages during the fix. Second, many Python vulnerabilities are in C extensions, where the fix may require recompilation against a newer system library; the runbook calls out checking the container base image and rebuilding from source.
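To show why the freeze snapshot matters, here is a small sketch that compares `pip freeze` output captured before and after a fix and reports any package that changed other than the ones being remediated. The function names are hypothetical; only `name==version` lines are handled, so editable installs and direct-URL entries are skipped.

```python
import re


def normalize(name):
    """Normalize a project name (lowercase; runs of '-', '_', '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(text):
    """Parse `pip freeze` output into {normalized_name: version}."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, editable installs, and 'name @ url' entries.
        if not line or line.startswith("#") or line.startswith("-e ") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[normalize(name)] = version.strip()
    return pins


def unrelated_changes(before, after, allowed):
    """Return {name: (old, new)} for packages that changed but were not meant to."""
    allowed = {normalize(a) for a in allowed}
    changed = {}
    for name in sorted(before.keys() | after.keys()):
        if name in allowed:
            continue
        if before.get(name) != after.get(name):
            changed[name] = (before.get(name), after.get(name))
    return changed
```

The saved freeze output can also be passed back to pip as a constraints file (`pip install -c <file>`) so the fix itself cannot move the pinned packages; the diff above then serves as the verification step.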
Third, Python has the highest rate of abandoned-maintainer events in our telemetry (roughly 2.5x the npm rate), so the runbook includes a check for maintainer activity: if the package has not seen a commit in 18 months and the CVE is high or critical, the remediation path includes "evaluate replacement" as a parallel track even if a patched version exists. Budget 90 minutes of analyst time for the replacement evaluation; the outcome feeds into the quarterly dependency health review.
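The maintainer-activity rule reduces to a two-condition check. In this sketch the last-commit timestamp is assumed to come from whatever source the team already trusts (a repository API, a mirror, an SBOM enrichment feed), and 18 months is approximated as 548 days, which is our assumption rather than a precise calendar calculation.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=548)  # roughly 18 months; the day count is an assumption


def needs_replacement_track(last_commit, severity, now):
    """True when the package looks abandoned and the CVE is high or critical.

    The replacement evaluation runs in parallel with patching, even if a
    patched version exists.
    """
    stale = now - last_commit > STALE_AFTER
    return stale and severity.lower() in {"high", "critical"}
```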
What about container base image disclosures?
Container base image runbooks are owned by Platform Security rather than AppSec, and the 0-1 hour triage focuses on blast radius across the fleet, not a single service. Step one is the registry query: how many images reference the affected base image tag, across how many services, in which environments. A single base image CVE typically affects 40-200 services at our scale, so parallelization matters.
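The registry query is essentially a group-by. The sketch below assumes the registry or deployment inventory can be exported as a list of records with `base`, `service`, and `env` fields; that record shape and the function name are hypothetical, and a real registry API would need its own adapter.

```python
from collections import defaultdict


def blast_radius(images, base_tag):
    """Summarize which services and environments run images built on `base_tag`."""
    services = set()
    by_env = defaultdict(set)
    image_count = 0
    for image in images:
        if image["base"] != base_tag:
            continue
        image_count += 1
        services.add(image["service"])
        by_env[image["env"]].add(image["service"])
    return {
        "images": image_count,
        "services": sorted(services),
        "by_env": {env: sorted(names) for env, names in sorted(by_env.items())},
    }
```

Grouping by environment up front lets Platform Security decide whether production rebuilds go first or whether staging acts as the canary.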
The remediation path for base images has three modes. Rolling rebuild is the default: update the base image tag, trigger CI rebuilds across all affected services, merge in batches of 10-20 services per hour to preserve CI capacity. Hot-patch-in-place with a runtime agent is an interim mitigation only, used when the rebuild will take more than 48 hours and the CVE is actively exploited. Base image pin rollback is the emergency brake if the new base image breaks too many services; it buys 12-24 hours to cherry-pick the security fix into the old base.
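The batching and the hot-patch condition can be sketched as follows. The estimate assumes one batch merges per hour and ignores CI queue time, so treat the hour count as a lower bound; the function and mode names are hypothetical.

```python
import math


def plan_rolling_rebuild(services, batch_size=20):
    """Split the affected services into merge batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [services[i:i + batch_size] for i in range(0, len(services), batch_size)]


def choose_modes(service_count, actively_exploited, batch_size=20, hotpatch_after_hours=48):
    """Return (estimated_hours, modes), assuming one batch is merged per hour.

    The interim hot patch is added only when the rebuild is expected to
    exceed the threshold and the CVE is actively exploited.
    """
    hours = math.ceil(service_count / batch_size)
    modes = ["rolling_rebuild"]
    if actively_exploited and hours > hotpatch_after_hours:
        modes.insert(0, "hot_patch_interim")
    return hours, modes
```

Base image pin rollback is deliberately left out of `choose_modes`: it is a reaction to breakage observed during the rebuild, not something that can be decided from counts alone.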
How do you plug runbooks into automation?
Runbooks become executable when each step has a corresponding CI job, chatops command, or API call. In our stack, the 0-1 hour triage is fully automated: a new CVE triggers a webhook, a scanner runs the reachability and inventory queries, a Slack message posts with the affected services and draft PRs already linked, and the on-call engineer's job is to review and approve, not to execute.
Not every step should be automated. Decision points (accept risk, take offline, swap library) stay manual with explicit human approval captured in the runbook record. The rule we follow: automate retrieval and analysis, keep judgment with humans, automate execution after approval. This keeps audit trails clean and prevents the runbook from becoming a black-box pipeline that nobody understands when it breaks.
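The rule "automate retrieval and analysis, keep judgment with humans, automate execution after approval" can be enforced with a small gate. This is a sketch under assumptions: the `RunbookRecord` class, the step names, and the log format are hypothetical, and the CVE identifier in any usage is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Steps that always need a named human approver before they run.
DECISION_STEPS = {"accept_risk", "take_offline", "swap_library"}


@dataclass
class RunbookRecord:
    cve_id: str
    approvals: dict = field(default_factory=dict)  # step -> approver
    log: list = field(default_factory=list)        # audit trail entries

    def _note(self, event, step, actor):
        timestamp = datetime.now(timezone.utc).isoformat()
        self.log.append((timestamp, event, step, actor))

    def approve(self, step, approver):
        """Record a human approval for a decision step."""
        self.approvals[step] = approver
        self._note("approved", step, approver)

    def execute(self, step, action):
        """Run `action`; decision steps are refused until they have been approved."""
        if step in DECISION_STEPS and step not in self.approvals:
            raise PermissionError(f"{step} requires human approval before execution")
        result = action()
        self._note("executed", step, self.approvals.get(step, "automation"))
        return result
```

Because approvals and executions land in the same log, the record doubles as the audit trail: every decision step shows who approved it before the automation acted.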
How do you keep runbooks fresh?
Runbooks rot if not exercised. We run two maintenance cadences: a quarterly runbook review in the AppSec retro where each runbook is walked by a different team member (not the author) to catch drift, and an annual tabletop exercise simulating a high-severity disclosure. The tabletop is run in a dedicated Slack channel over 90 minutes with an injected scenario and a live facilitator; it consistently finds 3-5 action items per session.
Version runbooks in git, treat them like code, and require a pull request and an AppSec reviewer for any edit. Changes to runbooks should be announced in the weekly supply chain standup; silent edits will cause confusion at 2am when the on-call engineer reaches for a step that has moved.
How Safeguard Helps
Safeguard wires directly into these runbooks at the automation layer. When a CVE is disclosed, Safeguard's compromised-package ingest fires within minutes, reachability analysis via Griffin AI filters the affected service list down to those with actual exploitable paths, and SBOM queries return the exact version-to-service mapping that the runbook's 0-1 hour triage requires. Pre-filled PRs with the remediation command are generated automatically, ServiceNow or Jira tickets are created with the product owner pre-assigned, and policy gates prevent any re-introduction of the vulnerable version during the remediation push. The TPRM (third-party risk management) integration flags whether a vendor is upstream of the disclosure and adds the vendor contact steps to the runbook execution record without manual lookups.