
npm Token Rotation: An Enterprise Strategy

Rotating a few npm tokens is easy. Rotating a few thousand across a shared CI fleet is a project. A practical strategy that survives real organizations.

Nayan Dey
Senior Security Engineer
7 min read

I once inherited a CI fleet with 2,148 npm tokens scattered across Jenkins credentials, GitHub Actions secrets, Kubernetes secrets, and, inexplicably, a shared Google Sheet. The rotation plan the previous security team had drafted was three pages of bullet points and had sat in "draft" status for eleven months. Six weeks later, a genuine compromise of a single developer's laptop became the catalyst: we rotated every token in the fleet in fourteen calendar days. The experience taught me what actually works in token rotation at enterprise scale, and what looks good on a whiteboard but disintegrates on contact with production.

Start With The Inventory

You cannot rotate what you cannot see. The first week of any rotation program is building a comprehensive inventory of every active npm token, where it is stored, what package scopes it can publish to, and who owns it. This is unglamorous and non-negotiable.

The sources I now check, in order: GitHub Actions secrets (org, repo, and environment scoped), Jenkins credentials store, Kubernetes secrets across every cluster (grep for _authToken), Vault, SSM Parameter Store, CircleCI contexts, and developer laptops. The last one is the hardest; the best you can do is a company-wide prompt to rotate, a deadline, and a list of packages that should alert if an unexpected publish identity is detected.
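The Kubernetes leg of that inventory is the most mechanical, so it is worth scripting. Here is a minimal sketch: it assumes you pipe in the JSON from kubectl get secrets -A -o json and looks for the _authToken marker that npm writes into .npmrc files. The function name and return shape are my own, not part of any tool.

```python
import base64
import json


def find_npm_tokens(secrets_json: str) -> list[tuple[str, str]]:
    """Scan `kubectl get secrets -A -o json` output for npm auth tokens.

    Returns (namespace/name, data-key) pairs whose decoded value contains
    an _authToken line, the marker npm writes into .npmrc files.
    """
    hits = []
    for item in json.loads(secrets_json).get("items", []):
        meta = item.get("metadata", {})
        ident = f"{meta.get('namespace')}/{meta.get('name')}"
        for key, b64 in (item.get("data") or {}).items():
            try:
                decoded = base64.b64decode(b64).decode("utf-8", "replace")
            except Exception:
                continue  # not valid base64; skip rather than crash the sweep
            if "_authToken" in decoded:
                hits.append((ident, key))
    return hits
```

Run it per cluster and diff the results against your ownership records; every hit without an owner is a finding in its own right.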

For the npm-side view, the npm token list command surfaces classic tokens. Granular tokens, as of npm 10.5, do not fully enumerate via CLI and require the web UI or a direct call to https://www.npmjs.com/settings/<org>/tokens per account. Scripting this against organization accounts is painful but possible.

Classes Of Tokens

Not all tokens need the same rotation cadence. I bucket them into four classes.

Class A is publish tokens for high-download packages (top 1,000 by weekly downloads, roughly). These carry the highest blast radius. An attacker with a Class A token can attempt to ship malicious versions to millions of consumers. Rotate at least every 30 days, and every 7 days for the top 100.

Class B is publish tokens for internal or low-download public packages. Blast radius is bounded by the number of consumers. Rotate quarterly.

Class C is read-only tokens used by CI to install private packages. Compromise leaks source code. Rotate every 90 days and constrain network egress from the CI job that uses them.

Class D is developer tokens on personal machines. Rotate annually; rely on 2FA and short-lived session cookies rather than trying to enforce a tight rotation.

Granular access tokens enforce an expiration at creation (one year maximum), which means Class A and B tokens will expire on their own once migrated. Classic tokens never expire, which is why the migration itself is a prerequisite to a working rotation program.
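The four buckets above reduce to a small decision function. This is a sketch under stated assumptions: the download thresholds are placeholders you would tune to your own registry traffic, and TokenRecord is a hypothetical shape, not anything npm or a secret store gives you directly.

```python
from dataclasses import dataclass

# Hypothetical cutoffs approximating "top 100" and "top 1,000" by weekly
# downloads; calibrate these against your actual package list.
TOP_100_THRESHOLD = 2_000_000
TOP_1000_THRESHOLD = 150_000


@dataclass
class TokenRecord:
    weekly_downloads: int      # peak weekly downloads across publishable packages
    read_only: bool            # install-only token (Class C)
    on_developer_machine: bool # lives in a laptop's .npmrc (Class D)


def classify(t: TokenRecord) -> tuple[str, int]:
    """Return (class letter, rotation cadence in days) per the buckets above."""
    if t.on_developer_machine:
        return ("D", 365)   # rely on 2FA; annual rotation
    if t.read_only:
        return ("C", 90)
    if t.weekly_downloads >= TOP_100_THRESHOLD:
        return ("A", 7)
    if t.weekly_downloads >= TOP_1000_THRESHOLD:
        return ("A", 30)
    return ("B", 90)        # quarterly
```

Encoding the policy as code, rather than a wiki table, means the inventory tooling can tag every token with its class and due date automatically.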

The Overlap Window

The single most important operational concept is the overlap window. During rotation, both the old and new tokens must be valid for some period. If you revoke before the new token is deployed, publishes fail. If you deploy the new token without the old one being revoked, a compromised old token is still live.

My rule: 24 hours of overlap for CI tokens, 7 days for tokens used by widely distributed developer tooling where you cannot force a simultaneous update. Any longer than 7 days and people forget to complete the rotation.

Granular tokens support generating a new token without revoking the old one. Classic tokens also support this. What neither supports natively is pushing out the new token to every downstream consumer; that is your CI system's job. GitHub Actions supports organization secrets, which means a single update propagates to all repos using that secret. Jenkins requires per-job updates unless you've standardized on a shared credentials binding.
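The overlap rule is simple enough to encode directly, which keeps the window from drifting per-team. A minimal sketch, assuming the two consumer kinds named above; the function and table names are mine.

```python
from datetime import datetime, timedelta, timezone

# Overlap windows from the rule above: 24h for CI, 7 days for
# widely distributed developer tooling.
OVERLAP = {
    "ci": timedelta(hours=24),
    "dev_tooling": timedelta(days=7),
}


def revoke_at(deployed_at: datetime, consumer: str) -> datetime:
    """Schedule revocation of the OLD token one overlap window after the
    new token is confirmed deployed to every downstream store."""
    return deployed_at + OVERLAP[consumer]
```

The key discipline is that the clock starts at confirmed deployment of the new token, not at its creation; a token minted on Monday but deployed on Thursday still gets its full overlap.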

Automating The Deploy

I lean on a small script, run from a hardened bastion, that does four things in sequence: creates a new granular token via the npm web API, writes the new value to every downstream secret store that holds the old value, triggers a canary CI job that uses the new token to install a sentinel package, and only after that canary passes, schedules the old token for revocation 24 hours out.

The canary is non-negotiable. I have twice rotated tokens and discovered, through production pages, that one of the secret stores silently failed to update. The canary catches this before the old token is revoked.
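The four-step sequence can be sketched as a small orchestrator. The actual API calls (npm web API, secret-store writes, CI trigger) are injected as callables here, since they vary by shop; the point is the ordering and the canary gate, which must sit before the revocation is ever scheduled.

```python
def rotate(create_token, deploy_to_stores, run_canary, schedule_revocation):
    """The four-step sequence: create, fan out, canary, then schedule the
    old token's revocation only after the canary passes.

    create_token()            -> new token value
    deploy_to_stores(token)   -> list of stores that FAILED to update
    run_canary(token)         -> True if the sentinel install succeeded
    schedule_revocation(hours=...) -> queues revocation of the old token
    """
    new = create_token()
    failures = deploy_to_stores(new)
    if failures:
        # A silently stale store is exactly what the canary exists to catch;
        # surface write failures even earlier, before running it.
        raise RuntimeError(f"secret stores failed to update: {failures}")
    if not run_canary(new):
        raise RuntimeError("canary install failed; old token left in place")
    schedule_revocation(hours=24)  # the CI overlap window from above
    return new
```

Note that every failure path leaves the old token valid: a botched rotation degrades to "rotation postponed," never to "publishes broken."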

Rotation In The Face Of Incidents

The 14-day rotation I mentioned earlier was triggered by a developer laptop compromise. The developer had .npmrc files for four different organizations on the laptop. The incident responder's working assumption was that every token those files touched was compromised.

The correct sequence for an incident rotation is different from a scheduled one. You revoke first and deploy second, accepting the downtime. An active adversary with a stolen token can exploit any overlap window, so the window must be zero. The cost is a publish outage of however long it takes to deploy new tokens everywhere, which for us was about four hours.
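The incident ordering inverts the scheduled one, and it is worth having it written down as code before you need it. A sketch with the same injected-callable shape as the scheduled flow; the names are hypothetical.

```python
def incident_rotate(revoke_old, create_token, deploy_to_stores):
    """Incident ordering: revoke FIRST, accept the publish outage, then
    deploy replacements. The overlap window is deliberately zero."""
    revoke_old()               # the adversary's copy dies immediately
    new = create_token()
    deploy_to_stores(new)      # publishes are down until this completes
    return new
```

The outage clock runs from the first line to the last; everything you can do to shorten deploy_to_stores (org-level secrets, shared credential bindings) directly shortens the incident.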

If your organization cannot tolerate four hours of publish outage during an incident, the answer is not a longer overlap window; it is OIDC trusted publishers, which have no long-lived token to compromise.

Metrics That Matter

Three numbers track rotation health in a real program.

First: median token age. This should trend down over the first quarter of the program and then stabilize. If it doesn't stabilize, you're creating tokens faster than you're expiring them.

Second: count of classic tokens still active. This should go to zero within the first six months. Any classic token alive at month seven is a lapse in the program.

Third: count of tokens without package scope. This is a specific flag: a granular token that grants publish on "all packages" is operationally classic-equivalent. I track this separately and drive it to zero.

I do not track "rotation success rate" because it is a gameable metric. You can get 100 percent by only rotating the easy ones.
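All three numbers fall out of the same inventory pass. A minimal sketch, assuming each inventoried token carries a creation timestamp, a kind, and its package scopes; the record shape is my own convention.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def rotation_health(tokens):
    """Compute the three rotation-health numbers from an inventory.

    tokens: iterable of dicts with 'created' (tz-aware datetime),
    'kind' ('classic' or 'granular'), and 'scopes' (list of package scopes).
    Returns (median age in days, live classic count, unscoped granular count).
    """
    now = datetime.now(timezone.utc)
    ages = [(now - t["created"]).days for t in tokens]
    classic = sum(1 for t in tokens if t["kind"] == "classic")
    # A granular token with no package scope is classic-equivalent.
    unscoped = sum(1 for t in tokens
                   if t["kind"] == "granular" and not t["scopes"])
    return median(ages), classic, unscoped
```

Graph all three weekly; the shapes (median age stabilizing, classic count hitting zero, unscoped count hitting zero) matter more than any single reading.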

What Doesn't Work

Three patterns I have seen fail.

A rotation calendar shared via wiki. It is accurate for three weeks and then drifts. Use a system of record that the actual token creation and revocation flow through.

"Self-service" rotation where each team is responsible for their own tokens. Some teams will be excellent; others will not rotate for years. The security organization owns the rotation, not the individual teams, even if teams operate the mechanism.

Rotation via dashboard notification. An engineer seeing a yellow warning next to a token does nothing. Rotation must be an enforced workflow with a deadline and an escalation when the deadline passes.

How Safeguard Helps

Safeguard maintains a live inventory of every npm token visible across GitHub, GitLab, Bitbucket, and the secret stores we integrate with, tagged by class and age, so you know what you have without running a weekend grep. Our policy engine enforces rotation deadlines by opening tickets in the owning team's Jira or ServiceNow queue as the deadline approaches, and escalates to security leadership if the deadline passes. For OIDC migration, Safeguard identifies which packages and workflows are eligible and generates the trusted-publisher configuration you need to apply. During incidents, Safeguard's rotation runbook automates the revoke-first sequence with a human approval gate, so you can move in minutes rather than coordinating across eight tools.
