In April 2023, security researcher Darcy Clarke publicly disclosed a vulnerability he called "manifest confusion" in the npm registry. The issue was deceptively simple: the metadata that npm displays about a package (the "manifest") can differ from the actual package.json inside the package tarball. This means the dependencies, scripts, and other properties you see on npmjs.com might not match what actually gets installed on your system.
This wasn't a bug that was just discovered. Clarke had reported it to GitHub (npm's parent company) in November 2022. Five months later, it still wasn't fixed. That's when he went public.
The Technical Problem
When a package is published to npm, two pieces of information are stored:
- The manifest: Metadata extracted and stored separately in the registry database. This is what npm info shows, what the npm website displays, and what many security tools analyze.
- The tarball: The actual package files, including the real package.json.
The critical flaw is that npm doesn't validate that these two match. A package author can publish a tarball where the package.json inside contains different information than what's recorded in the registry manifest.
What This Enables
Hidden dependencies: The manifest might show zero dependencies, while the actual package.json inside the tarball specifies a dozen — including malicious ones. Security tools that check the manifest would see a clean package, but npm install would install all the hidden dependencies.
Hidden install scripts: The manifest could omit preinstall or postinstall scripts, while the actual package contains them. Tools designed to flag packages with install scripts would miss these.
Version mismatches: The manifest could claim one version of a dependency while the actual package.json specifies a different (potentially vulnerable) version.
License discrepancies: The manifest could show an MIT license while the actual package is under GPL, creating compliance risks.
Why Security Tools Were Blind
Most npm security tools operated on the manifest data because:
- It's faster — no need to download and extract the tarball
- It's the "official" metadata from the registry
- The npm API returns manifest data by default
- Nobody expected the manifest and tarball to differ
This meant that npm audit, Snyk, Socket, and other security tools could miss hidden dependencies or scripts in any package that exploited this confusion. The entire security tooling ecosystem was built on an assumption that turned out to be false.
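To make the blind spot concrete, here is a minimal sketch of a check that flags install-time scripts. It is not how any particular scanner is implemented; the function name install_hooks and both sample dictionaries are hypothetical. The point is only that the same check gives different answers depending on which copy of the metadata it is given.

```python
# Lifecycle hooks that npm runs automatically during installation.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json: dict) -> list:
    """Return the install-time hooks declared in a package.json-like dict."""
    scripts = package_json.get("scripts") or {}
    return [hook for hook in INSTALL_HOOKS if hook in scripts]


# Hypothetical data: what the registry reports vs. what the tarball contains.
registry_manifest = {"name": "useful-utility", "version": "1.0.0", "scripts": {}}
tarball_package_json = {
    "name": "useful-utility",
    "version": "1.0.0",
    "scripts": {"postinstall": "node steal-credentials.js"},
}

manifest_view = install_hooks(registry_manifest)
tarball_view = install_hooks(tarball_package_json)

print("manifest-only scan:", manifest_view)
print("tarball scan:", tarball_view)
```

A tool that only ever calls the check on registry metadata reports nothing, while the same check on the tarball's package.json finds the postinstall hook.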
The Attack Surface
Consider a scenario where an attacker publishes a package with:
Manifest (what security tools see):
{
  "name": "useful-utility",
  "version": "1.0.0",
  "dependencies": {},
  "scripts": {}
}
Actual package.json (what gets installed):
{
  "name": "useful-utility",
  "version": "1.0.0",
  "dependencies": {
    "malicious-package": "^1.0.0"
  },
  "scripts": {
    "postinstall": "node steal-credentials.js"
  }
}
Security scanners checking the manifest would see a clean package with no dependencies and no scripts. But when a developer runs npm install useful-utility, they'd get the hidden dependency and the postinstall script would execute.
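The comparison that was missing can be sketched in a few lines. The snippet below is an illustration under assumptions, not npm's or any vendor's implementation: the helper diff_manifest and the list of compared fields are hypothetical choices, and the two documents are the ones from the scenario above.

```python
import json

# The two documents from the scenario above.
MANIFEST = json.loads(
    '{"name": "useful-utility", "version": "1.0.0",'
    ' "dependencies": {}, "scripts": {}}'
)
ACTUAL = json.loads(
    '{"name": "useful-utility", "version": "1.0.0",'
    ' "dependencies": {"malicious-package": "^1.0.0"},'
    ' "scripts": {"postinstall": "node steal-credentials.js"}}'
)

# Fields chosen for illustration; a real tool may compare more.
FIELDS = ("name", "version", "dependencies", "scripts", "license")


def diff_manifest(manifest: dict, package_json: dict, fields=FIELDS) -> dict:
    """Return {field: (manifest_value, tarball_value)} for fields that differ."""
    mismatches = {}
    for field in fields:
        left = manifest.get(field)
        right = package_json.get(field)
        # Treat a missing field and an empty value as equivalent to cut noise.
        if (left or None) != (right or None):
            mismatches[field] = (left, right)
    return mismatches


mismatches = diff_manifest(MANIFEST, ACTUAL)
for field, (seen, installed) in mismatches.items():
    print(f"{field}: registry shows {seen!r}, tarball contains {installed!r}")
```

For this pair, the dependencies and scripts fields are reported as mismatched, while name, version, and the absent license field are not.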
The Scope of the Problem
Clarke's research found that this wasn't theoretical. He analyzed the npm registry and found:
- Over 2 million packages had some form of discrepancy between their manifest and their tarball
- Many discrepancies were benign (minor formatting differences), but the mechanism for exploitation was clearly available
- There was no systematic way to identify which discrepancies were intentional and malicious versus accidental
The 2 million number doesn't mean 2 million malicious packages. Most discrepancies were likely from build tools or publishing workflows that inadvertently modified the manifest. But the fact that the discrepancy was possible — and invisible to security tools — was the real problem.
npm's Response (or Lack Thereof)
What frustrated the security community most was the timeline:
- November 2022: Clarke reports the issue to GitHub/npm
- Early 2023: No fix deployed despite ongoing communication
- April 2023: Clarke publishes the vulnerability publicly
- Post-disclosure: GitHub acknowledged the issue and began working on validation
The delay was particularly concerning because the fix seemed straightforward: validate that the manifest in the registry matches the package.json in the tarball. The difficulty lay in backward compatibility — strict validation could break existing packages that had benign discrepancies.
Industry Implications
Trust Model Breakdown
The npm registry's implicit trust model is: "the metadata we show you accurately represents the package." Manifest confusion broke that model. If you can't trust the registry metadata, you can't trust any security analysis based on that metadata.
Cascading Tool Impact
Every tool in the npm ecosystem that reads manifest data was affected:
- npm audit: Might miss vulnerable dependencies hidden in the tarball
- Lockfile generators: package-lock.json is generated from manifest data
- Security scanners: SCA tools using registry APIs could be bypassed
- License checkers: License compliance tools could be fooled
Broader Ecosystem Questions
If npm had this issue, what about other registries? PyPI, RubyGems, Maven Central, and others all have their own metadata models. The manifest confusion disclosure prompted security researchers to audit other registries for similar discrepancies.
Protecting Your Projects
Verify Tarball Contents
Don't rely solely on npm registry metadata. Download and inspect the actual tarball contents for critical dependencies:
npm pack package-name
tar -xzf package-name-1.0.0.tgz
cat package/package.json
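The same inspection can be scripted. This is a minimal sketch using only the Python standard library; the helper names read_package_json and build_demo_tarball are hypothetical, and the demo builds a tiny tarball in memory so the example runs without network access. It assumes the package/ prefix shown in the commands above.

```python
import io
import json
import tarfile


def read_package_json(tgz_bytes: bytes) -> dict:
    """Parse package/package.json from a gzipped npm tarball held in memory."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as archive:
        # Raises KeyError if the archive has no such member.
        member = archive.extractfile("package/package.json")
        if member is None:
            raise ValueError("package/package.json is not a regular file")
        return json.load(member)


def build_demo_tarball(package_json: dict) -> bytes:
    """Build a stand-in tarball containing only package/package.json."""
    payload = json.dumps(package_json).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo(name="package/package.json")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


demo_package = {
    "name": "useful-utility",
    "version": "1.0.0",
    "scripts": {"postinstall": "node steal-credentials.js"},
}
demo_bytes = build_demo_tarball(demo_package)
recovered = read_package_json(demo_bytes)
print(recovered["scripts"])
```

For a tarball produced by npm pack, the bytes can be read from disk (for example with pathlib.Path(...).read_bytes()) and passed to read_package_json in the same way.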
Use Lockfiles
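package-lock.json records the exact resolved version, tarball URL, and integrity hash of each dependency at install time, so later installs fetch the same bytes. A lockfile does not reveal a manifest discrepancy on its own, but it keeps a dependency tree you have reviewed from changing silently.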
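A lockfile entry's integrity value is a hash of the tarball, typically written as sha512- followed by a base64 digest. The sketch below shows how downloaded bytes could be checked against such a value. The helper names are hypothetical, and the sample bytes stand in for a real tarball.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Format a SHA-512 digest the way lockfile integrity fields usually look."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_lockfile(data: bytes, integrity: str) -> bool:
    """Check tarball bytes against an integrity string.

    The string may hold several space-separated hashes; this sketch only
    understands sha512 entries and ignores any others.
    """
    return sri_sha512(data) in integrity.split()


# Stand-in bytes; in practice this would be the downloaded .tgz content.
tarball_bytes = b"demo tarball bytes"
recorded_integrity = sri_sha512(tarball_bytes)

print(matches_lockfile(tarball_bytes, recorded_integrity))
print(matches_lockfile(b"tampered bytes", recorded_integrity))
```

A matching hash confirms that the bytes are the ones recorded when the lockfile was written. It says nothing about whether those bytes agree with the registry manifest, so this check complements tarball inspection rather than replacing it.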
Monitor for Discrepancies
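After the disclosure, tools began updating to compare manifest and tarball data. Prefer tools that perform this comparison, and confirm that they inspect the package.json inside the tarball rather than registry metadata alone.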
Pin Dependencies
Use exact versions rather than ranges, so that a newly published release with a mismatched manifest is not pulled in automatically on the next install.
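A quick way to audit pinning is to list every dependency whose version spec is not an exact version. The following sketch uses a simplified pattern for exact semantic versions; the helper name unpinned and the sample dependency map are hypothetical, and anything the pattern does not recognize (ranges, tags, URLs) is reported as unpinned.

```python
import re

# Simplified exact-version pattern: MAJOR.MINOR.PATCH with optional
# prerelease and build metadata. Ranges such as ^1.2.3 or ~1.2.3 do not match.
EXACT_VERSION = re.compile(
    r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def unpinned(dependencies: dict) -> dict:
    """Return the subset of dependencies whose spec is not an exact version."""
    return {
        name: spec
        for name, spec in dependencies.items()
        if not EXACT_VERSION.match(spec)
    }


# Hypothetical dependencies section of a package.json.
sample_dependencies = {
    "left-pad": "1.3.0",
    "lodash": "^4.17.21",
    "express": "~4.18.2",
    "debug": "*",
    "chalk": "5.3.0-beta.1",
}

loose = unpinned(sample_dependencies)
for name, spec in loose.items():
    print(f"{name} is not pinned: {spec}")
```

Running this over the dependencies and devDependencies sections of a project's package.json gives a list of entries to tighten. New dependencies can be recorded as exact versions with npm install --save-exact.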
How Safeguard.sh Helps
Safeguard.sh addresses the trust gap exposed by manifest confusion:
- Tarball-Level Analysis: Safeguard.sh analyzes actual package contents, not just registry metadata, ensuring that hidden dependencies and scripts are detected regardless of manifest discrepancies.
- Dependency Verification: Safeguard.sh verifies that the dependencies actually installed match what's declared, catching discrepancies between manifests and package contents.
- SBOM Accuracy: Safeguard.sh generates SBOMs from actual installed packages, not from registry metadata, ensuring your software inventory reflects reality.
- Behavioral Analysis: Beyond static manifest checking, Safeguard.sh analyzes package behavior to identify suspicious patterns like hidden network calls or credential access.
Manifest confusion demonstrated that supply chain security can't be built on unverified assumptions about metadata accuracy. You need to verify what's actually in the packages you install, not what the registry says is there.