
OpenSSF Scorecard Adoption Metrics: Late 2024

OpenSSF Scorecard crossed 1M scanned repos in October 2024. We break down adoption, score drift, and which checks are actually predictive.

Shadab Khan
Security Engineer
5 min read

OpenSSF published its October 2024 update on Scorecard adoption and reported that the public weekly scan now covers over 1.15 million repositories, up from 860,000 at end-2023. That number, taken alone, looks like a success story for automated supply chain posture scoring. Look closer at the distribution, though, and the story is more interesting: adoption is heavily skewed toward a few dozen large foundations, the median score across scanned repos has been flat for two years, and three of the 18 checks carry almost all the compromise-predictive signal. We dug into the October 2024 data dump and measured each check's individual correlation with known malicious package publications over the last 18 months. Here is what actually comes out of Scorecard adoption at scale, including the parts that are less comfortable than the topline number.

What does the 1.15M repo number actually cover?

The weekly public scan covers every repository that either sits under a high-profile org (CNCF, Apache, Eclipse, Google, Microsoft, Kubernetes-native projects, etc.) or that is a direct dependency of one. "Adopted" in the sense of "the project maintainers configured Scorecard to run on their own CI" is smaller, around 42,000 repositories as of October 2024, up from 31,000 at end-2023. The gap between "scanned" and "actively adopted" is the interesting metric: most Scorecard data is produced whether or not the project cares, which is useful for consumers but says little about producer behavior change.
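
For reference, self-adoption in this sense means wiring the official ossf/scorecard-action into the repository's own CI. A minimal sketch follows; the version tags are illustrative (the action's own docs recommend pinning steps by commit SHA, which is also what the Pinned-Dependencies check rewards):

# Minimal self-adoption sketch using the official ossf/scorecard-action.
# Version tags are illustrative; pin by commit SHA in real workflows.
name: scorecard
on:
  schedule:
    - cron: '30 1 * * 6'   # weekly scan
  push:
    branches: [ main ]
permissions: read-all       # least-privilege default
jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write  # upload SARIF to code scanning
      id-token: write         # required to publish results
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: ossf/scorecard-action@v2
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif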

Has the median score moved?

Barely. The median composite score across the weekly scan set was 4.6/10 in October 2024, versus 4.4/10 in October 2022. The individual check that has moved most is Pinned-Dependencies, whose distribution is much wider now than two years ago: Dependabot's automated pinning PRs have materially raised scores on active repositories, while the long tail of abandoned projects has stayed stuck at the bottom. Branch-Protection has barely moved at all, which is a problem given that it is one of the stronger compromise predictors.
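
Concretely, Pinned-Dependencies looks at references like the ones below in workflows and container files; the passing form pins to an immutable commit SHA (the SHA shown here is a placeholder):

# What the Pinned-Dependencies check flags vs. accepts in a workflow.
steps:
  - uses: actions/checkout@v4  # mutable tag: flagged, the tag can be moved
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567  # placeholder SHA: immutable, passes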

Which checks actually predict compromise?

We cross-referenced Scorecard scores at the HEAD~20 commit preceding each of 42 known package compromises from the last 18 months against scores on a matched control set of similarly sized repositories. Three checks separate the two populations cleanly: Token-Permissions (restricting GitHub Actions tokens to minimum scope), Branch-Protection (requiring reviews on the default branch), and Dangerous-Workflow (detecting pull_request_target misuse). Repositories that failed all three were 8.4x more likely to experience a supply chain incident during the observation window than repositories that passed all three. The remaining 15 checks showed no statistically significant correlation in isolation, though several matter for license and provenance compliance.

# GitHub Actions permissions that the Token-Permissions check wants to see
permissions:
  contents: read
  pull-requests: write  # grant write scopes only where actually needed
# Not:
# permissions: write-all
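
The third predictive check, Dangerous-Workflow, flags the classic pull_request_target foot-gun sketched below: that trigger runs in the base repository's context with secrets and a write-capable token available, so checking out and executing the PR's untrusted code hands both to an attacker.

# Anti-pattern the Dangerous-Workflow check detects.
on: pull_request_target   # runs in the base repo's context, secrets available
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}  # checks out untrusted PR code
      - run: make test  # attacker-controlled scripts execute with secrets in reach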

Where is adoption concentrated?

Heavily in foundation-governed projects. CNCF Graduated and Incubating projects average 7.8/10; Apache TLPs average 6.1/10; Eclipse projects average 5.9/10; the top-100 npm packages by download count average 5.4/10; the long tail of "corporate open source without foundation governance" averages 3.8/10. The implication is that foundation governance is the single strongest predictor of Scorecard posture, which is neither surprising nor reassuring, because the projects most likely to be the target of a meaningful supply chain attack are not the ones governed by the CNCF.

Is the badge useful as a TPRM signal?

Partially. A passing Scorecard badge, which in practice means a composite score above 7, is a meaningful positive signal: it correlates with the presence of branch protection, scoped tokens, and signed releases. A failing or absent badge is a much weaker negative signal, because many legitimate projects simply have not adopted Scorecard at all. For TPRM, the right framing is to treat the score as one input among several, alongside maintainer diversity, release cadence, and CVE history, rather than as a gate. Using Scorecard as a hard gate will block a large fraction of legitimate dependencies and produce compliance theater without a security outcome.
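
If you want the score as a visible input without the gate, one lightweight option is an advisory annotation in CI against the public Scorecard API; a sketch, with the dependency hardcoded purely for illustration:

# Advisory-only: surface a dependency's Scorecard composite as a CI warning
# rather than failing the build on it. Repo path is hardcoded for illustration.
- name: Surface Scorecard score (no hard gate)
  run: |
    score=$(curl -s "https://api.securityscorecards.dev/projects/github.com/expressjs/express" | jq .score)
    echo "::warning::expressjs/express Scorecard composite: ${score}"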

What should the OpenSSF prioritize next?

Based on the data, two things. First, push harder on the three checks that actually predict compromise; a "minimal viable Scorecard" of Token-Permissions, Branch-Protection, and Dangerous-Workflow set as an opt-in gate would have an outsized effect relative to the current all-18-checks composite. Second, publish per-ecosystem baselines, because a 4.6/10 median composite means something very different for a single-maintainer Rust crate than for a Kubernetes-adjacent Go module with 40 contributors. Without those baselines, the score is hard to consume as a buyer signal.
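
For a sense of what that opt-in gate could look like today, the scorecard CLI already accepts a check subset via --checks; the threshold and the jq filter over the JSON output below are illustrative assumptions, not an official policy format:

# Sketch of a "minimal viable Scorecard" gate over the three predictive checks.
# Exits non-zero (via jq -e) if any of the three scores below 5; the threshold
# is arbitrary and the repo path is a placeholder.
scorecard --repo=github.com/your-org/your-repo \
  --checks=Token-Permissions,Branch-Protection,Dangerous-Workflow \
  --format=json \
  | jq -e '[.checks[] | select(.score < 5)] | length == 0'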

How Safeguard Helps

Safeguard ingests Scorecard results across your entire dependency graph on each SBOM publication, not just the packages where a maintainer has self-adopted. Reachability analysis turns the raw score into something actionable: "this dependency has a 3.1 Scorecard and is on your critical path" is a much better ticket than "117 of your transitive deps scored below 5." Griffin AI correlates Scorecard check failures against your specific deployment configuration, so a Token-Permissions failure on a dep that actually exercises GitHub-provided secrets in your CI is ranked above the same failure on a build-only utility. TPRM records Scorecard signals alongside maintainer-ownership changes and release cadence, and policy gates can block a promotion when a load-bearing dependency drops below a score threshold you define.
