Open Source Security

OpenSSF Scorecard v6 and the OSPS Baseline: Turning Probe Evidence Into Registry Trust Signals

The Scorecard v6 roadmap introduces conformance labels (PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED) layered over the same probe evidence, aligning Scorecard output with the OSPS Baseline for registry-side trust decisions.

Michael
DevSecOps Architect

OpenSSF Scorecard has been the canonical "is this open-source project following basic security hygiene" signal since 2020. By 2026 it is in transition: the v6 roadmap, proposed in scorecard PR #4952, introduces a conformance evaluation layer that produces PASS/FAIL/UNKNOWN/NOT_APPLICABLE/ATTESTED labels on top of the same probe evidence Scorecard already collects. The aim is to align Scorecard output with the Open Source Project Security Baseline (OSPS Baseline) so registry operators, downstream consumers, and policy engines can read a single coherent trust signal instead of a per-check numeric score that requires expert interpretation. This post walks through what v6 changes from a defender perspective, how the OSPS Baseline alignment works, and how to consume the signal in CI and registry policy.

What did the v6 roadmap actually propose?

Three changes anchor the v6 design. The first is parallel evaluation: check scores from 0 to 10 continue to exist for backward compatibility, but they are accompanied by conformance labels evaluated in a single probe run. The second is the OSPS Baseline conformance engine, which takes the same probe evidence and maps it against the maturity-leveled OSPS Baseline controls (the OpenSSF baseline document organized by category and maturity level published in February 2025). The third is the OSPS output format, a versioned schema that downstream tools can consume directly, with control-to-probe mapping files that make the mapping auditable. The applicability engine recognizes when a control is not applicable to a given project (for example, a project without any CI pipeline cannot meaningfully pass CI-related controls) and labels them NOT_APPLICABLE rather than failing them.
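
As a sketch of how that applicability step might work — the function, evidence fields, and control IDs here are hypothetical illustrations, not the real v6 schema:

```python
# Hypothetical sketch of the v6 applicability step. Evidence fields and
# control IDs are illustrative placeholders, not the real OSPS output schema.
PASS, FAIL, NOT_APPLICABLE, UNKNOWN = "PASS", "FAIL", "NOT_APPLICABLE", "UNKNOWN"

def evaluate_control(control, evidence):
    """Label one Baseline control from probe evidence."""
    # Applicability first: a project with no CI pipeline cannot
    # meaningfully pass or fail CI-related controls.
    if control["requires"] == "ci" and not evidence.get("has_ci_pipeline"):
        return NOT_APPLICABLE
    probe = evidence.get(control["probe"])
    if probe is None:  # the probe could not collect evidence at all
        return UNKNOWN
    return PASS if probe else FAIL

evidence = {"has_ci_pipeline": False, "code-review-required": True}
ci_control = {"id": "OSPS-BR-01", "requires": "ci", "probe": "tests-run-in-ci"}
review_control = {"id": "OSPS-QA-02", "requires": "repo", "probe": "code-review-required"}

print(evaluate_control(ci_control, evidence))      # NOT_APPLICABLE
print(evaluate_control(review_control, evidence))  # PASS
```

The point of the sketch is the ordering: applicability is decided before the probe result is consulted, so a missing CI pipeline never shows up as a failure.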

How was the change coordinated with registry operators?

The v6 design has been worked through public roadmap discussion since mid-2025, with input from the Securing Software Repositories WG, the OSPS Baseline authors, and several registry operators. The Microsoft .NET team published "OpenSSF Scorecard for .NET and the NuGet ecosystem" describing how NuGet consumers can use Scorecard scores to evaluate package dependencies, which is the consumer side of the same conversation. The OpenSSF Scorecard v5.1 release in 2024 brought Azure DevOps support, and the v5 release before that broadened the platform coverage; v6's focus is on the output and consumption side rather than additional source platforms. The registry-side coordination matters because registries are increasingly looking at Scorecard or OSPS Baseline scores as one input among many in their "is this package trustworthy" decision; a project below a defined threshold may not be quarantined, but it may be flagged in the registry UI or excluded from "verified" tiers.

What signals can a consumer read with v6?

A defender consuming Scorecard v6 sees three useful artifacts per project. The first is the OSPS Baseline conformance report, which lists each Baseline control and its evaluation: PASS, FAIL, UNKNOWN, NOT_APPLICABLE, or ATTESTED. The ATTESTED label is new in v6 and indicates that the project has produced a signed attestation about a particular control's status, which is a stronger signal than the probe-based evaluation alone. The second is the per-check score, which remains useful for trend analysis. The third is the probe-level evidence, which is the raw data the conformance engine acts on; for sophisticated consumers, the probe evidence allows custom policy logic beyond what the conformance labels express. The OSPS output format is consumable by registry tooling, internal policy engines, and dependency-review pipelines, which makes "minimum Scorecard threshold for tier-one dependencies" a much easier policy to write than it was under v5.
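
A consumer-side sketch, assuming a hypothetical report shape (the field names below are illustrative, not the finalized OSPS schema):

```python
import json

# Hypothetical shape of a v6 OSPS conformance report; the "controls" and
# "label" field names are assumptions, not the finalized schema.
report_json = """
{
  "controls": [
    {"id": "OSPS-AC-01", "label": "PASS"},
    {"id": "OSPS-BR-03", "label": "ATTESTED"},
    {"id": "OSPS-DO-02", "label": "FAIL"},
    {"id": "OSPS-VM-04", "label": "NOT_APPLICABLE"}
  ]
}
"""

def summarize(report):
    """Count labels and compute a pass rate over applicable controls,
    treating ATTESTED as a (stronger) pass."""
    counts = {}
    for control in report["controls"]:
        counts[control["label"]] = counts.get(control["label"], 0) + 1
    applicable = [c for c in report["controls"] if c["label"] != "NOT_APPLICABLE"]
    passing = [c for c in applicable if c["label"] in ("PASS", "ATTESTED")]
    return counts, len(passing) / len(applicable)

counts, pass_rate = summarize(json.loads(report_json))
print(counts)     # {'PASS': 1, 'ATTESTED': 1, 'FAIL': 1, 'NOT_APPLICABLE': 1}
print(pass_rate)  # 2 of 3 applicable controls pass
```

Note that NOT_APPLICABLE controls are excluded from the denominator; folding them into the pass rate would penalize projects for controls that were never relevant to them.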

How do you actually use the signal in CI and policy?

The Scorecard CLI and the Scorecard GitHub Action are both slated to support v6 evaluation as the roadmap lands, with the CLI emitting OSPS-format output that policy tools can consume directly. The flag names below follow the proposed v6 design and may change before release.

# Run Scorecard v6 against a dependency and emit OSPS conformance output
scorecard --repo github.com/upstream-org/critical-package \
  --osps-baseline \
  --format osps \
  --output reports/critical-package-osps.json

# Apply a per-org policy threshold to the conformance report
python tools/check_osps_threshold.py \
  --report reports/critical-package-osps.json \
  --min-level "maturity-level-2" \
  --require-attested "code-review,sbom-published"
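
The threshold checker invoked above is per-org tooling, not part of Scorecard itself. A minimal sketch of its core check, assuming the hypothetical report fields shown here (integer maturity levels and a label per control):

```python
# Minimal sketch of a per-org OSPS threshold check. The report fields and
# the integer maturity-level convention are assumptions, not the real v6
# schema; a CI wrapper would turn a non-empty failure list into exit code 1.
def check_report(report, min_level, require_attested):
    failures = []
    # Every control at or below the required maturity level must pass.
    for control in report["controls"]:
        if (control["maturity_level"] <= min_level
                and control["label"] not in ("PASS", "ATTESTED", "NOT_APPLICABLE")):
            failures.append(f"{control['id']}: {control['label']}")
    # Selected high-risk controls must carry a signed attestation.
    by_id = {c["id"]: c for c in report["controls"]}
    for cid in require_attested:
        if by_id.get(cid, {}).get("label") != "ATTESTED":
            failures.append(f"{cid}: attestation required")
    return failures

report = {"controls": [
    {"id": "code-review", "maturity_level": 1, "label": "ATTESTED"},
    {"id": "sbom-published", "maturity_level": 2, "label": "PASS"},
    {"id": "fuzzing", "maturity_level": 3, "label": "FAIL"},
]}
failures = check_report(report, min_level=2,
                        require_attested=["code-review", "sbom-published"])
print(failures)  # fuzzing is above the threshold level, so only the
                 # missing sbom-published attestation is reported
```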

For policy gating, the GitHub Action's policy-file parameter lets you encode "fail if any tier-one dependency has a PASS rate below threshold" without writing custom orchestration. Per-org policy can also be expressed in OPA/Rego or in the policy engines that Sonatype Nexus Firewall, JFrog Curation, and similar tools already support, with the OSPS output format as the common input.

What policy gate catches the "trust degradation" class going forward?

Three gates align with the v6 conformance model. Gate one is "minimum OSPS Baseline maturity level for any tier-one dependency," set per-org to a level that reflects the actual risk tolerance for high-profile dependencies. Gate two is "alert on any decrease in OSPS conformance for an existing dependency between releases," which catches degradation that a fresh-eye review would miss. Gate three is "require ATTESTED rather than probe-evaluated PASS for selected high-risk controls," which is a stronger signal that the maintainer organization has explicitly committed to a control rather than passing it by happenstance. These gates do not replace CVE scanning or provenance verification; they add a hygiene layer that catches projects drifting away from good practice before that drift becomes a security incident.
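
Gate two reduces to a diff between two conformance reports. A sketch, under the assumption that labels can be strength-ordered (the ordering below is a per-org policy choice, not anything v6 defines):

```python
# Sketch of gate two: alert on any conformance decrease between releases.
# The strength ordering is an assumed policy choice: ATTESTED > PASS > UNKNOWN
# > FAIL, with NOT_APPLICABLE ranked like PASS so it never reads as a drop.
RANK = {"ATTESTED": 3, "PASS": 2, "NOT_APPLICABLE": 2, "UNKNOWN": 1, "FAIL": 0}

def conformance_regressions(previous, current):
    """Return controls whose label got weaker since the last release."""
    prev = {c["id"]: c["label"] for c in previous["controls"]}
    regressions = []
    for control in current["controls"]:
        old = prev.get(control["id"])
        if old is not None and RANK[control["label"]] < RANK[old]:
            regressions.append((control["id"], old, control["label"]))
    return regressions

v1 = {"controls": [{"id": "code-review", "label": "PASS"},
                   {"id": "sbom-published", "label": "ATTESTED"}]}
v2 = {"controls": [{"id": "code-review", "label": "FAIL"},
                   {"id": "sbom-published", "label": "ATTESTED"}]}
print(conformance_regressions(v1, v2))  # [('code-review', 'PASS', 'FAIL')]
```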

How does this fit with the broader registry trust story?

Scorecard v6 sits at the intersection of three trust signals defenders now use in combination. The first is provenance: Sigstore attestations from Trusted Publishing tell you the artifact came from where the maintainer claimed. The second is malicious-package feeds: OpenSSF malicious-packages and registry quarantine streams tell you whether an artifact has been flagged. The third, where Scorecard v6 lives, is project hygiene: OSPS Baseline conformance tells you whether the maintainer organization is following baseline security practices that make a future compromise less likely. None of these is sufficient alone, but together they let a policy engine make a defensible "consume or block" decision for each dependency.
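
A combined decision can be sketched as a small policy function. The thresholds, precedence, and field names below are per-org policy choices, not anything Scorecard or any registry defines:

```python
# Sketch of a combined "consume or block" decision over the three signals.
# Precedence and thresholds are assumed per-org policy, not a standard.
def dependency_decision(provenance_ok, flagged_malicious, osps_pass_rate,
                        tier_one=False, min_pass_rate=0.8):
    if flagged_malicious:
        return "block"    # a malicious-package feed hit wins outright
    if tier_one and not provenance_ok:
        return "block"    # tier-one deps must carry verified provenance
    if tier_one and osps_pass_rate < min_pass_rate:
        return "review"   # hygiene drift: route to the security team
    return "consume"

print(dependency_decision(True, False, 0.9, tier_one=True))   # consume
print(dependency_decision(True, False, 0.5, tier_one=True))   # review
print(dependency_decision(False, False, 0.9, tier_one=True))  # block
```

The ordering encodes the point of the paragraph above: the malicious-package signal is categorical and checked first, provenance is a hard requirement for tier-one dependencies, and the hygiene signal is graduated rather than binary.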

What still has to mature?

Two gaps stand out. The first is coverage: Scorecard runs against GitHub, GitLab, and Azure DevOps today, but projects hosted on other forges or self-hosted Git instances are not covered, and the OSPS conformance model is harder to apply where evidence cannot be probed. The second is attestation: the ATTESTED label is most useful when maintainers actually produce signed attestations, and the tooling for doing so is still maturing. The OpenSSF community has been clear that mass attestation is a multi-year goal rather than a 2026 deliverable, and defenders should not expect ATTESTED rates to be high outside the top tier of well-resourced projects for some time.

How Safeguard Helps

Safeguard ingests OpenSSF Scorecard scores, the v6 OSPS conformance output, and Sigstore attestations for every direct and transitive dependency in your tenant. Per-tier policies enforce minimum maturity levels for selected critical packages, require ATTESTED conformance for specific high-risk controls (Code-Review, SBOM, Trusted-Publishing), and alert on any decrease in conformance between releases of an existing dependency. The dashboard surfaces the full evidence chain behind each finding, so an investigator can trace a low score back to the specific Baseline control and the specific probe evidence that drove it, rather than chasing an opaque number. Policy gates feed back into the registry-trust decision: a tier-one upgrade can be blocked when conformance drops, can require explicit security-team approval when the conformance level changes, or can be allowed when the new release carries valid attestations. The result is that Scorecard v6's defender-friendly output translates directly into actionable, auditable policy across every package ecosystem you consume.
