The Open Source Security Foundation (OpenSSF) continued to iterate on Scorecard throughout 2023, refining the automated tool that assesses open source projects against a set of security heuristics. For organizations managing hundreds or thousands of open source dependencies, Scorecard has become an essential signal—imperfect but invaluable—for understanding the security posture of the software they depend on.
What Scorecard Measures
Scorecard runs a battery of automated checks against a project's GitHub repository and produces a score from 0 to 10 for each check, along with an aggregate score for the project as a whole. The checks fall into several categories:
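The per-check output lends itself to machine consumption. The sketch below parses the kind of JSON the `scorecard` CLI can emit with `--format=json`; the exact field names are an assumption based on recent releases, so verify them against the version you actually run.

```python
import json

# Illustrative shape of `scorecard --repo=... --format=json` output.
# The field names here are an assumption: the JSON schema has evolved
# across releases, so check it against your Scorecard version.
raw = """
{
  "score": 6.7,
  "checks": [
    {"name": "Branch-Protection", "score": 8, "reason": "required reviews enabled"},
    {"name": "Code-Review", "score": 10, "reason": "all changesets reviewed"},
    {"name": "Vulnerabilities", "score": 0, "reason": "vulnerabilities detected"}
  ]
}
"""

result = json.loads(raw)
by_check = {check["name"]: check["score"] for check in result["checks"]}

print(result["score"])              # aggregate score, 0 to 10
print(by_check["Vulnerabilities"])  # score for an individual check
```

Pulling scores into a dictionary keyed by check name makes it easy to apply your own policy on top, rather than relying only on the aggregate.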
Code Review and Contribution Practices
Branch Protection: Does the project require reviews before merging? Are status checks required? Is force-push to the default branch disabled?
Code Review: What percentage of recent changes went through a review process? Scorecard analyzes recent commits to determine if a single individual can push unreviewed code.
Contributors: Does the project have contributions from multiple organizations? Single-organization or single-developer projects carry higher bus-factor risk.
Security Practices
Vulnerabilities: Does the project have known, unpatched vulnerabilities? Scorecard checks the OSV database for outstanding advisories.
Security Policy: Does the project have a SECURITY.md file or security policy? This indicates that the maintainers have a process for handling security reports.
Signed Releases: Are releases signed with GPG or Sigstore? Signed releases provide verification that artifacts haven't been tampered with.
Token Permissions: Are GitHub Actions workflow permissions scoped to the minimum necessary? Overly permissive tokens in CI/CD workflows are a common supply chain attack vector.
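To make the token-permissions point concrete, a workflow can declare a read-only default at the top level and grant write scopes only to the jobs that need them. A minimal sketch (the job and step names are hypothetical):

```yaml
name: ci
on: [push]

# Read-only default for every job in this workflow; a job that
# genuinely needs more (e.g. publishing) should request the extra
# scopes explicitly in its own permissions block.
permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # in practice, pin to a full commit SHA
      - run: make test
```

Scorecard's Token-Permissions check rewards exactly this pattern: explicit, minimal `permissions` declarations rather than the broad defaults.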
Build and Release Practices
CI Tests: Does the project run automated tests in CI? Projects without CI are harder to evaluate for quality and more likely to introduce regressions.
Dependency Update Tool: Does the project use Dependabot, Renovate, or similar tools to keep dependencies current?
Pinned Dependencies: Are dependencies in CI workflows pinned to specific versions or hashes? Unpinned dependencies (especially GitHub Actions referenced by tag rather than SHA) are vulnerable to tag-manipulation attacks.
SAST: Does the project use static application security testing tools like CodeQL, Semgrep, or SonarQube?
Fuzzing: Is the project integrated with OSS-Fuzz or ClusterFuzzLite? Fuzzing catches bugs that unit tests miss, including security-relevant memory corruption issues.
Maintenance Signals
Maintained: Has the project been actively maintained in the last 90 days? Abandoned projects don't receive security fixes.
License: Does the project have a valid open source license? This is more of a legal than security check, but projects without clear licenses create risk.
2023 Improvements
Throughout 2023, the Scorecard team made several significant improvements:
Reduced false positives. Earlier versions of Scorecard generated noisy results, particularly around branch protection and code review checks. The 2023 updates improved detection accuracy, reducing cases where projects with good practices received low scores due to detection limitations.
Probe-based architecture. Scorecard moved toward a probe-based architecture that separates data collection (probes) from scoring decisions. This makes it easier to add new checks and customize scoring for different use cases.
GitHub Actions integration. The Scorecard GitHub Action became easier to integrate into CI/CD pipelines, allowing projects to track their scores over time and catch regressions.
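A typical integration looks like the workflow below. This is a sketch: the `ossf/scorecard-action` inputs shown (`results_file`, `results_format`, `publish_results`) match the action's documented interface, but check the action's README for the currently recommended version pin and permissions before copying.

```yaml
name: scorecard
on:
  schedule:
    - cron: "30 1 * * 6"   # weekly run, so score history accumulates
  push:
    branches: [main]

permissions: read-all

jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write   # upload SARIF results to code scanning
      id-token: write          # needed when publish_results is true
    steps:
      - uses: actions/checkout@v4
      - uses: ossf/scorecard-action@v2   # pin to a full commit SHA in practice
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```

Running on a schedule rather than only on push is what makes regression tracking possible: each run contributes a data point even when the repository is quiet.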
Ecosystem coverage. While Scorecard primarily targets GitHub repositories, work continued on supporting other forges and expanding the checks that work across platforms.
Limitations and Criticisms
Scorecard is useful but not infallible. Common criticisms include:
Goodhart's Law applies. When a metric becomes a target, it ceases to be a good metric. Some projects have been observed making superficial changes (adding a SECURITY.md with minimal content, enabling branch protection without enforcement) to improve scores without improving actual security.
Not all checks are equally important. A project with a perfect score on branch protection but no fuzzing or SAST may be less secure than one with imperfect contribution practices but thorough testing. The aggregate score can obscure this.
Small projects are penalized. Many Scorecard checks favor projects with organizational-scale processes. A well-maintained single-developer project may score poorly because it doesn't have multi-reviewer code review or organizational diversity.
It's a snapshot, not a guarantee. A high Scorecard score on the day you adopt a dependency doesn't protect you from a compromised maintainer account six months later.
Using Scorecard Effectively
Don't use it as a gate—use it as a signal. Blocking dependencies based solely on Scorecard scores will create friction without proportional security benefit. Use it as one input into a broader dependency evaluation process.
Focus on critical checks. Not all checks carry equal weight for your use case. Token permissions, signed releases, and vulnerability status may matter more to you than contributor diversity.
Track scores over time. A declining score can signal maintainer burnout or project abandonment—useful leading indicators of future security risk.
Contribute back. If Scorecard gives a false negative or misses something, file an issue. The project benefits from community feedback.
Combine with other data. Scorecard data combined with download counts, dependency depth, and vulnerability history provides a much richer risk picture than any single metric.
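The advice above can be folded into a small triage sketch. Everything here is hypothetical scaffolding: the weights, the drop threshold, and the idea of flagging a declining trend are illustrative policy choices layered on top of Scorecard's output, not part of Scorecard itself.

```python
# Hypothetical per-check weights reflecting the "focus on critical
# checks" advice; the check names are real Scorecard checks, but the
# weights are placeholders to tune for your own risk model.
CRITICAL_WEIGHTS = {
    "Vulnerabilities": 3.0,
    "Token-Permissions": 2.0,
    "Signed-Releases": 2.0,
    "Contributors": 0.5,
}

def weighted_score(check_scores: dict[str, float]) -> float:
    """Weighted average of check scores (each on Scorecard's 0-10 scale).

    Checks without an explicit weight default to 1.0.
    """
    total = sum(CRITICAL_WEIGHTS.get(name, 1.0) * score
                for name, score in check_scores.items())
    weight = sum(CRITICAL_WEIGHTS.get(name, 1.0) for name in check_scores)
    return total / weight if weight else 0.0

def is_declining(history: list[float], drop: float = 1.0) -> bool:
    """Flag a dependency whose latest score fell by `drop` or more from
    its earlier peak, a possible leading indicator of maintainer burnout."""
    return len(history) >= 2 and max(history[:-1]) - history[-1] >= drop

scores = {"Vulnerabilities": 10, "Token-Permissions": 4,
          "Signed-Releases": 0, "Contributors": 10}
print(round(weighted_score(scores), 2))
print(is_declining([7.2, 7.1, 5.8]))
```

The point of the weighting is that an unsigned release matters more to most consumers than contributor diversity, while the trend check turns "track scores over time" into an actionable alert instead of a dashboard curiosity.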
How Safeguard.sh Helps
Safeguard.sh integrates OpenSSF Scorecard data into its dependency risk analysis, combining Scorecard scores with vulnerability data, maintenance status, and usage patterns to provide a comprehensive view of open source dependency risk. Our platform tracks Scorecard scores for all your dependencies over time, alerts you when scores drop below thresholds, and contextualizes the data within your specific usage, because a low-scoring dependency in your most critical application deserves more attention than the same dependency in an internal prototype.