Metrics Program For Supply Chain SecOps

Most supply chain SecOps metrics measure activity instead of outcomes. Here is how to design a metrics program that survives leadership scrutiny and changes behavior.

Nayan Dey
Senior Security Engineer

The first metrics dashboard I ever built for a supply chain SecOps team had eleven charts on it, and not one of them changed a single decision in a year. It tracked tickets opened, tickets closed, scans completed, packages flagged, severity distributions, weekly trend lines, and a vanity number called "components secured" that nobody could define. The dashboard looked busy. The team looked busy. The risk did not move.

A good metrics program for supply chain SecOps does the opposite. It is small, it is opinionated, it forces decisions, and most of its charts make somebody uncomfortable. This article describes how to design that program.

The four questions a metrics program must answer

Every metric you publish should answer one of four questions. If it does not, cut it. The questions are: How exposed are we right now? How fast do we close exposure? How well does new code behave? And what is leaking past our gates?

Notice what is missing. There is no question about how busy the team is. There is no question about how many scans ran. Activity is not a metric. Activity is a side effect of doing the work. If you measure activity, the team optimizes for activity, and the risk does not move.

Question one: how exposed are we right now?

This is the exposure question. The honest answer requires three numbers. The count of critical and high vulnerabilities currently present in production assets. The count of components in production that are end-of-life or unmaintained. And the count of projects that have not been scanned in the last seven days.

The first number is your reachable risk. The second is your latent risk, the kind that does not have a CVE today but will. The third is your blind spot, the projects you are not even watching.

Pull these from safeguard_get_security_dashboard for the live exposure picture, and from safeguard_list_projects filtered by last scan time for the blind spot count. Publish all three on the same chart, weekly. The blind spot number is usually the one that gets attention, because it is fixable in a way that reachable risk often is not.
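As a sketch, the weekly pull for all three numbers fits in one small script. The call parameters and response fields below are assumptions for illustration; the real tool signatures may differ.

```python
from datetime import datetime, timedelta, timezone

def exposure_snapshot():
    # Reachable risk: critical and high findings currently in production.
    # The environment filter and field names are assumed, not the real API shape.
    dashboard = safeguard_get_security_dashboard(environment="production")
    reachable = dashboard["critical_count"] + dashboard["high_count"]

    # Latent risk: end-of-life or unmaintained components in production.
    latent = dashboard["eol_component_count"]  # field name assumed

    # Blind spot: projects with no scan inside the last seven days.
    # Assumes last_scan_at is a timezone-aware datetime per project.
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    projects = safeguard_list_projects()
    blind_spot = sum(1 for p in projects if p["last_scan_at"] < cutoff)

    return {"reachable": reachable, "latent": latent, "blind_spot": blind_spot}
```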

Question two: how fast do we close exposure?

This is the velocity question. Mean time to remediate is the canonical answer, but mean is the wrong statistic. Use the seventy-fifth and ninety-fifth percentiles instead. The mean hides the long tail, and the long tail is where breaches happen.

For each severity bucket, publish the seventy-fifth and ninety-fifth percentile times from advisory publication to deployed fix, side by side, against your service-level objective. If the ninety-fifth percentile is more than three times the seventy-fifth, you have a tail problem, and the work is to find which projects are causing it.

safeguard_get_risk_trends with a remediation lens gives you the time-to-remediate distribution. Bucket by severity. Slice by project. The projects in the worst quartile are your remediation targets for the next quarter. Naming them in the metrics review is uncomfortable. Do it anyway. The discomfort is what changes the behavior.
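A sketch of that cut, assuming safeguard_get_risk_trends can return per-finding remediation durations; the parameter and field names are illustrative.

```python
import statistics

def remediation_percentiles(severity):
    # Parameters and response shape assumed for illustration.
    trends = safeguard_get_risk_trends(metric="time_to_remediate", severity=severity)
    durations = sorted(f["days_to_fix"] for f in trends["findings"])

    # statistics.quantiles with n=20 gives cut points at 5% steps:
    # index 14 is the 75th percentile, index 18 the 95th.
    q = statistics.quantiles(durations, n=20)
    p75, p95 = q[14], q[18]

    if p95 > 3 * p75:
        print(f"{severity}: tail problem, p95 {p95:.0f}d vs p75 {p75:.0f}d")
    return p75, p95
```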

Question three: how well does new code behave?

This is the leading-indicator question. Exposure and velocity are both lagging. You cannot run a program on lagging metrics alone. The leading indicator for supply chain SecOps is the behavior of new pull requests. Are they introducing new high-severity findings? Are they pinning versions? Are they adding dependencies that fail the policy gate?

Track three behaviors per project. The rate of pull requests that introduce a new high or critical finding. The rate blocked by the policy gate. And the rate bypassed via override. The override rate is the most interesting of the three. A project with a high override rate is a project where the policy is not trusted and is being routed around. That is a signal worth investigating before the override becomes the default behavior.

Use safeguard_evaluate_policy_gate results aggregated weekly to get the gate behavior numbers. Filter by project to see which teams are on the right side of the curve and which are not.
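A sketch of that weekly rollup, assuming the gate results are logged one record per pull request; the record fields here are illustrative, not the tool's real output.

```python
def gate_behavior(gate_results):
    """gate_results: one record per pull request for the week, shaped like
    {"project": str, "outcome": "passed" | "blocked" | "overridden",
     "new_high_or_critical": bool}  (fields assumed for illustration)."""
    by_project = {}
    for r in gate_results:
        by_project.setdefault(r["project"], []).append(r)

    rates = {}
    for project, prs in by_project.items():
        total = len(prs)
        rates[project] = {
            "introduce_rate": sum(r["new_high_or_critical"] for r in prs) / total,
            "block_rate": sum(r["outcome"] == "blocked" for r in prs) / total,
            # The one to watch: policy being routed around.
            "override_rate": sum(r["outcome"] == "overridden" for r in prs) / total,
        }
    return rates
```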

Question four: what is leaking past our gates?

This is the assurance question. Every program has gaps. The mark of a mature program is not the absence of gaps but the awareness of them. Track three leakage indicators. Findings discovered in production that did not appear in any earlier scan. Components in deployed assets that are not in any source repository. And findings closed without a corresponding code change.

The first is detection drift. The second is asset drift. The third is process drift, where someone marked a finding fixed without actually fixing it. All three are common. All three are repairable. None of them appear on a typical dashboard.

safeguard_get_anomalies surfaces these classes of drift directly. Sample the output weekly, and pick one drift case per week to investigate end to end. The investigation produces either a fix or a runbook change, and over time the leakage rate drops.
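The weekly sampling step can be this small. The anomaly_type filter and class names below are assumptions about how the drift classes are exposed.

```python
import random

# Class names assumed; they mirror the three drift types described above.
DRIFT_CLASSES = ["detection_drift", "asset_drift", "process_drift"]

def pick_drift_case():
    candidates = []
    for drift_class in DRIFT_CLASSES:
        candidates.extend(safeguard_get_anomalies(anomaly_type=drift_class))
    # One case per week, investigated end to end.
    return random.choice(candidates) if candidates else None
```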

How to publish

Publishing is its own discipline. Three rules keep the program honest. First, every metric has an owner. The owner is named, not "the team." Second, every metric has a target. The target is set quarterly and reviewed quarterly. Third, every metric has a trend, not just a current value. A snapshot tells you nothing. A trend tells you whether the program is working.
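One way to make the three rules hard to skip is to encode them in the metric definition itself. A minimal sketch, with illustrative names throughout:

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    owner: str     # a named person, never "the team"
    target: float  # set quarterly, reviewed quarterly
    trend: list[float] = field(default_factory=list)  # weekly values, not a snapshot

    def record(self, value: float) -> None:
        self.trend.append(value)

# Hypothetical example: the blind spot metric from question one.
blind_spot = Metric(name="unscanned_projects_7d", owner="jdoe", target=0.0)
```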

Publish weekly to the team. Publish monthly to engineering leadership. Publish quarterly to the executive sponsor. Each audience needs a different cut, but the underlying numbers are the same. The trick is restraint. Do not show the executive sponsor the override rate broken down by team. Show them the aggregate, and have the breakdown ready if they ask.

What to do when a metric stops changing

A flat metric is a useful signal, not a failure. When a number stops moving, ask whether you have hit a real floor or whether the program has lost its leverage on that number. If you have hit the floor, change the target. If you have lost leverage, change the program. Either way, the metric stays on the board, because removing it would hide the fact that the team is no longer pushing on it.

Metrics programs fail in two ways. They fail by measuring the wrong things, and they fail by measuring the right things and not acting on them. The design choices in this article address the first failure. The acting-on-it part is up to you. Set the meeting, run the meeting, name the projects, and move the work. The metrics will follow.
