Vulnerability Management

Risk-Based Prioritisation Beyond CVSS

CVSS tells you severity. It does not tell you risk. Here is how reachability, exploitability, and AI context produce a prioritisation model that survives reality.

Nayan Dey
Senior Security Engineer
8 min read

The score that started as a guideline and became a god

CVSS was designed to be a common vocabulary for describing the severity of a vulnerability. It was never designed to be a prioritisation algorithm. Somewhere in the past decade, that distinction got lost. Today, vast numbers of security programs base triage entirely on CVSS thresholds. Anything 9.0 or above is critical. Anything 7.0 to 8.9 is high. Anything below is queued indefinitely. The system is simple, defensible, and quietly broken.
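
That threshold gate is simple enough to express in a few lines. A minimal sketch of the policy described above; the queue names are illustrative, not a recommendation:

```python
# A minimal sketch of the CVSS-threshold gate described above.
# The bands mirror the policy in the text; the queue names are illustrative.

def cvss_only_triage(cvss_score: float) -> str:
    """Route a finding purely on its CVSS base score."""
    if cvss_score >= 9.0:
        return "critical"   # worked immediately, regardless of context
    if cvss_score >= 7.0:
        return "high"       # worked soon, regardless of context
    return "backlog"        # queued indefinitely
```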

The break shows up in two ways. The first is false positives at the top of the queue. A CVSS 9.8 vulnerability in a library you import but never call is not a high risk to your application. It is a high severity flaw in code you do not execute. Treating it as P1 wastes triage cycles on something that cannot hurt you. The second break is false negatives in the middle of the queue. A CVSS 6.5 vulnerability in a library that processes user-supplied input on a public endpoint may be a much higher real-world risk than the 9.8 you spent the morning on, but it sits unread because the threshold gate stopped it.

CVSS is doing exactly what it was designed to do. The problem is what we are using it for.

What risk actually means

Risk is not severity. Risk is the probability that a vulnerability is exploited in your environment, multiplied by the consequence if it is. CVSS contributes to the consequence side of that equation, but it says almost nothing about the probability side. To estimate probability, you need information CVSS does not have access to. Whether the vulnerable code is reachable. Whether an exploit exists in the wild. Whether the affected endpoint is exposed to the internet. Whether the data flowing through that code path is sensitive. Whether your environment has compensating controls that block the relevant attack chain.
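
As a rough sketch of that equation, here is the shape of the calculation. The signals mirror the list above; the weights and field names are invented for illustration, not a published formula.

```python
from dataclasses import dataclass

# Illustrative sketch of risk = probability x consequence using the signals
# listed above. The weights are invented to show the shape of the model.

@dataclass
class Finding:
    cvss: float                 # feeds the consequence side
    reachable: bool             # is the vulnerable code reachable at all?
    exploit_in_wild: bool       # does an exploit exist in the wild?
    internet_facing: bool       # is the affected endpoint exposed?
    sensitive_data: bool        # does the code path touch sensitive data?
    compensating_control: bool  # does a control block the attack chain?

def risk_score(f: Finding) -> float:
    """Probability-side signals scale a CVSS-derived consequence estimate."""
    if not f.reachable:
        return 0.0  # no application-surface path to exploit

    probability = 0.9 if f.exploit_in_wild else 0.2
    if f.internet_facing:
        probability = min(1.0, probability * 1.5)
    if f.compensating_control:
        probability *= 0.3

    consequence = f.cvss / 10.0
    if f.sensitive_data:
        consequence = min(1.0, consequence * 1.3)

    return round(probability * consequence * 10.0, 1)

# Two findings with different CVSS scores can land in the opposite order:
unreachable_98 = Finding(9.8, False, True, True, True, False)
reachable_65 = Finding(6.5, True, True, True, True, False)
print(risk_score(unreachable_98))  # 0.0
print(risk_score(reachable_65))    # ~8.5
```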

A risk-based prioritisation model takes those signals as inputs and produces a score that is meaningfully different from CVSS. Two findings with identical CVSS scores can land at very different risk scores once these factors are folded in. That is not a flaw in the model. It is the model doing its job.

The shift from severity-based to risk-based prioritisation is the single largest improvement most vulnerability management programs can make. It changes the conversation from "how do we get the criticals down" to "how do we focus on the findings most likely to actually matter". Those are not the same question.

Reachability as the strongest signal

Of the inputs that distinguish risk from severity, reachability is the most decisive. A vulnerability in code your application never calls cannot be exploited through your application surface. It might still be exploited through other vectors, like a compromised build pipeline that pulls the same library, but those vectors fall under different threat models with different controls. For application-level risk, unreachable means safe in practical terms.

Modern reachability analysis is good enough to be the primary filter on the queue. Static call-graph construction handles the bulk of the work, and dynamic instrumentation fills in the gaps where reflection or framework dispatch obscures the static picture. The combined accuracy is high enough that you can treat reachability as a binary in most cases: reachable findings move to the front of the prioritisation pipeline, unreachable findings move to a maintenance queue.
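
In pipeline terms, the binary treatment is a simple partition. The sketch below assumes an upstream analyser has already annotated each finding with a reachable flag; the finding shape is illustrative.

```python
# Sketch of reachability as the primary queue filter. Assumes an upstream
# analyser (static call graph plus dynamic instrumentation) has already
# annotated each finding with a boolean "reachable" flag.

def partition_by_reachability(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split findings into the active triage queue and a maintenance queue."""
    active, maintenance = [], []
    for finding in findings:
        (active if finding["reachable"] else maintenance).append(finding)
    return active, maintenance

findings = [
    {"id": "finding-1", "reachable": True},
    {"id": "finding-2", "reachable": False},  # imported, never called
    {"id": "finding-3", "reachable": False},
]
active, maintenance = partition_by_reachability(findings)
# Only `active` flows into the prioritisation pipeline; `maintenance`
# is revisited when the dependency is next touched.
```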

The volume reduction is substantial. In typical enterprise codebases, between 60 and 80 percent of findings are not reachable. That is not a rounding error. That is the difference between a triage queue that is operationally manageable and one that is not.

Exploitability changes the picture

Reachability tells you whether the code can be executed. Exploitability tells you whether anyone has figured out how to weaponise it. The two are independent inputs to risk, and combining them produces a prioritisation that is much more aligned with how attacks actually unfold.

The signal sources for exploitability have improved significantly in 2026. The CISA Known Exploited Vulnerabilities catalog continues to be the highest-confidence indicator. EPSS scores from FIRST provide a probabilistic estimate. Threat intelligence feeds report active exploitation in specific industries or geographies. None of these is perfect. Together, they form a much better picture than CVSS alone.

A reachable vulnerability with active exploitation in the wild is genuinely urgent. A reachable vulnerability with no known exploit and a low EPSS score can be queued for the next sprint. Without exploitability data, both look the same. With it, the prioritisation matches the reality of the threat landscape.
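
One way to fold exploitability in is a coarse banding on top of the reachability gate. The EPSS threshold and band names below are illustrative, not drawn from any standard.

```python
# Illustrative banding that layers exploitability signals on top of the
# reachability gate. The EPSS threshold and band names are assumptions.

def priority_band(reachable: bool, in_kev: bool, epss: float) -> str:
    if not reachable:
        return "maintenance"   # handled outside the active queue
    if in_kev:
        return "urgent"        # active exploitation in the wild
    if epss >= 0.5:
        return "this-sprint"   # meaningful probability of exploitation
    return "next-sprint"       # reachable, but no exploitation pressure

# Two reachable findings that would look identical without exploitability data:
print(priority_band(reachable=True, in_kev=True, epss=0.97))   # urgent
print(priority_band(reachable=True, in_kev=False, epss=0.02))  # next-sprint
```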

Context is what AI brings to the model

The factors that turn a generic risk score into a prioritisation specific to your environment are mostly contextual. Is the vulnerable endpoint authenticated? Does the affected service handle regulated data? Is the deployment internet-facing or internal? Is there a WAF rule that blocks the relevant request pattern? These questions cannot be answered by a scanner alone. They require knowledge of your architecture, your configuration, and your operational posture.
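
Stripped of any particular product, a context adjustment of this kind is a set of factors that each carry their own rationale. The multipliers below are invented for illustration; a real model would derive them from the codebase, deployment configuration, and telemetry.

```python
# A sketch of context-adjusted scoring that keeps the reasoning visible.
# The factors and multipliers are invented for illustration only.

def adjust_for_context(base_risk: float, context: dict) -> tuple[float, list[str]]:
    score, reasons = base_risk, []
    if context.get("internet_facing"):
        score *= 1.5
        reasons.append("raised: internet-facing deployment (x1.5)")
    if not context.get("authenticated", True):
        score *= 1.4
        reasons.append("raised: unauthenticated endpoint (x1.4)")
    if context.get("regulated_data"):
        score *= 1.3
        reasons.append("raised: handles regulated data (x1.3)")
    if context.get("waf_blocks_pattern"):
        score *= 0.4
        reasons.append("lowered: WAF rule blocks the request pattern (x0.4)")
    return round(min(score, 10.0), 1), reasons

score, reasons = adjust_for_context(6.5, {
    "internet_facing": True,
    "authenticated": False,
})
# `reasons` is the structured argument an engineer can interrogate,
# and each factor is a candidate for a manual override.
```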

Griffin AI is built to integrate that context into the prioritisation step. It reads the finding alongside the codebase, the deployment configuration, and the runtime telemetry, and produces a context-adjusted risk score with the reasoning shown. An engineer reviewing the score can see exactly which contextual factors moved it up or down, and can override individual factors when they have information Griffin does not.

The output is not a black-box verdict. It is a structured argument that the engineer can interrogate. That matters because risk-based prioritisation only works if the team trusts the prioritisation. A score nobody trusts is a score nobody acts on, which is just CVSS in a new wrapper.

Closing the loop with auto-PRs

A risk-based prioritisation model is only useful if the prioritisation flows into action. The lowest-risk findings should not consume engineer time. The highest-risk findings should be on someone's calendar today. The middle band should resolve through automation wherever possible.

Automated PRs are the mechanism for the middle band. When a finding has been classified as needing remediation but does not require human investigation, the system opens a PR with the appropriate fix, runs CI, and routes the review to the right engineer. The security team owns the policy. The bot handles the mechanical work. The result is a queue where most of the volume drains without human intervention, and the human attention that remains is concentrated on the genuinely high-risk findings.
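
The policy split between the three bands is compact to express. The score boundaries below are placeholders for whatever bands a team actually sets in its policy.

```python
# Sketch of the routing policy. The boundaries are placeholders, not
# recommended thresholds.

def route_finding(risk_score: float) -> str:
    """Map a context-adjusted risk score to a handling path."""
    if risk_score >= 8.0:
        return "human-triage"  # highest risk: an engineer looks at it today
    if risk_score >= 3.0:
        return "auto-pr"       # middle band: the bot opens the fix PR,
                               # CI runs, review goes to the owning engineer
    return "maintenance"       # lowest risk: no engineer time spent
```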

What changes when prioritisation is honest

Teams that move from CVSS-only prioritisation to a risk-based model usually report two things in the first quarter. The volume of findings on the active triage queue drops by an order of magnitude. The mean time to remediate the highest-risk findings drops to under a week, sometimes under a day. Both effects are downstream of the same change: when the prioritisation reflects real risk rather than theoretical severity, the team's attention follows the priority, and the work gets done.

CVSS is still useful as one input. It is not useful as the whole model. The teams that survive 2026 with their security signal intact are the ones that have made that distinction and built the workflow around it.

Calibrating the model with historical data

A risk-based prioritisation model is only as good as its calibration. The first version a team rolls out will get some scores wrong, and the right response is to learn from those misses rather than to abandon the approach. The calibration loop looks like this. For every finding the team has remediated, accepted, or mitigated, record the risk score the model produced and the actual outcome. Over time, look for patterns where the score consistently overestimated or underestimated the real risk.
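
A minimal version of that record-and-compare loop might look like the sketch below; the outcome labels and the 5.0 boundary are assumptions for illustration.

```python
# Minimal sketch of the calibration loop: keep the model's score next to the
# eventual outcome, then look for systematic over- and underestimation.
# The outcome labels and the 5.0 boundary are assumptions for illustration.

history = [
    # (model_risk_score, outcome) for findings the team has closed out
    (2.1, "exploitable_after_review"),    # model underestimated
    (8.7, "remediated"),
    (7.9, "unreachable_after_review"),    # model overestimated
    (1.4, "risk_accepted"),
]

underestimates = [s for s, o in history
                  if o == "exploitable_after_review" and s < 5.0]
overestimates = [s for s, o in history
                 if o == "unreachable_after_review" and s >= 5.0]

print(f"underestimated: {len(underestimates)}, overestimated: {len(overestimates)}")
# The patterns that recur here are what drive the tuning described below.
```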

The pattern that surfaces most often in early calibration is reachability false negatives. The model marks a finding as not reachable, but the team later discovers that a framework path or runtime configuration made it reachable in practice. This is correctable by tuning the call-graph analysis and adding the missing entry points. A few iterations of this kind usually get reachability accuracy into the high 90s.

The second pattern is exposure misclassification. The model assumes a service is internal, but the service is actually exposed through a partner integration that bypasses the main perimeter. This is correctable by improving the asset inventory data that feeds the model. Without good asset data, no risk model is reliable, which is why the prioritisation work and the asset management work end up coupled.

The teams that sustain a working risk-based model treat calibration as ongoing maintenance, not a one-time setup. The model gets better every quarter, the score becomes more trusted, and the prioritisation it produces becomes more durable. That compound improvement is what separates a program that uses risk-based prioritisation in name only from one that genuinely operates on it.
