Vulnerability Management

Vulnerability Burndown Charts That Actually Work

Most burndown charts lie about progress. Here is how to build one that survives executive scrutiny by combining reachability, age cohorts, and inflow data.

Shadab Khan
Security Engineer
8 min read

The chart that everyone has and nobody trusts

Every security team has a burndown chart. It usually shows total open findings on the y-axis and time on the x-axis. The line wiggles. Sometimes it goes down. Sometimes it goes up. In a really good month, it goes down then up then down again, which is interpreted as either progress or chaos depending on who is presenting.

Executives have learned to distrust this chart. The number does not move predictably, the dynamics behind it are opaque, and a quarter of solid remediation work can be entirely cancelled out by a scanner upgrade that surfaces a thousand new findings overnight. When the security lead presents the chart, half the room is wondering whether this month's number is actually meaningful or whether it is an artefact of which scanners ran, which environments were included, and which old findings got bulk-closed.

A burndown chart that does not survive scrutiny is worse than no chart at all, because it teaches the audience that the security program's metrics are not to be trusted. The fix is to build a chart that exposes the dynamics underneath the headline number. That requires separating the things that look like one number into the several numbers they actually are.

Inflow and outflow are different stories

The first decomposition is to split the chart into inflow and outflow. Inflow is the rate at which new findings appear, driven by scanner output, advisory publication, and codebase changes. Outflow is the rate at which findings close, driven by remediation, mitigation, acceptance, and bulk closures. The headline backlog number is the cumulative difference between these two rates.

Plotting them separately tells a much clearer story. A team can be doing excellent remediation work, closing 200 findings a week, and still see the backlog grow because inflow is 250. The headline chart looks bad. The outflow chart looks great. The honest answer is that the team is overwhelmed by input rate, and the right intervention is at the source of the input, not at the remediation rate.

Conversely, a team can show a flat backlog with very low outflow and very low inflow, which looks neutral but is actually a quiet failure. Nothing is being remediated, but nothing new is being detected either, usually because a scanner has been silently broken for weeks. The headline chart hides the pathology. The decomposed chart exposes it.
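
Decomposing the series is mechanically simple once the findings data is in one place. A minimal sketch in Python with pandas, assuming a findings table with opened_at and closed_at timestamps (the column names are illustrative, not a required schema):

```python
import pandas as pd

# Illustrative findings table; in practice this comes from the scanner backend.
findings = pd.DataFrame({
    "opened_at": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-10", "2024-01-17"]),
    "closed_at": pd.to_datetime(["2024-01-09", None, "2024-01-20", None]),
})

# Inflow: findings opened per week. Outflow: findings closed per week.
inflow = findings.resample("W", on="opened_at").size()
outflow = findings.dropna(subset=["closed_at"]).resample("W", on="closed_at").size()
weekly = pd.DataFrame({"inflow": inflow, "outflow": outflow}).fillna(0)

# The headline backlog is just the cumulative difference of the two rates.
weekly["backlog"] = (weekly["inflow"] - weekly["outflow"]).cumsum()

# The quiet-failure check: weeks where both rates sit at zero usually mean
# a broken scanner, not a healthy steady state.
stalled = weekly[(weekly["inflow"] == 0) & (weekly["outflow"] == 0)]
```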

Reachability cohorts change the meaning

The second decomposition is by reachability. Reachable and unreachable findings are different categories of risk and should not be aggregated in the same headline number. A team that has 5,000 unreachable findings and 50 reachable ones is in a fundamentally different posture from a team with 50 unreachable findings and 5,000 reachable ones, even though the total is the same.

Plotting reachable findings separately gives executives a number that actually reflects security posture. Unreachable findings are essentially a hygiene queue, not an active risk register. They still need to be closed eventually, but their closure is not the same kind of progress as closing reachable findings.

In our deployments, the chart that lands best in executive review shows two stacked series: reachable findings as the prominent line, unreachable findings as a lighter band beneath. The reachable line is the one that should respond predictably to the team's remediation efforts. The unreachable band drains more slowly through routine maintenance.
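
In chart terms, that is two series on one figure. A sketch with matplotlib, using invented numbers and a secondary axis, since the two populations usually differ by orders of magnitude and a shared scale would flatten the reachable line:

```python
import matplotlib.pyplot as plt

weeks = range(1, 5)
reachable = [50, 44, 37, 31]            # active risk: should respond to remediation
unreachable = [5000, 4950, 4920, 4900]  # hygiene queue: drains through maintenance

fig, ax = plt.subplots()
ax.fill_between(weeks, unreachable, color="grey", alpha=0.3)
ax.set_ylabel("unreachable findings (band)")
ax2 = ax.twinx()  # separate scale so the reachable line stays prominent
ax2.plot(weeks, reachable, color="tab:red", linewidth=2)
ax2.set_ylabel("reachable findings (line)")
ax.set_xlabel("week")
plt.show()
```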

Age cohorts surface the hidden tail

The third decomposition is by age. A backlog of 1,000 findings that are all under 30 days old is a different story from a backlog of 1,000 findings where 600 are over a year old. The first is operational. The second is structural.

The chart that shows this clearly is a stacked area plot with age bands: zero to 30 days, 30 to 90 days, 90 to 365 days, over a year. The shape of the stacked area tells the team where to focus. A growing band over one year old indicates that the formal acceptance and mitigation workflows are not running, because findings should not be sitting that long without a documented decision. A growing band in the 30 to 90 day range indicates that the routing into engineering is breaking down, because findings of that age should mostly be in active fix or formal acceptance.

The age decomposition also gives the team a directly actionable number: the count of findings older than the SLA. That number should always be falling. If it is not, every other metric in the program is suspect.
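
Both the age bands and the SLA count fall out of the same computation. A sketch, with an assumed 30-day SLA purely for illustration:

```python
import pandas as pd

now = pd.Timestamp("2024-06-01")
opened = pd.to_datetime(["2024-05-20", "2024-04-01", "2023-12-01", "2022-11-15"])
age_days = pd.Series((now - opened).days)

# One column of the stacked area plot: open findings per age band.
bands = pd.cut(age_days, bins=[0, 30, 90, 365, float("inf")],
               labels=["0-30d", "30-90d", "90-365d", ">1y"])
print(bands.value_counts().sort_index())

# The directly actionable number: findings older than the SLA.
SLA_DAYS = 30
print(f"past SLA: {(age_days > SLA_DAYS).sum()}")
```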

Severity and risk are not the same axis

The fourth decomposition is by risk, which is distinct from severity. CVSS-based severity charts are common but misleading, because severity does not equal risk. A more useful chart shows findings stacked by combined risk score, where the inputs include reachability, exploitability, and contextual factors specific to the environment.
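
The exact model is environment-specific, but the shape of the computation is a severity input gated and boosted by context. A toy version, with placeholder weights rather than a recommended calibration:

```python
def risk_score(severity: float, reachable: bool,
               exploited_in_wild: bool, internet_facing: bool) -> float:
    """Toy combined risk score: severity gated by reachability,
    boosted by exploitability and environmental context."""
    score = severity                      # e.g. CVSS base score, 0-10
    score *= 1.0 if reachable else 0.1    # unreachable paths barely count
    score *= 1.5 if exploited_in_wild else 1.0
    score *= 1.3 if internet_facing else 1.0
    return min(score, 10.0)

# A medium-severity, reachable, exploited, internet-facing finding outranks
# a critical finding that nothing ever reaches.
print(risk_score(5.0, True, True, True))     # 9.75
print(risk_score(9.8, False, False, False))  # 0.98
```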

The point of the risk-decomposed chart is to make sure the team's attention is going to the highest-risk findings, not just the most numerous ones. A burndown that looks great in volume terms can hide a problem if the highest-risk findings are precisely the ones that are not moving. Plotting risk separately catches that.

Where Griffin AI strengthens the data

The decompositions described above are only as good as the data underneath them. Reachability needs to be computed and refreshed regularly. Risk scores need to integrate exploitability and context. Age cohorts need accurate timestamps and clean closure data. The mechanical work of keeping these inputs current is significant, and most teams underestimate it.

Griffin AI handles much of this maintenance layer. It re-evaluates reachability when the codebase changes, refreshes risk scores when exploitability data updates, and surfaces anomalies in the burndown data, such as bulk-closure events that distort the apparent outflow rate. The chart the team presents is built on data that the AI has been maintaining continuously, not on a static export from last week.
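
Griffin's internal checks are not reproduced here, but the bulk-closure anomaly it flags can be approximated with a simple share-of-outflow test (the 50 percent threshold and batch tagging are illustrative):

```python
import pandas as pd

# Illustrative closure log: one row per closed finding, tagged with the
# batch or change that closed it.
closures = pd.DataFrame({
    "closed_at": pd.to_datetime(["2024-05-06"] * 40 + ["2024-05-07"] * 5),
    "batch_id": ["bulk-retire-scanner-x"] * 40 + [f"pr-{i}" for i in range(5)],
})

for week, group in closures.resample("W", on="closed_at"):
    share = group["batch_id"].value_counts(normalize=True)
    if not share.empty and share.iloc[0] > 0.5:
        print(f"{week.date()}: review {share.index[0]!r} "
              f"({share.iloc[0]:.0%} of the week's outflow)")
```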

Griffin also surfaces the why behind unexpected movements. When the chart shows a spike, Griffin can attribute it to specific scanners, specific advisories, or specific code changes. When the chart shows a sudden drop, Griffin can confirm whether it represents real remediation or a closure pattern that needs review. The audience for the chart, whether executive or auditor, gets answers rather than speculation.

Auto-PRs as a visible workstream

A useful addition to the burndown chart is a separate trace for findings closed through automated remediation. Auto-PR closures are mechanically distinct from manual remediation, and showing them as a separate series makes the team's automation investment visible. In our deployments, mature auto-PR workflows close between 40 and 60 percent of findings, which is a significant share of the outflow rate. Hiding that share inside the aggregate number undersells the program.

The chart that combines manual remediation, automated remediation, formal mitigation, and risk acceptance as four distinct outflow streams gives executives a complete picture of how findings are leaving the queue. Each stream tells a different story about the program's maturity, and presenting them together prevents the misreading that a high outflow rate must mean a lot of engineering work.
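
Producing the four streams is a group-by on the closure mechanism, assuming each closure is tagged with how it happened:

```python
import pandas as pd

closures = pd.DataFrame({
    "closed_at": pd.to_datetime(["2024-05-06", "2024-05-06", "2024-05-07",
                                 "2024-05-08", "2024-05-13", "2024-05-14"]),
    "mechanism": ["auto_pr", "auto_pr", "manual_fix",
                  "risk_accepted", "auto_pr", "mitigated"],
})

# One trace per outflow stream: manual fix, auto-PR, mitigation, acceptance.
streams = (closures
           .groupby([pd.Grouper(key="closed_at", freq="W"), "mechanism"])
           .size()
           .unstack(fill_value=0))
print(streams)
# streams.plot.area() renders the stacked view (matplotlib required).
```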

Choosing the cadence that matches the audience

A burndown chart that updates daily is good for the security team's own situational awareness, but it is the wrong cadence for executive review. Even weekly numbers carry too much short-term volatility for monthly business rhythms. The chart that lands best with executives keeps a weekly granularity for the trace, reports a monthly cadence for the headline summary, and adds a quarterly trend overlay that smooths the noise into a directional signal.

Beneath the executive view, the security team needs a finer-grained operational chart. Daily updates, decomposed by service, asset criticality, and assigned engineer, support the day-to-day prioritisation work. The two views should share the same underlying data so that when a question is escalated from operational to executive context, the numbers reconcile. Different dashboards drawing from different data sources is a recipe for the kind of inconsistency that destroys trust faster than any single bad quarter.

The simplest test of whether the cadence is right is what the audience asks for after a quarter. Executives who ask for more frequent updates are usually saying the chart is too noisy. Executives who ask for less frequent updates trust the program's direction. The second response is the goal.
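
Deriving every view from the same underlying series is the part worth getting right. A sketch of the resampling, with a synthetic daily series standing in for real data:

```python
import numpy as np
import pandas as pd

# One daily series; every audience view derives from it, so the numbers
# always reconcile when a question escalates between contexts.
days = pd.date_range("2024-01-01", periods=365, freq="D")
daily = pd.Series(1000 - 2 * np.arange(365) + 50 * np.sin(np.arange(365) / 7),
                  index=days, name="open_reachable")

weekly_trace = daily.resample("W").last()          # executive trace
monthly_headline = daily.resample("ME").last()     # "ME" needs pandas >= 2.2; "M" on older
quarterly_trend = weekly_trace.rolling(13).mean()  # ~13 weeks smooths to a direction
```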

The chart that ends the awkward review

A burndown chart that decomposes inflow and outflow, splits reachable from unreachable, shows age cohorts, separates risk from severity, attributes outflow to its actual mechanism, matches its cadence to its audience, and refreshes its inputs continuously is a chart that survives executive scrutiny. It does not need narrative defence. The data answers the questions before they are asked.

That is the difference between a metric that builds executive trust in the security program and one that quietly erodes it. Build the chart that is honest, and the awkward quarterly review becomes a productive one.
