Best Practices

You Cannot Secure What You Cannot See: Asset Discovery

Most breaches start with an asset nobody remembered owning. Continuous asset discovery is the foundation that every other control depends on.

Shadab Khan
Security Engineer
7 min read

Every postmortem starts the same way. Someone pulls up the spreadsheet, the CMDB, the Confluence page, or the Terraform repo, and the asset that got compromised is not in any of them. The host that ran the legacy reporting service was provisioned in 2021 by a contractor who left in 2022. The third-party Python service deployed to production ran an unpinned SQL driver that had been quietly pulling in an outdated TLS library as a transitive dependency since the last refactor. The S3 bucket exposed in the breach belonged to a marketing team the security org did not even know existed. None of these incidents was caused by a sophisticated adversary. All of them were caused by gaps in visibility.

The phrase "you cannot secure what you cannot see" has become so common in security marketing that it has lost most of its meaning. But the underlying claim remains true and underappreciated. Every control your team builds, every policy you write, every runbook you maintain, all of it sits on top of one invisible assumption: that you know what you have. The moment that assumption breaks, everything downstream breaks with it.

The shape of the visibility problem

Modern environments are not single systems. They are loose federations of clouds, clusters, repos, registries, runtimes, package ecosystems, and AI services that interact with each other in ways that are not centrally documented. A typical enterprise has assets spread across:

  • Cloud accounts in two or three providers, managed by different teams under different naming conventions
  • Hundreds or thousands of source repositories, with build pipelines that produce containers, binaries, and serverless artifacts
  • Container registries holding base images that nobody has audited since the team that built them was reorganized
  • Package mirrors, internal registries, and vendor-supplied artifacts that mix freely with public dependencies
  • AI models, prompts, embeddings, vector stores, and MCP servers that did not exist in any inventory category three years ago

Each of these surfaces has its own ownership model and its own lifecycle. Trying to maintain a unified picture by hand is not a tooling problem. It is a thermodynamics problem: entropy accumulates faster than human curation can remove it.

Why static inventories keep failing

The default response to visibility gaps is to build an inventory. Someone is asked to "make a list of everything we run." That list usually starts as a spreadsheet, becomes a wiki, gets migrated into a CMDB, and ends life as a Terraform module that nobody updates. Static inventories fail for three structural reasons.

First, they assume that someone will maintain them. In practice, the person who created the inventory leaves, gets reassigned, or simply stops updating it once the original audit pressure subsides. The inventory degrades silently. By the time it is consulted in an incident, it is already months out of date.

Second, they assume a stable definition of what an "asset" is. In a cloud-native environment, this assumption fails immediately. Is a Kubernetes pod an asset? A container image? A specific image digest? A workload identity? A service account? An API route? The right answer depends on what question you are asking, and a single flat inventory cannot answer multiple questions well.

Third, they decouple the inventory from the controls that depend on it. Vulnerability management runs against scanned hosts. Incident response runs against EDR coverage. Compliance runs against the auditor's spreadsheet. Three different views of "the assets" coexist, all subtly inconsistent, and the inconsistencies create the gaps that attackers exploit.

Asset discovery as a continuous, opinionated capability

The alternative is to treat asset discovery as a continuous capability rather than a one-time exercise. Continuous discovery starts from the premise that the inventory is always wrong, and the goal is to minimize how wrong it is at any given moment. This requires three properties.

It must be sourced from systems of record. Discovery should pull from the cloud APIs, the container registries, the SCM platforms, the package registries, the model hubs, and the runtime telemetry that already exist. If a developer pushes a new image, deploys a new pod, or registers a new MCP server, that asset should appear in the inventory without anyone filing a ticket.
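
As a concrete sketch of what API-sourced discovery can look like, the snippet below enumerates S3 buckets and ECR image digests with boto3 and feeds them to a record_asset sink. The choice of AWS services and the sink function are illustrative assumptions, not a prescribed design, and pagination is omitted for brevity.

```python
# Sketch: pull assets from systems of record (here, two AWS APIs)
# instead of asking humans to file tickets. Assumes boto3 credentials
# are configured; record_asset is a hypothetical inventory sink.
import boto3

def record_asset(kind: str, identifier: str, source: str) -> None:
    # Stand-in for writing into the asset graph.
    print(f"[{source}] {kind}: {identifier}")

def discover_aws(region: str = "us-east-1") -> None:
    s3 = boto3.client("s3", region_name=region)
    for bucket in s3.list_buckets()["Buckets"]:
        record_asset("s3-bucket", bucket["Name"], "aws")

    ecr = boto3.client("ecr", region_name=region)
    for repo in ecr.describe_repositories()["repositories"]:
        images = ecr.list_images(repositoryName=repo["repositoryName"])
        for image in images["imageIds"]:
            ref = f"{repo['repositoryUri']}@{image['imageDigest']}"
            record_asset("container-image", ref, "ecr")

if __name__ == "__main__":
    discover_aws()
```

The same pattern extends to SCM, registry, and runtime sources: each feed writes through the one sink, so a new asset appears the moment the system of record knows about it.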

It must be normalized into a graph. Flat lists do not capture the relationships that matter. The fact that a vulnerable library is present is uninteresting. The fact that the vulnerable library is loaded by a service that is exposed to the internet and processes payment data is the real signal. A graph model lets you traverse from artifact to deployment to ownership to data sensitivity in a single query.
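
A toy version of that traversal, using networkx with illustrative node and relation names (none of this is a fixed schema), shows why the graph shape matters: presence alone matches two services, but only one carries the exposure-plus-data signal.

```python
# Sketch: model assets and relationships as a directed graph, then
# query for the combination that matters rather than bare presence.
import networkx as nx

g = nx.DiGraph()
g.add_edge("openssl@1.0.2", "billing-service", relation="loaded_by")
g.add_edge("billing-service", "internet", relation="exposed_to")
g.add_edge("billing-service", "payment-data", relation="processes")
g.add_edge("openssl@1.0.2", "batch-reporting", relation="loaded_by")

# Signal: vulnerable library -> service that is both internet-exposed
# and processes payment data. batch-reporting matches on presence
# alone and is correctly treated as noise.
for _, service in g.out_edges("openssl@1.0.2"):
    if {"internet", "payment-data"} <= set(g.successors(service)):
        print(f"high-risk exposure via {service}")
```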

It must be reconciled across views. The same asset will appear in multiple sources with different identifiers. A container image has a digest, a tag, a registry path, build provenance, and a runtime instance. The discovery layer has to understand that all of these refer to the same logical thing, or every downstream control will produce duplicate findings and miss detections.
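
A minimal reconciliation sketch, with invented field names and example data: three observations from the registry, the build system, and the runtime collapse onto one logical asset because the image digest serves as the join key.

```python
# Sketch: reconcile multiple observations of the same container image
# into one logical asset, keyed by digest. All values are illustrative.
from collections import defaultdict

observations = [
    {"source": "registry", "digest": "sha256:ab12cd34",
     "ref": "registry.internal/payments/api:v1.4"},
    {"source": "ci", "digest": "sha256:ab12cd34",
     "ref": "build 8812, commit 94fe7c1"},
    {"source": "runtime", "digest": "sha256:ab12cd34",
     "ref": "pod payments-api-7f9d on prod-eu-1"},
]

assets = defaultdict(list)
for obs in observations:
    # The digest is the stable identity; tags and pod names are views.
    assets[obs["digest"]].append((obs["source"], obs["ref"]))

for digest, views in assets.items():
    print(digest)
    for source, ref in views:
        print(f"  seen in {source}: {ref}")
```

Real reconciliation also has to handle assets with no shared stable key, but the principle is the same: pick the strongest identity available and treat everything else as a view of it.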

What "good" looks like in practice

Mature continuous discovery has a few visible characteristics. The mean age of records in the inventory is measured in hours, not weeks. New assets appear automatically with their owners attached, derived from SCM contributors, deployment metadata, or cloud tags rather than from manual entry. Decommissioned assets disappear or are explicitly marked as retired, so risk scoring does not waste cycles on dead infrastructure.
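
One way to make the freshness claim measurable, as a sketch: compute the mean record age from a last_seen timestamp on each record and flag anything older than a threshold. The field name and the 24-hour cutoff are assumptions for illustration.

```python
# Sketch: inventory freshness as a metric. Assumes each record carries
# a last_seen UTC timestamp; the 24h staleness cutoff is an example.
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
records = [
    {"asset": "payments-api", "last_seen": now - timedelta(hours=2)},
    {"asset": "legacy-reports", "last_seen": now - timedelta(days=40)},
]

ages = [(now - r["last_seen"]).total_seconds() / 3600 for r in records]
print(f"mean record age: {sum(ages) / len(ages):.1f}h")

stale = [r["asset"] for r, age in zip(records, ages) if age > 24]
print(f"stale (>24h): {stale}")
```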

The inventory answers questions like "which production services use a vulnerable transitive dependency in version range X" without anyone running a one-off script. It answers "which AI models are loaded by services that talk to customer data" without anyone building a side spreadsheet. It answers "which MCP tools have been exposed to agents in the last 30 days" without a ticket to the platform team.
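
The first of those questions reduces to version-range matching over SBOM data. Here is a sketch using the packaging library, with an invented advisory range and dependency map standing in for data that would come from the asset graph.

```python
# Sketch: "which production services use a vulnerable dependency in
# version range X". The advisory range and dependency map are made up.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

vulnerable = SpecifierSet(">=2.0,<2.31")  # example advisory range

service_deps = {
    "payments-api": {"requests": "2.28.1", "urllib3": "1.26.16"},
    "notifications": {"requests": "2.31.0"},
    "batch-reporting": {"urllib3": "1.26.16"},
}

affected = [
    svc for svc, deps in service_deps.items()
    if "requests" in deps and Version(deps["requests"]) in vulnerable
]
print(f"services in vulnerable range: {affected}")
```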

When an incident happens, the responder does not start from a blank page. They start from a graph that already has the affected component, its callers, its data flows, and its responsible team attached.

What blocks teams from getting there

Teams that struggle with continuous discovery usually struggle for the same reasons. They have invested in point tools that each maintain their own narrow inventory, and the narrow inventories never reconcile. They have a CMDB that the security team does not trust because it was designed for ITSM workflows rather than threat modeling. Or they have a discovery effort that is owned by a single team but depends on data from teams that have no incentive to keep their feeds clean.

The way out is not another spreadsheet. It is to pick a single asset graph as the source of truth, push everything else to be a consumer rather than a parallel source, and accept that the graph is a product that needs ongoing investment.

How Safeguard Helps

Safeguard treats asset discovery as the foundation of the platform, not a feature bolted onto vulnerability management. Continuous discovery ingests from your SCM platforms, container registries, cloud accounts, build pipelines, and runtime environments to produce a single normalized graph of every software and AI asset you operate. Components, services, repositories, images, models, and MCP servers are reconciled into a unified identity model so the same logical asset never appears twice.

For software, the SBOM pipeline records every component, transitive dependency, and provenance signal at build and ingestion time, then ties them to the runtime services that load them. For AI workloads, the AI-BOM captures models, datasets, weights, and prompt artifacts so shadow models stop hiding in feature branches. The MCP registry tracks every server and tool exposed to agents, with ownership, version, and last-seen telemetry, so the agent surface area is never a black box.

The result is an inventory that updates itself, knows what it does not know, and gives every other Safeguard control, from vulnerability management to policy gates to incident response, a trustworthy place to start.
