
Purple Team Exercises With Supply Chain Focus

Most purple team exercises stop at the perimeter. A supply-chain-focused exercise probes the dependency graph, the build pipeline, and the trust assumptions in your SBOM.

Nayan Dey
Senior Security Engineer

A purple team exercise is supposed to teach defenders something they did not know. The version most teams run does not, because the scenarios are recycled and the supply chain is treated as an afterthought. The attacker drops a phishing payload, the defender catches it, everyone congratulates themselves, and the report is filed next to last year's report, which says the same thing.

A supply-chain-focused purple team exercise refuses that script. The scenarios assume the attacker is already inside a dependency, a build artifact, or a registry account, and the question is how far they can move before defenders catch them. The lessons are different and almost always uncomfortable.

What makes a supply chain exercise different

The standard purple team exercise tests detection on the network and the endpoint. The supply chain version tests detection on the build pipeline, the registry, the dependency graph, and the deployed asset. The defender's tools are different, the timelines are different, and the success criteria are different.

Three properties make supply chain exercises uniquely useful. They reveal trust assumptions that are otherwise invisible. They exercise muscle that the SOC does not normally use. And they force the team to instrument places that are usually treated as out of scope, like the build runner and the package proxy.

The unfortunate flip side is that supply chain exercises are harder to run. The attack patterns are slower, the indicators are subtler, and the blast radius is broader. That is the point. The patient attacker is the one who hurts you most, and that is exactly the attacker the exercise should simulate.

Designing the scenarios

Start with three scenarios. They are deliberately concrete and deliberately bounded.

The first scenario is the malicious version bump. A direct dependency in one of your repositories ships a new minor version that adds a backdoor. The build picks it up, the artifact is published, and a deployed asset begins making outbound requests that should not exist. The defender's job is to detect the new version before the artifact reaches production, or failing that, to detect the runtime behavior afterward.
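One way a defender might catch the version bump before production is to diff the resolved dependency set of the candidate build against the last known-good build. The sketch below is a minimal illustration, assuming each snapshot is already available as a package-to-version mapping (for example, exported from an SBOM). The package names are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_dependencies(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, list]:
    """Compare two {package: version} snapshots from consecutive builds."""
    added: List[str] = sorted(name for name in after if name not in before)
    removed: List[str] = sorted(name for name in before if name not in after)
    changed: List[Tuple[str, str, str]] = sorted(
        (name, before[name], after[name])
        for name in after
        if name in before and before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots; the package names are illustrative only.
baseline = {"http-client-x": "2.1.3", "json-tools": "1.4.0"}
candidate = {"http-client-x": "2.1.3", "json-tools": "1.5.0", "telemetry-helper": "0.0.1"}

report = diff_dependencies(baseline, candidate)
print(report)
```

A diff like this only shows that something changed. Whether the change is acceptable is still a decision for the policy gate or a reviewer.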

The second scenario is the build runner compromise. An attacker has obtained credentials to a self-hosted CI runner. They modify a build step to inject a small payload into one of your published artifacts. The artifact still passes its tests, the SBOM still looks right, and the deployment proceeds. The defender's job is to detect the discrepancy between source and artifact.
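One possible check for a source-to-artifact discrepancy is to rebuild the artifact from the tagged source in a separate, trusted environment and compare digests. This sketch assumes the build is reproducible (bit-for-bit identical output from identical source), which many pipelines do not yet guarantee. The byte strings stand in for real artifact contents.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def artifact_matches_rebuild(published: bytes, rebuilt: bytes) -> bool:
    """True when the published artifact is byte-identical to the trusted rebuild."""
    return sha256_digest(published) == sha256_digest(rebuilt)


# Placeholder contents for illustration; real use would read the artifact files.
trusted_rebuild = b"artifact built from tagged source"
published_clean = b"artifact built from tagged source"
published_tampered = trusted_rebuild + b" plus injected payload"

print(artifact_matches_rebuild(published_clean, trusted_rebuild))
print(artifact_matches_rebuild(published_tampered, trusted_rebuild))
```

If the build is not reproducible, a mismatch proves nothing on its own, and the exercise may surface that as a detection gap in its own right.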

The third scenario is the upstream dependency takeover. A maintainer of a transitive dependency several levels deep in your graph is compromised. They publish a release that quietly exfiltrates environment variables on import. Your repositories never reference the package directly. The defender's job is to detect that you are exposed at all, and then to determine reachability.
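To answer the exposure question, a defender could walk the dependency graph from each product to the compromised package. The sketch below uses a breadth-first search over an adjacency map. The graph shape and package names are hypothetical, and a real graph would come from your SBOM tooling.

```python
from collections import deque
from typing import Dict, List, Optional


def find_dependency_path(
    graph: Dict[str, List[str]], root: str, target: str
) -> Optional[List[str]]:
    """Breadth-first search; returns the shortest chain from root to target, or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None


# Hypothetical graph: each key depends on the packages in its list.
graph = {
    "payments-service": ["web-framework", "db-driver"],
    "web-framework": ["template-engine"],
    "template-engine": ["string-utils"],
    "db-driver": [],
}

path = find_dependency_path(graph, "payments-service", "string-utils")
print(path)
```

A path in the graph establishes exposure only. Reachability, meaning whether the compromised code actually loads or runs in the deployed asset, needs a separate analysis.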

Each scenario has a defined start state, a defined end state, and a defined success criterion for the defender. Without those three, the exercise becomes a story instead of a test.
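One way to enforce those three fields is to record each scenario in a small structure that refuses to be created without them. This is an illustrative sketch. The class and field names are assumptions, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    start_state: str
    end_state: str
    success_criterion: str

    def __post_init__(self) -> None:
        # Reject scenarios that leave any of the three required fields blank.
        for field_name in ("start_state", "end_state", "success_criterion"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{self.name}: {field_name} must be defined")


version_bump = Scenario(
    name="malicious version bump",
    start_state="backdoored minor version staged in the sandbox registry mirror",
    end_state="artifact containing the backdoored version is deployed to the test asset",
    success_criterion="new version flagged before the artifact reaches production",
)
print(version_bump.name)
```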

The blue team setup

The defenders need to know an exercise is happening. This is the purple part. Red exercises that ambush blue teams produce stress, not learning. The agreement is that the blue team is told the window, the broad category of the exercise, and the rules of engagement. They are not told the specific scenarios.

The blue team uses their normal tools. No special instrumentation is added for the exercise. If their existing tooling cannot see the attack, that is the finding. Adding new sensors during an exercise to make a metric look better is a violation of the protocol.

For supply chain exercises, the normal tools should include the SBOM platform, the policy gate, and the asset inventory. Use safeguard_search_components and safeguard_get_dependency_graph as the inventory anchor, safeguard_evaluate_policy_gate to check whether the policy would have caught the malicious version bump, and safeguard_list_assets to see whether the deployed artifact is even visible to the defender.

The red team setup

The red team operates with explicit scope. They do not actually publish malicious packages to public registries. They work in a sandboxed registry mirror, a sandboxed CI environment, or a dedicated test product in your platform. The simulation has to be high-fidelity enough that the defender's tools see the same signals they would see in a real incident, but low-risk enough that a mistake does not cause a production breach.

The simulation fidelity is the part that takes the most planning. Use a separate Safeguard product for the exercise so the findings do not contaminate production dashboards. Pre-stage the malicious dependency in the test environment, run the exercise, and capture the red team's actions and the blue team's responses on a shared timeline.
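The shared timeline can be as simple as both teams' logs merged in chronological order. The sketch below assumes each log entry is a (timestamp, team, description) tuple. The entries themselves are invented for illustration.

```python
from datetime import datetime
from typing import List, Tuple

Event = Tuple[datetime, str, str]  # (timestamp, team, description)


def merge_timelines(red: List[Event], blue: List[Event]) -> List[Event]:
    """Interleave both teams' logs in chronological order."""
    return sorted(red + blue, key=lambda event: event[0])


# Hypothetical entries for illustration only.
red_log = [
    (datetime(2024, 1, 15, 9, 0), "red", "staged malicious version in sandbox mirror"),
    (datetime(2024, 1, 15, 10, 30), "red", "triggered build that pulls the staged version"),
]
blue_log = [
    (datetime(2024, 1, 15, 11, 45), "blue", "policy gate alert reviewed"),
    (datetime(2024, 1, 15, 9, 20), "blue", "routine registry audit, nothing flagged"),
]

timeline = merge_timelines(red_log, blue_log)
for timestamp, team, description in timeline:
    print(timestamp.isoformat(), team, description)
```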

Measuring the outcomes

Each exercise produces four numbers worth tracking over time. Time to detect is measured from the moment the attack signal first appeared in the defender's tools. Time to scope ends at the moment the defender knew the full set of affected assets. Time to contain ends at the moment the malicious component was blocked at the proxy or gate. The count of detection gaps is the number of specific signals that were available but not surfaced by current tooling.

The first three numbers tell you whether your response is fast enough. The fourth tells you where your tooling needs work. Detection gaps are the most actionable output of the exercise. They become tickets, the tickets become detections, and the next exercise tests whether the detections actually fire.
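The four numbers can be derived from a handful of timestamps on the exercise timeline. The sketch below assumes all three durations are measured from the moment the signal first appeared in the defender's tools. The text only states that baseline for time to detect, so treat it as an assumption for the other two. The event keys and sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Union


def exercise_metrics(
    events: Dict[str, datetime], gaps: List[str]
) -> Dict[str, Union[timedelta, int]]:
    """Compute the three durations and the gap count for one exercise."""
    # Assumption: every duration is measured from first signal visibility.
    signal = events["signal_first_visible"]
    return {
        "time_to_detect": events["detected"] - signal,
        "time_to_scope": events["scoped"] - signal,
        "time_to_contain": events["contained"] - signal,
        "detection_gaps": len(gaps),
    }


# Hypothetical timeline and gap list for illustration.
events = {
    "signal_first_visible": datetime(2024, 1, 15, 9, 0),
    "detected": datetime(2024, 1, 15, 11, 30),
    "scoped": datetime(2024, 1, 15, 14, 0),
    "contained": datetime(2024, 1, 15, 15, 15),
}
gaps = ["build runner step change not logged", "proxy did not alert on new publisher"]

metrics = exercise_metrics(events, gaps)
print(metrics)
```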

Cadence and reporting

Run a supply chain purple team exercise twice a year. More often than that and the team treats it as overhead. Less often and the muscle atrophies. Each exercise runs over a week, with a half-day kickoff, three days of operations, and a half-day debrief on the last day.

The report has three parts: the narrative timeline of the exercise, the list of detection gaps with proposed fixes, and the comparison to the previous exercise's metrics. The comparison is the part that matters. A single exercise gives you a snapshot. A series of exercises tells you whether the program is improving.
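The comparison to the previous exercise can be computed as a per-metric delta, where a negative value means the number went down. This is a minimal sketch. The metric names and sample values are hypothetical.

```python
from datetime import timedelta
from typing import Dict, Union

Metric = Union[timedelta, int]


def compare_exercises(
    previous: Dict[str, Metric], current: Dict[str, Metric]
) -> Dict[str, Metric]:
    """Return current minus previous for every metric present in both exercises."""
    return {key: current[key] - previous[key] for key in current if key in previous}


# Hypothetical metrics from two consecutive exercises.
previous = {"time_to_detect": timedelta(hours=4), "detection_gaps": 7}
current = {"time_to_detect": timedelta(hours=2, minutes=30), "detection_gaps": 4}

deltas = compare_exercises(previous, current)
print(deltas)
```

For all four numbers, lower is better, so a negative delta indicates improvement.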

What to do with findings

Findings from a purple team exercise are different from findings in production. They are not vulnerabilities. They are gaps in the defender's posture. Track them in the same task system you use for real incidents, but tag them so they can be reported separately. The exercise findings should close on a faster timeline than production findings, because they have no business impact and no customer pressure to slow them down.

A program that runs supply chain purple team exercises and never closes the resulting findings is doing security theater. A program that closes the findings between exercises is one of the rare programs that gets measurably better year over year.
