Spotify's Dependency Management at Scale

Inside Spotify's approach to managing thousands of dependencies across hundreds of microservices, balancing developer autonomy with supply chain security.

Bob
Security Researcher
7 min read

Spotify runs over 2,000 microservices built by hundreds of autonomous engineering squads. Each squad chooses its own technology stack, libraries, and development practices. This radical decentralization powers Spotify's innovation speed, but it creates a dependency management challenge that would give most security teams nightmares.

Their approach to solving it reveals practical lessons about securing software supply chains in organizations where centralized control isn't culturally viable.

The Squad Model and Its Security Implications

Spotify's organizational model is built around small, autonomous squads. Each squad owns a set of services end-to-end: development, deployment, and operations. Squads are grouped into tribes, chapters, and guilds, but the squad is the fundamental unit of delivery.

For dependency management, this means there's no single team that decides which libraries get used. A squad building a recommendation engine might use different libraries than a squad working on the payment system. Multiply that autonomy across hundreds of squads and you get a dependency landscape that's diverse, evolving, and difficult to track.

The security implications are significant. A centralized team can't review every dependency decision. Updates happen at different rates across different squads. A vulnerability in a shared library might be patched by one squad in hours and ignored by another for weeks.

Spotify recognized early that solving this required tooling, not process mandates. You can't ask hundreds of squads to follow a manual dependency review process and expect consistency.

Backstage: The Platform That Changed Everything

Backstage, Spotify's internal developer portal (now open-sourced), became the foundation for their dependency management strategy. Originally built to unify developer experience across the company, Backstage provides a service catalog that tracks every microservice, its ownership, its tech stack, and its dependencies.

For supply chain security, Backstage provides:

Service ownership mapping. Every service has a clear owner. When a vulnerability is found in a dependency, there's no ambiguity about who's responsible for fixing it.

Dependency visibility. Backstage catalogs the dependencies used by each service, providing a company-wide view of the dependency landscape. This answers the question "which services use library X?" in seconds rather than days.

Standardized scaffolding. When squads create new services, Backstage provides templates that include security best practices by default: dependency scanning, vulnerability alerting, and automated update mechanisms. New services start secure.

Plugin ecosystem. Security tooling integrates with Backstage through plugins, putting security information where developers already work rather than requiring them to use separate security-specific tools.

The genius of Backstage, from a security perspective, is that it solved a developer experience problem and a security problem simultaneously. Squads adopted it because it made their work easier. Security got visibility as a side effect.
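To make the "which services use library X?" query concrete, here is a minimal sketch of a service catalog with a reverse dependency index. The schema, service names, and squad names are hypothetical illustrations, not Backstage's actual data model.

```python
# Minimal sketch of a service catalog with a reverse dependency index.
# Service names, owners, and schema are hypothetical, not Backstage's model.
from collections import defaultdict

catalog = {
    "playlist-api": {"owner": "squad-playlists", "deps": ["log4j-core", "guava"]},
    "rec-engine": {"owner": "squad-discovery", "deps": ["onnxruntime", "guava"]},
    "payments": {"owner": "squad-payments", "deps": ["bouncycastle"]},
}

# Build the library -> services index once; lookups are then O(1).
users_of = defaultdict(list)
for service, meta in catalog.items():
    for dep in meta["deps"]:
        users_of[dep].append(service)

def who_uses(library):
    """Return (service, owner) pairs for every service depending on library."""
    return [(s, catalog[s]["owner"]) for s in users_of[library]]

print(who_uses("guava"))
# → [('playlist-api', 'squad-playlists'), ('rec-engine', 'squad-discovery')]
```

The ownership field is what turns a scan result into an actionable notification: every hit comes back with a responsible squad attached.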

Automated Dependency Updates

Spotify invested heavily in automating dependency updates. Manual dependency management doesn't scale when you have thousands of services, and falling behind on updates is how known vulnerabilities persist in production.

Their automation works in tiers:

Low-risk updates (patch versions, minor versions of well-tested libraries) are updated automatically. A bot creates a pull request, runs the service's test suite, and if tests pass, merges the update. Human review is optional.

Medium-risk updates (major versions, libraries with complex APIs) get automated pull requests with human review required. The PR includes a changelog summary, vulnerability information, and known breaking changes.

High-risk updates (framework upgrades, language version changes) are coordinated manually but supported by tooling that identifies affected services and tracks migration progress.

This tiered approach recognizes that not all updates carry the same risk. Treating a patch bump with the same process as a major framework migration wastes engineering time. Treating a major migration with the same casualness as a patch bump invites breakage.
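The tiering logic above can be sketched as a simple policy function keyed on the semantic-version bump. The specific rules here are a hypothetical illustration of the idea, not Spotify's actual policy.

```python
# Hypothetical sketch of routing a dependency update into a risk tier
# based on its semantic-version bump. Not Spotify's actual policy.
def bump_type(current: str, candidate: str) -> str:
    cur = [int(x) for x in current.split(".")]
    new = [int(x) for x in candidate.split(".")]
    if new[0] > cur[0]:
        return "major"
    if new[1] > cur[1]:
        return "minor"
    return "patch"

def update_tier(current: str, candidate: str, well_tested: bool = True) -> str:
    bump = bump_type(current, candidate)
    if bump == "major":
        return "medium-risk: automated PR, human review required"
    if bump == "minor" and not well_tested:
        return "medium-risk: automated PR, human review required"
    return "low-risk: automated PR, auto-merge if tests pass"

print(update_tier("2.4.1", "2.4.2"))  # low-risk path
print(update_tier("2.4.1", "3.0.0"))  # medium-risk path
```

In practice the policy would also consult the service's test coverage and the library's changelog, but the core idea is that the routing decision is mechanical, so the bot can apply it consistently across thousands of repositories.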

Vulnerability Response at Scale

When a critical vulnerability like Log4Shell emerges, Spotify's response depends on the infrastructure they've built during calm times:

  1. Discovery: Automated scanning identifies every service using the affected library, including transitive dependencies.
  2. Triage: Services are prioritized based on exposure. Internet-facing services, services handling user data, and services in the payment path get priority.
  3. Remediation: For services using standard configurations, automated updates are pushed. For non-standard configurations, squad owners are notified with specific guidance.
  4. Verification: Post-update scanning confirms that the vulnerability is remediated. Services that haven't been updated are escalated.
  5. Retrospective: After the incident, they analyze response times, identify bottlenecks, and improve processes.

The key enabler is the service catalog. Without knowing which services use which dependencies, step one is impossible. And without step one, everything else is guesswork.
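The triage step (step 2) can be sketched as a scoring function over exposure attributes. The field names and weights below are hypothetical illustrations of the prioritization described above.

```python
# Sketch of vulnerability triage: rank affected services by exposure.
# Field names and weights are hypothetical illustrations.
affected = [
    {"name": "ads-api",   "internet_facing": True,  "user_data": False, "payment_path": False},
    {"name": "billing",   "internet_facing": False, "user_data": True,  "payment_path": True},
    {"name": "batch-etl", "internet_facing": False, "user_data": False, "payment_path": False},
]

def exposure(svc: dict) -> int:
    # Higher score = remediate first. Internet exposure, payment-path
    # involvement, and user-data handling each add weight.
    return (3 * svc["internet_facing"]
            + 2 * svc["payment_path"]
            + 2 * svc["user_data"])

remediation_queue = sorted(affected, key=exposure, reverse=True)
print([s["name"] for s in remediation_queue])
# → ['billing', 'ads-api', 'batch-etl']
```

A queue like this is only as good as the metadata behind it, which is another reason the service catalog matters: exposure attributes have to be recorded before the incident, not during it.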

The Dependency Health Score

Spotify developed an internal metric they call the dependency health score. Each dependency is scored based on:

  • Maintenance activity: How recently was the last release? Are issues being addressed?
  • Vulnerability history: How often are vulnerabilities found? How quickly are they patched?
  • Community health: How many maintainers? Is the bus factor above one?
  • Usage within Spotify: How many services use this dependency? Is it growing or shrinking?
  • Update freshness: How current is the version used compared to the latest release?
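A score combining these factors might be computed along the following lines. The weights, input signals, and scale are hypothetical; Spotify has not published the formula.

```python
# Hypothetical sketch of a weighted dependency health score in [0, 100].
# The factors mirror the list above; weights and inputs are illustrative.
def health_score(days_since_release: int, open_vuln_count: int,
                 maintainers: int, usage_trend: str,
                 versions_behind: int) -> float:
    factors = {
        "maintenance": max(0.0, 1 - days_since_release / 365),  # recent release scores higher
        "vulns":       1 / (1 + open_vuln_count),               # fewer open vulns scores higher
        "community":   min(maintainers, 5) / 5,                 # bus factor, capped at 5
        "usage":       1.0 if usage_trend == "growing" else 0.5,
        "freshness":   1 / (1 + versions_behind),               # closer to latest scores higher
    }
    weights = {"maintenance": 0.25, "vulns": 0.3, "community": 0.15,
               "usage": 0.1, "freshness": 0.2}
    return round(100 * sum(weights[k] * factors[k] for k in factors), 1)

# A well-maintained library, one minor version behind:
print(health_score(days_since_release=30, open_vuln_count=0,
                   maintainers=4, usage_trend="growing",
                   versions_behind=1))
# → 84.9
```

The exact formula matters less than the properties: the score is comparable across the ecosystem, cheap to recompute, and legible enough that a squad can see which factor is dragging a dependency down.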

The health score isn't used as a hard gate. Squads aren't blocked from using low-scoring dependencies. But the score is visible in Backstage, in code review tools, and in pull request checks. It provides context for decision-making without removing autonomy.

Over time, Spotify observed that making the health score visible shifted behavior. Squads naturally migrated away from low-scoring dependencies when better alternatives existed. The information changed decisions without requiring mandates.

Handling the Java Ecosystem

Spotify's backend is heavily Java-based, which means they deal with the Maven dependency ecosystem's particular challenges: deep transitive dependency trees, version conflicts, and the diamond dependency problem.

Their approach includes:

Dependency convergence enforcement. Build tooling flags transitive dependency conflicts and requires explicit resolution. This prevents the scenario where two libraries require different versions of the same transitive dependency and the build silently picks one.

Bill of Materials (BOM) management. For common library combinations, Spotify maintains internal BOMs that define compatible versions. Squads using the BOM get tested, compatible dependency sets without managing individual versions.

Transitive dependency tracking. Their scanning doesn't stop at direct dependencies. The full transitive tree is analyzed, because a vulnerability three levels deep in the dependency graph is still a vulnerability in your service.
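Full-tree analysis of the kind described above amounts to a graph traversal. Here is a minimal sketch (the graph and package names are hypothetical) that walks a service's transitive dependency graph and returns the paths to any vulnerable package, including one three levels deep.

```python
# Sketch of transitive scanning: BFS over a dependency graph, returning
# every path from the service to a vulnerable package. Graph is hypothetical.
from collections import deque

# package -> its direct dependencies
graph = {
    "my-service": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": ["lib-c", "lib-d"],
    "lib-c": ["log4j-core"],   # vulnerability three levels deep
    "lib-d": [],
    "log4j-core": [],
}

def vulnerable_paths(root: str, vulnerable: set) -> list:
    """Return every dependency path from root to a vulnerable package."""
    hits, queue = [], deque([[root]])
    while queue:
        path = queue.popleft()
        for dep in graph.get(path[-1], []):
            if dep in vulnerable:
                hits.append(path + [dep])
            if dep not in path:        # guard against cycles
                queue.append(path + [dep])
    return hits

print(vulnerable_paths("my-service", {"log4j-core"}))
```

Reporting the full path, not just the hit, is what makes remediation tractable: it tells the squad which direct dependency to bump or exclude in order to evict the vulnerable transitive one.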

What Doesn't Work

Spotify is transparent about what doesn't work in their model:

Long-tail services. Some services are rarely changed and their dependencies go stale. Automated update bots help, but services with insufficient test coverage can't be updated with confidence.

Cross-squad coordination. When a vulnerability requires coordinated response across many squads, the decentralized model creates coordination overhead. Some squads respond quickly, others slowly.

Dependency proliferation. Autonomy means squads sometimes adopt different libraries for the same purpose. This spreads the dependency surface wider than necessary.

Resource constraints. The platform team building dependency management tooling competes for resources with product teams. Investment in security infrastructure requires constant justification.

Lessons for Other Organizations

Spotify's experience offers several practical takeaways:

  1. Build security into the developer platform. Don't create separate security tools. Integrate security into the tools developers already use.
  2. Visibility changes behavior. Making dependency health visible influences decisions without mandates.
  3. Automate by risk tier. Not all updates need the same process. Automate the routine, review the risky.
  4. Own the service catalog. Knowing what you run and who owns it is foundational to everything else.
  5. Accept imperfection. A decentralized model won't achieve 100% compliance. Aim for consistent improvement rather than perfection.

How Safeguard.sh Helps

Safeguard.sh provides the dependency visibility and health monitoring that Spotify built into Backstage, as a standalone platform that integrates with your existing development tools. It catalogs dependencies across all your services, scores them for health and risk, automates vulnerability detection and remediation prioritization, and tracks update freshness over time. For organizations that want Spotify-style developer-friendly security without building a custom developer portal, Safeguard.sh delivers the core supply chain intelligence layer out of the box.
