Infrastructure Security

Rate Limiting in Package Registries: Balancing Security and Developer Experience

Docker Hub's rate limits broke builds worldwide. Rate limiting is necessary for registry security, but getting it wrong disrupts entire engineering organizations.

Nayan Dey
Security Engineer
5 min read

In November 2020, Docker Hub implemented rate limits for image pulls: 100 pulls per six hours for anonymous users, 200 for authenticated users. The change was necessary to prevent abuse and manage infrastructure costs. The result was chaos. CI/CD pipelines across the industry started failing. Kubernetes clusters could not pull images for new pods. Organizations that had built their entire infrastructure assuming unlimited Docker Hub access scrambled to implement workarounds.

Docker Hub's rate limiting rollout is a case study in the tension between registry security and developer experience. Rate limiting is essential for protecting registry infrastructure. But the implementation determines whether it is a security control or a denial-of-service against your own users.

Why Registries Need Rate Limiting

Infrastructure Protection

Package registries serve terabytes of data daily. Without rate limiting, a single misconfigured CI/CD pipeline running in a tight loop can generate millions of requests, consuming bandwidth and compute resources that should serve the broader community.

Abuse Prevention

Automated tools scan registries for vulnerabilities, scrape metadata, or attempt to exfiltrate private packages. Rate limiting constrains the speed at which these tools can operate.

Cost Management

Bandwidth is expensive. Cloud-hosted registries pay for every byte transferred. Rate limiting controls costs by preventing excessive consumption.

Fair Access

Without rate limits, heavy users can crowd out lighter users during peak times. Rate limiting ensures equitable access to shared infrastructure.

Rate Limiting Strategies

Token Bucket Algorithm

The token bucket algorithm is the most common approach for registry rate limiting. Each client has a bucket that holds a maximum number of tokens. Each request consumes a token. Tokens refill at a constant rate. When the bucket is empty, requests are rejected.

This allows bursts of activity (a build pipeline pulling 20 images in rapid succession) while maintaining a sustainable average rate.
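The token bucket described above can be sketched in a few lines. This is a minimal single-process illustration, not a production limiter (a real registry would keep buckets in shared storage such as Redis); the class and parameter names are my own:

```python
import time

class TokenBucket:
    """Token bucket limiter: permits bursts up to `capacity`
    while sustaining `refill_rate` requests per second on average."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum tokens (burst size)
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last_refill) * self.refill_rate,
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `capacity=20` and `refill_rate=1.0`, a pipeline can pull 20 images back to back, then continues at one pull per second, which matches the burst-plus-sustained-average behavior described above.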

Sliding Window

Sliding window rate limiting tracks requests over a moving time window. It provides smoother rate enforcement than fixed windows, which can allow double the intended rate at window boundaries.
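A sliding-window-log variant, one common way to implement this, keeps the timestamps of recent requests and counts only those inside the trailing window. A minimal sketch (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Sliding-window-log limiter: at most `limit` requests
    in any trailing `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # monotonic times of accepted requests

    def allow(self):
        now = time.monotonic()
        # Evict timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Because the window moves with each request, a client cannot get double the limit by straddling a fixed-window boundary; the tradeoff is memory proportional to the limit per client.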

Tiered Limits

Different client categories get different limits. Anonymous users get the lowest limits. Authenticated users get higher limits. Paying customers get the highest limits or no limits at all.

This is the model Docker Hub adopted. It incentivizes authentication, which improves the registry's ability to track and manage abuse.
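Tier resolution is usually a simple lookup layered on top of whichever limiter you use. A sketch modeled loosely on Docker Hub's tiers; the table values and field names here are illustrative, not Docker Hub's actual schema:

```python
# Illustrative tier table: anonymous < authenticated < paid.
TIER_LIMITS = {
    "anonymous": 100,      # pulls per six-hour window
    "authenticated": 200,
    "paid": None,          # None means unlimited
}

def pull_limit_for(client):
    """Resolve a client record (a dict in this sketch) to its pull limit."""
    if client.get("plan") == "paid":
        return TIER_LIMITS["paid"]
    if client.get("token"):           # any authenticated client
        return TIER_LIMITS["authenticated"]
    return TIER_LIMITS["anonymous"]
```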

Implementing Rate Limits for Internal Registries

If you operate internal registries (Harbor, Nexus, Artifactory), implementing your own rate limiting protects your infrastructure while avoiding the disruption that external rate limits cause.

Identify Rate Limit Dimensions

Rate limit by client identity rather than IP address. In Kubernetes environments, many pods share the same egress IP. IP-based rate limiting treats an entire cluster as a single client, which is rarely the intent.

Use API tokens, service account identifiers, or client certificates as the rate limit key. This provides per-service or per-pipeline granularity.
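Key selection can be expressed as a small fallback chain: prefer a stable client identity, and use the IP only for anonymous traffic. The header name `X-Service-Account` below is a made-up example, not a registry standard, and a real implementation would verify the bearer token and key on its subject rather than the raw token:

```python
def rate_limit_key(headers, peer_ip):
    """Derive a rate limit key, preferring client identity over IP."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        # Real code would validate the token and use its subject claim;
        # keying on the raw token here keeps the sketch short.
        return "token:" + auth[len("Bearer "):]
    service_account = headers.get("X-Service-Account")  # hypothetical header
    if service_account:
        return "sa:" + service_account
    return "ip:" + peer_ip  # last resort: anonymous clients only
```

With this scheme, two pipelines behind the same NAT gateway get independent budgets instead of exhausting a shared per-IP one.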

Set Limits Based on Measured Usage

Before implementing rate limits, measure your actual usage patterns. What is the normal request rate for your busiest CI/CD pipeline? What does a typical developer workstation generate? Set limits above normal usage but below the level that would cause infrastructure problems.
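One way to turn measurements into a limit is to take a high percentile of observed per-client rates and add headroom, so legitimate peaks pass while runaway clients are stopped. The percentile and headroom factor below are illustrative defaults, not recommendations:

```python
def suggest_limit(observed_rates, percentile=0.99, headroom=2.0):
    """Suggest a per-client limit from a list of observed request rates:
    take a high percentile of normal usage, then multiply by headroom."""
    ranked = sorted(observed_rates)
    idx = min(len(ranked) - 1, int(len(ranked) * percentile))
    return ranked[idx] * headroom
```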

Implement Graceful Degradation

When a client exceeds its rate limit, return appropriate HTTP status codes and headers. 429 Too Many Requests with a Retry-After header tells the client exactly when to retry. This is far better than silently dropping connections or returning generic errors.
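On the client side, the payoff of this contract is that retries can be exact rather than guessed. A sketch of a pull wrapper that honors `Retry-After` and falls back to exponential backoff when the header is absent; `pull` is a stand-in for your actual HTTP client call:

```python
import time

def pull_with_backoff(pull, max_attempts=5):
    """Call `pull()` -- any function returning (status, headers, body) --
    retrying on 429 as the registry instructs via Retry-After."""
    for attempt in range(max_attempts):
        status, headers, body = pull()
        if status != 429:
            return status, body
        # Retry-After in delay-seconds form; otherwise back off exponentially.
        delay = float(headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```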

Communicate Limits Clearly

Document your rate limits. Include them in API responses via standard headers. Provide dashboards where teams can see their current usage relative to their limits.
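For the response headers, a common convention is the `X-RateLimit-*` family (an IETF draft standardizes an unprefixed `RateLimit-*` form). A sketch of building them; the reset value here is a Unix timestamp, one of several conventions in the wild:

```python
def rate_limit_headers(limit, remaining, reset_epoch):
    """Build de facto X-RateLimit-* response headers.
    `reset_epoch` is the Unix time when the window resets."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_epoch),
    }
```

Attached to every response, these headers let clients throttle themselves before they ever hit a 429.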

Surviving External Rate Limits

Authenticate Everything

Most registries provide higher rate limits for authenticated requests. Ensure every system that pulls from external registries is authenticated. This includes CI/CD runners, Kubernetes nodes, developer workstations, and staging environments.

Implement Pull-Through Caches

A pull-through cache sits between your infrastructure and the upstream registry. The first request for an image goes to the upstream registry. Subsequent requests are served from the cache. This dramatically reduces the number of upstream requests.

Harbor, Nexus, and Artifactory all support pull-through caching for Docker images and other package formats.
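The core caching logic is simple to state: fetch from upstream only on a miss, and serve everything else locally. A toy in-memory sketch (a real cache like Harbor's persists blobs to disk and handles auth, expiry, and concurrency); `fetch_upstream` is a stand-in for the real registry client:

```python
class PullThroughCache:
    """Minimal pull-through cache: serve blobs from a local store,
    contacting upstream only on a cache miss."""

    def __init__(self, fetch_upstream):
        self.fetch_upstream = fetch_upstream
        self.store = {}             # digest -> blob bytes
        self.upstream_requests = 0  # how much rate limit budget we spent

    def get(self, digest):
        if digest not in self.store:
            self.upstream_requests += 1
            self.store[digest] = self.fetch_upstream(digest)
        return self.store[digest]
```

Because image layers are content-addressed by digest, cached blobs never go stale, which is what makes this strategy so effective: a layer pulled by a thousand pods costs one upstream request.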

Pre-Pull Critical Images

For images that are critical to cluster operation (ingress controllers, monitoring agents, operators), pre-pull them to all nodes rather than relying on just-in-time pulling. This ensures that node scaling is not blocked by registry rate limits.

Use Multiple Registries

Mirror critical images across multiple registries. If Docker Hub rate limits are causing problems, also push your images to ghcr.io, AWS ECR, or your internal registry. Configure fallback sources in your deployment manifests.

Monitoring Rate Limit Impact

Track 429 Responses

Monitor for HTTP 429 responses across your infrastructure. A spike in 429 errors from an upstream registry indicates you are hitting rate limits and need to adjust your pulling strategy.
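In practice this is a counter over a trailing window with an alert threshold; most teams would implement it in their metrics stack (e.g. Prometheus), but the logic is small enough to sketch directly. The threshold and window below are illustrative:

```python
import time
from collections import deque

class RateLimitAlarm:
    """Track upstream 429 responses over a trailing window and
    flag a spike. Thresholds are illustrative, not recommendations."""

    def __init__(self, threshold=10, window=300.0):
        self.threshold = threshold  # 429s that constitute a spike
        self.window = window        # trailing window in seconds
        self.events = deque()       # times of recent 429s

    def record(self, status, now=None):
        """Record a response; return True if the alarm should fire."""
        now = time.monotonic() if now is None else now
        if status == 429:
            self.events.append(now)
        # Evict 429s that have aged out of the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```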

Measure Pull Latency

Requests approaching a rate limit often spend time queued or throttled before they are rejected outright. Monitor pull latency alongside error rates to detect rate limiting before it causes hard failures.

Alert on Build Failures

Correlate build failures with registry rate limit metrics. If builds start failing with timeout errors, the root cause may be rate limiting rather than a code problem.

How Safeguard.sh Helps

Safeguard.sh helps organizations understand and optimize their dependency pull patterns by providing visibility into which images and packages are consumed across the organization. This data helps right-size your caching infrastructure, identify redundant pulls that waste rate limit budget, and plan your registry strategy to minimize exposure to external rate limits. Safeguard.sh's SBOM generation also helps identify which dependencies are most critical and should be prioritized for local caching.
