Incident Analysis

Docker Hub Exposed Secrets at Scale 2024

Researchers keep finding valid AWS, GitHub, and cloud credentials baked into public Docker Hub images. What the 2024 data shows and how to stop shipping secrets.

Shadab Khan
Security Engineer
8 min read

Secret leakage via container images is not a new finding, but the 2024 round of research across Aqua, Orca, Wiz, and academic groups produced numbers sharp enough to force a conversation. Public Docker Hub continues to carry tens of thousands of images with valid, high-privilege credentials. The leaked material covers AWS IAM keys, GitHub personal access tokens, GCP service-account JSON, Azure client secrets, Slack webhooks, and database credentials. This is not a long-tail problem on abandoned images. A meaningful fraction appears in images built by organizations you have heard of and pulled millions of times per month.

This post summarizes what the 2024 research found, why secrets still ship in images despite a decade of "do not do this" advice, and what durable engineering fixes actually work.

What did the 2024 Docker Hub research actually find?

The 2024 research found that a significant minority of public Docker Hub images contain at least one exposed credential and that a non-trivial fraction of those credentials are live at the time of discovery. Aqua's "Shadow Repositories" work, the Binarly trivy-scan dataset, and academic analyses published at USENIX and RAID converge on a few stable numbers:

  • Around 10 percent of sampled public images contain material that a secret scanner flags with medium-to-high confidence.
  • Between 4 and 7 percent of flagged secrets are confirmed live when responsibly tested against the corresponding API endpoint.
  • AWS IAM credentials dominate the cloud category. GitHub tokens dominate the source-control category. Slack webhooks dominate the business-communication category.
  • A single image tag can contain between one and several hundred credentials when build artifacts like .env files, CI variables, or bundled secrets directories are accidentally included.

The research is consistent year over year. The 2024 numbers are not materially different from 2022 or 2023. What changed in 2024 is the sophistication of the scanning pipeline public researchers now use, which means more subtle leakage - secrets in JWT form, in encrypted-at-rest artifacts with accompanying keys, in dev databases dumped into layers - is now catchable.

How do credentials end up in public images?

Credentials end up in public images through four recurring engineering mistakes: build-time environment variables that get baked into layers, .env files copied with COPY . ., CI caches that include secret material, and developer machines pushing local scratch images to public repositories. Each of these has a cleaner pattern that avoids the leak, but the default path in many build systems is the unsafe one.

The build-time environment variable issue deserves specific attention because developers assume ARG and ENV values disappear after the build. They do not. Any RUN command that references the value embeds the output in a layer, and any ENV set in the final image appears in its metadata. An attacker pulling the image can extract both by running docker history and inspecting layer tarballs with standard tooling.
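As a rough illustration of how little effort that extraction takes, the commands below inspect a hypothetical image named example/app:latest; the image name and the AWS key-ID pattern are placeholders, not from the research datasets:

```bash
# List every layer-creating instruction; RUN lines that referenced a
# build-time ARG and any ENV baked into the final image show up here.
docker history --no-trunc example/app:latest

# ENV values also live in the image config metadata.
docker inspect --format '{{json .Config.Env}}' example/app:latest

# Export the image and search the layer blobs for AWS-shaped key IDs.
docker save example/app:latest -o app.tar
mkdir -p app-extracted && tar -xf app.tar -C app-extracted
grep -ra -E 'AKIA[0-9A-Z]{16}' app-extracted/ || true
```

Nothing here requires attacker tooling; it is the standard Docker CLI plus grep, which is exactly why harvesters can automate it at registry scale.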

The COPY . . pattern is the most common offender. A developer tells Docker to copy the repository root, the repository contains .env, secrets.yaml, or .aws/credentials, and the resulting image contains all of it. .dockerignore exists to prevent this but is routinely missing or incomplete.
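A reasonable starting point, assuming a typical application repository, looks like the entries below; the exact list is repository-specific, so treat it as a sketch to be maintained alongside the code, not a complete inventory:

```
# Secrets and local configuration
.env
.env.*
*.pem
*.key
secrets.yaml
.aws/
# VCS history and build noise that has no business in an image
.git/
node_modules/
```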

The CI cache issue is newer and more subtle. BuildKit caches downloaded layers and metadata. In shared CI environments without per-job cache isolation, a secret exposed in one job can be cached and replayed into a later job's image. Organizations that run large multi-tenant build clusters should audit their cache configurations specifically for this path.

What does the actual blast radius of a leaked image look like?

The actual blast radius of a leaked image looks like full compromise of whatever the leaked credential touches, which is often larger than the developer who created the image expected. A single AWS access key with PowerUserAccess grants control of every service in the account. A single GitHub PAT with repo scope grants read and write across every repository the user can access, including private ones. A Slack webhook grants spoofed-message capability into the linked workspace.

The time-to-exploit for leaked credentials in public images is short. Multiple independent researchers have demonstrated that newly pushed public images containing secrets are scraped and tested within minutes by automated harvesters. Permiso's and GitGuardian's 2024 write-ups included honeypot data showing that fresh AWS keys began to see enumeration traffic within ten minutes of being posted.

Post-exploitation by criminal actors is almost entirely commodity at this point: LLMJacking for GenAI abuse, cryptomining for low-friction monetization, and S3 data exfiltration with extortion follow-up. Nation-state actors use the same credentials more quietly for reconnaissance and lateral movement.

Why hasn't Docker Hub's own scanning closed the problem?

Docker Hub's own scanning has not closed the problem because scanning catches only a fraction of exposure patterns, because image owners retain the authority to republish or retag, and because the scanning happens after upload rather than as a gate. Docker Scout and the underlying scanning pipeline are good but not perfect at secret detection. High-entropy strings that are not shaped like known credential formats routinely slip through, and format-aware detectors miss secrets in compressed or base64-encoded form.

Even when scanning flags a known secret, the response is to notify the image owner, not to block the pull. Users continuing to pull a flagged image do so at their own risk, and the flagged status does not propagate to downstream images that inherit from a leaky base.

A structural fix would require upload-time rejection of images containing high-confidence secrets, which would be operationally disruptive but effective. The policy debate has been ongoing for years, and the status quo is that secret scanning is a detective control, not a preventive one.

What engineering practices actually prevent secret leakage?

Engineering practices that actually prevent secret leakage are multi-stage builds that exclude the build context from the final image, BuildKit secret mounts, and runtime secret injection via orchestration. These are not new recommendations; they are the ones that hold up under audit when implemented correctly.

Multi-stage builds keep the build tooling, source tree, and intermediate artifacts in the builder stage. The final stage inherits only the compiled binary or runtime artifact. This structurally prevents COPY . . from leaking repository contents because the final image's copy source is the builder stage, not the host filesystem.
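A minimal multi-stage sketch, here for a Go service with placeholder module paths and image names, keeps the whole source tree in the builder stage and copies only the compiled binary forward:

```dockerfile
# Builder stage: has the full source tree (including any stray .env),
# but its layers never ship as part of the final image.
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Final stage: copies only the artifact from the builder stage,
# so repository contents from the build context never reach the image.
FROM gcr.io/distroless/static-debian12
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
```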

BuildKit secret mounts (--mount=type=secret) make a secret available to a specific RUN command without embedding it in any layer. The secret is sourced from the build host at invocation time and is not persisted to the image. This is the correct pattern when a build legitimately needs a private registry credential or a license key.
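A sketch of the pattern, assuming a private Python index that authenticates via a .netrc file; the index URL, secret id, and package name are placeholders:

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim

# The credential is mounted at the target path only for this RUN step;
# it never lands in a layer or in the image metadata.
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    pip install --no-cache-dir \
      --index-url https://pypi.internal.example/simple internal-package

# Built with something like:
#   docker build --secret id=netrc,src=$HOME/.netrc -t example/app .
```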

Runtime injection - Kubernetes Secrets, HashiCorp Vault, AWS Secrets Manager, cloud-provider workload identity - means the image never contains the credential. The orchestrator injects it at container start. This is the right model for production credentials and has the added benefit of supporting rotation without rebuilds.
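One concrete flavor of runtime injection is a Kubernetes Deployment that reads a database password from a Secret at container start; all names below are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 1
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: example/api:1.4.2   # the image itself carries no credential
          env:
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: api-db-credentials   # managed out of band, rotatable without a rebuild
                  key: password
```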

Three additional practices close common holes:

  1. Use a .dockerignore by default in every repository. Maintain it like source code.
  2. Scan every image produced by CI before it reaches any registry, public or private, and fail the build on high-confidence matches (a minimal gate is sketched after this list).
  3. Rotate credentials automatically when scanning finds them. Notification without rotation is a slow leak.
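For item 2, a minimal CI gate might look like the sketch below. It assumes Trivy as the secret scanner and a GitLab-style commit SHA variable; any secret-aware scanner slots in the same way, and the registry path is a placeholder:

```bash
# Build locally, scan before any push, and fail the job on findings.
docker build -t registry.example.com/team/app:"${CI_COMMIT_SHA}" .
trivy image --scanners secret --exit-code 1 \
  registry.example.com/team/app:"${CI_COMMIT_SHA}"
# The push only runs if the scan above exited 0.
docker push registry.example.com/team/app:"${CI_COMMIT_SHA}"
```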

What do incident responders do when a leaked image is confirmed?

Incident responders treat a confirmed leaked image as a disclosed credential event for every secret in that image, rotating immediately and investigating whether any of the secrets were used before rotation completed. The first-hour checklist:

  • Identify every tag of the image that has ever been pushed and the time window each was publicly available.
  • Extract every layer, run credential-aware scanning, and produce a ranked list of secrets found (one way to do this is sketched after this checklist).
  • Rotate each secret at its issuing authority. For cloud keys, disable immediately and review access logs for the window.
  • Check registry pull logs to estimate exposure volume. Docker Hub surfaces pull counts per tag.
  • Review cloud and source-control logs for any API calls authenticated by the leaked credentials during the window.
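For the layer-extraction step, one low-friction approach is to export the image and scan the archive offline; the image name, tag, and the choice of Trivy as the scanner are illustrative, not prescribed by the research:

```bash
# Pull the exact tag under investigation and scan the archived image
# offline; Trivy walks every layer in the tarball, including layers
# shadowed by later ones.
docker pull example/app:1.9.0
docker save example/app:1.9.0 -o app.tar
trivy image --scanners secret --input app.tar \
  --format json --output secrets-report.json
```

The resulting report is the input to the ranked rotation list; every hit gets rotated at its issuing authority regardless of whether the responder believes it was exercised.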

After the initial rotation, the follow-up work is to identify how the secret entered the image and close that path. In our experience, the root cause almost always falls into one of the four categories listed earlier, which means the fix is structural rather than a matter of training.

How Safeguard.sh Helps

Safeguard.sh addresses the secrets-in-image problem from several angles:

  • Reachability analysis correlates container images with their actual runtime reachability, so findings are scoped to the workloads that matter, filtering out inert image scans and cutting CVE-plus-secret noise by 60 to 80 percent.
  • Griffin AI autonomous remediation rotates exposed cloud credentials, regenerates build pipelines to use secret mounts, and rebuilds affected images with verified clean layers rather than leaving the cleanup on a ticket.
  • Eagle malware classification flags secondary payloads - LLMJacking agents, cryptominers, data-exfil clients - that opportunistic attackers drop using harvested credentials.
  • SBOM generation with 100-level dependency depth surfaces inherited base-image risk and secrets buried several layers down.
  • Container self-healing restores compromised workloads to known-clean images, and TPRM extends the same visibility to vendor images and third-party registries in your supply chain.
