AI Security

AI-Generated Dockerfile Vulnerability Patterns

LLM-generated Dockerfiles repeat the same six mistakes. Here is the pattern catalog and how to catch them before they ship.

Shadab Khan
Security Engineer
7 min read

AI coding assistants write a lot of Dockerfiles. They write them fast, they write them confidently, and they write them with an almost uncanny repertoire of recurring security mistakes. After reviewing a few hundred AI-generated Dockerfiles from different tools in 2025, I kept seeing the same patterns. This post catalogs them and describes the detection rules that actually catch them in CI.

Why do AI-generated Dockerfiles trend insecure?

Because the training corpus is public Dockerfiles, and public Dockerfiles are overwhelmingly examples and tutorials rather than production-hardened images. The typical Dockerfile in a GitHub README is optimized for "works on a developer's laptop" and has no incentive to be minimal, non-root, or pinned. An LLM trained on that corpus produces plausible-looking Dockerfiles with the same property. The model is not wrong about what a "normal" Dockerfile looks like; it is correctly reproducing an insecure norm.

This means the fixes are not about the model being smarter. They are about either giving the model better defaults (stronger system prompts, curated example sets) or catching the issues in the review pipeline. Most teams should assume the latter and build for it.

What does the pattern catalog actually look like?

Six patterns account for the vast majority of issues. First, FROM without a digest pin. The model writes FROM python:3.12 or, slightly better, FROM python:3.12-slim, but almost never FROM python:3.12-slim@sha256:.... This leaves the image floating on a mutable tag, and tag poisoning has been observed in public registries (less common than typosquatting, but not zero). Digest pinning is trivial and almost never happens in AI-generated Dockerfiles.
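The fix is a one-line change. A sketch of the before and after (the digest below is a placeholder; resolve the real one for your tag with `docker buildx imagetools inspect`):

```dockerfile
# Before: floating tag — the content behind "python:3.12-slim" can change
# between builds without any change to the Dockerfile.
#   FROM python:3.12-slim

# After: digest pin — the build gets exactly this image or fails.
# (Placeholder digest; resolve the real one with
#  `docker buildx imagetools inspect python:3.12-slim`.)
FROM python:3.12-slim@sha256:<digest>
```

Keeping the tag alongside the digest, as above, preserves human readability while the digest does the actual pinning.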

Second, installing packages without version constraints. RUN apt-get install -y curl wget gives you whatever is current on the mirror at build time. Reproducibility goes out the window, and if one of those packages ships a compromised version between your CI runs, you silently pull it. The production pattern is apt-get install -y curl=X.Y.Z-N wget=X.Y.Z-N, and AI assistants almost always skip it.
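A hardened version of that install step might look like the following sketch. The version strings are placeholders, matching the `X.Y.Z-N` form above; list the real ones with `apt-cache policy curl` inside the base image:

```dockerfile
# Pin exact package versions and skip recommended extras; clean the
# apt lists so they do not bloat the layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
      curl=<X.Y.Z-N> \
      wget=<X.Y.Z-N> \
 && rm -rf /var/lib/apt/lists/*
```

The cost is that version bumps become explicit diffs in the Dockerfile, which is exactly the point: a dependency change should be visible in review.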

Third, running as root. The default for most base images is root, and unless the Dockerfile explicitly creates and switches to a non-root user, the container runs as root. AI-generated Dockerfiles frequently skip the USER instruction entirely. This turns any arbitrary code execution inside the container into at least a root-in-container exploit, which in poorly-configured runtimes becomes a breakout.
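On a Debian-based image, the non-root pattern is two instructions (Alpine images use `addgroup`/`adduser` instead):

```dockerfile
# Create an unprivileged system user and switch to it before the entrypoint.
RUN groupadd --system app \
 && useradd --system --gid app --create-home app
USER app
```

Placing `USER app` after the install steps but before `CMD`/`ENTRYPOINT` keeps root available for package installation while ensuring the running process is unprivileged.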

Fourth, leaking build secrets. A Dockerfile that runs RUN git clone https://user:token@github.com/... bakes the token into an image layer. Even if a later layer removes the file, the layer with the token is still in the image. AI assistants cheerfully write this pattern when given a task like "pull our internal repo during build," and reviewers often miss it because it does not look wrong at first glance.
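The BuildKit secret mount is the pattern that avoids this: the token is available only during the single `RUN` step and never lands in a layer. A sketch, with a hypothetical secret id and repository path:

```dockerfile
# syntax=docker/dockerfile:1
# Pass the token at build time with:
#   docker build --secret id=gh_token,src=./token.txt .
# The secret is mounted at /run/secrets/<id> for this RUN step only
# and does not persist into any image layer.
RUN --mount=type=secret,id=gh_token \
    git clone "https://x-access-token:$(cat /run/secrets/gh_token)@github.com/org/repo.git"
```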

Fifth, broad COPY . . when the repo contains secrets or build artifacts. A .env, a .aws, or a node_modules directory with cached credentials ends up in the final image. The right mitigation is a strong .dockerignore, which the AI assistant does not write by default and which the reviewer rarely checks.
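A starting-point `.dockerignore` for the cases named above (extend it per repository; the entries here are the common offenders, not an exhaustive list):

```
# .dockerignore — keep secrets and local artifacts out of the build context.
.env
.aws/
.git/
node_modules/
*.pem
```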

Sixth, curl-pipe-bash for installations. RUN curl https://install.example.com/install.sh | bash is a pattern that shows up in a striking number of AI-generated Dockerfiles, often copied from vendor documentation. It breaks reproducibility and makes your build a moving target for whoever controls install.example.com. If that domain gets compromised, or if you hit the wrong mirror, your next build includes whatever they serve. This is an identical risk class to the PyTorch nightly compromise, just upstream of the package manager.
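The replacement is to fetch a specific release and verify its checksum before executing anything. A sketch, with placeholder URL and digest (use whatever checksum the vendor publishes for the pinned version):

```dockerfile
# Download a pinned installer release and verify it before running it.
# (URL and checksum are placeholders.)
RUN curl -fsSLo /tmp/install.sh https://install.example.com/v1.2.3/install.sh \
 && echo "<expected-sha256>  /tmp/install.sh" | sha256sum -c - \
 && bash /tmp/install.sh \
 && rm /tmp/install.sh
```

Now a compromised or swapped installer fails the checksum and the build, instead of silently executing.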

How should we catch these in CI?

Static Dockerfile analysis is the right layer. Hadolint, Trivy's Dockerfile checks, and Conftest with an OPA policy set will catch most of the catalog above. The rules to enable: require digest pinning on FROM, require pinned package versions on apt, apk, yum, and dnf, require a USER instruction that is not root, flag curl | bash patterns, flag build-time secrets in RUN commands, and enforce .dockerignore presence.
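As a concrete starting point, a `.hadolint.yaml` sketch that escalates the relevant built-in rules to errors (rule codes per hadolint's published rule list; verify against the version you run):

```yaml
# .hadolint.yaml — minimal sketch: fail CI on the catalog patterns above.
failure-threshold: warning
override:
  error:
    - DL3006   # FROM without an explicit tag (pair with a digest-pin policy)
    - DL3008   # apt-get install without pinned versions
    - DL3002   # last USER is root
```

Run it in CI with `hadolint Dockerfile`; `trivy config` and a Conftest policy set cover the checks hadolint does not express, such as digest pinning and secret patterns in `RUN`.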

These rules are not new. The Dockerfile security community has been publishing them for years. What is new is that the rate of generated Dockerfiles arriving in PRs has gone up sharply, which means the rules need to run on every PR rather than being a one-time audit. The CI time cost is negligible; the policy noise, if the rules are well tuned, is also negligible.

What about runtime verification?

Scan the built image, not just the Dockerfile. A Dockerfile review catches build-time mistakes; an image scan catches whatever actually ended up in the layers. The combination is necessary. An image scanner will find the CVEs in the base image and the installed packages; the Dockerfile review will find the structural problems that cause those CVEs to arrive in the first place.

The other runtime control worth mentioning: restrict capabilities. securityContext in Kubernetes, --cap-drop=ALL --cap-add=... in plain Docker. AI-generated manifests for Kubernetes mirror the AI-generated Dockerfile problem and frequently skip capability dropping. A non-root user in a container that still has NET_ADMIN and SYS_PTRACE is less useful than it sounds.
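In Kubernetes terms, the container-level `securityContext` sketch looks like this (field names are from the core Pod spec; the UID is an arbitrary non-zero example):

```yaml
# Container-level hardening: non-root plus all capabilities dropped.
securityContext:
  runAsNonRoot: true
  runAsUser: 10001
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

`runAsNonRoot: true` also acts as a backstop for the Dockerfile `USER` pattern: if the image runs as root anyway, the kubelet refuses to start it.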

How should we handle multi-stage builds?

Multi-stage is mostly a win for supply chain security because it lets you build with a full toolchain and ship with a minimal runtime. AI-generated Dockerfiles increasingly use multi-stage, but they often carry over unnecessary artifacts in the COPY --from=builder step, defeating part of the purpose. The rule here is explicit: the final stage should copy only what it needs to run, from specific paths, not the entire /app directory of the builder stage.
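A sketch of the explicit-copy discipline for a Node service (paths and scripts are illustrative; tags are left unpinned here for brevity, but the first pattern in the catalog still applies):

```dockerfile
FROM node:22-slim AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:22-slim
WORKDIR /app
# Install only production deps; copy only the built output —
# not the builder's whole /app, which includes source, dev deps, and caches.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/server.js"]
```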

Multi-stage also opens a subtle class of bugs where the builder stage has a dependency that the final stage implicitly needs but does not install. This is not a security issue per se, but it makes the final image hard to reason about, and hard-to-reason-about images are where security regressions hide.

What about the distroless path?

Distroless or scratch final stages are the gold standard for minimizing attack surface, and AI assistants have gotten better at suggesting them when prompted. They rarely suggest them unprompted. If your team produces a lot of services with similar shapes, a template that defaults to distroless (via a base image or a tool like ko for Go services) removes most of the Dockerfile generation surface and leaves the AI to fill in a small, constrained set of instructions. This is the highest-leverage intervention for teams shipping many services.
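For a Go service, the distroless template is short enough to standardize org-wide. A sketch, with a hypothetical package path (the `nonroot` variant of the distroless static image runs as an unprivileged user and ships no shell or package manager):

```dockerfile
FROM golang:1.23 AS builder
WORKDIR /src
COPY . .
# Static binary so the scratch-like runtime needs no libc.
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

FROM gcr.io/distroless/static:nonroot
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```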

What about policy as code for the container stack?

Policy as code is how you stop relitigating these issues in every PR review. A set of OPA or Kyverno policies that encode the rules above (digest-pinned base images, non-root user, no curl-pipe-bash, no secrets in RUN steps) runs once in CI and applies to every Dockerfile in the org. When an AI assistant writes a Dockerfile that violates policy, the PR fails before a human has to notice the same mistake for the hundredth time.
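As one concrete rule from that set, a Conftest/OPA sketch that denies unpinned base images. This assumes Conftest's Dockerfile parser, which yields one object per instruction with `Cmd` and `Value` fields; adapt to your policy layout:

```rego
# policy/dockerfile.rego — deny FROM lines without a digest pin.
package main

deny[msg] {
    input[i].Cmd == "from"
    image := input[i].Value[0]
    not contains(image, "@sha256:")
    msg := sprintf("base image %q is not digest-pinned", [image])
}
```

Run with `conftest test Dockerfile`; the same package accumulates the non-root, curl-pipe-bash, and secrets rules as further `deny` blocks.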

The subtle benefit is that policy as code also lets you upgrade your rules centrally. When a new class of issue is discovered (say, a new supply chain attack pattern on a specific base-image family), you write one rule and every downstream service gets the check. Without central policy, you are asking every service owner to remember and apply the new rule, which in practice means they do not. The Ultralytics PyPI incident showed how quickly a single bad package can land in dozens of Docker images; having central policy means you can add a deny-rule for a specific known-bad version and trust it to apply everywhere.

How Safeguard.sh Helps

Safeguard.sh applies reachability analysis to the software in each container image, cutting 60 to 80 percent of the CVE noise that makes Dockerfile-level security reviews feel futile. Griffin AI flags the specific patterns listed above (unpinned FROM, root user, curl-pipe-bash, leaked build secrets) as part of PR review and ties the findings to the affected SBOM components. SBOM generation covers every layer of the built image, with 100-level dependency depth catching transitive compromises even when they arrive through a supposedly-minimal base image. Container self-healing rebuilds images when base-image or dependency fixes land, so production does not run last quarter's vulnerable image while the engineers debate whether the finding is reachable.
