
Witness Attestation Collection Workflow

Witness turns build steps into a chain of signed attestations. Here is how we use it in production pipelines, what it does well, and where the edges still cut.

Shadab Khan
Security Engineer

If you have read the in-toto specification and wondered how you are actually supposed to emit attestations across a multi-step build, Witness is the most mature answer. Witness, maintained by the TestifySec team and donated to the in-toto project, is a CLI that wraps build commands, collects attestations about what happened during the command, signs them, and optionally pushes them to an Archivista backend for later retrieval. It is the closest thing the community has to a general-purpose attestation collector.

We have been running Witness in production pipelines since mid-2023, and this post is the workflow we arrived at after enough rewrites. We will cover the command structure, the attestor set that makes sense in practice, the policy format, and the Archivista integration. We will also cover the parts that hurt: the learning curve, the quiet assumptions about Git state, and the places where the tool's error messages do not point to the actual problem.

What does the witness run command actually do?

The central command is witness run. It takes a subcommand to execute (the actual build step), a set of attestors to run alongside, and a signing configuration. The command structure looks roughly like witness run --step build --attestors git,environment,command-run --signer-file-key-path key.pem -- make build. The double-dash separates Witness flags from the wrapped command.
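When the invocation is generated by CI tooling rather than typed by hand, it helps to assemble it as an argument list so the double-dash boundary cannot be misplaced. The following is a minimal sketch, not part of Witness itself: build_witness_run is a hypothetical helper, and the flag names are copied verbatim from the command quoted above, so check them against witness run --help for the version you have installed.

```python
import shlex
from typing import List, Sequence


def build_witness_run(
    step: str,
    attestors: Sequence[str],
    key_path: str,
    wrapped: Sequence[str],
) -> List[str]:
    """Assemble argv for a `witness run` call (hypothetical helper).

    Flag names are taken from the command quoted in the article and are
    not verified against any particular Witness release.
    """
    if not wrapped:
        raise ValueError("a wrapped command is required after `--`")
    return [
        "witness", "run",
        "--step", step,
        "--attestors", ",".join(attestors),
        "--signer-file-key-path", key_path,
        "--",  # everything after this belongs to the wrapped command
        *wrapped,
    ]


argv = build_witness_run(
    step="build",
    attestors=["git", "environment", "command-run"],
    key_path="key.pem",
    wrapped=["make", "build"],
)
print(shlex.join(argv))
```

Passing the list straight to subprocess.run(argv, check=True) avoids shell quoting problems in the wrapped command.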

Under the hood, Witness forks the subcommand, captures stdout/stderr, records the process exit code, and collects attestations from each enabled attestor. At the end, it produces a DSSE envelope containing an in-toto Statement whose predicate type is https://witness.dev/attestations/collection/v0.1. The predicate body contains an array of sub-attestations, one per attestor. The whole envelope is signed and either written to a local file or pushed to Archivista.
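To make the nesting concrete, here is a rough sketch of unpacking such an envelope. The outer DSSE fields (payloadType, payload, signatures) follow the DSSE format. Everything inside the predicate is an assumption made for illustration: the attestations array with a type field per entry, the example/... type strings, and the sample subject are placeholders, and the predicate type URL is simply the one quoted above. The sketch only decodes; it does not verify the signature.

```python
import base64
import json
from typing import Dict, List

# Predicate type as quoted in the article.
COLLECTION_TYPE = "https://witness.dev/attestations/collection/v0.1"


def make_envelope(statement: Dict) -> Dict:
    """Wrap an in-toto Statement in an (unsigned, illustrative) DSSE envelope."""
    payload = base64.b64encode(json.dumps(statement).encode("utf-8"))
    return {
        "payloadType": "application/vnd.in-toto+json",
        "payload": payload.decode("ascii"),
        # Placeholder signature; a real envelope carries a valid one.
        "signatures": [{"keyid": "example-key", "sig": "ZXhhbXBsZQ=="}],
    }


def attestor_types(envelope: Dict) -> List[str]:
    """Decode the payload and list the sub-attestation types it contains."""
    statement = json.loads(base64.b64decode(envelope["payload"]))
    if statement.get("predicateType") != COLLECTION_TYPE:
        raise ValueError("not an attestation collection")
    # Assumed layout: predicate.attestations is a list of {"type": ..., ...}.
    return [entry["type"] for entry in statement["predicate"]["attestations"]]


statement = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{"name": "app.bin", "digest": {"sha256": "0" * 64}}],
    "predicateType": COLLECTION_TYPE,
    "predicate": {
        "name": "build",
        "attestations": [
            {"type": "example/git", "attestation": {}},
            {"type": "example/command-run", "attestation": {}},
        ],
    },
}
envelope = make_envelope(statement)
print(attestor_types(envelope))
```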

The Witness v0.4 release in late 2023 changed the attestation collection predicate to support multiple named steps in a single envelope, which suits CI pipelines that should not emit a new file for every step. v0.6 added the Kubernetes attestor and the SLSA provenance attestor, which shifted Witness from "generic collector" to "first-class SLSA emitter."

Which attestors are actually worth running?

Witness ships with a long list of attestors, and running all of them is both slow and noisy. In practice, the set we recommend is: git, environment, material, command-run, product, and either slsa or oci.

The git attestor records the current commit SHA, the branch, the remote URL, and the clean/dirty state of the working tree. It is the attestor most often misconfigured, because it fails silently when the working directory is not inside a Git repository. If the wrapped command runs in a directory outside the repo (for example, a test that changes directory to /tmp), the git attestor returns empty results rather than the top-level repo state. The fix is to pass --workingdir explicitly.
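The failure is easier to reason about once you picture repository discovery as a walk up the directory tree. The sketch below is an illustration of that idea, not Witness's actual lookup: find_repo_root and its ceiling parameter are hypothetical names, and the ceiling exists only to keep the search bounded, similar in spirit to Git's GIT_CEILING_DIRECTORIES.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path, ceiling: Optional[Path] = None) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found.

    Returns None if no repository is found before reaching `ceiling`
    (or the filesystem root when no ceiling is given).
    """
    current = start.resolve()
    stop = ceiling.resolve() if ceiling is not None else None
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
        if stop is not None and candidate == stop:
            break
    return None


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "repo" / ".git").mkdir(parents=True)
    (base / "repo" / "src" / "pkg").mkdir(parents=True)
    (base / "scratch").mkdir()

    # A step running inside the checkout resolves to the repo root.
    found_inside = find_repo_root(base / "repo" / "src" / "pkg", ceiling=base)
    # A step that changed into a scratch directory finds nothing.
    found_outside = find_repo_root(base / "scratch", ceiling=base)

print(found_inside.name if found_inside else None, found_outside)
```

A pre-flight check like this in the pipeline, failing the job when the result is None, turns the silent empty attestation into a visible error.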

The environment attestor records environment variables, which is valuable for reproducibility but dangerous for secret exposure. Witness redacts common secret names (AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN, etc.) by default, but the redaction list is not exhaustive. We keep a project-specific deny list and pass it via --attestor-env-filter-sensitive-key.
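A project deny list is easy to test in isolation before trusting it in a pipeline. The following sketch shows the filtering idea only. It does not reproduce Witness's own redaction logic, and the pattern list, the placeholder string, and the redact_env helper are all hypothetical.

```python
import fnmatch
from typing import Dict, Iterable

# Names the article mentions as redacted by default.
DEFAULT_SENSITIVE = ("AWS_SECRET_ACCESS_KEY", "GITHUB_TOKEN")
# Hypothetical project-specific additions, written as glob patterns.
PROJECT_DENY = ("*_PASSWORD", "INTERNAL_*_KEY")


def redact_env(
    env: Dict[str, str],
    patterns: Iterable[str],
    placeholder: str = "******",
) -> Dict[str, str]:
    """Return a copy of `env` with values of matching variables replaced."""
    pats = tuple(patterns)
    redacted = {}
    for name, value in env.items():
        # Compare upper-cased names so `db_password` is caught as well.
        hit = any(fnmatch.fnmatchcase(name.upper(), p) for p in pats)
        redacted[name] = placeholder if hit else value
    return redacted


sample = {
    "PATH": "/usr/bin",
    "GITHUB_TOKEN": "ghp_example",
    "db_password": "hunter2",
    "INTERNAL_SIGNING_KEY": "abc123",
}
safe = redact_env(sample, DEFAULT_SENSITIVE + PROJECT_DENY)
print(safe)
```

Running a check like this against a captured CI environment is a cheap way to spot variable names the deny list still misses.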

The material and product attestors record file hashes before and after the command runs. Materials are the input files; products are the outputs. Together they play the role of an in-toto link, which is what a Witness policy later verifies. These attestors are the most CPU-expensive because they hash every file in the working directory, so exclude generated directories aggressively via --attestor-material-exclude-pattern.
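The before/after hashing can be sketched in a few lines, which also shows why exclusions matter: every file that is not pruned gets read in full. This is an illustration under stated assumptions, not Witness's implementation. hash_tree and changed are hypothetical helpers, exclusion uses simple glob matching on relative paths, and products are assumed to be the files that are new or modified relative to the materials.

```python
import fnmatch
import hashlib
import os
import tempfile
from typing import Dict, Sequence


def _excluded(rel_path: str, patterns: Sequence[str]) -> bool:
    return any(fnmatch.fnmatchcase(rel_path, p) for p in patterns)


def _sha256(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in chunks so large files are never held in memory whole.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: str, exclude: Sequence[str] = ()) -> Dict[str, str]:
    """Map each file under `root` (relative POSIX path) to its SHA-256."""
    digests: Dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune excluded directories in place so os.walk never descends.
        dirnames[:] = sorted(
            d for d in dirnames
            if not _excluded(
                os.path.normpath(os.path.join(rel_dir, d)).replace(os.sep, "/"),
                exclude,
            )
        )
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name)).replace(os.sep, "/")
            if not _excluded(rel, exclude):
                digests[rel] = _sha256(os.path.join(dirpath, name))
    return digests


def changed(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Files that are new or whose digest differs from the `before` snapshot."""
    return {p: h for p, h in after.items() if before.get(p) != h}


with tempfile.TemporaryDirectory() as root:
    for rel, data in {"src/main.c": b"int main(){}", "node_modules/x.js": b"x"}.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)

    materials = hash_tree(root, exclude=["node_modules"])

    # Stand-in for the wrapped build step producing an output file.
    os.makedirs(os.path.join(root, "out"))
    with open(os.path.join(root, "out", "app.bin"), "wb") as fh:
        fh.write(b"binary")

    products = changed(materials, hash_tree(root, exclude=["node_modules"]))

print(sorted(materials), sorted(products))
```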

The slsa attestor, added in v0.6, synthesizes a SLSA Provenance v1 predicate from the other attestor data. This is the attestor that turns a Witness run into something directly consumable by slsa-verifier. Turn it on whenever the pipeline is intended to produce SLSA-compliant provenance.

How does the policy language work?

Witness policies are the consumer-side half of the story. A policy is a JSON document that declares a set of expected steps, the attestors required for each step, the functionaries (public keys or Sigstore identities) authorized to sign each step, and optional Rego policies evaluated against the attestation contents. witness verify takes a policy, a set of attestations, and an artifact, and checks that the attestations cover the policy's requirements.

The policy concept to internalise is that Witness verifies a chain of steps, not a single attestation. A typical policy might declare source, build, and publish steps, each with its own required attestors and signer keys. The verifier walks the chain, confirms that each step's product hashes match the next step's materials, and only passes if the artifact being verified matches the final step's product.
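The chain walk can be modelled in a few lines. This is a simplified sketch of the idea rather than Witness's verifier: the Step dataclass and verify_chain are hypothetical, signatures and attestor requirements are ignored, and only paths that appear both as a product of one step and as a material of the next are compared.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    name: str
    materials: Dict[str, str] = field(default_factory=dict)  # path -> digest
    products: Dict[str, str] = field(default_factory=dict)   # path -> digest


def verify_chain(steps: List[Step], artifact_digest: str) -> List[str]:
    """Return a list of human-readable errors; an empty list means the chain holds."""
    errors: List[str] = []
    for prev, nxt in zip(steps, steps[1:]):
        for path, digest in nxt.materials.items():
            # Only files handed over from the previous step are compared.
            if path in prev.products and prev.products[path] != digest:
                errors.append(f"{prev.name} -> {nxt.name}: digest mismatch for {path}")
    if not steps or artifact_digest not in steps[-1].products.values():
        errors.append("artifact digest is not a product of the final step")
    return errors


source = Step("source", products={"src.tar": "aaa"})
build = Step("build", materials={"src.tar": "aaa"}, products={"app.bin": "bbb"})
publish = Step("publish", materials={"app.bin": "bbb"}, products={"app.oci": "ccc"})

ok = verify_chain([source, build, publish], artifact_digest="ccc")

# Same pipeline, but the publish step saw a different app.bin than build produced.
tampered = Step("publish", materials={"app.bin": "zzz"}, products={"app.oci": "ccc"})
broken = verify_chain([source, build, tampered], artifact_digest="ccc")

print(ok, broken)
```

Returning every mismatch, instead of stopping at the first, makes it easier to see which hand-off in the chain went wrong.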

This is powerful, and also where teams get stuck. The chain breaks if any step fails to produce the expected product hash, which is common when the build involves non-deterministic outputs (timestamps in archives, build IDs in binaries). The fix is to normalise outputs, typically by setting SOURCE_DATE_EPOCH and using reproducible build flags for the toolchain. Without reproducibility, Witness policy verification becomes flaky in a way that erodes trust in the whole system.
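As one concrete case of normalising outputs, the sketch below builds a tar archive whose bytes depend only on file names and contents. It assumes the timestamp comes from SOURCE_DATE_EPOCH (defaulting to 0 when unset) and pins owner, mode, and ordering. reproducible_tar is a hypothetical helper, and real toolchains will need their own equivalent flags.

```python
import hashlib
import io
import os
import tarfile
from typing import Dict


def reproducible_tar(files: Dict[str, bytes], epoch: int) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names, contents, and epoch."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = epoch          # fixed timestamp instead of "now"
            info.mode = 0o644           # fixed permissions
            info.uid = info.gid = 0     # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

first = reproducible_tar({"b.txt": b"2", "a.txt": b"1"}, epoch)
second = reproducible_tar({"a.txt": b"1", "b.txt": b"2"}, epoch)

print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

The archive is left uncompressed on purpose: gzip headers carry their own timestamp, which would need to be pinned separately.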

Where does Archivista fit?

Archivista is the companion storage service. It accepts DSSE envelopes, indexes them by subject digest and predicate type, and exposes a GraphQL API for retrieval. Running Archivista means you can push attestations from CI without writing them to the artifact registry, and policy verifiers can pull the relevant attestations by artifact hash at verification time.

The operational reality is that Archivista is a lot of infrastructure for small teams. It needs PostgreSQL, a blob store (S3 or compatible), and persistent network reachability from both CI and deployment-time verifiers. If your team has fewer than twenty pipelines, attaching attestations directly to OCI artifacts via cosign and skipping Archivista is almost certainly the right choice. Archivista pays off when you have cross-pipeline verification needs or when attestations need to live beyond the retention policy of the artifact registry.

What are the rough edges?

Three edges have cost us meaningful time. First, Witness's error messages during policy verification rarely point to the actual failing check. Verification against a policy that requires the command-run attestor reports "missing attestation" even when the attestor ran but produced no signed output because the signing key was misconfigured. Turn on --log-level debug and grep the output for "skipping" to find the real cause.

Second, the Sigstore keyless signing flow in Witness uses cosign's underlying libraries, which means it depends on the same OIDC token availability that cosign does. In GitHub Actions this works out of the box; in self-hosted CI it requires explicit OIDC provider configuration. The failure mode is a certificate request that times out, and the Witness logs report it as a signing error rather than an OIDC error.

Third, Witness's handling of very large material sets is memory-bound. A build with more than 100k input files can run out of memory under the default configuration. Increase the memory limit or exclude non-essential directories.

How Safeguard Helps

Safeguard consumes Witness attestation collections directly and unpacks each sub-attestation into its own verifiable unit so your team can write policy against specific attestors rather than the whole collection. We validate the Witness envelope signature, the in-toto statement, and the per-step material/product chain without requiring you to run Archivista. When a chain breaks because a step's product hash did not match the next step's material, Safeguard surfaces the exact mismatch and the commit that introduced it. Customers who already run Archivista can point Safeguard at their GraphQL endpoint and have attestations pulled automatically on each artifact ingestion.
