
Break-Glass Workflow Design: Audited Bypass That Works

Every policy needs a bypass path or it will be routed around. The trick is making the bypass auditable, time-bound, and rare enough to remain meaningful.

Nayan Dey
Senior Security Engineer

Every policy eventually collides with an emergency. Production is on fire and the only fix ships as an unsigned image. A vendor releases a critical patch that pulls in a transitive dependency the policy bans. A regulator demands that a feature ship by a date the standard rollout cannot meet. The on-call engineer needs to act, and the policy is in the way. What happens next determines whether the policy survives the year.

Two failure modes are common. In the first, there is no documented bypass. The on-call engineer escalates to whoever has admin rights, an out-of-band exception is granted, no record is kept, and the next time the situation arises, the precedent has been set: emergencies are handled by ignoring policy. The bypass becomes the routine, and the policy becomes a thing to wait out.

In the second, the bypass is a configuration toggle that disables the entire policy. Somebody flips it during the incident, the cluster admits everything for the duration, and somebody — often nobody — flips it back later. The blast radius of a single bypass extends far beyond the original problem.

Neither outcome is necessary. A well-designed break-glass workflow treats bypass as a first-class operation: scoped, audited, time-bound, and reviewable.

What break-glass should be

A working break-glass operation has six properties.

Scoped. The bypass applies to a specific artifact, repository, or namespace, not to the whole policy. Letting an unsigned image through for one workload should not let unsigned images through cluster-wide.

Justified. The requester provides a written reason: incident ticket, vendor reference, regulatory mandate. A bypass without a stated reason is not granted.

Approved. A second human approves before the bypass takes effect. The approver is named, not a shared account. For the most sensitive policies, two approvers are required, neither from the requester's team.

Time-bound. The bypass has an expiry — hours for incident response, days for vendor coordination, never indefinite. When the timer ends, the bypass disappears and the policy resumes.

Recorded. Every aspect of the bypass — requester, approver, scope, justification, expiry, and the events that occurred under it — is captured in an audit log that survives the bypass itself.

Rare. The system tracks bypass frequency and surfaces patterns. A policy that is bypassed weekly is a policy that needs to be revised, not a policy that needs more bypasses.
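As a minimal sketch, most of these properties can be enforced when the request is filed and approved rather than left to convention. Everything below is hypothetical and illustrative: the `BypassRequest` fields, the `validate` and `activate` helpers, and the expiry choices in `ALLOWED_EXPIRIES` are assumptions, not any particular product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical constrained expiry choices; there is deliberately no "indefinite".
ALLOWED_EXPIRIES = {
    "1h": timedelta(hours=1),
    "4h": timedelta(hours=4),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
}


@dataclass(frozen=True)
class BypassRequest:
    requester: str
    policy: str
    scope: str          # one artifact, repository, or namespace; never a wildcard
    justification: str  # incident ticket, vendor reference, or regulatory mandate
    expiry: str         # key into ALLOWED_EXPIRIES


def validate(req: BypassRequest) -> list[str]:
    """Return a list of problems; an empty list means the request can go to approval."""
    problems = []
    if not req.scope.strip() or "*" in req.scope:
        problems.append("scope must name a specific artifact, repository, or namespace")
    if not req.justification.strip():
        problems.append("justification is required")
    if req.expiry not in ALLOWED_EXPIRIES:
        problems.append("expiry must be one of: " + ", ".join(ALLOWED_EXPIRIES))
    return problems


def activate(req: BypassRequest, approver: str, now: datetime) -> dict:
    """Build the audit record for an approved bypass, or raise if a rule is broken."""
    problems = validate(req)
    if approver == req.requester:
        problems.append("approver must be a different, named person")
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "requester": req.requester,
        "approver": approver,
        "policy": req.policy,
        "scope": req.scope,
        "justification": req.justification,
        "approved_at": now,
        "expires_at": now + ALLOWED_EXPIRIES[req.expiry],
    }
```

The record returned by `activate` is meant to be written to an append-only audit store, so it outlives the bypass it describes. Rarity is not something a single request can enforce; it has to be measured across requests.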

The flow

The user-facing flow is intentionally short. A developer or operator who hits a policy block sees the failure message, which names the policy, the rule that fired, and the link to request a break-glass bypass. The link opens a request form pre-populated with the artifact, the rule, and a freeform justification field. The requester picks an expiry from a constrained list — one hour, four hours, twenty-four hours, seven days — and submits.

The request reaches a designated approver group, usually a rotation of senior engineers and security staff. The approver sees the request, the policy that was hit, the requester's recent bypass history, and the team's recent bypass history. They approve, deny, or ask for more context. Approval is recorded and the bypass is activated for the named scope.

Once active, the bypass converts the policy result for the named scope from block to warn for the duration. Every evaluation under the bypass produces a marked audit event. When the timer expires, the policy automatically returns to its prior mode. No human action is required to revert; reversion is the default.
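One way to make reversion the default is to check expiry at evaluation time instead of relying on a cleanup job. The sketch below assumes a hypothetical `ActiveBypass` record and an `evaluate` function that wraps whatever the policy engine returned; the names and the event shape are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActiveBypass:
    bypass_id: str
    policy: str
    scope: str
    expires_at: datetime


def evaluate(policy: str, scope: str, raw_result: str,
             bypasses: list[ActiveBypass], now: datetime) -> dict:
    """Downgrade block to warn only when an unexpired bypass matches policy and scope."""
    event = {"policy": policy, "scope": scope, "result": raw_result, "bypass_id": None}
    if raw_result != "block":
        return event  # nothing to bypass
    for b in bypasses:
        if b.policy == policy and b.scope == scope and now < b.expires_at:
            event["result"] = "warn"
            event["bypass_id"] = b.bypass_id  # marks the audit event as bypassed
            break
    return event
```

Because an expired bypass simply stops matching, a forgotten record cannot keep the policy open: the next evaluation after `expires_at` returns `block` again without anyone touching it.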

What break-glass should not be

It should not be a permanent waiver. A waiver is a different artifact: a documented decision that a particular dependency or workload is exempt from a particular rule, with its own approval flow and review cadence. Break-glass is for the moments a waiver has not yet been granted and cannot wait. Conflating the two leads to bypasses that never expire.

It should not be available to a single person acting alone. Self-approval is a category error. The whole point of break-glass is that a second pair of eyes is involved before policy is suspended. For genuine break-glass-without-an-approver scenarios — the kind that show up in disaster recovery scripts — the access should be a separate, alarm-generating credential whose use pages the entire security team.

It should not be silent. A bypass that nobody knows happened is a bypass that will be re-used for less defensible reasons next time. Slack notifications, dashboard banners, and weekly digest emails to leadership are all reasonable mechanisms for keeping the practice visible.

It should not be cheap. The friction has to be real. If filling out the form takes ten seconds, the bypass becomes a developer-experience feature instead of an exception channel. The form should be specific enough that the requester actually has to think about why they need it.

Patterns that work

A few patterns recur in well-designed break-glass workflows.

Tiered approval. Low-impact policies, such as a stylistic dependency rule in a non-production repository, can be approved within the team by a team lead. High-impact policies, such as admission rules in regulated namespaces, require security team approval. The tier is a property of the policy, not the requester.
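A rough sketch of tiered approval, with the tier looked up from the policy rather than from the requester. The policy names, tier names, and group names are all hypothetical. The sketch also assumes that an approver is never the requester, and that the top tier needs two approvers from outside the requester's team.

```python
from dataclasses import dataclass

# Hypothetical mapping: the tier belongs to the policy, not to the person asking.
POLICY_TIER = {
    "dependency-style": "low",
    "regulated-admission": "high",
    "prod-image-signature": "critical",
}

# Tier -> (approvals required, group an approver must belong to)
TIER_RULES = {
    "low": (1, "team-leads"),
    "high": (1, "security"),
    "critical": (2, "security"),
}


@dataclass(frozen=True)
class Approver:
    name: str
    team: str
    groups: frozenset


def approvals_satisfied(policy: str, requester: str, requester_team: str,
                        approvers: list[Approver]) -> bool:
    """Check whether the collected approvals meet the policy's tier."""
    tier = POLICY_TIER[policy]
    needed, group = TIER_RULES[tier]
    valid = {
        a.name
        for a in approvers
        if a.name != requester            # assumption: never the requester
        and group in a.groups
        and (tier != "critical" or a.team != requester_team)
    }
    return len(valid) >= needed
```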

Pre-authorized scopes. Some teams have legitimate ongoing needs to bypass certain rules during specific operations — a release engineering team that maintains a vendor-shipped tool that fails one rule. Pre-authorizing the scope with a longer expiry and standing approval, while still requiring the request to be filed and recorded, captures the reality without inventing a permanent waiver.

Bypass budgets. Teams that exceed a configurable bypass count per quarter are flagged for review. The review is not punitive; it is a structured conversation about whether the policy or the team's practices should change.
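A bypass budget reduces to a count per team per quarter. The sketch below assumes the audit log can be exported as `(team, date)` pairs; the function names and the budget value are illustrative.

```python
from collections import Counter
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Map a date to (year, quarter), with quarters numbered 1 to 4."""
    return (d.year, (d.month - 1) // 3 + 1)


def teams_over_budget(bypasses: list[tuple[str, date]],
                      quarter: tuple[int, int], budget: int) -> list[str]:
    """Return the teams whose bypass count in the given quarter exceeds the budget."""
    counts = Counter(team for team, day in bypasses if quarter_of(day) == quarter)
    return sorted(team for team, n in counts.items() if n > budget)
```

The output is a list of teams to talk to, not a list of teams to block; the function only flags.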

Auto-escalation on high-severity policies. Bypasses against the most sensitive rules — signature verification on production images, for instance — automatically notify the CISO and require post-incident review regardless of operational outcome.

Measuring whether break-glass is healthy

A few metrics indicate whether the workflow is functioning.

The ratio of break-glass requests to policy blocks should be small and stable. A growing ratio means the policy is too aggressive, the remediation guidance is too weak, or the alternative channels are too painful.

The approval rate should sit well away from both extremes. If every request is approved, the approver is rubber-stamping. If every request is denied, the channel is theater.

The time from request to decision should be short — under fifteen minutes for incident-flagged requests, under a business day for the rest. Long decision times push people back into the unaudited bypass paths the workflow was designed to replace.

The post-bypass review rate — how many bypasses are reviewed within a week of expiry — should be high. Reviewing only when something goes wrong means the system never learns from the routine cases.
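These four numbers can be computed from the audit log alone. The sketch below assumes a hypothetical `BypassRecord` shape and a separately supplied count of policy blocks for the same period. It counts a bypass toward the review rate only once its one-week review window has closed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional

REVIEW_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class BypassRecord:
    requested_at: datetime
    decided_at: datetime
    approved: bool
    expires_at: Optional[datetime] = None   # set only for approved requests
    reviewed_at: Optional[datetime] = None  # set once a post-bypass review happens


def health_metrics(records: list[BypassRecord], policy_blocks: int, now: datetime) -> dict:
    """Summarize workflow health; a metric is None when there is no data for it."""
    approved = [r for r in records if r.approved]
    # Only bypasses whose review window has already closed can be judged.
    due = [r for r in approved if r.expires_at and r.expires_at + REVIEW_WINDOW <= now]
    reviewed = [r for r in due
                if r.reviewed_at and r.reviewed_at <= r.expires_at + REVIEW_WINDOW]
    minutes = [(r.decided_at - r.requested_at).total_seconds() / 60 for r in records]
    return {
        "request_to_block_ratio": len(records) / policy_blocks if policy_blocks else None,
        "approval_rate": len(approved) / len(records) if records else None,
        "median_decision_minutes": median(minutes) if minutes else None,
        "review_rate": len(reviewed) / len(due) if due else None,
    }
```

Tracking these per policy and per team, and not only in aggregate, makes it easier to tell an aggressive policy from a team that leans on the channel.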

How Safeguard Helps

Safeguard implements break-glass as a structured operation against the same policy engine that drives all four enforcement gates.

At PR time, a developer who hits a blocking policy sees the failure message with a one-click link to file a bypass request. The request captures the artifact, the rule, the justification, and the requested expiry, and routes to the designated approver group.

At build time, an active bypass against the request's scope converts the build's policy result from block to warn for the duration, with each evaluation marked in the audit log.

At admission time, the Kubernetes webhook honors active bypasses for the named workload and namespace, refusing workloads that fall outside the bypass scope and admitting those within. Bypass scope cannot exceed the requester's authorized namespaces.

At runtime, drift detection continues to monitor under bypass, so a workload admitted via break-glass is more closely watched, not less.

Every bypass — requester, approver, scope, justification, expiry, and the evaluations that occurred under it — is captured in Safeguard's audit log and surfaces in a per-team and per-policy bypass report. The result is that policy survives its inevitable collisions with reality, the bypass path stays narrow enough to remain meaningful, and the security team gets the data it needs to revise policies that collide too often.
