The three constitutions cover what we build and how we behave. This page covers how we write and ship the software underneath all of it. The principles below are how senior engineers at Safeguard hold themselves and each other accountable — in design review, in code review, on call, and in post-mortems.
Each principle is short on its own. The elaboration is for the engineer who is trying to apply it on a Tuesday afternoon.
Test-first is the default. When the shape of the work makes test-first impractical — exploratory spikes, performance investigations — we still require a passing test before the change is considered done. "It works on my machine and I will add the test later" is not a state we ship from.
Every build, every model release, every audit log is reproducible from the recorded recipe. If a binary cannot be regenerated from source plus a recorded environment, the binary is not allowed in production. The same rule applies to model weights and to the eval results that accompany them.
Rotations are kept short enough that nobody burns out and wide enough that everybody has skin in the game. Compensation for on-call is part of the package, not a favour. If on-call is unsustainable, the system is unsustainable — and the fix is the system, not the rotation length.
When an incident happens, we write a post-mortem that names components, timelines, and decisions — not people. The whole team reads it. The action items have owners and dates. We track them to closure with the same discipline we track production bugs.
Small, reversible changes go to production often. Long-lived branches are a smell — they accumulate risk that has to be resolved in one merge event. If a branch is more than a few days old and not behind a feature flag, somebody is making a tradeoff we should examine.
We do not read it casually, log it incidentally, train models on it, or move it across tenant boundaries without explicit consent. The platform is designed so that even an authorised engineer cannot accidentally cross those lines. The few paths that do touch customer code are audited per access.
Flags are for ramp, soak, and rollback — not for postponing the question of what the right behaviour is. A flag that has been on at 100% for three months should become a default. A flag that has been at 0% for three months should be deleted. Flag debt is technical debt.
p95 latency is a feature with a target written into the design document. If the implementation comes in over budget, we go back to the design — not ship and chase. Budgets exist for memory, p95 latency, model inference cost, and binary size, depending on the surface.
Public APIs — REST, gRPC, the trace format, the SDK shapes, the model output schema — are versioned. We do not break them without a deprecation window, a migration guide, and an opt-in ramp. Internal APIs follow the same discipline whenever they cross a team boundary.
Reviews from peers are mandatory. Reviews from leads are owed within a working day for non-trivial changes. A review is a teaching surface — "this works" is not a review; "this works, here is the edge case I would test, here is the simpler alternative" is.
Sensitive paths — anything touching auth, tenant boundaries, model serving, customer data — pass through security review before merge. Catching a security issue in production is failure mode, not the design. The security-review label is non-negotiable on those paths.
A feature without docs is half-done. Internal runbook, customer-facing changelog, API reference if applicable. Docs are reviewed alongside the code in the same pull request. A merged feature that has no merged docs is a regression waiting for the next on-call shift.
The list of practices that are not allowed under this engineering bar — even when the temptation to make an exception is high.
Silent skipping of tests. If a test is being skipped, the skip is annotated with a reason and a ticket.
One-off bypasses of policy gates that do not get logged. Every bypass is a logged event with a justification.
Untested hotfixes to production. Even at 2am during an incident, the hotfix has a test before it merges.
Force-pushing to shared branches. The branch history is the audit trail.
Copy-pasting customer code into prompts, chat, or any external surface. Customer code stays where customer code lives.
Deploying without a roll-back plan. If the deployment cannot be rolled back, the deployment cannot ship.
One-week rotation with a day-and-night handoff at a fixed time. Long enough to build context, short enough that a hard week does not become a hard month.
Severity definitions and response SLAs are documented and linked from the public status page. A P1 has a written response window; a P3 has a written triage window. No verbal SLAs.
Blameless, structured: timeline, decisions, contributing factors, action items with owners. The template is short on purpose — the goal is reading it, not writing it.
Public status updates at a fixed cadence during a P1 incident — even when the cadence is "still investigating, next update at X." Silence is not allowed during a customer-impacting event.
Non-trivial changes need at least two approvals. Trivial changes — typo fixes, dependency bumps with automated checks — can ship on a single approval with the appropriate label.
An approval comes with a sentence — what was checked, what was deferred, what the reviewer is on the hook for. "LGTM" alone is a not-yet, not a yes.
Sensitive paths carry a security-review label. The label cannot be removed by the author; it is removed by the security reviewer once the review passes.
Changes that cross an ownership boundary require the owning team's sign-off. The CODEOWNERS file is the source of truth and is reviewed every quarter.
Five steps from day one to first on-call shift. Each step has a written exit criterion, signed off by the engineer's lead.
New engineer shadows the on-call rotation — pages, dashboards, runbooks. Reads incidents from the last quarter. No production responsibility yet; observation only.
A small, real change that goes from local clone to production behind a flag. The point is the path, not the size — every step of the release pipeline gets exercised.
First small feature in the engineer's area. Design doc, implementation, tests, docs, deployment — the whole loop, owned end to end. Reviewed by their lead.
A feature that crosses a system or team boundary. The engineer is now the person who answers questions about that area in design review. Mentorship still active.
By month six, the engineer is on the rotation as a primary. Not before — running production for paying customers is not an entry-level responsibility.
Security, AI, and Human Values. The three documents that govern how Safeguard builds, ships, and behaves.
Who leads, how they lead, the leadership operating manual, and the decision-making rules of the road.
Why we built this, the ten-year arc, success and failure criteria, and the leading indicators.
The platform's own security posture — tenant isolation, signed weights, key management, the bug bounty programme.