Architecture

Multi-Tenant Isolation for FedRAMP HIGH

How Safeguard achieves hard multi-tenant isolation in a platform that meets FedRAMP HIGH — the boundaries, the proofs, and the trade-offs we accepted.

Shadab Khan
Security Engineer
8 min read

Running a multi-tenant SaaS that can legitimately hold FedRAMP HIGH workloads next to commercial workloads is a specific engineering problem, not a marketing claim. It requires isolation at every layer where data moves, every layer where compute runs, and every layer where humans operate — with evidence an auditor can inspect. Safeguard was designed this way from the start, because retrofitting isolation is nearly impossible. This post walks through the architecture and the design choices we made to get there.

Why can't you just put every customer in a VPC and call it done?

You can, and some security vendors do, but it does not satisfy FedRAMP HIGH. Infrastructure-level isolation is necessary but not sufficient. The auditor wants to know how cross-tenant access is prevented at every point the software could in principle mix data — in memory, in caches, in queues, in shared services, in log aggregation, in backup systems, in the support tooling your engineers use, in the embeddings you feed to AI components. A single VPC boundary does not answer any of those questions on its own.

The second problem is operational cost. VPC-per-tenant for tens of thousands of tenants is fine in theory, but the control-plane overhead (CI/CD, monitoring, secrets rotation, auditing) becomes the dominant cost. You end up with two bad options: a small number of huge tenants, or a brittle control plane. Neither is what FedRAMP asks for: demonstrable isolation with documented boundaries and continuous validation.

What is the actual isolation model?

We isolate at four layers: region, enclave, tenant partition, and request context. Each layer has a distinct failure mode it is designed to prevent, and each layer is independently auditable.

 ┌─────────────────────────────────────────────────────────────┐
 │                    REGION (GovCloud vs Commercial)          │
 │  ┌───────────────────────────────────────────────────────┐  │
 │  │              ENCLAVE (FedRAMP HIGH vs Moderate)       │  │
 │  │  ┌─────────────────────────────────────────────────┐  │  │
 │  │  │       TENANT PARTITION (per customer)           │  │  │
 │  │  │  ┌───────────────────────────────────────────┐  │  │  │
 │  │  │  │    REQUEST CONTEXT (per API call)         │  │  │  │
 │  │  │  └───────────────────────────────────────────┘  │  │  │
 │  │  └─────────────────────────────────────────────────┘  │  │
 │  └───────────────────────────────────────────────────────┘  │
 └─────────────────────────────────────────────────────────────┘

Region is a hard physical boundary. FedRAMP-destined workloads run in AWS GovCloud or Azure Government; commercial workloads run in commercial regions. There is no shared infrastructure across the boundary. Backups, telemetry, monitoring, support tooling, and deployment pipelines are all region-scoped. This is the simplest layer conceptually and by far the most expensive operationally, because it means we run the whole control plane twice.

Enclave is a softer boundary that separates FedRAMP HIGH from Moderate workloads within the same region. Enclaves do not share compute nodes, do not share caches, do not share Kafka clusters, and do not share storage buckets. They do share management plane components like CI runners, but those are reviewed and audited under HIGH controls.

Tenant partition is where we make the multi-tenancy work economical. Within an enclave, customers share compute and storage infrastructure, but every piece of data is tagged with a tenant ID and access is enforced at every service boundary. This is the layer that takes the most engineering care, and we walk through it in detail below.

Request context is a per-request token that carries the caller's identity, tenant, and authorization scope. Every internal service validates the context on entry and emits audit events keyed on it. This is the smallest unit of isolation and the one that catches the cases the upper layers miss.

How does tenant partitioning actually work in shared infrastructure?

The rule is simple in principle: no service accepts a request without a tenant claim, no query touches data that does not match the claim, and every cache entry is keyed with the tenant as part of the cache key. The hard part is making the rule mechanical rather than a convention engineers remember.

We enforce the rule with three mechanisms. First, a shared tenant-aware data access library that every service uses. It wraps graph queries, SQL queries, object store access, and Kafka consumption. The library requires a tenant claim to construct; the claim propagates into every downstream operation. Services that try to do raw access without the library fail CI because our linters catch direct SDK calls.
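
Here is a minimal sketch of the pattern in Go. The names (TenantClaim, NewDataClient, FindingsQuery) are illustrative, not our actual library; the point is that the client cannot exist without a claim, and the claim is baked into every query and cache key it produces.

// Sketch only: illustrative names, not Safeguard's actual library.
package tenantdata

import (
	"errors"
	"fmt"
)

// TenantClaim is required to construct a DataClient, so no data access
// can exist without a tenant attached.
type TenantClaim struct {
	TenantID string
	Enclave  string
}

type DataClient struct {
	claim TenantClaim
}

// NewDataClient is the only constructor; it refuses an empty claim.
func NewDataClient(claim TenantClaim) (*DataClient, error) {
	if claim.TenantID == "" || claim.Enclave == "" {
		return nil, errors.New("tenantdata: refusing to construct client without a tenant claim")
	}
	return &DataClient{claim: claim}, nil
}

// FindingsQuery returns a parameterized statement with the tenant bound
// as the first argument; callers cannot drop or override the predicate.
func (c *DataClient) FindingsQuery(filter string) (stmt string, args []any) {
	stmt = fmt.Sprintf("SELECT * FROM findings WHERE tenant_id = $1 AND (%s)", filter)
	return stmt, []any{c.claim.TenantID}
}

// CacheKey namespaces every cache entry by tenant, so entries for two
// tenants can never collide.
func (c *DataClient) CacheKey(key string) string {
	return c.claim.TenantID + ":" + key
}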

Second, gateway-level enforcement. Every internal service sits behind a sidecar that validates the tenant claim on the incoming request and refuses to proxy calls that do not match. If a service receives a tenant-A request and then tries to originate a tenant-B call, the sidecar rejects it. This catches the case where application code has a bug or has not yet adopted the library correctly.
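
In sketch form, assuming the claim travels in an X-Tenant-ID header (our real context propagation is richer than a header), the sidecar check reduces to a middleware that refuses to proxy mismatched calls:

// Sketch only: header-based claim and a fixed bound tenant are
// simplifications for illustration.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// enforceTenant allows a call through only when its tenant claim matches
// the tenant already bound to this workload's inbound request context.
func enforceTenant(boundTenant string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		claim := r.Header.Get("X-Tenant-ID")
		if claim == "" || claim != boundTenant {
			// Refuse to proxy and leave an audit trail of the mismatch.
			log.Printf("audit: tenant mismatch bound=%q claimed=%q path=%s",
				boundTenant, claim, r.URL.Path)
			http.Error(w, "tenant claim mismatch", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	upstream, _ := url.Parse("http://findings-service.local:8080")
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	// In reality the bound tenant comes from the inbound request context;
	// it is fixed here to keep the sketch small.
	log.Fatal(http.ListenAndServe(":15001", enforceTenant("tnt_acme_gov", proxy)))
}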

Third, data plane verification. We run a continuous sweep that issues cross-tenant test requests from inside the enclave and verifies they are rejected at every layer. Any failure pages an on-call immediately. The sweep runs hourly in every enclave and we have never had a real failure, but the feedback loop would catch one within an hour if it happened.
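
A stripped-down version of the sweep looks like this; the endpoint, header, and paging hook are illustrative stand-ins for the real harness:

// Sketch only: mint a context for one synthetic tenant, request another
// tenant's data, and page unless every layer rejects it.
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// sweepOnce issues a deliberately cross-tenant request and returns an
// error unless it was rejected.
func sweepOnce(client *http.Client) error {
	// The resource path belongs to tenant B; the claim is for tenant A.
	req, err := http.NewRequest(http.MethodGet,
		"https://findings.internal/v1/tenants/tnt_sweep_b/findings", nil)
	if err != nil {
		return err
	}
	req.Header.Set("X-Tenant-ID", "tnt_sweep_a")

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusForbidden {
		// Anything other than an explicit rejection is a boundary failure.
		return fmt.Errorf("cross-tenant request not rejected: got %d", resp.StatusCode)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	for {
		if err := sweepOnce(client); err != nil {
			log.Printf("PAGE on-call: %v", err) // stands in for the real pager
		}
		time.Sleep(time.Hour)
	}
}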

Here is the shape of the request context as it flows through services:

{
  "request_id": "rq_2026_02_13_918af",
  "tenant": {
    "id": "tnt_acme_gov",
    "enclave": "fedramp-high-us-east",
    "region": "us-gov-east-1"
  },
  "identity": {
    "subject": "usr_jsmith",
    "roles": ["triage.read", "finding.write"],
    "auth_method": "saml-piv"
  },
  "scope": {
    "artifacts": ["tnt_acme_gov:prod:*"],
    "operations": ["findings.list", "findings.annotate"]
  },
  "attested_at": "2026-02-13T12:04:51Z",
  "expires_at": "2026-02-13T12:14:51Z"
}

Note the short expiry — ten minutes is the default for FedRAMP HIGH contexts, with a re-mint required from the auth service. Short-lived tokens limit the blast radius if a context ever leaks.
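
The entry check every service applies is correspondingly small. A sketch, using the attested_at and expires_at fields from the context above (the package name, cap constant, and error strings are illustrative):

// Sketch only: the real check also validates signatures and scope.
package reqctx

import (
	"errors"
	"time"
)

const maxHighLifetime = 10 * time.Minute

// Validate rejects expired contexts and contexts minted with a lifetime
// longer than the FedRAMP HIGH default; callers must re-mint from the
// auth service rather than extend a context locally.
func Validate(attestedAt, expiresAt, now time.Time) error {
	if !now.Before(expiresAt) {
		return errors.New("reqctx: context expired, re-mint from auth service")
	}
	if expiresAt.Sub(attestedAt) > maxHighLifetime {
		return errors.New("reqctx: context lifetime exceeds HIGH enclave maximum")
	}
	return nil
}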

What about shared AI components?

Shared AI is the trickiest surface because models can memorize and replay. Our approach has three elements. First, no cross-tenant fine-tuning. The base models we use are trained on public data, and we do not fine-tune them on customer data in a way that mixes tenants. When a tenant wants a custom model (for example, a specialized classifier), we train it in a tenant-scoped pipeline with dedicated compute.

Second, per-tenant retrieval context. For agents like Griffin, the retrieval layer only returns documents from the requesting tenant. The retrieval index is partitioned by tenant and the query is signed with the tenant context. The model sees only tenant-scoped context.
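
A sketch of the partitioning, with a naive substring match standing in for the real embedding index and scorer:

// Sketch only: Document and the matching logic are placeholders.
package retrieval

import (
	"fmt"
	"strings"
)

type Document struct {
	ID   string
	Text string
}

// Retriever holds one physically separate index per tenant; there is no
// search path that spans partitions.
type Retriever struct {
	indexes map[string][]Document // tenant ID -> that tenant's documents
}

// Retrieve selects the partition from the tenant claim before any scoring
// runs, so the model can only ever see the requesting tenant's documents.
func (r *Retriever) Retrieve(tenantID, query string) ([]Document, error) {
	idx, ok := r.indexes[tenantID]
	if !ok {
		return nil, fmt.Errorf("retrieval: no index for tenant %s", tenantID)
	}
	var hits []Document
	for _, d := range idx {
		if strings.Contains(d.Text, query) {
			hits = append(hits, d)
		}
	}
	return hits, nil
}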

Third, prompt and output logging per tenant. Every model call is logged into the requesting tenant's audit stream with input and output. Nothing is logged into a shared analytics system. For FedRAMP HIGH enclaves, model inference runs on dedicated GPU nodes in that enclave, not on a shared inference pool.

How do you prove the boundaries hold?

Proof happens at three levels. The first is architectural review — the System Security Plan names every boundary and the controls that enforce it, and our ATO (Authority to Operate) process has already reviewed and accepted those claims. Change control ensures the plan stays accurate; any change that affects the boundary requires an SCA (significant change assessment).

The second is continuous monitoring. Every cross-boundary event is emitted as a structured audit record, and we run CSPM-style detection on deviations. For example, a service call from enclave A to enclave B is structurally impossible in our networking setup, but we still detect any attempt at the packet level. Alerts route to the on-call within seconds.
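
An attempted cross-enclave call might produce a record shaped like this (fields illustrative, in the same style as the request context above):

{
  "event": "cross_enclave_attempt",
  "severity": "critical",
  "source_enclave": "fedramp-moderate-us-east",
  "destination_enclave": "fedramp-high-us-east",
  "detection_layer": "network",
  "action": "dropped",
  "routed_to": "on-call",
  "detected_at": "2026-02-13T12:05:02Z"
}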

The third is independent testing. We contract an external penetration test annually (and for FedRAMP HIGH, additional ongoing testing) that specifically targets cross-tenant and cross-enclave boundaries. The results go to the AO (authorizing official) and any finding becomes a mandatory remediation with a tracked POA&M (plan of action and milestones).

What trade-offs did we accept?

Two important ones. First, some product features are not available in the FedRAMP HIGH enclave on the same day they ship commercially. The enclave runs a lagged release train because every deploy has to go through a tighter change-control process. The lag is typically two to four weeks. Customers in the enclave know this and accept it.

Second, some cross-tenant learning signals are off the table. In the commercial enclave we can aggregate anonymized patterns across tenants to improve rule recommendations ("customers like you typically set this policy"). We do not do this in FedRAMP HIGH. Every tenant's environment is sealed. The trade-off is slightly less leveraged intelligence, but no customer in that enclave would accept the alternative.

How Safeguard.sh Helps

Safeguard was designed from the start to run a single platform across commercial and federal workloads without compromising the isolation that federal workloads require. By building explicit region, enclave, tenant, and request boundaries — and by enforcing them mechanically rather than relying on engineer discipline — we can offer the same product experience to a commercial startup and a HIGH-baseline federal customer while keeping their data cleanly separated. If your organization has a mix of regulated and unregulated workloads and you want one vendor that handles both correctly, the Safeguard architecture is what makes that possible.
