AI Security

Embedding Model Supply Chain Risks

Embedding models are the silent dependency under every RAG system. We cover poisoning, deprecation, and provenance gaps that break retrieval in production.

Shadab Khan
Security Engineer
5 min read

The embedding model is the most consequential dependency most teams never audit. A team will spend weeks evaluating GPT-5 vs Claude 4 for generation, then pick text-embedding-3-large with a two-line config change. When OpenAI deprecated text-embedding-ada-002 on a 90-day notice in January 2025, we saw enterprises scramble because the embeddings in their Pinecone indexes were suddenly tied to a model that would stop serving. Re-embedding 200M vectors against the new model cost real money and downtime. At the same time, open-weight embedding models pulled from Hugging Face (BAAI/bge-large-en-v1.5, nomic-ai/nomic-embed-text-v1.5, mixedbread-ai/mxbai-embed-large-v1) sit in production serving critical retrieval, often with no signature verification, no CVE monitoring, and no reachability tracking. This is a supply chain problem. It only looks like a model problem because people keep treating embedding models as immutable math rather than versioned software.

Why is a backdoored embedding model a viable attack?

Because an attacker can train an embedder that places attacker-chosen queries and attacker-chosen documents close together in vector space, regardless of semantic content. Papers from NYU's CSAW track and from IBM Research in 2024 and 2025 demonstrated this on BGE-family models with minimal degradation on standard benchmarks. The attack surfaces as a retrieval hit: a user asks a benign question, the backdoored embedder ranks a planted document first, and the RAG prompt ingests the planted content. Classical model scanning won't catch this: the weights pass format checks and contain no executable code. Only behavioral evaluation against triggered inputs reveals the backdoor. This is why trust in the embedder's origin and lineage matters as much as trust in the generative model.
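A minimal sketch of what that behavioral eval can look like, assuming you maintain a set of suspected trigger phrasings and a benign control set. Here embed() is a placeholder for whatever client wraps the model under test, and the scoring heuristic is illustrative, not a detection standard:

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one L2-normalized vector per input text."""
    raise NotImplementedError

def backdoor_signal(trigger_queries, benign_queries, corpus):
    docs = embed(corpus)                          # (n_docs, dim)
    trig_sims = embed(trigger_queries) @ docs.T   # cosine similarities
    base_sims = embed(benign_queries) @ docs.T

    # Backdoor signature: many unrelated trigger phrasings collapse onto
    # the same top document, at similarities well above the benign top-1.
    top_doc = trig_sims.argmax(axis=1)
    modal_doc = np.bincount(top_doc).argmax()
    same_doc_rate = float(np.mean(top_doc == modal_doc))
    lift = float(trig_sims.max(axis=1).mean() - base_sims.max(axis=1).mean())
    return same_doc_rate, lift

# Calibrate on a known-clean embedder first; flag the model when
# same_doc_rate approaches 1.0 together with a large positive lift.
```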

What provenance should you require for an embedder?

At minimum: a signed artifact from a verified publisher, a training-dataset attestation, and a published eval run you can reproduce. Hugging Face rolled out commit signing for model repositories in March 2025, and the OpenSSF Model Signing spec published in April standardized the attestation format. Use huggingface-cli verify or Sigstore's cosign to validate before load. For commercial embedders, the provenance chain is weaker: OpenAI and Cohere publish evals but don't provide attestations over weights customers never see. The mitigation there is behavioral: pin the model version, log cosine similarity distributions on a benchmark set daily, and alert when the distribution drifts past a threshold. Silent model updates behind an API name are common; we've caught Cohere's embed-english-v3.0 shifting its similarity distribution twice in 18 months without a version change.
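A minimal sketch of that behavioral pin, assuming a fixed benchmark of query/passage pairs. embed() is again a placeholder, and the KS-test and mean-shift thresholds are starting points to tune, not recommendations:

```python
import json
import numpy as np
from scipy.stats import ks_2samp

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one L2-normalized vector per input text."""
    raise NotImplementedError

# Pinned benchmark pairs; keep these fixed so any distribution shift is
# attributable to the model, not the data.
BENCH = [
    ("how do I rotate an API key", "Rotate keys from the dashboard under..."),
    # ... a few hundred more pairs
]

def similarity_profile() -> np.ndarray:
    q = embed([pair[0] for pair in BENCH])
    d = embed([pair[1] for pair in BENCH])
    return np.sum(q * d, axis=1)  # cosine similarity per pinned pair

def check_drift(baseline_path: str = "embed_baseline.json") -> None:
    with open(baseline_path) as f:
        baseline = np.array(json.load(f))
    today = similarity_profile()
    stat, p = ks_2samp(today, baseline)
    mean_shift = abs(float(today.mean() - baseline.mean()))
    if p < 0.01 or mean_shift > 0.02:
        raise RuntimeError(
            f"Embedder drift: KS={stat:.3f} p={p:.4f} shift={mean_shift:.4f}. "
            "Possible silent model update behind the API name."
        )
```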

How do you handle deprecation risk?

Treat every embedder as a component with a support horizon. Build your RAG pipeline so that re-embedding is a routine operation, not a crisis. The two architectural moves that help: store the embedder's model identifier and version in every vector's metadata, and design namespaces or collections so that parallel embeddings from two model versions can coexist during migration. A migration pattern we deploy: dual-write to a new namespace with the new embedder, shadow-query both, compare retrieval results on a held-out eval set until recall@10 crosses a quality threshold, cut over, retain the old namespace for rollback for 30 days. This replaces panic with a dial.
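A sketch of the shadow-query gate in that pattern. search() and the namespace names are placeholders for your vector store client, and the parity threshold is a knob, not a recommendation:

```python
def search(namespace: str, query: str, k: int = 10) -> list[str]:
    """Placeholder: return doc IDs ranked by similarity from one namespace."""
    raise NotImplementedError

def recall_at_10(namespace: str, eval_set: list[tuple[str, set[str]]]) -> float:
    per_query = []
    for query, relevant_ids in eval_set:
        retrieved = set(search(namespace, query, k=10))
        per_query.append(len(retrieved & relevant_ids) / len(relevant_ids))
    return sum(per_query) / len(per_query)

def ready_to_cut_over(eval_set,
                      old_ns: str = "docs_old_embedder",
                      new_ns: str = "docs_new_embedder",
                      parity: float = 0.98) -> bool:
    # Both namespaces are dual-written and live; reads stay on old_ns
    # until the new embedder reaches ~parity on the held-out set.
    # After cutover, keep old_ns for 30 days as the rollback path.
    return recall_at_10(new_ns, eval_set) >= parity * recall_at_10(old_ns, eval_set)
```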

What about fine-tuned embedders trained on enterprise data?

Fine-tuned embedders inherit the provenance problems of their base model and add the provenance problems of their fine-tuning corpus. Techniques like sentence-transformers' contrastive fine-tuning and LoRA-over-embedder patterns are now standard at enterprises, and they measurably improve retrieval on domain corpora. They also bake domain knowledge into the model weights, which makes the fine-tuned embedder both a piece of proprietary IP and a potential exfiltration vector: anyone who can download the weights can reconstruct a sample of the fine-tuning corpus via embedding inversion (see Vec2Text). Most teams we audit publish these embedders to an internal registry with looser access controls than they'd grant to the source data. Fix that.
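For a concrete sense of why the weights carry corpus knowledge, here is what the standard contrastive fine-tune looks like with sentence-transformers. A minimal sketch: the base model is one named earlier in this post, while the training pairs and registry path are hypothetical:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# (query, relevant passage) pairs drawn from the enterprise corpus: the
# same data that embedding inversion could later partially reconstruct.
train_examples = [
    InputExample(texts=["reset MFA for a terminated employee",
                        "Procedure 4.2: revoke all tokens, then disable..."]),
    # ... thousands more domain pairs
]
loader = DataLoader(train_examples, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)

# The saved artifact is now proprietary IP: gate registry access at least
# as tightly as access to the source corpus itself.
model.save("internal-registry/acme-embed-v1")
```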

Does an AI-BOM for embedders actually help?

Yes, because the alternative is spreadsheets. An AI-BOM that lists each embedder (provider, version, model card hash, license, deprecation date if known), each vector namespace it produced, each application consuming that namespace, and the corpora it was trained or fine-tuned on answers questions like "if we switch embedders next quarter, what breaks?" in minutes rather than sprints. CycloneDX 1.6 has the primitives; the model card references, formulation blocks, and service relationships compose into a usable graph. The teams getting value from AI-BOMs are the ones treating them as operational data, refreshed on every release, not compliance theater for auditors.
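As a toy illustration of why that graph answers the question quickly, here is the shape of the data. The dict layout, names, and hashes below are invented for the example; CycloneDX 1.6 expresses the same relationships in its own schema:

```python
AI_BOM = {
    "embedders": {
        "text-embedding-3-large@2024-01": {
            "provider": "openai", "license": "proprietary",
            "deprecation_date": None, "model_card_hash": "sha256:aaaa",
            "namespaces": ["support_docs_v2", "contracts_v1"],
        },
        "bge-large-en-v1.5@a5beb1e": {
            "provider": "huggingface/BAAI", "license": "mit",
            "deprecation_date": None, "model_card_hash": "sha256:bbbb",
            "namespaces": ["internal_wiki_v3"],
        },
    },
    # namespace -> applications reading from it
    "consumers": {
        "support_docs_v2": ["support-bot", "agent-assist"],
        "contracts_v1": ["legal-search"],
        "internal_wiki_v3": ["eng-copilot"],
    },
}

def blast_radius(embedder_id: str) -> dict[str, list[str]]:
    """Answer 'if we switch this embedder, what breaks?' in one lookup."""
    namespaces = AI_BOM["embedders"][embedder_id]["namespaces"]
    return {ns: AI_BOM["consumers"][ns] for ns in namespaces}

# blast_radius("text-embedding-3-large@2024-01")
# -> {'support_docs_v2': ['support-bot', 'agent-assist'],
#     'contracts_v1': ['legal-search']}
```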

What signals indicate an embedder problem in production?

Three are worth instrumenting. First, retrieval quality on a held-out eval corpus: recall@k and MRR, computed daily, with an alert on drift of 5% or more. Second, query-distribution shift detection: if the distribution of user queries changes, your retrieval scores will change too, so you need to be able to separate the two effects. Third, anomalous high-similarity clusters: a sudden spike of queries matching the same small set of documents is a backdoor-trigger signature we've confirmed in a red-team exercise at a consumer-facing client. Arize, Phoenix, and Fiddler all have reasonable dashboards; we usually wire the custom signals into Datadog or Grafana through a small metrics pipeline.
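A sketch of the third signal, assuming your retrieval logs record the top-1 document ID per query; the spike factor is illustrative:

```python
from collections import Counter

def trigger_signature(top_docs_today: list[str],
                      top_docs_baseline: list[str],
                      spike_factor: float = 5.0) -> list[str]:
    today = Counter(top_docs_today)
    base = Counter(top_docs_baseline)
    n_today, n_base = len(top_docs_today), len(top_docs_baseline)
    flagged = []
    for doc_id, count in today.items():
        share_today = count / n_today
        share_base = base.get(doc_id, 0) / max(n_base, 1)
        # A document suddenly dominating top-1 results far beyond its
        # historical share matches the backdoor-trigger pattern above.
        if share_today > spike_factor * max(share_base, 1.0 / n_today):
            flagged.append(doc_id)
    return flagged
```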

How Safeguard Helps

Safeguard's AI-BOM captures embedding models as first-class artifacts with provenance, license, deprecation status, and downstream reachability into vector namespaces and applications. Griffin AI flags embedders loaded from unsigned sources or from Hugging Face repositories with known supply-chain issues, and the eval harness runs backdoor-detection and retrieval-quality probes on a schedule so drift or poisoning surfaces before it affects customers. Policy gates block model swaps that would deprecate a live index without a migration plan, and reachability analysis shows exactly which products consume which embeddings when you need to re-embed or rotate.
