Fine-Tune Backdoor Insertion: Academic Research
A senior engineer's review of academic research on fine-tune backdoor insertion, from BadNets to sleeper agents, and how the findings translate to production ML.
Deep dives, practical guides, and incident analyses from engineers who build Safeguard. No fluff, no vendor FUD — just what you need to ship secure software.
The difference between an engine-plus-LLM bug hunter and a pure-LLM one is not a tuning detail. It is a structural divide that determines whether the findings are usable.
A threat model for sandbox escapes in Model Context Protocol servers, mapping attack surfaces from tool execution environments to host processes and shared state.
An attacker who can swap the model behind an API call can read every prompt and shape every response. The emerging trend in 2026 is model substitution as an attack class with its own techniques and disclosures.
Some tool calls cannot be undone. Out-of-band confirmation is the cheapest defense for that small set, and the most expensive thing to skip.
Pure-LLM vulnerability scanners hit production around 2024. By 2026 their failure modes are documented. Reachability remains the backbone — and the LLM is most useful on top of it.
A senior engineer's guide to training data poisoning defenses in 2026, from split-learning detection to provenance attestation and continuous pipeline monitoring.
A one-hour cycle from vulnerability finding to merged fix is achievable in 2026, but only with a pipeline designed for it. Here is what that pipeline looks like.
A hijacked tool call is more consequential than a hijacked response. The defense requires the tool layer to police the model, not the other way around.