LLM-as-Judge Pitfalls in Security Evals
Using an LLM to score another LLM's output is expedient and dangerous. The judge has its own biases — ones that affect security evaluations specifically.
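One bias worth testing for is position bias: LLM judges tend to prefer whichever answer appears first in the prompt. Below is a minimal sketch of the standard swap-and-compare control, assuming a hypothetical `call_judge` wrapper around whatever judge model you use that returns "A" or "B" for a pairwise comparison:

```python
def judge_pair(call_judge, prompt, answer_a, answer_b):
    """Ask the judge twice with the answers swapped; only accept a verdict
    that survives the swap. `call_judge` is an assumed wrapper, not a
    specific API."""
    first = call_judge(prompt, answer_a, answer_b)   # A shown first
    second = call_judge(prompt, answer_b, answer_a)  # B shown first
    # Map the second verdict back to the original labelling.
    second = {"A": "B", "B": "A"}.get(second, second)
    if first == second:
        return first  # verdict is stable under reordering
    return "tie"      # disagreement: treat as no preference
```

If the tie rate jumps once you add the swap, the judge was scoring order, not quality.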
Two years ago, AI vendors shipped without evals. In 2026, the posture has shifted. Customers expect benchmarks. Vendors without them lose deals.
Benchmark scores are only as honest as the dataset behind them. Griffin AI publishes golden-dataset design notes; Mythos-class tools rarely explain theirs.
A benchmark that the model has seen in training is a benchmark of memorisation. Specific leakage-testing methods separate generalisation from recall.
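One common leakage probe is long n-gram overlap: flag benchmark items whose token sequences reappear in a sample of the training corpus. This is a sketch of that idea, with an assumed benchmark item shape (a dict with a "question" field) and an illustrative threshold:

```python
def ngrams(text, n=13):
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def leaked_items(benchmark, corpus_sample, n=13, threshold=0.2):
    """Flag items whose n-gram overlap with the corpus sample exceeds the
    threshold. Both the item schema and the threshold are assumptions."""
    corpus_grams = set()
    for doc in corpus_sample:
        corpus_grams |= ngrams(doc, n)
    flagged = []
    for item in benchmark:
        grams = ngrams(item["question"], n)
        if grams and len(grams & corpus_grams) / len(grams) >= threshold:
            flagged.append(item)
    return flagged
```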
Every release risks making the model worse. Griffin AI's regression gates block bad builds before they ship. Mythos-class tools rarely describe a gate process at all.
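A gate can be as simple as a CI step that compares the candidate build's eval metrics to the last released baseline and fails on a meaningful regression. The metric names, baseline values, and tolerances below are illustrative, not Griffin AI's actual gate:

```python
import sys

BASELINE = {"detection_f1": 0.91, "false_positive_rate": 0.04}
# Negative tolerance: metric may not drop by more than this.
# Positive tolerance: metric may not rise by more than this.
TOLERANCE = {"detection_f1": -0.01, "false_positive_rate": +0.01}

def gate(candidate: dict) -> bool:
    ok = True
    for metric, base in BASELINE.items():
        delta = candidate[metric] - base
        limit = TOLERANCE[metric]
        worse = delta < limit if limit < 0 else delta > limit
        if worse:
            print(f"FAIL {metric}: {base:.3f} -> {candidate[metric]:.3f}")
            ok = False
    return ok

if __name__ == "__main__":
    candidate = {"detection_f1": 0.89, "false_positive_rate": 0.05}
    sys.exit(0 if gate(candidate) else 1)  # nonzero exit blocks the release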
When the test set is in the training set, the benchmark is broken. Security eval contamination is widespread and the mitigations are specific.
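One specific mitigation is to embed canary strings in the eval set, then test whether the model can complete them: a successful completion means the benchmark leaked into training. The canary value and the `generate` callable here are assumptions for illustration:

```python
CANARY = "safeguard-eval-canary-7f3a9c1e"

def eval_set_contaminated(generate, prefix=CANARY[:20]) -> bool:
    """`generate` is an assumed text-completion callable for the model
    under test. If the model reproduces the canary's tail from its prefix,
    the eval set almost certainly appeared in training data."""
    completion = generate(prefix, max_tokens=16)
    return CANARY[20:] in completion
```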
A security AI that refuses too often is useless. One that refuses too rarely is dangerous. Griffin AI publishes calibrated refusal benchmarks; Mythos does not.
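Measuring both failure modes takes two labelled prompt sets: over-refusal on benign security questions and under-refusal on genuinely harmful ones. A sketch, where `is_refusal` and the prompt sets are assumptions rather than a published harness:

```python
def refusal_rates(model, benign_prompts, harmful_prompts, is_refusal):
    """Over-refusal: fraction of benign prompts refused.
    Under-refusal: fraction of harmful prompts answered."""
    over = sum(is_refusal(model(p)) for p in benign_prompts) / len(benign_prompts)
    under = sum(not is_refusal(model(p)) for p in harmful_prompts) / len(harmful_prompts)
    return {"over_refusal": over, "under_refusal": under}
```

A calibrated tool keeps both numbers low at the same time; optimising either one alone is easy and useless.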
SEvenLLM set out to measure how well LLMs handle Security Event analysis, the unglamorous day-to-day work of SOCs and IR teams. A design review of what the benchmark covers, how it was built, and where its coverage does and does not map to real operations.
An AI security tool that cites the wrong advisory is worse than one that says nothing. Griffin AI benchmarks citation accuracy at 0.89 similarity; Mythos does not.
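One plausible way to compute such a similarity score is a character-level ratio between cited and expected advisory identifiers, averaged over a labelled set. This sketch uses Python's difflib and is an assumption about the harness, not Griffin AI's published method:

```python
from difflib import SequenceMatcher

def citation_accuracy(cited: list[str], truth: list[str]) -> float:
    """Mean string-similarity ratio between each cited advisory ID and
    its ground-truth counterpart; 1.0 means every citation is exact."""
    scores = [SequenceMatcher(None, got, expected).ratio()
              for got, expected in zip(cited, truth)]
    return sum(scores) / len(scores) if scores else 0.0

# e.g. citation_accuracy(["CVE-2024-3094"], ["CVE-2024-3094"]) == 1.0
```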