XML is not dead. SOAP services, SAML authentication, SVG images, Office documents, RSS feeds, Maven POMs, and Android manifests all use XML. Every application that processes XML is potentially vulnerable to a set of attacks that have been known for over a decade but continue to appear in production systems.
The root cause is that XML was designed as a powerful document format with features like external entity resolution, DTD processing, and XSLT transformation. These features are also exploitation primitives.
XML External Entity (XXE) Injection
XXE is the best-known XML attack. It had its own category (A4) in the OWASP Top 10 2017 and was merged into Security Misconfiguration (A05) in the 2021 edition. It exploits XML's ability to define entities that reference external resources.
A malicious XML document can define an entity that reads a local file:
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>
When the XML parser processes this document, it resolves the xxe entity by reading /etc/passwd and inserting its contents into the document. The attacker retrieves the file contents through the application's response.
XXE can also be used for:
- Server-side request forgery (SSRF). Entities can reference HTTP URLs, causing the server to make requests to internal services: SYSTEM "http://internal-service:8080/admin".
- Denial of service. Entities can reference resources that are slow to resolve or infinitely large, tying up the parser.
- Remote code execution. In some configurations (PHP with expect:// wrapper, Java with certain classpath handlers), XXE can lead to code execution.
- Out-of-band data exfiltration. When the application does not echo the parsed XML back, attackers can use parameter entities with external DTDs to exfiltrate data through DNS or HTTP callbacks.
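All of the variants above depend on the parser accepting a DOCTYPE. The following is a minimal Python sketch of a pre-parse guard, assuming the application never needs DTDs. The helper name parse_without_doctype and the DoctypeForbidden exception are hypothetical and exist only for this illustration; the expat handler it relies on, StartDoctypeDeclHandler, is part of the standard library.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def parse_without_doctype(data: bytes) -> ET.Element:
    """Parse XML, refusing any document that declares a DOCTYPE.

    Hypothetical helper: a first expat pass raises as soon as a DOCTYPE
    is seen, so such documents never reach the real parser.
    """
    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    guard = expat.ParserCreate()
    guard.StartDoctypeDeclHandler = on_doctype
    # Raises DoctypeForbidden for a DOCTYPE, expat.ExpatError for malformed XML.
    guard.Parse(data, True)

    # Only well-formed, DOCTYPE-free input gets this far.
    return ET.fromstring(data)
```

Parsing the input twice costs some time, but it keeps the check independent of whichever parser the application uses afterwards.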
Billion Laughs (Entity Expansion)
The Billion Laughs attack defines nested entities that expand exponentially:
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
]>
Each level multiplies the expansion by 10. Continued for nine levels of nesting, the three-character string "lol" expands to a billion copies, roughly 3 GB of text, from a document of about a kilobyte. This can exhaust the parser's memory and potentially take down the host.
Variations include the Quadratic Blowup attack, which uses a single large entity referenced many times, avoiding the nested definition pattern that some defenses look for.
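The growth can be checked without expanding anything. The sketch below computes the fully expanded length of an entity from its definitions alone. The function name expanded_length and the entity names lol4 through lol10 are assumptions that continue the naming pattern of the snippet; the code is an illustration of the arithmetic, not a parser defense.

```python
import re
from functools import lru_cache
from typing import Dict

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expanded_length(entities: Dict[str, str], name: str) -> int:
    """Length of an entity after full expansion, without expanding it.

    Assumes the definitions are acyclic (as well-formed XML requires)
    and that every referenced name is present in `entities`.
    """
    @lru_cache(maxsize=None)
    def length(entity_name: str) -> int:
        value = entities[entity_name]
        total = len(value)
        for match in ENTITY_REF.finditer(value):
            # Swap the reference's own length for its expansion's length.
            total += length(match.group(1)) - len(match.group(0))
        return total

    return length(name)


# Same shape as the Billion Laughs snippet, carried through nine levels.
entities = {"lol": "lol"}
previous = "lol"
for level in range(2, 11):
    current = f"lol{level}"
    entities[current] = f"&{previous};" * 10
    previous = current
```

Here expanded_length(entities, "lol3") is 300 and expanded_length(entities, "lol10") is 3,000,000,000 characters, which is the billion copies of "lol" that give the attack its name.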
XSLT Injection
When XML documents are transformed using XSLT stylesheets, and the stylesheet source is influenced by the attacker, XSLT injection can occur. XSLT is a Turing-complete language with access to the host system in many implementations.
In Java's XSLT processor, the document() function can read files. The system-property() function reveals system information. In some configurations, Java extension functions allow arbitrary method calls.
PHP's XSLT processor (libxslt-based) supports registerPHPFunctions, which, if enabled, allows arbitrary PHP function execution through XSLT.
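One way to keep the stylesheet source out of the attacker's hands is to let clients pick a stylesheet only by key, never by path or content. The following Python sketch assumes that design; the directory, the stylesheet names, and the function resolve_stylesheet are all hypothetical.

```python
from pathlib import Path

# Hypothetical directory and stylesheet names, for illustration only.
STYLESHEET_DIR = Path("/opt/app/stylesheets")
TRUSTED_STYLESHEETS = {
    "invoice": "invoice.xsl",
    "report": "report.xsl",
}


def resolve_stylesheet(requested: str) -> Path:
    """Map a client-supplied key to a stylesheet shipped with the application."""
    try:
        filename = TRUSTED_STYLESHEETS[requested]
    except KeyError:
        # Unknown keys, including path-like input, are rejected outright.
        raise ValueError(f"unknown stylesheet: {requested!r}") from None
    return STYLESHEET_DIR / filename
```

Because the client input is only ever used as a dictionary key, a value such as "../../etc/passwd" cannot influence which file the XSLT processor loads.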
XPath Injection
Applications that use XPath queries on XML data are vulnerable to injection if user input is concatenated into XPath expressions. Like SQL injection, XPath injection allows an attacker to modify the query logic.
//users/user[username='admin' and password='' or '1'='1']
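Here the attacker submits ' or '1'='1 as the password. Because and binds more tightly than or in XPath, the predicate reads as (username='admin' and password='') or '1'='1', which is true for every user node, so the password check is bypassed.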
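The Python sketch below contrasts the two patterns. build_login_query shows how concatenation turns input into query syntax, and find_user shows one alternative that keeps the path fixed and compares input as plain data. Both function names and the sample user data are hypothetical, and the plaintext passwords are there only to mirror the query above. Where the XPath engine supports variable binding (lxml, for example, accepts $name variables), binding the values is another option.

```python
import hmac
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical user store, for illustration only.
USERS_XML = """
<users>
  <user><username>admin</username><password>s3cret</password></user>
  <user><username>guest</username><password>guest</password></user>
</users>
"""


def build_login_query(username: str, password: str) -> str:
    """Vulnerable pattern: user input becomes part of the XPath syntax."""
    return f"//users/user[username='{username}' and password='{password}']"


def find_user(root: ET.Element, username: str, password: str) -> Optional[ET.Element]:
    """Safer pattern: a fixed path, with input compared as plain data."""
    for user in root.iterfind("user"):
        stored_name = user.findtext("username", default="")
        stored_password = user.findtext("password", default="")
        if stored_name == username and hmac.compare_digest(
            stored_password.encode(), password.encode()
        ):
            return user
    return None


root = ET.fromstring(USERS_XML)
```

Calling build_login_query("admin", "' or '1'='1") reproduces the injected query shown above, while find_user(root, "admin", "' or '1'='1") returns None because the quote characters are never interpreted as XPath.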
Language-Specific Parser Configuration
Java (most vulnerable by default):
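import javax.xml.parsers.DocumentBuilderFactory;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Reject any document that contains a DOCTYPE; this blocks XXE and entity expansion.
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// Defense in depth if DTDs ever have to be allowed: never fetch external entities.
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);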
Java's default XML parsers (DOM, SAX, StAX) have external entity resolution enabled by default. This has been the single largest source of XXE vulnerabilities.
Python:
from defusedxml import ElementTree
# Drop-in replacement for xml.etree.ElementTree; source is a path or file object.
tree = ElementTree.parse(source)
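Python's xml.etree.ElementTree does not expand external entities, and xml.sax has not processed external general entities by default since Python 3.7.1. The standard-library parsers are still vulnerable to Billion Laughs when built against Expat older than 2.4.1. The defusedxml library provides hardened drop-in parsers that cover all of these attack types.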
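For code that has to stay on the standard library, xml.sax exposes feature flags that roughly correspond to the Java settings above. The following is a minimal sketch, assuming a SAX-style parse is acceptable; TextCollector and parse_text are hypothetical names. It covers only external entity resolution, not entity expansion.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Accumulates character data from the document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


def parse_text(data: bytes) -> str:
    """Return the document's text with external entities left unresolved."""
    parser = xml.sax.make_parser()
    # Set explicitly, even where these are already the defaults.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return "".join(collector.chunks)
```

Setting the flags explicitly documents the intent and protects against a later change that turns them on elsewhere in the code base.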
PHP:
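libxml_disable_entity_loader(true); // deprecated as of PHP 8.0
PHP's simplexml_load_string and DOMDocument are vulnerable to XXE when PHP is built against libxml2 older than 2.9.0, or on any version when the LIBXML_NOENT flag is passed. On those older builds, libxml_disable_entity_loader(true) disables external entity loading. From libxml2 2.9.0 onward, entity substitution is off by default, and the function is deprecated as of PHP 8.0.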
.NET:
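In .NET Framework versions before 4.5.2, XmlDocument processes DTDs and resolves external entities by default. Parsing through an XmlReader created with XmlReaderSettings.DtdProcessing set to DtdProcessing.Prohibit is the safe configuration.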
Mitigation Checklist
- Disable DTD processing entirely if you do not need it.
- Disable external entity resolution in all XML parsers.
- Set entity expansion limits to prevent Billion Laughs.
- Use defusedxml in Python, safe parser configurations in Java.
- Validate XML against a schema before processing.
- Parameterize XPath queries rather than concatenating user input.
- Restrict XSLT processing to trusted stylesheets only.
How Safeguard.sh Helps
Safeguard.sh monitors XML parsing libraries in your dependency tree for known vulnerabilities and unsafe default configurations. When an XML library CVE is published, Safeguard.sh identifies affected projects in your portfolio. For libraries known to have unsafe defaults (like Java's built-in XML parsers), Safeguard.sh flags the risk in your SBOM, helping your security team identify applications that may be vulnerable to XXE even without a specific CVE.