Application Security

Regular Expression Denial of Service (ReDoS): Detection and Prevention

A single bad regex can bring down your entire application. ReDoS attacks exploit catastrophic backtracking to consume unbounded CPU time.

Michael
DevSecOps Lead
4 min read

Regular expressions are one of the most useful tools in a programmer toolkit. They are also one of the most dangerous. A carefully crafted input string can cause a vulnerable regex to take minutes, hours, or effectively forever to evaluate. This is Regular Expression Denial of Service, or ReDoS.

ReDoS is not theoretical. Cloudflare experienced a global outage in July 2019 caused by a single regex in their WAF rules that triggered catastrophic backtracking. Stack Overflow has reported multiple incidents where regex performance caused service degradation. Any application that evaluates regular expressions against user-controlled input is potentially vulnerable.

How Catastrophic Backtracking Works

Most regex engines (including those in Python, Java, JavaScript, .NET, and Ruby) use a backtracking NFA (nondeterministic finite automaton) algorithm. When the engine encounters a choice point (like a* which can match zero or more characters), it tries one option and backtracks to try the other if the overall match fails.

The problem occurs when a regex has nested quantifiers or overlapping alternatives that create an exponential number of possible match paths. The classic example is the regex (a+)+$ applied to the input aaaaaaaaaaaaaaaaab.

The engine tries to match the a characters in every possible grouping: all in one group, split into two groups, split into three groups, and so on. For each grouping, it tries to match the $ (end of string) and fails because of the trailing b. The number of groupings to try is exponential in the length of the a sequence.

For an input of 25 a characters followed by a b, this regex takes several seconds. For 30 characters, it takes minutes. For 35, it takes hours. The input is tiny, but the computation is unbounded.

Common Vulnerable Patterns

Nested quantifiers: (a+)+, (a*)*, (a+)* -- Any pattern where a quantified group contains a quantified element.

Overlapping alternatives: (a|a)+ -- Alternatives that can match the same input.

Repeated overlapping patterns: (a+b?)+ where a+ can match the same substring that starts the next iteration.

Real-world examples:

  • Email validation: ^([a-zA-Z0-9])(([\-.]|[_]+)?([a-zA-Z0-9]+))*(@) -- The nested groups with optional separators create backtracking opportunities.
  • URL validation: Complex URL regexes frequently contain nested quantifiers for path segments.
  • HTML tag matching: <[^>]*(>|$) -- Actually safe, but variations like <(.|\n)*> are vulnerable.

Detection Methods

Static analysis tools. Several tools analyze regular expressions for ReDoS vulnerability:

  • recheck (JavaScript) -- Analyzes regex patterns and reports vulnerable ones with example inputs.
  • rxxr2 -- Academic tool that proves whether a regex is vulnerable through automata analysis.
  • Semgrep has rules that detect known ReDoS patterns.
  • ESLint plugin eslint-plugin-regexp includes rules for detecting potentially super-linear regex patterns.

Fuzzing. Generate random inputs of increasing length and measure regex evaluation time. If evaluation time grows faster than linearly with input length, the regex is likely vulnerable.

Timeout-based detection. Wrap regex evaluation in a timeout. If evaluation exceeds a threshold (say, 100ms), the regex is either vulnerable or unnecessarily complex.

Prevention Strategies

Use RE2 or similar linear-time engines. Google RE2 regex engine guarantees linear-time evaluation by using a DFA-based algorithm instead of backtracking. It does not support backreferences or lookahead/lookbehind, but most practical regex patterns do not need these features.

RE2 is available for Go (the default engine), Python (via the google-re2 package), C++, and Java. Switching to RE2 eliminates ReDoS entirely at the cost of some regex feature limitations.

Set evaluation timeouts. When using backtracking engines, wrap regex evaluation in a timeout. Java supports this with Thread.interrupt(), .NET with Regex.MatchTimeout, and other languages through external timeout mechanisms.

Limit input length. If you are evaluating a regex against user input, limit the input length to what your application actually needs. A username field does not need to accept 10,000-character strings.

Simplify your patterns. Use atomic groups (where supported) to prevent backtracking into groups that have already matched. Possessive quantifiers (a++ instead of a+) achieve the same effect.

Review all user-facing regex. Audit every regex that processes user-controlled input. Use static analysis tools to identify vulnerable patterns, and replace them with equivalent safe patterns or RE2.

Do not let users supply regex. If your application allows users to enter regex patterns (like search with regex support), you must either validate the pattern for safety, use a linear-time engine, or enforce strict timeouts.

How Safeguard.sh Helps

Safeguard.sh monitors your dependencies for libraries with known ReDoS vulnerabilities. Many npm packages, Python libraries, and Java frameworks have shipped vulnerable regex patterns that were later patched. Our platform tracks these CVEs and alerts you when your applications depend on vulnerable versions, helping you stay ahead of ReDoS risks in both your own code and your supply chain.

Never miss an update

Weekly insights on software supply chain security, delivered to your inbox.