Detecting and Preventing ReDoS Vulnerabilities

JRoller post from 1. March 2011 (the original post is available on the internet archive):

Regular expressions are omnipresent in today’s applications, they are used for input validation and parsing in web applications, web frameworks (in the browser and on the server side) and especially in security related applications, tools and libraries.

Today’s regular expression engines are pretty well tuned for performance. Even complex expressions are usually executed extremely fast. However, there are some combinations of expressions and input that slow down execution drastically and this can be abused by attackers for very effective (D)DoS attacks also called ReDoS (Regular Expression Denial of Service) attacks.

The Problem

ReDoS vulnerabilities were first described in a talk by Crosby & Wallach at USENIX 2003 (unfortunately, the paper isn’t online any more) and received more attention recently as various researchers discovered those vulnerabilities in numerous programming libraries and security related applications. The problem is not really an implementation issue of the regular expression engine but an inherent property of regular expression evaluation in general. Under the hood, each regular expression is parsed and translated into a nondeterministic finite automata (NFA) which then evaluates the input. The problem is that under certain circumstances, this automata requires a large amount of cycles to do its job.

The following expression for example requires quadratic time with respect to the input length to evaluate the input below:

a*[ab]*O

aaaaaaaaaaaaaaaaaaaaaaaX

The NFA first scans the input using the a* expression until it reaches X. Then, it tries all combinations of a* and [ab]* until it finally gives up. This evaluation strategy is know as backtracking. Adding another overlapping repetition requires cubic time to evaluate the input:

a*[ab]*[ac]* O

aaaaaaaaaaaaaaaaaaaaaaaX

Some expressions even require exponential time:

(a|aa)*O
(a+)+

Here is a real world example taken from SpamAssassin:

[a-z]+@[a-z]+([a-z\.]+\.)+[a-z]+

We have basically two options to fix the problem:

  1. Fixing the engine such that it at least interrupts execution when it finds out that it has no chance to finish evaluation in reasonable time.
  2. Inventing a tool that determines vulnerable expressions such that the developer can fix it.

Fixing the Engine

Ken Thompson fixed the problem by introducing a NFA that doesn’t use backtracking. Instead, it evaluates all possible branches in parallel threads. Another solution is to convert the NFA into a deterministic finite automata (DFA) which is much simpler and does not show this behaviour. However, the resulting DFA a well as Thompson’s NFA would not be able to handle backreferences, a feature that allows the regular expression to incorporate already scanned input in further evaluation. The second problem is that for complex regular expressions, the DFA would contain an enormous amount of states. This problem can be mitigated by creating those states on the fly whenever they are really needed. The first problem seems to be harder: At least for now there seems to be no way to evaluate expressions with backreferences in an efficient way, so all we can do is to limit the execution time to a reasonable amount.

Unfortunately, limiting execution time would be pretty hard for the engine built into current Java VMs so I have created a proof of concept by fixing the old Jakarta Regexp which at least in theory is pretty straight forward. All that is necessary is to limit the recursion depth to a reasonable amount and limit the execution time per sub evaluation:

    /**
     * Try to match a string against a subset of nodes in the program
     *
     * @param firstNode Node to start at in program
     * @param lastNode  Last valid node (used for matching a subexpression without
     *                  matching the rest of the program as well).
     * @param idxStart  Starting position in character array
     * @return Final input array index if match succeeded.  -1 if not.
     */
    protected int matchNodes(int firstNode, int lastNode, int idxStart)
    {
        if(recursionDepth == 0) {
            startTime = System.currentTimeMillis();
        }
        
        ++recursionDepth;
                
        if(recursionDepth > 1000) {
            throw new IllegalStateException("Recursion depth exeeded!");
        }
        
        if(System.currentTimeMillis() - startTime > Math.max(1000, 
                search.bufferedLength() * 10)) {
            throw new IllegalStateException("Calculation timed out!");
        }
        
        try {            
            return matchNodesRecursively(firstNode, lastNode, idxStart);
        } finally {
            --recursionDepth;
        }
    }

Limiting the recursion depth is simple but it took several experiments to determine a reasonable value that is tolerant enough for regular evaluations on the one hand and limits the required stack space on the other. Limiting the execution time is more tricky since the evaluation time depends also on what the VM does in the background (other threads running in parallel and garbage collection). So I choose a general threshold of one second and a linear extension depending on the search buffer length. This might require some more fine tuning but it seems to work pretty well this way. The engine throws an IllegalStateException when the maximum recursion depth or execution time is exceeded to indicate that the evaluation could not be done in a reasonable amount of time.

Conclusion

As I have shown, it is possible to fix regular expression engines at least to make sure that evaluation is limited to a reasonable amount of time. I think that existing engines should limit execution to prevent ReDoS attacks, since ReDoS vulnerabilities are pretty easy to detect and exploit.

In this followup article, I describe techniques to identify vulnerable regular expressions.

References

Wikipedia: ReDoS.

OWASP: Regular expression Denial of Service – ReDoS.

Alex Roichman, Adar Weidman: Regular expression Denial of Service Revisited, 2009.

Russ Cox: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, …), 2007.

One thought on “Detecting and Preventing ReDoS Vulnerabilities

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s