|
1 | 1 | <!DOCTYPE qhelp PUBLIC |
2 | | - "-//Semmle//qhelp//EN" |
3 | | - "qhelp.dtd"> |
| 2 | +"-//Semmle//qhelp//EN" |
| 3 | +"qhelp.dtd"> |
| 4 | + |
4 | 5 | <qhelp> |
5 | 6 |
|
6 | | -<overview> |
7 | | -<p> |
8 | | -Some regular expressions take a very long time to match certain input strings to the point where |
9 | | -the time it takes to match a string of length <i>n</i> is proportional to <i>2<sup>n</sup></i>. |
10 | | -Such regular expressions can negatively affect performance, or even allow a malicious user to |
11 | | -perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular |
12 | | -expression to match. |
13 | | -</p> |
14 | | -<p> |
15 | | -The regular expression engines provided by many popular JavaScript platforms use backtracking |
16 | | -non-deterministic finite automata to implement regular expression matching. While this approach |
17 | | -is space-efficient and allows supporting advanced features like capture groups, it is not |
18 | | -time-efficient in general. The worst-case time complexity of such an automaton can be exponential, |
19 | | -meaning that for strings of a certain shape, increasing the input length by ten characters may |
20 | | -make the automaton about 1000 times slower. |
21 | | -</p> |
22 | | -<p> |
23 | | -Typically, a regular expression is affected by this problem if it contains a repetition of the |
24 | | -form <code>r*</code> or <code>r+</code> where the sub-expression <code>r</code> is ambiguous in |
25 | | -the sense that it can match some string in multiple ways. More information about the precise |
26 | | -circumstances can be found in the references. |
27 | | -</p> |
28 | | -</overview> |
| 7 | + <include src="ReDoSIntroduction.qhelp" /> |
29 | 8 |
|
30 | | -<recommendation> |
31 | | -<p> |
32 | | -Modify the regular expression to remove the ambiguity. |
33 | | -</p> |
34 | | -</recommendation> |
| 9 | + <example> |
| 10 | + <p> |
| 11 | + Consider this regular expression: |
| 12 | + </p> |
| 13 | + <sample language="javascript"> |
| 14 | + /^_(__|.)+_$/ |
| 15 | + </sample> |
| 16 | + <p> |
| 17 | + Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the |
| 18 | + first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two |
| 19 | + repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting |
| 20 | + of an odd number of underscores followed by some other character will cause the regular |
| 21 | + expression engine to run for an exponential amount of time before rejecting the input. |
| 22 | + </p> |
| 23 | + <p> |
| 24 | + This problem can be avoided by rewriting the regular expression to remove the ambiguity between |
| 25 | + the two branches of the alternative inside the repetition: |
| 26 | + </p> |
| 27 | + <sample language="javascript"> |
| 28 | + /^_(__|[^_])+_$/ |
| 29 | + </sample> |
| 30 | + </example> |
35 | 31 |
|
36 | | -<example> |
37 | | -<p> |
38 | | -Consider this regular expression: |
39 | | -</p> |
40 | | -<sample language="javascript"> |
41 | | -/^_(__|.)+_$/ |
42 | | -</sample> |
43 | | -<p> |
44 | | -Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the |
45 | | -first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two |
46 | | -repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting |
47 | | -of an odd number of underscores followed by some other character will cause the regular |
48 | | -expression engine to run for an exponential amount of time before rejecting the input. |
49 | | -</p> |
50 | | -<p> |
51 | | -This problem can be avoided by rewriting the regular expression to remove the ambiguity between |
52 | | -the two branches of the alternative inside the repetition: |
53 | | -</p> |
54 | | -<sample language="javascript"> |
55 | | -/^_(__|[^_])+_$/ |
56 | | -</sample> |
57 | | -</example> |
| 32 | + <include src="ReDoSReferences.qhelp"/> |
58 | 33 |
|
59 | | -<references> |
60 | | -<li> |
61 | | -OWASP: |
62 | | -<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>. |
63 | | -</li> |
64 | | -<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li> |
65 | | -<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li> |
66 | | -<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke: |
67 | | -<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>. |
68 | | -</li> |
69 | | -</references> |
70 | 34 | </qhelp> |
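The backtracking behavior described in the `<example>` block can be observed with a rough timing sketch. It assumes a Node.js runtime (for `process.hrtime.bigint()`). The helper names `attackString` and `timeMatch` are hypothetical and not part of the qhelp file; the two patterns are copied verbatim from the `<sample>` elements. Absolute timings depend on the engine and machine, so none are shown here. Note also that the two patterns do not accept exactly the same set of strings (for example, `"___"` matches only the first), so the rewrite is not a drop-in equivalent.

```javascript
// Patterns copied from the two <sample> blocks in the diff above.
const ambiguous = /^_(__|.)+_$/;
const unambiguous = /^_(__|[^_])+_$/;

// Hypothetical helper: builds the rejecting input described in the example,
// i.e. a run of underscores followed by one other character.
function attackString(underscores) {
  return "_".repeat(underscores) + "!";
}

// Hypothetical helper: times a single match attempt, in milliseconds.
function timeMatch(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// Keep the lengths small: the ambiguous pattern's running time is expected
// to grow quickly as the number of underscores increases.
for (const n of [15, 19, 23]) {
  const input = attackString(n);
  const slow = timeMatch(ambiguous, input);
  const fast = timeMatch(unambiguous, input);
  console.log(
    `n=${n}: ambiguous ${slow.elapsedMs.toFixed(3)} ms, ` +
    `unambiguous ${fast.elapsedMs.toFixed(3)} ms`
  );
}
```

Both patterns reject every input produced by `attackString`; only the time taken to reach that rejection should differ.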