Skip to content

Commit de15ecf

Browse files
authored
Merge pull request #2593 from asger-semmle/regexp-always-matches
JS: Add RegExpAlwaysMatches query
2 parents 8e70008 + 775e63d commit de15ecf

File tree

11 files changed

+303
-19
lines changed

11 files changed

+303
-19
lines changed

change-notes/1.24/analysis-javascript.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@
1919
| **Query** | **Tags** | **Purpose** |
2020
|---------------------------------------------------------------------------------|-------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
2121
| Cross-site scripting through exception (`js/xss-through-exception`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights potential XSS vulnerabilities where an exception is written to the DOM. Results are not shown on LGTM by default. |
22+
| Regular expression always matches (`js/regex/always-matches`) | correctness, regular-expressions | Highlights regular expression checks that trivially succeed by matching an empty substring. Results are shown on LGTM by default. |
2223

2324
## Changes to existing queries
2425

javascript/config/suites/javascript/correctness-core

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,7 @@
3434
+ semmlecode-javascript-queries/RegExp/DuplicateCharacterInCharacterClass.ql: /Correctness/Regular Expressions
3535
+ semmlecode-javascript-queries/RegExp/EmptyCharacterClass.ql: /Correctness/Regular Expressions
3636
+ semmlecode-javascript-queries/RegExp/IdentityReplacement.ql: /Correctness/Regular Expressions
37+
+ semmlecode-javascript-queries/RegExp/RegExpAlwaysMatches.ql: /Correctness/Regular Expressions
3738
+ semmlecode-javascript-queries/RegExp/UnboundBackref.ql: /Correctness/Regular Expressions
3839
+ semmlecode-javascript-queries/RegExp/UnmatchableCaret.ql: /Correctness/Regular Expressions
3940
+ semmlecode-javascript-queries/RegExp/UnmatchableDollar.ql: /Correctness/Regular Expressions

javascript/ql/src/Performance/ReDoS.ql

Lines changed: 2 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -144,7 +144,7 @@ newtype TInputSymbol =
144144
CharClass(RegExpCharacterClass recc) {
145145
getRoot(recc).isRelevant() and
146146
not recc.isInverted() and
147-
not isUniversalClass(recc)
147+
not recc.isUniversalClass()
148148
} or
149149
/** An input symbol representing all characters matched by `.`. */
150150
Dot() or
@@ -153,23 +153,6 @@ newtype TInputSymbol =
153153
/** An epsilon transition in the automaton. */
154154
Epsilon()
155155

156-
/**
157-
* Holds if character class `cc` matches all characters.
158-
*/
159-
predicate isUniversalClass(RegExpCharacterClass cc) {
160-
// [^]
161-
cc.isInverted() and not exists(cc.getAChild())
162-
or
163-
// [\w\W] and similar
164-
not cc.isInverted() and
165-
exists(string cce1, string cce2 |
166-
cce1 = cc.getAChild().(RegExpCharacterClassEscape).getValue() and
167-
cce2 = cc.getAChild().(RegExpCharacterClassEscape).getValue()
168-
|
169-
cce1 != cce2 and cce1.toLowerCase() = cce2.toLowerCase()
170-
)
171-
}
172-
173156
/**
174157
* An abstract input symbol, representing a set of concrete characters.
175158
*/
@@ -361,7 +344,7 @@ predicate delta(State q1, EdgeLabel lbl, State q2) {
361344
)
362345
or
363346
exists(RegExpCharacterClass cc |
364-
isUniversalClass(cc) and q1 = before(cc) and lbl = Any() and q2 = after(cc)
347+
cc.isUniversalClass() and q1 = before(cc) and lbl = Any() and q2 = after(cc)
365348
or
366349
q1 = before(cc) and lbl = CharClass(cc) and q2 = after(cc)
367350
)
Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>
8+
There are several built-in JavaScript functions that search for a regular expression match within a string,
9+
such as <code>RegExp.prototype.test</code> and <code>String.prototype.search</code>.
10+
If the regular expression is not anchored, it only needs to match a substring of the input
11+
and won't necessarily match the whole string.
12+
</p>
13+
14+
<p>
15+
If the regular expression being searched for accepts the empty string, this means it can match an empty
16+
substring anywhere in the input string, and will thus always find a match.
17+
In this case, testing if a match exists is redundant and indicates dead code.
18+
</p>
19+
20+
</overview>
21+
<recommendation>
22+
23+
<p>
24+
Examine the regular expression and determine how it was intended to match:
25+
</p>
26+
27+
<ul>
28+
<li>To match the whole input string, add anchors at the beginning and end of the regular expression.</li>
29+
<li>To search for an occurrence within the input string, consider what the shortest meaningful match is and restrict the
30+
regular expression accordingly, such as by changing a <code>*</code> to a <code>+</code>.</li>
31+
</ul>
32+
33+
</recommendation>
34+
<example>
35+
<p>
36+
In the following example, a regular expression is used to check the format of a string <code>id</code>.
37+
However, the check always passes because the regular expression can match the empty substring.
38+
For example, it will allow the ID string "<code>%%</code>" by matching an empty string at index 0.
39+
</p>
40+
41+
<sample src="examples/RegExpAlwaysMatches.js" />
42+
43+
<p>
44+
To ensure the regular expression matches the whole string, add anchors at the beginning and end:
45+
</p>
46+
47+
<sample src="examples/RegExpAlwaysMatchesGood.js" />
48+
49+
</example>
50+
<references>
51+
52+
<li>Mozilla Developer Network: <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions">JavaScript Regular Expressions</a>.</li>
53+
54+
</references>
55+
</qhelp>
Lines changed: 125 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,125 @@
1+
/**
2+
* @name Regular expression always matches
3+
* @description Regular expression tests that always find a match indicate dead code or a logic error
4+
* @kind problem
5+
* @problem.severity warning
6+
* @id js/regex/always-matches
7+
* @tags correctness
8+
* regular-expressions
9+
* @precision high
10+
*/
11+
12+
import javascript
13+
14+
/**
15+
* Gets a node reachable from the given root term through alts and groups only.
16+
*
17+
* For example, for `/(foo|bar)/` this gets `(foo|bar)`, `foo|bar`, `foo` and `bar`.
18+
*/
19+
RegExpTerm getEffectiveRootAux(RegExpTerm actualRoot) {
20+
actualRoot.isRootTerm() and
21+
result = actualRoot
22+
or
23+
result = getEffectiveRootAux(actualRoot).(RegExpAlt).getAChild()
24+
or
25+
result = getEffectiveRootAux(actualRoot).(RegExpGroup).getAChild()
26+
}
27+
28+
/**
29+
* Gets the effective root of the given term.
30+
*
31+
* For example, for `/(foo|bar)/` this gets `foo` and `bar`.
32+
*/
33+
RegExpTerm getEffectiveRoot(RegExpTerm actualRoot) {
34+
result = getEffectiveRootAux(actualRoot) and
35+
not result instanceof RegExpAlt and
36+
not result instanceof RegExpGroup
37+
}
38+
39+
/**
40+
* Holds if `term` contains an anchor on both ends.
41+
*/
42+
predicate isPossiblyAnchoredOnBothEnds(RegExpSequence node) {
43+
node.getAChild*() instanceof RegExpCaret and
44+
node.getAChild*() instanceof RegExpDollar and
45+
node.getNumChild() >= 2
46+
}
47+
48+
/**
49+
* Holds if `term` is obviously intended to match any string.
50+
*/
51+
predicate isUniversalRegExp(RegExpTerm term) {
52+
exists(RegExpTerm child | child = term.(RegExpStar).getAChild() |
53+
child instanceof RegExpDot
54+
or
55+
child.(RegExpCharacterClass).isUniversalClass()
56+
)
57+
}
58+
59+
/**
60+
* A call that searches for a regexp match within a string, but does not
61+
* extract the capture groups or the matched string itself.
62+
*
63+
* Because of the longest-match rule, queries that are more than pure tests
64+
* aren't necessarily broken just because the regexp can accept the empty string.
65+
*/
66+
abstract class RegExpQuery extends DataFlow::CallNode {
67+
abstract RegExpTerm getRegExp();
68+
}
69+
70+
/**
71+
* A call to `RegExp.prototype.test`.
72+
*/
73+
class RegExpTestCall extends DataFlow::MethodCallNode, RegExpQuery {
74+
DataFlow::RegExpCreationNode regexp;
75+
76+
RegExpTestCall() {
77+
this = regexp.getAReference().getAMethodCall("test")
78+
}
79+
80+
override RegExpTerm getRegExp() {
81+
result = regexp.getRoot()
82+
}
83+
}
84+
85+
/**
86+
* A call to `String.prototype.search`.
87+
*/
88+
class RegExpSearchCall extends DataFlow::MethodCallNode, RegExpQuery {
89+
DataFlow::RegExpCreationNode regexp;
90+
91+
RegExpSearchCall() {
92+
getMethodName() = "search" and
93+
regexp.getAReference().flowsTo(getArgument(0))
94+
}
95+
96+
override RegExpTerm getRegExp() {
97+
result = regexp.getRoot()
98+
}
99+
}
100+
101+
/**
102+
* Holds if `t` is a zero-width assertion other than an anchor.
103+
*/
104+
predicate isAssertion(RegExpTerm t) {
105+
t instanceof RegExpSubPattern or
106+
t instanceof RegExpWordBoundary or
107+
t instanceof RegExpNonWordBoundary
108+
}
109+
110+
from RegExpTerm term, RegExpQuery call, string message
111+
where
112+
term.isNullable() and
113+
not isAssertion(term.getAChild*()) and
114+
not isUniversalRegExp(term) and
115+
term = getEffectiveRoot(call.getRegExp()) and
116+
(
117+
call instanceof RegExpTestCall and
118+
not isPossiblyAnchoredOnBothEnds(term) and
119+
message = "This regular expression always matches when used in a test $@, as it can match an empty substring."
120+
or
121+
call instanceof RegExpSearchCall and
122+
not term.getAChild*() instanceof RegExpDollar and
123+
message = "This regular expression always the matches at index 0 when used $@, as it matches the empty substring."
124+
)
125+
select term, message, call, "here"
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
if (!/[a-z0-9]*/.test(id)) {
2+
throw new Error("Invalid id: " + id);
3+
}
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
if (!/^[a-z0-9]*$/.test(id)) {
2+
throw new Error("Invalid id: " + id);
3+
}

javascript/ql/src/semmle/javascript/Regexp.qll

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -764,6 +764,23 @@ class RegExpCharacterClass extends RegExpTerm, @regexp_char_class {
764764
override string getAMatchedString() {
765765
not isInverted() and result = getAChild().getAMatchedString()
766766
}
767+
768+
/**
769+
* Holds if this character class matches any character.
770+
*/
771+
predicate isUniversalClass() {
772+
// [^]
773+
isInverted() and not exists(getAChild())
774+
or
775+
// [\w\W] and similar
776+
not isInverted() and
777+
exists(string cce1, string cce2 |
778+
cce1 = getAChild().(RegExpCharacterClassEscape).getValue() and
779+
cce2 = getAChild().(RegExpCharacterClassEscape).getValue()
780+
|
781+
cce1 != cce2 and cce1.toLowerCase() = cce2.toLowerCase()
782+
)
783+
}
767784
}
768785

769786
/**
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
| tst.js:2:11:2:20 | ^(https:)? | This regular expression always matches when used in a test $@, as it can match an empty substring. | tst.js:2:10:2:29 | /^(https:)?/.test(x) | here |
2+
| tst.js:14:11:14:19 | (\\.com)?$ | This regular expression always matches when used in a test $@, as it can match an empty substring. | tst.js:14:10:14:28 | /(\\.com)?$/.test(x) | here |
3+
| tst.js:22:11:22:34 | ^(?:https?:\|ftp:\|file:)? | This regular expression always matches when used in a test $@, as it can match an empty substring. | tst.js:22:10:22:43 | /^(?:ht ... test(x) | here |
4+
| tst.js:30:11:30:20 | (foo\|bar)? | This regular expression always matches when used in a test $@, as it can match an empty substring. | tst.js:30:10:30:29 | /(foo\|bar)?/.test(x) | here |
5+
| tst.js:34:21:34:26 | (baz)? | This regular expression always matches when used in a test $@, as it can match an empty substring. | tst.js:34:10:34:35 | /^foo\|b ... test(x) | here |
6+
| tst.js:58:20:58:25 | [a-z]* | This regular expression always the matches at index 0 when used $@, as it matches the empty substring. | tst.js:58:10:58:27 | x.search(/[a-z]*/) | here |
7+
| tst.js:70:20:70:26 | ^(foo)? | This regular expression always the matches at index 0 when used $@, as it matches the empty substring. | tst.js:70:10:70:28 | x.search(/^(foo)?/) | here |
8+
| tst.js:86:22:86:21 | | This regular expression always matches when used in a test $@, as it can match an empty substring. | tst.js:86:10:86:31 | new Reg ... test(x) | here |
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
RegExp/RegExpAlwaysMatches.ql

0 commit comments

Comments
 (0)