Skip to content

Commit 5ac5aa0

Browse files
committed
Merge remote-tracking branch 'upstream/master' into mergeback-20181217
2 parents 01f5875 + 7adf1d9 commit 5ac5aa0

File tree

12 files changed

+262
-41
lines changed

12 files changed

+262
-41
lines changed

change-notes/1.20/analysis-javascript.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
| **Query** | **Tags** | **Purpose** |
1616
|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1717
| Double escaping or unescaping (`js/double-escaping`) | correctness, security, external/cwe/cwe-116 | Highlights potential double escaping or unescaping of special characters, indicating a possible violation of [CWE-116](https://cwe.mitre.org/data/definitions/116.html). Results are shown on LGTM by default. |
18+
| Incomplete regular expression for hostnames (`js/incomplete-hostname-regexp`) | correctness, security, external/cwe/cwe-020 | Highlights hostname sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default.|
1819
| Incomplete URL substring sanitization | correctness, security, external/cwe/cwe-020 | Highlights URL sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results shown on LGTM by default. |
1920
| Incorrect suffix check (`js/incorrect-suffix-check`) | correctness, security, external/cwe/cwe-020 | Highlights error-prone suffix checks based on `indexOf`, indicating a potential violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
2021
| Useless comparison test (`js/useless-comparison-test`) | correctness | Highlights code that is unreachable due to a numeric comparison that is always true or always false. Results are shown on LGTM by default. |

cpp/ql/src/semmle/code/cpp/security/CommandExecution.qll

Lines changed: 0 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -159,17 +159,6 @@ predicate shellCommandPreface(string cmd, string flag) {
159159
)
160160
}
161161

162-
/**
163-
* An array element. This supports multiple kinds of array syntax.
164-
*/
165-
private predicate arrayElement(Expr arrayLit, int idx, Expr element) {
166-
exists (ArrayLiteral lit | lit = arrayLit |
167-
lit.getElement(idx) = element)
168-
or exists (MessageExpr arrayWithObjects | arrayWithObjects = arrayLit |
169-
arrayWithObjects.getStaticTarget().getQualifiedName().matches("NSArray%::+arrayWithObjects:") and
170-
arrayWithObjects.getArgument(idx) = element)
171-
}
172-
173162
/**
174163
* A command that is used as a command, or component of a command,
175164
* that will be executed by a general-purpose command interpreter

javascript/config/suites/javascript/security

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
+ semmlecode-javascript-queries/DOM/TargetBlank.ql: /Security/CWE/CWE-200
22
+ semmlecode-javascript-queries/Electron/EnablingNodeIntegration.ql: /Security/CWE/CWE-094
3+
+ semmlecode-javascript-queries/Security/CWE-020/IncompleteHostnameRegExp.ql: /Security/CWE/CWE-020
34
+ semmlecode-javascript-queries/Security/CWE-020/IncompleteUrlSubstringSanitization.ql: /Security/CWE/CWE-020
45
+ semmlecode-javascript-queries/Security/CWE-020/IncorrectSuffixCheck.ql: /Security/CWE/CWE-020
56
+ semmlecode-javascript-queries/Security/CWE-022/TaintedPath.ql: /Security/CWE/CWE-022
Lines changed: 72 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,72 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>
8+
9+
Sanitizing untrusted URLs is an important technique for
10+
preventing attacks such as request forgeries and malicious
11+
redirections. Often, this is done by checking that the host of a URL
12+
is in a set of allowed hosts.
13+
14+
</p>
15+
16+
<p>
17+
18+
If a regular expression implements such a check, it is
19+
easy to accidentally make the check too permissive by not escaping the
20+
<code>.</code> meta-characters appropriately.
21+
22+
Even if the check is not used in a security-critical
23+
context, the incomplete check may still cause undesirable behaviors
24+
when it accidentally succeeds.
25+
26+
</p>
27+
</overview>
28+
29+
<recommendation>
30+
<p>
31+
32+
Escape all meta-characters appropriately when constructing
33+
regular expressions for security checks, pay special attention to the
34+
<code>.</code> meta-character.
35+
36+
</p>
37+
</recommendation>
38+
39+
<example>
40+
41+
<p>
42+
43+
The following example code checks that a URL redirection
44+
will reach the <code>example.com</code> domain, or one of its
45+
subdomains.
46+
47+
</p>
48+
49+
<sample src="examples/IncompleteHostnameRegExp.js"/>
50+
51+
<p>
52+
53+
The check is however easy to bypass because the unescaped
54+
<code>.</code> allows for any character before
55+
<code>example.com</code>, effectively allowing the redirect to go to
56+
an attacker-controlled domain such as <code>wwwXexample.com</code>.
57+
58+
</p>
59+
<p>
60+
61+
Address this vulnerability by escaping <code>.</code>
62+
appropriately: <code>let regex = /(www|beta|)\.example\.com/</code>.
63+
64+
</p>
65+
66+
</example>
67+
68+
<references>
69+
<li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li>
70+
<li>OWASP: <a href="https://www.owasp.org/index.php/Unvalidated_Redirects_and_Forwards_Cheat_Sheet">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li>
71+
</references>
72+
</qhelp>
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
/**
2+
* @name Incomplete regular expression for hostnames
3+
* @description Matching a URL or hostname against a regular expression that contains an unescaped dot as part of the hostname might match more hostnames than expected.
4+
* @kind problem
5+
* @problem.severity warning
6+
* @precision high
7+
* @id js/incomplete-hostname-regexp
8+
* @tags correctness
9+
* security
10+
* external/cwe/cwe-20
11+
*/
12+
13+
import javascript
14+
15+
/**
16+
* A taint tracking configuration for incomplete hostname regular expressions sources.
17+
*/
18+
class Configuration extends TaintTracking::Configuration {
19+
Configuration() { this = "IncompleteHostnameRegExpTracking" }
20+
21+
override
22+
predicate isSource(DataFlow::Node source) {
23+
isIncompleteHostNameRegExpPattern(source.asExpr().getStringValue(), _)
24+
}
25+
26+
override
27+
predicate isSink(DataFlow::Node sink) {
28+
isInterpretedAsRegExp(sink)
29+
}
30+
31+
}
32+
33+
34+
/**
35+
* Holds if `pattern` is a regular expression pattern for URLs with a host matched by `hostPart`,
36+
* and `pattern` contains a subtle mistake that allows it to match unexpected hosts.
37+
*/
38+
bindingset[pattern]
39+
predicate isIncompleteHostNameRegExpPattern(string pattern, string hostPart) {
40+
hostPart = pattern.regexpCapture(
41+
"(?i).*" +
42+
// an unescaped single `.`
43+
"(?<!\\\\)[.]" +
44+
// immediately followed by a sequence of subdomains, perhaps with some regex characters mixed in, followed by a known TLD
45+
"([():|?a-z0-9-]+(\\\\)?[.](" + RegExpPatterns::commonTLD() + "))" +
46+
".*", 1)
47+
}
48+
49+
from Expr e, string pattern, string hostPart
50+
where
51+
(
52+
e.(RegExpLiteral).getValue() = pattern or
53+
exists (Configuration cfg |
54+
cfg.hasFlow(e.flow(), _) and
55+
e.mayHaveStringValue(pattern)
56+
)
57+
) and
58+
isIncompleteHostNameRegExpPattern(pattern, hostPart)
59+
and
60+
// ignore patterns with capture groups after the TLD
61+
not pattern.regexpMatch("(?i).*[.](" + RegExpPatterns::commonTLD() + ").*[(][?]:.*[)].*")
62+
63+
64+
select e, "This regular expression has an unescaped '.' before '" + hostPart + "', so it might match more hosts than expected."

javascript/ql/src/Security/CWE-020/IncompleteUrlSubstringSanitization.ql

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@ where
2121
substring.mayHaveStringValue(target) and
2222
(
2323
// target contains a domain on a common TLD, and perhaps some other URL components
24-
target.regexpMatch("(?i)([a-z]*:?//)?\\.?([a-z0-9-]+\\.)+(com|org|edu|gov|uk|net)(:[0-9]+)?/?") or
24+
target.regexpMatch("(?i)([a-z]*:?//)?\\.?([a-z0-9-]+\\.)+(" + RegExpPatterns::commonTLD() + ")(:[0-9]+)?/?") or
2525
// target is a HTTP URL to a domain on any TLD
2626
target.regexpMatch("(?i)https?://([a-z0-9-]+\\.)+([a-z]+)(:[0-9]+)?/?")
2727
) and
Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
app.get('/some/path', function(req, res) {
2+
let url = req.param('url'),
3+
host = urlLib.parse(url).host;
4+
// BAD: the host of `url` may be controlled by an attacker
5+
let regex = /(www|beta|).example.com/;
6+
if (host.match(regex)) {
7+
res.redirect(url);
8+
}
9+
});

javascript/ql/src/semmle/javascript/Regexp.qll

Lines changed: 41 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,12 @@
11
/**
2-
* Provides classes for working with regular expression literals.
2+
* Provides classes for working with regular expressions.
33
*
4-
* Regular expressions are represented as an abstract syntax tree of regular expression
4+
* Regular expression literals are represented as an abstract syntax tree of regular expression
55
* terms.
66
*/
77

88
import javascript
9+
private import semmle.javascript.dataflow.InferredTypes
910

1011
/**
1112
* An element containing a regular expression term, that is, either
@@ -484,3 +485,41 @@ class RegExpParseError extends Error, @regexp_parse_error {
484485
result = getMessage()
485486
}
486487
}
488+
489+
/**
490+
* Holds if `source` may be interpreted as a regular expression.
491+
*/
492+
predicate isInterpretedAsRegExp(DataFlow::Node source) {
493+
// The first argument to an invocation of `RegExp` (with or without `new`).
494+
source = DataFlow::globalVarRef("RegExp").getAnInvocation().getArgument(0)
495+
or
496+
// The argument of a call that coerces the argument to a regular expression.
497+
exists(MethodCallExpr mce, string methodName |
498+
mce.getReceiver().analyze().getAType() = TTString() and
499+
mce.getMethodName() = methodName
500+
|
501+
(methodName = "match" and source.asExpr() = mce.getArgument(0) and mce.getNumArgument() = 1)
502+
or
503+
(
504+
methodName = "search" and
505+
source.asExpr() = mce.getArgument(0) and
506+
mce.getNumArgument() = 1 and
507+
// "search" is a common method name, and so we exclude chained accesses
508+
// because `String.prototype.search` returns a number
509+
not exists(PropAccess p | p.getBase() = mce)
510+
)
511+
)
512+
}
513+
514+
/**
515+
* Provides regular expression patterns.
516+
*/
517+
module RegExpPatterns {
518+
/**
519+
* Gets a pattern that matches common top-level domain names.
520+
*/
521+
string commonTLD() {
522+
// according to ranking by http://google.com/search?q=site:.<<TLD>>
523+
result = "com|org|edu|gov|uk|net|io"
524+
}
525+
}

javascript/ql/src/semmle/javascript/security/dataflow/RegExpInjection.qll

Lines changed: 4 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,6 @@
44
*/
55

66
import javascript
7-
private import semmle.javascript.dataflow.InferredTypes
87

98
module RegExpInjection {
109
/**
@@ -51,36 +50,14 @@ module RegExpInjection {
5150
}
5251

5352
/**
54-
* The first argument to an invocation of `RegExp` (with or without `new`).
53+
* The source string of a regular expression.
5554
*/
56-
class RegExpObjectCreationSink extends Sink, DataFlow::ValueNode {
57-
RegExpObjectCreationSink() {
58-
this = DataFlow::globalVarRef("RegExp").getAnInvocation().getArgument(0)
55+
class RegularExpressionSourceAsSink extends Sink {
56+
RegularExpressionSourceAsSink() {
57+
isInterpretedAsRegExp(this)
5958
}
6059
}
6160

62-
/**
63-
* The argument of a call that coerces the argument to a regular expression.
64-
*/
65-
class RegExpObjectCoercionSink extends Sink {
66-
67-
RegExpObjectCoercionSink() {
68-
exists (MethodCallExpr mce, string methodName |
69-
mce.getReceiver().analyze().getAType() = TTString() and
70-
mce.getMethodName() = methodName |
71-
(methodName = "match" and this.asExpr() = mce.getArgument(0) and mce.getNumArgument() = 1) or
72-
(
73-
methodName = "search" and
74-
this.asExpr() = mce.getArgument(0) and
75-
mce.getNumArgument() = 1 and
76-
// `String.prototype.search` returns a number, so exclude chained accesses
77-
not exists(PropAccess p | p.getBase() = mce)
78-
)
79-
)
80-
}
81-
82-
}
83-
8461
/**
8562
* A call to a function whose name suggests that it escapes regular
8663
* expression meta-characters.
Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
| tst-IncompleteHostnameRegExp.js:3:2:3:28 | /http:\\ ... le.com/ | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
2+
| tst-IncompleteHostnameRegExp.js:5:2:5:28 | /http:\\ ... le.net/ | This regular expression has an unescaped '.' before 'example.net', so it might match more hosts than expected. |
3+
| tst-IncompleteHostnameRegExp.js:6:2:6:42 | /http:\\ ... b).com/ | This regular expression has an unescaped '.' before '(example-a\|example-b).com', so it might match more hosts than expected. |
4+
| tst-IncompleteHostnameRegExp.js:11:13:11:37 | "http:/ ... le.com" | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
5+
| tst-IncompleteHostnameRegExp.js:12:10:12:34 | "http:/ ... le.com" | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
6+
| tst-IncompleteHostnameRegExp.js:15:22:15:46 | "http:/ ... le.com" | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
7+
| tst-IncompleteHostnameRegExp.js:17:13:17:31 | `test.example.com$` | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
8+
| tst-IncompleteHostnameRegExp.js:17:14:17:30 | test.example.com$ | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
9+
| tst-IncompleteHostnameRegExp.js:19:17:19:34 | 'test.example.com' | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
10+
| tst-IncompleteHostnameRegExp.js:22:27:22:44 | 'test.example.com' | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
11+
| tst-IncompleteHostnameRegExp.js:28:22:28:39 | 'test.example.com' | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
12+
| tst-IncompleteHostnameRegExp.js:37:2:37:54 | /^(http ... =$\|\\/)/ | This regular expression has an unescaped '.' before ')?example.com', so it might match more hosts than expected. |
13+
| tst-IncompleteHostnameRegExp.js:38:2:38:44 | /^(http ... p\\/f\\// | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
14+
| tst-IncompleteHostnameRegExp.js:39:2:39:34 | /\\(http ... m\\/\\)/g | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
15+
| tst-IncompleteHostnameRegExp.js:40:2:40:29 | /https? ... le.com/ | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
16+
| tst-IncompleteHostnameRegExp.js:41:13:41:68 | '^http: ... e\\.com' | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
17+
| tst-IncompleteHostnameRegExp.js:41:41:41:68 | '^https ... e\\.com' | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
18+
| tst-IncompleteHostnameRegExp.js:42:13:42:61 | 'http[s ... \\/(.+)' | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
19+
| tst-IncompleteHostnameRegExp.js:43:2:43:33 | /^https ... e.com$/ | This regular expression has an unescaped '.' before 'example.com', so it might match more hosts than expected. |
20+
| tst-IncompleteHostnameRegExp.js:44:9:44:100 | 'protos ... ernal)' | This regular expression has an unescaped '.' before 'example-b.com', so it might match more hosts than expected. |
21+
| tst-IncompleteHostnameRegExp.js:46:2:46:26 | /exampl ... le.com/ | This regular expression has an unescaped '.' before 'dev\|example.com', so it might match more hosts than expected. |

0 commit comments

Comments
 (0)