Skip to content

Commit 9e73ed7

Browse files
authored
Merge pull request #623 from esben-semmle/js/incomplete-url-sanitization
Approved by mc-semmle
2 parents c2116f0 + 4f53411 commit 9e73ed7

10 files changed

+244
-0
lines changed

change-notes/1.20/analysis-javascript.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
| **Query** | **Tags** | **Purpose** |
1212
|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
1313
| Double escaping or unescaping (`js/double-escaping`) | correctness, security, external/cwe/cwe-116 | Highlights potential double escaping or unescaping of special characters, indicating a possible violation of [CWE-116](https://cwe.mitre.org/data/definitions/116.html). Results are shown on LGTM by default. |
14+
| Incomplete URL substring sanitization | correctness, security, external/cwe/cwe-020 | Highlights URL sanitizers that are likely to be incomplete, indicating a violation of [CWE-020](https://cwe.mitre.org/data/definitions/20.html). Results shown on LGTM by default. |
1415
| Incorrect suffix check (`js/incorrect-suffix-check`) | correctness, security, external/cwe/cwe-020 | Highlights error-prone suffix checks based on `indexOf`, indicating a potential violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
1516
| Useless comparison test (`js/useless-comparison-test`) | correctness | Highlights code that is unreachable due to a numeric comparison that is always true or always false. Results are shown on LGTM by default. |
1617

javascript/config/suites/javascript/security

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
+ semmlecode-javascript-queries/DOM/TargetBlank.ql: /Security/CWE/CWE-200
22
+ semmlecode-javascript-queries/Electron/EnablingNodeIntegration.ql: /Security/CWE/CWE-094
3+
+ semmlecode-javascript-queries/Security/CWE-020/IncompleteUrlSubstringSanitization.ql: /Security/CWE/CWE-020
34
+ semmlecode-javascript-queries/Security/CWE-020/IncorrectSuffixCheck.ql: /Security/CWE/CWE-020
45
+ semmlecode-javascript-queries/Security/CWE-022/TaintedPath.ql: /Security/CWE/CWE-022
56
+ semmlecode-javascript-queries/Security/CWE-078/CommandInjection.ql: /Security/CWE/CWE-078
Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>
8+
9+
Sanitizing untrusted URLs is an important technique for
10+
preventing attacks such as request forgeries and malicious
11+
redirections. Usually, this is done by checking that the host of a URL
12+
is in a set of allowed hosts.
13+
14+
</p>
15+
16+
<p>
17+
18+
However, it is notoriously error-prone to treat the URL as
19+
a string and check if one of the allowed hosts is a substring of the
20+
URL. Malicious URLs can bypass such security checks by embedding one
21+
of the allowed hosts in an unexpected location.
22+
23+
</p>
24+
25+
<p>
26+
27+
Even if the substring check is not used in a
28+
security-critical context, the incomplete check may still cause
29+
undesirable behaviors when the check succeeds accidentally.
30+
31+
</p>
32+
</overview>
33+
34+
<recommendation>
35+
<p>
36+
37+
Parse a URL before performing a check on its host value,
38+
and ensure that the check handles arbitrary subdomain sequences
39+
correctly.
40+
41+
</p>
42+
</recommendation>
43+
44+
<example>
45+
46+
<p>
47+
48+
The following example code checks that a URL redirection
49+
will reach the <code>example.com</code> domain, or one of its
50+
subdomains, and not some malicious site.
51+
52+
</p>
53+
54+
<sample src="examples/IncompleteUrlSubstringSanitization_BAD1.js"/>
55+
56+
<p>
57+
58+
The substring check is, however, easy to bypass. For example
59+
by embedding <code>example.com</code> in the path component:
60+
<code>http://evil-example.net/example.com</code>, or in the query
61+
string component: <code>http://evil-example.net/?x=example.com</code>.
62+
63+
Address these shortcomings by checking the host of the parsed URL instead:
64+
65+
</p>
66+
67+
<sample src="examples/IncompleteUrlSubstringSanitization_BAD2.js"/>
68+
69+
<p>
70+
71+
This is still not a sufficient check as the
72+
following URLs bypass it: <code>http://evil-example.com</code>
73+
<code>http://example.com.evil-example.net</code>.
74+
75+
Instead, use an explicit whitelist of allowed hosts to
76+
make the redirect secure:
77+
78+
</p>
79+
80+
<sample src="examples/IncompleteUrlSubstringSanitization_GOOD.js"/>
81+
82+
</example>
83+
84+
<references>
85+
<li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li>
86+
<li>OWASP: <a href="https://www.owasp.org/index.php/Unvalidated_Redirects_and_Forwards_Cheat_Sheet">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li>
87+
</references>
88+
</qhelp>
Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
/**
2+
* @name Incomplete URL substring sanitization
3+
* @description Security checks on the substrings of an unparsed URL are often vulnerable to bypassing.
4+
* @kind problem
5+
* @problem.severity warning
6+
* @precision high
7+
* @id js/incomplete-url-substring-sanitization
8+
* @tags correctness
9+
* security
10+
* external/cwe/cwe-20
11+
*/
12+
13+
import javascript
14+
private import semmle.javascript.dataflow.InferredTypes
15+
16+
from DataFlow::MethodCallNode call, string name, DataFlow::Node substring, string target
17+
where
18+
(name = "indexOf" or name = "includes" or name = "startsWith" or name = "endsWith") and
19+
call.getMethodName() = name and
20+
substring = call.getArgument(0) and
21+
substring.mayHaveStringValue(target) and
22+
(
23+
// target contains a domain on a common TLD, and perhaps some other URL components
24+
target.regexpMatch("(?i)([a-z]*:?//)?\\.?([a-z0-9-]+\\.)+(com|org|edu|gov|uk|net)(:[0-9]+)?/?") or
25+
// target is a HTTP URL to a domain on any TLD
26+
target.regexpMatch("(?i)https?://([a-z0-9-]+\\.)+([a-z]+)(:[0-9]+)?/?")
27+
) and
28+
// whitelist
29+
not (
30+
name = "indexOf" and
31+
(
32+
// arithmetic on the indexOf-result
33+
any(ArithmeticExpr e).getAnOperand().getUnderlyingValue() = call.asExpr()
34+
or
35+
// non-trivial position check on the indexOf-result
36+
exists(EqualityTest test, Expr n |
37+
test.hasOperands(call.asExpr(), n) |
38+
not n.getIntValue() = [-1..0]
39+
)
40+
)
41+
or
42+
// the leading dot in a subdomain sequence makes the suffix-check safe (if it is performed on the host of the url)
43+
name = "endsWith" and
44+
target.regexpMatch("(?i)\\.([a-z0-9-]+)(\\.[a-z0-9-]+)+")
45+
or
46+
// the trailing slash makes the prefix-check safe
47+
(
48+
name = "startsWith"
49+
or
50+
name = "indexOf" and
51+
exists(EqualityTest test, Expr n |
52+
test.hasOperands(call.asExpr(), n) and
53+
n.getIntValue() = 0
54+
)
55+
) and
56+
target.regexpMatch(".*/")
57+
)
58+
select call, "'$@' may be at an arbitrary position in the sanitized URL.", substring, target
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
app.get('/some/path', function(req, res) {
2+
let url = req.param("url");
3+
// BAD: the host of `url` may be controlled by an attacker
4+
if (url.includes("example.com")) {
5+
res.redirect(url);
6+
}
7+
});
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
app.get('/some/path', function(req, res) {
2+
let url = req.param("url"),
3+
host = urlLib.parse(url).host;
4+
// BAD: the host of `url` may be controlled by an attacker
5+
if (host.includes("example.com")) {
6+
res.redirect(url);
7+
}
8+
});
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
app.get('/some/path', function(req, res) {
2+
let url = req.param('url'),
3+
host = urlLib.parse(url).host;
4+
// GOOD: the host of `url` can not be controlled by an attacker
5+
let allowedHosts = [
6+
'example.com',
7+
'beta.example.com',
8+
'www.example.com'
9+
];
10+
if (allowedHosts.includes(host)) {
11+
res.redirect(url);
12+
}
13+
});
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
| tst-IncompleteUrlSubstringSanitization.js:4:5:4:27 | x.index ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:4:15:4:26 | "secure.com" | secure.com |
2+
| tst-IncompleteUrlSubstringSanitization.js:5:5:5:27 | x.index ... e.net") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:5:15:5:26 | "secure.net" | secure.net |
3+
| tst-IncompleteUrlSubstringSanitization.js:6:5:6:28 | x.index ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:6:15:6:27 | ".secure.com" | .secure.com |
4+
| tst-IncompleteUrlSubstringSanitization.js:10:5:10:27 | x.index ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:10:15:10:26 | "secure.com" | secure.com |
5+
| tst-IncompleteUrlSubstringSanitization.js:11:5:11:27 | x.index ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:11:15:11:26 | "secure.com" | secure.com |
6+
| tst-IncompleteUrlSubstringSanitization.js:12:5:12:27 | x.index ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:12:15:12:26 | "secure.com" | secure.com |
7+
| tst-IncompleteUrlSubstringSanitization.js:14:5:14:38 | x.start ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:14:18:14:37 | "https://secure.com" | https://secure.com |
8+
| tst-IncompleteUrlSubstringSanitization.js:15:5:15:28 | x.endsW ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:15:16:15:27 | "secure.com" | secure.com |
9+
| tst-IncompleteUrlSubstringSanitization.js:20:5:20:28 | x.inclu ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:20:16:20:27 | "secure.com" | secure.com |
10+
| tst-IncompleteUrlSubstringSanitization.js:32:5:32:35 | x.index ... e.com") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:32:15:32:34 | "https://secure.com" | https://secure.com |
11+
| tst-IncompleteUrlSubstringSanitization.js:33:5:33:39 | x.index ... m:443") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:33:15:33:38 | "https: ... om:443" | https://secure.com:443 |
12+
| tst-IncompleteUrlSubstringSanitization.js:34:5:34:36 | x.index ... .com/") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:34:15:34:35 | "https: ... e.com/" | https://secure.com/ |
13+
| tst-IncompleteUrlSubstringSanitization.js:52:5:52:41 | x.index ... ernal") | '$@' may be at an arbitrary position in the sanitized URL. | tst-IncompleteUrlSubstringSanitization.js:52:15:52:40 | "https: ... ternal" | https://example.internal |
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Security/CWE-020/IncompleteUrlSubstringSanitization.ql
Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
(function(x){
2+
x.indexOf("internal"); // NOT OK, but not flagged
3+
x.indexOf("localhost"); // NOT OK, but not flagged
4+
x.indexOf("secure.com"); // NOT OK
5+
x.indexOf("secure.net"); // NOT OK
6+
x.indexOf(".secure.com"); // NOT OK
7+
x.indexOf("sub.secure."); // NOT OK, but not flagged
8+
x.indexOf(".sub.secure."); // NOT OK, but not flagged
9+
10+
x.indexOf("secure.com") === -1; // NOT OK
11+
x.indexOf("secure.com") === 0; // NOT OK
12+
x.indexOf("secure.com") >= 0; // NOT OK
13+
14+
x.startsWith("https://secure.com"); // NOT OK
15+
x.endsWith("secure.com"); // NOT OK
16+
x.endsWith(".secure.com"); // OK
17+
x.startsWith("secure.com/"); // OK
18+
x.indexOf("secure.com/") === 0; // OK
19+
20+
x.includes("secure.com"); // NOT OK
21+
22+
x.indexOf("#"); // OK
23+
x.indexOf(":"); // OK
24+
x.indexOf(":/"); // OK
25+
x.indexOf("://"); // OK
26+
x.indexOf("//"); // OK
27+
x.indexOf(":443"); // OK
28+
x.indexOf("/some/path/"); // OK
29+
x.indexOf("some/path"); // OK
30+
x.indexOf("/index.html"); // OK
31+
x.indexOf(":template:"); // OK
32+
x.indexOf("https://secure.com"); // NOT OK
33+
x.indexOf("https://secure.com:443"); // NOT OK
34+
x.indexOf("https://secure.com/"); // NOT OK
35+
36+
x.indexOf(".cn"); // NOT OK, but not flagged
37+
x.indexOf(".jpg"); // OK
38+
x.indexOf("index.html"); // OK
39+
x.indexOf("index.js"); // OK
40+
x.indexOf("index.php"); // OK
41+
x.indexOf("index.css"); // OK
42+
43+
x.indexOf("secure=true"); // OK (query param)
44+
x.indexOf("&auth="); // OK (query param)
45+
46+
x.indexOf(getCurrentDomain()); // NOT OK, but not flagged
47+
x.indexOf(location.origin); // NOT OK, but not flagged
48+
49+
x.indexOf("tar.gz") + offset // OK
50+
x.indexOf("tar.gz") - offset // OK
51+
52+
x.indexOf("https://example.internal"); // NOT OK
53+
x.indexOf("https://"); // OK
54+
});

0 commit comments

Comments
 (0)