Skip to content

Commit f85e30a

Browse files
authored
Merge pull request #571 from xiemaisi/js/numeric-constant-interpreted-as-code
JavaScript: Add new query `HardcodedDataInterpretedAsCode`.
2 parents 1956cd8 + 73ce0f1 commit f85e30a

File tree

20 files changed

+312
-21
lines changed

20 files changed

+312
-21
lines changed

change-notes/1.19/analysis-javascript.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
## General improvements
44

5-
* Modelling of taint flow through array operations has been improved. This may give additional results for the security queries.
5+
* Modeling of taint flow through array and buffer operations has been improved. This may give additional results for the security queries.
66

77
* Support for AMD modules has been improved. This may give additional results for the security queries as well as any queries that use type inference on code bases that use such modules.
88

@@ -23,6 +23,7 @@
2323
|-----------------------------------------------|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
2424
| Enabling Node.js integration for Electron web content renderers (`js/enabling-electron-renderer-node-integration`) | security, frameworks/electron, external/cwe/cwe-094 | Highlights Electron web content renderer preferences with Node.js integration enabled, indicating a violation of [CWE-94](https://cwe.mitre.org/data/definitions/94.html). Results are not shown on LGTM by default. |
2525
| File data in outbound network request | security, external/cwe/cwe-200 | Highlights locations where file data is sent in a network request. Results are not shown on LGTM by default. |
26+
| Hard-coded data interpreted as code | security, external/cwe/cwe-506 | Highlights locations where hard-coded data is transformed and then executed as code or interpreted as an import path, which may indicate embedded malicious code ([CWE-506](https://cwe.mitre.org/data/definitions/506.html)). Results are shown on LGTM by default. |
2627
| Host header poisoning in email generation | security, external/cwe/cwe-640 | Highlights code that generates emails with links that can be hijacked by HTTP host header poisoning, indicating a violation of [CWE-640](https://cwe.mitre.org/data/definitions/640.html). Results shown on LGTM by default. |
2728
| Replacement of a substring with itself (`js/identity-replacement`) | correctness, security, external/cwe/cwe-116 | Highlights string replacements that replace a string with itself, which usually indicates a mistake. Results shown on LGTM by default. |
2829
| Stored cross-site scripting (`js/stored-xss`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights uncontrolled stored values flowing into HTML content, indicating a violation of [CWE-079](https://cwe.mitre.org/data/definitions/79.html). Results shown on LGTM by default. |

javascript/config/suites/javascript/security

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@
2020
+ semmlecode-javascript-queries/Security/CWE-352/MissingCsrfMiddleware.ql: /Security/CWE/CWE-352
2121
+ semmlecode-javascript-queries/Security/CWE-400/RemotePropertyInjection.ql: /Security/CWE/CWE-400
2222
+ semmlecode-javascript-queries/Security/CWE-502/UnsafeDeserialization.ql: /Security/CWE/CWE-502
23+
+ semmlecode-javascript-queries/Security/CWE-506/HardcodedDataInterpretedAsCode.ql: /Security/CWE/CWE-506
2324
+ semmlecode-javascript-queries/Security/CWE-601/ClientSideUrlRedirect.ql: /Security/CWE/CWE-601
2425
+ semmlecode-javascript-queries/Security/CWE-601/ServerSideUrlRedirect.ql: /Security/CWE/CWE-601
2526
+ semmlecode-javascript-queries/Security/CWE-611/Xxe.ql: /Security/CWE/CWE-611
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
2+
<qhelp>
3+
4+
<overview>
5+
<p>
6+
Interpreting hard-coded data, such as string literals containing hexadecimal numbers,
7+
as code or as an import path is typical of malicious backdoor code that has been
8+
implanted into an otherwise trusted code base and is trying to hide its true purpose
9+
from casual readers or automated scanning tools.
10+
</p>
11+
</overview>
12+
13+
<recommendation>
14+
<p>
15+
Examine the code in question carefully to ascertain its provenance and its true purpose.
16+
If the code is benign, it should always be possible to rewrite it without relying
17+
on dynamically interpreting data as code, improving both clarity and safety.
18+
</p>
19+
</recommendation>
20+
21+
<example>
22+
<p>
23+
As an example of malicious code using this obfuscation technique, consider the following
24+
simplified version of a snippet of backdoor code that was discovered in a dependency of
25+
the popular <code>event-stream</code> npm package:
26+
</p>
27+
<sample src="examples/HardcodedDataInterpretedAsCode.js"/>
28+
<p>
29+
While this shows only the first few lines of code, it already looks very suspicious
30+
since it takes a hard-coded string literal, hex-decodes it and then uses it as an
31+
import path. The only reason to do so is to hide the name of the file being imported.
32+
</p>
33+
</example>
34+
35+
<references>
36+
<li>
37+
OWASP:
38+
<a href="https://www.owasp.org/index.php/Trojan_Horse">Trojan Horse</a>.
39+
</li>
40+
<li>
41+
The npm Blog:
42+
<a href="https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident">Details about the event-stream incident</a>.
43+
</li>
44+
</references>
45+
46+
</qhelp>
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
/**
2+
* @name Hard-coded data interpreted as code
3+
* @description Transforming hard-coded data (such as hexadecimal constants) into code
4+
* to be executed is a technique often associated with backdoors and should
5+
* be avoided.
6+
* @kind path-problem
7+
* @problem.severity error
8+
* @precision medium
9+
* @id js/hardcoded-data-interpreted-as-code
10+
* @tags security
11+
* external/cwe/cwe-506
12+
*/
13+
14+
import javascript
15+
import semmle.javascript.security.dataflow.HardcodedDataInterpretedAsCode::HardcodedDataInterpretedAsCode
16+
import DataFlow::PathGraph
17+
18+
from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
19+
where cfg.hasFlowPath(source, sink)
20+
select sink.getNode(), source, sink,
21+
"Hard-coded data from $@ is interpreted as " + sink.getNode().(Sink).getKind() + ".",
22+
source.getNode(), "here"
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
var r = require;
2+
3+
function e(r) {
4+
return Buffer.from(r, "hex").toString()
5+
}
6+
7+
// BAD: hexadecimal constant decoded and interpreted as import path
8+
var n = r(e("2e2f746573742f64617461"));

javascript/ql/src/semmle/javascript/NodeJS.qll

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -132,7 +132,13 @@ predicate findNodeModulesFolder(Folder f, Folder nodeModules, int distance) {
132132
*/
133133
private class RequireVariable extends Variable {
134134
RequireVariable() {
135-
exists (ModuleScope m | this = m.getVariable("require"))
135+
this = any(ModuleScope m).getVariable("require")
136+
or
137+
// cover cases where we failed to detect Node.js code
138+
this.(GlobalVariable).getName() = "require"
139+
or
140+
// track through assignments to other variables
141+
this.getAnAssignedExpr().(VarAccess).getVariable() instanceof RequireVariable
136142
}
137143
}
138144

@@ -149,7 +155,9 @@ private predicate moduleInFile(Module m, File f) {
149155
class Require extends CallExpr, Import {
150156
Require() {
151157
exists (RequireVariable req |
152-
this.getCallee() = req.getAnAccess()
158+
this.getCallee() = req.getAnAccess() and
159+
// `mjs` files explicitly disallow `require`
160+
getFile().getExtension() != "mjs"
153161
)
154162
}
155163

javascript/ql/src/semmle/javascript/frameworks/NodeJSLib.qll

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -297,7 +297,22 @@ module NodeJSLib {
297297
override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
298298
pred = tainted and succ = this
299299
}
300+
}
301+
302+
/**
303+
* A model of taint propagation through `new Buffer` and `Buffer.from`.
304+
*/
305+
private class BufferTaintStep extends TaintTracking::AdditionalTaintStep, DataFlow::InvokeNode {
306+
BufferTaintStep() {
307+
this = DataFlow::globalVarRef("Buffer").getAnInstantiation()
308+
or
309+
this = DataFlow::globalVarRef("Buffer").getAMemberInvocation("from")
310+
}
300311

312+
override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
313+
pred = getArgument(0) and
314+
succ = this
315+
}
301316
}
302317

303318
/**
Lines changed: 97 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,97 @@
1+
/**
2+
* Provides a taint-tracking configuration for reasoning about hard-coded data
3+
* being interpreted as code.
4+
*/
5+
6+
import javascript
7+
private import semmle.javascript.security.dataflow.CodeInjection
8+
9+
module HardcodedDataInterpretedAsCode {
10+
/**
11+
* A data flow source for hard-coded data.
12+
*/
13+
abstract class Source extends DataFlow::Node {
14+
/** Gets a flow label for which this is a source. */
15+
DataFlow::FlowLabel getLabel() {
16+
result = DataFlow::FlowLabel::data()
17+
}
18+
}
19+
20+
/**
21+
* A data flow sink for code injection.
22+
*/
23+
abstract class Sink extends DataFlow::Node {
24+
/** Gets a flow label for which this is a sink. */
25+
abstract DataFlow::FlowLabel getLabel();
26+
27+
/** Gets a description of what kind of sink this is. */
28+
abstract string getKind();
29+
}
30+
31+
/**
32+
* A sanitizer for hard-coded data.
33+
*/
34+
abstract class Sanitizer extends DataFlow::Node {}
35+
36+
/**
37+
* A taint-tracking configuration for reasoning about hard-coded data
38+
* being interpreted as code
39+
*/
40+
class Configuration extends TaintTracking::Configuration {
41+
Configuration() {
42+
this = "HardcodedDataInterpretedAsCode"
43+
}
44+
45+
override predicate isSource(DataFlow::Node source, DataFlow::FlowLabel lbl) {
46+
source.(Source).getLabel() = lbl
47+
}
48+
49+
override predicate isSink(DataFlow::Node nd, DataFlow::FlowLabel lbl) {
50+
nd.(Sink).getLabel() = lbl
51+
}
52+
53+
override predicate isSanitizer(DataFlow::Node node) {
54+
node instanceof Sanitizer
55+
}
56+
}
57+
58+
/**
59+
* A constant string consisting of eight or more hexadecimal characters (including at
60+
* least one digit), viewed as a source of hard-coded data that should not be
61+
* interpreted as code.
62+
*/
63+
private class DefaultSource extends Source, DataFlow::ValueNode {
64+
DefaultSource() {
65+
exists (string val | val = astNode.(Expr).getStringValue() |
66+
val.regexpMatch("[0-9a-fA-F]{8,}") and
67+
val.regexpMatch(".*[0-9].*")
68+
)
69+
}
70+
}
71+
72+
/**
73+
* A code injection sink; hard-coded data should not flow here.
74+
*/
75+
private class DefaultCodeInjectionSink extends Sink {
76+
DefaultCodeInjectionSink() { this instanceof CodeInjection::Sink }
77+
override DataFlow::FlowLabel getLabel() { result = DataFlow::FlowLabel::taint() }
78+
override string getKind() { result = "code" }
79+
}
80+
81+
/**
82+
* An argument to `require` path; hard-coded data should not flow here.
83+
*/
84+
private class RequireArgumentSink extends Sink {
85+
RequireArgumentSink() {
86+
this = any(Require r).getAnArgument().flow()
87+
}
88+
89+
override DataFlow::FlowLabel getLabel() {
90+
result = DataFlow::FlowLabel::data()
91+
or
92+
result = DataFlow::FlowLabel::taint()
93+
}
94+
95+
override string getKind() { result = "an import path" }
96+
}
97+
}
Lines changed: 18 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,18 @@
1-
| a.js:1:9:1:22 | require('./b') | ./b | b.js:1:1:8:0 | <toplevel> |
2-
| a.js:3:6:3:23 | require('./sub/c') | ./sub/c | sub/c.js:1:1:4:0 | <toplevel> |
3-
| a.js:4:6:4:29 | require ... /d.js') | ./sub/../d.js | d.js:1:1:7:15 | <toplevel> |
4-
| a.js:7:1:7:18 | require('./sub/c') | ./sub/c | sub/c.js:1:1:4:0 | <toplevel> |
5-
| a.js:10:1:10:18 | require(__dirname) | | index.js:1:1:3:0 | <toplevel> |
6-
| a.js:11:1:11:25 | require ... + '/e') | /e | e.js:1:1:7:0 | <toplevel> |
7-
| a.js:12:1:12:28 | require ... + 'c') | ./sub/c | sub/c.js:1:1:4:0 | <toplevel> |
8-
| b.js:1:1:1:18 | require('./sub/c') | ./sub/c | sub/c.js:1:1:4:0 | <toplevel> |
9-
| d.js:7:1:7:14 | require('foo') | foo | sub/f.js:1:1:4:17 | <toplevel> |
10-
| index.js:2:1:2:41 | require ... b.js")) | /index.js/../b.js | b.js:1:1:8:0 | <toplevel> |
11-
| mjs-files/require-from-js.js:1:12:1:36 | require ... on-me') | ./depend-on-me | mjs-files/depend-on-me.mjs:1:1:7:1 | <toplevel> |
12-
| mjs-files/require-from-js.js:2:12:2:39 | require ... me.js') | ./depend-on-me.js | mjs-files/depend-on-me.js:1:1:8:0 | <toplevel> |
13-
| mjs-files/require-from-js.js:3:12:3:40 | require ... e.mjs') | ./depend-on-me.mjs | mjs-files/depend-on-me.mjs:1:1:7:1 | <toplevel> |
14-
| sub/c.js:1:1:1:15 | require('../a') | ../a | a.js:1:1:14:0 | <toplevel> |
1+
| a.js:1:9:1:22 | require('./b') |
2+
| a.js:2:7:2:19 | require('fs') |
3+
| a.js:3:6:3:23 | require('./sub/c') |
4+
| a.js:4:6:4:29 | require ... /d.js') |
5+
| a.js:7:1:7:18 | require('./sub/c') |
6+
| a.js:10:1:10:18 | require(__dirname) |
7+
| a.js:11:1:11:25 | require ... + '/e') |
8+
| a.js:12:1:12:28 | require ... + 'c') |
9+
| b.js:1:1:1:18 | require('./sub/c') |
10+
| d.js:1:1:1:38 | require ... s/ini') |
11+
| d.js:7:1:7:14 | require('foo') |
12+
| f.js:2:1:2:7 | r("fs") |
13+
| index.js:1:12:1:26 | require('path') |
14+
| index.js:2:1:2:41 | require ... b.js")) |
15+
| mjs-files/require-from-js.js:1:12:1:36 | require ... on-me') |
16+
| mjs-files/require-from-js.js:2:12:2:39 | require ... me.js') |
17+
| mjs-files/require-from-js.js:3:12:3:40 | require ... e.mjs') |
18+
| sub/c.js:1:1:1:15 | require('../a') |
Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,4 @@
11
import semmle.javascript.NodeJS
22

3-
from Require r, string fullpath, string prefix
4-
where fullpath = r.getImportedPath().getValue() and
5-
sourceLocationPrefix(prefix)
6-
select r, fullpath.replaceAll(prefix, ""), r.getImportedModule()
3+
from Require r
4+
select r

0 commit comments

Comments
 (0)