|
| 1 | +# Improvements to Python analysis |
| 2 | + |
| 3 | + |
| 4 | +## General improvements |
| 5 | + |
| 6 | +> Changes that affect alerts in many files or from many queries |
| 7 | +> For example, changes to file classification |
| 8 | + |
| 9 | +### Representation of the control flow graph |
| 10 | + |
| 11 | +The representation of the control flow graph (CFG) has been modified to better reflect the semantics of Python. |
| 12 | + |
| 13 | +The following statement types no longer have a CFG node for the statement itself, as their sub-expressions already contain all the |
| 14 | +semantically significant information: |
| 15 | + |
| 16 | +* `ExprStmt` |
| 17 | +* `If` |
| 18 | +* `Assign` |
| 19 | +* `Import` |
| 20 | + |
| 21 | +For example, the CFG for an `if cond: ... else: ...` statement now starts with the CFG node for `cond`. |
| 22 | + |
| 23 | +For the following statement types, the CFG node for the statement now follows the CFG nodes of its sub-expressions to better reflect the semantics: |
| 24 | + |
| 25 | +* `Print` |
| 26 | +* `TemplateWrite` |
| 27 | +* `ImportStar` |
| 28 | + |
| 29 | +For example, the CFG for `print foo` (in Python 2) has changed from `print -> foo` to `foo -> print`, better reflecting the runtime behavior. |
| 30 | + |
| 31 | + |
| 32 | +The CFG for the `with` statement has been re-ordered to more closely reflect the semantics. |
| 33 | +For the `with` statement: |
| 34 | +```python |
| 35 | +with cm as var: |
| 36 | + body |
| 37 | +``` |
| 38 | +The order of the CFG changes from: |
| 39 | + |
| 40 | + <with> |
| 41 | + cm |
| 42 | + var |
| 43 | + body |
| 44 | + |
| 45 | +to: |
| 46 | + |
| 47 | + cm |
| 48 | + <with> |
| 49 | + var |
| 50 | + body |
| 51 | + |
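The new ordering for `with` matches what happens at runtime. The following is a minimal sketch, not part of the analysis itself: `LoggingManager`, `make_cm` and `events` are hypothetical names introduced only to record the order in which Python evaluates each part of the statement.

```python
# Records the order in which the parts of a `with` statement run.
events = []


class LoggingManager:
    """Hypothetical context manager that logs when it is entered and exited."""

    def __enter__(self):
        events.append("<with>: __enter__ called")
        return "resource"

    def __exit__(self, exc_type, exc, tb):
        events.append("<with>: __exit__ called")
        return False  # do not suppress exceptions


def make_cm():
    # Stands in for the `cm` expression; it is evaluated before the
    # context is entered.
    events.append("cm: context expression evaluated")
    return LoggingManager()


with make_cm() as var:
    # By the time the body runs, `var` is already bound to the value
    # returned by __enter__.
    events.append("body: runs with var = %r" % var)

for event in events:
    print(event)
```

The context expression is evaluated first, the context is then entered, `var` is bound, and only then does the body run, which is the order `cm`, `<with>`, `var`, `body` described above.
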
| 52 | +A new predicate `Stmt.getAnEntryNode()` has been added to make it easier to write reachability queries involving statements. |
| 53 | + |
| 54 | + |
| 55 | +## New queries |
| 56 | + |
| 57 | +| **Query** | **Tags** | **Purpose** | |
| 58 | +|-----------------------------|-----------|--------------------------------------------------------------------| |
| 59 | +| Information exposure through an exception (`py/stack-trace-exposure`) | security, external/cwe/cwe-209, external/cwe/cwe-497 | Finds instances where information about an exception may be leaked to an external user. Enabled on LGTM by default. | |
| 60 | + |
| 61 | +## Changes to existing queries |
| 62 | + |
| 63 | +All taint-tracking queries now support visualization of paths in QL for Eclipse. |
| 64 | +Most security alerts are now visible on LGTM by default. |
| 65 | + |
| 66 | +| **Query** | **Expected impact** | **Change** | |
| 67 | +|----------------------------|------------------------|------------------------------------------------------------------| |
| 68 | +| Assert statement tests the truth value of a literal constant (`py/assert-literal-constant`) | reliability, correctness | Checks whether an assert statement is testing the truth of a literal constant value. Not shown by default. | |
| 69 | +| Code injection (`py/code-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | |
| 70 | +| Deserializing untrusted input (`py/unsafe-deserialization`) | Supports path visualization | No change to expected results | |
| 71 | +| Encoding error (`py/encoding-error`) | Better alert location | Alert is now shown at the position of the first offending character, rather than at the top of the file. | |
| 72 | +| Missing call to \_\_init\_\_ during object initialization (`py/missing-call-to-init`) | Fewer false positive results | Results where it is likely that the full call chain has not been analyzed are no longer reported. | |
| 73 | +| Reflected server-side cross-site scripting (`py/reflective-xss`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | |
| 74 | +| SQL query built from user-controlled sources (`py/sql-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | |
| 75 | +| Uncontrolled data used in path expression (`py/path-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | |
| 76 | +| Uncontrolled command line (`py/command-line-injection`) | Supports path visualization and is now visible on LGTM by default | No change to expected results | |
| 77 | +| URL redirection from remote source (`py/url-redirection`) | Fewer false positive results and now supports path visualization | Taint is no longer tracked from the right-hand side of binary expressions. In other words, `SAFE + TAINTED` is now treated as safe. | |
| 78 | + |
| 79 | + |
| 80 | +## Changes to code extraction |
| 81 | + |
| 82 | +* Improved scalability: Scaling is near linear to at least 20 CPU cores. |
| 83 | +* Five levels of logging can be selected: `ERROR`, `WARN`, `INFO`, `DEBUG` and `TRACE`. `WARN` is the stand-alone default, but `INFO` will be used when run by LGTM. |
| 84 | +* The `-v` flag can be specified multiple times to increase the logging level by one per `-v`. |
| 85 | +* The `-q` flag has been added and can be specified multiple times to reduce the logging level by one per `-q`. |
| 86 | +* Log lines are now in the `[SEVERITY] message` style and never overlap. |
| 87 | +* The extractor now outputs the location of the first offending character when an EncodingError is encountered. |
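
The interaction between the default level and the `-v` and `-q` flags can be sketched as follows. This is an illustration under stated assumptions, not the extractor's actual implementation: `resolve_level` and `format_line` are hypothetical helpers, and clamping at the most and least verbose levels is an assumption, since the notes do not say what happens when more flags are given than there are levels.

```python
import argparse

# Levels from least to most verbose, as listed in the notes above.
LEVELS = ["ERROR", "WARN", "INFO", "DEBUG", "TRACE"]
DEFAULT_LEVEL = "WARN"  # the stand-alone default


def resolve_level(argv, default=DEFAULT_LEVEL):
    """Hypothetical helper: map repeated -v/-q flags to a logging level."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-v", action="count", default=0)
    parser.add_argument("-q", action="count", default=0)
    args, _unknown = parser.parse_known_args(argv)
    # Each -v moves one step towards TRACE, each -q one step towards ERROR.
    index = LEVELS.index(default) + args.v - args.q
    # Assumption: extra flags beyond the ends of the list are ignored.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


def format_line(severity, message):
    """Hypothetical helper: render a line in the `[SEVERITY] message` style."""
    return "[%s] %s" % (severity, message)


print(resolve_level([]))            # stand-alone default
print(resolve_level(["-v", "-v"]))  # two steps more verbose
print(format_line(resolve_level(["-q"]), "example message"))
```

Passing `default="INFO"` to `resolve_level` models the LGTM case, where the same flags shift the level relative to `INFO` instead of `WARN`.
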
| 88 | + |
| 89 | +## Changes to QL libraries |
| 90 | + |
| 91 | +* Taint tracking analysis now understands HTTP requests in the `twisted` library. |
| 92 | + |
| 93 | +* The analysis now better handles `isinstance` and `issubclass` tests involving the basic abstract base classes. For example, the test `issubclass(list, collections.Sequence)` is now understood to be `True`.
| 94 | +* Taint tracking automatically tracks tainted mappings and collections, without you having to add additional taint kinds. This means that custom taints are tracked from `x` to `y` in the following flow: `l = [x]; y = l[0]`.
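
The last two points describe runtime behavior that the analysis now models. The sketch below shows that behavior in plain Python and says nothing about how the QL libraries implement it; `first_item` is a hypothetical function used only for illustration. The notes refer to `collections.Sequence`, an alias that was removed in Python 3.10, so the sketch uses `collections.abc.Sequence` instead.

```python
import collections.abc

# Abstract base class tests: `list` is registered as a Sequence, so both
# checks hold at runtime, while `dict` is a Mapping and not a Sequence.
list_is_sequence = issubclass(list, collections.abc.Sequence)
instance_is_sequence = isinstance([], collections.abc.Sequence)
dict_is_sequence = issubclass(dict, collections.abc.Sequence)


def first_item(x):
    """Hypothetical function with the flow `l = [x]; y = l[0]`."""
    l = [x]   # `x` is stored in a collection
    y = l[0]  # the same object is read back out
    return y


value = object()
print(list_is_sequence, instance_is_sequence, dict_is_sequence)
print(first_item(value) is value)
```

Because `first_item` returns the very object it was given, any taint attached to `x` should also be attached to `y`, which is the flow the taint-tracking library now follows without extra taint kinds.
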