11# Using the shared data-flow library
22
3- This document is aimed towards language maintainers and contain implementation
3+ This document is aimed towards language maintainers and contains implementation
44details that should be mostly irrelevant to query writers.
55
66## Overview
@@ -40,9 +40,10 @@ module DataFlow {
4040The `DataFlowImpl.qll` and `DataFlowCommon.qll` files contain the library code
4141that is shared across languages. These contain `Configuration`-specific and
4242`Configuration`-independent code, respectively. This organization allows
43- multiple copies of the library (for the use case when a query wants to use two
44- instances of global data flow and the configuration of one depends on the
45- results from the other). Using multiple copies just means duplicating
43+ multiple copies of the library to exist without duplicating the
44+ `Configuration`-independent predicates (for the use case when a query wants to
45+ use two instances of global data flow and the configuration of one depends on
46+ the results from the other). Using multiple copies just means duplicating
4647`DataFlow.qll` and `DataFlowImpl.qll`, for example as:
4748
4849```
@@ -52,9 +53,9 @@ dataflow/internal/DataFlowImpl2.qll
5253dataflow/internal/DataFlowImpl3.qll
5354```
5455
55- The `DataFlowImplSpecific.qll` provides all the language-specific classes and
56- predicates that the library needs as input and is the topic of the rest of this
57- document.
56+ The file `DataFlowImplSpecific.qll` provides all the language-specific classes
57+ and predicates that the library needs as input and is the topic of the rest of
58+ this document.
5859
5960This file must provide two modules named `Public` and `Private`, which the
6061shared library code will import publicly and privately, respectively, thus
@@ -88,7 +89,9 @@ Recommendations:
8889* Define `predicate localFlowStep(Node node1, Node node2)` as an alias of
8990 `simpleLocalFlowStep` and expose it publicly. The reason for this indirection
9091 is that it gives the option of exposing local flow augmented with field flow.
91- See the C/C++ implementation, which makes use of this feature.
92+ See the C/C++ implementation, which makes use of this feature. Another use of
93+ this indirection is to hide synthesized local steps that are only relevant
94+ for global flow. See the C# implementation for an example of this.
9295* Define `predicate localFlow(Node node1, Node node2) { localFlowStep*(node1, node2) }`.
9396* Make the local flow step relation in `simpleLocalFlowStep` follow
9497 def-to-first-use and use-to-next-use steps for SSA variables. Def-use steps
@@ -141,8 +144,9 @@ must be provided.
141144First, two types, `DataFlowCall` and `DataFlowCallable`, must be defined. These
142145should be aliases for whatever language-specific class represents calls and
143146callables (a "callable" is intended as a broad term covering functions,
144- methods, constructors, lambdas, etc.). The call-graph should be defined as a
145- predicate:
147+ methods, constructors, lambdas, etc.). It can also be useful to represent
148+ `DataFlowCall` as an IPA type if implicit calls need to be modelled. The
149+ call-graph should be defined as a predicate:
146150``` ql
147151DataFlowCallable viableCallable(DataFlowCall c)
148152```
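
A minimal sketch of these definitions, assuming hypothetical language-specific AST classes `Call` and `Callable` and a hypothetical member predicate `Call::getTarget()` that resolves a call to its possible targets. None of these three names come from the shared library, and the fragment is meant to live alongside an import of the language's own AST library:

```ql
// `Call`, `Callable`, and `Call::getTarget()` are hypothetical stand-ins for
// whatever the language's AST library actually provides.
class DataFlowCall = Call;

class DataFlowCallable = Callable;

/** Gets a callable that the call `c` may invoke. */
DataFlowCallable viableCallable(DataFlowCall c) { result = c.getTarget() }
```

Plain aliases like these are the simplest option. If implicit calls have no AST element of their own, an alias is not enough, which is where an IPA type for `DataFlowCall` becomes useful.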
@@ -182,7 +186,7 @@ corresponding `OutNode`s.
182186
183187Flows through global variables are called jump-steps, since such flow steps
184188essentially jump from one callable to another completely discarding call
185- context.
189+ contexts.
186190
187191Adding support for this type of flow is done with the following predicate:
188192``` ql
@@ -206,10 +210,12 @@ as described above.
206210
207211The library supports tracking flow through field stores and reads. In order to
208212support this, a class `Content` and two predicates
209- `storeStep(Node node1, Content f, PostUpdateNode node2)` and
210- `readStep(Node node1, Content f, Node node2)` must be defined. Besides this,
211- certain nodes must have associated `PostUpdateNode`s. The node associated with
212- a `PostUpdateNode` should be defined by `PostUpdateNode::getPreUpdateNode()`.
213+ `storeStep(Node node1, Content f, Node node2)` and
214+ `readStep(Node node1, Content f, Node node2)` must be defined. It generally
215+ makes sense for stores to target `PostUpdateNode`s, but this is not a strict
216+ requirement. Besides this, certain nodes must have associated
217+ `PostUpdateNode`s. The node associated with a `PostUpdateNode` should be
218+ defined by `PostUpdateNode::getPreUpdateNode()`.
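
As a rough illustration of the two predicates, the sketch below models a field write `q.f = v` and a field read `q.f`. It assumes hypothetical AST classes `Field`, `FieldWrite`, and `FieldRead` with the member predicates shown, and a hypothetical `Node::asExpr()`; only `Content`, `storeStep`, `readStep`, `PostUpdateNode`, and `getPreUpdateNode()` are names taken from the text above:

```ql
// Hypothetical: `Field`, `FieldWrite`, `FieldRead`, their member predicates,
// and `Node::asExpr()` stand in for language-specific definitions.
class Content extends Field { }

/** Holds if `node1` is stored in field `f` of the object `node2`, as in `q.f = node1`. */
predicate storeStep(Node node1, Content f, Node node2) {
  exists(FieldWrite fw |
    fw.getField() = f and
    node1.asExpr() = fw.getValue() and
    // The store targets the post-update node of the qualifier `q`.
    node2.(PostUpdateNode).getPreUpdateNode().asExpr() = fw.getQualifier()
  )
}

/** Holds if `node2` reads field `f` from the object `node1`, as in `node2 = q.f`. */
predicate readStep(Node node1, Content f, Node node2) {
  exists(FieldRead fr |
    fr.getField() = f and
    node1.asExpr() = fr.getQualifier() and
    node2.asExpr() = fr
  )
}
```

Here the store targets a `PostUpdateNode`, following the general recommendation; a language may choose a different target node where that fits its model better.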
213219
214220`PostUpdateNode`s are generally used when we need two data-flow nodes for a
215221single AST element in order to distinguish the value before and after some
@@ -351,30 +357,27 @@ otherwise be equivalent with respect to compatibility can then be represented
351357as a single entity (this improves performance). As an example, Java uses erased
352358types for this purpose and a single equivalence class for all numeric types.
353359
354- One also needs to define
360+ The type of a `Node` is given by the following predicate
361+ ```
362+ DataFlowType getNodeType(Node n)
363+ ```
364+ and every `Node` should have a type.
365+
366+ One also needs to define the string representation of a `DataFlowType`:
355367```
356- Type Node::getType()
357- Type Node::getTypeBound()
358- DataFlowType getErasedRepr(Type t)
359368string ppReprType(DataFlowType t)
360369```
361- where `Type` can be a language-specific name for the types native to the
362- language. Of the member predicate `Node::getType()` and `Node::getTypeBound()`
363- only the latter is used by the library, but the former is usually nice to have
364- if it makes sense for the language. The `getErasedRepr` predicate acts as the
365- translation between regular types and the type system used for pruning, the
366- shared library will use `getErasedRepr(node.getTypeBound())` to get the
367- `DataFlowType` for a node. The `ppReprType` predicate is used for printing a
368- type in the labels of `PathNode`s, this can be defined as `none()` if type
369- pruning is not used.
370+ The `ppReprType` predicate is used for printing a type in the labels of
371+ `PathNode`s; this can be defined as `none()` if type pruning is not used.
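
One possible shape for these two predicates is sketched below. It assumes that `DataFlowType` is already defined to contain erased types, and it uses hypothetical names throughout: `Node::asExpr()`, `Expr::getType()`, `Type::getErasure()`, and a catch-all `UnknownType` for nodes that have no expression:

```ql
// Hypothetical: `asExpr()`, `getType()`, `getErasure()`, and `UnknownType`
// stand in for language-specific definitions.

/** Gets the type of `n`; the fallback case ensures that every node has a type. */
DataFlowType getNodeType(Node n) {
  result = n.asExpr().getType().getErasure()
  or
  not exists(n.asExpr()) and
  result instanceof UnknownType
}

/** Gets a string representation of `t` for use in `PathNode` labels. */
string ppReprType(DataFlowType t) { result = t.toString() }
```

If type pruning is not used, the body of `ppReprType` can simply be `none()`.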
370372
371373Finally, one must define `CastNode` as a subclass of `Node` as those nodes
372374where types should be checked. Usually this will be things like explicit casts.
373375The shared library will also check types at `ParameterNode`s and `OutNode`s
374376without needing to include these in `CastNode`. It is semantically perfectly
375377valid to include all nodes in `CastNode`, but this can hurt performance as it
376378will reduce the opportunity for the library to compact several local steps into
377- one.
379+ one. It is also perfectly valid to leave `CastNode` as the empty set, and this
380+ should be the default if type pruning is not used.
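
A small sketch of a `CastNode` definition that covers only explicit casts, assuming a hypothetical AST class `CastExpr` and a hypothetical `Node::asExpr()`:

```ql
// Hypothetical: `CastExpr` and `Node::asExpr()` are language-specific names.
class CastNode extends Node {
  CastNode() { this.asExpr() instanceof CastExpr }
  // When type pruning is not used, the characteristic predicate can instead
  // be `CastNode() { none() }`, which leaves the class empty.
}
```

Keeping the class this narrow leaves the library free to compact chains of local steps that do not pass through a cast.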
378381
379382## Virtual dispatch with call context
380383
@@ -424,9 +427,9 @@ that can be tracked. This is given by the following predicate:
424427``` ql
425428int accessPathLimit() { result = 5 }
426429```
427- We have traditionally used 5 as a default value here, as we have yet to observe
428- the need for this much field nesting. Changing this value has a direct impact
429- on performance for large databases.
430+ We have traditionally used 5 as a default value here, and real examples have
431+ been observed to require at least this much. Changing this value has a direct
432+ impact on performance for large databases.
430433
431434### Hidden nodes
432435