-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Expand file tree
/
Copy pathDataFlowDispatch.qll
More file actions
1923 lines (1675 loc) · 67.4 KB
/
DataFlowDispatch.qll
File metadata and controls
1923 lines (1675 loc) · 67.4 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
/**
* INTERNAL: Do not use.
*
* TypeTracker based call-graph.
*
* The overall scheme for resolving calls, is to notice that Python has different kinds
* of callables, and resolve those with different strategies. Currently we handle these
* completely separately:
* 1. plain functions (and lambdas)
* 2. methods on classes
* 3. class instantiation
*
* So we have type-trackers for each of the 3 categories above, with some considerable
* effort to handle different kinds of methods on classes (staticmethod, classmethod,
* normal), and resolving methods correctly in regards to MRO.
*
*
* A goal of this library is to support modeling calls that happens by third-party
* libraries. For example `call_later(func, arg0, arg1, foo=val)`, and the fact that the
* library might inject it's own arguments, for example a context that will always be
* passed as the actual first argument to the function. Currently the aim is to provide
* enough predicates for such `call_later` function to be modeled by providing
* additional data-flow steps for the arguments/parameters. This means we cannot have
* any special logic that requires an AST call to be made before we care to figure out
* what callable this call might end up targeting.
*
* Specifically this means that we cannot use type-backtrackers from the function of a
* `CallNode`, since there is no `CallNode` to backtrack from for `func` in the example
* above.
*
* Note: This hasn't been 100% realized yet, so we don't currently expose a predicate to
* ask what targets any data-flow node has. But it's still the plan to do this!
*/
private import python
private import DataFlowPublic
private import DataFlowPrivate
private import FlowSummaryImpl as FlowSummaryImpl
private import semmle.python.internal.CachedStages
private import semmle.python.dataflow.new.internal.TypeTrackingImpl::CallGraphConstruction as CallGraphConstruction
newtype TParameterPosition =
/** Used for `self` in methods, and `cls` in classmethods. */
TSelfParameterPosition() or
/**
* This is used for tracking flow through captured variables, and
* we use separate parameter/argument positions in order to distinguish
* "lambda self" from "normal self", as lambdas may also access outer `self`
* variables (through variable capture).
*/
TLambdaSelfParameterPosition() or
TPositionalParameterPosition(int index) {
index = any(Parameter p).getPosition()
or
// since synthetic parameters are made for a synthetic summary callable, based on
// what Argument positions they have flow for, we need to make sure we have such
// parameter positions available.
FlowSummaryImpl::ParsePositions::isParsedPositionalArgumentPosition(_, index)
} or
TPositionalParameterLowerBoundPosition(int pos) {
FlowSummaryImpl::ParsePositions::isParsedArgumentLowerBoundPosition(_, pos)
} or
TKeywordParameterPosition(string name) {
name = any(Parameter p).getName()
or
// see comment for TPositionalParameterPosition
FlowSummaryImpl::ParsePositions::isParsedKeywordArgumentPosition(_, name)
} or
TStarArgsParameterPosition(int index) {
// since `.getPosition` does not work for `*args`, we need *args parameter positions
// at index 1 larger than the largest positional parameter position (and 0 must be
// included as well). This is a bit of an over-approximation.
index = 0 or
index = any(Parameter p).getPosition() + 1
} or
TSynthStarArgsElementParameterPosition(int index) { exists(TStarArgsParameterPosition(index)) } or
TDictSplatParameterPosition() or
// To get flow from a **kwargs argument to a keyword parameter, we add a read-step
// from a synthetic **kwargs parameter. We need this separate synthetic ParameterNode,
// since we clear content of the normal **kwargs parameter for the names that
// correspond to normal keyword parameters. Since we cannot re-use the same parameter
// position for multiple parameter nodes in the same callable, we introduce this
// synthetic parameter position.
TSynthDictSplatParameterPosition()
/** A parameter position. */
class ParameterPosition extends TParameterPosition {
/** Holds if this position represents a `self`/`cls` parameter. */
predicate isSelf() { this = TSelfParameterPosition() }
/** Holds if this position represents a reference to a lambda itself. Only used for tracking flow through captured variables. */
predicate isLambdaSelf() { this = TLambdaSelfParameterPosition() }
/** Holds if this position represents a positional parameter at (0-based) `index`. */
predicate isPositional(int index) { this = TPositionalParameterPosition(index) }
/** Holds if this position represents any positional parameter starting from position `pos`. */
predicate isPositionalLowerBound(int pos) { this = TPositionalParameterLowerBoundPosition(pos) }
/** Holds if this position represents a keyword parameter named `name`. */
predicate isKeyword(string name) { this = TKeywordParameterPosition(name) }
/** Holds if this position represents a `*args` parameter at (0-based) `index`. */
predicate isStarArgs(int index) { this = TStarArgsParameterPosition(index) }
/**
* Holds if this position represents a synthetic parameter at or after (0-based)
* position `index`, from which there will be made a store step to the real
* `*args` parameter.
*/
predicate isSynthStarArgsElement(int index) {
this = TSynthStarArgsElementParameterPosition(index)
}
/** Holds if this position represents a `**kwargs` parameter. */
predicate isDictSplat() { this = TDictSplatParameterPosition() }
/**
* Holds if this position represents a **synthetic** `**kwargs` parameter
* (see comment for `TSynthDictSplatParameterPosition`).
*/
predicate isSynthDictSplat() { this = TSynthDictSplatParameterPosition() }
/** Gets a textual representation of this element. */
string toString() {
this.isSelf() and result = "self"
or
this.isLambdaSelf() and result = "lambda self"
or
exists(int index | this.isPositional(index) and result = "position " + index)
or
exists(int pos | this.isPositionalLowerBound(pos) and result = "position " + pos + "..")
or
exists(string name | this.isKeyword(name) and result = "keyword " + name)
or
exists(int index | this.isStarArgs(index) and result = "*args at " + index)
or
exists(int index |
this.isSynthStarArgsElement(index) and
result = "synthetic *args element at (or after) " + index
)
or
this.isDictSplat() and result = "**"
or
this.isSynthDictSplat() and result = "synthetic **"
}
}
newtype TArgumentPosition =
/** Used for `self` in methods, and `cls` in classmethods. */
TSelfArgumentPosition() or
/**
* This is used for tracking flow through captured variables, and
* we use separate parameter/argument positions in order to distinguish
* "lambda self" from "normal self", as lambdas may also access outer `self`
* variables (through variable capture).
*/
TLambdaSelfArgumentPosition() or
TPositionalArgumentPosition(int index) {
exists(any(CallNode c).getArg(index))
or
// since synthetic calls within a summarized callable could use a unique argument
// position, we need to ensure we make these available (these are specified as
// parameters in the flow-summary spec)
FlowSummaryImpl::ParsePositions::isParsedPositionalParameterPosition(_, index)
or
// the generated function inside a comprehension has a positional argument at index 0
exists(Comp c) and
index = 0
} or
TKeywordArgumentPosition(string name) {
exists(any(CallNode c).getArgByName(name))
or
// see comment for TPositionalArgumentPosition
FlowSummaryImpl::ParsePositions::isParsedKeywordParameterPosition(_, name)
} or
TStarArgsArgumentPosition(int index) {
exists(Call c | c.getPositionalArg(index) instanceof Starred)
} or
TDictSplatArgumentPosition()
/** An argument position. */
class ArgumentPosition extends TArgumentPosition {
/** Holds if this position represents a `self`/`cls` argument. */
predicate isSelf() { this = TSelfArgumentPosition() }
/** Holds if this position represents a lambda `self` argument. Only used for tracking flow through captured variables. */
predicate isLambdaSelf() { this = TLambdaSelfArgumentPosition() }
/** Holds if this position represents a positional argument at (0-based) `index`. */
predicate isPositional(int index) { this = TPositionalArgumentPosition(index) }
/** Holds if this position represents a keyword argument named `name`. */
predicate isKeyword(string name) { this = TKeywordArgumentPosition(name) }
/** Holds if this position represents a `*args` argument at (0-based) `index`. */
predicate isStarArgs(int index) { this = TStarArgsArgumentPosition(index) }
/** Holds if this position represents a `**kwargs` argument. */
predicate isDictSplat() { this = TDictSplatArgumentPosition() }
/** Gets a textual representation of this element. */
string toString() {
this.isSelf() and result = "self"
or
this.isLambdaSelf() and result = "lambda self"
or
exists(int pos | this.isPositional(pos) and result = "position " + pos)
or
exists(string name | this.isKeyword(name) and result = "keyword " + name)
or
exists(int index | this.isStarArgs(index) and result = "*args at " + index)
or
this.isDictSplat() and result = "**"
}
}
/** Holds if arguments at position `apos` match parameters at position `ppos`. */
predicate parameterMatch(ParameterPosition ppos, ArgumentPosition apos) {
ppos.isSelf() and apos.isSelf()
or
ppos.isLambdaSelf() and apos.isLambdaSelf()
or
exists(int index | ppos.isPositional(index) and apos.isPositional(index))
or
exists(int index1, int index2 |
ppos.isPositionalLowerBound(index1) and apos.isPositional(index2) and index2 >= index1
)
or
exists(string name | ppos.isKeyword(name) and apos.isKeyword(name))
or
exists(int index | ppos.isStarArgs(index) and apos.isStarArgs(index))
or
exists(int paramIndex, int argIndex | argIndex >= paramIndex |
ppos.isSynthStarArgsElement(paramIndex) and apos.isPositional(argIndex)
)
or
ppos.isDictSplat() and apos.isDictSplat()
or
ppos.isSynthDictSplat() and apos.isDictSplat()
}
// =============================================================================
// Helper predicates
// =============================================================================
/**
* Holds if the function `func` is a staticmethod -- either by having a
* `@staticmethod` decorator or by convention
* (like a `__new__` method on a class is a classmethod even without the decorator).
*/
predicate isStaticmethod(Function func) {
exists(NameNode id | id.getId() = "staticmethod" and id.isGlobal() |
func.getADecorator() = id.getNode()
)
}
/**
* Holds if the function `func` is a classmethod -- either by having a
* `@classmethod` decorator or by convention
* (like a `__new__` method on a class is a classmethod even without the decorator).
*/
predicate isClassmethod(Function func) {
exists(NameNode id | id.getId() = "classmethod" and id.isGlobal() |
func.getADecorator() = id.getNode()
)
or
exists(Class cls |
cls.getAMethod() = func and
func.getName() in [
"__new__", // https://docs.python.org/3.10/reference/datamodel.html#object.__new__
"__init_subclass__", // https://docs.python.org/3.10/reference/datamodel.html#object.__init_subclass__
"__class_getitem__", // https://docs.python.org/3.10/reference/datamodel.html#object.__class_getitem__
]
)
}
/** Holds if the function `func` has a `property` decorator. */
predicate hasPropertyDecorator(Function func) {
exists(NameNode id | id.getId() = "property" and id.isGlobal() |
func.getADecorator() = id.getNode()
)
}
/**
* Holds if the function `func` has a `contextlib.contextmanager`.
*/
predicate hasContextmanagerDecorator(Function func) {
exists(ControlFlowNode contextmanager |
contextmanager.(NameNode).getId() = "contextmanager" and contextmanager.(NameNode).isGlobal()
or
contextmanager.(AttrNode).getObject("contextmanager").(NameNode).getId() = "contextlib"
|
func.getADecorator() = contextmanager.getNode()
)
}
// =============================================================================
// Callables
// =============================================================================
/** A callable defined in library code, identified by a unique string. */
abstract class LibraryCallable extends string {
bindingset[this]
LibraryCallable() { any() }
/** Gets a call to this library callable. */
abstract CallCfgNode getACall();
/** Same as `getACall` but without referring to the call graph or API graph. */
CallCfgNode getACallSimple() { none() }
/** Gets a data-flow node, where this library callable is used as a call-back. */
abstract ArgumentNode getACallback();
}
newtype TDataFlowCallable =
/**
* Is used as the target for all calls: plain functions, lambdas, methods on classes,
* class instantiations, and (in the future) special methods.
*/
TFunction(Function func) {
// Functions with an explicit definition
exists(func.getDefinition())
or
// For generators/list-comprehensions we create a synthetic function.
exists(Comp c | c.getFunction() = func)
} or
/** see QLDoc for `DataFlowModuleScope` for why we need this. */
TModule(Module m) or
TLibraryCallable(LibraryCallable callable)
/** A callable. */
abstract class DataFlowCallable extends TDataFlowCallable {
/** Gets a textual representation of this element. */
abstract string toString();
/** Gets qualified name for this callable, if any. */
abstract string getQualifiedName();
/** Gets the scope of this callable */
abstract Scope getScope();
/** Gets the parameter at position `ppos`, if any. */
abstract ParameterNode getParameter(ParameterPosition ppos);
/** Gets the underlying library callable, if any. */
LibraryCallable asLibraryCallable() { this = TLibraryCallable(result) }
/** Gets the location of this dataflow callable. */
abstract Location getLocation();
}
/** A callable function. */
abstract class DataFlowFunction extends DataFlowCallable, TFunction {
Function func;
DataFlowFunction() {
this = TFunction(func) and
// TODO: Handle @property decorators
not hasPropertyDecorator(func)
}
override string toString() { result = func.toString() }
override string getQualifiedName() { result = func.getQualifiedName() }
override Function getScope() { result = func }
override Location getLocation() { result = func.getLocation() }
/** Gets the positional parameter offset, to take into account self/cls parameters. */
int positionalOffset() { result = 0 }
override ParameterNode getParameter(ParameterPosition ppos) {
// Do not handle lower bound positions (such as `[1..]`) here
// they are handled by parameter matching and would create
// inconsistencies here as multiple parameters could match such a position.
exists(int index | ppos.isPositional(index) |
result.getParameter() = func.getArg(index + this.positionalOffset())
)
or
exists(string name | ppos.isKeyword(name) | result.getParameter() = func.getArgByName(name))
or
// `*args`
exists(int index |
(
ppos.isStarArgs(index) and
result.getParameter() = func.getVararg()
or
ppos.isSynthStarArgsElement(index) and
result = TSynthStarArgsElementParameterNode(this)
)
|
// a `*args` parameter comes after the last positional parameter. We need to take
// self parameter into account, so for
// `def func(foo, bar, *args)` it should be index 2 (pos-param-count == 2)
// `class A: def func(self, foo, bar, *args)` it should be index 2 (pos-param-count - 1 == 3 - 1)
index = func.getPositionalParameterCount() - this.positionalOffset()
or
// no positional argument
not exists(func.getArg(_)) and index = 0
)
or
// `**kwargs`
ppos.isDictSplat() and result.getParameter() = func.getKwarg()
or
ppos.isSynthDictSplat() and result = TSynthDictSplatParameterNode(this)
}
}
/** A plain (non-method) function. */
class DataFlowPlainFunction extends DataFlowFunction {
DataFlowPlainFunction() { not this instanceof DataFlowMethod }
}
/** A method. */
class DataFlowMethod extends DataFlowFunction {
Class cls;
DataFlowMethod() { cls.getAMethod() = func }
/** Gets the class this function is a method of. */
Class getClass() { result = cls }
override int positionalOffset() { result = 1 }
override ParameterNode getParameter(ParameterPosition ppos) {
ppos.isSelf() and result.getParameter() = func.getArg(0)
or
result = super.getParameter(ppos)
}
}
/** A classmethod. */
class DataFlowClassmethod extends DataFlowMethod {
DataFlowClassmethod() { isClassmethod(func) }
}
/** A staticmethod. */
class DataFlowStaticmethod extends DataFlowMethod, DataFlowFunction {
DataFlowStaticmethod() { isStaticmethod(func) }
override int positionalOffset() { result = 0 }
override ParameterNode getParameter(ParameterPosition ppos) {
result = DataFlowFunction.super.getParameter(ppos)
}
}
/**
* A module. This is not actually a callable, but we need this so a
* `ModuleVariableNode` have an enclosing callable.
*/
class DataFlowModuleScope extends DataFlowCallable, TModule {
Module mod;
DataFlowModuleScope() { this = TModule(mod) }
override string toString() { result = mod.toString() }
override string getQualifiedName() { result = mod.getName() }
override Module getScope() { result = mod }
override Location getLocation() { result = mod.getLocation() }
override ParameterNode getParameter(ParameterPosition ppos) { none() }
}
class LibraryCallableValue extends DataFlowCallable, TLibraryCallable {
LibraryCallable callable;
LibraryCallableValue() { this = TLibraryCallable(callable) }
override string toString() { result = "LibraryCallableValue: " + callable.toString() }
override string getQualifiedName() { result = callable.toString() }
/** Gets a data-flow node, where this library callable is used as a call-back. */
ArgumentNode getACallback() { result = callable.getACallback() }
override Scope getScope() { none() }
override ParameterNode getParameter(ParameterPosition ppos) { none() }
override LibraryCallable asLibraryCallable() { result = callable }
override Location getLocation() { none() }
}
// =============================================================================
// Type trackers used to resolve calls.
// =============================================================================
/** Gets a call to `type`. */
private CallCfgNode getTypeCall() {
exists(NameNode id | id.getId() = "type" and id.isGlobal() |
result.getFunction().asCfgNode() = id
)
}
/** Gets a call to `super`. */
private CallCfgNode getSuperCall() {
// While it is possible to reference super and call it later, it's almost never done in
// practice. From looking at top 1000 projects, there were a few uses around mocking (see
// link below), but otherwise only 2 edgecases. Overall it seems ok to ignore this complexity.
//
// https://github.com/python/cpython/blob/18b1782192f85bd26db89f5bc850f8bee4247c1a/Lib/unittest/mock.py#L48-L50
exists(NameNode id | id.getId() = "super" and id.isGlobal() |
result.getFunction().asCfgNode() = id
)
}
/**
* Holds if the file `f` should be ignored when computing the call-graph.
*
* We currently see a performance problem when analyzing the `sympy` PyPI package,
* which can be part of the database when dependencies are installed and extracted.
* From what we can understand, SymPy is using Python in a exotic way, so the fact that
* our analysis currently does not handle this project has nothing to say about our
* ability to handle normal Python code. Furthermore, SymPy does not look to be relevant
* in a security context, so we should not lose out on any security results by doing
* this.
*/
private predicate ignoreForCallGraph(File f) {
f.getAbsolutePath().matches("%/site-packages/sympy/%")
}
private module TrackFunctionInput implements CallGraphConstruction::Simple::InputSig {
class State = Function;
predicate start(Node start, Function func) {
start.asExpr() = func.getDefinition()
or
// when a function is decorated, it's the result of the (last) decorator call that
// is used
start.asExpr() = func.getDefinition().(FunctionExpr).getADecoratorCall()
}
predicate filter(Node n) { ignoreForCallGraph(n.getLocation().getFile()) }
}
/**
* Gets a reference to the function `func`.
*/
Node functionTracker(Function func) {
CallGraphConstruction::Simple::Make<TrackFunctionInput>::track(func)
.(LocalSourceNode)
.flowsTo(result)
}
private module TrackClassInput implements CallGraphConstruction::Simple::InputSig {
class State = Class;
predicate start(Node start, Class cls) {
start.asExpr() = cls.getParent()
or
// when a class is decorated, it's the result of the (last) decorator call that
// is used
start.asExpr() = cls.getParent().getADecoratorCall()
or
// `type(obj)`, where obj is an instance of this class
start = getTypeCall() and
start.(CallCfgNode).getArg(0) = classInstanceTracker(cls)
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/**
* Gets a reference to the class `cls`.
*/
Node classTracker(Class cls) {
CallGraphConstruction::Simple::Make<TrackClassInput>::track(cls).(LocalSourceNode).flowsTo(result)
}
private module TrackClassInstanceInput implements CallGraphConstruction::Simple::InputSig {
class State = Class;
predicate start(Node start, Class cls) {
exists(Annotation ann |
ann = classTracker(cls).asExpr() and
start.asExpr() = ann.getAnnotatedExpression()
)
or
resolveClassCall(start.(CallCfgNode).asCfgNode(), cls)
or
// result of `super().__new__` as used in a `__new__` method implementation
exists(Class classUsedInSuper |
fromSuperNewCall(start.(CallCfgNode).asCfgNode(), classUsedInSuper, _, _) and
classUsedInSuper = getADirectSuperclass*(cls)
)
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/**
* Gets a reference to an instance of the class `cls`.
*/
Node classInstanceTracker(Class cls) {
CallGraphConstruction::Simple::Make<TrackClassInstanceInput>::track(cls)
.(LocalSourceNode)
.flowsTo(result)
}
private module TrackSelfInput implements CallGraphConstruction::Simple::InputSig {
class State = Class;
predicate start(Node start, Class classWithMethod) {
exists(Function func |
func = classWithMethod.getAMethod() and
not isStaticmethod(func) and
not isClassmethod(func)
|
start.asExpr() = func.getArg(0)
)
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/**
* Gets a reference to the `self` argument of a method on class `classWithMethod`.
* The method cannot be a `staticmethod` or `classmethod`.
*/
Node selfTracker(Class classWithMethod) {
CallGraphConstruction::Simple::Make<TrackSelfInput>::track(classWithMethod)
.(LocalSourceNode)
.flowsTo(result)
}
private module TrackClsArgumentInput implements CallGraphConstruction::Simple::InputSig {
class State = Class;
predicate start(Node start, Class classWithMethod) {
exists(Function func |
func = classWithMethod.getAMethod() and
isClassmethod(func)
|
start.asExpr() = func.getArg(0)
)
or
// type(self)
start = getTypeCall() and
start.(CallCfgNode).getArg(0) = selfTracker(classWithMethod)
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/**
* Gets a reference to the enclosing class `classWithMethod` from within one of its
* methods, either through the `cls` argument from a `classmethod` or from `type(self)`
* from a normal method.
*/
Node clsArgumentTracker(Class classWithMethod) {
CallGraphConstruction::Simple::Make<TrackClsArgumentInput>::track(classWithMethod)
.(LocalSourceNode)
.flowsTo(result)
}
private module TrackSuperCallNoArgumentInput implements CallGraphConstruction::Simple::InputSig {
class State = Function;
predicate start(Node start, Function func) {
not isStaticmethod(func) and
exists(CallCfgNode call | start = call |
call = getSuperCall() and
not exists(call.getArg(_)) and
call.getScope() = func
)
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/**
* Gets a reference to the result of calling `super` without any argument, where the
* call happened in the method `func` (either a method or a classmethod).
*/
Node superCallNoArgumentTracker(Function func) {
CallGraphConstruction::Simple::Make<TrackSuperCallNoArgumentInput>::track(func)
.(LocalSourceNode)
.flowsTo(result)
}
private module TrackSuperCallTwoArgumentInput implements CallGraphConstruction::Simple::InputSig {
additional predicate superCall(CallCfgNode call, Class cls, Node obj) {
call = getSuperCall() and
call.getArg(0) = classTracker(cls) and
call.getArg(1) = obj
}
class State = CallCfgNode;
predicate start(Node start, CallCfgNode call) {
superCall(call, _, _) and
start = call
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/**
* Gets a reference to the result of calling `super` with 2 arguments, where the
* first is a reference to the class `cls`, and the second argument is `obj`.
*/
Node superCallTwoArgumentTracker(Class cls, Node obj) {
exists(CallCfgNode call |
TrackSuperCallTwoArgumentInput::superCall(call, cls, obj) and
CallGraphConstruction::Simple::Make<TrackSuperCallTwoArgumentInput>::track(call)
.(LocalSourceNode)
.flowsTo(result)
)
}
// =============================================================================
// MRO
// =============================================================================
/**
* Gets a direct superclass of the argument `cls`, if any.
*
* For `A` with the class definition `class A(B, C)` it will have results `B` and `C`.
*/
Class getADirectSuperclass(Class cls) { cls.getABase() = classTracker(result).asExpr() }
/**
* Gets a direct subclass of the argument `cls`, if any.
*
*For `B` with the class definition `class A(B)` it will have result `A`.
*/
Class getADirectSubclass(Class cls) { cls = getADirectSuperclass(result) }
/**
* Gets a class that, from an approximated MRO calculation, might be the next class used
* for member-lookup when `super().attr` is used inside the class `cls`.
*
* In the example below, with `cls=B`, this predicate will have `A` and `C` as results.
* ```py
* class A: pass
* class B(A): pass
* class C(A): pass
* class D(B, C): pass
* ```
*
* NOTE: This approximation does not handle all cases correctly, and in the example
* below, with `cls=A` will not have any results, although it should include `Y`.
*
* ```py
* class A: pass
* class B(A): pass
* class X: pass
* class Y(X): pass
* class Ex(B, Y): pass
* ```
*
* NOTE for debugging the results of this predicate: Since a class can be part of
* multiple MROs, results from this predicate might only be valid in some, but not all,
* inheritance chains: This is the case with the result `C` for `cls=B` in the first
* example -- if `B` and `C` are defined in the same file, but `D` in a different file,
* this might make the results from this predicate difficult to comprehend at first.
*
* For more info on the C3 MRO used in Python see:
* - https://docs.python.org/3/glossary.html#term-method-resolution-order
* - https://www.python.org/download/releases/2.3/mro/
* - https://opendylan.org/_static/c3-linearization.pdf
*/
private Class getNextClassInMro(Class cls) {
// class A(B, ...):
// `B` must be the next class after `A` in the MRO for A.
cls.getBase(0) = classTracker(result).asExpr()
or
// class A(B, C, D):
// - `C` could be the next class after `B` in MRO.
// - `D` could be the next class after `C` in MRO.
exists(Class sub, int i |
sub.getBase(i) = classTracker(cls).asExpr() and
sub.getBase(i + 1) = classTracker(result).asExpr() and
not result = cls
)
// There are three important properties for MRO computed with C3 in Python:
//
// 1) monotonicity: if C1 precedes C2 in the MRO of C, then C1 precedes C2 in the MRO
// of any subclass of C.
// 2) local precedence ordering: if C1 precedes C2 in the list of superclasses for C,
// they will keep the same order in the MRO for C (and due to monotonicity, any
// subclass).
// 3) consistency with the extended precedence graph: if A and B (that are part of the
// class hierarchy of C) do not have a subclass/superclass relationship on their
// own, the ordering of A and B in the MRO of C will be determined by the local
// precedence ordering in the classes that use both A and B, either directly or
// through a subclass. (see paper for more details)
//
// Note that not all class hierarchies are allowed with C3, see the Python 2.3 article
// for examples.
}
/**
* Gets a potential definition of the function `name` according to our approximation of
* MRO for the class `cls` (see `getNextClassInMro` for more information).
*/
Function findFunctionAccordingToMro(Class cls, string name) {
result = cls.getAMethod() and
result.getName() = name
or
not class_has_method(cls, name) and
result = findFunctionAccordingToMro(getNextClassInMro(cls), name)
}
/**
* Join-order helper for `findFunctionAccordingToMro` and `findFunctionAccordingToMroKnownStartingClass`.
*/
pragma[nomagic]
private predicate class_has_method(Class cls, string name) { cls.getAMethod().getName() = name }
/**
* Gets a class that, from an approximated MRO calculation, might be the next class
* after `cls` in the MRO for `startingClass`.
*
* Note: this is almost the same as `getNextClassInMro`, except we know the
* `startingClass`, which can give slightly more precise results.
*
* See QLDoc for `getNextClassInMro`.
*/
Class getNextClassInMroKnownStartingClass(Class cls, Class startingClass) {
cls.getBase(0) = classTracker(result).asExpr() and
cls = getADirectSuperclass*(startingClass)
or
exists(Class sub, int i | sub = getADirectSuperclass*(startingClass) |
sub.getBase(i) = classTracker(cls).asExpr() and
sub.getBase(i + 1) = classTracker(result).asExpr() and
not result = cls
)
}
/**
* Gets a potential definition of the function `name` of the class `cls` according to our approximation of
* MRO for the class `startingCls` (see `getNextClassInMroKnownStartingClass` for more information).
*
* Note: this is almost the same as `findFunctionAccordingToMro`, except we know the
* `startingClass`, which can give slightly more precise results.
*/
Function findFunctionAccordingToMroKnownStartingClass(Class cls, Class startingClass, string name) {
result = cls.getAMethod() and
result.getName() = name and
cls = getADirectSuperclass*(startingClass)
or
not class_has_method(cls, name) and
result =
findFunctionAccordingToMroKnownStartingClass(getNextClassInMroKnownStartingClass(cls,
startingClass), startingClass, name)
}
/**
* Gets a potential definition of the function `name` according to our approximation of
* MRO for the class `startingCls` (see `getNextClassInMroKnownStartingClass` for more information).
*
* Note: this is almost the same as `findFunctionAccordingToMro`, except we know the
* `startingClass`, which can give slightly more precise results.
*/
pragma[inline]
Function findFunctionAccordingToMroKnownStartingClass(Class startingClass, string name) {
result = findFunctionAccordingToMroKnownStartingClass(startingClass, startingClass, name)
}
// =============================================================================
// attribute trackers
// =============================================================================
private module TrackAttrReadInput implements CallGraphConstruction::Simple::InputSig {
class State = AttrRead;
predicate start(Node start, AttrRead attr) {
start = attr and
pragma[only_bind_into](attr.getObject()) in [
classTracker(_), classInstanceTracker(_), selfTracker(_), clsArgumentTracker(_),
superCallNoArgumentTracker(_), superCallTwoArgumentTracker(_, _)
]
}
predicate filter(Node n) {
ignoreForCallGraph(n.getLocation().getFile())
or
n.(ParameterNodeImpl).isParameterOf(_, any(ParameterPosition pp | pp.isSelf()))
}
}
/** Gets a reference to the attribute read `attr` */
Node attrReadTracker(AttrRead attr) {
CallGraphConstruction::Simple::Make<TrackAttrReadInput>::track(attr)
.(LocalSourceNode)
.flowsTo(result)
}
// =============================================================================
// call and argument resolution
// =============================================================================
newtype TCallType =
/** A call to a function that is not part of a class. */
CallTypePlainFunction() or
/**
* A call to an "normal" method on a class instance.
* Does not include staticmethods or classmethods.
*/
CallTypeNormalMethod() or
/** A call to a staticmethod. */
CallTypeStaticMethod() or
/** A call to a classmethod. */
CallTypeClassMethod() or
/**
* A call to method on a class, not going through an instance method, such as
*
* ```py
* class Foo:
* def method(self, arg):
* pass
*
* foo = Foo()
* Foo.method(foo, 42)
* ```
*/
CallTypeMethodAsPlainFunction() or
/** A call to a class. */
CallTypeClass() or
/** A call on a class instance, that goes to the `__call__` method of the class */
CallTypeClassInstanceCall()
/** A type of call. */
class CallType extends TCallType {
string toString() {
this instanceof CallTypePlainFunction and
result = "CallTypePlainFunction"
or
this instanceof CallTypeNormalMethod and
result = "CallTypeNormalMethod"
or
this instanceof CallTypeStaticMethod and
result = "CallTypeStaticMethod"
or
this instanceof CallTypeClassMethod and
result = "CallTypeClassMethod"
or
this instanceof CallTypeMethodAsPlainFunction and
result = "CallTypeMethodAsPlainFunction"
or
this instanceof CallTypeClass and
result = "CallTypeClass"
or
this instanceof CallTypeClassInstanceCall and
result = "CallTypeClassInstanceCall"
}
}
// -------------------------------------
// method call resolution
// -------------------------------------
private module MethodCalls {
/**
* Holds if `call` is a call to a method `target` on an instance or class, where the
* instance or class is not derived from an implicit `self`/`cls` argument to a method
* -- for that, see `callWithinMethodImplicitSelfOrCls`.
*
* It is found by making an attribute read `attr` with the name `functionName` on a
* reference to the class `cls`, or to an instance of the class `cls`. The reference the
* attribute-read is made on is `self`.
*/
pragma[nomagic]
private predicate directCall(
CallNode call, Function target, string functionName, Class cls, AttrRead attr, Node self
) {
target = findFunctionAccordingToMroKnownStartingClass(cls, functionName) and
directCall_join(call, functionName, cls, attr, self)
}
/** Extracted to give good join order */
pragma[nomagic]
private predicate directCall_join(
CallNode call, string functionName, Class cls, AttrRead attr, Node self