-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Python: Hide points-to imports in python.qll
#20739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Hide points-to imports in python.qll
#20739
Conversation
| not exists(ExprWithPointsTo left, ExprWithPointsTo right, Value val | | ||
| comp.compares(left, op, right) and | ||
| exists(ImmutableLiteral il | il.getLiteralValue() = val) | ||
| exists(ImmutableLiteral il | il = val.(ConstantObjectInternal).getLiteral()) |
Check warning
Code scanning / CodeQL
Expression can be replaced with a cast Warning
685b2db to
d643079
Compare
6f767db to
e532fb6
Compare
For now, these have just been made into `private` imports. After doing
this, I went through all of the (now not compiling) files and added in
private imports to the modules that they actually depended on.
I also added an explicit import of `LegacyPointsTo` (even though it may
be unnecessary) in cases where the points-to dependency was somewhat
surprising (and one we want to get rid of). This was primarily inside
the various SSA layers.
For modules inside `semmle.python.{types, objects, pointsto}` I did not
bother, as these are fairly clearly related to points-to.
This frees `Class.qll`, `Exprs.qll`, and `Function.qll` from the clutches of points-to. For the somewhat complicated setup with `getLiteralObject` (an abstract method), I opted for a slightly ugly but workable solution of just defining a predicate on `ImmutableLiteral` that inlines each predicate body, special-cased to the specific instance to which it applies.
These methods were in `pointsto.Base` but did not actually interact with the points-to machinery directly, so they were easy to move out.
One might argue that these should be rewritten entirely to use more modern APIs, but for now I'll be content with just having them compile properly.
Mostly just adding `private import LegacyPointsTo`. Sometimes getting rid of other imports that are superceded by that module.
Gets rid of a bunch of predicates relating to reachability (which depended on the modelling of exceptions, which uses points-to), moving them to `LegacyPointsTo`. In the process, we gained a new class `BasicBlockWithPointsTo`.
Turns out the `ImportTime` module (despite living in `semmle.python.types` does not actually depend on points-to, so some of the `LegacyPointsTo` imports could be replaced or removed.
For whatever reason, the CFG node for exceptions and exception groups was placed with the points-to code. (Probably because a lot of the predicates depended on points-to.) However, as it turned out, two of the SSA modules only depended on non-points-to properties of these nodes, and so it was fairly straightforward to remove the imports of `LegacyPointsTo` for those modules. In the process, I moved the aforementioned CFG node types into `Flow.qll`, and changed the classes in the `Exceptions` module to the `...WithPointsTo` form that we introduced elsewhere.
The `Builtins` module is deeply entwined with points-to, so it would be nice to not have this dependence. Happily, the only thing we used `Builtin` for was to get the names of known builtins, and for this we already maintain such a set of names in `dataflow.new.internal.Builtins`.
e532fb6 to
5b47fcb
Compare
Happily, this was not as deeply entwined as it looked at first glance.
Gets rid of the `getMetrics` methods on the `Function`, `Class`, and
`Module` classes. To access the metrics, one must first import the
`LegacyPointsTo` module, and then either change the type to
`{Function,Class,Module}Metrics` or cast to the appropriate type.
9358997 to
e490cb6
Compare
In hindsight, having a `.getMetrics()` method that just returns `this` is somewhat weird. It's possible that it predates the existence of the inline cast, however.
e490cb6 to
c6ad438
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR removes all points-to analysis related imports from the top-level python.qll module by making them private. Users who depend on points-to functionality must now explicitly import the new LegacyPointsTo module. This is a breaking change that affects how various classes and predicates are accessed - for example, Function.getFunctionObject() is no longer directly available and requires using FunctionWithPointsTo from the LegacyPointsTo module.
Key Changes:
- Made points-to related imports private in
python.qll - Created new
LegacyPointsTomodule that exposes points-to functionality - Refactored test files, queries, and library code to import
LegacyPointsToexplicitly - Moved predicates and methods from base classes into specialized extension classes (e.g.,
FunctionWithPointsTo,SsaVariableWithPointsTo) - Restructured code organization to separate points-to dependent functionality
Reviewed changes
Copilot reviewed 284 out of 284 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
python/ql/lib/python.qll |
Made points-to and related imports private |
python/ql/lib/LegacyPointsTo.qll |
New module exposing points-to functionality with extension classes |
python/ql/lib/semmle/python/*.qll |
Removed points-to methods from base classes, moved to extension classes |
python/ql/lib/semmle/python/types/*.qll |
Added necessary imports, removed dependencies from exposed modules |
python/ql/lib/semmle/python/essa/*.qll |
Moved helper predicates to appropriate modules |
python/ql/test/**/*.ql |
Added private import LegacyPointsTo to all affected test queries |
python/ql/src/**/*.ql |
Added private import LegacyPointsTo to all affected source queries |
python/ql/examples/**/*.ql |
Added private import LegacyPointsTo to example snippets |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
MathiasVP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for taking this on, Taus! To my untrained snake eyes this looks good, but it would probably be good to have another pair of snake eyes on it as well before we merge it. I guess we have to wait for DCA to stop misbehaving anyway
python/ql/lib/change-notes/2025-11-26-remove-top-level-points-to-import.md
Outdated
Show resolved
Hide resolved
This makes things a bit cleaner.
After this, the only non-private (and non-`LegacyPointsTo`) imports of
`semmle.python.{types,objects,pointsto}.*` are in
`semmle.python.objects.ObjectInternal`, which is reasonable, as that is
the entry point for the entire internal object API.
Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>
|
There doesn't seem to be any impact where performance is concerned. |
yoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, congratulations on getting this far!
I had one question, but it is not blocking.
I think the solution for ImmutableLiterals is perfectly acceptable.
| private import semmle.python.dataflow.new.internal.ImportStar | ||
| private import semmle.python.dataflow.new.TypeTracking | ||
| private import semmle.python.dataflow.new.internal.DataFlowPrivate | ||
| private import semmle.python.essa.SsaDefinitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this necessary now and not before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this is not entirely obvious!
That module was reexported by semmle.python.pointsto.Base (see below) which was imported in python.qll, but now that import has (of course) been made private.
This PR removes all points-to related imports from
python.qll. For now, the imports have just been made private, on the off-chance that actual removal would affect the behaviour of the optimiser. At a later date, these can be removed fully.The remaining uses of
LegacyPointsToinside thepython/ql/libdirectory are in the following files:of these, the ones in
/typesand/objectform the main points-to analysis itself, and/dataflow/oldis long since deprecated. The remaining modules have a fairly deep use of points-to, and so removal was not feasible (but none of them are imported frompython.qllat this point, so leaving them in place should be fine).