-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Rust: Add support for MaD sources and sinks with access paths #18298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cb85b26 to
707bebb
Compare
efbbea1 to
67f7387
Compare
67f7387 to
1b31c90
Compare
geoffw0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking forward to this PR being merged. I want to use this functionality in several places already, the tests look like exactly what I need, and I'm experimenting with this branch locally.
I don't feel very confident reviewing the code changes in this PR though - if there are any specific parts you'd like me to look at or discuss please say so.
| extensible: sourceModel | ||
| data: | ||
| - ["repo::test", "crate::enum_source", "ReturnValue.Variant[crate::MyFieldEnum::D::field_d]", "test-source", "manual"] | ||
| - ["repo::test", "<crate::MyFieldEnum>::source", "ReturnValue.Variant[crate::MyFieldEnum::C::field_c]", "test-source", "manual"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the repo:: prefix mean, and why do we need the crate:: prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repo::test bit matches Resolvable.getResolvedCrateOrigin and crate::enum_source matches Resolvable.getResolvedPath, so this is simply what we get from the extractor. As for the repo prefix, I'm not sure, but for the crate prefix this appears to be always present in canonical paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if there are items from multiple crates involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect those to have different getResolvedCrateOrigin values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect those to have different
getResolvedCrateOriginvalues.
That's right. I was wondering about the case where a function from one crate has arguments or return values of types in other crates. How would the lines above look if MyFieldEnum was defined in a different crate than enum_source ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. I decided to not include the crate origins in the Variant arguments, since that would bloat things. But you are right that this can potentially give rise to some ambiguity, if two crates define variants with the same canonical paths.
|
@aschackmull : Are you able to review the shared bits in |
| TSourceOutputNode(SourceNode source, SummaryNodeState state, string kind, string model) { | ||
| state.isSourceOutputState(source, _, kind, model) | ||
| } or | ||
| TSinkInputNode(SinkNode sink, SummaryNodeState state, string kind, string model) { | ||
| state.isSinkInputState(sink, _, kind, model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something needs to change here. We worked very hard to untangle and stratify DataFlow::Node vs. SummaryNode such that FlowSummaryImpl wasn't dependent on DataFlow::Node. But now they've been made mutually recursive. We need to fix that, i.e. we should not embed DataFlow::Nodes into SummaryNodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SinkNode is not a subset of DataFlow::Node, rather it is a subset of the parameterized type SinkBase, which for Rust is subset of AstNode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Right.
So I guess the only new connection to DataFlow::Node is in sourceLocalStep via StepsInput::getSourceNode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe SourceNode and SinkNode ought to be called something different, because it's perhaps somewhat unclear what kind of "nodes" they are supposed to be - at least I got confused. They're not DataFlow::Nodes and they are not SummaryNodes. Maybe simply use SourceElement/SinkElement instead? And likely elaborate related qldoc a bit.
9dfebe3 to
868caf9
Compare
aschackmull
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared code LGTM.
8b257aa to
0f045a8
Compare
0f045a8 to
653d122
Compare
|
🎉 |
Overview
This PR adds support for defining flow sources/sinks for Rust with non-trivial access paths. For example, one may define
which models calls to
crate::my_option_sourceas flow sources of kindtest-source, but where the origin of tainted data is not directly the result of the call, but rather stored insideOption::Somein the result of the call.Similarly, one may define a sink restricted to data stored inside
Option::Some:Implementation
The implementation piggy-backs on the existing
FlowSummaryImpllibrary, which already has functionality for specifying flow summaries with non-trivial input/output access paths. For a flow source likemy_option_sourceabove, we synthesize a store step from synthetic source nodes to actual calls tomy_option_source, and dually for sinks likemy_option_sink, we synthesize read steps from arguments to synthetic sink nodes.The functionality is currently limited to Rust, but other languages should be able to adopt relatively easily.