remove query comments #560 #749
base: main
Conversation
The previous approach was incorrect because we need a str representing the query rather than just a ScanResult. This approach reconstructs the query with string slicing, using the comment indices returned by the scanner.
This version will accept a list of regexes to keep.
Add the ability to keep pgdog metadata comments anywhere in the string.
Not preserving these would mean queries with comments would never match their commentless counterparts.
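The cache-matching motivation above can be sketched with a toy example. The `strip` helper here is a stand-in (it drops a single trailing `/* ... */` block via substring search); the real code uses the scanner's comment offsets and a regex keep-list:

```rust
use std::collections::HashMap;

// Toy stand-in for comment removal: drops one trailing /* ... */ block.
// Illustrative only -- the real code slices on scanner-provided offsets.
fn strip(query: &str) -> String {
    match query.find("/*") {
        Some(i) => query[..i].trim_end().to_string(),
        None => query.to_string(),
    }
}

fn main() {
    let mut cache: HashMap<String, &str> = HashMap::new();
    cache.insert(strip("SELECT 1"), "cached plan");
    // The commented variant maps to the same key once comments are removed,
    // so it hits the entry its commentless counterpart created.
    let hit = cache.get(&strip("SELECT 1 /* trace: abc */"));
    assert_eq!(hit, Some(&"cached plan"));
}
```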
dev-lew seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
if comment::has_comments(query.query(), ctx.sharding_schema.query_parser_engine)? {
The check is here because we might as well short-circuit if there are no comments: there's no need to tokenize the query and iterate through it again for no reason.
let start = st.start as usize;
let end = st.end as usize;
out.push_str(&query[cursor..start]);
The code involving cursor keeps non-token characters (between tokens and at the end), such as spaces, to preserve the original query. We don't want to do anything extra like normalize it.
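The cursor bookkeeping can be sketched like this. The `spans` parameter stands in for the (start, end) byte offsets of comment tokens the scanner would return, and `keep` uses plain substring matching where the real code accepts regexes:

```rust
// Hypothetical sketch of comment removal via string slicing.
// `spans` = (start, end) offsets of comment tokens (stand-in for ScanTokens);
// `keep` = patterns marking comments that must survive (e.g. pgdog metadata).
fn remove_comments(query: &str, spans: &[(usize, usize)], keep: &[&str]) -> String {
    let mut out = String::new();
    let mut cursor = 0;
    for &(start, end) in spans {
        // Copy everything between the previous comment and this one,
        // preserving whitespace so the original query is not normalized.
        out.push_str(&query[cursor..start]);
        let comment = &query[start..end];
        if keep.iter().any(|k| comment.contains(k)) {
            out.push_str(comment); // metadata comment: keep it in place
        }
        cursor = end;
    }
    out.push_str(&query[cursor..]); // trailing text after the last comment
    out
}

fn main() {
    let q = "/* pgdog_shard: 1 */ SELECT 1 /* noise */";
    let cleaned = remove_comments(q, &[(0, 20), (30, 41)], &["pgdog_shard"]);
    assert_eq!(cleaned, "/* pgdog_shard: 1 */ SELECT 1 ");
}
```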
pub fn has_comments(query: &str, engine: QueryParserEngine) -> Result<bool, Error> {
    let result = match engine {
        QueryParserEngine::PgQueryProtobuf => scan(query),
One small concern I have here is we are calling scan multiple times. It's actually a pretty expensive call, because we have to go all the way to the Postgres parser, so we try to minimize it as much as possible (i.e. that's why we have the cache). It would be great if we could do this all inside fn comment, somehow, calling scan only once. Maybe we need to move comment detection a little higher in the call stack?
Worst case we call scan 3 times: once for has_comments, once for remove_comments, and once for comment if we reach the with_context call and create an Ast. A simple solution is to make has_comments return an Option<Vec<ScanToken>> that remove_comments can reuse, which would cut the worst case to 2 scan calls.
If we wanted to call scan a single time in comment, maybe we keep a hashmap that maps each query to its tokens and populate it on every call to comment. Then has_comments and remove_comments can operate on the tokens in the hashmap for a given query. If it's not in the hashmap, we can skip the entire block and go right to the with_context call. This would reduce the worst case to a single scan.
The memory usage of that hashmap is unclear to me though; it's probably unbounded for as long as pgdog is running...
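The token-reuse idea (scan once, pass the spans along) can be sketched as follows. The toy `comment_spans` scanner finds `/* ... */` blocks by substring search, standing in for the pg_query scan call; spans are (start, end) byte offsets:

```rust
// Scan once and return the comment spans, or None if there are no comments,
// so the removal step can reuse them instead of scanning again.
fn comment_spans(query: &str) -> Option<Vec<(usize, usize)>> {
    // Toy scanner: locates /* ... */ blocks. The real code calls pg_query scan.
    let mut spans = Vec::new();
    let mut rest = 0;
    while let Some(i) = query[rest..].find("/*") {
        let start = rest + i;
        let end = match query[start..].find("*/") {
            Some(j) => start + j + 2,
            None => query.len(), // unterminated comment runs to end of string
        };
        spans.push((start, end));
        rest = end;
    }
    if spans.is_empty() { None } else { Some(spans) }
}

// Rebuild the query from the slices between comments.
fn remove_spans(query: &str, spans: &[(usize, usize)]) -> String {
    let mut out = String::new();
    let mut cursor = 0;
    for &(start, end) in spans {
        out.push_str(&query[cursor..start]);
        cursor = end;
    }
    out.push_str(&query[cursor..]);
    out
}

fn main() {
    let q = "SELECT 1 /* a */ + 2 /* b */";
    if let Some(spans) = comment_spans(q) {
        // One scan feeds both the has-comments check and the removal.
        assert_eq!(remove_spans(q, &spans), "SELECT 1  + 2 ");
    }
}
```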
I think an easier way to do this is to move comment detection up here:
let entry = Ast::with_context(query, ctx, prepared_statements)?;
comment will return a String with the comments removed, if and only if the string had comments. Then you can check the cache again and, on a hit, return the cached Ast.
Otherwise, pass the comment in as an argument for Ast struct creation, so you don't have to scan twice.
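The suggested control flow (raw lookup, strip, second lookup, parse once on a double miss) can be sketched with toy stand-ins; `strip_comments`, `parse`, and `lookup_or_parse` are illustrative names, not pgdog's actual API:

```rust
use std::collections::HashMap;

// Returns Some(cleaned) iff the query contained a comment.
// Toy version: only handles a single trailing /* ... */ block.
fn strip_comments(query: &str) -> Option<String> {
    query.find("/*").map(|i| query[..i].trim_end().to_string())
}

// Stand-in for Ast::with_context: the expensive parse we want to do once.
fn parse(query: &str) -> String {
    format!("ast({})", query)
}

fn lookup_or_parse(cache: &mut HashMap<String, String>, query: &str) -> String {
    if let Some(ast) = cache.get(query) {
        return ast.clone(); // first lookup: raw query
    }
    let key = match strip_comments(query) {
        Some(cleaned) => {
            if let Some(ast) = cache.get(&cleaned) {
                return ast.clone(); // second lookup: comment-free query
            }
            cleaned
        }
        None => query.to_string(),
    };
    let ast = parse(&key); // single parse on a double miss
    cache.insert(key, ast.clone());
    ast
}

fn main() {
    let mut cache = HashMap::new();
    assert_eq!(lookup_or_parse(&mut cache, "SELECT 1"), "ast(SELECT 1)");
    // The commented variant hits the cached entry after stripping.
    assert_eq!(lookup_or_parse(&mut cache, "SELECT 1 /* x */"), "ast(SELECT 1)");
}
```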
Logic for comment parsing now happens before Ast creation, resulting in fewer calls to scan.
The returned query is standardized when comments are present: metadata comments are prepended and sorted so the result is consistent.
This PR addresses #560
It modifies the comment function (now parse_comment), which captures a role and shard and then returns the original query string with all comments removed except those matching zero or more keep regexes.
If there is a cache miss, we do another cache lookup after removing comments except for those used for pgdog metadata (pgdog_shard, etc) in the hope that we can increase cache hit rate.
Two associated helper functions were added as well.
Also, some tests were modified to reflect the new functionality.
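A simplified sketch of what the description calls parse_comment: capture a shard value from a pgdog metadata comment, keep the metadata comment, and drop everything else. The comment syntax (`pgdog_shard:`) and return shape are assumptions for the example; role capture is omitted, and the real code additionally prepends and sorts metadata comments:

```rust
// Hypothetical parse_comment: (captured shard, query with non-metadata
// comments removed). The toy scanner finds /* ... */ blocks by substring
// search where the real code uses pg_query's scanner offsets.
fn parse_comment(query: &str) -> (Option<u64>, String) {
    let mut shard = None;
    let mut out = String::new();
    let mut cursor = 0;
    let mut rest = 0;
    while let Some(i) = query[rest..].find("/*") {
        let start = rest + i;
        let end = match query[start..].find("*/") {
            Some(j) => start + j + 2,
            None => query.len(),
        };
        let body = &query[start..end];
        if let Some(v) = body.split("pgdog_shard:").nth(1) {
            // Metadata comment: capture the shard and leave it in place.
            shard = v.trim_end_matches("*/").trim().parse().ok();
        } else {
            // Non-metadata comment: copy the text before it, then skip it.
            out.push_str(&query[cursor..start]);
            cursor = end;
        }
        rest = end;
    }
    out.push_str(&query[cursor..]);
    (shard, out)
}

fn main() {
    let (shard, q) = parse_comment("/* pgdog_shard: 2 */ SELECT 1 /* x */");
    assert_eq!(shard, Some(2));
    assert_eq!(q, "/* pgdog_shard: 2 */ SELECT 1 ");
}
```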