Give precises types to all **opts argument, optimize ensure_list()#7393
Give precises types to all **opts argument, optimize ensure_list()#7393georgesittas merged 36 commits intotobymao:mainfrom
ensure_list()#7393Conversation
- it was only checking for tuple or list, `Collection` was way too broad - uncessary full copy of the list in case it was already one
- avoid overloads with simple type union - narrow the accepted types at various places This allows to delete a few `t.cast` calls
|
/benchmark |
|
@OutSquareCapital thanks for the PR- seems like we'll need some time to get to reviewing this. Thanks for the patience. |
georgesittas
left a comment
There was a problem hiding this comment.
@OutSquareCapital did a quick first pass. Can you take a look and see if we can simplify this a but more before I review everything?
|
Let me know when this is ready for another review. |
Checks passed, ready |
georgesittas
left a comment
There was a problem hiding this comment.
Great progress @OutSquareCapital.
| def tokenize(self, sql: str, **opts: object) -> list[Token]: | ||
| return self.tokenizer(**opts).tokenize(sql) | ||
| def tokenize(self, sql: str, dialect: DialectType = None, **opts: object) -> list[Token]: | ||
| return self.tokenizer(dialect=dialect, **opts).tokenize(sql) | ||
|
|
||
| def tokenizer(self, **opts: t.Any) -> Tokenizer: | ||
| return self.tokenizer_class(**{"dialect": self, **opts}) | ||
| def tokenizer(self, dialect: DialectType = None, **opts: object) -> Tokenizer: | ||
| return self.tokenizer_class(dialect=dialect or self, **opts) | ||
|
|
||
| def jsonpath_tokenizer(self, **opts: t.Any) -> JSONPathTokenizer: | ||
| return self.jsonpath_tokenizer_class(**{"dialect": self, **opts}) | ||
| def jsonpath_tokenizer(self, dialect: DialectType = None) -> JSONPathTokenizer: | ||
| return self.jsonpath_tokenizer_class(dialect=dialect or self) |
There was a problem hiding this comment.
The tokenizer's constructor appears to expect an **ops but it's currently dead code. Only the dialect kwarg is used. This is a good candidate for cleaning up, post-merge.
There was a problem hiding this comment.
It's actually used at one place, in the Athena Dialect.
But then it call super().tokenize() and pass it the same updated kwargs, who will anyway be ignored. So there's this place to clean up as well.
There was a problem hiding this comment.
I see, thanks for pointing it out. Looks like a remnant from the recent refactors.
|
FYI, I adressed all your latest comments and done the needed changes, except the |
… correct annotation of parse_args
|
A couple of final comments + need to fix CI, then this should be good to go. Thank you for the quick turnaround. |
Yep. I'm trying to work on the |
…n` is NOT already an `Expr`
|
Okay! all checks are passing and I have reworked the |
|
Ok, just one more small comment from me, LGTM otherwise. This #7393 (comment) should be a small lift. Good candidate for a follow-up PR, if you want to take it on. Thanks a bunch for these improvements, great work! |
Thanks for the kind words! By the way, if it's ok I will continue with follow-ups on typing PR's. |
Definitely agreed. Let's do that moving forward, and thanks for proactively suggesting it.
Is there a PEP for what to prefer? I'm assuming the "modern syntax", as you refer to it, is the proper way to type unions? If so, we can do that in a follow-up, sure. |
Yes, ,more infos here: By the way with a few more rules enabled Ruff would allow to clean up and standardize this across the codebase at once, and for other type syntaxes also. But that's another story/decision. |
This PR is the natural next step of my previous two.
Now that I'm more familiarised with the codebase and that all
**optshave a base annotation, I could finally start giving them precise type hints.To do this, a hierarchy of
TypedDicthas been created in_typing.pymodule, andtyping_extensions.Unpackis used at each call site.Unfortunately, this type of "brutal" change (from
Any | objecttoT) can't be easily localized and often require a lot of changes at various places, hence the PR size.This PR also bring performance improvements.
ensure_list()is called 22 times, but had an inefficient implementation -> doinglist(x)whenxis already one will copy it.Now it simply return it directly, vastly improving the efficiency. This function was also incorrectly typed, and thus this required various typing adjustements at other places (only list and tuple containers handled, instead of
Collection).Expr.isinmethod internalt.castcall is moved from the elements to the container. Python call overhead is a real thing, so each small improvement like this help.Yes the TypedDict hierarchy is a bit daunting at first (naming could be better..), but unless there's a refactor toward a builder pattern, this is the only way to manage various default arguments across different functions AND
**kwargs.OFC, another solution could be to have kwargs for the public API, and then create dataclasses from them for internals.
Both of those solution would avoid a TypedDict hierarchy, and would improve performance (repeated unpacking of arguments, especially kwargs, is relatively expensive)
All in all, strong typing across all public API entry points is a big improvement for the end user, who can now directly see all possible options in his IDE, with early warnings from LSP and type checkers in case of incorrect values.
In VSCode with basedpyright for example:
Question for the next ones
I saw that most of the time
t.Unionandt.Optionalwere used, which is what I prioritized in my own work.However, I saw already existing annotations with the modern syntax, e.g
str | int | Noneinstead oft.Optional[t.Union[str, int]]I have a strong preference for the modern one. What should be the convention?