-
Notifications
You must be signed in to change notification settings - Fork 487
[explain] new syntax (LIR-based, Postgres like) #32262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ed61038 to
bcb046d
Compare
99c9204 to
9322754
Compare
4f8da05 to
394276a
Compare
1dc2362 to
3a5f0ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quickly typing some preliminary comments from looking at the new output. I need to sign off now for ~2 hours, but I'll review it a bit more later today. I think this new output format is eventually gonna be great!
For Map/Filter/Project, did we decide to not show the scalar expressions? I'm a bit concerned that this would make it hard to map it back to SQL (which is already harder for us than for Postgres, because of our plan operators being so far away from SQL, e.g., no outer joins, subqueries, window functions, ...).
Indexed l0 -- This appears both for Gets to global IDs and to local IDs, where the referred CTE ends with an arrangement. I'm not sure whether it's ok to use the word "index" in the latter case, since an index is a somewhat different concept from an arrangement. Maybe it would confuse users into thinking that it's using an index that they created. Maybe just somehow use the word "arranged" when it's a local ID?
Delta joins are extremely verbose. How about wiring up the existing WITH (JOIN IMPLEMENTATIONS) flag to the new explain, and show the verbose stuff only when it's set?
(IndexUsageType is not shown inline in the plan (only at the end, separately), but this is not so easy to fix, so ok if this will be a follow-up.)
"Consolidating Monotonic Hierarchical GroupAggregate" -- I'm thinking that maybe we should drop the word Hierarchical when it's also Monotonic, since there is actually no hierarchy rendered in this case. (This is a confusing thing also in the code, which we should eventually refactor.)
For a "Map/Filter/Project", I think "Input key" is important only in some extremely rare cases that are kinda deprecated (literal constraints discovered in MIR-to-LIR lowering), so let's not show it.
There might be some fused MFPs that are not shown. E.g., for TPC-H 04 I don't see the 1993-07-01 filter anywhere. This should be somewhere above or inside the join.
TPC-H 06: the Indexed ... Filter looks like as if we were using the index to do a range lookup, but this is not the case.
Literal constraints: Fast path looks good, but the slow path got much more verbose: we are now showing the underlying Differential join. Not sure we can do something with this in this PR, just something to keep in mind. E.g.:
create table t(x int, y int);
explain select * from t as t1 join t as t2 on t1.y = t2.y where t1.x=5;
Physical Plan
Explained Query:
→Differential Join %0 » %1
Join stage %0: Lookup key #1{y} in %1
→Arrange
Keys: 1 arrangement available, plus raw stream
Arrangement 0: #1{y}
→Differential Join %1 » %0
Join stage %0: Lookup key #0{x} in %0
→Indexed materialize.public.t
→Arrange
Keys: 1 arrangement available, plus raw stream
Arrangement 0: #0
→Constant (1 row)
→Arrange
Keys: 1 arrangement available, plus raw stream
Arrangement 0: #1{y}
→Indexed materialize.public.t
Filter: (#1{y}) IS NOT NULL
Key: (#0{x})
Used Indexes:
- materialize.public.i1 (*** full scan ***, lookup)
Target cluster: quickstart
3a5f0ba to
ea2aa8f
Compare
Thank you!
We do show them---just not projects. Check out test/sqllogictest/explain/default.slt for many examples.
Sure, changed!
Thoughts on what to trim/condense?
Sure, changed!
Is this true for the
It's getting pushed down:
What do you mean? This is what I get:
Weird, I get something different: |
|
|
||
| ## Which part of my query runs slowly or uses a lot of memory? | ||
|
|
||
| {{< public-preview />}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove the public-preview annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
|
||
| You can [`EXPLAIN`](/sql/explain-plan/) a query to see how it will be run as a | ||
| dataflow. In particular, `EXPLAIN PHYSICAL PLAN` will show the concrete, fully | ||
| dataflow. In particular, `EXPLAIN PHYSICAL PLAN` (the default) will show the concrete, fully |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ... we had mentioned Explain Analyze in https://github.com/MaterializeInc/materialize/pull/32555/files#diff-2a594501d91e5be31dbb0fe53b698815099fa917b888d7a5c3033073d0e440b5R289 ... Should we remention here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's in the next graf... should I move it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: I added a reference to EXPLAIN ANALYZE towards the bottom of EXPLAIN PLAN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ... got it! Thank you!!!
antiguru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Format LGTM |
Ah, ok, sorry! Why not show Projects? Because we don't have col names anyway for those currently?
Maybe by default it would be enough to know that it's a delta join and the arrangement keys for each input? And
I'd say we can also hide it for
Ah, yes, sorry! I typed that comment too quickly.
This is my TPC-H setup: I think when I wrote my above comment on Wednesday, it was slightly different, saying There was also:
But the In any case, maybe just make it clear that the Filter is not part of the arrangement lookup itself, but happens afterwards. Maybe something like Btw. a more general issue, which is also relevant for the above, is that there are several cases where a field of an LIR node is (semantically, not necessarily physically) applied after the main thing that the node does. (E.g., could instead be (Although, I think Edit: And the same with
Sorry, I forgot the index! Edit: This is the same situation as in the "Test an IndexedFilter join." in the slt. |
| Map: null, null | ||
| →Consolidating Union | ||
| →Negate Diffs | ||
| →Differential Join %1 » %0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we are not rendering the equivalences. At the LIR level, they are hidden in ready_equivalences. We could show them inside the stages, and/or also at the top level of the Join node (like in MIR). I think showing them at the top level would be important! (When collecting them to the top level, we'll need to depermute col refs in them, because in ready_equivalences the col refs are in terms of what inputs are already available at the corresponding stage.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding them stage-by-stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is a bigger problem here than I thought: ready_equivalences contains only those equivalences that need to be applied after executing a lookup, but not those that are implicitly applied by the lookup itself. For example, looking at
| WHERE a = c and d = e and b = f |
#3{f} = #2{b}. Contrast this with the MIR EXPLAIN, which showsJoin on=(#0{a} = #2{c} AND #1{b} = #5{f} AND #3{d} = #4{e}) type=delta.The problem is that not seeing the equivalences makes it harder to attribute back the plan to SQL. (Especially if the inputs are not Gets of global things, like
materialize.public.t here, but other plan fragments.)
I'm not sure whether there is a straightforward way to recover this info from LIR. We might have to put this information explicitly there during the lowering from MIR, i.e., just copy over the MIR Join node's equivalences. Edit: Or get it from stream_key and lookup_key.
| Input key: (#0{a}) | ||
| Keys: 2 arrangements available, no raw stream | ||
| Arrangement 0: #0{a} | ||
| Arrangement 1: #1{b} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now super confused: I thought that ArrangeBy's forms are just the new arrangements that are getting created right here. But shouldn't we be creating only one new arrangement here? Is this a lowering bug maybe? Also, does rendering do the correct thing (create just one new arrangement), or it creates two new arrangements? Or what's going on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're just showing every arrangement we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, that's what I originally thought, but then it turned out that this is not the case in other situations, see #32262 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I was so confused about how this works. So, the lowering does seem to include all the output arrangements in an LIR ArrangeBy, even those that already exist.
But wait, then I'm sad, because I thought that the #1 benefit of basing our new EXPLAIN format on LIR would be that ArrangeBys will show the arrangements that are created right there. If this is not true, then figuring out how many arrangements are getting created will still involve arcane knowledge about what operators produce what output arrangements in what cases.
Couldn't we just add an invariant that LIR ArrangeBy lists only not-yet-existing arrangements? This shouldn't be hard, as during the lowering we already know which arrangements exist when going into an ArrangeBy.
88d3537 to
44643cb
Compare
I just don't think it's that important, especially if we have column names.
Hidden.
I'm going to leave it alone for now. We should aim for simpler terminology---the difference between "arranged" and "index" is subtle.
Done! I used |
…noyingly special (less context)
…`EXPLAIN`) rather than lir_id numbering
b2043a0 to
926fc4a
Compare

Introduces new default syntax for
EXPLAIN, such that now (1)EXPLAINby default explains LIR plans, which have unambiguous interpretations (unlike MIR plans), and (2)EXPLAINshows information in a Postgres-like syntax, and significantly less information than it used to forEXPLAIN PHYSICAL PLAN FOR(i.e., for LIR).You can still explain MIR plans with the old syntax using
EXPLAIN OPTIMIZED PLAN FOR.You can still explain LIR plans with the old, very verbose syntax using
EXPLAIN PHYSICAL PLAN AS VERBOSE TEXT FOR.Remaining TODOs:
mz_lir_mappingto describe new operator names.Motivation
[design doc]
EXPLAINin Postgres syntax #31643https://github.com/MaterializeInc/database-issues/issues/8889
Tips for reviewer
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.