[CALCITE-7409] MERGE JOIN condition cannot contain IS NOT DISTINCT FROM #4785
xiedeyantu wants to merge 1 commit into apache:main
@Override public @Nullable RelNode convert(RelNode rel) {
  Join join = (Join) rel;
  // MergeJoin cannot handle IS NOT DISTINCT FROM because it stops at NULL values
There is another comment about this right below; are both comments necessary?
I'd like to keep both: one explains the reason, and the other describes a reasonable approach. However, supporting "IS NOT DISTINCT FROM" would require modifying Linq4j, which doesn't seem straightforward, so I'll add a TODO for now. WDYT?
You can check whether there is a JIRA issue about it and, if there is one, add the link.
I think the two comments could be combined into one.
Would it be acceptable to change it to the following?
// TODO: support IS NOT DISTINCT FROM condition as join keys of MergeJoin.
// MergeJoin cannot handle IS NOT DISTINCT FROM because it stops at NULL values
// while IS NOT DISTINCT FROM treats NULL = NULL as true.
I couldn't find anything similar on Jira, or maybe I'm just not very good at using Jira.
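To make the NULL problem discussed above concrete, here is a minimal, self-contained Java sketch of a sort-merge join over nullable integer keys. It is not Linq4j's implementation: the class `MergeJoinNullDemo`, the `nullsEqual` flag, and the NULLS FIRST ordering are hypothetical, and the sketch skips NULL keys rather than modeling Linq4j's actual control flow. With `nullsEqual = false` it follows SQL `=` semantics, so a NULL key never matches anything; with `nullsEqual = true` it treats NULL = NULL as a match, which is what IS NOT DISTINCT FROM requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MergeJoinNullDemo {
  // Hypothetical collation: both inputs are assumed sorted with NULLs first.
  static final Comparator<Integer> KEY_ORDER =
      Comparator.nullsFirst(Comparator.naturalOrder());

  /**
   * Joins two inputs already sorted by KEY_ORDER and returns "left=right" pairs.
   * nullsEqual = false models SQL '='; nullsEqual = true models IS NOT DISTINCT FROM.
   */
  static List<String> mergeJoin(List<Integer> left, List<Integer> right, boolean nullsEqual) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() && j < right.size()) {
      Integer l = left.get(i);
      Integer r = right.get(j);
      // Under SQL '=', a NULL key can never match, so skip it on either side.
      if (!nullsEqual && l == null) {
        i++;
        continue;
      }
      if (!nullsEqual && r == null) {
        j++;
        continue;
      }
      int c = KEY_ORDER.compare(l, r); // nullsFirst treats (null, null) as equal
      if (c < 0) {
        i++;
      } else if (c > 0) {
        j++;
      } else {
        // Find the run of equal keys on each side and emit their cross product.
        int iEnd = i;
        while (iEnd < left.size() && KEY_ORDER.compare(left.get(iEnd), l) == 0) {
          iEnd++;
        }
        int jEnd = j;
        while (jEnd < right.size() && KEY_ORDER.compare(right.get(jEnd), r) == 0) {
          jEnd++;
        }
        for (int a = i; a < iEnd; a++) {
          for (int b = j; b < jEnd; b++) {
            out.add(left.get(a) + "=" + right.get(b));
          }
        }
        i = iEnd;
        j = jEnd;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> left = Arrays.asList(null, 1, 2);
    List<Integer> right = Arrays.asList(null, 2, 3);
    System.out.println(mergeJoin(left, right, false)); // [2=2]
    System.out.println(mergeJoin(left, right, true)); // [null=null, 2=2]
  }
}
```

The `null=null` pair only appears under IS NOT DISTINCT FROM semantics, so a merge join that drops or stops at NULL keys silently loses those rows when the condition uses IS NOT DISTINCT FROM.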
This SQL query actually comes from CALCITE-6452. The query in that Jira issue should currently run correctly, but because of a cost model issue the planner selects MergeJoin. The DAG shows that MergeJoin has about 7 fewer rows than HashJoin but consumes about 6 times more CPU. Because the current cost model ignores CPU usage, it selects MergeJoin. Although the cost model is flawed, this error should still be fixed.
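The cost-model argument above can be illustrated with a toy comparison in Java. This is a hypothetical sketch, not Calcite's `RelOptCost`/`VolcanoCost` API: the `Cost` class, both comparators, the naive `rows + cpu` weighting, and all numbers are illustrative assumptions shaped only to match the comment (about 7 fewer rows, about 6 times the CPU). It shows why a comparison on row count alone picks MergeJoin while one that also weighs CPU picks HashJoin.

```java
import java.util.Comparator;

class CostModelDemo {
  /** Hypothetical plan cost; numbers used below are illustrative, not measured. */
  static final class Cost {
    final String plan;
    final double rows;
    final double cpu;

    Cost(String plan, double rows, double cpu) {
      this.plan = plan;
      this.rows = rows;
      this.cpu = cpu;
    }
  }

  // Compares row count only, mirroring the behavior described in the comment.
  static final Comparator<Cost> ROWS_ONLY = Comparator.comparingDouble(c -> c.rows);

  // Hypothetical alternative that also weighs CPU with a naive sum.
  static final Comparator<Cost> ROWS_AND_CPU = Comparator.comparingDouble(c -> c.rows + c.cpu);

  /** Returns the name of the cheaper plan under the given comparator. */
  static String cheapest(Comparator<Cost> cmp, Cost a, Cost b) {
    return cmp.compare(a, b) <= 0 ? a.plan : b.plan;
  }

  public static void main(String[] args) {
    Cost hash = new Cost("HashJoin", 107, 100);
    Cost merge = new Cost("MergeJoin", 100, 600);
    System.out.println(cheapest(ROWS_ONLY, hash, merge)); // MergeJoin
    System.out.println(cheapest(ROWS_AND_CPU, hash, merge)); // HashJoin
  }
}
```

Whatever weighting a real fix would use, the point is the same: once CPU is ignored, a plan that touches slightly fewer rows wins even if it is far more expensive to execute.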
5e0693e to 8d79878
8d79878 to d4bc742

See CALCITE-7409