Skip to content

[CALCITE-7409] MERGE JOIN condition cannot contain IS NOT DISTINCT FROM#4785

Open
xiedeyantu wants to merge 1 commit intoapache:mainfrom
xiedeyantu:CALCITE-7409
Open

[CALCITE-7409] MERGE JOIN condition cannot contain IS NOT DISTINCT FROM#4785
xiedeyantu wants to merge 1 commit intoapache:mainfrom
xiedeyantu:CALCITE-7409

Conversation

@xiedeyantu
Copy link
Member


@Override public @Nullable RelNode convert(RelNode rel) {
Join join = (Join) rel;
// MergeJoin cannot handle IS NOT DISTINCT FROM because it stops at NULL values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is another comment about this right below; are both comments necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep these two: one explaining the reason, and the other describing a reasonable approach. However, supporting "IS NOT DISTINCT FROM" would require modifying Linq4j, which doesn't seem straightforward. So, I'll add an extra TODO for now. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check for a JIRA issue about it, and if there is one add the link.
I think the two comments could be combined into one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable to change it to the following format?

    // TODO: support IS NOT DISTINCT FROM condition as join keys of MergeJoin.
    //  MergeJoin cannot handle IS NOT DISTINCT FROM because it stops at NULL values
    //  while IS NOT DISTINCT FROM treats NULL = NULL as true.

I couldn't find anything similar on Jira, or maybe I'm just not very good at using Jira.

@xiedeyantu
Copy link
Member Author

This SQL query actually comes from CALCITE-6452. Jira should currently be working correctly, but due to a cost model issue, it's selecting MergeJoin. From the DAG, it's clear that MergeJoin has about 7 fewer rows than HashJoin, but consumes 6 times more CPU. The current cost model ignores CPU usage, hence selecting MergeJoin. Although the cost model is flawed, this error should still be fixed.

rel#207 (EnumerableHashJoin):
  rows=14.209999999999999
  cost={222.6832026146136 rows, 204.2 cpu, 0.0 io}

rel#213 (EnumerableMergeJoin):
  rows=14.209999999999999
  cost={215.13639999999998 rows, 1272.3104732091813 cpu, 0.0 io}

@sonarqubecloud
Copy link

sonarqubecloud bot commented Feb 5, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants