Is your feature request related to a problem? Please describe.
Auron currently converts sort-merge and shuffled-hash joins to native execution only when the Spark join condition contains nothing beyond the equality keys. Queries such as ... JOIN ... ON a.id = b.id AND a.ts > b.ts fall back to Spark even though they are still equi-joins with an additional residual predicate. This leaves a common class of inner joins outside the native path.
Describe the solution you'd like
Support residual conditions for native SortMergeJoinExec and ShuffledHashJoinExec when the join type is InnerLike.
For the first step, the native join can still be planned using only the equality join keys, and the residual condition can be applied as a native filter above the join output. This keeps the scope small and avoids changing the native join protocol in the first implementation.
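To illustrate why this is safe for inner joins, here is a minimal sketch in plain Python (not Auron or Spark code). It models rows as dicts and SQL NULL as `None`, and compares "equi-join, then filter" against evaluating the full ON condition inside the join. All function names and the sample data are hypothetical.

```python
from typing import Optional

def equi_join(left, right, key):
    """Inner join on equality of `key`; NULL keys never match, as in SQL."""
    return [(l, r) for l in left for r in right
            if l[key] is not None and l[key] == r[key]]

def residual_ts_gt(l, r) -> Optional[bool]:
    """a.ts > b.ts with three-valued logic: a NULL input yields NULL (None)."""
    if l["ts"] is None or r["ts"] is None:
        return None
    return l["ts"] > r["ts"]

def join_then_filter(left, right, key, residual):
    """Proposed first step: join on keys only, then filter the join output."""
    return [(l, r) for (l, r) in equi_join(left, right, key)
            if residual(l, r) is True]

def join_with_condition(left, right, key, residual):
    """Reference semantics: evaluate the whole ON condition inside the join."""
    return [(l, r) for l in left for r in right
            if l[key] is not None and l[key] == r[key]
            and residual(l, r) is True]

# Hypothetical sample data: one TRUE, one NULL, and one FALSE residual.
a = [{"id": 1, "ts": 10}, {"id": 1, "ts": None}, {"id": 2, "ts": 5}]
b = [{"id": 1, "ts": 7}, {"id": 2, "ts": 9}, {"id": None, "ts": 0}]

filtered = join_then_filter(a, b, "id", residual_ts_gt)
reference = join_with_condition(a, b, "id", residual_ts_gt)
```

In this model both plans keep only the pair whose residual is strictly TRUE. Pairs where the residual is FALSE or NULL are dropped either way, which is the behavior the null-handling validation should confirm.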
The scope should be limited to:
- SortMergeJoinExec
- ShuffledHashJoinExec
- InnerLike joins
- residual predicates that can already be converted as native boolean expressions
Unsupported residual predicates should continue to fall back cleanly.
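The scope rules above can be sketched as a single eligibility check. This is only an illustration in Python: `JoinNode`, `can_convert_with_residual`, and the `residual_convertible` flag are hypothetical stand-ins, not Auron's actual classes or conversion API. The only Spark fact assumed is that `InnerLike` covers the `Inner` and `Cross` join types.

```python
from dataclasses import dataclass

# Hypothetical names for illustration only; not Auron's real API.
SUPPORTED_OPERATORS = {"SortMergeJoinExec", "ShuffledHashJoinExec"}
INNER_LIKE = {"Inner", "Cross"}  # Spark's InnerLike join types

@dataclass
class JoinNode:
    operator: str               # physical operator name
    join_type: str              # Spark join type name
    residual: str               # non-empty residual predicate, as text
    residual_convertible: bool  # stands in for "converts to a native boolean expression"

def can_convert_with_residual(join: JoinNode) -> bool:
    """True if a join with a residual predicate is in scope for this issue.

    Anything that returns False keeps today's behavior and falls back.
    """
    return (
        join.operator in SUPPORTED_OPERATORS
        and join.join_type in INNER_LIKE
        and join.residual_convertible
    )
```

Keeping the check to three independent conditions means every out-of-scope case takes the same fallback path that exists today.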
Describe alternatives you've considered
One alternative is to wait until a generic join-time residual-condition mechanism is added to the native join protocol. That would be more uniform, but it would also delay support for the simplest and most common case.
Another alternative is to keep falling back to Spark for any non-empty join condition, but that leaves a large amount of inner-join workload outside native execution.
Additional context
This issue is intentionally limited to inner joins only. It should not include:
- outer joins
- semi/anti joins
- broadcast hash joins
- broadcast nested loop joins
- pure non-equi joins without equality keys
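One likely reason to keep outer joins out of this first step is that a filter above the join is not equivalent to a residual evaluated inside the join once unmatched rows must be preserved. The following plain-Python sketch (hypothetical helper and data, not Auron code) shows the difference for a left outer join.

```python
def left_outer_join(left, right, cond):
    """Left outer join: every left row appears; unmatched rows pair with None."""
    out = []
    for l in left:
        matches = [(l, r) for r in right if cond(l, r)]
        out.extend(matches if matches else [(l, None)])
    return out

a = [{"id": 1, "ts": 5}]
b = [{"id": 1, "ts": 9}]

def full_cond(l, r):
    # a.id = b.id AND a.ts > b.ts, evaluated inside the join
    return l["id"] == r["id"] and l["ts"] > r["ts"]

def keys_only(l, r):
    # a.id = b.id only
    return l["id"] == r["id"]

# Residual inside the join: the left row survives with a NULL right side.
in_join = left_outer_join(a, b, full_cond)

# Residual as a filter above the join: the row is dropped entirely.
post_filter = [(l, r) for (l, r) in left_outer_join(a, b, keys_only)
               if r is not None and l["ts"] > r["ts"]]
```

Here `in_join` keeps the left row paired with `None`, while `post_filter` is empty, so the filter-above-join plan would change results for outer joins.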
Suggested validation:
- inner SMJ with equi keys + residual predicate
- inner SHJ with equi keys + residual predicate
- null handling in residual predicates
- fallback for unsupported residual predicates