Implement native support for inner residual join conditions on SMJ/SHJ #2193

@weimingdiit

Is your feature request related to a problem? Please describe.
Auron currently converts sort-merge and shuffled-hash joins to native execution only when the Spark join has no condition beyond its equality keys. Queries such as ... JOIN ... ON a.id = b.id AND a.ts > b.ts fall back to Spark even though they are still equi-joins with an additional residual predicate. This leaves a common class of inner joins outside the native path.

Describe the solution you'd like
Support residual conditions for native SortMergeJoinExec and ShuffledHashJoinExec when the join type is InnerLike.

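As a first step, the native join can still be planned using only the equality join keys, with the residual condition applied as a native filter above the join output. For inner joins this produces the same rows as evaluating the residual inside the join, because an inner join emits a pair only when the full condition holds. This keeps the scope small and avoids changing the native join protocol in the first implementation.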
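To illustrate why the filter-above-join plan is safe for inner joins, here is a minimal Python model of the two evaluation orders. It is not Auron code: the row layout, the helper names (`equi_join`, `join_with_condition`, `join_then_filter`, `ts_greater`) and the sample data are all assumptions made up for this sketch.

```python
# Toy model only: rows are dicts, and every name here is hypothetical.

def ts_greater(l, r):
    """Residual predicate a.ts > b.ts with SQL-style NULL propagation."""
    if l["ts"] is None or r["ts"] is None:
        return None
    return l["ts"] > r["ts"]

def equi_join(left, right, key):
    """Inner join on the equality key only (what the native join would see)."""
    return [(l, r) for l in left for r in right
            if l[key] is not None and l[key] == r[key]]

def join_with_condition(left, right, key, residual):
    """Reference semantics: residual evaluated as part of the join condition."""
    return [(l, r) for l in left for r in right
            if l[key] is not None and l[key] == r[key]
            and residual(l, r) is True]

def join_then_filter(left, right, key, residual):
    """Proposed plan: equi-join first, then a filter above the join output."""
    return [(l, r) for (l, r) in equi_join(left, right, key)
            if residual(l, r) is True]

a = [{"id": 1, "ts": 10}, {"id": 1, "ts": 3}, {"id": 2, "ts": None}]
b = [{"id": 1, "ts": 5}, {"id": 2, "ts": 9}]

reference = join_with_condition(a, b, "id", ts_greater)
proposed = join_then_filter(a, b, "id", ts_greater)
```

In this model both plans keep only the pair where `a.ts = 10` and `b.ts = 5`; the row with a NULL timestamp is dropped either way.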

The scope should be limited to:

  • SortMergeJoinExec
  • ShuffledHashJoinExec
  • InnerLike joins
  • residual predicates that can already be converted as native boolean expressions

Unsupported residual predicates should continue to fall back cleanly.
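The scope rules above could be expressed as a single eligibility check. The following is a rough sketch, not Auron's actual conversion logic: the function name `can_convert_residual_join`, the string-based operator representation and the `CONVERTIBLE_OPS` set are hypothetical stand-ins for whatever the real expression converter supports. Spark's `InnerLike` join types are `Inner` and `Cross`.

```python
# Hypothetical eligibility check for the residual-condition path.
SUPPORTED_JOIN_EXECS = {"SortMergeJoinExec", "ShuffledHashJoinExec"}
INNER_LIKE = {"Inner", "Cross"}  # Spark's InnerLike join types

# Assumed placeholder for "already convertible as a native boolean expression".
CONVERTIBLE_OPS = {"=", "<", "<=", ">", ">=", "AND", "OR", "NOT"}

def can_convert_residual_join(exec_name, join_type, has_equi_keys, residual_ops):
    """Return True if a join with a residual predicate may take the native path.

    residual_ops is the set of operator names used by the residual predicate.
    Returning False means: fall back to Spark, exactly as today.
    """
    if exec_name not in SUPPORTED_JOIN_EXECS:
        return False  # e.g. broadcast joins are out of scope
    if join_type not in INNER_LIKE:
        return False  # outer/semi/anti joins are out of scope
    if not has_equi_keys:
        return False  # pure non-equi joins are out of scope
    # Every operator in the residual must be natively convertible.
    return set(residual_ops) <= CONVERTIBLE_OPS

ok = can_convert_residual_join("SortMergeJoinExec", "Inner", True, {">"})
udf = can_convert_residual_join("ShuffledHashJoinExec", "Inner", True, {"my_udf"})
outer = can_convert_residual_join("SortMergeJoinExec", "LeftOuter", True, {">"})
```

Keeping the check all-or-nothing means an unsupported residual never yields a partially native plan; the whole join simply stays on the existing Spark path.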

Describe alternatives you've considered
One alternative is to wait until a generic join-time residual-condition mechanism is added to the native join protocol. That would be more uniform, but it would also delay support for the simplest and most common case.

Another alternative is to keep falling back to Spark for any non-empty join condition, but that leaves a large amount of inner-join workload outside native execution.

Additional context
This issue is intentionally limited to inner joins only. It should not include:

  • outer joins
  • semi/anti joins
  • broadcast hash joins
  • broadcast nested loop joins
  • pure non-equi joins without equality keys
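The exclusion of outer joins is not just a scoping convenience: applying the residual as a filter above the join changes the result for them. The toy Python model below shows this for a left outer join; the helper names and sample rows are invented for illustration.

```python
# Toy model only: shows why filter-above-join is wrong for a left outer join.

def ts_greater(l, r):
    """Residual a.ts > b.ts; a NULL-padded right side yields NULL (None)."""
    if r is None or l["ts"] is None or r["ts"] is None:
        return None
    return l["ts"] > r["ts"]

def left_outer_join(left, right, key, residual):
    """Reference semantics: residual is part of the join condition."""
    out = []
    for l in left:
        matches = [r for r in right
                   if l[key] is not None and l[key] == r[key]
                   and residual(l, r) is True]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))  # unmatched left row is NULL-padded
    return out

def left_outer_then_filter(left, right, key, residual):
    """Incorrect plan: equi-only outer join, residual applied afterwards."""
    joined = left_outer_join(left, right, key, lambda l, r: True)
    return [(l, r) for (l, r) in joined if residual(l, r) is True]

a = [{"id": 1, "ts": 3}]
b = [{"id": 1, "ts": 5}]

correct = left_outer_join(a, b, "id", ts_greater)
wrong = left_outer_then_filter(a, b, "id", ts_greater)
```

Here the reference result keeps the left row padded with NULL, while the filter-above plan drops it entirely. Supporting outer joins would therefore need the residual to be evaluated inside the join, which is the protocol change this issue defers.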

Suggested validation:

  • inner SMJ with equi keys + residual predicate
  • inner SHJ with equi keys + residual predicate
  • null handling in residual predicates
  • fallback for unsupported residual predicates
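For the null-handling case, the property worth checking is that the native filter follows SQL three-valued logic: a row is kept only when the residual evaluates to true, not when it is false or NULL. A small reference model that tests could compare against might look like this; the function names are hypothetical and `None` stands in for SQL NULL.

```python
# Reference model of SQL three-valued logic; None represents NULL.

def sql_gt(x, y):
    return None if x is None or y is None else x > y

def sql_not(v):
    return None if v is None else (not v)

def sql_and(x, y):
    if x is False or y is False:
        return False  # FALSE dominates, even against NULL
    if x is None or y is None:
        return None
    return True

def keep_row(v):
    """A filter keeps a row only when the predicate is strictly TRUE."""
    return v is True

# NOT (a.ts > b.ts) with a NULL timestamp is still NULL, so the row is dropped.
negated_null = sql_not(sql_gt(None, 5))
```

The negated case is the easiest one to get wrong: treating NULL as false before applying NOT would incorrectly keep the row.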
