fix: Nested self-referential CASE chains should not cause exponential hashing work during physical planning.#22175
Conversation
Nested self-referential CASE chains cause exponential hash work during physical planning because CaseExpr derives Hash over both the original CaseBody and the derived ProjectedCaseBody in EvalMethod. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EvalMethod (including ProjectedCaseBody) is deterministically derived from CaseBody in try_new(), so hashing it is redundant. With the derived Hash, nested CASE chains caused exponential hash work during ProjectionExec::compute_properties because each level doubled the tree traversal. Manual Hash/Eq on body-only fixes this. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@alamb this doesn't look related to our change... do you know who to talk to about CI issues? |
|
|
||
| // eval_method is functionally derived from body, so excluding it from | ||
| // Hash/Eq avoids redundantly hashing the expression tree twice. For | ||
| // nested CASE chains this prevents exponential blowup (see #22173). |
There was a problem hiding this comment.
| // nested CASE chains this prevents exponential blowup (see #22173). | |
| // nested CASE chains this prevents exponential blowup (see https://github.com/apache/datafusion/issues/22173). |
Restarted - seems some flaky thing |
|
FYI @pepijnve I wonder if there is any way to test for this (a benchmark in sql_planner perhaps) so we don't break it in the future |
|
Dang completely missed that. Would it make sense to use something like |
I think in general we are trying to keep our (already quite extensive) dependency chain low |
Makes sense. Yet another supply chain attack vector to worry about otherwise. |
… hashing work during physical planning. (apache#22175) (#421) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes apache#22173. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> Explained in issue ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> 1. Removed derived Hash & Eq 2. implemented manual Hash & Eq 3. manual impl excludes redudant & recursive hash `eval_method` field ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> A unit test was added ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> Pathologically nested CASE/WHEN queries will plan significantly faster. <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
+ compile times |
Which issue does this PR close?
Rationale for this change
Explained in issue
What changes are included in this PR?
eval_methodfieldAre these changes tested?
A unit test was added
Are there any user-facing changes?
Pathologically nested CASE/WHEN queries will plan significantly faster.