feat: extend interval arithmetic to support scalar functions (UDFs)#22248
Draft
davidlghellin wants to merge 10 commits into
…analysis `check_support` previously returned `false` for any `ScalarFunctionExpr`, preventing `FilterExec::statistics()` from entering the interval-analysis path for predicates containing scalar UDFs (e.g. `ceil(x) > 100`). Extend `check_support` to recurse into `ScalarFunctionExpr` when both the return type and all argument expressions are supported. This enables `evaluate_bounds` and `propagate_constraints` on `ScalarUDFImpl` to participate in selectivity estimation and constraint propagation.
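The recursion described in this commit message can be sketched in isolation. The types below (`DataType`, `Expr`, `is_type_supported`) are hypothetical stand-ins for illustration, not DataFusion's `PhysicalExpr`, Arrow `DataType`, or `ScalarFunctionExpr`. The real `check_support` also consults the schema and operator support, which this toy model leaves out.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
    Utf8, // treated here as a type interval arithmetic cannot handle
}

#[derive(Debug)]
enum Expr {
    Column { data_type: DataType },
    Literal { data_type: DataType },
    Binary { left: Box<Expr>, right: Box<Expr> },
    ScalarFunction { return_type: DataType, args: Vec<Expr> },
}

/// Assumption for this sketch: only numeric types are supported.
fn is_type_supported(data_type: DataType) -> bool {
    matches!(data_type, DataType::Int64 | DataType::Float64)
}

/// Returns true when every node of the expression tree can take part in
/// interval analysis.
fn check_support(expr: &Expr) -> bool {
    match expr {
        Expr::Column { data_type } | Expr::Literal { data_type } => {
            is_type_supported(*data_type)
        }
        Expr::Binary { left, right } => check_support(left) && check_support(right),
        // A scalar function is supported only if its return type is supported
        // and every argument expression is supported as well.
        Expr::ScalarFunction { return_type, args } => {
            is_type_supported(*return_type) && args.iter().all(check_support)
        }
    }
}
```

In this model, a predicate shaped like `ceil(x) > 100` over a `Float64` column passes the gate, while the same function applied to a `Utf8` argument does not.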
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR extends interval-analysis support to include ScalarFunctionExpr (notably ceil) so filter selectivity/statistics can be estimated more accurately, and adds tests covering the new behavior.
Changes:
- Add `ScalarFunctionExpr` handling to `check_support` so interval analysis can traverse scalar functions.
- Implement bounds evaluation and constraint propagation for `CeilFunc`.
- Add regression tests for both `check_support` and filter statistics with `ceil(x) > const`.
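To make "bounds evaluation" and "constraint propagation" for `ceil` concrete, here is a minimal standalone sketch over closed `f64` intervals. The `Interval` struct and both function names are hypothetical and do not reproduce the PR's code or DataFusion's `Interval` type. It relies on two facts: `ceil` is monotonically non-decreasing, and for any real `a` and `b`, `ceil(x) >= a` holds exactly when `x > ceil(a) - 1`, while `ceil(x) <= b` holds exactly when `x <= floor(b)`.

```rust
// Hypothetical closed interval [lower, upper]; not DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lower: f64,
    upper: f64,
}

/// Forward direction: given bounds on x, compute bounds on ceil(x).
/// Because ceil is monotonically non-decreasing, endpoints map to endpoints.
fn ceil_evaluate_bounds(input: Interval) -> Interval {
    Interval {
        lower: input.lower.ceil(),
        upper: input.upper.ceil(),
    }
}

/// Backward direction: given that ceil(x) lies in `output`, narrow the known
/// bounds on x. Returns None when no value of x can satisfy the constraint.
fn ceil_propagate_constraints(output: Interval, input: Interval) -> Option<Interval> {
    // ceil(x) >= a  <=>  x > ceil(a) - 1. A closed interval cannot express the
    // strict bound, so this sketch keeps ceil(a) - 1 as an inclusive lower
    // bound: a sound over-approximation, not the tightest possible result.
    let lower = input.lower.max(output.lower.ceil() - 1.0);
    // ceil(x) <= b  <=>  x <= floor(b).
    let upper = input.upper.min(output.upper.floor());
    if lower > upper {
        None
    } else {
        Some(Interval { lower, upper })
    }
}
```

For example, with `x` known to lie in `[0, 20]` and `ceil(x)` constrained to `[13, +inf)`, propagation narrows `x` to `[12, 20]`. A constraint of `[30, +inf)` on the same input is infeasible and yields `None`.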
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| datafusion/physical-plan/src/filter.rs | Adds a regression test ensuring filter statistics are narrowed when predicates include ceil(x). |
| datafusion/physical-expr/src/intervals/utils.rs | Extends check_support to allow scalar functions and adds unit tests covering support detection. |
| datafusion/functions/src/math/ceil.rs | Implements evaluate_bounds and propagate_constraints for ceil to enable interval narrowing. |
Which issue does this PR close?
Rationale for this change
`FilterExec::statistics()` uses interval arithmetic to narrow row-count estimates. The entry gate for this analysis is `check_support`, which decides whether an expression can participate. Previously, `ScalarFunctionExpr` was not recognized by `check_support`, so any filter containing a scalar UDF call (e.g. `ceil(x) > 12`) would skip interval analysis entirely and fall back to a flat 50% selectivity guess, even if the UDF implemented `evaluate_bounds` and `propagate_constraints`.
What changes are included in this PR?
- `datafusion/physical-expr/src/intervals/utils.rs`: extend `check_support` to recognize `ScalarFunctionExpr`, gating on whether the return type and all argument expressions are supported.
- `datafusion/functions/src/math/ceil.rs`: implement `evaluate_bounds` and `propagate_constraints` for `CeilFunc` as a working example of a UDF that benefits from the fix.
Are these changes tested?
Yes:
- 5 unit tests in `intervals/utils.rs` covering supported/unsupported return types, unsupported child expressions, and `ScalarFunctionExpr` inside a `BinaryExpr`.
- 1 integration test in `filter.rs` (`test_filter_statistics_ceil_scalar_fn`) that verifies `FilterExec::partition_statistics` produces a narrowed row estimate for `ceil(x) > 12.0` over a column with known min/max bounds.
Are there any user-facing changes?
No API changes. The improvement is visible as more accurate row estimates in query plans containing scalar UDF filters.
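As a rough illustration of how a narrowed interval can turn into a row estimate, the sketch below computes a selectivity for `ceil(x) > threshold` from column min/max bounds. Both helper functions are hypothetical, the uniform-distribution assumption is ours, and the column bounds in the example that follows are made up. DataFusion's actual estimation logic may differ.

```rust
/// Hypothetical helper: fraction of a column range [min, max] that satisfies
/// `ceil(x) > threshold`, assuming values are uniformly distributed.
fn estimate_selectivity_ceil_gt(min: f64, max: f64, threshold: f64) -> f64 {
    // ceil(x) is an integer, so ceil(x) > t  <=>  ceil(x) >= floor(t) + 1
    //                                        <=>  x > floor(t).
    let cut = threshold.floor();
    if max <= min {
        // Degenerate (single-value) range: the value either passes or not.
        return if min > cut { 1.0 } else { 0.0 };
    }
    let lower = min.max(cut);
    if lower >= max {
        return 0.0;
    }
    (max - lower) / (max - min)
}

/// Hypothetical helper: scale an input row count by a selectivity.
fn estimate_rows(input_rows: usize, selectivity: f64) -> usize {
    (input_rows as f64 * selectivity).round() as usize
}
```

With an assumed column range of `[0, 20]` and the predicate `ceil(x) > 12.0`, the surviving range is `(12, 20]`, giving a selectivity of about 0.4, so 1000 input rows would be estimated at 400 output rows under these assumptions.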
related lakehq/sail#1917