feat: extend interval arithmetic to support scalar functions (UDFs) #22248

Draft
davidlghellin wants to merge 10 commits into apache:main from davidlghellin:feat/extend_evaluate_propagate

@davidlghellin commented May 16, 2026

Which issue does this PR close?

  • N/A

Rationale for this change

FilterExec::statistics() uses interval arithmetic to narrow row-count estimates. The entry gate for this analysis is check_support, which decides whether an expression can participate. Previously, ScalarFunctionExpr was not recognized by check_support, so any filter containing a scalar UDF call (e.g. ceil(x) > 12) would skip interval analysis entirely and fall back to a flat 50% selectivity guess, even if the UDF implemented evaluate_bounds and propagate_constraints.
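To make the effect of the gate concrete, here is a minimal, self-contained sketch. It does not use DataFusion's types: `Bounds`, `estimate_selectivity`, the uniform-distribution assumption, and the 1000-row input are all hypothetical, and the 0.5 fallback simply mirrors the figure quoted above. Since `ceil(x) > 12` holds exactly when `x > 12`, the sketch treats the predicate as a lower bound on `x`.

```rust
/// Closed range of a column's values; a toy stand-in, not DataFusion's `Interval`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    lo: f64,
    hi: f64,
}

/// Fallback selectivity when the predicate cannot be analyzed
/// (hard-coded here to match the 50% figure in the description).
const FALLBACK_SELECTIVITY: f64 = 0.5;

/// Estimate the fraction of rows passing `column > threshold`.
/// `supported` plays the role of the `check_support` result.
/// Assumes values are uniformly distributed over `column` (an assumption of this sketch).
fn estimate_selectivity(supported: bool, column: Bounds, threshold: f64) -> f64 {
    if !supported {
        // Gate closed: no interval analysis, use the flat guess.
        return FALLBACK_SELECTIVITY;
    }
    let width = column.hi - column.lo;
    if width <= 0.0 {
        // Degenerate column holding a single value.
        return if column.lo > threshold { 1.0 } else { 0.0 };
    }
    // Gate open: keep only the part of the range above the threshold.
    let surviving_lo = column.lo.max(threshold);
    ((column.hi - surviving_lo) / width).clamp(0.0, 1.0)
}

fn main() {
    let x = Bounds { lo: 0.0, hi: 100.0 };
    let input_rows = 1000.0;
    let before = estimate_selectivity(false, x, 12.0) * input_rows;
    let after = estimate_selectivity(true, x, 12.0) * input_rows;
    println!("gate closed: ~{} rows, gate open: ~{} rows", before, after);
}
```

The only point of the sketch is the branch on `supported`: once the predicate passes the gate, the column's known bounds drive the estimate instead of a constant.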

What changes are included in this PR?

  • datafusion/physical-expr/src/intervals/utils.rs: extend check_support to recognize ScalarFunctionExpr, gating on whether the return type and all argument expressions are supported.

  • datafusion/functions/src/math/ceil.rs: implement evaluate_bounds and propagate_constraints for CeilFunc as a working example of a UDF that benefits from the fix.
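The recursion described in the first bullet can be sketched on a toy expression tree. This is a simplified model under stated assumptions, not the code in `utils.rs`: the `Expr` and `DataType` enums below are hypothetical stand-ins for DataFusion's physical expressions and Arrow data types, and operator-level checks are left out.

```rust
/// Toy data types; only the numeric ones count as supported in this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Float64,
    Int64,
    Utf8,
}

/// Toy expression tree (hypothetical, much smaller than the real one).
enum Expr {
    Column { data_type: DataType },
    Literal { data_type: DataType },
    Binary { left: Box<Expr>, right: Box<Expr> },
    ScalarFunction { return_type: DataType, args: Vec<Expr> },
    /// Any expression kind the analysis does not understand.
    Other,
}

fn is_datatype_supported(data_type: DataType) -> bool {
    matches!(data_type, DataType::Float64 | DataType::Int64)
}

/// Returns true when every node in the tree can take part in interval analysis.
fn check_support(expr: &Expr) -> bool {
    match expr {
        Expr::Column { data_type } | Expr::Literal { data_type } => {
            is_datatype_supported(*data_type)
        }
        Expr::Binary { left, right } => check_support(left) && check_support(right),
        // The new arm: a scalar function is supported when its return type
        // and all of its argument expressions are supported.
        Expr::ScalarFunction { return_type, args } => {
            is_datatype_supported(*return_type) && args.iter().all(|arg| check_support(arg))
        }
        Expr::Other => false,
    }
}

/// Builds `f(arg) > literal` with the given return type for `f`.
fn scalar_fn_gt_literal(return_type: DataType, arg: Expr) -> Expr {
    Expr::Binary {
        left: Box::new(Expr::ScalarFunction { return_type, args: vec![arg] }),
        right: Box::new(Expr::Literal { data_type: DataType::Float64 }),
    }
}

fn main() {
    let ok = scalar_fn_gt_literal(DataType::Float64, Expr::Column { data_type: DataType::Float64 });
    let bad_return = scalar_fn_gt_literal(DataType::Utf8, Expr::Column { data_type: DataType::Int64 });
    let bad_child = scalar_fn_gt_literal(DataType::Float64, Expr::Other);
    println!("{} {} {}", check_support(&ok), check_support(&bad_return), check_support(&bad_child));
}
```

The three expressions in `main` correspond to the cases the unit tests are described as covering: a supported scalar function inside a binary expression, an unsupported return type, and an unsupported child expression.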

Are these changes tested?

Yes:

  • 5 unit tests in intervals/utils.rs covering supported/unsupported return types, unsupported child expressions, and ScalarFunctionExpr inside a BinaryExpr.

  • 1 integration test in filter.rs (test_filter_statistics_ceil_scalar_fn) that verifies FilterExec::partition_statistics produces a narrowed row estimate for ceil(x) > 12.0 over a column with known min/max bounds.

Are there any user-facing changes?

No API changes. The improvement is visible as more accurate row estimates in query plans containing scalar UDF filters.

Related: lakehq/sail#1917

…analysis

`check_support` previously returned `false` for any `ScalarFunctionExpr`,
preventing `FilterExec::statistics()` from entering the interval-analysis
path for predicates containing scalar UDFs (e.g. `ceil(x) > 100`).

Extend `check_support` to recurse into `ScalarFunctionExpr` when both the
return type and all argument expressions are supported. This enables
`evaluate_bounds` and `propagate_constraints` on `ScalarUDFImpl` to
participate in selectivity estimation and constraint propagation.
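As an illustration of what bounds evaluation and constraint propagation mean for a function like `ceil`, here is a sketch on a toy closed interval over `f64`. It is not the implementation in `ceil.rs`: the `Interval` struct and the free functions `ceil_evaluate_bounds` and `ceil_propagate_constraints` are hypothetical, and the open lower bound is approximated by a closed one, which is conservative (never excludes a valid `x`).

```rust
/// Toy closed interval [lo, hi]; not DataFusion's `Interval` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lo: f64,
    hi: f64,
}

impl Interval {
    /// Intersection of two closed intervals, or `None` when they do not overlap.
    fn intersect(self, other: Interval) -> Option<Interval> {
        let lo = self.lo.max(other.lo);
        let hi = self.hi.min(other.hi);
        if lo <= hi {
            Some(Interval { lo, hi })
        } else {
            None
        }
    }
}

/// Forward direction: `ceil` is monotonically non-decreasing, so the output
/// bounds are the ceilings of the input bounds.
fn ceil_evaluate_bounds(input: Interval) -> Interval {
    Interval { lo: input.lo.ceil(), hi: input.hi.ceil() }
}

/// Backward direction: given that ceil(x) lies in `output`, narrow `input`.
/// ceil(x) >= a  holds iff  x > ceil(a) - 1, and ceil(x) <= b  holds iff  x <= floor(b).
/// The strict lower bound is relaxed to a closed one in this sketch.
fn ceil_propagate_constraints(output: Interval, input: Interval) -> Option<Interval> {
    let implied = Interval {
        lo: output.lo.ceil() - 1.0,
        hi: output.hi.floor(),
    };
    input.intersect(implied)
}

fn main() {
    let x = Interval { lo: 0.0, hi: 100.0 };
    println!("{:?}", ceil_evaluate_bounds(Interval { lo: 0.5, hi: 99.2 }));
    // ceil(x) > 12 means ceil(x) >= 13, because ceil only produces integers.
    let required = Interval { lo: 13.0, hi: 100.0 };
    println!("{:?}", ceil_propagate_constraints(required, x));
}
```

Returning `None` models an infeasible predicate: if the required output range cannot be reached from the known input range, no rows can pass the filter.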
Copilot AI review requested due to automatic review settings May 16, 2026 13:44
@github-actions bot added the physical-expr, functions, and physical-plan labels May 16, 2026
Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR extends interval-analysis support to include ScalarFunctionExpr (notably ceil) so filter selectivity/statistics can be estimated more accurately, and adds tests covering the new behavior.

Changes:

  • Add ScalarFunctionExpr handling to check_support so interval analysis can traverse scalar functions.
  • Implement bounds evaluation and constraint propagation for CeilFunc.
  • Add regression tests for both check_support and filter statistics with ceil(x) > const.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| datafusion/physical-plan/src/filter.rs | Adds a regression test ensuring filter statistics are narrowed when predicates include ceil(x). |
| datafusion/physical-expr/src/intervals/utils.rs | Extends check_support to allow scalar functions and adds unit tests covering support detection. |
| datafusion/functions/src/math/ceil.rs | Implements evaluate_bounds and propagate_constraints for ceil to enable interval narrowing. |

Comment thread datafusion/functions/src/math/ceil.rs
Comment thread datafusion/functions/src/math/ceil.rs Outdated
Comment thread datafusion/physical-expr/src/intervals/utils.rs
@davidlghellin requested a review from Copilot May 16, 2026 13:51
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread datafusion/physical-expr/src/intervals/utils.rs
Comment thread datafusion/functions/src/math/ceil.rs Outdated
Comment thread datafusion/functions/src/math/ceil.rs
Comment thread datafusion/functions/src/math/ceil.rs Outdated
@davidlghellin marked this pull request as draft May 16, 2026 14:21
@github-actions bot added the logical-expr label May 16, 2026
@davidlghellin changed the title from "feat: extend check_support to include ScalarFunctionExpr in interval analysis" to "feat: extend interval arithmetic to support scalar functions (UDFs)" May 16, 2026
Labels

  • functions: Changes to functions implementation
  • logical-expr: Logical plan and expressions
  • physical-expr: Changes to the physical-expr crates
  • physical-plan: Changes to the physical-plan crate

2 participants