Make PushDownFilter benchmark sweeps opt-in to reduce long default runtimes by kosiew · Pull Request #21029 · apache/datafusion

kosiew · 2026-03-18T15:17:56Z

Which issue does this PR close?

Part of perf: push_down_filter is pathologically slow for some plans #20002

Rationale for this change

The recently added PushDownFilter A/B benchmark suite is working as intended, but the full grouped sweeps are expensive by design and can take a long time to complete. Running the full matrix by default slows down routine validation and makes iterative benchmark work unnecessarily heavy.

This PR improves the benchmark workflow by changing the default execution shape. Instead of always running the full predicate/depth sweep, the benchmark now runs a smaller representative subset by default and requires an explicit opt-in for the full sweep. This preserves the ability to run the full coverage matrix when needed, while making routine local validation much faster and better suited for iterative development.

This change supports the PushDownFilter performance work tracked in #20002 by making the benchmark easier to run repeatedly during development.

What changes are included in this PR?

This PR updates datafusion/core/benches/sql_planner_extended.rs to split benchmark sweep coverage into two modes, a small default sweep and an opt-in full sweep. Specifically, it:

- Adds shared constants for the full sweep dimensions:
  - FULL_PREDICATE_SWEEP = [10, 20, 30, 40, 60]
  - FULL_DEPTH_SWEEP = [1, 2, 3]
- Adds a smaller default benchmark matrix:
  - DEFAULT_SWEEP_POINTS = [(10, 1), (30, 2), (60, 3)]
- Adds an environment-gated opt-in for the full sweep via:
  - DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1
- Introduces helper functions:
  - include_full_push_down_filter_sweep()
  - push_down_filter_sweep_points()
- Updates both benchmark groups to iterate over the selected sweep points instead of always expanding the full 5 × 3 matrix:
  - push_down_filter_hotspot_case_heavy_left_join_ab
  - push_down_filter_control_non_case_left_join_ab
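The sweep selection described above could look roughly like the minimal sketch below. The constant values, the environment variable, and the two helper names come from this PR description; their bodies, the hypothetical full_sweep_requested and sweep_points helpers, the rule that only the literal value 1 enables the full sweep, and the predicate-major ordering of the full matrix are assumptions, not the actual code in sql_planner_extended.rs. The Criterion group wiring is omitted.

```rust
use std::env;

/// Predicate counts covered by the full sweep (values from the PR description).
const FULL_PREDICATE_SWEEP: [usize; 5] = [10, 20, 30, 40, 60];
/// Join depths covered by the full sweep (values from the PR description).
const FULL_DEPTH_SWEEP: [usize; 3] = [1, 2, 3];
/// Representative (predicate_count, depth) points run by default.
const DEFAULT_SWEEP_POINTS: [(usize, usize); 3] = [(10, 1), (30, 2), (60, 3)];

/// Environment variable that opts in to the full sweep.
const FULL_SWEEP_ENV: &str = "DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP";

/// Hypothetical pure helper so the gating rule is testable without touching
/// the process environment. Assumption: only the exact value "1" opts in.
fn full_sweep_requested(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the opt-in flag from the environment (unset or non-UTF-8 means "no").
fn include_full_push_down_filter_sweep() -> bool {
    full_sweep_requested(env::var(FULL_SWEEP_ENV).ok().as_deref())
}

/// Hypothetical helper: expands the full 5 x 3 matrix (predicate-major order,
/// an assumption) or returns the small default subset.
fn sweep_points(full: bool) -> Vec<(usize, usize)> {
    if !full {
        return DEFAULT_SWEEP_POINTS.to_vec();
    }
    let mut points = Vec::with_capacity(FULL_PREDICATE_SWEEP.len() * FULL_DEPTH_SWEEP.len());
    for &predicates in FULL_PREDICATE_SWEEP.iter() {
        for &depth in FULL_DEPTH_SWEEP.iter() {
            points.push((predicates, depth));
        }
    }
    points
}

/// Points that each benchmark group would iterate over.
fn push_down_filter_sweep_points() -> Vec<(usize, usize)> {
    sweep_points(include_full_push_down_filter_sweep())
}

fn main() {
    // In the real benchmark each point would register a Criterion benchmark;
    // here we only print the selected points.
    for (predicates, depth) in push_down_filter_sweep_points() {
        println!("predicates={predicates} depth={depth}");
    }
}
```

Splitting the environment read from the decision logic keeps the opt-in rule in one place, so both benchmark groups select exactly the same points.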

Net effect:

- Default benchmark runs cover 3 sweep points per group instead of the full 15, giving faster feedback.
- Full benchmark coverage is still available explicitly for broader performance validation.
- The A/B benchmark structure and benchmark IDs remain unchanged for the selected sweep points.
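Benchmark IDs can only stay comparable between the two modes if every default point is also a point of the full matrix. The sketch below is a hypothetical sanity check for that property, using the constant values listed in this PR; the function name default_points_within_full_matrix is illustrative and not part of the PR.

```rust
/// Values as listed in the PR description.
const FULL_PREDICATE_SWEEP: [usize; 5] = [10, 20, 30, 40, 60];
const FULL_DEPTH_SWEEP: [usize; 3] = [1, 2, 3];
const DEFAULT_SWEEP_POINTS: [(usize, usize); 3] = [(10, 1), (30, 2), (60, 3)];

/// Hypothetical check: true if every default (predicates, depth) point lies in
/// the full predicate x depth matrix, so default-mode IDs are a subset of
/// full-mode IDs.
fn default_points_within_full_matrix() -> bool {
    DEFAULT_SWEEP_POINTS.iter().all(|(predicates, depth)| {
        FULL_PREDICATE_SWEEP.contains(predicates) && FULL_DEPTH_SWEEP.contains(depth)
    })
}

fn main() {
    assert!(default_points_within_full_matrix());
    let full = FULL_PREDICATE_SWEEP.len() * FULL_DEPTH_SWEEP.len();
    // Prints "default runs 3 of 15 sweep points per group".
    println!(
        "default runs {} of {} sweep points per group",
        DEFAULT_SWEEP_POINTS.len(),
        full
    );
}
```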

Are these changes tested?

Yes.

The change is limited to benchmark configuration and iteration behavior in sql_planner_extended.rs, and it preserves the existing benchmark construction and execution logic for each selected point.

Recommended validation for this PR is:

cargo check -p datafusion --bench sql_planner_extended
cargo bench -p datafusion --bench sql_planner_extended -- push_down_filter_hotspot_case_heavy_left_join_ab
DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1 cargo bench -p datafusion --bench sql_planner_extended -- push_down_filter_hotspot_case_heavy_left_join_ab

The two cargo bench commands exercise both the default reduced sweep path and the explicit full-sweep path.

Because this PR changes benchmark execution scope rather than optimizer semantics, additional functional tests were not added.

Are there any user-facing changes?

Yes, for developers running benchmarks.

By default, the PushDownFilter A/B benchmark groups now run a reduced representative subset instead of the full sweep. Running the full matrix now requires opting in with:

DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1

There are no user-facing API changes.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.


kosiew · 2026-03-18T15:18:45Z

run benchmark sql_planner_extended

kosiew · 2026-03-18T15:19:05Z

show benchmark queue

adriangbot · 2026-03-18T15:19:07Z

Hi @kosiew, you asked to view the benchmark queue (#21029 (comment)).

Comment	Repo	PR	User	Benchmarks	Status
#4083377148	apache/datafusion	#21029	kosiew	["sql_planner_extended"]	running

adriangbot · 2026-03-18T15:21:33Z

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4083377148-423-7k56m 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-05-20002 (6cc60be) to ab28234 (merge-base) diff
BENCH_NAME=sql_planner_extended
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner_extended
BENCH_FILTER=
Results will be posted here when complete

kosiew added 3 commits March 18, 2026 22:54

Refactor push down filter benchmarks to use dynamic sweep points

ed27a73

Add dynamic sample size configuration for push down filter benchmarks

5f10213

Remove unused sample size function and constant from push down filter benchmarks

6cc60be

kosiew mentioned this pull request Mar 18, 2026

Avoid null-restrict evaluation for predicates that reference non-join columns in PushDownFilter #20961

Draft

github-actions bot added the core Core DataFusion crate label Mar 18, 2026

kosiew wants to merge 3 commits into apache:main from kosiew:push-down-05-20002