
Make PushDownFilter benchmark sweeps opt-in to reduce long default runtimes #21029

Draft
kosiew wants to merge 3 commits into apache:main from kosiew:push-down-05-20002



kosiew (Contributor) commented Mar 18, 2026

Which issue does this PR close?

Rationale for this change

The recently added PushDownFilter A/B benchmark suite works as intended, but the full grouped sweeps (5 predicate counts × 3 depths per group) are expensive by design and can take a long time to complete. Running the full matrix by default slows down routine validation and makes iterative benchmark work unnecessarily heavy.

This PR improves the benchmark workflow by changing the default execution shape. Instead of always running the full predicate/depth sweep, the benchmark now runs a smaller representative subset by default and requires an explicit opt-in for the full sweep. This preserves the ability to run the full coverage matrix when needed, while making routine local validation much faster and better suited for iterative development.

This change is part of #20002: it supports the benchmarking workflow for PushDownFilter performance work by making the benchmark easier to run iteratively during development.

What changes are included in this PR?

This PR updates datafusion/core/benches/sql_planner_extended.rs to split benchmark sweep coverage into two modes:

  • Adds shared constants for the full sweep dimensions:

    • FULL_PREDICATE_SWEEP = [10, 20, 30, 40, 60]
    • FULL_DEPTH_SWEEP = [1, 2, 3]
  • Adds a smaller default benchmark matrix:

    • DEFAULT_SWEEP_POINTS = [(10, 1), (30, 2), (60, 3)]
  • Adds an environment-gated opt-in for the full sweep via:

    • DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1
  • Introduces helper functions:

    • include_full_push_down_filter_sweep()
    • push_down_filter_sweep_points()
  • Updates both benchmark groups to iterate over the selected sweep points instead of always expanding the full 5 × 3 matrix:

    • push_down_filter_hotspot_case_heavy_left_join_ab
    • push_down_filter_control_non_case_left_join_ab
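The sweep selection described above can be sketched roughly as follows. This is a minimal, hypothetical sketch rather than the code from the PR: the `usize` element type, the function signatures, treating only the exact value `1` as opt-in, and the `FULL_SWEEP_ENV` and `sweep_points_for` helpers are assumptions. Where the real benchmark would register Criterion A/B cases for each point, this sketch only prints the selected points.

```rust
use std::env;

// Full sweep dimensions, as listed in the PR description (element type assumed).
const FULL_PREDICATE_SWEEP: [usize; 5] = [10, 20, 30, 40, 60];
const FULL_DEPTH_SWEEP: [usize; 3] = [1, 2, 3];

// Reduced default matrix of (predicate_count, depth) points.
const DEFAULT_SWEEP_POINTS: [(usize, usize); 3] = [(10, 1), (30, 2), (60, 3)];

// Hypothetical name for the opt-in environment variable key.
const FULL_SWEEP_ENV: &str = "DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP";

/// Assumption: only the exact value "1" opts in to the full sweep;
/// an unset variable or any other value selects the reduced default.
fn include_full_push_down_filter_sweep() -> bool {
    env::var(FULL_SWEEP_ENV).map(|v| v == "1").unwrap_or(false)
}

/// Hypothetical pure helper: builds the point list without reading the
/// environment, so the selection logic can be tested in isolation.
fn sweep_points_for(full: bool) -> Vec<(usize, usize)> {
    if !full {
        return DEFAULT_SWEEP_POINTS.to_vec();
    }
    // Full 5 x 3 cross product, ordered by predicate count, then depth.
    let mut points = Vec::with_capacity(FULL_PREDICATE_SWEEP.len() * FULL_DEPTH_SWEEP.len());
    for &predicates in FULL_PREDICATE_SWEEP.iter() {
        for &depth in FULL_DEPTH_SWEEP.iter() {
            points.push((predicates, depth));
        }
    }
    points
}

/// Returns the (predicate_count, depth) points both benchmark groups iterate over.
fn push_down_filter_sweep_points() -> Vec<(usize, usize)> {
    sweep_points_for(include_full_push_down_filter_sweep())
}

fn main() {
    let groups = [
        "push_down_filter_hotspot_case_heavy_left_join_ab",
        "push_down_filter_control_non_case_left_join_ab",
    ];
    let points = push_down_filter_sweep_points();
    for group in groups {
        for &(predicates, depth) in &points {
            // The real benchmark would register its A/B cases for this point here.
            println!("{group}: predicates={predicates}, depth={depth}");
        }
    }
}
```

Run without the variable, this sketch lists 3 points per group; with `DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1` it lists all 15. Because every default point is also part of the full matrix, results from a default run remain comparable to the matching points of a full run.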

Net effect:

  • Default benchmark runs cover 3 sweep points per group instead of 15, so they are much smaller and provide faster feedback.
  • Full benchmark coverage is still available explicitly when doing broader performance validation.
  • The A/B benchmark structure and benchmark IDs remain unchanged for the selected sweep points.

Are these changes tested?

Yes.

The change is limited to benchmark configuration and iteration behavior in sql_planner_extended.rs, and it preserves the existing benchmark construction and execution logic for each selected point.

The recommended validation commands for this PR are:

cargo check -p datafusion --bench sql_planner_extended
cargo bench -p datafusion --bench sql_planner_extended -- push_down_filter_hotspot_case_heavy_left_join_ab
DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1 cargo bench -p datafusion --bench sql_planner_extended -- push_down_filter_hotspot_case_heavy_left_join_ab

This verifies both:

  • the default reduced sweep path, and
  • the explicit full-sweep path.

Because this PR changes benchmark execution scope rather than optimizer semantics, additional functional tests were not added.

Are there any user-facing changes?

Yes, for developers running benchmarks.

By default, the PushDownFilter A/B benchmark groups now run a reduced representative subset instead of the full sweep. Running the full matrix now requires opting in with:

DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1

There are no user-facing API changes.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew (Contributor, Author) commented Mar 18, 2026

run benchmark sql_planner_extended

kosiew (Contributor, Author) commented Mar 18, 2026

show benchmark queue

adriangbot

Hi @kosiew, you asked to view the benchmark queue (#21029 (comment)).

| Comment | Repo | PR | User | Benchmarks | Status |
|---|---|---|---|---|---|
| #4083377148 | apache/datafusion | #21029 | kosiew | ["sql_planner_extended"] | running |

adriangbot

🤖 Criterion benchmark running (GKE) | trigger
Linux bench-c4083377148-423-7k56m 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing push-down-05-20002 (6cc60be) to ab28234 (merge-base) diff
BENCH_NAME=sql_planner_extended
BENCH_COMMAND=cargo bench --features=parquet --bench sql_planner_extended
BENCH_FILTER=
Results will be posted here when complete

github-actions bot added the core (Core DataFusion crate) label Mar 18, 2026
