Make PushDownFilter benchmark sweeps opt-in to reduce long default runtimes#21029
Draft
kosiew wants to merge 3 commits intoapache:mainfrom
Draft
Make PushDownFilter benchmark sweeps opt-in to reduce long default runtimes#21029kosiew wants to merge 3 commits intoapache:mainfrom
kosiew wants to merge 3 commits intoapache:mainfrom
Conversation
Contributor
Author
|
run benchmark sql_planner_extended |
Contributor
Author
|
show benchmark queue |
|
Hi @kosiew, you asked to view the benchmark queue (#21029 (comment)).
|
|
🤖 Criterion benchmark running (GKE) | trigger |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
The recently added PushDownFilter A/B benchmark suite is working as intended, but the full grouped sweeps are expensive by design and can take a long time to complete. Running the full matrix by default slows down routine validation and makes iterative benchmark work unnecessarily heavy.
This PR improves the benchmark workflow by changing the default execution shape. Instead of always running the full predicate/depth sweep, the benchmark now runs a smaller representative subset by default and requires an explicit opt-in for the full sweep. This preserves the ability to run the full coverage matrix when needed, while making routine local validation much faster and better suited for iterative development.
This change supports the benchmarking workflow for PushDownFilter performance work under Part #20002 by making the benchmark easier to run iteratively during development.
What changes are included in this PR?
This PR updates
datafusion/core/benches/sql_planner_extended.rsto split benchmark sweep coverage into two modes:Adds shared constants for the full sweep dimensions:
FULL_PREDICATE_SWEEP = [10, 20, 30, 40, 60]FULL_DEPTH_SWEEP = [1, 2, 3]Adds a smaller default benchmark matrix:
DEFAULT_SWEEP_POINTS = [(10, 1), (30, 2), (60, 3)]Adds an environment-gated opt-in for the full sweep via:
DATAFUSION_PUSH_DOWN_FILTER_FULL_SWEEP=1Introduces helper functions:
include_full_push_down_filter_sweep()push_down_filter_sweep_points()Updates both benchmark groups to iterate over the selected sweep points instead of always expanding the full 5 × 3 matrix:
push_down_filter_hotspot_case_heavy_left_join_abpush_down_filter_control_non_case_left_join_abNet effect:
Are these changes tested?
Yes.
The change is limited to benchmark configuration and iteration behavior in
sql_planner_extended.rs, and it preserves the existing benchmark construction and execution logic for each selected point.Recommended validation for this PR is:
This verifies both:
Because this PR changes benchmark execution scope rather than optimizer semantics, additional functional tests were not added.
Are there any user-facing changes?
Yes, for developers running benchmarks.
By default, the PushDownFilter A/B benchmark groups now run a reduced representative subset instead of the full sweep. Running the full matrix now requires opting in with:
There are no user-facing API changes.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.