@adriangb
Contributor

Which issue does this PR close?

Related to:

Rationale

Currently, the optimizer calls supports_filters_pushdown() to classify filters during logical optimization. This results in a split representation where:

  • Exact/Inexact filters go to TableScan.filters
  • Unsupported/Inexact/Volatile filters stay as Filter nodes above the scan
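As a rough illustration of the split described above, here is a minimal, self-contained sketch. It does not use DataFusion itself: `PushDown` is a simplified stand-in for DataFusion's `TableProviderFilterPushDown`, and `SplitPlan` and `split_at_optimization` are hypothetical names invented for this example. It assumes volatile predicates are never pushed, regardless of what the provider reports.

```rust
/// Simplified stand-in for DataFusion's `TableProviderFilterPushDown`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PushDown {
    Exact,
    Inexact,
    Unsupported,
}

/// Hypothetical model of where predicates end up under the current behavior.
#[derive(Debug, Default, PartialEq)]
struct SplitPlan {
    /// Predicates stored in `TableScan.filters`.
    scan_filters: Vec<String>,
    /// Predicates kept in the `Filter` node above the scan.
    filter_node: Vec<String>,
}

/// Each input is (predicate text, provider answer, is_volatile).
fn split_at_optimization(filters: &[(&str, PushDown, bool)]) -> SplitPlan {
    let mut plan = SplitPlan::default();
    for &(expr, support, volatile) in filters {
        if volatile {
            // Assumption: volatile predicates always stay above the scan.
            plan.filter_node.push(expr.to_string());
            continue;
        }
        match support {
            PushDown::Exact => plan.scan_filters.push(expr.to_string()),
            PushDown::Inexact => {
                // Inexact predicates live in both places.
                plan.scan_filters.push(expr.to_string());
                plan.filter_node.push(expr.to_string());
            }
            PushDown::Unsupported => plan.filter_node.push(expr.to_string()),
        }
    }
    plan
}
```

Note how an Inexact predicate ends up in both vectors; that is the duplication the list below refers to.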

This creates several problems (as described in #19894):

  • Filter duplication risk: The same predicate may exist in both a Filter node and TableScan.filters
  • Semantic confusion: Unclear which filters are "pushed down" vs. "logical"
  • Implementation burden: DML operations must collect filters from multiple locations
  • Multi-table safety hazards: UPDATE...FROM scenarios become fragile

What changes are included in this PR?

This PR moves ALL filter expressions to TableScan.filters during logical optimization, deferring classification (Exact/Inexact/Unsupported) to the physical planner.

Changes to push_down_filter.rs:

  • Simplified TableScan case to push ALL filters (except scalar subqueries) to TableScan.filters
  • Removed filter classification logic (now handled by physical planner)
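The simplified optimizer rule can be pictured as a plain partition of predicates. The sketch below is an illustration only: `Expr` is a toy expression type, not DataFusion's `Expr`, and `contains_scalar_subquery` and `push_to_scan` are hypothetical helpers made up for this example rather than functions from `push_down_filter.rs`.

```rust
/// Toy expression tree, standing in for DataFusion's much richer `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, &'static str, Box<Expr>),
    /// Holds the subquery text purely for display purposes.
    ScalarSubquery(String),
}

/// Returns true if the expression contains a scalar subquery anywhere.
fn contains_scalar_subquery(e: &Expr) -> bool {
    match e {
        Expr::ScalarSubquery(_) => true,
        Expr::Binary(l, _, r) => contains_scalar_subquery(l) || contains_scalar_subquery(r),
        Expr::Column(_) | Expr::Literal(_) => false,
    }
}

/// Splits predicates into (moved to `TableScan.filters`, kept in a `Filter` node).
/// No provider is consulted here: classification is left to the physical planner.
fn push_to_scan(predicates: Vec<Expr>) -> (Vec<Expr>, Vec<Expr>) {
    predicates
        .into_iter()
        .partition(|p| !contains_scalar_subquery(p))
}
```

The point of the sketch is that the only question asked at this stage is "does the predicate contain a scalar subquery?", not "what does the provider support?".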

Changes to physical_planner.rs:

  • Enhanced TableScan handler to:
    • Classify filters using supports_filters_pushdown()
    • Create FilterExec for Unsupported/Inexact/Volatile filters
    • Handle projection expansion when filters need columns not in user's projection
    • Apply limits correctly when post-filtering is needed
  • Added compute_scan_projection_with_filters() helper
  • Added create_filter_exec() helper with async UDF support
  • Updated extract_dml_filters() to also extract from TableScan.filters
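To make the planner-side classification and limit handling more concrete, here is a hedged sketch in plain Rust. All names (`Support`, `ScanFilter`, `PhysicalScanPlan`, `plan_scan`) are hypothetical and do not come from `physical_planner.rs`. It assumes that volatile predicates are never offered to the provider, and that a limit can be handed to the scan only when no post-scan filtering remains; the actual rule in the PR may differ.

```rust
/// Simplified stand-in for the provider's answer from `supports_filters_pushdown()`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Support {
    Exact,
    Inexact,
    Unsupported,
}

/// One entry of `TableScan.filters`, with predicate text instead of a real expression.
struct ScanFilter {
    expr: &'static str,
    support: Support,
    volatile: bool,
}

#[derive(Debug, PartialEq)]
struct PhysicalScanPlan {
    /// Predicates handed to the table provider.
    pushed: Vec<&'static str>,
    /// Predicates re-checked in a `FilterExec` above the scan.
    post_scan: Vec<&'static str>,
    /// Limit given to the scan itself.
    scan_limit: Option<usize>,
    /// Limit applied after the post-scan filter.
    post_filter_limit: Option<usize>,
}

fn plan_scan(filters: &[ScanFilter], fetch: Option<usize>) -> PhysicalScanPlan {
    let mut pushed = Vec::new();
    let mut post_scan = Vec::new();
    for f in filters {
        if f.volatile {
            // Assumption: volatile predicates are evaluated only after the scan.
            post_scan.push(f.expr);
            continue;
        }
        match f.support {
            Support::Exact => pushed.push(f.expr),
            Support::Inexact => {
                pushed.push(f.expr);
                post_scan.push(f.expr);
            }
            Support::Unsupported => post_scan.push(f.expr),
        }
    }
    // Limiting before a remaining filter could drop rows that would have
    // passed it, so the limit moves above the filter in that case.
    let (scan_limit, post_filter_limit) = if post_scan.is_empty() {
        (fetch, None)
    } else {
        (None, fetch)
    };
    PhysicalScanPlan { pushed, post_scan, scan_limit, post_filter_limit }
}
```

With only Exact filters the limit reaches the scan; as soon as one Inexact, Unsupported, or volatile filter is present, the sketch keeps the limit above the post-scan filter.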

Behavior Changes:

  1. Logical Plan: All filters (except scalar subqueries) now appear in TableScan.filters instead of as separate Filter nodes
  2. Physical Plan: The physical planner creates FilterExec nodes for Unsupported/Inexact/Volatile filters
  3. Projection Handling: When post-scan filters need columns not in the user's projection, we expand the scan projection and add a final ProjectionExec to trim extra columns
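The projection handling in item 3 can be sketched with column indices alone. This is an illustration under stated assumptions, not the PR's `compute_scan_projection_with_filters()`: the function name `expand_projection`, its signature, and the choice to append extra columns after the user's columns are all hypothetical.

```rust
/// Given the user's projected column indices and the columns referenced by
/// post-scan filters, returns:
///   - the column indices the scan must produce, and
///   - `Some(indices)` into the scan output for a final trimming projection,
///     or `None` when the scan already matches the user's projection.
fn expand_projection(user: &[usize], filter_cols: &[usize]) -> (Vec<usize>, Option<Vec<usize>>) {
    // Start from the user's columns so their order is preserved.
    let mut scan = user.to_vec();
    for &col in filter_cols {
        if !scan.contains(&col) {
            // Column is needed only by a post-scan filter: append it.
            scan.push(col);
        }
    }
    let trim = if scan.len() > user.len() {
        // Extra columns were appended at the end, so the user's columns are
        // the first `user.len()` positions of the scan output.
        Some((0..user.len()).collect())
    } else {
        None
    };
    (scan, trim)
}
```

For example, with a user projection of columns 0 and 2 and a post-scan filter on columns 2 and 3, the scan reads columns 0, 2, and 3, and the trimming projection keeps the first two output positions.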

Are these changes tested?

Yes - updated existing tests to match new plan representations:

  • Optimizer tests (snapshot updates)
  • Physical planner tests
  • Core integration tests
  • Dataframe and view tests

Are there any user-facing changes?

Plan output changes: In logical plans, filters now appear on the TableScan with partial_filters= or unsupported_filters= annotations instead of as separate Filter: nodes. Physical plans remain functionally equivalent, with FilterExec nodes added where needed.


🤖 Generated with Claude Code

This PR moves ALL filter expressions to `TableScan.filters` during logical
optimization, deferring classification (Exact/Inexact/Unsupported) to the
physical planner.

## Motivation

Currently, the optimizer calls `supports_filters_pushdown()` to classify
filters during logical optimization, resulting in:
- Exact/Inexact filters going to `TableScan.filters`
- Unsupported/Inexact/Volatile filters staying as `Filter` nodes above the scan

This creates a split representation where filters can exist in two places,
leading to complexity in DML operations and potential for filter duplication.

## Changes

**Target behavior:**
- Optimizer moves ALL filters (except scalar subqueries) to `TableScan.filters`
- Physical planner calls `supports_filters_pushdown()` and creates `FilterExec`
  for post-scan filters (Unsupported/Inexact/Volatile)

**Key implementation details:**
- Simplified `push_down_filter.rs` TableScan case to push all filters
- Enhanced `physical_planner.rs` to classify filters and create FilterExec
- Added projection expansion for filter columns not in user's projection
- Updated `extract_dml_filters` to also extract from `TableScan.filters`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate labels Jan 31, 2026
@adriangb adriangb marked this pull request as draft January 31, 2026 14:00
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 1, 2026