refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc #19974

lichuang · 2026-01-24T14:32:15Z

Which issue does this PR close?

This PR unifies SQL planning logic for ORDER BY with HAVING and other clauses by adopting the merged schema approach (#10326).

Problem: DataFusion previously had two different code paths for handling ORDER BY:

add_missing_columns - traverses the plan tree to find Projection nodes and injects missing columns
Merged Schema approach (used by HAVING) - uses a unified schema to resolve expressions directly

This duality caused complex, hard-to-maintain code, non-intuitive handling of queries like SELECT x FROM foo ORDER BY y,and unnecessarily wrapped execution plans.

Solution: Refactor ORDER BY planning to use the merged schema approach, directly adding missing expressions to the SELECT list instead of traversing the plan tree.

Key Changes:
• Simplified order_by_to_sort_expr: Removed the additional_schema parameter and the internal schema merging logic - now input_schema directly instead of building a combined schema
• Added add_missing_order_by_exprs(): A new function that handles missing ORDER BY expressions by adding them directly the SELECT list before planning
• Properly handles aggregate/window functions, column references, and aliases
• Returns error for SELECT DISTINCT with missing ORDER BY expressions
• Supports strict mode (when GROUP BY is present) to preserve proper error messages
• Unified planning flow: ORDER BY expressions are now resolved against the SELECT list first, then missing ones are adde eliminating the need for complex plan tree traversal (add_missing_columns)

This makes the code cleaner, more maintainable, and produces simpler execution plans.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

lichuang · 2026-01-28T07:28:58Z

@alamb @jonahgao

alamb · 2026-01-28T21:06:30Z

This seems to add more code (rather than unify the code paths as the PR comment suggests) 🤔

alamb · 2026-01-28T21:06:56Z

In other words, this PR seems to add more code paths, when the idea was to reduce the code / duplication

lichuang · 2026-01-30T15:21:05Z

@alamb thanks comment. After comment, I Simplified order_by_to_sort_expr by removeing the additional_schema parameter, which is introduced in pr #10234.

github-actions bot added the sql SQL Planner label Jan 24, 2026

lichuang force-pushed the issue-10326 branch 3 times, most recently from 618a289 to 6a0bfa1 Compare January 28, 2026 02:36

github-actions bot added the logical-expr Logical plan and expressions label Jan 28, 2026

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc

2f22ece

lichuang force-pushed the issue-10326 branch from 6a0bfa1 to 2f22ece Compare January 28, 2026 06:45

lichuang marked this pull request as ready for review January 28, 2026 07:05

lichuang added 2 commits January 30, 2026 22:27

Merge branch 'main' into issue-10326

7e2c73c

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc

9aa2f9b

github-actions bot removed the logical-expr Logical plan and expressions label Jan 30, 2026

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc

3056129

github-actions bot added the logical-expr Logical plan and expressions label Jan 30, 2026

lichuang marked this pull request as draft January 30, 2026 18:05

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc

15020d6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc #19974

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc #19974

lichuang commented Jan 24, 2026 •

edited

Loading

Uh oh!

lichuang commented Jan 28, 2026

Uh oh!

alamb commented Jan 28, 2026

Uh oh!

alamb commented Jan 28, 2026

Uh oh!

lichuang commented Jan 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc #19974

Are you sure you want to change the base?

refactor: unify SQL planning for ORDER BY, HAVING, DISTINCT, etc #19974

Conversation

lichuang commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

lichuang commented Jan 28, 2026

Uh oh!

alamb commented Jan 28, 2026

Uh oh!

alamb commented Jan 28, 2026

Uh oh!

lichuang commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lichuang commented Jan 24, 2026 •

edited

Loading

lichuang commented Jan 30, 2026 •

edited

Loading