Skip to content

Conversation

@lichuang
Copy link

@lichuang lichuang commented Jan 24, 2026

Which issue does this PR close?

This PR unifies SQL planning logic for ORDER BY with HAVING and other clauses by adopting the merged schema approach (#10326).

Problem: DataFusion previously had two different code paths for handling ORDER BY:

  1. add_missing_columns - traverses the plan tree to find Projection nodes and injects missing columns
  2. Merged Schema approach (used by HAVING) - uses a unified schema to resolve expressions directly

This duality caused complex, hard-to-maintain code, non-intuitive handling of queries like SELECT x FROM foo ORDER BY y,and unnecessarily wrapped execution plans.

Solution: Refactor ORDER BY planning to use the merged schema approach, directly adding missing expressions to the SELECT list instead of traversing the plan tree.

Key Changes:
• Simplified order_by_to_sort_expr: Removed the additional_schema parameter and the internal schema merging logic - now input_schema directly instead of building a combined schema
• Added add_missing_order_by_exprs(): A new function that handles missing ORDER BY expressions by adding them directly the SELECT list before planning
• Properly handles aggregate/window functions, column references, and aliases
• Returns error for SELECT DISTINCT with missing ORDER BY expressions
• Supports strict mode (when GROUP BY is present) to preserve proper error messages
• Unified planning flow: ORDER BY expressions are now resolved against the SELECT list first, then missing ones are adde eliminating the need for complex plan tree traversal (add_missing_columns)

This makes the code cleaner, more maintainable, and produces simpler execution plans.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql SQL Planner label Jan 24, 2026
@lichuang lichuang force-pushed the issue-10326 branch 3 times, most recently from 618a289 to 6a0bfa1 Compare January 28, 2026 02:36
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 28, 2026
@lichuang lichuang marked this pull request as ready for review January 28, 2026 07:05
@lichuang
Copy link
Author

@alamb @jonahgao

@alamb
Copy link
Contributor

alamb commented Jan 28, 2026

This seems to add more code (rather than unify the code paths as the PR comment suggests) 🤔

@alamb
Copy link
Contributor

alamb commented Jan 28, 2026

In other words, this PR seems to add more code paths, when the idea was to reduce the code / duplication

@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Jan 30, 2026
@lichuang
Copy link
Author

lichuang commented Jan 30, 2026

@alamb thanks comment. After comment, I Simplified order_by_to_sort_expr by removeing the additional_schema parameter, which is introduced in pr #10234.

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jan 30, 2026
@lichuang lichuang marked this pull request as draft January 30, 2026 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants