forked from apache/datafusion
Timestamp 17998 proposal #31
Open: Omega359 wants to merge 480 commits into kosiew:timestamp-17998 from Omega359:timestamp-17998
Conversation
## Which issue does this PR close?

It's very simple; do I need to file an issue? - Closes #.

## Rationale for this change

We don't need to clone `ExecutionPlan` when `repartitioned()` didn't happen.

## What changes are included in this PR?

Very simple: just remove an extra `clone()`.

## Are these changes tested?

Compilation passes.

## Are there any user-facing changes?

No

Signed-off-by: Smith Cruise <chendingchao1@126.com>
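The pattern behind this change can be sketched with a hedged, self-contained example; the `Node` type and `repartitioned` signature below are illustrative stand-ins, not DataFusion's actual API. Returning `Option` lets the "no change" case skip the clone entirely:

```rust
use std::sync::Arc;

// Illustrative stand-in for an execution-plan node.
struct Node(&'static str);

// Returning Option lets the "no change" case avoid cloning entirely:
// the caller keeps its existing Arc instead of receiving a fresh copy.
fn repartitioned(plan: &Arc<Node>, target_partitions: usize) -> Option<Arc<Node>> {
    if target_partitions <= 1 {
        None // repartitioning did not happen; no clone needed
    } else {
        Some(Arc::new(Node(plan.0)))
    }
}

fn main() {
    let plan = Arc::new(Node("DataSourceExec"));
    // When None is returned, reuse the original Arc (a cheap pointer copy).
    let new_plan = repartitioned(&plan, 1).unwrap_or_else(|| Arc::clone(&plan));
    assert!(Arc::ptr_eq(&plan, &new_plan));
}
```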
## Which issue does this PR close?

-

## Rationale for this change

No need to compile this for production, especially for downstream users.

## What changes are included in this PR?

Fix `Cargo.toml`.

## Are these changes tested?

It still compiles and the tests pass.

## Are there any user-facing changes?

Less code to compile.
## Which issue does this PR close?

- Related to apache/arrow-rs#8464

## Rationale for this change

Get the latest and greatest code from arrow.

## What changes are included in this PR?

1. Update to Arrow 57.1.0
2. Update for API changes (comments inline)

## Are these changes tested?

Yes, by CI.

## Are there any user-facing changes?

No

Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
…apache#18923) - Closes apache#18922 --------- Signed-off-by: Nimalan <nimalan.m@protonmail.com>
…18869)

## Which issue does this PR close?

- Closes apache#18844

## What changes are included in this PR?

Instead of using `data_type_and_nullable`, this patch removes the indirection and calls `to_field` directly. The deprecated helper added no additional logic, so the PR encourages callers to use the field returned by `to_field` to access both the data type and the nullability.

## Are these changes tested?

Yes
Add a section explaining that Profile Guided Optimization (PGO) can provide up to 25% performance improvements. Includes three-stage build process instructions and tips for effective PGO usage. References issue apache#9507.

## Which issue does this PR close?

Closes apache#9561

## Rationale for this change

Adds documentation for Profile Guided Optimization (PGO) as requested. PGO can provide up to 25% performance improvements for DataFusion workloads, and users need clear guidance on how to use it.

## What changes are included in this PR?

- Added a "Profile Guided Optimization (PGO)" section to `docs/source/user-guide/crate-configuration.md`
- Three-stage build process instructions (instrumentation, profiling, recompilation)
- Tips for effective PGO usage (representative workloads, multiple iterations, combining with other optimizations)
- Links to the Rust compiler guide and issue apache#9507

## Are these changes tested?

Yes. Documentation changes are validated by the CI workflow, which builds the docs and checks for errors. The markdown syntax is valid and follows existing patterns.

## Are there any user-facing changes?

Yes. This adds documentation that will be published on the DataFusion website under "Crate Configuration" > "Optimizing Builds". Users will find guidance on using PGO to improve performance.
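The three-stage process mentioned above can be sketched roughly as follows. This is a hedged recipe based on the standard rustc PGO flags, not a copy of the documentation being added; the binary name `my-datafusion-app`, its subcommand, and the `/tmp/pgo-data` path are illustrative assumptions, and `llvm-profdata` must be available (e.g. via the `llvm-tools` rustup component):

```bash
# Stage 1: build with instrumentation so the binary records execution profiles.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
    cargo build --release

# Stage 2: run a representative workload (multiple iterations help) to collect
# .profraw files, then merge them with llvm-profdata.
./target/release/my-datafusion-app run-representative-queries
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# Stage 3: recompile using the merged profile to guide optimization.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \
    cargo build --release
```

The workload in stage 2 should resemble production traffic; a profile collected on unrepresentative queries can make the optimized build slower on the real ones.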
## Which issue does this PR close?

- Part of apache#18881.

## What changes are included in this PR?

Adds a deny attribute for the `allow_attributes` lint on physical-plan, which keeps it from being applied to imported crates. Also, a few of the functions no longer required their respective lints, such as `too_many_arguments`, so I removed them.

## Are these changes tested?

I ran the entire clippy suite before creating the PR.

## Are there any user-facing changes?

There weren't any user-facing changes.
…r planning time) (apache#18415)

## Which issue does this PR close?

- Closes apache#18413

## Rationale for this change

Avoid a bunch of clones / `String` copies during planning.

## What changes are included in this PR?

Change several methods on `DFSchema` to return `&FieldRef` rather than `&Field`, which permits `Arc::clone` rather than a deep `Field` clone.

## Are these changes tested?

Yes, by CI. I also ran benchmarks that show a small but consistent speedup in the planning benchmarks.

## Are there any user-facing changes?

Yes, there are several API changes in `DFSchema` that now return `FieldRef` rather than `Field`, which allows using `Arc::clone` rather than `clone`. I have updated the upgrading guide too.

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
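The cost difference between the two return types can be sketched with stand-in types; the `Field`/`FieldRef` definitions below only mimic the shape of arrow-schema's types for illustration and are not the real ones:

```rust
use std::sync::Arc;

// Stand-ins mimicking arrow-schema's Field and FieldRef = Arc<Field>.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}
type FieldRef = Arc<Field>;

fn main() {
    let field: FieldRef = Arc::new(Field {
        name: "timestamp_col".into(),
        nullable: true,
    });

    // A method returning &FieldRef lets callers do this: a refcount bump, O(1).
    let shallow = Arc::clone(&field);
    assert!(Arc::ptr_eq(&field, &shallow));

    // A method returning &Field forces this instead: a deep copy of the String.
    let deep: Field = (*field).clone();
    assert_eq!(deep, *field);
}
```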
…ache#19021)

## Which issue does this PR close?

- Follow-on to apache#18923

## Rationale for this change

I was confused about some of the tests for `PartitionPruningStatistics`, so let's add some more comments to explain what they are doing, and add additional coverage for multi-value columns.

## What changes are included in this PR?

Add a new test.

## Are these changes tested?

Only tests.

## Are there any user-facing changes?

No
…9018)

## Which issue does this PR close?

N/A

## Rationale for this change

`new_repeated` is much faster than the iterator-based approach.

## What changes are included in this PR?

Use `new_repeated` when converting a scalar to an array for `Utf8`/`LargeUtf8`/`Binary`/`LargeBinary`.

## Are these changes tested?

Existing tests.

## Are there any user-facing changes?

No
## Which issue does this PR close?

- Closes apache#19032

## Rationale for this change

See apache#19032.

## What changes are included in this PR?

Fix binary.

## Are these changes tested?

I tested them manually.

## Are there any user-facing changes?
…e#19451) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.64.2 to 2.65.1. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.65.1</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.7.9.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.6.</p> </li> <li> <p>Update <code>prek@latest</code> to 0.2.23.</p> </li> </ul> <h2>2.65.0</h2> <ul> <li> <p>Support <code>cargo-insta</code>. (<a href="https://redirect.github.com/taiki-e/install-action/pull/1372">#1372</a>, thanks <a href="https://github.com/CommanderStorm"><code>@CommanderStorm</code></a>)</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.2.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased]</h2> <h2>[2.65.1] - 2025-12-21</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.7.9.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.6.</p> </li> <li> <p>Update <code>prek@latest</code> to 0.2.23.</p> </li> </ul> <h2>[2.65.0] - 2025-12-20</h2> <ul> <li> <p>Support <code>cargo-insta</code>. 
(<a href="https://redirect.github.com/taiki-e/install-action/pull/1372">#1372</a>, thanks <a href="https://github.com/CommanderStorm"><code>@CommanderStorm</code></a>)</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.2.</p> </li> </ul> <h2>[2.64.2] - 2025-12-19</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.19.0.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.12.12.</p> </li> </ul> <h2>[2.64.1] - 2025-12-18</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.7.8.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.12.11.</p> </li> </ul> <h2>[2.64.0] - 2025-12-17</h2> <ul> <li> <p><code>tool</code> input option now supports whitespace (space, tab, and line) or comma separated list. Previously, only comma-separated list was supported. (<a href="https://redirect.github.com/taiki-e/install-action/pull/1366">#1366</a>)</p> </li> <li> <p>Support <code>prek</code>. (<a href="https://redirect.github.com/taiki-e/install-action/pull/1357">#1357</a>, thanks <a href="https://github.com/j178"><code>@j178</code></a>)</p> </li> <li> <p>Support <code>mdbook-mermaid</code>. (<a href="https://redirect.github.com/taiki-e/install-action/pull/1359">#1359</a>, thanks <a href="https://github.com/CommanderStorm"><code>@CommanderStorm</code></a>)</p> </li> <li> <p>Support <code>martin</code>. (<a href="https://redirect.github.com/taiki-e/install-action/pull/1364">#1364</a>, thanks <a href="https://github.com/CommanderStorm"><code>@CommanderStorm</code></a>)</p> </li> <li> <p>Update <code>trivy@latest</code> to 0.68.2.</p> </li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/taiki-e/install-action/commit/b9c5db3aef04caffaf95a1d03931de10fb2a140f"><code>b9c5db3</code></a> Release 2.65.1</li> <li><a href="https://github.com/taiki-e/install-action/commit/7796c0f3bbb4224effb7b9a5b719c58e4dcfffcd"><code>7796c0f</code></a> Update changelog</li> <li><a href="https://github.com/taiki-e/install-action/commit/f071f24b175c459aa9e08923b0660736cbb4a0c2"><code>f071f24</code></a> Update <code>tombi@latest</code> to 0.7.9</li> <li><a href="https://github.com/taiki-e/install-action/commit/874ad324364a4db3fb02c1f5e2b88535e22494a0"><code>874ad32</code></a> Update <code>vacuum@latest</code> to 0.21.6</li> <li><a href="https://github.com/taiki-e/install-action/commit/51bd7eff063004d77f2538d773da6664a6d8ce4d"><code>51bd7ef</code></a> Update <code>vacuum@latest</code> to 0.21.5</li> <li><a href="https://github.com/taiki-e/install-action/commit/e3a472337e859edcca3c6b097ee68fda61485251"><code>e3a4723</code></a> Update <code>prek@latest</code> to 0.2.23</li> <li><a href="https://github.com/taiki-e/install-action/commit/bfc291e1e39400b67eda124e4a7b4380e93b3390"><code>bfc291e</code></a> Release 2.65.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/4620a85cf9e526d28a9eab84e396df233006dbda"><code>4620a85</code></a> Update changelog</li> <li><a href="https://github.com/taiki-e/install-action/commit/09980ef8ed3fa65b50d6c1a4756890d5298af699"><code>09980ef</code></a> Support <code>cargo-insta</code> (<a href="https://redirect.github.com/taiki-e/install-action/issues/1372">#1372</a>)</li> <li><a href="https://github.com/taiki-e/install-action/commit/e6fc9bc5a659502256890b92f1759f4b02235b76"><code>e6fc9bc</code></a> Update <code>vacuum@latest</code> to 0.21.2</li> <li>Additional commits viewable in <a href="https://github.com/taiki-e/install-action/compare/60581cd7025e0e855cebd745379013e286d9c787...b9c5db3aef04caffaf95a1d03931de10fb2a140f">compare 
view</a></li> </ul> </details> Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ache#19455) Bumps [sphinx-reredirects](https://github.com/documatt/sphinx-reredirects) from 1.0.0 to 1.1.0. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/documatt/sphinx-reredirects/blob/main/docs/changelog.rst">sphinx-reredirects's changelog</a>.</em></p> <blockquote> <p>1.1.0 (2025-12-22)</p> <hr /> <ul> <li>support Sphinx 9.0 and above</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li>See full diff in <a href="https://github.com/documatt/sphinx-reredirects/commits">compare view</a></li> </ul> </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.44.3 to 1.45.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/mitsuhiko/insta/releases">insta's releases</a>.</em></p> <blockquote> <h2>1.45.0</h2> <h2>Release Notes</h2> <ul> <li>Add external diff tool support via <code>INSTA_DIFF_TOOL</code> environment variable. When set, insta uses the specified tool (e.g., <code>delta</code>, <code>difftastic</code>) to display snapshot diffs instead of the built-in diff. The tool is invoked as <code><tool> <old_file> <new_file></code>. <a href="https://redirect.github.com/mitsuhiko/insta/issues/844">#844</a></li> <li>Add <code>test.disable_nextest_doctest</code> config option to <code>insta.yaml</code>, allowing users to silence the nextest doctest warning via config instead of passing <code>--dnd</code> every time. <a href="https://redirect.github.com/mitsuhiko/insta/issues/842">#842</a></li> <li>Skip non-insta snapshot files in unreferenced detection. Projects using both insta and other snapshot tools (like vitest or jest) can now use <code>--unreferenced=reject</code> without false positives on <code>.snap</code> files from other tools. <a href="https://redirect.github.com/mitsuhiko/insta/issues/846">#846</a></li> <li>Collect warnings from tests for display after run. Ensures deprecation warnings are visible even when nextest suppresses stdout/stderr from passing tests. <a href="https://redirect.github.com/mitsuhiko/insta/issues/840">#840</a></li> <li>Update TOML serialization to be up-to-date and backwards-compatible. <a href="https://redirect.github.com/mitsuhiko/insta/issues/834">#834</a></li> <li>Support <code>clippy::needless_raw_strings</code> lint by only using raw strings when content contains backslashes or quotes. 
<a href="https://redirect.github.com/mitsuhiko/insta/issues/828">#828</a></li> </ul> <h2>Install cargo-insta 1.45.0</h2> <h3>Install prebuilt binaries via shell script</h3> <pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-installer.sh | sh </code></pre> <h3>Install prebuilt binaries via powershell script</h3> <pre lang="sh"><code>powershell -ExecutionPolicy Bypass -c "irm https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-installer.ps1 | iex" </code></pre> <h2>Download cargo-insta 1.45.0</h2> <table> <thead> <tr> <th>File</th> <th>Platform</th> <th>Checksum</th> </tr> </thead> <tbody> <tr> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-aarch64-apple-darwin.tar.xz">cargo-insta-aarch64-apple-darwin.tar.xz</a></td> <td>Apple Silicon macOS</td> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-aarch64-apple-darwin.tar.xz.sha256">checksum</a></td> </tr> <tr> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-apple-darwin.tar.xz">cargo-insta-x86_64-apple-darwin.tar.xz</a></td> <td>Intel macOS</td> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-apple-darwin.tar.xz.sha256">checksum</a></td> </tr> <tr> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-pc-windows-msvc.zip">cargo-insta-x86_64-pc-windows-msvc.zip</a></td> <td>x64 Windows</td> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-pc-windows-msvc.zip.sha256">checksum</a></td> </tr> <tr> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-unknown-linux-gnu.tar.xz">cargo-insta-x86_64-unknown-linux-gnu.tar.xz</a></td> <td>x64 Linux</td> <td><a 
href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-unknown-linux-gnu.tar.xz.sha256">checksum</a></td> </tr> <tr> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-unknown-linux-musl.tar.xz">cargo-insta-x86_64-unknown-linux-musl.tar.xz</a></td> <td>x64 MUSL Linux</td> <td><a href="https://github.com/mitsuhiko/insta/releases/download/1.45.0/cargo-insta-x86_64-unknown-linux-musl.tar.xz.sha256">checksum</a></td> </tr> </tbody> </table> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/mitsuhiko/insta/commit/681a02612b2030e7fb39fe216dba0a1a9c5c46c9"><code>681a026</code></a> Release 1.45.0 (<a href="https://redirect.github.com/mitsuhiko/insta/issues/847">#847</a>)</li> <li><a href="https://github.com/mitsuhiko/insta/commit/ad233cd21b1022559377072af5bc9b1e0e2fec4a"><code>ad233cd</code></a> Skip non-insta snapshot files in unreferenced detection (<a href="https://redirect.github.com/mitsuhiko/insta/issues/846">#846</a>)</li> <li><a href="https://github.com/mitsuhiko/insta/commit/d8e8dfe7aa5cdc720239398648bc97f9eabb965c"><code>d8e8dfe</code></a> Collect warnings from tests for display after run (<a href="https://redirect.github.com/mitsuhiko/insta/issues/840">#840</a>)</li> <li><a href="https://github.com/mitsuhiko/insta/commit/521812cb86d758d08b0e76051437df2337775d86"><code>521812c</code></a> Support clippy::needless_raw_strings lint (<a href="https://redirect.github.com/mitsuhiko/insta/issues/828">#828</a>)</li> <li><a href="https://github.com/mitsuhiko/insta/commit/5822a95759c8b528bf0b64f997d312c523acc523"><code>5822a95</code></a> Add external diff tool support via INSTA_DIFF_TOOL (<a href="https://redirect.github.com/mitsuhiko/insta/issues/844">#844</a>)</li> <li><a href="https://github.com/mitsuhiko/insta/commit/e50388f534145e353c435420e322bd6ac9cc8bf2"><code>e50388f</code></a> Add config file support for disable_nextest_doctest (<a href="https://redirect.github.com/mitsuhiko/insta/issues/842">#842</a>)</li>
<li><a href="https://github.com/mitsuhiko/insta/commit/5aadfe480601b77bfd27420a7553fd2480b67fed"><code>5aadfe4</code></a> Up-to-date, backwards-compatible TOML (<a href="https://redirect.github.com/mitsuhiko/insta/issues/834">#834</a>)</li> <li>See full diff in <a href="https://github.com/mitsuhiko/insta/compare/1.44.3...1.45.0">compare view</a></li> </ul> </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* refactor: simplify ToTimestamp* constructors using macros for consist…
…he#19409)

## Which issue does this PR close?

- Part of apache#19250

## Rationale for this change

This PR enables support for the `power()` function with negative-scale decimals (e.g., `1e4` represented as `1` with scale `-4`) and `log` for `Decimal32`/`Decimal64`.

## What changes are included in this PR?

- **Updated `pow_decimal_int` logic:** Added support for negative scaling factors. When the adjustment factor is negative, the function now multiplies instead of divides.
- For decimals with negative scale, the value is first converted to `f64` to compute the logarithm.

## Are these changes tested?

Yes.

- Verified locally using `sqllogictest`.
- Covers cases such as `SELECT power(1e4, 2)`, which previously returned a "Negative scale is not supported" error.

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
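The negative-scale representation can be checked with a small arithmetic sketch: a decimal with stored value `v` and scale `s` represents `v * 10^(-s)`, so scale `-4` turns a stored `1` into `1e4`. The helper below is illustrative only, not DataFusion's `pow_decimal_int`:

```rust
// Illustrative helper, not DataFusion's pow_decimal_int:
// a decimal (value, scale) represents value * 10^(-scale).
fn decimal_to_f64(value: i128, scale: i32) -> f64 {
    (value as f64) * 10f64.powi(-scale)
}

fn main() {
    // `1e4` stored as value 1 with scale -4:
    assert_eq!(decimal_to_f64(1, -4), 10_000.0);
    // power(1e4, 2) should therefore be 1e8:
    assert_eq!(decimal_to_f64(1, -4).powi(2), 100_000_000.0);
}
```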
## Which issue does this PR close? Closes apache#18671 ## Rationale for this change With the latest changes for apache#18671 we no longer require `datafusion` crate as a dependency. This will reduce build times for users. Also it guarantees we do not accidentally introduce code that will create a `SessionContext` or any other large binary inside our FFI implementations. ## What changes are included in this PR? - Remove `datafusion` crate from Cargo.toml - Update paths - Apply consistent formatting ## Are these changes tested? Existing unit tests. ## Are there any user-facing changes? No. This only updates paths.
## Which issue does this PR close? - Related to apache#19241 ## Rationale for this change This PR adds benchmarks and tests to ground upcoming `in_list` optimizations: 1. **Realistic Data Patterns**: Adds mixed-length string benchmarks to accurately measure the `StringView` two-stage lookup (prefix check + validation) performance across variable lengths. 2. **Type Coverage**: Adds baseline tests for temporal and decimal types to ensure correctness before they are migrated to specialized evaluation paths. ## What changes are included in this PR? - **Mixed-Length Benchmarks**: Scenarios for `StringArray` and `StringViewArray` with variable lengths, match rates, and null densities. - **Extended Tests**: Coverage for esoteric types (Temporal, Duration, Interval, Decimal256) in `physical-expr`. ## Are these changes tested? Yes, via new unit tests and benchmark verification. ## Are there any user-facing changes? No.
## Which issue does this PR close?

- Closes #.

## Rationale for this change

There is a FIXME label in the sqllogictest for the aggregation test in https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion/sqllogictest/test_files/aggregate_skip_partial.slt#L178. Since the issue related to this comment has been merged, we should fix the test. I have contacted @2010YOUY01 for approval before opening the PR.

## What changes are included in this PR?

- Updated the `aggregate_skip_partial` slt test to cover the case.

## Are these changes tested?

## Are there any user-facing changes?
## Which issue does this PR close?

- Closes apache#19450

## Rationale for this change

The intermittent failure occurs because the OS had not yet written the buffered data to disk, leaving the file empty.

## What changes are included in this PR?

Added a flush after the tokio file write (https://docs.rs/tokio/latest/tokio/fs/index.html).

## Are these changes tested?

Yes, this change is covered by the existing test case.

## Are there any user-facing changes?

No.
…che#19246)

## Which issue does this PR close?

- Closes apache#19238

## Rationale for this change

`SortMergeJoinExec` is currently displayed inconsistently across physical plan formats; see [joins.slt](https://github.com/apache/datafusion/blob/20870c18a418ec081d44ecf8a90a30a95aa53138/datafusion/sqllogictest/test_files/joins.slt#L2727) vs. [explain_tree.slt](https://github.com/apache/datafusion/blob/20870c18a418ec081d44ecf8a90a30a95aa53138/datafusion/sqllogictest/test_files/explain_tree.slt#L1203). These examples show that the tree-format plan uses `SortMergeJoinExec`, while the indent-format plan uses `SortMergeJoin`. Standardizing the operator name improves clarity and aligns with the naming conventions of other execution operators.

## What changes are included in this PR?

- Updates the `DisplayAs` implementation for `SortMergeJoinExec` to output `"SortMergeJoinExec: ..."`.
- Updates SQL Logic Test expected outputs in `joins.slt` to reflect the unified naming.
- No functional behavior changes; this is a display/consistency fix.

## Are these changes tested?

Yes. This change is covered by existing SQL Logic Tests. I updated those expected outputs to match the new standardized naming. All tests pass with the updated format.

## Are there any user-facing changes?

Yes: users inspecting physical plans will now consistently see `SortMergeJoinExec` instead of `SortMergeJoin`.
## Which issue does this PR close?

Closes apache#19356

## Rationale for this change

This PR implements the `arrow_metadata` UDF as requested in issue apache#19356.

## What changes are included in this PR?

- Added the `arrow_metadata` UDF
- Refactored tests

## Are these changes tested?

Yes.

## Are there any user-facing changes?

Yes.
## Which issue does this PR close?

Part of apache#19025

## Rationale for this change

Expand support for binning time data types.

## What changes are included in this PR?

Code, tests.

## Are these changes tested?

Yes, slt tests.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
## Which issue does this PR close?

- Closes apache#17054
- Part of apache#18889

## Rationale for this change

The math `round` UDF lacks support for decimal types.

## What changes are included in this PR?

- Add round support for Decimal32/64/128/256 while preserving original precision/scale (no implicit cast to Float64).
- Added SLT coverage for decimal round

## Are these changes tested?

## Are there any user-facing changes?
## Which issue does this PR close?

- Closes apache#18474

## Rationale for this change

Without this fix, running ClickBench with limited RAM panics:

```sql
SELECT "UserID", extract(minute FROM to_timestamp_seconds("EventTime")) AS m, "SearchPhrase", COUNT(*)
FROM 'benchmarks/data/hits_partitioned'
GROUP BY "UserID", m, "SearchPhrase"
ORDER BY COUNT(*) DESC
LIMIT 10;
```

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion2$ cargo run --bin datafusion-cli -- -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/datafusion-cli -m 1G -c 'SELECT "UserID", extract(minute FROM to_timestamp_seconds("EventTime")) AS m, "SearchPhrase", COUNT(*) FROM '\''/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned'\'' GROUP BY "UserID", m, "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;'`
DataFusion CLI v51.0.0
thread 'tokio-runtime-worker' (4994761) panicked at datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:466:53:
range end index 2094219 out of range for slice of length 1066
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

## What changes are included in this PR?

Fix the bug.
This was almost entirely written by codex (prompt below)

<details><summary>Prompt</summary>
<p>

```
This command causes a panic

cargo run --bin datafusion-cli -- -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"

It panics in ByteViewGroupValueBuilder::take_buffers_with_partial_last

I think the problem happens due to a bug in the take_n implementation

thread 'tokio-runtime-worker' (4978703) panicked at datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:466:53:
range end index 2095248 out of range for slice of length 1370
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/std/src/panicking.rs:698:5
   1: core::panicking::panic_fmt
             at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/core/src/panicking.rs:80:14
   2: core::slice::index::slice_index_fail::do_panic::runtime
             at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/core/src/panic.rs:173:21
   3: core::slice::index::slice_index_fail
             at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/core/src/panic.rs:178:9
   4: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
             at /Users/andrewlamb/.rustup/toolchains/1.92.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:438:13
   5: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /Users/andrewlamb/.rustup/toolchains/1.92.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:18:15
   6: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
             at /Users/andrewlamb/.rustup/toolchains/1.92.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:3628:9
   7: datafusion_physical_plan::aggregates::group_values::multi_group_by::bytes_view::ByteViewGroupValueBuilder<B>::take_buffers_with_partial_last
             at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:466:53
   8: datafusion_physical_plan::aggregates::group_values::multi_group_by::bytes_view::ByteViewGroupValueBuilder<B>::take_n_inner
             at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:399:18
   9: <datafusion_physical_plan::aggregates::group_values::multi_group_by::bytes_view::ByteViewGroupValueBuilder<B> as datafusion_physical_plan::aggregates::group_values::multi_group_by::GroupColumn>::take_n
             at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:541:14
  10: <datafusion_physical_plan::aggregates::group_values::multi_group_by::GroupValuesColumn<_> as datafusion_physical_plan::aggregates::group_values::GroupValues>::emit::{{closure}}
             at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs:1097:32
  11: core::iter::adapters::map::map_fold::{{closure}}

Please find and fix the bug
```

</p>
</details>

## Are these changes tested?

Yes, there is a test included

## Are there any user-facing changes?
…of_size` (apache#19441)

## Which issue does this PR close?

- Closes apache#19440

## Rationale for this change

When we have view scalars (utf8/binary) and we call `to_array_of_size`, the data buffers of the resultant arrays contain duplicate data. This is because the APIs we use don't deduplicate the data, instead appending it each time even though the data is exactly duplicated.

## What changes are included in this PR?

Manually use a builder with deduplication enabled.

## Are these changes tested?

Added test.

## Are there any user-facing changes?

No.
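The duplication described above can be made concrete with a small model. This is not the arrow-rs API; it is a hypothetical sketch where a "view" is just an (offset, length) pair into a shared payload buffer, contrasting naive repetition with a deduplicating builder:

```rust
/// Illustrative model of view-array building (hypothetical types, not
/// arrow-rs): each element is a (offset, len) view into `data`.
struct ViewArray {
    data: Vec<u8>,              // shared payload buffer
    views: Vec<(usize, usize)>, // (offset, len) per element
}

impl ViewArray {
    fn new() -> Self {
        Self { data: Vec::new(), views: Vec::new() }
    }

    /// Naive repetition: appends the payload once per element, so the
    /// data buffer grows linearly with the element count.
    fn push_repeated_naive(&mut self, value: &[u8], n: usize) {
        for _ in 0..n {
            let offset = self.data.len();
            self.data.extend_from_slice(value);
            self.views.push((offset, value.len()));
        }
    }

    /// Deduplicated repetition: one payload copy, n views into it.
    fn push_repeated_dedup(&mut self, value: &[u8], n: usize) {
        let offset = self.data.len();
        self.data.extend_from_slice(value);
        for _ in 0..n {
            self.views.push((offset, value.len()));
        }
    }
}

fn main() {
    let mut naive = ViewArray::new();
    naive.push_repeated_naive(b"hello world, long enough", 1000);

    let mut dedup = ViewArray::new();
    dedup.push_repeated_dedup(b"hello world, long enough", 1000);

    // Same number of logical elements, very different buffer sizes.
    assert_eq!(naive.views.len(), dedup.views.len());
    assert_eq!(naive.data.len(), dedup.data.len() * 1000);
    println!("ok");
}
```

The PR's fix follows the second shape: route `to_array_of_size` through a builder that stores the scalar's payload once and repeats only the views.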
## Which issue does this PR close?

N/A

## Rationale for this change

Some minor things I noticed in `ScalarValue` that I wanted to refactor.

## What changes are included in this PR?

Various refactors.

## Are these changes tested?

Existing tests.

## Are there any user-facing changes?

No.
…he#19432)

## Which issue does this PR close?

- Closes apache#19417

## Rationale for this change

- See apache#19417
- Related to apache#17796

## What changes are included in this PR?

When `schema_infer_max_records` is set to 0 for CSV, return the datatype as string.

## Are these changes tested?

Added a test case for `schema_infer_max_records` equal to 0.

## Are there any user-facing changes?
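The rule this PR establishes can be sketched in a few lines. The real logic lives in DataFusion's CSV schema inference; this is a simplified, hypothetical model showing why string is the only safe fallback when zero records are sampled:

```rust
/// Simplified sketch of CSV type inference (not DataFusion's actual
/// implementation): with no sampled records, fall back to Utf8 rather
/// than guessing a narrower type.
fn infer_column_type(samples: &[&str], schema_infer_max_records: usize) -> &'static str {
    let sampled = &samples[..samples.len().min(schema_infer_max_records)];
    if sampled.is_empty() {
        // Nothing was sampled (e.g. schema_infer_max_records == 0):
        // Utf8 is the only type every CSV value can be read as.
        return "Utf8";
    }
    if sampled.iter().all(|v| v.parse::<i64>().is_ok()) {
        "Int64"
    } else if sampled.iter().all(|v| v.parse::<f64>().is_ok()) {
        "Float64"
    } else {
        "Utf8"
    }
}

fn main() {
    assert_eq!(infer_column_type(&["1", "2"], 0), "Utf8");
    assert_eq!(infer_column_type(&["1", "2"], 100), "Int64");
    assert_eq!(infer_column_type(&["1.5", "2"], 100), "Float64");
    println!("ok");
}
```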
## Which issue does this PR close?

- Closes apache#19176

## Rationale for this change

This PR adds custom nullability handling for the Spark LIKE function. Previously, the function used the default `is_nullable`, which always returns true; this is not correct.

## What changes are included in this PR?

- Implemented `return_field_from_args()` to handle custom nullability logic
- The result is nullable if any of the input arguments is nullable
- This matches Spark's behavior where LIKE(NULL, pattern) or LIKE(str, NULL) returns NULL
- Updated `return_type()` to use the `internal_err!` pattern to enforce use of `return_field_from_args`
- Added comprehensive nullability tests covering all combinations:
  - Non-nullable when both inputs are non-nullable
  - Nullable when first input is nullable
  - Nullable when second input is nullable
  - Nullable when both inputs are nullable

## Testing

All existing tests pass, including the newly added ones. The implementation follows the same pattern used by other Spark functions in the codebase (like shuffle and array).
…ck (apache#19466)

If we have a scalar argument that is null, the datatype it comes from is already nullable, so there's no need to check both; we only need to check the nullability of the datatype.
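The invariant behind this simplification can be sketched with hypothetical types (not DataFusion's `Field`/`ScalarValue`): a well-formed argument never carries a null value in a non-nullable field, so `is_null || field.nullable` collapses to `field.nullable`:

```rust
/// Hypothetical argument model: a declared field plus a scalar value.
struct Field {
    nullable: bool,
}

enum Scalar {
    Null,
    Int64(i64),
}

struct Arg {
    field: Field,
    value: Scalar,
}

impl Arg {
    /// Well-formed arguments never hold a null value in a
    /// non-nullable field; this constructor enforces that invariant.
    fn new(field: Field, value: Scalar) -> Self {
        if matches!(value, Scalar::Null) {
            assert!(field.nullable, "null value requires a nullable field");
        }
        Self { field, value }
    }

    /// The redundant check removed by the PR.
    fn result_nullable_old(&self) -> bool {
        matches!(self.value, Scalar::Null) || self.field.nullable
    }

    /// The simplified check: datatype nullability alone suffices.
    fn result_nullable_new(&self) -> bool {
        self.field.nullable
    }
}

fn main() {
    let args = [
        Arg::new(Field { nullable: true }, Scalar::Null),
        Arg::new(Field { nullable: true }, Scalar::Int64(1)),
        Arg::new(Field { nullable: false }, Scalar::Int64(2)),
    ];
    // Given the invariant, both checks agree on every well-formed argument.
    for arg in &args {
        assert_eq!(arg.result_nullable_old(), arg.result_nullable_new());
    }
    println!("ok");
}
```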
## Which issue does this PR close?

- Follow on to apache#19441

## Rationale for this change

In apache#19441 @Jefffrey filed a follow-on ticket for arrow-rs: apache/arrow-rs#9034. I wanted to record where it could be used in DataFusion so we remember to use it when it becomes available.

## What changes are included in this PR?

Add a comment with a reference to apache/arrow-rs#9034

## Are these changes tested?

## Are there any user-facing changes?

No, only comments
…ts (apache#19389)

## Summary

This PR extends `get_field` to accept multiple field name arguments for nested struct/map access, enabling `get_field(col, 'a', 'b', 'c')` as equivalent to `col['a']['b']['c']`.

**The primary motivation is to make it easier for downstream optimizations to match on and optimize struct/map field access patterns.** By representing `col['a']['b']['c']` as a single `get_field(col, 'a', 'b', 'c')` call rather than nested `get_field(get_field(get_field(col, 'a'), 'b'), 'c')` calls, optimization rules can more easily identify and transform field access patterns.

This is related / maybe prep work for apache#19387 but I think is a good improvement in its own right.

## Changes

- **Variadic signature**: `get_field` now accepts 2+ arguments (base + one or more field names)
- **Type validation at planning time**: Accessing a field on a non-struct/map type (e.g., `get_field({a: 1}, 'a', 'b')`) fails during planning with a clear error message indicating which argument position caused the failure
- **Bracket syntax optimization**: The `FieldAccessPlanner` now merges consecutive bracket accesses into a single `get_field` call (e.g., `s['a']['b']` → `get_field(s, 'a', 'b')`)
- **Mixed access handling**: Array index access correctly breaks the batching (e.g., `s['a'][0]['b']` → `get_field(array_element(get_field(s, 'a'), 0), 'b')`)

## Example

```sql
-- Direct function call with nested access
SELECT get_field(my_struct, 'outer', 'inner', 'value');

-- Equivalent bracket syntax (now optimized to single get_field)
SELECT my_struct['outer']['inner']['value'];

-- EXPLAIN shows single get_field call
EXPLAIN SELECT s['a']['b'] FROM t;
-- Projection: get_field(t.s, Utf8("a"), Utf8("b"))
```

## Backwards Compatibility

- The original 2-argument form `get_field(struct, 'field')` continues to work unchanged
- Existing queries using bracket syntax will automatically benefit from the optimization

## Test plan

- [x] Backwards compatibility test for 2-argument form
- [x] Multi-level get_field with 2, 3, and 5 levels of nesting
- [x] Type validation error tests at argument positions 2, 3, 4
- [x] Non-existent field error tests
- [x] Null handling (null at base, null in middle of chain)
- [x] Mixed array/struct access (verifies array index breaks batching)
- [x] Nullable parent propagation
- [x] EXPLAIN test verifying single get_field call for bracket syntax
- [x] Minimum argument validation (0 and 1 argument cases)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
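The merging of consecutive field accesses described above can be sketched on a toy AST. This is a hypothetical expression type, not DataFusion's `Expr`; it only shows the rewrite shape: each new access folds into an existing chain instead of wrapping it in another node:

```rust
/// Toy expression AST (illustrative, not DataFusion's) where a field
/// access chain is stored as a single node with a list of field names.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    GetField { base: Box<Expr>, fields: Vec<String> },
}

/// Planner-style constructor: merge into an existing chain when the
/// base is already a GetField, otherwise start a new chain.
fn get_field(base: Expr, field: &str) -> Expr {
    match base {
        Expr::GetField { base, mut fields } => {
            fields.push(field.to_string());
            Expr::GetField { base, fields }
        }
        other => Expr::GetField {
            base: Box::new(other),
            fields: vec![field.to_string()],
        },
    }
}

fn main() {
    // s['a']['b']['c'] builds one node with three field names,
    // not three nested GetField nodes.
    let expr = get_field(
        get_field(get_field(Expr::Column("s".into()), "a"), "b"),
        "c",
    );
    assert_eq!(
        expr,
        Expr::GetField {
            base: Box::new(Expr::Column("s".into())),
            fields: vec!["a".into(), "b".into(), "c".into()],
        }
    );
    println!("ok");
}
```

An array-index access would not match the `GetField` arm, which is how mixed access like `s['a'][0]['b']` naturally breaks the batching.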
…e#19474) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.65.1 to 2.65.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/releases">taiki-e/install-action's releases</a>.</em></p> <blockquote> <h2>2.65.2</h2> <ul> <li> <p>Update <code>prek@latest</code> to 0.2.24.</p> </li> <li> <p>Update <code>wasmtime@latest</code> to 40.0.0.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.7.</p> </li> <li> <p>Update <code>tombi@latest</code> to 0.7.10.</p> </li> <li> <p>Update <code>syft@latest</code> to 1.39.0.</p> </li> <li> <p>Update <code>cargo-binstall@latest</code> to 1.16.5.</p> </li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md">taiki-e/install-action's changelog</a>.</em></p> <blockquote> <h1>Changelog</h1> <p>All notable changes to this project will be documented in this file.</p> <p>This project adheres to <a href="https://semver.org">Semantic Versioning</a>.</p> <!-- raw HTML omitted --> <h2>[Unreleased]</h2> <h2>[2.65.2] - 2025-12-23</h2> <ul> <li> <p>Update <code>prek@latest</code> to 0.2.24.</p> </li> <li> <p>Update <code>wasmtime@latest</code> to 40.0.0.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.7.</p> </li> <li> <p>Update <code>tombi@latest</code> to 0.7.10.</p> </li> <li> <p>Update <code>syft@latest</code> to 1.39.0.</p> </li> <li> <p>Update <code>cargo-binstall@latest</code> to 1.16.5.</p> </li> </ul> <h2>[2.65.1] - 2025-12-21</h2> <ul> <li> <p>Update <code>tombi@latest</code> to 0.7.9.</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.6.</p> </li> <li> <p>Update <code>prek@latest</code> to 0.2.23.</p> </li> </ul> <h2>[2.65.0] - 2025-12-20</h2> <ul> <li> <p>Support <code>cargo-insta</code>. 
(<a href="https://redirect.github.com/taiki-e/install-action/pull/1372">#1372</a>, thanks <a href="https://github.com/CommanderStorm"><code>@CommanderStorm</code></a>)</p> </li> <li> <p>Update <code>vacuum@latest</code> to 0.21.2.</p> </li> </ul> <h2>[2.64.2] - 2025-12-19</h2> <ul> <li> <p>Update <code>zizmor@latest</code> to 1.19.0.</p> </li> <li> <p>Update <code>mise@latest</code> to 2025.12.12.</p> </li> </ul> <h2>[2.64.1] - 2025-12-18</h2> <ul> <li>Update <code>tombi@latest</code> to 0.7.8.</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/taiki-e/install-action/commit/50cee16bd6b97b2579572f83cfa1c0a721b1e336"><code>50cee16</code></a> Release 2.65.2</li> <li><a href="https://github.com/taiki-e/install-action/commit/71c43df374deb4e987a853401d56672726b34ecd"><code>71c43df</code></a> Update <code>prek@latest</code> to 0.2.24</li> <li><a href="https://github.com/taiki-e/install-action/commit/73bd9d0e1c3d9775f7f4b673d4ac89c3cc914b14"><code>73bd9d0</code></a> Update <code>wasmtime@latest</code> to 40.0.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/072fd7e631ab33f76ce13784f12153625b1ddde3"><code>072fd7e</code></a> Update <code>vacuum@latest</code> to 0.21.7</li> <li><a href="https://github.com/taiki-e/install-action/commit/7d7e3b737d71ae6fb6ebcc91171e20c79054fddb"><code>7d7e3b7</code></a> Update <code>tombi@latest</code> to 0.7.10</li> <li><a href="https://github.com/taiki-e/install-action/commit/4574e21caf851d43909442ad8c4f79678e4261b4"><code>4574e21</code></a> Update <code>syft@latest</code> to 1.39.0</li> <li><a href="https://github.com/taiki-e/install-action/commit/300b834288f5053ff7f6c56a2318db756a3a8bcd"><code>300b834</code></a> Update <code>cargo-binstall@latest</code> to 1.16.5</li> <li>See full diff in <a 
href="https://github.com/taiki-e/install-action/compare/b9c5db3aef04caffaf95a1d03931de10fb2a140f...50cee16bd6b97b2579572f83cfa1c0a721b1e336">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ic SELECT list support (apache#19221)

## Which issue does this PR close?

* Closes apache#18991.

## Rationale for this change

The current unparser behavior materializes an explicit `1` literal for empty projection lists, generating SQL of the form `SELECT 1 FROM ...` even for dialects (such as PostgreSQL and DataFusion) that support `SELECT FROM ...` with an empty select list. For external or federated sources, this can lead to:

* Mismatches between the logical plan schema (empty projection) and the physical schema produced by the generated SQL (single `1` column), which then becomes confusing when converting to Arrow.
* Misleading semantics in downstream consumers (e.g. plans that logically represent "no columns" suddenly gain a synthetic column).
* Unnecessary data movement / computation when the intent is to operate only on row counts or existence checks.

This PR updates the unparser to:

* Preserve the empty projection semantics for dialects that support `SELECT FROM ...`, and
* Provide a dialect hook so that other backends can continue to use a compatible fallback such as `SELECT 1 FROM ...`.

This aligns the generated SQL more closely with the logical plan, improves compatibility with PostgreSQL, and reduces surprises around schema shape for aggregate-style queries over external data sources.

## What changes are included in this PR?

This PR makes the following changes:

1. **SelectBuilder semantics for projections**
   * Change `SelectBuilder.projection` from `Vec<ast::SelectItem>` to `Option<Vec<ast::SelectItem>>` to distinguish:
     * `None`: projection has not yet been set,
     * `Some(vec![])`: explicitly empty projection, and
     * `Some(vec![...])`: non-empty projection.
   * Update `projection()` to set `Some(value)` and `pop_projections()` to `take()` the projection (returning an empty vec by default).
   * Redefine `already_projected()` to return `true` whenever the projection has been explicitly set (including the empty case), by checking `projection.is_some()`.
   * Adjust `build()` and `Default` to work with the new `Option`-typed projection (defaulting to `None` and using `unwrap_or_default()` when building the AST).
2. **Dialect capability: empty select list support**
   * Extend the `Dialect` trait with a new method:
     * `fn supports_empty_select_list(&self) -> bool { false }`
   * Document the intended semantics and behavior across common SQL engines, with the default returning `false` for maximum compatibility.
   * Override this method in `PostgreSqlDialect` to return `true`, allowing `SELECT FROM ...` to be generated.
3. **Unparser handling of empty projections**
   * Add a helper on `Unparser`:
     * `fn empty_projection_fallback(&self) -> Vec<Expr>`
       * Returns an empty vec if `supports_empty_select_list()` is `true`.
       * Returns `vec![Expr::Literal(ScalarValue::Int64(Some(1)), None)]` otherwise.
   * Update `unparse_table_scan_pushdown` to:
     * Take `&self` instead of being a purely static helper, so it can consult the dialect.
     * When encountering a `TableScan` with `Some(vec![])` as projection and `already_projected == false`, use `self.empty_projection_fallback()` instead of hard-coding a `1` literal.
   * Update the few call sites of `unparse_table_scan_pushdown` to call the instance method (`self.unparse_table_scan_pushdown(...)`).
4. **Tests**
   * Add snapshot tests covering both PostgreSQL and the default dialect for empty projection table scans:
     * `test_table_scan_with_empty_projection_in_plan_to_sql_postgres`
       * Asserts `SELECT FROM "table"` for `UnparserPostgreSqlDialect`.
     * `test_table_scan_with_empty_projection_in_plan_to_sql_default_dialect`
       * Asserts `SELECT 1 FROM "table"` for `UnparserDefaultDialect`.
   * Add tests for empty projection with filters:
     * `test_table_scan_with_empty_projection_and_filter_postgres`
       * Asserts `SELECT FROM "table" WHERE ("table"."id" > 10)`.
     * `test_table_scan_with_empty_projection_and_filter_default_dialect`
       * Asserts `SELECT 1 FROM "table" WHERE ("table".id > 10)`.
   * These tests complement the existing `table_scan_with_empty_projection_in_plan_to_sql_*` coverage to exercise both dialect-specific behavior and interaction with filters.

## Are these changes tested?

Yes. Running the [reproducer case](apache@ccdda46) in apache#18991 with `cargo run --example empty_select`:

```rust
use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use datafusion::sql::unparser::{self, Unparser};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.sql("create table t (k int, v int)")
        .await?
        .collect()
        .await?;
    let df = ctx.sql("select from t").await?;
    let plan = df.into_optimized_plan()?;
    println!("{}", plan.display_indent());
    let sql = Unparser::new(&unparser::dialect::PostgreSqlDialect {}).plan_to_sql(&plan)?;
    println!("{sql}");
    Ok(())
}
```

```
TableScan: t projection=[]
SELECT FROM "t"
```

* New snapshot tests have been added in `plan_to_sql.rs` to cover:
  * Empty projections for both the PostgreSQL and default dialects.
  * Empty projections combined with a filter predicate.
* Existing `plan_to_sql` tests continue to pass, ensuring that behavior for non-empty projections and other dialect features is unchanged.

## Are there any user-facing changes?

Yes, for users of the SQL unparser:

* For dialects that support empty select lists (currently PostgreSQL via `PostgreSqlDialect`):
  * Logical plans with an explicitly empty projection will now unparse to `SELECT FROM ...` instead of `SELECT 1 FROM ...`.
  * This more accurately reflects the logical schema (no columns) and avoids introducing a synthetic literal column.
* For dialects that do **not** support empty select lists:
  * The behavior remains effectively the same: the unparser still emits a non-empty projection (currently `SELECT 1 FROM ...`).
  * The behavior is now routed through the new `supports_empty_select_list` hook, so dialects can opt into different fallbacks in the future if needed.

The new `supports_empty_select_list` method on `Dialect` has a default implementation, so existing dialect implementations remain source-compatible and do not require changes.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
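The dialect-hook pattern described above can be condensed into a self-contained sketch. This is a simplification of the PR's design (trait names match the description, but the surrounding types are hypothetical): dialects opt in via a defaulted trait method, and the unparser falls back to a synthetic literal otherwise:

```rust
/// Simplified sketch of the capability hook: the real trait lives in
/// DataFusion's unparser and returns AST types, not strings.
trait Dialect {
    /// Default is `false` for maximum compatibility.
    fn supports_empty_select_list(&self) -> bool {
        false
    }
}

struct PostgreSqlDialect;
impl Dialect for PostgreSqlDialect {
    fn supports_empty_select_list(&self) -> bool {
        true
    }
}

struct DefaultDialect;
impl Dialect for DefaultDialect {}

/// Unparse a table scan whose projection is explicitly empty.
fn unparse_empty_projection(dialect: &dyn Dialect, table: &str) -> String {
    if dialect.supports_empty_select_list() {
        format!("SELECT FROM \"{}\"", table)
    } else {
        // Fallback: a synthetic `1` literal keeps the statement valid
        // for engines that reject an empty select list.
        format!("SELECT 1 FROM \"{}\"", table)
    }
}

fn main() {
    assert_eq!(
        unparse_empty_projection(&PostgreSqlDialect, "t"),
        "SELECT FROM \"t\""
    );
    assert_eq!(
        unparse_empty_projection(&DefaultDialect, "t"),
        "SELECT 1 FROM \"t\""
    );
    println!("ok");
}
```

Because the hook has a default implementation, adding it is source-compatible: existing dialects compile unchanged and keep the `SELECT 1` fallback.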
…19383)

## Which issue does this PR close?

Related to apache#16756

## Rationale for this change

The existing `sql_dialect.rs` example demonstrates `COPY ... STORED AS ...`, which is actually already fully supported by the standard `DFParser`. This PR replaces it with the example from apache#16756: `CREATE EXTERNAL CATALOG ... STORED AS ... LOCATION ...` with automatic table discovery.

## What changes are included in this PR?

The first commit updates `dialect.rs` to show that `DFParser` already handles `COPY ... STORED AS`, making it clear this syntax doesn't need customization. Example output from `cargo run --example sql_ops -- dialect`:

```
Query: COPY source_table TO 'file.fasta' STORED AS FASTA

--- Parsing without extension ---
Standard DFParser: Parsed as Statement::CopyTo: COPY source_table TO file.fasta STORED AS FASTA

--- Parsing with extension ---
Custom MyParser: Parsed as MyStatement::MyCopyTo: COPY source_table TO 'file.fasta' STORED AS FASTA
```

The second commit adds a new `custom_sql_parser.rs` example that implements `CREATE EXTERNAL CATALOG my_catalog STORED AS <format> LOCATION '<url>'` with automatic table discovery from object storage. It also removes the old `dialect.rs` example.

## Are these changes tested?

Yes, the new example is runnable with `cargo run --example sql_ops -- custom_sql_parser` and demonstrates the full flow from parsing custom DDL through registering the catalog to querying discovered tables.

Example output:

```
=== Part 1: Standard DataFusion Parser ===

Parsing: CREATE EXTERNAL CATALOG parquet_testing
         STORED AS parquet
         LOCATION 'local://workspace/parquet-testing/data'
         OPTIONS ( 'schema_name' = 'staged_data', 'format.pruning' = 'true' )

Error: SQL error: ParserError("Expected: TABLE, found: CATALOG at Line: 1, Column: 17")

=== Part 2: Custom Parser ===

Parsing: CREATE EXTERNAL CATALOG parquet_testing
         STORED AS parquet
         LOCATION 'local://workspace/parquet-testing/data'
         OPTIONS ( 'schema_name' = 'staged_data', 'format.pruning' = 'true' )

Target Catalog: parquet_testing
Data Location: local://workspace/parquet-testing/data
Resolved Schema: staged_data
Registered 69 tables into schema: staged_data

Executing: SELECT id, bool_col, tinyint_col FROM parquet_testing.staged_data.alltypes_plain LIMIT 5
+----+----------+-------------+
| id | bool_col | tinyint_col |
+----+----------+-------------+
| 4  | true     | 0           |
| 5  | false    | 1           |
| 6  | true     | 0           |
| 7  | false    | 1           |
| 2  | true     | 0           |
+----+----------+-------------+
```

## Are there any user-facing changes?

Documentation only. I replaced the `sql_dialect.rs` example with `custom_sql_parser.rs` and updated the README. No API changes.
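The core pattern in the example above is "intercept custom DDL first, delegate everything else to the standard parser." A minimal stand-alone sketch of that dispatch, with hand-rolled tokenization standing in for DataFusion's real `DFParser` extension API (all types and parsing here are illustrative assumptions):

```rust
// Hypothetical sketch of a custom-statement fallback parser.
// In the real example, the "standard" branch delegates to DFParser;
// here it just wraps the SQL string.
#[derive(Debug, PartialEq)]
enum MyStatement {
    Standard(String),
    CreateExternalCatalog {
        name: String,
        format: String,
        location: String,
    },
}

fn parse(sql: &str) -> Result<MyStatement, String> {
    let upper = sql.trim_start().to_uppercase();
    if upper.starts_with("CREATE EXTERNAL CATALOG") {
        // Naive whitespace tokenization: enough for the sketch, not a
        // real SQL tokenizer.
        // CREATE EXTERNAL CATALOG <name> STORED AS <format> LOCATION '<url>'
        let tokens: Vec<&str> = sql.split_whitespace().collect();
        if tokens.len() >= 9 {
            return Ok(MyStatement::CreateExternalCatalog {
                name: tokens[3].to_string(),
                format: tokens[6].to_string(),
                location: tokens[8].trim_matches('\'').to_string(),
            });
        }
        return Err("malformed CREATE EXTERNAL CATALOG".to_string());
    }
    // Everything else falls through to the standard grammar.
    Ok(MyStatement::Standard(sql.to_string()))
}

fn main() {
    let stmt = parse(
        "CREATE EXTERNAL CATALOG parquet_testing STORED AS parquet \
         LOCATION 'local://workspace/parquet-testing/data'",
    )
    .unwrap();
    println!("{stmt:?}");
}
```

This ordering matters: the custom branch must run before delegation, because the standard parser rejects `CATALOG` after `CREATE EXTERNAL` (the `ParserError` shown in Part 1 of the output).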
)

## Which issue does this PR close?

- Closes apache#19423.

## Rationale for this change

The functions `arrow_select::merge::merge` and `arrow_select::merge::merge_n` were first implemented for DataFusion in `case.rs`. They have since been generalised and moved to `arrow-rs`. Now that an `arrow-rs` release containing these functions is available, DataFusion should make use of them.

## What changes are included in this PR?

- Remove `merge` and `merge_n` from `case.rs` along with the unit tests for those functions
- Adapt code to use their equivalents from `arrow-rs`

## Are these changes tested?

Covered by existing unit tests and SLTs

## Are there any user-facing changes?

No
It gives a name (the table name) to each `WorkTable`. This way `WorkTableExec` can recognize its own `WorkTable`.

Note that it doesn't allow multiple occurrences of the same CTE name: it's not possible to implement things like "join with itself" correctly with only the work table.

## Which issue does this PR close?

- Closes apache#18955.

## Rationale for this change

Support nested recursive CTEs without co-recursion. This is useful to e.g. implement SPARQL or other graph query languages.

## What changes are included in this PR?

## Are these changes tested?

Yes! There is a nested recursive query in the test file.

## Are there any user-facing changes?

Nested recursive queries are now allowed instead of failing with a "not implemented" error.
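The naming idea described above can be sketched with plain standard-library types: each recursive CTE registers a work table under its own name, and a second registration of the same name is rejected, which is exactly the "same CTE name twice" restriction mentioned in the description. The `WorkTable`/registry types below are illustrative stand-ins, not DataFusion's actual execution types:

```rust
use std::collections::HashMap;

// Stand-in for the batches a work table holds between iterations.
#[derive(Debug, Default)]
struct WorkTable {
    batches: Vec<Vec<i64>>,
}

#[derive(Default)]
struct WorkTableRegistry {
    tables: HashMap<String, WorkTable>,
}

impl WorkTableRegistry {
    /// Registering the same CTE name twice is rejected: a single work
    /// table cannot back two simultaneous scans (e.g. a self-join).
    fn register(&mut self, name: &str) -> Result<(), String> {
        if self.tables.contains_key(name) {
            return Err(format!("duplicate recursive CTE name: {name}"));
        }
        self.tables.insert(name.to_string(), WorkTable::default());
        Ok(())
    }

    /// A scan resolves its own work table by CTE name, so a nested
    /// recursive CTE never picks up its parent's table by accident.
    fn get(&self, name: &str) -> Option<&WorkTable> {
        self.tables.get(name)
    }
}

fn main() {
    let mut reg = WorkTableRegistry::default();
    reg.register("outer_cte").unwrap();
    // Nested recursion uses a distinct name, so it coexists fine.
    reg.register("inner_cte").unwrap();
    // Re-registering the outer name fails, matching the restriction.
    assert!(reg.register("outer_cte").is_err());
    assert!(reg.get("inner_cte").is_some());
}
```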
Fixes apache#19162.

The SparkAbs UDF was using the default `is_nullable=true` for all outputs, even when inputs were non-nullable. This commit implements `return_field_from_args` to properly propagate nullability from input arguments.

Changes:
- Add `return_field_from_args` implementation to SparkAbs
- Output nullability now matches input nullability
- Handle edge case where scalar argument is explicitly null
- Add tests for nullability behavior

## Which issue does this PR close?

Closes apache#19162

## Rationale for this change

`SparkAbs` was always returning `nullable=true` even for non-nullable inputs.

## What changes are included in this PR?

Implement `return_field_from_args` to propagate nullability from input arguments.

## Are these changes tested?

Yes, added 2 tests for nullability behavior.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
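The nullability rule this PR implements is small enough to state as a function: `abs(x)` is null exactly when `x` is null, so the output field inherits the input field's nullability, with one edge case for an explicitly-null scalar argument. The `Field` type and function below are a stand-alone sketch, not DataFusion's `ReturnFieldArgs` API:

```rust
// Illustrative stand-in for an Arrow field's name + nullability.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Sketch of the rule: abs() can only produce null when its input can
/// be null. `scalar_is_null` models the edge case where the argument
/// is a literal NULL, which forces a nullable output regardless of
/// the declared field.
fn abs_return_field(input: &Field, scalar_is_null: bool) -> Field {
    Field {
        name: "abs".to_string(),
        nullable: input.nullable || scalar_is_null,
    }
}

fn main() {
    let non_null = Field { name: "x".to_string(), nullable: false };
    let nullable = Field { name: "y".to_string(), nullable: true };
    // Non-nullable input, non-null scalar: output is non-nullable,
    // which is the behavior this PR fixes.
    println!("{:?}", abs_return_field(&non_null, false));
    println!("{:?}", abs_return_field(&nullable, false));
    println!("{:?}", abs_return_field(&non_null, true));
}
```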
PR to refactor and update your upstream PR. I took the liberty of merging up to the latest main. Let me know what you think.
The biggest changes are
All my changes are in abbf107