Conversation

@Omega359 Omega359 commented Nov 25, 2025

PR to refactor and update your upstream PR. I took the liberty of merging up to the latest main. Let me know what you think.

The biggest changes are:

  • Removal of ConfiguredZone
  • All returned timestamps are in the session time zone
  • Fixed some issues where values other than strings were not properly updated to use the session time zone

All my changes are in abbf107

Smith-Cruise and others added 11 commits December 1, 2025 10:03
## Which issue does this PR close?

It's very simple; do I need to propose an issue?
- Closes #.

## Rationale for this change


We don't need to clone `ExecutionPlan` when `repartitioned()` didn't
happen.

## What changes are included in this PR?


Very simple: just remove an extra `clone()`.
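
A minimal sketch of the pattern, with the plan type simplified to a generic `Arc` (the real code works with `Arc<dyn ExecutionPlan>` and the `Option` returned by `repartitioned()`):

```rust
use std::sync::Arc;

// Sketch only: `Plan` stands in for `dyn ExecutionPlan`, and the Option
// for the result of `ExecutionPlan::repartitioned()`.
fn apply_repartition<Plan>(
    plan: Arc<Plan>,
    repartitioned: Option<Arc<Plan>>,
) -> Arc<Plan> {
    // Reuse the original Arc when no repartitioning happened, instead of
    // cloning the plan unconditionally up front.
    repartitioned.unwrap_or(plan)
}
```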

## Are these changes tested?


Compilation passes.

## Are there any user-facing changes?


No

Signed-off-by: Smith Cruise <chendingchao1@126.com>
## Which issue does this PR close?
-

## Rationale for this change
No need to compile this for production, especially for downstream users.

## What changes are included in this PR?
Fix `Cargo.toml`.

## Are these changes tested?
Still compiles and tests pass.

## Are there any user-facing changes?
Less stuff to compile.
## Which issue does this PR close?
- related to apache/arrow-rs#8464

## Rationale for this change

Get latest and greatest code from arrow

## What changes are included in this PR?

1. Update to Arrow 57.1.0
2. Update for API changes (comments inline)

## Are these changes tested?

Yes, by CI

## Are there any user-facing changes?

No

---------

Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
…apache#18923)

- Closes apache#18922

---------

Signed-off-by: Nimalan <nimalan.m@protonmail.com>
…18869)

## Which issue does this PR close?


- Closes apache#18844


## What changes are included in this PR?
Instead of using `data_type_and_nullable`, this patch removes the
indirection and calls `to_field` directly. The deprecated helper added
no additional logic, so the PR encourages callers to use the field
returned by `to_field` to access both the data type and the nullability.


## Are these changes tested?
Yes
Add a section explaining that Profile Guided Optimization can provide up
to 25% performance improvements. Includes three-stage build process
instructions and tips for effective PGO usage. References issue apache#9507.

## Which issue does this PR close?

Closes apache#9561 

## Rationale for this change

Adds documentation for Profile Guided Optimization (PGO) as requested.
PGO can provide up to 25% performance improvements for DataFusion
workloads, and users need clear guidance on how to use it.

## What changes are included in this PR?

- Added "Profile Guided Optimization (PGO)" section to
`docs/source/user-guide/crate-configuration.md`
- Three-stage build process instructions (instrumentation, profiling,
recompilation)
- Tips for effective PGO usage (representative workloads, multiple
iterations, combining with other optimizations)
- Links to Rust compiler guide and issue apache#9507
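
The three-stage process described above looks roughly like this (a minimal sketch; the binary name, workload, and profile paths are illustrative assumptions):

```shell
# Stage 1: build an instrumented binary that emits profile data
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# Stage 2: run a representative workload, then merge the raw profiles
# (llvm-profdata ships with the llvm-tools rustup component)
./target/release/your-app --representative-workload
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# Stage 3: recompile using the merged profile to guide optimization
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```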

## Are these changes tested?

Yes. Documentation changes are validated by the CI workflow which builds
the docs and checks for errors. The markdown syntax is valid and follows
existing patterns.

## Are there any user-facing changes?

Yes. This adds documentation that will be published on the DataFusion
website under "Crate Configuration" > "Optimizing Builds". Users will
find guidance on using PGO to improve performance.
## Which issue does this PR close?

- Part of apache#18881.

## What changes are included in this PR?

Applied a deny attribute for the `allow_attributes` lint on physical-plan,
which keeps it from being applied to imported crates. Also, a few of the
functions no longer required their respective lint allowances, such as
`too_many_arguments`, so I removed those.

## Are these changes tested?

I've run the entire clippy suite before creating a PR.

## Are there any user-facing changes?

There weren't any user-facing changes
…r planning time) (apache#18415)

## Which issue does this PR close?


-  Closes apache#18413

## Rationale for this change

Avoid a bunch of clones / String copies during planning

## What changes are included in this PR?

Change several methods on DFSchema to return `&FieldRef` rather than
`&Field` which permits `Arc::clone` rather than a deep `Field` clone

## Are these changes tested?

yes by CI

I also ran benchmarks that show a small but consistent speedup in the
planning benchmarks

## Are there any user-facing changes?

Yes, there are several API changes in DFSchema that now return
`FieldRef` rather than `Field` which allows using `Arc::clone` rather
than `clone`. I have updated the upgrading guide too
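
A minimal sketch of why this matters, using arrow's `Field` types directly rather than the changed `DFSchema` methods:

```rust
use std::sync::Arc;
use arrow::datatypes::{DataType, Field, FieldRef};

fn main() {
    let field: FieldRef = Arc::new(Field::new("a", DataType::Int32, true));

    // Deep clone: copies the Field's name, data type, and metadata.
    let deep: Field = field.as_ref().clone();

    // Cheap clone: returning `&FieldRef` lets callers just bump the
    // Arc's reference count instead.
    let shallow: FieldRef = Arc::clone(&field);

    assert_eq!(&deep, shallow.as_ref());
}
```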

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…ache#19021)

## Which issue does this PR close?

- Follow on to apache#18923

## Rationale for this change

I was confused about some of the tests for `PartitionPruningStatistics`,
so let's add some more comments to explain what it is doing, and add
additional coverage for multi-value columns.


## What changes are included in this PR?

Add a new test 

## Are these changes tested?

Only tests 
## Are there any user-facing changes?

No
…9018)

## Which issue does this PR close?

N/A

## Rationale for this change

`new_repeated` is much faster than using the iterator.

## What changes are included in this PR?

Use `new_repeated` when converting a scalar to an array for
Utf8/LargeUtf8/Binary/LargeBinary.

## Are these changes tested?
Existing tests

## Are there any user-facing changes?

Nope
## Which issue does this PR close?


- Closes apache#19032

## Rationale for this change

See apache#19032

## What changes are included in this PR?

Fix binary
## Are these changes tested?

I tested them manually

## Are there any user-facing changes?

dependabot bot and others added 30 commits December 22, 2025 08:43
…e#19451)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.64.2 to 2.65.1.
**Release notes** (sourced from [taiki-e/install-action's releases](https://github.com/taiki-e/install-action/releases)):

2.65.1

- Update `tombi@latest` to 0.7.9.
- Update `vacuum@latest` to 0.21.6.
- Update `prek@latest` to 0.2.23.

2.65.0

- Support `cargo-insta`. ([#1372](https://redirect.github.com/taiki-e/install-action/pull/1372), thanks @CommanderStorm)
- Update `vacuum@latest` to 0.21.2.

**Changelog** (sourced from [taiki-e/install-action's changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)) additionally covers:

[2.64.2] - 2025-12-19

- Update `zizmor@latest` to 1.19.0.
- Update `mise@latest` to 2025.12.12.

[2.64.1] - 2025-12-18

- Update `tombi@latest` to 0.7.8.
- Update `mise@latest` to 2025.12.11.

[2.64.0] - 2025-12-17

- The `tool` input option now supports a whitespace (space, tab, and line) or comma separated list. Previously, only a comma-separated list was supported. ([#1366](https://redirect.github.com/taiki-e/install-action/pull/1366))
- Support `prek`. ([#1357](https://redirect.github.com/taiki-e/install-action/pull/1357), thanks @j178)
- Support `mdbook-mermaid`. ([#1359](https://redirect.github.com/taiki-e/install-action/pull/1359), thanks @CommanderStorm)
- Support `martin`. ([#1364](https://redirect.github.com/taiki-e/install-action/pull/1364), thanks @CommanderStorm)
- Update `trivy@latest` to 0.68.2.

**Commits**: `b9c5db3` Release 2.65.1; `7796c0f` Update changelog; `f071f24` Update `tombi@latest` to 0.7.9; `874ad32` Update `vacuum@latest` to 0.21.6; `51bd7ef` Update `vacuum@latest` to 0.21.5; `e3a4723` Update `prek@latest` to 0.2.23; `bfc291e` Release 2.65.0; `4620a85` Update changelog; `09980ef` Support `cargo-insta` (#1372); `e6fc9bc` Update `vacuum@latest` to 0.21.2. Additional commits viewable in the [compare view](https://github.com/taiki-e/install-action/compare/60581cd7025e0e855cebd745379013e286d9c787...b9c5db3aef04caffaf95a1d03931de10fb2a140f).


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.64.2&new-version=2.65.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ache#19455)

Bumps
[sphinx-reredirects](https://github.com/documatt/sphinx-reredirects)
from 1.0.0 to 1.1.0.
**Changelog** (sourced from [sphinx-reredirects's changelog](https://github.com/documatt/sphinx-reredirects/blob/main/docs/changelog.rst)):

1.1.0 (2025-12-22)

- support Sphinx 9.0 and above

**Commits**: see the full diff in the [compare view](https://github.com/documatt/sphinx-reredirects/commits).


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sphinx-reredirects&package-manager=pip&previous-version=1.0.0&new-version=1.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.44.3 to 1.45.0.
**Release notes** (sourced from [insta's releases](https://github.com/mitsuhiko/insta/releases)):

1.45.0

- Add external diff tool support via the `INSTA_DIFF_TOOL` environment variable. When set, insta uses the specified tool (e.g., `delta`, `difftastic`) to display snapshot diffs instead of the built-in diff. The tool is invoked as `<tool> <old_file> <new_file>`. ([#844](https://redirect.github.com/mitsuhiko/insta/issues/844))
- Add `test.disable_nextest_doctest` config option to `insta.yaml`, allowing users to silence the nextest doctest warning via config instead of passing `--dnd` every time. ([#842](https://redirect.github.com/mitsuhiko/insta/issues/842))
- Skip non-insta snapshot files in unreferenced detection. Projects using both insta and other snapshot tools (like vitest or jest) can now use `--unreferenced=reject` without false positives on `.snap` files from other tools. ([#846](https://redirect.github.com/mitsuhiko/insta/issues/846))
- Collect warnings from tests for display after run. Ensures deprecation warnings are visible even when nextest suppresses stdout/stderr from passing tests. ([#840](https://redirect.github.com/mitsuhiko/insta/issues/840))
- Update TOML serialization to be up-to-date and backwards-compatible. ([#834](https://redirect.github.com/mitsuhiko/insta/issues/834))
- Support the `clippy::needless_raw_strings` lint by only using raw strings when content contains backslashes or quotes. ([#828](https://redirect.github.com/mitsuhiko/insta/issues/828))

The release page also lists install scripts and prebuilt cargo-insta 1.45.0 binaries for macOS, Windows, and Linux; the changelog repeats the notes above.

**Commits**: `681a026` Release 1.45.0 (#847); `ad233cd` Skip non-insta snapshot files in unreferenced detection (#846); `d8e8dfe` Collect warnings from tests for display after run (#840); `521812c` Support `clippy::needless_raw_strings` lint (#828); `5822a95` Add external diff tool support via `INSTA_DIFF_TOOL` (#844); `e50388f` Add config file support for `disable_nextest_doctest` (#842); `5aadfe4` Up-to-date, backwards-compatible TOML (#834). See the full diff in the [compare view](https://github.com/mitsuhiko/insta/compare/1.44.3...1.45.0).


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=insta&package-manager=cargo&previous-version=1.44.3&new-version=1.45.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* refactor: simplify ToTimestamp* constructors using macros for consist…
…he#19409)

## Which issue does this PR close?

* Part of apache#19250 

## Rationale for this change

This PR enables support for the `power()` function with negative-scale
decimals (e.g., `1e4` represented as `1` with scale `-4`), and for `log`
on Decimal32/Decimal64.
## What changes are included in this PR?

- **Updated `pow_decimal_int` logic:** Added support for negative
scaling factors. When the adjustment factor is negative, the function
now multiplies by the scaling factor instead of dividing by it.
- For decimals with negative scale, the value is first converted to f64
to compute the logarithm.

## Are these changes tested?

Yes.

* Verified locally using `sqllogictest`.
* Covers cases such as `SELECT power(1e4, 2)` which previously returned
a "Negative scale is not supported" error.

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
## Which issue does this PR close?

Closes apache#18671 

## Rationale for this change

With the latest changes for apache#18671 we no longer require the
`datafusion` crate as a dependency. This will reduce build times for users. Also it
guarantees we do not accidentally introduce code that will create a
`SessionContext` or any other large binary inside our FFI
implementations.

## What changes are included in this PR?

- Remove `datafusion` crate from Cargo.toml
- Update paths
- Apply consistent formatting

## Are these changes tested?

Existing unit tests.

## Are there any user-facing changes?

No. This only updates paths.
## Which issue does this PR close?

- Related to apache#19241

## Rationale for this change

This PR adds benchmarks and tests to ground upcoming `in_list`
optimizations:

1. **Realistic Data Patterns**: Adds mixed-length string benchmarks to
accurately measure the `StringView` two-stage lookup (prefix check +
validation) performance across variable lengths.

2. **Type Coverage**: Adds baseline tests for temporal and decimal types
to ensure correctness before they are migrated to specialized evaluation
paths.

## What changes are included in this PR?

- **Mixed-Length Benchmarks**: Scenarios for `StringArray` and
`StringViewArray` with variable lengths, match rates, and null
densities.

- **Extended Tests**: Coverage for esoteric types (Temporal, Duration,
Interval, Decimal256) in `physical-expr`.

## Are these changes tested?

Yes, via new unit tests and benchmark verification.

## Are there any user-facing changes?

No.
## Which issue does this PR close?


- Closes #.

## Rationale for this change

There is a FIXME label in the sqllogictest for the aggregation test in
the file


https://github.com/apache/datafusion/blob/2e3707e380172a4ba1ae5efabe7bd27a354bfb2d/datafusion/sqllogictest/test_files/aggregate_skip_partial.slt#L178

Since the fix for the issue referenced by this comment has been merged,
we should fix the test. I have contacted @2010YOUY01 for approval before
opening the PR.


## What changes are included in this PR?

- Updated the `aggregate_skip_partial` slt test to cover the case 


## Are these changes tested?


## Are there any user-facing changes?

## Which issue does this PR close?


- Closes apache#19450 

## Rationale for this change

The intermittent failure is due to the OS failing to write the data,
leaving the file empty.
## What changes are included in this PR?

Added a flush after the tokio file write (see
https://docs.rs/tokio/latest/tokio/fs/index.html).
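
A minimal sketch of the pattern (the path and payload are illustrative); tokio's `File` buffers writes internally, so data may be lost unless it is flushed:

```rust
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

// Sketch only: the path and data are illustrative.
async fn write_and_flush(data: &[u8]) -> std::io::Result<()> {
    let mut file = File::create("/tmp/output.bin").await?;
    file.write_all(data).await?;
    // Without this, buffered bytes may never reach the OS and the file
    // can end up empty, causing the intermittent failure described above.
    file.flush().await?;
    Ok(())
}
```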
## Are these changes tested?

Yes, this just validates an existing test case.
## Are there any user-facing changes?

No.
…che#19246)

## Which issue does this PR close?
- closes apache#19238

## Rationale for this change

`SortMergeJoinExec` is currently displayed inconsistently across
physical plan formats, see
[join.slt](https://github.com/apache/datafusion/blob/20870c18a418ec081d44ecf8a90a30a95aa53138/datafusion/sqllogictest/test_files/joins.slt#L2727)
vs.
[explain_tree.slt](https://github.com/apache/datafusion/blob/20870c18a418ec081d44ecf8a90a30a95aa53138/datafusion/sqllogictest/test_files/explain_tree.slt#L1203).

These examples show that the tree-fmt plan uses `SortMergeJoinExec`, while
the indent-fmt plan uses `SortMergeJoin`.

Standardizing the operator name improves clarity and aligns with the
naming conventions of other execution operators.

## What changes are included in this PR?

Updates the `DisplayAs` implementation for `SortMergeJoinExec` to output
`"SortMergeJoinExec: ..."`.

Updates SQL Logic Test expected outputs in `joins.slt` to reflect the
unified naming.

No functional behavior changes; this is a display/consistency fix.

## Are these changes tested?

Yes. This change is encapsulated in existing SQL Logic Tests. I updated
those expected outputs to match the new standardized naming.

All tests pass with the updated format.

## Are there any user-facing changes?

Yes—users inspecting physical plans will now consistently see
`SortMergeJoinExec` instead of `SortMergeJoin`.
## Which issue does this PR close?
Closes apache#19356

## Rationale for this change
This PR implements the `arrow_metadata` UDF as requested in issue apache#19356.

## What changes are included in this PR?
- Added `arrow_metadata` UDF
- Refactored tests

## Are these changes tested?
Yes.

## Are there any user-facing changes?
Yes.
## Which issue does this PR close?

Part of apache#19025

## Rationale for this change

Expand support for binning time data types.

## What changes are included in this PR?

Code, tests.

## Are these changes tested?

Yes, slt tests.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
## Which issue does this PR close?


- Closes apache#17054
- Part of apache#18889

## Rationale for this change

The `round` math UDF lacks support for decimal types.


## What changes are included in this PR?

- Add round support for Decimal32/64/128/256 while preserving original
precision/scale (no implicit cast to Float64).
- Added SLT coverage for decimal round
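
An illustrative query (values assumed for demonstration; the key point is that the result stays a decimal with the original precision and scale):

```sql
-- Rounds to 2 decimal places but keeps the DECIMAL(10, 3) type,
-- rather than implicitly casting to Float64.
SELECT round(CAST(1.456 AS DECIMAL(10, 3)), 2);
```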



## Are these changes tested?


## Are there any user-facing changes?

## Which issue does this PR close?


- Closes apache#18474

## Rationale for this change

Without this fix, running ClickBench with limited RAM panics:

```sql
SELECT
  "UserID",
  extract(minute FROM to_timestamp_seconds("EventTime")) AS m,
  "SearchPhrase",
  COUNT(*)
FROM 'benchmarks/data/hits_partitioned'
GROUP BY "UserID", m, "SearchPhrase"
ORDER BY COUNT(*) DESC
LIMIT 10;
```

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion2$ cargo run --bin datafusion-cli -- -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.40s
     Running `target/debug/datafusion-cli -m 1G -c 'SELECT "UserID", extract(minute FROM to_timestamp_seconds("EventTime")) AS m, "SearchPhrase", COUNT(*) FROM '\''/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned'\'' GROUP BY "UserID", m, "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;'`
DataFusion CLI v51.0.0

thread 'tokio-runtime-worker' (4994761) panicked at datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:466:53:
range end index 2094219 out of range for slice of length 1066
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```

## What changes are included in this PR?

Fix the bug.

This was almost entirely written by codex (prompt below)

<details><summary>Prompt</summary>
<p>

```
 This command causes a panic

  cargo run --bin datafusion-cli -- -m 1G -c "SELECT \"UserID\", extract(minute FROM to_timestamp_seconds(\"EventTime\")) AS m, \"SearchPhrase\", COUNT(*) FROM '/Users/andrewlamb/Software/datafusion/benchmarks/data/hits_partitioned' GROUP BY \"UserID\", m, \"SearchPhrase\" ORDER BY COUNT(*) DESC LIMIT 10;"

  It panics in ByteViewGroupValueBuilder::take_buffers_with_partial_last

  I think the problem happens due to a bug in the take_n implementation

  thread 'tokio-runtime-worker' (4978703) panicked at datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:466:53:
  range end index 2095248 out of range for slice of length 1370
  stack backtrace:
     0: __rustc::rust_begin_unwind
               at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/std/src/panicking.rs:698:5
     1: core::panicking::panic_fmt
               at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/core/src/panicking.rs:80:14
     2: core::slice::index::slice_index_fail::do_panic::runtime
               at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/core/src/panic.rs:173:21
     3: core::slice::index::slice_index_fail
               at /rustc/ded5c06cf21d2b93bffd5d884aa6e96934ee4234/library/core/src/panic.rs:178:9
     4: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
               at /Users/andrewlamb/.rustup/toolchains/1.92.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:438:13
     5: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
               at /Users/andrewlamb/.rustup/toolchains/1.92.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:18:15
     6: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
               at /Users/andrewlamb/.rustup/toolchains/1.92.0-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:3628:9
     7: datafusion_physical_plan::aggregates::group_values::multi_group_by::bytes_view::ByteViewGroupValueBuilder<B>::take_buffers_with_partial_last
               at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:466:53
     8: datafusion_physical_plan::aggregates::group_values::multi_group_by::bytes_view::ByteViewGroupValueBuilder<B>::take_n_inner
               at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:399:18
     9: <datafusion_physical_plan::aggregates::group_values::multi_group_by::bytes_view::ByteViewGroupValueBuilder<B> as datafusion_physical_plan::aggregates::group_values::multi_group_by::GroupColumn>::take_n
               at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:541:14
    10: <datafusion_physical_plan::aggregates::group_values::multi_group_by::GroupValuesColumn<_> as datafusion_physical_plan::aggregates::group_values::GroupValues>::emit::{{closure}}
               at ./datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs:1097:32
    11: core::iter::adapters::map::map_fold::{{closure}}


  Please find and fix the bug
```

</p>
</details> 

## Are these changes tested?
Yes, there is a test included

## Are there any user-facing changes?

…of_size` (apache#19441)

## Which issue does this PR close?


- Closes apache#19440

## Rationale for this change


When we have view scalars (utf8/binary) and we call `to_array_of_size`,
the data buffers of the resultant arrays contain duplicate data. This is
because the APIs we use don't deduplicate the data; instead they append
it each time, even though the data is exactly duplicated.

## What changes are included in this PR?


Manually use a builder with deduplication enabled.
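
A minimal sketch of the idea using arrow's `StringViewBuilder` (the function shape is an illustrative assumption):

```rust
use arrow::array::{StringViewArray, StringViewBuilder};

// Sketch only: repeat one value n times. With deduplication enabled the
// builder stores the underlying bytes once, and the views all reference
// that single buffer entry instead of N copies.
fn repeated_view(value: &str, n: usize) -> StringViewArray {
    let mut builder = StringViewBuilder::new().with_deduplicate_strings();
    for _ in 0..n {
        builder.append_value(value);
    }
    builder.finish()
}
```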

## Are these changes tested?


Added test.

## Are there any user-facing changes?


No.

## Which issue does this PR close?


N/A

## Rationale for this change


Some minor things I noticed in `ScalarValue` that I wanted to refactor.

## What changes are included in this PR?


Various refactors.

## Are these changes tested?


Existing tests.

## Are there any user-facing changes?


No.

…he#19432)

## Which issue does this PR close?


- Closes apache#19417

## Rationale for this change

- see apache#19417
- related to apache#17796

## What changes are included in this PR?

When `schema_infer_max_records` is set to 0 for CSV, return the data
type of each column as string.
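
A minimal sketch of how this surfaces through the Rust API (the path is illustrative, and the builder method name is assumed from `CsvReadOptions`):

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

// Sketch only: with 0 records sampled, no type inference runs and
// every column comes back as a string column.
async fn read_untyped(ctx: &SessionContext) -> Result<DataFrame> {
    ctx.read_csv(
        "data.csv",
        CsvReadOptions::new().schema_infer_max_records(0),
    )
    .await
}
```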

## Are these changes tested?

Added a test case for `schema_infer_max_records` equal to 0.

## Are there any user-facing changes?

## Which issue does this PR close?


- Closes apache#19176 

## Rationale for this change

This PR adds custom nullability handling for the Spark LIKE function.
Previously, the function was using the default `is_nullable`, which
always returns true; that is not correct.

## What changes are included in this PR?

- Implemented `return_field_from_args()` to handle custom nullability
logic (see the sketch after this list)
  - The result is nullable if any of the input arguments is nullable
- This matches Spark's behavior where `LIKE(NULL, pattern)` or
`LIKE(str, NULL)` returns NULL
- Updated `return_type()` to use the `internal_err!` pattern to enforce
use of `return_field_from_args`
- Added comprehensive nullability tests covering all combinations:
  - Non-nullable when both inputs are non-nullable
  - Nullable when first input is nullable
  - Nullable when second input is nullable
  - Nullable when both inputs are nullable

## Testing
All existing tests pass, including the newly added ones.

The implementation follows the same pattern used by other Spark
functions in the codebase (like shuffle and array).
…ck (apache#19466)

If we have a scalar argument that is null, that means the datatype it
comes from is already nullable, so there's no need to check both; we
only need to check the nullability of the datatype.
## Which issue does this PR close?

- Follow on to apache#19441

## Rationale for this change

In apache#19441, @Jefffrey filed a
follow-on ticket for arrow-rs:
apache/arrow-rs#9034

I wanted to leave the context of where it could be used in DataFusion so
we remember to use it when available

## What changes are included in this PR?

Add a comment with a reference to
apache/arrow-rs#9034

## Are these changes tested?


## Are there any user-facing changes?
No, only comments
…ts (apache#19389)

## Summary

This PR extends `get_field` to accept multiple field name arguments for
nested struct/map access, enabling `get_field(col, 'a', 'b', 'c')` as
equivalent to `col['a']['b']['c']`.

**The primary motivation is to make it easier for downstream
optimizations to match on and optimize struct/map field access
patterns.** By representing `col['a']['b']['c']` as a single
`get_field(col, 'a', 'b', 'c')` call rather than nested
`get_field(get_field(get_field(col, 'a'), 'b'), 'c')` calls,
optimization rules can more easily identify and transform field access
patterns.

This is related / maybe prep work for apache#19387 but I think is a good
improvement in its own right.

## Changes

- **Variadic signature**: `get_field` now accepts 2+ arguments (base +
one or more field names)
- **Type validation at planning time**: Accessing a field on a
non-struct/map type (e.g., `get_field({a: 1}, 'a', 'b')`) fails during
planning with a clear error message indicating which argument position
caused the failure
- **Bracket syntax optimization**: The `FieldAccessPlanner` now merges
consecutive bracket accesses into a single `get_field` call (e.g.,
`s['a']['b']` → `get_field(s, 'a', 'b')`)
- **Mixed access handling**: Array index access correctly breaks the
batching (e.g., `s['a'][0]['b']` → `get_field(array_element(get_field(s,
'a'), 0), 'b')`)

## Example

```sql
-- Direct function call with nested access
SELECT get_field(my_struct, 'outer', 'inner', 'value');

-- Equivalent bracket syntax (now optimized to single get_field)
SELECT my_struct['outer']['inner']['value'];

-- EXPLAIN shows single get_field call
EXPLAIN SELECT s['a']['b'] FROM t;
-- Projection: get_field(t.s, Utf8("a"), Utf8("b"))
```

## Backwards Compatibility

- The original 2-argument form `get_field(struct, 'field')` continues to
work unchanged
- Existing queries using bracket syntax will automatically benefit from
the optimization

## Test plan

- [x] Backwards compatibility test for 2-argument form
- [x] Multi-level get_field with 2, 3, and 5 levels of nesting
- [x] Type validation error tests at argument positions 2, 3, 4
- [x] Non-existent field error tests
- [x] Null handling (null at base, null in middle of chain)
- [x] Mixed array/struct access (verifies array index breaks batching)
- [x] Nullable parent propagation
- [x] EXPLAIN test verifying single get_field call for bracket syntax
- [x] Minimum argument validation (0 and 1 argument cases)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
…e#19474)

Bumps
[taiki-e/install-action](https://github.com/taiki-e/install-action) from
2.65.1 to 2.65.2.
**Release notes** (sourced from [taiki-e/install-action's releases](https://github.com/taiki-e/install-action/releases)):

2.65.2

- Update `prek@latest` to 0.2.24.
- Update `wasmtime@latest` to 40.0.0.
- Update `vacuum@latest` to 0.21.7.
- Update `tombi@latest` to 0.7.10.
- Update `syft@latest` to 1.39.0.
- Update `cargo-binstall@latest` to 1.16.5.

The changelog repeats these entries (dated 2025-12-23) along with the earlier 2.65.1, 2.65.0, 2.64.2, and 2.64.1 notes.

**Commits**: `50cee16` Release 2.65.2; `71c43df` Update `prek@latest` to 0.2.24; `73bd9d0` Update `wasmtime@latest` to 40.0.0; `072fd7e` Update `vacuum@latest` to 0.21.7; `7d7e3b7` Update `tombi@latest` to 0.7.10; `4574e21` Update `syft@latest` to 1.39.0; `300b834` Update `cargo-binstall@latest` to 1.16.5. See the full diff in the [compare view](https://github.com/taiki-e/install-action/compare/b9c5db3aef04caffaf95a1d03931de10fb2a140f...50cee16bd6b97b2579572f83cfa1c0a721b1e336).


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=taiki-e/install-action&package-manager=github_actions&previous-version=2.65.1&new-version=2.65.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ic SELECT list support (apache#19221)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes apache#123` indicates that this PR will close issue apache#123.
-->

* Closes apache#18991.

## Rationale for this change

The current unparser behavior materializes an explicit `1` literal for
empty projection lists, generating SQL of the form `SELECT 1 FROM ...`
even for dialects (such as PostgreSQL and DataFusion) that support
`SELECT FROM ...` with an empty select list.

For external or federated sources, this can lead to:

* Mismatches between the logical plan schema (empty projection) and the
physical schema produced by the generated SQL (single `1` column), which
then becomes confusing when converting to Arrow.
* Misleading semantics in downstream consumers (e.g. plans that
logically represent "no columns" suddenly gain a synthetic column).
* Unnecessary data movement / computation when the intent is to operate
only on row counts or existence checks.

This PR updates the unparser to:

* Preserve the empty projection semantics for dialects that support
`SELECT FROM ...`, and
* Provide a dialect hook so that other backends can continue to use a
compatible fallback such as `SELECT 1 FROM ...`.

This aligns the generated SQL more closely with the logical plan,
improves compatibility with PostgreSQL, and reduces surprises around
schema shape for aggregate-style queries over external data sources.

## What changes are included in this PR?

This PR makes the following changes:

1. **SelectBuilder semantics for projections**

   * Change `SelectBuilder.projection` from `Vec<ast::SelectItem>` to
     `Option<Vec<ast::SelectItem>>` to distinguish:

     * `None`: projection has not yet been set,
     * `Some(vec![])`: explicitly empty projection, and
     * `Some(vec![...])`: non-empty projection.
   * Update `projection()` to set `Some(value)` and `pop_projections()` to
     `take()` the projection (returning an empty vec by default).
   * Redefine `already_projected()` to return `true` whenever the
     projection has been explicitly set (including the empty case), by
     checking `projection.is_some()`.
   * Adjust `build()` and `Default` to work with the new `Option`-typed
     projection (defaulting to `None` and using `unwrap_or_default()` when
     building the AST).

2. **Dialect capability: empty select list support**

   * Extend the `Dialect` trait with a new method:

     * `fn supports_empty_select_list(&self) -> bool { false }`
   * Document the intended semantics and behavior across common SQL
     engines, with the default returning `false` for maximum compatibility.
   * Override this method in `PostgreSqlDialect` to return `true`, allowing
     `SELECT FROM ...` to be generated.

3. **Unparser handling of empty projections** (see the sketch after this
   list)

   * Add a helper on `Unparser`:

     * `fn empty_projection_fallback(&self) -> Vec<Expr>`

       * Returns an empty vec if `supports_empty_select_list()` is `true`.
       * Returns `vec![Expr::Literal(ScalarValue::Int64(Some(1)), None)]`
         otherwise.
   * Update `unparse_table_scan_pushdown` to:

     * Take `&self` instead of being a purely static helper, so it can
       consult the dialect.
     * When encountering a `TableScan` with `Some(vec![])` as projection and
       `already_projected == false`, use `self.empty_projection_fallback()`
       instead of hard-coding a `1` literal.
   * Update the few call sites of `unparse_table_scan_pushdown` to call the
     instance method (`self.unparse_table_scan_pushdown(...)`).

4. **Tests**

   * Add snapshot tests covering both PostgreSQL and the default dialect
     for empty projection table scans:

     * `test_table_scan_with_empty_projection_in_plan_to_sql_postgres`

       * Asserts `SELECT FROM "table"` for `UnparserPostgreSqlDialect`.
     * `test_table_scan_with_empty_projection_in_plan_to_sql_default_dialect`

       * Asserts `SELECT 1 FROM "table"` for `UnparserDefaultDialect`.
   * Add tests for empty projection with filters:

     * `test_table_scan_with_empty_projection_and_filter_postgres`

       * Asserts `SELECT FROM "table" WHERE ("table"."id" > 10)`.
     * `test_table_scan_with_empty_projection_and_filter_default_dialect`

       * Asserts `SELECT 1 FROM "table" WHERE ("table".id > 10)`.
   * These tests complement the existing
     `table_scan_with_empty_projection_in_plan_to_sql_*` coverage to exercise
     both dialect-specific behavior and interaction with filters.

## Are these changes tested?

Yes. Running the [reproducer
case](apache@ccdda46)
in apache#18991 produces the expected plan and SQL:

`cargo run --example empty_select`

```
use datafusion::error::Result;
use datafusion::prelude::SessionContext;
use datafusion::sql::unparser::{self, Unparser};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.sql("create table t (k int, v int)")
        .await?
        .collect()
        .await?;

    let df = ctx.sql("select from t").await?;

    let plan = df.into_optimized_plan()?;
    println!("{}", plan.display_indent());
    let sql =
        Unparser::new(&unparser::dialect::PostgreSqlDialect {}).plan_to_sql(&plan)?;
    println!("{sql}");

    Ok(())
}
```

```
TableScan: t projection=[]
SELECT FROM "t"
```


* New snapshot tests have been added in `plan_to_sql.rs` to cover:

  * Empty projections for both the PostgreSQL and default dialects.
  * Empty projections combined with a filter predicate.
* Existing `plan_to_sql` tests continue to pass, ensuring that behavior
for non-empty projections and other dialect features is unchanged.

## Are there any user-facing changes?

Yes, for users of the SQL unparser:

* For dialects that support empty select lists (currently PostgreSQL via
  `PostgreSqlDialect`):

  * Logical plans with an explicitly empty projection will now unparse to
    `SELECT FROM ...` instead of `SELECT 1 FROM ...`.
  * This more accurately reflects the logical schema (no columns) and
    avoids introducing a synthetic literal column.
* For dialects that do **not** support empty select lists:

  * The behavior remains effectively the same: the unparser still emits a
    non-empty projection (currently `SELECT 1 FROM ...`).
  * The behavior is now routed through the new
    `supports_empty_select_list` hook, so dialects can opt into different
    fallbacks in the future if needed.

The new `supports_empty_select_list` method on `Dialect` has a default
implementation, so existing dialect implementations remain
source-compatible and do not require changes.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.
…19383)

## Which issue does this PR close?

Related to apache#16756

## Rationale for this change

The existing `sql_dialect.rs` example demonstrates `COPY ... STORED AS
...`, which is actually already fully supported by the standard
`DFParser`.

This PR replaces it with the example from apache#16756: `CREATE EXTERNAL
CATALOG ... STORED AS ... LOCATION ...` with automatic table discovery.

## What changes are included in this PR?

The first commit updates `dialect.rs` to show that `DFParser` already
handles `COPY ... STORED AS`, making it clear this syntax doesn't need
customization.

Example output from `cargo run --example sql_ops -- dialect`:

```
Query: COPY source_table TO 'file.fasta' STORED AS FASTA
--- Parsing without extension ---
Standard DFParser: Parsed as Statement::CopyTo: COPY source_table TO file.fasta STORED AS FASTA

--- Parsing with extension ---
Custom MyParser: Parsed as MyStatement::MyCopyTo: COPY source_table TO 'file.fasta' STORED AS FASTA
```

The second commit adds a new `custom_sql_parser.rs` example that
implements `CREATE EXTERNAL CATALOG my_catalog STORED AS <format>
LOCATION '<url>'` with automatic table discovery from object storage. It
also removes the old `dialect.rs` example.
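
A minimal sketch of the dispatch pattern the example demonstrates: try
the standard `DFParser` first and fall back to custom handling. The
`parse_create_external_catalog` helper named in the comment is a
hypothetical stand-in for the custom parsing logic in
`custom_sql_parser.rs`:

```
use datafusion::sql::parser::DFParser;

// Try the standard parser; hand anything it rejects to the custom one.
fn parse(sql: &str) {
    match DFParser::parse_sql(sql) {
        Ok(statements) => println!("standard DFParser parsed: {statements:?}"),
        Err(e) => {
            println!("standard parser rejected it ({e}); custom parser takes over");
            // parse_create_external_catalog(sql) would run here
        }
    }
}

fn main() {
    // Already handled by the standard parser:
    parse("COPY source_table TO 'file.fasta' STORED AS FASTA");
    // Requires the custom parser:
    parse("CREATE EXTERNAL CATALOG c STORED AS PARQUET LOCATION 's3://bucket/path'");
}
```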

## Are these changes tested?

Yes, the new example is runnable with `cargo run --example sql_ops --
custom_sql_parser` and demonstrates the full flow from parsing custom
DDL through registering the catalog to querying discovered tables.

Example output:

```
=== Part 1: Standard DataFusion Parser ===

Parsing: CREATE EXTERNAL CATALOG parquet_testing
         STORED AS parquet
         LOCATION 'local://workspace/parquet-testing/data'
         OPTIONS (
           'schema_name' = 'staged_data',
           'format.pruning' = 'true'
         )

Error: SQL error: ParserError("Expected: TABLE, found: CATALOG at Line: 1, Column: 17")

=== Part 2: Custom Parser ===

Parsing: CREATE EXTERNAL CATALOG parquet_testing
         STORED AS parquet
         LOCATION 'local://workspace/parquet-testing/data'
         OPTIONS (
           'schema_name' = 'staged_data',
           'format.pruning' = 'true'
         )

  Target Catalog: parquet_testing
  Data Location: local://workspace/parquet-testing/data
  Resolved Schema: staged_data
  Registered 69 tables into schema: staged_data
Executing: SELECT id, bool_col, tinyint_col FROM parquet_testing.staged_data.alltypes_plain LIMIT 5

+----+----------+-------------+
| id | bool_col | tinyint_col |
+----+----------+-------------+
| 4  | true     | 0           |
| 5  | false    | 1           |
| 6  | true     | 0           |
| 7  | false    | 1           |
| 2  | true     | 0           |
+----+----------+-------------+
```

## Are there any user-facing changes?

Documentation only. I replaced the `sql_dialect.rs` example with
`custom_sql_parser.rs` and updated the README. No API changes.
)

## Which issue does this PR close?

- Closes apache#19423.

## Rationale for this change

The functions `arrow_select::merge::merge` and
`arrow_select::merge::merge_n` were first implemented for DataFusion in
`case.rs`. They have since been generalised and moved to `arrow-rs`. Now
that an `arrow-rs` release containing these functions is available,
DataFusion should use them.
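
For intuition, a plain-`Vec` sketch of the mask-driven recombination
these functions perform (not the actual `arrow-rs` signatures, which
operate on Arrow arrays): each CASE branch is evaluated on its filtered
rows, and `merge` zips the branch results back together according to
the selection mask.

```
// Take the next truthy value where the mask is true, the next falsy
// value where it is false.
fn merge<T>(mask: &[bool], truthy: Vec<T>, falsy: Vec<T>) -> Vec<T> {
    let mut truthy = truthy.into_iter();
    let mut falsy = falsy.into_iter();
    mask.iter()
        .map(|&m| {
            let next = if m { truthy.next() } else { falsy.next() };
            next.expect("value counts must match the mask")
        })
        .collect()
}

fn main() {
    // Rows 0 and 2 took the THEN branch; rows 1 and 3 took ELSE.
    let mask = [true, false, true, false];
    assert_eq!(merge(&mask, vec![10, 30], vec![21, 41]), vec![10, 21, 30, 41]);
}
```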

## What changes are included in this PR?

- Remove `merge` and `merge_n` from `case.rs`, along with their unit tests
- Adapt the code to use their equivalents from `arrow-rs`

## Are these changes tested?

Covered by existing unit tests and SLTs

## Are there any user-facing changes?

No
This PR gives each `WorkTable` a name (its table name), so that
`WorkTableExec` can recognize its own `WorkTable`.

Note that it still doesn't allow multiple occurrences of the same CTE
name: things like "join with itself" cannot be implemented correctly
with only the work table.

## Which issue does this PR close?

- Closes apache#18955.

## Rationale for this change

Support nested recursive CTEs without co-recursion. This is useful for
implementing, e.g., SPARQL or other graph query languages.

## What changes are included in this PR?

## Are these changes tested?

Yes! There is a nested recursive query in the test file.

## Are there any user-facing changes?

Nested recursive queries are now allowed instead of failing with a "not
implemented" error; a hypothetical example of the newly allowed shape is
sketched below.
Fixes apache#19162

The `SparkAbs` UDF was using the default `is_nullable=true` for all
outputs, even when inputs were non-nullable. This commit implements
`return_field_from_args` to properly propagate nullability from the
input arguments.

Changes:
- Add a `return_field_from_args` implementation to `SparkAbs`
- Output nullability now matches input nullability
- Handle the edge case where a scalar argument is explicitly null
- Add tests for nullability behavior

## Which issue does this PR close?

Closes apache#19162

## Rationale for this change


`SparkAbs` was always returning `nullable=true` even for non-nullable inputs.

## What changes are included in this PR?

Implement `return_field_from_args` to propagate nullability from the
input arguments.
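
A condensed sketch of the core of that implementation (not the merged
code; the `abs_output_field` helper and the `"abs"` field name are
stand-ins for the trait method on `SparkAbs`):

```
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, FieldRef};
use datafusion_common::ScalarValue;

// Nullable only if the input is nullable, or the argument is a literal
// NULL (the edge case called out above).
fn abs_output_field(input: &FieldRef, scalar_arg: Option<&ScalarValue>) -> FieldRef {
    let nullable =
        input.is_nullable() || scalar_arg.map(|v| v.is_null()).unwrap_or(false);
    Arc::new(Field::new("abs", input.data_type().clone(), nullable))
}

fn main() {
    let non_null: FieldRef = Arc::new(Field::new("x", DataType::Int32, false));
    // Output nullability follows the input:
    assert!(!abs_output_field(&non_null, None).is_nullable());
    // A literal NULL argument forces a nullable output:
    assert!(abs_output_field(&non_null, Some(&ScalarValue::Int32(None))).is_nullable());
}
```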

## Are these changes tested?

Yes, added 2 tests for nullability behavior.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>