@WillAyd
Contributor

@WillAyd WillAyd commented Mar 18, 2025

Rationale for this change

This helps simplify the steps to build PyArrow by leveraging Meson, a build system whose syntax is strongly inspired by Python. In its current form, it requires Arrow to be installed on the host system, but in the future we may even be able to have PyArrow build Arrow as a subproject, as needed.

What changes are included in this PR?

This PR adds Meson configuration files to the Python code base within Arrow.

Are these changes tested?

Yes

Are there any user-facing changes?

We may want to deprecate the traditional setup.py way of building PyArrow alongside this.

@github-actions

⚠️ GitHub issue #36411 has been automatically assigned in GitHub to PR creator.

Before doing this, include the following code:

# no-op placeholder
arrow_dep = dependency('', required: false)

if get_option('wrap_mode') != 'forcefallback'
  arrow_dep = dependency('arrow', 'Arrow', modules: ['Arrow::arrow_shared'], required: false)
endif

And then shift the rest to look like this:

if not arrow_dep.found()
    cmake = import('cmake')
    # further lookups
    # ...
    arrow_dep = arrow_proj.dependency('arrow_shared')
endif

@eli-schwartz eli-schwartz Mar 18, 2025

What this does:

  • check build options for wrap_mode, which is a builtin meson option allowing you to choose whether you wish to resolve bundled dependencies or look for system dependencies. It defaults to finding system dependencies, but when users run meson with --wrap-mode=forcefallback they are asking to explicitly avoid system deps
  • first try to find an arrow dependency, using both names it might be available as:
    • "arrow" (pkgconfig)
    • "Arrow" (cmake, yes capitalization does matter), with modules: ensuring we pick up the correct cmake find_package() variable
  • if it is not available, required: false means we continue to import the cmake subproject as a fallback
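For readers less familiar with Meson, the lookup order described above can be modelled in a few lines of Python. This is only a simplified sketch of the decision logic, not Meson's implementation: `resolve_arrow`, `system_names` and the returned strings are hypothetical names chosen for illustration.

```python
from typing import Iterable, Set


def resolve_arrow(
    wrap_mode: str,
    system_names: Set[str],
    candidates: Iterable[str] = ("arrow", "Arrow"),
) -> str:
    """Say where Arrow would come from under the logic described above.

    wrap_mode    -- value of the wrap_mode build option
    system_names -- dependency names that a system lookup would find
    candidates   -- names tried in order (pkg-config name, then CMake name)
    """
    # forcefallback means: do not even look for a system dependency.
    if wrap_mode != "forcefallback":
        for name in candidates:
            # The comparison is case-sensitive, so "arrow" != "Arrow".
            if name in system_names:
                return f"system:{name}"
    # Nothing found (or lookup skipped): fall back to the CMake subproject.
    return "subproject"


print(resolve_arrow("default", {"arrow"}))        # system:arrow
print(resolve_arrow("default", {"Arrow"}))        # system:Arrow
print(resolve_arrow("forcefallback", {"arrow"}))  # subproject
print(resolve_arrow("default", set()))            # subproject
```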

It is possible to avoid doing if/else checks:

$ cat subprojects/arrow.wrap

[wrap-file]
directory = arrow
method = cmake

[provide]
arrow = arrow_static_dep

However, using wrap files with method=cmake doesn't (currently) allow you to pass your add_cmake_defines. If you didn't need any defines, then you could simply do this:

arrow_dep = dependency('arrow', 'Arrow', modules: ['Arrow::arrow_shared'])

and you would not need any if/else: Meson would automatically build the cmake subproject if either:

  • wrap-mode=forcefallback
  • no system arrow was found

Contributor Author

How does that wrap file work? The directory to the cpp source is in arrow/cpp whereas the wrap file itself will be located in arrow/python/subprojects - how would that resolve to the right directory?

Since you included a symlink anyway, I chose not to bother including any mechanism for downloading the wrap contents. Meson skips over that because the directory already exists with the correct content.

The key benefit of the wrap file is that it allows specifying in ini syntax:

  • the subproject should use the method=cmake automatically, when used via dependency()
  • the autogenerated arrow_static_dep (maybe this should be arrow_shared_dep instead?) will fulfill dependency('arrow')

Again, it's missing the necessary cmake defines so it may not be worth pursuing further.
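Since the wrap file is plain ini syntax, its contents can be inspected with Python's standard `configparser`. The sketch below only illustrates how the two sections from the example wrap file map to "how to build" and "what is provided"; `read_wrap` and `WRAP_TEXT` are hypothetical names, and this is not a description of how Meson itself reads the file.

```python
import configparser

# The wrap file from the example in this thread, inlined for illustration.
WRAP_TEXT = """\
[wrap-file]
directory = arrow
method = cmake

[provide]
arrow = arrow_static_dep
"""


def read_wrap(text: str) -> dict:
    """Return the build method, directory and provided dependency names."""
    parser = configparser.ConfigParser()
    # Keep option names case-sensitive: 'arrow' and 'Arrow' are different
    # dependency names, and configparser lowercases keys by default.
    parser.optionxform = str
    parser.read_string(text)
    return {
        "directory": parser["wrap-file"]["directory"],
        "method": parser["wrap-file"]["method"],
        "provides": dict(parser["provide"]),
    }


info = read_wrap(WRAP_TEXT)
print(info["method"])    # cmake
print(info["provides"])  # {'arrow': 'arrow_static_dep'}
```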

@WillAyd WillAyd force-pushed the use-meson-python branch 2 times, most recently from cf5b610 to b902e1d Compare March 18, 2025 21:50
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 18, 2025
@WillAyd WillAyd force-pushed the use-meson-python branch 6 times, most recently from eabf11f to 7be3f7b Compare March 18, 2025 22:43
@eli-schwartz eli-schwartz Mar 18, 2025

You could still use setuptools-scm with meson if you want.

project(
    'pyarrow',
    # ...., 
    version: run_command('python3', '-m', 'setuptools_scm', '--force-write-version-files', check: true).stdout().strip(),
)

@WillAyd WillAyd force-pushed the use-meson-python branch 2 times, most recently from 05ff60d to 6e0a5fe Compare March 19, 2025 03:58
@WillAyd
Contributor Author

WillAyd commented Mar 20, 2025

@kou I have made some offline progress on this, but one of the things I am getting stuck on is how the pyarrow C++ modules are being compiled. From what I understand, the current build process will compile Cython modules first (at least lib.pyx) and from that auto-generate lib.h and lib_api.h headers that the pyarrow modules can then reference (?)

Assuming that understanding is correct, where in the process are lib.h and lib_api.h being generated? I found the CMake command that copies them from the source to the build folder, but I can't figure out where they come from in the first place. Any guidance would be appreciated.

@kou
Member

kou commented Mar 20, 2025

The following code may be related:

if(${property_is_api})
  set(_generated_files "${output_file}" "${_name}.h" "${_name}_api.h")
elseif(${property_is_public})
  set(_generated_files "${output_file}" "${_name}.h")
else()
  set(_generated_files "${output_file}")
endif()

set_source_files_properties(pyarrow/lib.pyx PROPERTIES CYTHON_API TRUE)
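As a reading aid, the branch above can be restated in Python. This is a direct transcription of the three CMake cases, with the assumption that `CYTHON_API TRUE` is what sets `property_is_api`; the function name and the `lib.cpp` output name are hypothetical.

```python
from typing import List


def generated_files(
    output_file: str,
    name: str,
    is_api: bool = False,
    is_public: bool = False,
) -> List[str]:
    """List the files expected for one Cython module.

    Mirrors the CMake branch: API modules get <name>.h and <name>_api.h,
    public modules get only <name>.h, everything else just the output file.
    """
    if is_api:
        return [output_file, f"{name}.h", f"{name}_api.h"]
    if is_public:
        return [output_file, f"{name}.h"]
    return [output_file]


# pyarrow/lib.pyx is marked CYTHON_API, so both headers are expected.
print(generated_files("lib.cpp", "lib", is_api=True))
# ['lib.cpp', 'lib.h', 'lib_api.h']
```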

@WillAyd
Contributor Author

WillAyd commented Mar 20, 2025

Ah, never mind, I think I have figured it out. It looks like Cython generates the header files in the build directory when compiling lib.pyx, so the idea is to copy those header files into a directory structure in the build directory that the sources can resolve to.

I'll have to think about the best way to accomplish that via Meson.
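To make the copy step concrete, here is a rough Python sketch of staging the generated headers into an include-style directory. It is not the Meson solution, which is still open at this point; `stage_headers` and the `include/pyarrow` layout are assumptions made only for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def stage_headers(
    generated_dir: Path,
    include_dir: Path,
    names: Iterable[str] = ("lib.h", "lib_api.h"),
) -> List[Path]:
    """Copy Cython-generated headers to where the C++ sources can find them."""
    include_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for name in names:
        src = generated_dir / name
        if not src.is_file():
            # Fail loudly: a missing header means Cython has not run yet.
            raise FileNotFoundError(f"expected generated header: {src}")
        staged.append(Path(shutil.copy2(src, include_dir / name)))
    return staged


# Small demonstration using a throwaway directory in place of a build tree.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp)
    for header in ("lib.h", "lib_api.h"):
        (build / header).write_text("/* generated by Cython */\n")
    copied = stage_headers(build, build / "include" / "pyarrow")
    print([p.name for p in copied])  # ['lib.h', 'lib_api.h']
```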

@WillAyd WillAyd force-pushed the use-meson-python branch 2 times, most recently from d67b903 to ba8b276 Compare March 20, 2025 18:01
@WillAyd WillAyd force-pushed the use-meson-python branch 6 times, most recently from a2d07ad to 4ff818e Compare March 21, 2025 00:39
@github-actions github-actions bot added the awaiting changes Awaiting changes label Jan 14, 2026
@WillAyd
Contributor Author

WillAyd commented Jan 14, 2026

Wow, thanks for finding that! Feel free to push directly. Thanks @raulcd!

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jan 14, 2026
@raulcd
Member

raulcd commented Jan 29, 2026

I have been able to fix the sdist failures and all the wheel builds by applying the changes in this commit:
946977c
The CI jobs can be seen here:
#48882 (comment)
#48882 (comment)

I think we can apply the commit to this branch, validate CI and continue with review discussions.

Thanks @WillAyd for your help during the process of fixing all the issues that we've found during the last couple of weeks!

@raulcd
Member

raulcd commented Jan 30, 2026

@github-actions crossbow submit -g python -g wheel

@github-actions

Revision: b5ee816

Submitted crossbow builds: ursacomputing/crossbow @ actions-616f6b7be9

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
python-sdist GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-1.3.4-numpy-1.21.2 GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.12-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.13-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.13-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.14 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-debian-13-python-3-amd64 GitHub Actions
test-debian-13-python-3-i386 GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions
wheel-macos-monterey-cp310-cp310-amd64 GitHub Actions
wheel-macos-monterey-cp310-cp310-arm64 GitHub Actions
wheel-macos-monterey-cp311-cp311-amd64 GitHub Actions
wheel-macos-monterey-cp311-cp311-arm64 GitHub Actions
wheel-macos-monterey-cp312-cp312-amd64 GitHub Actions
wheel-macos-monterey-cp312-cp312-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-macos-monterey-cp314-cp314-amd64 GitHub Actions
wheel-macos-monterey-cp314-cp314-arm64 GitHub Actions
wheel-macos-monterey-cp314-cp314t-amd64 GitHub Actions
wheel-macos-monterey-cp314-cp314t-arm64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314-arm64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-amd64 GitHub Actions
wheel-manylinux-2-28-cp314-cp314t-arm64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314-amd64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314-arm64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314t-amd64 GitHub Actions
wheel-musllinux-1-2-cp314-cp314t-arm64 GitHub Actions
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp314-cp314-amd64 GitHub Actions
wheel-windows-cp314-cp314t-amd64 GitHub Actions

@WillAyd
Contributor Author

WillAyd commented Jan 30, 2026

From a quick glance at some of the crossbow failures, looks like we are missing meson-python in python/requirements-wheel-build.txt

@raulcd
Member

raulcd commented Jan 30, 2026

That's curious, I wonder why it worked on the PR here (#48882 (comment)):
Example: wheel-macos-monterey-cp311-cp311-amd64

2026-01-29T10:26:49.2206740Z + python -m build --sdist --wheel . -Csetup-args=-Dbuildtype=release -Csetup-args=-Dacero=enabled -Csetup-args=-Dazure=enabled -Csetup-args=-Dcuda=auto -Csetup-args=-Ddataset=enabled -Csetup-args=-Dflight=enabled -Csetup-args=-Dgandiva=disabled -Csetup-args=-Dgcs=enabled -Csetup-args=-Dhdfs=enabled -Csetup-args=-Dorc=enabled -Csetup-args=-Dparquet=enabled -Csetup-args=-Dparquet_require_encryption=enabled -Csetup-args=-Ds3=enabled -Csetup-args=-Dsubstrait=enabled -Ccompile-args=-v -Csetup-args=--pkg-config-path=/Users/runner/work/crossbow/crossbow/build/install/lib/pkgconfig
2026-01-29T10:26:49.3473820Z * Creating isolated environment: venv+pip...
2026-01-29T10:26:49.3538720Z * Installing packages in isolated environment:
2026-01-29T10:26:49.3539450Z   - cython >= 3.1
2026-01-29T10:26:49.3540040Z   - meson-python
2026-01-29T10:26:49.3540370Z   - numpy>=1.25
2026-01-29T10:26:49.3541200Z   - setuptools_scm[toml]>=8
2026-01-29T10:26:52.6034430Z * Getting build dependencies for sdist...
2026-01-29T10:26:52.8860060Z * Building sdist...

Edit: Sorry, I've just realized: after rebasing onto the latest changes, the build now uses --no-isolation. That makes sense; I'll add meson-python to the wheel build requirements.

@WillAyd
Contributor Author

WillAyd commented Jan 30, 2026

In the log you linked it looks like there is no --no-isolation flag, so the pip install is grabbing its own local copy of meson-python from pyproject.toml.

In the crossbow failures, --no-isolation is there, so it's up to the environment to provide meson-python.
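A quick way to see the difference in practice: with --no-isolation, the build backend and its requirements must already be importable in the active environment. The following sketch checks for that before starting a build. The helper name is hypothetical, and the module list is an assumption based on the isolated-build log above (`mesonpy` is the import name of meson-python).

```python
import importlib.util
from typing import Iterable, List


def missing_build_modules(modules: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be imported here."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


# Import names assumed to correspond to the packages in the build log:
# meson-python, cython, numpy and setuptools_scm.
required = ["mesonpy", "Cython", "numpy", "setuptools_scm"]
missing = missing_build_modules(required)
if missing:
    print("not importable in this environment:", ", ".join(missing))
else:
    print("all build requirements are importable")
```

If `mesonpy` shows up as missing, a build run with --no-isolation would be expected to fail in the same way as the crossbow jobs.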

@raulcd
Member

raulcd commented Jan 30, 2026

@github-actions crossbow submit wheel-*cp313-cp313-amd64

@github-actions

Revision: d8844b4

Submitted crossbow builds: ursacomputing/crossbow @ actions-6e13bde35b

Task Status
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
