Commit 93c81fa

Merge branch 'main' into col-1214
2 parents 2b813bf + c609dfa commit 93c81fa

21 files changed: +1062 -967 lines changed

.github/workflows/build.yml

Lines changed: 98 additions & 1 deletion
@@ -48,6 +48,10 @@ jobs:
           uv run --no-project ruff check --output-format=github python/
           uv run --no-project ruff format --check python/
 
+      - name: Run codespell
+        run: |
+          uv run --no-project codespell --toml pyproject.toml
+
   generate-license:
     runs-on: ubuntu-latest
     steps:
@@ -271,7 +275,100 @@ jobs:
         with:
           name: dist
           pattern: dist-*
-
+
+  # Documentation build job that runs after wheels are built
+  build-docs:
+    name: Build docs
+    runs-on: ubuntu-latest
+    needs: [build-manylinux-x86_64] # Only need the Linux wheel for docs
+    # Only run docs on main branch pushes, tags, or PRs
+    if: github.event_name == 'push' || github.event_name == 'pull_request'
+    steps:
+      - name: Set target branch
+        if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref_type == 'tag')
+        id: target-branch
+        run: |
+          set -x
+          if test '${{ github.ref }}' = 'refs/heads/main'; then
+            echo "value=asf-staging" >> "$GITHUB_OUTPUT"
+          elif test '${{ github.ref_type }}' = 'tag'; then
+            echo "value=asf-site" >> "$GITHUB_OUTPUT"
+          else
+            echo "Unsupported input: ${{ github.ref }} / ${{ github.ref_type }}"
+            exit 1
+          fi
+
+      - name: Checkout docs sources
+        uses: actions/checkout@v5
+
+      - name: Checkout docs target branch
+        if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref_type == 'tag')
+        uses: actions/checkout@v5
+        with:
+          fetch-depth: 0
+          ref: ${{ steps.target-branch.outputs.value }}
+          path: docs-target
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install dependencies
+        uses: astral-sh/setup-uv@v6
+        with:
+          enable-cache: true
+
+      # Download the Linux wheel built in the previous job
+      - name: Download pre-built Linux wheel
+        uses: actions/download-artifact@v5
+        with:
+          name: dist-manylinux-x86_64
+          path: wheels/
+
+      # Install from the pre-built wheel
+      - name: Install from pre-built wheel
+        run: |
+          set -x
+          uv venv
+          # Install documentation dependencies
+          uv sync --dev --no-install-package datafusion --group docs
+          # Install the pre-built wheel
+          WHEEL=$(find wheels/ -name "*.whl" | head -1)
+          if [ -n "$WHEEL" ]; then
+            echo "Installing wheel: $WHEEL"
+            uv pip install "$WHEEL"
+          else
+            echo "ERROR: No wheel found!"
+            exit 1
+          fi
+
+      - name: Build docs
+        run: |
+          set -x
+          cd docs
+          curl -O https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv
+          curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet
+          uv run --no-project make html
+
+      - name: Copy & push the generated HTML
+        if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref_type == 'tag')
+        run: |
+          set -x
+          cd docs-target
+          # delete anything but: 1) '.'; 2) '..'; 3) .git/
+          find ./ | grep -vE "^./$|^../$|^./.git" | xargs rm -rf
+          cp ../.asf.yaml .
+          cp -r ../docs/build/html/* .
+          git status --porcelain
+          if [ "$(git status --porcelain)" != "" ]; then
+            git config user.name "github-actions[bot]"
+            git config user.email "github-actions[bot]@users.noreply.github.com"
+            git add --all
+            git commit -m 'Publish built docs triggered by ${{ github.sha }}'
+            git push || git push --force
+          fi
+
   # NOTE: PyPI publish needs to be done manually for now after release passed the vote
   # release:
   # name: Publish in PyPI

.github/workflows/docs.yaml

Lines changed: 0 additions & 95 deletions
This file was deleted.

.github/workflows/test.yaml

Lines changed: 0 additions & 7 deletions
@@ -80,13 +80,6 @@ jobs:
         with:
           enable-cache: true
 
-      - name: Check documentation
-        if: ${{ matrix.python-version == '3.10' && matrix.toolchain == 'stable' }}
-        run: |
-          uv sync --dev --group docs --no-install-package datafusion
-          uv run --no-project maturin develop --uv
-          uv run --no-project docs/build.sh
-
       - name: Run tests
         env:
           RUST_BACKTRACE: 1

.pre-commit-config.yaml

Lines changed: 8 additions & 0 deletions
@@ -45,5 +45,13 @@ repos:
         types: [file, rust]
         language: system
 
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.4.1
+    hooks:
+      - id: codespell
+        args: [ --toml, "pyproject.toml"]
+        additional_dependencies:
+          - tomli
+
 default_language_version:
   python: python3

README.md

Lines changed: 1 addition & 1 deletion
@@ -233,7 +233,7 @@ and for `uv run` commands the additional parameter `--no-project`
 git clone git@github.com:apache/datafusion-python.git
 # cd to the repo root
 cd datafusion-python/
-# create the virtual enviornment
+# create the virtual environment
 uv sync --dev --no-install-package datafusion
 # activate the environment
 source .venv/bin/activate

docs/Makefile

Lines changed: 1 addition & 1 deletion
@@ -35,4 +35,4 @@ help:
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
 %: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) --fail-on-warning

docs/source/contributor-guide/ffi.rst

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ optimization levels. If you wish to go down this route, there are two approaches
 have identified you can use.
 
 #. Re-export all of ``datafusion-python`` yourself with your extensions built in.
-#. Carefully synchonize your software releases with the ``datafusion-python`` CI build
+#. Carefully synchronize your software releases with the ``datafusion-python`` CI build
    system so that your libraries use the exact same compiler, features, and
    optimization level.

docs/source/contributor-guide/introduction.rst

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Bootstrap:
 
     # fetch this repo
     git clone git@github.com:apache/datafusion-python.git
-    # create the virtual enviornment
+    # create the virtual environment
     uv sync --dev --no-install-package datafusion
     # activate the environment
     source .venv/bin/activate

docs/source/user-guide/common-operations/expressions.rst

Lines changed: 8 additions & 1 deletion
@@ -64,7 +64,7 @@ Arrays
 ------
 
 For columns that contain arrays of values, you can access individual elements of the array by index
-using bracket indexing. This is similar to callling the function
+using bracket indexing. This is similar to calling the function
 :py:func:`datafusion.functions.array_element`, except that array indexing using brackets is 0 based,
 similar to Python arrays and ``array_element`` is 1 based indexing to be compatible with other SQL
 approaches.
@@ -82,6 +82,13 @@ approaches.
     Indexing an element of an array via ``[]`` starts at index 0 whereas
     :py:func:`~datafusion.functions.array_element` starts at index 1.
 
+Starting in DataFusion 49.0.0 you can also create slices of array elements using
+slice syntax from Python.
+
+.. ipython:: python
+
+    df.select(col("a")[1:3].alias("second_two_elements"))
+
 To check if an array is empty, you can use the function :py:func:`datafusion.functions.array_empty` or `datafusion.functions.empty`.
 This function returns a boolean indicating whether the array is empty.
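
Reviewer sketch for the expressions.rst text above: the documentation contrasts 0-based bracket indexing with the 1-based ``array_element`` function. The plain-Python snippet below only mimics that offset on an ordinary list. The helper ``array_element_1_based`` is a hypothetical stand-in written for this illustration, not the DataFusion function, and the sketch makes no claim about how DataFusion evaluates slices.

```python
def array_element_1_based(values, index):
    """Hypothetical stand-in for a 1-based, SQL-style element lookup."""
    if index < 1:
        raise IndexError("1-based index must be >= 1")
    # Shift by one to reach the equivalent 0-based Python position.
    return values[index - 1]


values = [10, 20, 30]

# Bracket indexing counts from 0, as in Python.
first_by_bracket = values[0]

# The 1-based convention reaches the same element with index 1.
first_by_function = array_element_1_based(values, 1)
```

Under this assumption, moving between the two conventions is a constant offset of one, which is the only point the sketch is meant to make.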

docs/source/user-guide/common-operations/windows.rst

Lines changed: 4 additions & 4 deletions
@@ -24,7 +24,7 @@ In this section you will learn about window functions. A window function utilize
 multiple rows to produce a result for each individual row, unlike an aggregate function that
 provides a single value for multiple rows.
 
-The window functions are availble in the :py:mod:`~datafusion.functions` module.
+The window functions are available in the :py:mod:`~datafusion.functions` module.
 
 We'll use the pokemon dataset (from Ritchie Vink) in the following examples.
 
@@ -99,8 +99,8 @@ If you do not specify a Window Frame, the frame will be set depending on the fol
 criteria.
 
 * If an ``order_by`` clause is set, the default window frame is defined as the rows between
-  unbounded preceeding and the current row.
-* If an ``order_by`` is not set, the default frame is defined as the rows betwene unbounded
+  unbounded preceding and the current row.
+* If an ``order_by`` is not set, the default frame is defined as the rows between unbounded
   and unbounded following (the entire partition).
 
 Window Frames are defined by three parameters: unit type, starting bound, and ending bound.
@@ -116,7 +116,7 @@ The unit types available are:
   ``order_by`` clause.
 
 In this example we perform a "rolling average" of the speed of the current Pokemon and the
-two preceding rows.
+two preceding rows.
 
 .. ipython:: python
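
Reviewer sketch for the windows.rst text above: the "rolling average" over the current row and the two preceding rows can be pictured with plain Python lists. This is an illustration of the frame arithmetic only and does not use the DataFusion API. The function name ``rolling_mean`` and the sample speeds are assumptions made up for this example.

```python
def rolling_mean(values, preceding=2):
    """Average each value with up to `preceding` rows before it.

    Rows near the start have fewer preceding rows, so their frame is
    simply shorter, which matches how a rows-based frame is clipped
    at the start of a partition.
    """
    result = []
    for i in range(len(values)):
        start = max(0, i - preceding)      # clip the frame at the first row
        frame = values[start:i + 1]        # preceding rows plus current row
        result.append(sum(frame) / len(frame))
    return result


speeds = [45, 60, 80, 65]  # made-up sample data
averages = rolling_mean(speeds)
```

Each output row still corresponds to one input row, which is the property that distinguishes a window function from an aggregate.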
