Skip to content

GH-49888: [C++][Compute] Fix count for run-end encoded arrays with nulls#49908

Merged
pitrou merged 2 commits intoapache:mainfrom
fenfeng9:fix/ree-count-nulls
May 6, 2026
Merged

GH-49888: [C++][Compute] Fix count for run-end encoded arrays with nulls#49908
pitrou merged 2 commits intoapache:mainfrom
fenfeng9:fix/ree-count-nulls

Conversation

@fenfeng9
Copy link
Copy Markdown
Contributor

@fenfeng9 fenfeng9 commented May 2, 2026

Rationale for this change

The count kernel used GetNullCount(), which reports the physical null count. For run-end encoded arrays, this ignored nulls in the encoded values child.

What changes are included in this PR?

Use ComputeLogicalNullCount() in the count kernel so run-end encoded arrays are counted correctly. Add C++ and Python tests for this case.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@fenfeng9
Copy link
Copy Markdown
Contributor Author

fenfeng9 commented May 2, 2026

The added tests use a simplified version of the reproducer from the issue.

Reproduce

import pyarrow as pa
import pyarrow.compute as pc

array = pa.array([1, None])
encoded = pc.run_end_encode(array)

print(f"plain only_null: ", pc.count(array, mode="only_null"))
print(f"run_end_encode only_null: ", pc.count(encoded, mode="only_null"))

Result

plain only_null:  1
run_end_encode only_null:  0

@fenfeng9 fenfeng9 changed the title GH-49888: [C++][Compute] Count logical nulls in run-end encoded arrays GH-49888: [C++][Compute] Fix count for run-end encoded arrays with nulls May 2, 2026
Comment thread cpp/src/arrow/compute/kernels/aggregate_test.cc Outdated
@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 4, 2026
Comment thread python/pyarrow/tests/test_compute.py Outdated
Copy link
Copy Markdown
Contributor

@tadeja tadeja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix, @fenfeng9 ! 🌞 A few additional test suggestions from my end as inline comments.
The MinGW64 CI failures look unrelated. There were some overlapping failures on main.

@fenfeng9 fenfeng9 force-pushed the fix/ree-count-nulls branch from 9deb821 to 5fa94cc Compare May 5, 2026 13:07
@fenfeng9
Copy link
Copy Markdown
Contributor Author

fenfeng9 commented May 5, 2026

Thanks for the suggestion. I updated the C++ and Python tests.

@fenfeng9
Copy link
Copy Markdown
Contributor Author

fenfeng9 commented May 5, 2026

The original behavior is:

Reproduce

import pyarrow as pa
import pyarrow.compute as pc


array = pa.array([1, 1, None, None, None, 2, 2, 2, None, 3])
encoded = pc.run_end_encode(array)
# Logical slice: [None, None, 2, 2, 2, None].
slice_plain = array.slice(3, 6)
slice_encoded = encoded.slice(3, 6)

print("pyarrow:", pa.__version__)
print()

print(f"{'case':<12} {'only_valid':>10} {'only_null':>10} {'all':>6}")
for name, value in [
    ("plain", array),
    ("ree", encoded),
    ("slice plain", slice_plain),
    ("slice ree", slice_encoded),
]:
    print(
        f"{name:<12} "
        f"{pc.count(value, mode='only_valid').as_py():>10} "
        f"{pc.count(value, mode='only_null').as_py():>10} "
        f"{pc.count(value, mode='all').as_py():>6}"
    )

Result

pyarrow: 24.0.0

case         only_valid  only_null    all
plain                 6          4     10
ree                  10          0     10
slice plain           3          3      6
slice ree             6          0      6

@fenfeng9
Copy link
Copy Markdown
Contributor Author

fenfeng9 commented May 5, 2026

The test failures look unrelated.

@fenfeng9 fenfeng9 requested a review from tadeja May 5, 2026 15:04
Copy link
Copy Markdown
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @fenfeng9 !

@pitrou
Copy link
Copy Markdown
Member

pitrou commented May 6, 2026

And thanks @tadeja for the useful reviewing!

@pitrou pitrou merged commit 1b3f313 into apache:main May 6, 2026
61 of 64 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label May 6, 2026
@github-actions github-actions Bot added the awaiting committer review Awaiting committer review label May 6, 2026
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 1b3f313.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants