GH-49888: [C++][Compute] Fix count for run-end encoded arrays with nulls by fenfeng9 · Pull Request #49908 · apache/arrow

fenfeng9 · 2026-05-02T10:34:40Z

Rationale for this change

The count kernel used GetNullCount(), which reports the physical null count. A run-end encoded array has no validity bitmap of its own, so its physical null count is always 0 and nulls stored in the encoded values child were ignored.

What changes are included in this PR?

Use ComputeLogicalNullCount() in the count kernel so run-end encoded arrays are counted correctly. Add C++ and Python tests for this case.
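
As an illustration of the physical versus logical distinction, here is a minimal pure-Python sketch. It is not Arrow's implementation: the list-based representation and the helper names `run_end_encode` and `logical_null_count` are hypothetical, chosen only to show why nulls must be weighted by run length.

```python
def run_end_encode(items):
    """Hypothetical helper: encode a list as (run_ends, values).

    run_ends[i] is the exclusive logical end index of run i and
    values[i] is the value repeated over that run (None marks a null run).
    """
    run_ends, values = [], []
    for index, item in enumerate(items):
        if values and values[-1] == item:
            # Same value as the previous element: extend the current run.
            run_ends[-1] = index + 1
        else:
            # New value: start a new run ending just past this element.
            values.append(item)
            run_ends.append(index + 1)
    return run_ends, values


def logical_null_count(run_ends, values):
    """Count nulls as seen by a reader of the decoded array."""
    nulls, run_start = 0, 0
    for run_end, value in zip(run_ends, values):
        if value is None:
            # A null run contributes one logical null per repeated element.
            nulls += run_end - run_start
        run_start = run_end
    return nulls


run_ends, values = run_end_encode([1, None])
# The encoded parent carries no validity bitmap, so a physical count on it
# sees 0 nulls; the logical count has to look into the values child.
print(run_ends, values, logical_null_count(run_ends, values))
```

For the `[1, None]` input from the issue, the null lives only in the values child, which is why a count based on the parent's physical null count misses it.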

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

GitHub Issue: [C++][Compute] count kernel miscounts when run-end encoded array contains null #49888

fenfeng9 · 2026-05-02T10:40:47Z

The added tests use a simplified version of the reproducer from the issue.

Reproduce

import pyarrow as pa
import pyarrow.compute as pc

array = pa.array([1, None])
encoded = pc.run_end_encode(array)

print("plain only_null: ", pc.count(array, mode="only_null"))
print("run_end_encode only_null: ", pc.count(encoded, mode="only_null"))

Result

plain only_null:  1
run_end_encode only_null:  0

tadeja

Thank you for the fix, @fenfeng9 ! 🌞 A few additional test suggestions from my end as inline comments.
The MinGW64 CI failures look unrelated. There were some overlapping failures on main.

fenfeng9 · 2026-05-05T13:07:45Z

Thanks for the suggestion. I updated the C++ and Python tests.

fenfeng9 · 2026-05-05T13:11:11Z

The original behavior is:

Reproduce

import pyarrow as pa
import pyarrow.compute as pc


array = pa.array([1, 1, None, None, None, 2, 2, 2, None, 3])
encoded = pc.run_end_encode(array)
# Logical slice: [None, None, 2, 2, 2, None].
slice_plain = array.slice(3, 6)
slice_encoded = encoded.slice(3, 6)

print("pyarrow:", pa.__version__)
print()

print(f"{'case':<12} {'only_valid':>10} {'only_null':>10} {'all':>6}")
for name, value in [
    ("plain", array),
    ("ree", encoded),
    ("slice plain", slice_plain),
    ("slice ree", slice_encoded),
]:
    print(
        f"{name:<12} "
        f"{pc.count(value, mode='only_valid').as_py():>10} "
        f"{pc.count(value, mode='only_null').as_py():>10} "
        f"{pc.count(value, mode='all').as_py():>6}"
    )

Result

pyarrow: 24.0.0

case         only_valid  only_null    all
plain                 6          4     10
ree                  10          0     10
slice plain           3          3      6
slice ree             6          0      6
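
For comparison, the counts that the `ree` and `slice ree` rows should match (the `plain` rows above) can be derived by hand from the run ends. The following is a sketch in pure Python under stated assumptions: `count_ree` and its `offset`/`length` parameters are hypothetical names that only mimic the three count modes; it does not call pyarrow and is not the kernel's actual code.

```python
from bisect import bisect_right

MODES = ("only_valid", "only_null", "all")


def count_ree(run_ends, values, mode="only_valid", offset=0, length=None):
    """Hypothetical slice-aware count over a run-end encoded sequence."""
    total = run_ends[-1] if run_ends else 0
    if length is None:
        length = total - offset
    stop = offset + length

    nulls = 0
    # First run whose end lies strictly past the slice offset.
    i = bisect_right(run_ends, offset)
    while i < len(run_ends):
        run_start = run_ends[i - 1] if i > 0 else 0
        if run_start >= stop:
            break
        if values[i] is None:
            # Clip the run to the slice window before counting its nulls.
            nulls += min(run_ends[i], stop) - max(run_start, offset)
        i += 1

    if mode == "only_null":
        return nulls
    if mode == "only_valid":
        return length - nulls
    if mode == "all":
        return length
    raise ValueError(f"unknown mode: {mode!r}")


# Encoding of [1, 1, None, None, None, 2, 2, 2, None, 3].
run_ends = [2, 5, 8, 9, 10]
values = [1, None, 2, None, 3]

full = tuple(count_ree(run_ends, values, m) for m in MODES)
sliced = tuple(count_ree(run_ends, values, m, offset=3, length=6) for m in MODES)
print("ree      ", full)
print("slice ree", sliced)
```

The slice case matters because the window can start or end in the middle of a run, so each null run has to be clipped to the slice before its length is added.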

fenfeng9 · 2026-05-05T15:04:22Z

The test failures look unrelated.

pitrou

Thank you @fenfeng9 !

pitrou · 2026-05-06T14:23:23Z

And thanks @tadeja for the useful reviewing!

conbench-apache-arrow · 2026-05-06T15:17:48Z

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 1b3f313.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

fenfeng9 requested review from AlenkaF, raulcd and rok as code owners May 2, 2026 10:34

github-actions Bot added the Component: C++, Component: Python, and awaiting review labels May 2, 2026

fenfeng9 changed the title ~~GH-49888: [C++][Compute] Count logical nulls in run-end encoded arrays~~ GH-49888: [C++][Compute] Fix count for run-end encoded arrays with nulls May 2, 2026

tadeja reviewed May 4, 2026

Comment thread cpp/src/arrow/compute/kernels/aggregate_test.cc Outdated

github-actions Bot added the awaiting committer review label and removed the awaiting review label May 4, 2026

tadeja reviewed May 4, 2026

Comment thread python/pyarrow/tests/test_compute.py Outdated

tadeja suggested changes May 4, 2026

fenfeng9 added 2 commits May 5, 2026 20:54

apacheGH-49888: [C++][Compute] Count logical nulls in run-end encoded arrays

a16aac5

apacheGH-49888: Strengthen count tests for run-end encoded arrays

5fa94cc

fenfeng9 force-pushed the fix/ree-count-nulls branch from 9deb821 to 5fa94cc May 5, 2026 13:07

fenfeng9 requested a review from tadeja May 5, 2026 15:04

pitrou approved these changes May 6, 2026

pitrou merged commit 1b3f313 into apache:main May 6, 2026
61 of 64 checks passed

pitrou removed the awaiting committer review label May 6, 2026

pitrou mentioned this pull request May 6, 2026

[C++][Compute] count kernel miscounts when run-end encoded array contains null #49888

Closed

github-actions Bot added the awaiting committer review label May 6, 2026


