perf: Optimize split_part using bulk-NULL string builders#22283

Open
neilconway wants to merge 1 commit into apache:main from neilconway:neilc/perf-split-part-builder

Conversation

@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

split_part currently uses the Arrow StringBuilder types and tracks NULLs row by row while building its output. This PR switches to the new bulk-NULL string builders, so the output NULL bitmap is computed once for the whole batch.

Benchmarks (Arm64):

  • scalar_utf8_single_char / pos_first: 44.6 µs → 39.1 µs (−11.2%)
  • scalar_utf8_single_char / pos_middle: 102.6 µs → 95.8 µs (−6.4%)
  • scalar_utf8_single_char / pos_negative: 48.6 µs → 42.5 µs (−12.4%)
  • scalar_utf8_multi_char / pos_middle: 134.1 µs → 130.4 µs (−2.9%)
  • scalar_utf8_long_strings / pos_middle: 1089 µs → 1101 µs (+1.3%, within noise)
  • scalar_utf8view_long_parts / pos_middle: 140.6 µs → 138.0 µs (−2.0%, within noise)
  • scalar_utf8view_very_long_parts / pos_first: 68.9 µs → 69.4 µs (+1.3%, within noise)
  • array_utf8_single_char / pos_middle: 360.2 µs → 346.6 µs (−3.9%)
  • array_utf8_multi_char / pos_middle: 354.3 µs → 343.2 µs (−2.2%, borderline)

What changes are included in this PR?

  • Switch to new string builder types; compute NULLs in bulk via NullBuffer::union_many
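The idea behind bulk NULL handling can be sketched without Arrow at all. The following is a minimal, std-only illustration, not the code in this PR: `Validity`, `union_nulls`, `split_part_one`, and `split_part_bulk` are hypothetical names, and `union_nulls` only stands in for the role that NullBuffer::union_many plays here (its real signature is not shown in this description). The sketch assumes the Arrow convention that a set bit means "not NULL", assumes positions are already validated as non-zero, and assumes an empty delimiter treats the whole string as a single part.

```rust
/// Validity bitmap: bit i set means row i is non-NULL.
#[derive(Debug, Clone, PartialEq)]
struct Validity {
    words: Vec<u64>,
    len: usize,
}

impl Validity {
    fn from_bools(bits: &[bool]) -> Self {
        let mut words = vec![0u64; (bits.len() + 63) / 64];
        for (i, &b) in bits.iter().enumerate() {
            if b {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Validity { words, len: bits.len() }
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }
}

/// Hypothetical stand-in for a "union of NULLs" helper: a row is valid only
/// if it is valid in every input. `None` means "this input has no NULLs".
/// The work is a word-wise AND, 64 rows at a time.
fn union_nulls(inputs: &[Option<&Validity>]) -> Option<Validity> {
    let mut acc: Option<Validity> = None;
    for v in inputs.iter().copied().flatten() {
        acc = Some(match acc {
            None => v.clone(),
            Some(mut a) => {
                assert_eq!(a.len, v.len, "inputs must have the same length");
                for (w, o) in a.words.iter_mut().zip(&v.words) {
                    *w &= *o;
                }
                a
            }
        });
    }
    acc
}

/// One row of split_part: 1-based position, negative counts from the end,
/// out-of-range positions give "". Assumes n != 0 was rejected earlier.
fn split_part_one<'a>(s: &'a str, delim: &str, n: i64) -> &'a str {
    if delim.is_empty() {
        return if n == 1 || n == -1 { s } else { "" };
    }
    let part = if n > 0 {
        s.split(delim).nth((n - 1) as usize)
    } else if n < 0 {
        s.rsplit(delim).nth((-(n + 1)) as usize)
    } else {
        None
    };
    part.unwrap_or("")
}

/// Output column: values and validity are produced independently.
struct StringColumn {
    values: Vec<String>,
    validity: Option<Validity>,
}

fn split_part_bulk(
    strings: &[&str],
    str_valid: Option<&Validity>,
    delims: &[&str],
    delim_valid: Option<&Validity>,
    positions: &[i64],
    pos_valid: Option<&Validity>,
) -> StringColumn {
    // Step 1: combine the input bitmaps once for the whole batch.
    let validity = union_nulls(&[str_valid, delim_valid, pos_valid]);
    // Step 2: build values only; NULL rows get an empty placeholder and the
    // value builder never appends to a NULL bitmap itself.
    let values = (0..strings.len())
        .map(|i| {
            let valid = validity.as_ref().map_or(true, |v| v.is_valid(i));
            if valid {
                split_part_one(strings[i], delims[i], positions[i]).to_string()
            } else {
                String::new()
            }
        })
        .collect();
    StringColumn { values, validity }
}
```

In this sketch the bitmap for the output is attached after the values are built, which is the separation the change relies on: the per-row loop only deals with string data, and the NULL computation is a handful of word-wise operations.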

Are these changes tested?

Yes, covered by existing tests.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the functions Changes to functions implementation label May 16, 2026


Development

Successfully merging this pull request may close these issues.

Optimize split_part using bulk-NULL string builders
