perf: Optimize array_sort for arrays of non-primitive types#21006
perf: Optimize array_sort for arrays of non-primitive types#21006neilconway wants to merge 4 commits intoapache:mainfrom
array_sort for arrays of non-primitive types#21006Conversation
|
Benchmarks (M4 Max): |
array_sort for arrays of non-primitive types
| let list_arr = if elements.is_empty() { | ||
| GenericListArray::<OffsetSize>::new_null(field, row_count) | ||
| /// Select elements from `values` at the given `indices` using `compute::take`. | ||
| fn take_by_indices<OffsetSize: OffsetSizeTrait>( |
There was a problem hiding this comment.
We use the same pattern in a few different files (e.g., array_reverse, array_distinct) -- probably worth moving to a shared helper. I'll look at that as a separate PR though.
| if list_array.is_null(i) { | ||
| array_lengths.push(0); | ||
| } else { | ||
| let arr_ref = list_array.value(i); |
There was a problem hiding this comment.
This existing code seems very inefficient to me as it converts each list to an individual array and sorts those and then also concatenates those small arrays.
I think one could make a sort kernel that sorts the lists directly in a target buffer, that would be much faster.
| values: sliced, | ||
| options: sort_options, | ||
| }]; | ||
| compute::lexsort_to_indices(&sort_columns, None)? |
There was a problem hiding this comment.
Ideally you could run a kernel on an entire array as well instead of slicing.
| .map(|a| a.as_ref()) | ||
| .collect::<Vec<&dyn Array>>(); | ||
| let sorted_values = if indices.is_empty() { | ||
| values.slice(0, 0) |
There was a problem hiding this comment.
This might keep on to the slice (not sure)? Better would be to create a new empty array.
| indices: &[OffsetSize], | ||
| ) -> Result<ArrayRef> { | ||
| let indices_array: ArrayRef = if OffsetSize::IS_LARGE { | ||
| Arc::new(UInt64Array::from( |
There was a problem hiding this comment.
Could save this UInt64Array as indices is already Vec<OffsetSize> and could be converted to UInt64Array / UInt32Array without allocation
| array_lengths.push(sorted_array.len()); | ||
| arrays.push(sorted_array); | ||
| for &local_idx in sorted_indices.values() { | ||
| indices.push(start + OffsetSize::usize_as(local_idx as usize)); |
There was a problem hiding this comment.
can use indices.extend here as well
Which issue does this PR close?
array_sort#21005.Rationale for this change
array_sortis fast for arrays of primitive types, because it dispatches to Arrow'ssortkernel, which supports in-place sorting for primitive types. However, for arrays of non-primitive types, Arrow's sort kernel doessort_by_indicesand thentake; we callsortfor every row and thenconcatthe results. It would be faster to callsort_by_indicesfor each row ourselves, accumulate all the sorted indices for the entire batch, and then do a singletaketo construct the result set.For arrays of strings, this improves
array_sortperformance by ~40% for 5-element arrays, ~15% for 20-element arrays, and 5% for 100-element arrays. The improvement diminishes as array length increases because the cost of the sort itself begins to dominate.What changes are included in this PR?
array_sortarray_sortarray_sortAre these changes tested?
Yes.
Are there any user-facing changes?
No.