Skip to content

Add trailing_nans parameter to control NaN placement in sort results#235

Open
r-devulap wants to merge 11 commits into
numpy:mainfrom
r-devulap:nan
Open

Add trailing_nans parameter to control NaN placement in sort results#235
r-devulap wants to merge 11 commits into
numpy:mainfrom
r-devulap:nan

Conversation

@r-devulap
Copy link
Copy Markdown
Member

@r-devulap r-devulap commented May 11, 2026

Summary of changes in nan branch

trailing_nans parameter

Added bool trailing_nans = true to all sort/select routines (qsort, qselect, partial_qsort, argsort, argselect). When hasnan=true, this controls where NaN values land in the result:

  • trailing_nans=true (default): NaNs at the end, regardless of sort direction
  • trailing_nans=false: NaNs at the beginning, regardless of sort direction

Previously the library replaced NaNs with ±inf as a workaround; now it partitions them directly to the correct end. Bit-exact NaN payloads are preserved.

descending parameter for argselect

Added bool descending = false to argselect, completing its API to match argsort and qselect. When descending=true:

  • The k-th element is the k-th largest
  • All elements before index k are ≥ it
  • SIMD path partitions at mirror position arrsize-1-k then reverses; scalar path selects a std::greater comparator

Tests

  • Added trailing/leading NaN coverage for qsort, argsort, and qselect
  • Replaced the single test_argselect with four tests: ascending, descending, trailing NaNs, leading NaNs
  • New utils/custom-compare.h with NaN-aware comparators shared across tests

All sorting and selection routines (qsort, qselect, partial_qsort,
argsort, argselect) now accept an optional `bool trailing_nans`
parameter (default: true). When hasnan=true:

- trailing_nans=true  (default): NaN values appear at the end of the
  result, independent of sort direction.
- trailing_nans=false: NaN values appear at the beginning of the
  result, independent of sort direction.

Previously, NaN placement was coupled to the sort direction: ascending
always put NaN at the end, descending always put NaN at the beginning.
This change decouples the two.

Implementation notes:
- qsort path: replace_inf_with_nan now takes both `descending` and
  `trailing_nans`; when they differ, the real-value portion is rotated
  before NaN slots are written back.
- qselect path: NaN is moved to the desired end before partitioning,
  keeping the existing move_nans_to_{end,start}_of_array helpers.
- argsort path: after std_argsort_withnan + optional reverse, a
  std::rotate moves NaN indices to the desired end if needed.
- argselect path: comparator lambda respects trailing_nans.
- keyvalue_qsort/select path: descending reverse now only reverses
  the real portion [0, index_last_elem], leaving trailing NaN in place.
- Scalar fallback: four NaN-aware comparators (compare_nan_end/begin,
  compare_arg_nan_end/begin) added to utils/custom-compare.h.

Tests updated to use compare_nan_end for descending reference sorts
and to skip value-comparison checks when arr[k] is NaN.
@seberg
Copy link
Copy Markdown
Member

seberg commented May 11, 2026

Nice, I have to take a slightly closer look. But should it be nans_largest=true or so (I dunno what the best name is). Because I think with the current choice passing descending=true, has_nan=true would default to trailing_nans=true and thus change the result?
(Just a very minor translation of where you use the direct value vs. the "if they differ".)

@MaanasArora
Copy link
Copy Markdown

MaanasArora commented May 11, 2026

Thanks, this is nice! Going to take a deeper look soon - but, I was wondering if it's actually easier to just change replace_nan_with_inf when possible, to use negative (versus positive) infinity based on trailing_nans? Rotation seems a bit roundabout and could have some overhead, I suppose. Though not entirely familiar with how possible that is (though I did try it locally it and it seemed to work).

@r-devulap
Copy link
Copy Markdown
Member Author

Thanks, this is nice! Going to take a deeper look soon - but, I was wondering if it's actually easier to just change replace_nan_with_inf when possible, to use negative (versus positive) infinity based on trailing_nans? Rotation seems a bit roundabout and could have some overhead, I suppose. Though not entirely familiar with how possible that is (though I did try it locally it and it seemed to work).

It might easier to just reuse existing helper functions: move_nans_to_end_of_array and move_nans_to_start_of_array to move the nan's at the start and then only sort the remaining elements. I will make that change.

r-devulap added 9 commits May 25, 2026 04:01
- Add bool descending = false parameter to all argselect APIs and
  internal implementations (avx512, avx2, scalar, static, dispatch)
- In xss_argselect: reverse the index array after partitioning when
  descending is true; pass descending to std_argselect_withnan for
  NaN-array fast path
- std_argselect_withnan: honour descending flag in the comparator
- scalar argselect: pick comparator once based on descending and
  trailing_nans, then call nth_element once
- Tests: replace test_argselect with four typed tests covering
  ascending, descending, trailing NaNs and leading NaNs
- IS_ARG_PARTITIONED: add descending parameter forwarded to
  IS_ARR_PARTITIONED
argselect_ always partitions in ascending order. For descending, we
must select at position arrsize-1-k so the k-th largest lands at that
mirror index; reversing then moves it to position k with the correct
left/right partition invariant.
@r-devulap
Copy link
Copy Markdown
Member Author

@seberg @MaanasArora This looks good to me - happy to merge once you sign off.

@r-devulap
Copy link
Copy Markdown
Member Author

Nice, I have to take a slightly closer look. But should it be nans_largest=true or so (I dunno what the best name is). Because I think with the current choice passing descending=true, has_nan=true would default to trailing_nans=true and thus change the result?

NumPy, as of today, does not use a descending flag. I’m not sure which other libraries rely on this flag, and I’m not particularly concerned about that change. The newly introduced trailing_nans flag handles NaN placement independently of sort order—it simply lets users specify where NaNs should appear, regardless of whether the sort is ascending or descending. Personally, I don’t find assigning an order to NaNs meaningful. In that sense, this flag mainly provides users with a clear preference for where they want their NaNs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants