[CUB] Replace Shuffle(Up|Down|Index) with cuda::device::warp_shuffle - RadixSort only #8395

Open
fbusato wants to merge 2 commits into NVIDIA:main from fbusato:warp-shuffle-radix-sort

@fbusato
Contributor

fbusato commented Apr 13, 2026

Description

Split

The change affects only the RadixSort routines, which helps isolate potential performance regressions.

fbusato self-assigned this Apr 13, 2026
fbusato requested a review from a team as a code owner April 13, 2026 18:18
fbusato added the cub For all items related to CUB label Apr 13, 2026
fbusato added this to CCCL Apr 13, 2026
fbusato requested a review from a team as a code owner April 13, 2026 18:18
fbusato requested a review from jrhemstad April 13, 2026 18:18
github-project-automation bot moved this to Todo in CCCL Apr 13, 2026
cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 13, 2026
Contributor

miscco left a comment

The code changes look good to me. However, it looks like there were no benchmark results.

@bernhardmgruber
Contributor

Perf results look ok on sm120, but not on the other architectures.

@fbusato
Contributor Author

fbusato commented Apr 14, 2026

> Perf results look ok on sm120, but not on the other architectures.

Funnily enough, I just checked the SASS for Radix Sort on SM90, SM100, and SM120 for all benchmark types (int8_t, int16_t, int32_t, int64_t, int128_t, float, double). There is no difference; they are identical.

If we cannot get a stable performance comparison even for identical SASS, we have a problem (@gevtushenko).

Coincidence: NVIDIA/nvbench#316 (comment)
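
For readers who want to repeat the "identical SASS" check, here is a minimal sketch of how two disassembly listings could be compared. It is not the method used in this PR. It assumes the listings were already dumped to text by some disassembler (for example `cuobjdump -sass`) for the build before and after the change. The function names `normalize_listing` and `listing_diff` are hypothetical. Only spacing and blank lines are normalized; address and encoding annotations are kept so that any difference in them also shows up.

```python
import difflib


def normalize_listing(text):
    """Return the non-empty lines of a listing with whitespace runs collapsed."""
    lines = []
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse tabs and repeated spaces
        if line:  # drop blank lines
            lines.append(line)
    return lines


def listing_diff(before, after):
    """Return unified-diff lines between two listings; an empty list means identical."""
    return list(
        difflib.unified_diff(
            normalize_listing(before),
            normalize_listing(after),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )
```

An empty result from `listing_diff` means the two listings match line for line after whitespace normalization. In that case any remaining benchmark delta would have to come from something other than the generated instructions, such as measurement noise.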

@github-actions
Contributor

⚠️ Benchmark Results

Benchmark comparison cancelled.

