NanoVDB PointsToGrid::countNodes: Use segmented radix sort for higher tile counts#2170
swahtz wants to merge 3 commits into AcademySoftwareFoundation:master

Conversation
… key computation. Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
```cpp
// Binary search in prefix-sum offsets to find tile index for this point
uint32_t lo = 0, hi = numTiles + 1;
while (lo < hi) {
    uint32_t mid = (lo + hi) / 2;
    if (d_tile_offsets[mid] <= uint32_t(tid)) {
        lo = mid + 1;
    } else {
        hi = mid;
    }
}
```
💡 suggestion: Use thrust::lower_bound here with a thrust::seq execution policy to do the same thing more literately and robustly.
Good idea, thanks. I actually think it's `upper_bound`, but the same gist.
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
Pull request overview
This PR optimizes nanovdb::tools::cuda::PointsToGrid::countNodes by adding a bulk segmented radix-sort path for voxel-key sorting when the number of tiles is high, while keeping the existing per-tile sorting path for low tile counts to avoid overhead regressions.
Changes:
- Added a bulk voxel-key generation kernel (`BulkVoxelKeyFunctor`) to compute voxel keys for all points in one launch when tile counts are high.
- Switched sorting to `cub::DeviceSegmentedRadixSort::SortPairs` for high tile counts, using computed per-tile segment offsets, with a threshold-based fallback to the original per-tile loop.
- Simplified `setVerbose` to only update the local verbosity level.
harrism left a comment
Looks like a great optimization. Well done.
```cpp
       uint64_t(NanoLeaf<BuildT>::CoordToOffset(ijk)); // voxel offset: 8^3 = 2^9, i.e. first 9 bits
}; // voxelKey lambda functor
// Find tile index for this point via upper_bound in prefix-sum offsets
const uint64_t tileID = thrust::upper_bound(thrust::seq, d_tile_offsets, d_tile_offsets + numTiles + 1, uint32_t(tid)) - d_tile_offsets - 1;
```
Is the difference in absolute time for small numbers of tiles large enough to warrant keeping the older path?
I was unsure whether we should keep the older path too. For small numbers of tiles, the overhead of the segmented sort makes it more expensive. For 100k points and 4 tiles, the new segmented sort takes 0.371ms versus 0.278ms for the old serial per-tile sort, a 34% regression (running on my Ada RTX 6000). For small tile counts, do you think it's reasonable to trade that off for code complexity?
Signed-off-by: Jonathan Swartz <jonathan@jswartz.info>
This pull request introduces a performance optimization to the voxel key sorting process in the `PointsToGrid` implementation. The main improvement is the addition of a bulk segmented sort path for cases with many tiles, replacing a serial loop of per-tile kernel launches, which significantly speeds up sorting on large datasets. Creating a grid for the Stanford dragon at a voxel size that produced 200 tiles, the speedup to sorting was 19x; at a voxel size producing 6,000 tiles, the speedup was 73x. The end-to-end `PointsToGrid` improvement was 17% for the 6,000-tile case. For low tile counts, the segmented radix sort performed worse than the original, so a fallback to the original path is included when tile counts are low.

Performance and algorithm improvements:
- Added a `BulkVoxelKeyFunctor` struct and associated kernel launch to efficiently compute voxel keys for all points in a single pass (instead of multiple kernel launches) when the number of tiles exceeds a threshold (`SEGMENTED_SORT_TILE_THRESHOLD`). This enables the bulk segmented sort path for large tile counts, improving performance for large datasets.
- Bulk sort is used for large tile counts, while the original per-tile sort is retained for small tile counts, where it was faster.

Minor fixes:
- Updated the `setVerbose` method to only set the local verbosity variable, removing the flag manipulation for clarity and correctness.