Skip to content

Comments

Fix radix sort interface and threaded mesh improvements and extensions#4083

Open
garth-wells wants to merge 31 commits intomainfrom
garth/mesh-hot-loops
Open

Fix radix sort interface and threaded mesh improvements and extensions#4083
garth-wells wants to merge 31 commits intomainfrom
garth/mesh-hot-loops

Conversation

@garth-wells
Copy link
Member

@garth-wells garth-wells commented Feb 11, 2026

dolfinx::sort_perm didn't respect BITS template parameter. PR fixes this, which gives a significant performance boost for some cases.

@garth-wells garth-wells marked this pull request as ready for review February 15, 2026 11:01
Copy link
Contributor

@schnellerhase schnellerhase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removes thread spawning for num_threads=1, recovering previous sequential execution path. New sorting algorithm affects non-threaded execution path.

const common::IndexMap& vertex_index_map)
auto build_entity_list
= [](std::span<std::int32_t> entity_list,
std::span<std::int32_t> entity_list_sorted, auto&& cell_idx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restrict to std::int32_t ranges.

int num_entities_per_cell = cell_type_entities[k].size();
std::vector<std::jthread> threads(num_threads);
for (int i = 0; i < num_threads; ++i)
if (num_threads > 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should allow for compile time deduction for no threading configuration. Resolvable with #3716.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to overcomplicate - it will have no measurable runtime effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not about performance. It ensures optionality of the functionality. When passing num_threads around explicitly this needs to be deduced from the associated type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. The if/else switch isn't inside a hot loop, so what's the issue?

std::vector<std::int32_t> sort_order(
entity_list_sorted.size() / num_vertices_per_entity, 0);
std::iota(sort_order.begin(), sort_order.end(), 0);
boost::sort::parallel_stable_sort(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds new library dependency to Boost.Sort.sort_by_perm in particular avoids lexicographical_compare - performance impact?

@garth-wells garth-wells changed the title Threaded mesh improvements and extensions Fix radix sort interface and threaded mesh improvements and extensions Feb 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants