Fix radix sort interface and threaded mesh improvements and extensions#4083
Fix radix sort interface and threaded mesh improvements and extensions#4083garth-wells wants to merge 31 commits intomainfrom
Conversation
schnellerhase
left a comment
There was a problem hiding this comment.
Removes thread spawning for num_threads=1, recovering previous sequential execution path. New sorting algorithm affects non-threaded execution path.
| const common::IndexMap& vertex_index_map) | ||
| auto build_entity_list | ||
| = [](std::span<std::int32_t> entity_list, | ||
| std::span<std::int32_t> entity_list_sorted, auto&& cell_idx, |
There was a problem hiding this comment.
Restrict to std::int32_t ranges.
| int num_entities_per_cell = cell_type_entities[k].size(); | ||
| std::vector<std::jthread> threads(num_threads); | ||
| for (int i = 0; i < num_threads; ++i) | ||
| if (num_threads > 1) |
There was a problem hiding this comment.
Should allow for compile time deduction for no threading configuration. Resolvable with #3716.
There was a problem hiding this comment.
No need to overcomplicate - it will have no measurable runtime effect.
There was a problem hiding this comment.
Not about performance. It ensures optionality of the functionality. When passing num_threads around explicitly this needs to be deduced from the associated type.
There was a problem hiding this comment.
I don't follow. The if/else switch isn't inside a hot loop, so what's the issue?
| std::vector<std::int32_t> sort_order( | ||
| entity_list_sorted.size() / num_vertices_per_entity, 0); | ||
| std::iota(sort_order.begin(), sort_order.end(), 0); | ||
| boost::sort::parallel_stable_sort( |
There was a problem hiding this comment.
Adds new library dependency to Boost.Sort.sort_by_perm in particular avoids lexicographical_compare - performance impact?
… garth/mesh-hot-loops
dolfinx::sort_permdidn't respectBITStemplate parameter. PR fixes this, which gives a significant performance boost for some cases.