-
-
Notifications
You must be signed in to change notification settings - Fork 4.7k
✨ feat: Support parallel computation for NDT #6376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Support parallel computation for NDT by OpenMP with different neighbor search method
Use signed std::ptrdiff_t instead of std::size_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds OpenMP-based parallel computation support to the Normal Distributions Transform (NDT) registration algorithm, along with multiple neighbor search methods to optimize performance. The changes introduce thread-level parallelization to speed up the derivative computation phase of NDT.
Key changes:
- Adds a
NeighborSearchMethodenum with four options: KDTREE (original), DIRECT26, DIRECT7, and DIRECT1 - Implements
setNumberOfThreads()method with OpenMP support for parallel execution - Parallelizes the main computation loop in
computeDerivatives()using OpenMP directives - Updates test to verify functionality with the new API
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| registration/include/pcl/registration/ndt.h | Adds enum for neighbor search methods, declares thread control API and private member variables for threading configuration |
| registration/include/pcl/registration/impl/ndt.hpp | Implements setNumberOfThreads() with OpenMP detection, parallelizes derivative computation loop, adds switch statement for different search methods |
| test/registration/test_ndt.cpp | Updates test to explicitly configure search method and thread count for validation |
Comments suppressed due to low confidence (1)
registration/include/pcl/registration/impl/ndt.hpp:269
- The OpenMP parallel region has a critical thread safety issue. The variables
score,score_gradient, andhessianare marked as shared, but multiple threads will update them concurrently (viascore +=on line 266 and calls toupdateDerivativeswhich modifiesscore_gradientandhessian). These concurrent updates to shared variables will cause race conditions, leading to incorrect results. You need to use reduction clauses forscoreand critical sections or atomic operations forscore_gradientandhessian, or alternatively use thread-local accumulators that are combined after the parallel region.
#pragma omp parallel for num_threads(threads_) schedule(dynamic, 32) \
shared(score, score_gradient, hessian) firstprivate(neighborhood, distances)
// Update gradient and hessian for each point, line 17 in Algorithm 2 [Magnusson 2009]
for (std::ptrdiff_t idx = 0; idx < static_cast<std::ptrdiff_t>(input_->size());
idx++) {
// Transformed Point
const auto& x_trans_pt = trans_cloud[idx];
// Find neighbors with different search method
switch (search_method_) {
case KDTREE:
// Radius search has been experimentally faster than direct neighbor checking.
target_cells_.radiusSearch(x_trans_pt, resolution_, neighborhood, distances);
break;
case DIRECT26:
target_cells_.getNeighborhoodAtPoint(x_trans_pt, neighborhood);
break;
case DIRECT7:
target_cells_.getFaceNeighborsAtPoint(x_trans_pt, neighborhood);
break;
case DIRECT1:
target_cells_.getVoxelAtPoint(x_trans_pt, neighborhood);
break;
}
// Original Point
const Eigen::Vector3d x = (*input_)[idx].getVector3fMap().template cast<double>();
const Eigen::Vector3d x_trans_position =
x_trans_pt.getVector3fMap().template cast<double>();
for (const auto& cell : neighborhood) {
// Denorm point, x_k' in Equations 6.12 and 6.13 [Magnusson 2009]
const Eigen::Vector3d x_trans = x_trans_position - cell->getMean();
// Inverse Covariance of Occupied Voxel
// Uses precomputed covariance for speed.
const Eigen::Matrix3d c_inv = cell->getInverseCov();
// Compute derivative of transform function w.r.t. transform vector, J_E and H_E
// in Equations 6.18 and 6.20 [Magnusson 2009]
computePointDerivatives(x);
// Update score, gradient and hessian, lines 19-21 in Algorithm 2, according to
// Equations 6.10, 6.12 and 6.13, respectively [Magnusson 2009]
score +=
updateDerivatives(score_gradient, hessian, x_trans, c_inv, compute_hessian);
}
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Apply copilot suggestion to fix some error
mvieth
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to at least put a omp critical region around computePointDerivatives(x) and updateDerivatives(...) because they both use global variables.
Also maybe put NeighborSearchMethod into the NormalDistributionTransform class, to make it clear that they belong together?
Support parallel computation for NDT by OpenMP with different neighbor search method