Skip to content

Conversation

@Tao-Sheng
Copy link

Support parallel computation for NDT by OpenMP with different neighbor search method

Support parallel computation for NDT by OpenMP with different neighbor search method
Use signed std::ptrdiff_t instead of std::size_t
@larshg larshg added module: registration changelog: enhancement Meta-information for changelog generation labels Dec 11, 2025
@larshg larshg added this to the pcl-1.16.0 milestone Dec 11, 2025
@larshg larshg requested review from Copilot and mvieth December 11, 2025 06:56
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds OpenMP-based parallel computation support to the Normal Distributions Transform (NDT) registration algorithm, along with multiple neighbor search methods to optimize performance. The changes introduce thread-level parallelization to speed up the derivative computation phase of NDT.

Key changes:

  • Adds a NeighborSearchMethod enum with four options: KDTREE (original), DIRECT26, DIRECT7, and DIRECT1
  • Implements setNumberOfThreads() method with OpenMP support for parallel execution
  • Parallelizes the main computation loop in computeDerivatives() using OpenMP directives
  • Updates test to verify functionality with the new API

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
registration/include/pcl/registration/ndt.h Adds enum for neighbor search methods, declares thread control API and private member variables for threading configuration
registration/include/pcl/registration/impl/ndt.hpp Implements setNumberOfThreads() with OpenMP detection, parallelizes derivative computation loop, adds switch statement for different search methods
test/registration/test_ndt.cpp Updates test to explicitly configure search method and thread count for validation
Comments suppressed due to low confidence (1)

registration/include/pcl/registration/impl/ndt.hpp:269

  • The OpenMP parallel region has a critical thread safety issue. The variables score, score_gradient, and hessian are marked as shared, but multiple threads will update them concurrently (via score += on line 266 and calls to updateDerivatives which modifies score_gradient and hessian). These concurrent updates to shared variables will cause race conditions, leading to incorrect results. You need to use reduction clauses for score and critical sections or atomic operations for score_gradient and hessian, or alternatively use thread-local accumulators that are combined after the parallel region.
#pragma omp parallel for num_threads(threads_) schedule(dynamic, 32)                   \
    shared(score, score_gradient, hessian) firstprivate(neighborhood, distances)
  // Update gradient and hessian for each point, line 17 in Algorithm 2 [Magnusson 2009]
  for (std::ptrdiff_t idx = 0; idx < static_cast<std::ptrdiff_t>(input_->size());
       idx++) {
    // Transformed Point
    const auto& x_trans_pt = trans_cloud[idx];

    // Find neighbors with different search method
    switch (search_method_) {
    case KDTREE:
      // Radius search has been experimentally faster than direct neighbor checking.
      target_cells_.radiusSearch(x_trans_pt, resolution_, neighborhood, distances);
      break;
    case DIRECT26:
      target_cells_.getNeighborhoodAtPoint(x_trans_pt, neighborhood);
      break;
    case DIRECT7:
      target_cells_.getFaceNeighborsAtPoint(x_trans_pt, neighborhood);
      break;
    case DIRECT1:
      target_cells_.getVoxelAtPoint(x_trans_pt, neighborhood);
      break;
    }

    // Original Point
    const Eigen::Vector3d x = (*input_)[idx].getVector3fMap().template cast<double>();
    const Eigen::Vector3d x_trans_position =
        x_trans_pt.getVector3fMap().template cast<double>();

    for (const auto& cell : neighborhood) {
      // Denorm point, x_k' in Equations 6.12 and 6.13 [Magnusson 2009]
      const Eigen::Vector3d x_trans = x_trans_position - cell->getMean();
      // Inverse Covariance of Occupied Voxel
      // Uses precomputed covariance for speed.
      const Eigen::Matrix3d c_inv = cell->getInverseCov();

      // Compute derivative of transform function w.r.t. transform vector, J_E and H_E
      // in Equations 6.18 and 6.20 [Magnusson 2009]
      computePointDerivatives(x);
      // Update score, gradient and hessian, lines 19-21 in Algorithm 2, according to
      // Equations 6.10, 6.12 and 6.13, respectively [Magnusson 2009]
      score +=
          updateDerivatives(score_gradient, hessian, x_trans, c_inv, compute_hessian);
    }
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Apply copilot suggestion to fix some error
Copy link
Member

@mvieth mvieth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to at least put a omp critical region around computePointDerivatives(x) and updateDerivatives(...) because they both use global variables.
Also maybe put NeighborSearchMethod into the NormalDistributionTransform class, to make it clear that they belong together?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog: enhancement Meta-information for changelog generation module: registration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants