Refactor NSS and DNSS implementations for improved readability and modularity#2

Closed
eclipse0922 wants to merge 2 commits into master from refactor_2

Conversation


@eclipse0922 eclipse0922 commented Feb 8, 2026

User description

  • Rewrote legacy code into clean C++17 style.
  • Added missing method implementations and input validation.
  • Introduced options struct for configurable parameters in NSS and DNSS.
  • Implemented CUDA support for DNSS rotational feature computation.
  • Added new CMake configuration for building with or without CUDA.
  • Updated README to reflect changes and provide build instructions.
  • Added .gitignore to exclude build directories.

PR Type

Enhancement, Bug fix, Documentation


Description

  • Refactored NSS and DNSS implementations into clean C++17 with modular design, replacing legacy Concurrency and Eigen dependencies.

  • Introduced configurable Options structs for both classes, enabling deterministic behavior, bucket configuration, and optional CUDA acceleration.

  • Implemented CUDA kernel for DNSS rotational feature computation with fallback to CPU path when CUDA is unavailable or fails.

  • Updated README with algorithm notes and build instructions, and fixed mathematical inconsistencies (e.g., consistent use of radians) along with missing input validation.


Changes walkthrough 📝

Relevant files

Enhancement

NSS.cpp: Refactored NSS/DNSS core logic and sampling
  • Replaced old bucket-based sampling with modular helper functions and clean C++17 logic.
  • Added computeCenteredAndScaledVertices, computeRotationalReturnValue, and sphericalBucketIndex helpers.
  • Implemented lazy removal strategy using active masks and priority queues for DNSS sampling.
  • Added input validation, deterministic seeding, and a CUDA integration point.
  • +545/-310

NSS.h: Modernized header with Options and clean API
  • Replaced legacy internal state with Options structs for configurability.
  • Removed Eigen and concurrency dependencies; added a minimal glm::fvec3 fallback implementation.
  • Added setUseCuda, getUseCuda, and bucket index helper methods.
  • Improved documentation and clarified parameter semantics.
  • +162/-123

dnss_cuda.cu: Added CUDA acceleration for DNSS
  • Implemented CUDA kernel for rotational normal and return computation.
  • Added host-side wrapper with error handling and fallback capability.
  • Used efficient memory layout and device synchronization.
  • +217/-0

Configuration changes

CMakeLists.txt: Added CUDA build configuration
  • Added DNSS_ENABLE_CUDA option to conditionally compile CUDA support.
  • Integrated dnss_cuda.cu and set appropriate compile definitions and properties.
  • Ensured C++17 standard and proper linking with the CUDA runtime.
  • +21/-0

.github/workflows/pr-agent.yml: Updated PR review agent config
  • Enhanced PR agent configuration with stricter review requirements and Korean instructions.
  • Enabled detailed code analysis for bugs, memory leaks, thread safety, and effort estimation.
  • +10/-0

Documentation

README.md: Updated README with build and usage docs
  • Expanded documentation to include algorithm notes, build instructions, and version improvements.
  • Clarified bucket configurations, CUDA usage, and references.
  • Added fix list and usage examples for CPU/CUDA builds.
  • +50/-10


    github-actions bot commented Feb 8, 2026

    Persistent review updated to latest commit 2ba9ba9

    @eclipse0922 (Owner, Author)

    /improve

    …dularity
    
    - Rewrote legacy code into clean C++17 style.
    - Added missing method implementations and input validation.
    - Introduced options struct for configurable parameters in NSS and DNSS.
    - Implemented CUDA support for DNSS rotational feature computation.
    - Added new CMake configuration for building with or without CUDA.
    - Updated README to reflect changes and provide build instructions.
    - Added .gitignore to exclude build directories.
    github-actions bot commented Feb 8, 2026

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🏅 Score: 85
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Memory leak in CUDA error path

    In the CUDA implementation (DNSSComputeRotationalFeaturesCuda), when cudaMemcpy from device to host fails (lines 192-193), the function frees device memory but returns false. However, the caller (DNSS::dualNormalSpaceSampling) does not clear the output vectors before returning, potentially leaving stale or uninitialized data in rotational_normals and rotational_returns. This could lead to incorrect sampling behavior or crashes downstream.

    NSS::NSS(const Options &options)
    	: m_options(options), m_runtime_seed(options.random_seed)
    {
    }
    
    void NSS::setOptions(const Options &options)
    {
    	m_options = options;
    }
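A possible caller-side guard for the failure path described above is to clear the output vectors before taking the CPU fallback, so that no stale or partial device results survive a failed copy. The sketch below illustrates the idea only; all names here are hypothetical stand-ins, not the PR's actual API.

```cpp
#include <cstddef>
#include <vector>

// Simulates a GPU wrapper that partially fills the outputs and then fails
// (e.g., a cudaMemcpy error). Illustrative stub, not the PR's function.
bool computeFeaturesGpuStub(std::vector<float> &normals, std::vector<float> &returns)
{
	normals.assign(3, -1.0f);
	returns.assign(3, -1.0f);
	return false;
}

// CPU fallback: fills both outputs with one value per input point.
void computeFeaturesCpu(std::vector<float> &normals, std::vector<float> &returns,
                        std::size_t count)
{
	normals.assign(count, 0.0f);
	returns.assign(count, 0.0f);
}

void computeRotationalFeatures(std::vector<float> &normals, std::vector<float> &returns,
                               std::size_t count, bool use_cuda)
{
	if (use_cuda && computeFeaturesGpuStub(normals, returns))
	{
		return; // GPU results are complete and valid.
	}
	// GPU unavailable or failed: discard anything it wrote, then recompute,
	// so downstream sampling never sees stale or uninitialized data.
	normals.clear();
	returns.clear();
	computeFeaturesCpu(normals, returns, count);
}
```

With this pattern, a failed GPU path is indistinguishable from a CPU-only run as far as the outputs are concerned.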
    Thread safety issue with random seed

    The nextSeed() method modifies the member variable m_runtime_seed without any synchronization mechanism. In a multi-threaded environment where multiple NSS or DNSS instances share the same seed or when methods are called concurrently, this can lead to race conditions and non-deterministic behavior, undermining the purpose of the deterministic option.

    	m_runtime_seed = m_runtime_seed * 1664525u + 1013904223u;
    	return m_runtime_seed;
    }
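One way to make the seed update safe under concurrent calls is to store it in a std::atomic and advance it with a compare-and-swap loop. The sketch below uses the same LCG constants as the snippet above; the surrounding struct and initial seed are hypothetical, not the PR's actual class.

```cpp
#include <atomic>
#include <cstdint>

struct SeedSource
{
	// Assumed: m_runtime_seed becomes atomic; 12345u is a placeholder seed.
	std::atomic<std::uint32_t> m_runtime_seed{12345u};

	std::uint32_t nextSeed()
	{
		std::uint32_t expected = m_runtime_seed.load(std::memory_order_relaxed);
		std::uint32_t next = 0;
		do
		{
			// Same update as the original, applied atomically via CAS so
			// concurrent callers each observe a distinct, well-defined seed.
			next = expected * 1664525u + 1013904223u;
		} while (!m_runtime_seed.compare_exchange_weak(expected, next,
		                                               std::memory_order_relaxed));
		return next;
	}
};
```

Single-threaded behavior is unchanged (the sequence is still deterministic for a given initial seed); the CAS loop only removes the data race.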
    Potential division by zero in CUDA kernel

    In the CUDA kernel computeRotationalFeaturesKernel, the code computes normal_norm and sets it to 1.0f if it's too small (lines 51-55). However, it then divides by normal_norm to normalize the normal vector (lines 60-62). If normal_norm was exactly zero before the check, this could still cause numerical instability or incorrect results due to the order of operations.

    float normal_norm = sqrtf(nx * nx + ny * ny + nz * nz);
    if (!(isfinite(normal_norm)) || normal_norm <= kEpsilon)
    {
    	normal_norm = 1.0f;
    }
    
    const float pdx = px / point_norm;
    const float pdy = py / point_norm;
    const float pdz = pz / point_norm;
    const float ndx = nx / normal_norm;
    const float ndy = ny / normal_norm;
    const float ndz = nz / normal_norm;
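A CPU-side sketch of the ordering fix suggested above: instead of substituting 1.0f for a degenerate norm and dividing anyway (which passes the near-zero components through unchanged), bail out and return a fixed unit vector. The epsilon value and the +Z fallback are assumptions for illustration, not the PR's actual choices.

```cpp
#include <cmath>

constexpr float kEpsilonDemo = 1e-6f; // assumed threshold

void normalizeSafe3(float &x, float &y, float &z)
{
	const float norm = std::sqrt(x * x + y * y + z * z);
	if (!std::isfinite(norm) || norm <= kEpsilonDemo)
	{
		// Degenerate input: emit a well-defined unit vector rather than
		// dividing by a substituted norm.
		x = 0.0f;
		y = 0.0f;
		z = 1.0f;
		return;
	}
	const float inv = 1.0f / norm;
	x *= inv;
	y *= inv;
	z *= inv;
}
```

The same structure translates directly to the device code (sqrtf/isfinite), since the issue is the order of the check relative to the division, not the math itself.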

    github-actions bot commented Feb 8, 2026

    PR Code Suggestions ✨

    Possible issue
    Add theta zero-check to prevent division by near-zero

    The rotational return calculation uses std::atan and std::acos, which expect
    radians, but the original code comments and context suggest angle handling may be
    inconsistent. Verify that theta is consistently in radians throughout the
    implementation. Additionally, the division by theta in rr_positive and rr_negative
    could lead to numerical instability if theta is near zero; consider adding a check
    to prevent division by values smaller than kEpsilon.

    NSS.cpp [51-89]

     float computeRotationalReturnValue(const glm::fvec3 &normalized_point, const glm::fvec3 &normal, float theta)
     {
     	const float point_norm = vectorLength(normalized_point);
     	if (!std::isfinite(point_norm) || point_norm <= kEpsilon)
    +	{
    +		return 0.0f;
    +	}
    +
    +	if (std::abs(theta) <= kEpsilon)
     	{
     		return 0.0f;
     	}
     
     	const glm::fvec3 point_dir = normalized_point / point_norm;
     	const glm::fvec3 normal_dir = normalizeSafe(normal);
     	const float dot_pn = clampFloat(dotProduct(point_dir, normal_dir), -1.0f, 1.0f);
     	const float beta = std::acos(dot_pn);
     
     	const float pp = 2.0f * point_norm * std::sin(theta * 0.5f);
     	const float sin_beta = std::sin(beta);
     	const float cos_beta = std::cos(beta);
     
     	const float pq_positive = pp * std::cos(beta - theta * 0.5f);
     	const float pq_negative = pp * std::cos(-beta - theta * 0.5f);
     
     	const float denominator_positive = std::max(kEpsilon, point_norm - pq_positive * cos_beta);
     	const float denominator_negative = std::max(kEpsilon, point_norm - pq_negative * cos_beta);
     
     	const float atan_positive = std::atan((pq_positive * sin_beta) / denominator_positive);
     	const float atan_negative = std::atan((pq_negative * (-sin_beta)) / denominator_negative);
     
     	const float gamma_positive = theta - atan_positive;
     	const float gamma_negative = theta - atan_negative;
     
     	const float rr_positive = point_norm * gamma_positive / theta;
     	const float rr_negative = point_norm * gamma_negative / theta;
     	const float rotational_return = std::max(rr_positive, rr_negative);
     
     	if (!std::isfinite(rotational_return))
     	{
     		return 0.0f;
     	}
     	return std::max(0.0f, rotational_return);
     }
    Suggestion importance[1-10]: 8


    Why: The suggestion addresses a critical numerical stability issue where division by theta could cause undefined behavior if theta is near zero. Adding a check for std::abs(theta) <= kEpsilon prevents this potential crash, making the function robust against edge cases. This is a high-impact improvement for correctness.

    Impact: Medium
    Handle degenerate case by zero-filling normalized vertices

    The computeCenteredAndScaledVertices function computes max_radius using
    vectorLength, which internally uses std::sqrt. If all vertices are identical,
    max_radius will be zero, and the division by max_radius (as inv_radius) will cause
    undefined behavior. Although there's a check for max_radius <= kEpsilon, it returns
    early without populating normalized_vertices, which may leave them uninitialized.
    Consider ensuring normalized_vertices is filled with the centroid even in degenerate
    cases, or document that the caller must handle empty input.

    NSS.cpp [139-174]

     void computeCenteredAndScaledVertices(
     	const std::vector<glm::fvec3> &vertices,
     	std::vector<glm::fvec3> &normalized_vertices)
     {
     	normalized_vertices.clear();
     	normalized_vertices.resize(vertices.size(), glm::fvec3(0.0f));
     	if (vertices.empty())
     	{
     		return;
     	}
     
     	glm::fvec3 centroid(0.0f, 0.0f, 0.0f);
     	for (const glm::fvec3 &vertex : vertices)
     	{
     		centroid += vertex;
     	}
     	centroid /= static_cast<float>(vertices.size());
     
     	float max_radius = 0.0f;
     	for (std::size_t i = 0; i < vertices.size(); ++i)
     	{
     		normalized_vertices[i] = vertices[i] - centroid;
     		max_radius = std::max(max_radius, vectorLength(normalized_vertices[i]));
     	}
     
     	if (max_radius <= kEpsilon)
     	{
    +		for (glm::fvec3 &vertex : normalized_vertices)
    +		{
    +			vertex = glm::fvec3(0.0f, 0.0f, 0.0f);
    +		}
     		return;
     	}
     
     	const float inv_radius = 1.0f / max_radius;
     	for (glm::fvec3 &vertex : normalized_vertices)
     	{
     		vertex *= inv_radius;
     	}
     }
    Suggestion importance[1-10]: 7


    Why: The suggestion ensures consistent behavior in degenerate cases where all vertices are identical (max_radius <= kEpsilon). While the original code returns early without populating normalized_vertices, the suggestion fills them with zeros, making the function's behavior more predictable. This is a good improvement for robustness but not critical.

    Impact: Medium

    General
    Clamp bucket indices directly to prevent out-of-bounds

    The sphericalBucketIndex function uses kTwoPi for azimuth normalization, but the
    calculation azimuth / kTwoPi * azimuth_bins may produce an azimuth_index that equals
    azimuth_bins due to floating-point inaccuracies. Although there's a boundary check,
    it's better to clamp the result directly using std::min to avoid potential
    out-of-bounds indexing before the explicit checks.

    NSS.cpp [91-137]

     int sphericalBucketIndex(
     	const glm::fvec3 &raw_normal,
     	int azimuth_bins,
     	int z_bins,
     	bool fold_antipodal)
     {
     	if (azimuth_bins <= 0 || z_bins <= 0)
     	{
     		return -1;
     	}
     
     	glm::fvec3 normal = normalizeSafe(raw_normal);
     
     	if (fold_antipodal)
     	{
     		const bool flip =
     			normal.z < 0.0f ||
     			(normal.z == 0.0f && (normal.y < 0.0f || (normal.y == 0.0f && normal.x < 0.0f)));
     		if (flip)
     		{
     			normal = -normal;
     		}
     	}
     
     	float azimuth = std::atan2(normal.y, normal.x);
     	if (azimuth < 0.0f)
     	{
     		azimuth += kTwoPi;
     	}
     
     	const float z_min = fold_antipodal ? 0.0f : -1.0f;
     	const float z_normalized = clampFloat((normal.z - z_min) / (1.0f - z_min), 0.0f, 1.0f);
     
     	int azimuth_index = static_cast<int>(std::floor(azimuth / kTwoPi * static_cast<float>(azimuth_bins)));
     	int z_index = static_cast<int>(std::floor(z_normalized * static_cast<float>(z_bins)));
     
    -	if (azimuth_index >= azimuth_bins)
    -	{
    -		azimuth_index = azimuth_bins - 1;
    -	}
    -	if (z_index >= z_bins)
    -	{
    -		z_index = z_bins - 1;
    -	}
    +	azimuth_index = std::min(azimuth_index, azimuth_bins - 1);
    +	z_index = std::min(z_index, z_bins - 1);
     
     	return azimuth_index * z_bins + z_index;
     }
    Suggestion importance[1-10]: 6


    Why: The suggestion improves robustness by using std::min to clamp indices directly, preventing potential out-of-bounds access due to floating-point inaccuracies. While the existing boundary checks provide some protection, proactively clamping is a good defensive programming practice. However, it's a moderate improvement rather than critical.

    Impact: Low

    @eclipse0922 eclipse0922 closed this Feb 8, 2026