-
Notifications
You must be signed in to change notification settings - Fork 1
Add golden file regression tests and code cleanup #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add a comprehensive golden file test infrastructure that verifies solver outputs match pre-computed reference files with near-bitwise tolerance (1e-10). This enables safe refactoring by catching any numerical changes. Test cases (6 total): - channel_komega: 2D channel with SST k-omega transport model - channel_earsm: 2D channel with EARSM-WJ algebraic stress model - channel_mlp: 2D channel with NN-MLP eddy viscosity model - channel_tbnn: 2D channel with NN-TBNN anisotropy model - mixing_length: 2D channel with mixing length baseline model - laminar_3d: 3D duct with laminar flow Each test runs 10 time steps and compares u, v, (w), p, (k, omega) fields against version-controlled reference files. Tests complete in ~6 seconds. New files: - tests/golden/test_golden.cpp: Unified test driver - tests/golden/regenerate_golden.sh: Script to regenerate references - tests/golden/reference/*.dat: 25 golden reference files CI integration: - New test suite: ./scripts/run_ci_local.sh golden - Added run_golden_test() helper function for parameterized tests - Golden tests run as part of 'all' and 'full' test suites 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The golden reference files were originally generated on a GPU build, but CI runs in CPU mode. This caused failures due to small numerical differences between CPU and GPU turbulence model implementations. Regenerated all 23 reference files using the CPU build to match the CI environment. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deleted 8 unused functions: - ScalarField::min_interior() (fields.cpp/hpp) - TensorField::fill() both 2D and 3D versions (fields.cpp/hpp) - RANSSolver::print_velocity_profile() (solver.cpp/hpp) - TurbulenceGEP::compute_gep_factor() (turbulence_gep.cpp/hpp) - TimingStats::print_gpu_utilization_summary() (timing.cpp/hpp) - TimingStats::assert_gpu_dominant() (timing.cpp/hpp) - MLP::scale_input() (nn_core.cpp/hpp) - PoissonSolver::sor_iteration() (poisson_solver.cpp/hpp) Verified: - CPU build passes all tests - GPU build passes all tests - Golden file tests pass on CPU 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Golden files regenerated on GPU build to serve as the stable reference. These should NOT need to change during refactoring phases. Also added strain rate inline helpers to gpu_utils.hpp: - strain_magnitude_frobenius() - standard 2D Frobenius norm - strain_magnitude_frobenius_3d() - 3D version - strain_magnitude_simple() - without factor of 2 - strain_squared() / strain_squared_3d() - for production terms 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR adds golden file regression testing infrastructure, removes deprecated public APIs (scaling helpers, diagnostic methods, solver utilities), introduces GPU-compatible strain-rate calculation functions, and refactors turbulence transport to unify CPU/GPU execution paths using precomputed wall distances and device-view parameters. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Comment |
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
|||||||||||||||||||||||||
PR Code Suggestions ✨Explore these optional code suggestions:
|
||||||||||||
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
examples/08_duct_flow/dns_unsteady.cfg (1)
69-70: Document the purpose of perturbation_amplitude parameter.The new
perturbation_amplitude = 0.1parameter is added without explanation. Consider adding a comment describing how this parameter affects the simulation (e.g., initial condition perturbation, forcing amplitude, etc.).scripts/run_ci_local.sh (1)
165-215: Consider adding parameter validation for case_name.The
run_golden_test()function is well-structured and handles edge cases appropriately (missing binary, skipped tests). Consider adding validation to ensurecase_nameis not empty to catch invocation errors early.Optional: Add parameter validation
run_golden_test() { local test_name=$1 local case_name=$2 local timeout_secs=${3:-60} + + if [ -z "$case_name" ]; then + log_failure "$test_name (case_name not provided)" + FAILED=$((FAILED + 1)) + return 1 + fi local test_binary="${BUILD_DIR}/test_golden"src/gpu_kernels.cpp (1)
97-97: Unused parameteru_refafter refactoring.The
u_refparameter is passed tocompute_mlp_scalar_features_gpubut is no longer used in the feature calculations (replaced by localu_ref_local). It's only cast to(void)on line 175. Consider removing it from the function signature if it's no longer needed.Also applies to: 175-175
src/turbulence_baseline.cpp (1)
282-284: Consider usingstrain_magnitude_frobeniusfor consistency.The GPU path at line 284 and CPU path at line 365 still use inline strain magnitude calculations. For consistency with
mixing_length_cell_kernel(line 49) and to centralize the formula, consider usingstrain_magnitude_frobenius(Sxx, Syy, Sxy)here as well.🔎 Suggested change for line 284
- const double S_mag = sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy)); + const double S_mag = strain_magnitude_frobenius(Sxx, Syy, Sxy);src/turbulence_transport.cpp (1)
840-848: Performance: GPU-CPU-GPU roundtrip for wall boundary conditions.The wall boundary conditions require syncing k and omega from device to host (line 842), applying BCs on the host (lines 844-845), then syncing back to device (line 847). This adds two full field transfers per time step.
Consider implementing GPU-resident wall BC kernels in a future optimization pass to eliminate this roundtrip.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (23)
tests/golden/reference/channel_earsm_k.datis excluded by!**/*.dattests/golden/reference/channel_earsm_omega.datis excluded by!**/*.dattests/golden/reference/channel_earsm_p.datis excluded by!**/*.dattests/golden/reference/channel_earsm_u.datis excluded by!**/*.dattests/golden/reference/channel_earsm_v.datis excluded by!**/*.dattests/golden/reference/channel_komega_k.datis excluded by!**/*.dattests/golden/reference/channel_komega_omega.datis excluded by!**/*.dattests/golden/reference/channel_komega_p.datis excluded by!**/*.dattests/golden/reference/channel_komega_u.datis excluded by!**/*.dattests/golden/reference/channel_komega_v.datis excluded by!**/*.dattests/golden/reference/channel_mlp_p.datis excluded by!**/*.dattests/golden/reference/channel_mlp_u.datis excluded by!**/*.dattests/golden/reference/channel_mlp_v.datis excluded by!**/*.dattests/golden/reference/channel_tbnn_p.datis excluded by!**/*.dattests/golden/reference/channel_tbnn_u.datis excluded by!**/*.dattests/golden/reference/channel_tbnn_v.datis excluded by!**/*.dattests/golden/reference/laminar_3d_p.datis excluded by!**/*.dattests/golden/reference/laminar_3d_u.datis excluded by!**/*.dattests/golden/reference/laminar_3d_v.datis excluded by!**/*.dattests/golden/reference/laminar_3d_w.datis excluded by!**/*.dattests/golden/reference/mixing_length_p.datis excluded by!**/*.dattests/golden/reference/mixing_length_u.datis excluded by!**/*.dattests/golden/reference/mixing_length_v.datis excluded by!**/*.dat
📒 Files selected for processing (23)
CMakeLists.txtexamples/08_duct_flow/dns_unsteady.cfginclude/fields.hppinclude/gpu_utils.hppinclude/nn_core.hppinclude/poisson_solver.hppinclude/solver.hppinclude/timing.hppinclude/turbulence_gep.hppinclude/turbulence_transport.hppscripts/run_ci_local.shsrc/fields.cppsrc/gpu_kernels.cppsrc/nn_core.cppsrc/poisson_solver.cppsrc/solver.cppsrc/timing.cppsrc/turbulence_baseline.cppsrc/turbulence_gep.cppsrc/turbulence_transport.cpptests/golden/regenerate_golden.shtests/golden/test_golden.cpptests/test_transport_trace.cpp
💤 Files with no reviewable changes (8)
- src/turbulence_gep.cpp
- src/nn_core.cpp
- src/solver.cpp
- include/timing.hpp
- src/timing.cpp
- src/poisson_solver.cpp
- include/turbulence_gep.hpp
- src/fields.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,cc,cxx}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx}: Do not add platform-specific tolerances with#ifdef __APPLE__or similar platform checks in tests
Include code comments explaining WHY, not WHAT; document numerical algorithms; note non-obvious optimizations
Files:
src/gpu_kernels.cppsrc/turbulence_transport.cppsrc/turbulence_baseline.cpp
**/*.{cpp,cc,cxx,h,hpp}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx,h,hpp}: Avoid overly strict floating-point comparisons (e.g.,==for doubles); use appropriate tolerances based on algorithm
Fix all compiler warnings before pushing; use[[maybe_unused]]for intentionally unused variables in assertions
Useconstfor read-only references and mark methodsconstif they don't modify state
Use RAII for resource management; use smart pointers or RAII wrappers instead of manualnew/deleteormalloc/free
UseOmpDeviceBufferwrapper for GPU buffer management instead of manual allocation
Check return values and use exceptions for error conditions with informative error messages
Files:
src/gpu_kernels.cppinclude/poisson_solver.hppinclude/solver.hppinclude/gpu_utils.hppinclude/nn_core.hppsrc/turbulence_transport.cppinclude/fields.hppinclude/turbulence_transport.hppsrc/turbulence_baseline.cpp
include/**/*.{h,hpp}
📄 CodeRabbit inference engine (.cursorrules)
Every public function needs documentation explaining parameters, return values, and side effects; include usage examples for complex functions
Files:
include/poisson_solver.hppinclude/solver.hppinclude/gpu_utils.hppinclude/nn_core.hppinclude/fields.hppinclude/turbulence_transport.hpp
**/*.{sh,slurm,batch}
📄 CodeRabbit inference engine (.cursor/rules/lean-changes.mdc)
Always submit slurm jobs using the embers QOS
Files:
scripts/run_ci_local.sh
🧠 Learnings (9)
📓 Common learnings
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_physics_validation.cpp : Navier-Stokes solver validation tests must verify: viscous terms, pressure gradient, parabolic velocity profiles, divergence-free constraint, momentum balance, channel symmetry, cross-model consistency, and numerical stability
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_physics_validation.cpp : Navier-Stokes solver validation tests must verify: viscous terms, pressure gradient, parabolic velocity profiles, divergence-free constraint, momentum balance, channel symmetry, cross-model consistency, and numerical stability
Applied to files:
CMakeLists.txtsrc/gpu_kernels.cppinclude/solver.hppinclude/gpu_utils.hppscripts/run_ci_local.shsrc/turbulence_transport.cppinclude/turbulence_transport.hppsrc/turbulence_baseline.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp} : Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
Applied to files:
include/gpu_utils.hppsrc/turbulence_transport.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Explicitly upload data to GPU; check `USE_GPU_OFFLOAD` is defined; verify `omp_get_num_devices() > 0` at runtime for GPU offload code
Applied to files:
include/gpu_utils.hppsrc/turbulence_transport.cppsrc/turbulence_baseline.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_tg_validation.cpp : Taylor-Green Vortex test must validate energy decay with <5% error over 100 timesteps using viscous terms, time integration, and periodic BCs
Applied to files:
examples/08_duct_flow/dns_unsteady.cfgscripts/run_ci_local.shsrc/turbulence_transport.cppinclude/turbulence_transport.hppsrc/turbulence_baseline.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Applied to files:
scripts/run_ci_local.sh
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
Applied to files:
scripts/run_ci_local.sh
📚 Learning: 2025-12-25T20:38:56.549Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursor/rules/lean-debug.mdc:0-0
Timestamp: 2025-12-25T20:38:56.549Z
Learning: For GitHub CI debugging, run `./tools/ci_packet.sh <runid>` via terminal tool.
Applied to files:
scripts/run_ci_local.sh
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Create feature branches from latest main; run `./test_before_ci.sh` to verify starting state before development
Applied to files:
scripts/run_ci_local.sh
🧬 Code graph analysis (4)
src/gpu_kernels.cpp (7)
include/turbulence_transport.hpp (4)
delta(124-124)delta(124-124)nu(120-120)nu(120-120)include/turbulence_gep.hpp (4)
delta(49-49)delta(49-49)nu(48-48)nu(48-48)include/turbulence_nn_mlp.hpp (2)
delta(50-50)delta(50-50)include/turbulence_nn_tbnn.hpp (2)
delta(52-52)delta(52-52)include/turbulence_model.hpp (4)
delta(193-193)delta(193-193)nu(189-189)nu(189-189)include/turbulence_baseline.hpp (4)
delta(40-40)delta(40-40)delta(85-85)delta(85-85)include/features.hpp (1)
nu(129-129)
src/turbulence_transport.cpp (2)
include/gpu_kernels.hpp (1)
compute_gradients_from_mac_gpu(30-46)src/features.cpp (2)
compute_gradients_from_mac(26-59)compute_gradients_from_mac(26-30)
include/turbulence_transport.hpp (3)
include/turbulence_nn_mlp.hpp (3)
mesh(35-43)mesh(63-63)mesh(96-96)include/turbulence_nn_tbnn.hpp (6)
mesh(37-45)mesh(63-63)mesh(106-106)mesh(108-108)mesh(113-120)mesh(123-123)include/turbulence_earsm.hpp (7)
mesh(115-115)mesh(119-126)mesh(158-158)mesh(170-170)mesh(320-320)mesh(325-325)mesh(329-337)
src/turbulence_baseline.cpp (1)
include/gpu_utils.hpp (2)
strain_magnitude_frobenius(28-30)strain_magnitude_frobenius(28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: gpu-tests
- GitHub Check: build-and-test (ubuntu-latest, Debug)
🔇 Additional comments (18)
include/solver.hpp (1)
144-144: This function is unused and safe to remove.
print_velocity_profilehas no references in the codebase, confirming it is a debug helper that can be safely removed as part of the cleanup.examples/08_duct_flow/dns_unsteady.cfg (1)
45-45: Verify timestep reduction impact on simulation runtime.The timestep
dtwas reduced from 0.005 to 0.001 (5× smaller). Withmax_iter = 30000, this yieldsT_final ≈ 30, which is inconsistent with the comment on line 48 stating "T_final ~ 150, about 12 flow-throughs."Either update
max_iterto maintain the intended simulation duration or verify that the shorter duration is acceptable for this DNS case.include/fields.hpp (3)
33-34: LGTM - Comment correctly reflects API change.The comment has been appropriately updated to reflect that only
max_interior()remains aftermin_interior()was removed.
33-34: min_interior() removal is safe and complete.No remaining callers of
ScalarField::min_interior()exist in the repository. The method removal has no dangling references.
158-228: TensorField::fill() removal verified — no remaining callers found.Confirmation that the component-wise
fill()overloads for TensorField have been successfully removed with no remaining callers in the codebase. All.fill()calls target ScalarField or VectorField objects only.CMakeLists.txt (1)
330-339: LGTM - Golden test infrastructure well-integrated.The golden file regression test suite is properly configured with individual test cases for each scenario. The test binary and test case registration follow established patterns in the CMakeLists.txt file.
scripts/run_ci_local.sh (2)
8-10: LGTM - Usage documentation updated correctly.The usage instructions properly document the new
goldentest suite option.
376-388: LGTM - Golden tests properly integrated into CI flow.The golden file regression tests are appropriately included in the
all,golden, andfulltest suites. The test cases match those registered in CMakeLists.txt, and timeout values are reasonable for regression tests.include/nn_core.hpp (1)
91-91: Removal ofscale_input()is safe; no remaining callers found.No references to
scale_input()exist elsewhere in the codebase. The public API removal is complete.include/poisson_solver.hpp (1)
70-70: No remaining callers ofsor_iteration()found. The method has been successfully removed from the public API, and onlysor_rb_iteration()remains available. The codebase contains no references to the removed method.src/gpu_kernels.cpp (1)
147-165: Verify feature 5 behavior is intentional.With the change to
u_ref_local = max(u_mag, 1e-10), feature 5 (u_mag / u_ref_local) will essentially always be 1.0 for any non-stagnant flow. This reduces feature informativeness. The comment states this matches CPU behavior, so verify this is the intended normalization approach.src/turbulence_baseline.cpp (1)
48-49: Good refactor to use centralized strain magnitude helper.Using
strain_magnitude_frobeniusfromgpu_utils.hppimproves code consistency across the codebase.include/turbulence_transport.hpp (1)
267-268: New state fields for CPU/GPU consistency.The addition of
k_old_,omega_old_, andwall_dist_supports the unified Jacobi iteration approach. These enable identical arithmetic between CPU and GPU paths by using snapshot arrays rather than in-place updates during the parallel loop.include/gpu_utils.hpp (1)
21-61: Well-designed GPU-compatible strain rate helpers.Good addition of centralized, documented strain rate functions. The separation between Frobenius norm, simple magnitude, and squared variants is useful, and the 3D versions provide good coverage. Proper use of
#pragma omp declare targetensures GPU compatibility.src/turbulence_transport.cpp (4)
26-50: Good addition of device-callable helper functions.The
dmin,dmax,dclamp, anddmax0helpers provide GPU-compatible alternatives tostd::min/std::maxsince standard library functions may not be available in GPU target regions. These are properly placed within the#pragma omp declare targetblock.
239-244: Pre-computed wall distance for unified CPU/GPU kernel.Good approach to pre-compute wall distances once during initialization, avoiding repeated mesh queries in the hot loop.
545-571: Well-designed device-callable F1 blending function.The
compute_F1_localhelper is a pure function that computes the SST blending function using only local cell data. This design ensures identical arithmetic between CPU and GPU paths.
938-975: Unified closure kernel is correct.The
update()function's closure computation is embarrassingly parallel since each cell'snu_tdepends only on local values. The unified CPU/GPU kernel correctly handles this case.
| // GPU: use device pointers (parallel loop = Jacobi automatically) | ||
| size_t u_total_size = (size_t)mesh.total_Ny() * (mesh.total_Nx() + 1); | ||
| size_t v_total_size = (size_t)(mesh.total_Ny() + 1) * mesh.total_Nx(); | ||
| u_ptr = device_view->u_face; | ||
| v_ptr = device_view->v_face; | ||
| k_read_ptr = device_view->k; | ||
| omega_read_ptr = device_view->omega; | ||
| k_write_ptr = device_view->k; | ||
| omega_write_ptr = device_view->omega; | ||
| nu_t_ptr = device_view->nu_t; | ||
| wall_dist_ptr = device_view->wall_distance; | ||
| u_stride_local = device_view->u_stride; | ||
| v_stride_local = device_view->v_stride; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: GPU path reads and writes to same buffer, causing race condition.
The GPU path sets k_read_ptr = device_view->k and k_write_ptr = device_view->k (same for omega). In a parallel loop, this means cells may read updated values from neighboring cells depending on execution order, leading to non-deterministic results.
The CPU path correctly uses snapshot arrays (k_old_, omega_old_) for Jacobi iteration, but the GPU path does not. Either:
- Allocate separate read/write buffers for GPU, or
- Use
device_view's existing structure to provide separate input/output buffers
🔎 Conceptual fix
The GPU path should use separate read and write buffers, similar to the CPU path:
// GPU: use device pointers (parallel loop = Jacobi automatically)
+ // Need separate read buffers for true Jacobi iteration
+ // Option 1: Use k_old/omega_old device buffers if available
+ // Option 2: Double-buffer scheme with device_view
u_ptr = device_view->u_face;
v_ptr = device_view->v_face;
- k_read_ptr = device_view->k;
- omega_read_ptr = device_view->omega;
+ k_read_ptr = device_view->k_prev; // Separate snapshot buffer
+ omega_read_ptr = device_view->omega_prev;
k_write_ptr = device_view->k;
omega_write_ptr = device_view->omega;Committable suggestion skipped: line range outside the PR's diff.
| const int idx_ip2 = jj * cell_stride + (ii + 2); | ||
| const int idx_im2 = jj * cell_stride + (ii - 2); | ||
| const int idx_jp2 = (jj + 2) * cell_stride + ii; | ||
| const int idx_jm2 = (jj - 2) * cell_stride + ii; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Search for Nghost/Ng definitions and usage
rg -n "Nghost|Ng\s*=" --type cpp -A 2 -B 2Repository: comp-physics/cfd-nn
Length of output: 26078
🏁 Script executed:
# Check the specific file and context around lines 716-719
cat -n src/turbulence_transport.cpp | sed -n '700,830p'Repository: comp-physics/cfd-nn
Length of output: 10316
🏁 Script executed:
# Look for mesh initialization and ghost cell configuration
rg -n "ghost" --type cpp -i | head -30Repository: comp-physics/cfd-nn
Length of output: 2582
🏁 Script executed:
# Check if there's validation or enforcement of Nghost >= 2
rg -n "Nghost\s*[<>=]+" --type cpp -B 2 -A 2Repository: comp-physics/cfd-nn
Length of output: 396
🏁 Script executed:
# Check test setups to see what Nghost values are used
rg -n "Nghost\s*=" tests/ --type cpp | head -20Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Check mesh initialization to see if Nghost has constraints
rg -n "class Mesh|Nghost" include/mesh.hpp -A 5 -B 2Repository: comp-physics/cfd-nn
Length of output: 1933
🏁 Script executed:
# Check how the transport update function is called and if there are guards
rg -n "update_transport_rk2_sst_internal" --type cpp -B 3 -A 3Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Check if Nghost is validated anywhere in solver or mesh setup
rg -n "assert.*Nghost|Nghost.*assert|Nghost.*<|Nghost.*>" --type cppRepository: comp-physics/cfd-nn
Length of output: 488
🏁 Script executed:
# Check the exact loop bounds to confirm the analysis
sed -n '703,710p' src/turbulence_transport.cppRepository: comp-physics/cfd-nn
Length of output: 390
🏁 Script executed:
# Check if the function is always called with Nghost >= 2
rg -n "update_transport_rk2_sst" src/solver.cpp -B 5 -A 5Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Check default mesh creation to see if Nghost is always set to 2
rg -n "Mesh\(" --type cpp -A 3 | head -40Repository: comp-physics/cfd-nn
Length of output: 682
🏁 Script executed:
# Check Mesh constructor to see what default Nghost value it uses
rg -n "Mesh::" include/mesh.hpp -A 10 | head -50Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Check if there's a Mesh() default constructor defined
rg -n "Mesh()" --type cpp -A 3 -B 1 | head -40Repository: comp-physics/cfd-nn
Length of output: 2171
🏁 Script executed:
# Check what happens in actual solver initialization
rg -n "initialize\|setup" src/solver.cpp -A 2 | grep -i "mesh\|nghost" -A 2 -B 2 | head -30Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Find Mesh constructor definition in mesh.hpp
rg -n "Mesh(" include/mesh.hpp -B 2 -A 8 | head -60Repository: comp-physics/cfd-nn
Length of output: 123
🏁 Script executed:
# Check mesh.cpp for any default Mesh constructor
rg -n "Mesh::" src/mesh.cpp | head -20Repository: comp-physics/cfd-nn
Length of output: 701
🏁 Script executed:
# Check the create_uniform factory method
sed -n '211,230p' src/mesh.cppRepository: comp-physics/cfd-nn
Length of output: 407
🏁 Script executed:
# Check if there's a default Mesh() constructor
grep -n "Mesh::Mesh\|Mesh()" include/mesh.hppRepository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Look at the complete Mesh class definition for constructors
sed -n '1,50p' include/mesh.hppRepository: comp-physics/cfd-nn
Length of output: 1837
🏁 Script executed:
# Check if Nghost is initialized anywhere by default in Mesh struct
grep -n "Nghost\s*=" include/mesh.hppRepository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Check actual usages in tests to see what Nghost values are used
rg "create_uniform\|init_uniform" tests/ --type cpp -A 1 | head -40Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Search for actual mesh creation in tests with Nghost values
rg -n "Mesh.*create_uniform\|init_uniform" tests/ --type cpp -B 2 -A 1 | head -50Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Check test files more directly
find tests -name "*.cpp" -exec grep -l "Mesh\|create_uniform\|Nghost" {} \;Repository: comp-physics/cfd-nn
Length of output: 1089
🏁 Script executed:
# Look at a specific test file to see mesh setup
head -100 tests/test_mesh.cppRepository: comp-physics/cfd-nn
Length of output: 2784
🏁 Script executed:
# Verify the exact init_uniform signature and default Nghost
sed -n '31,50p' src/mesh.cppRepository: comp-physics/cfd-nn
Length of output: 725
🏁 Script executed:
# Check what actual tests use for mesh initialization
rg -n "init_uniform|create_uniform" tests/test_solver_cpu_gpu.cpp -B 2 -A 2 | head -30Repository: comp-physics/cfd-nn
Length of output: 1056
🏁 Script executed:
# Check if there's a default parameter for nghost in the header
sed -n '1,150p' include/mesh.hpp | grep -n "init_uniform"Repository: comp-physics/cfd-nn
Length of output: 154
🏁 Script executed:
# Check the exact function signature in mesh.hpp
rg -n "init_uniform" include/mesh.hpp -A 3Repository: comp-physics/cfd-nn
Length of output: 460
🏁 Script executed:
# Verify what nghost value is passed when omitted - check if compilation even works
rg -n "mesh.init_uniform.*;" tests/test_solver_cpu_gpu.cpp -A 1 -B 1 | head -20Repository: comp-physics/cfd-nn
Length of output: 45
🏁 Script executed:
# Confirm the 3D version also has default nghost = 1
sed -n '137,142p' include/mesh.hppRepository: comp-physics/cfd-nn
Length of output: 292
Verify ghost cell count is sufficient for 2-cell stencil.
The indices idx_ip2, idx_im2, idx_jp2, idx_jm2 at lines 716-719 access cells two positions away for computing gradients at neighbor cells (lines 770-806). This requires at least Ng >= 2 ghost cells. However, the default nghost parameter in init_uniform() is 1 (mesh.hpp lines 134, 140), and test code calls mesh.init_uniform() without passing the nghost argument (e.g., test_solver_cpu_gpu.cpp lines 150, 263, 383), which means the code runs with only 1 ghost cell by default. With Ng=1, accessing ii+2 or ii-2 will cause out-of-bounds access since the valid range is only [0, Nx+1] but ii+2 reaches Nx+2.
User description
Summary
Changes
Test plan
🤖 Generated with Claude Code
PR Type
Tests, Enhancement
Description
Add comprehensive golden file regression tests for numerical reproducibility
Unify CPU/GPU SST k-ω transport implementation with identical arithmetic
Add strain rate magnitude helper functions to gpu_utils.hpp
Remove orphan functions from multiple files (Phase 1 cleanup)
Diagram Walkthrough
File Walkthrough
3 files
Add comprehensive golden file regression test suiteAdd script to regenerate golden reference filesAdd minimal trace test for CPU/GPU transport debugging4 files
Unify CPU/GPU SST k-ω with identical arithmetic kernelAdd snapshot fields for Jacobi iteration consistencyAdd GPU-callable strain rate magnitude helper functionsUse strain magnitude helper for mixing length model1 files
Fix MLP feature computation to match CPU reference velocity12 files
Remove unused min_interior method from ScalarFieldRemove unused scale_input method from MLP classRemove unused sor_iteration method from PoissonSolverRemove unused print_velocity_profile method from RANSSolverRemove orphan timing utility functionsRemove orphan timing utility declarationsRemove min_interior implementationRemove sor_iteration implementationRemove print_velocity_profile implementationRemove scale_input implementationRemove orphan turbulence GEP functionsRemove orphan turbulence GEP declarations3 files
Add golden test suite execution and run_golden_test helperAdd test_golden executable and 6 parameterized test casesReduce time step and add perturbation amplitude parameterSummary by CodeRabbit
New Features
Improvements
Configuration Updates
✏️ Tip: You can customize this high-level summary in your review settings.