Fix CPU/GPU consistency tests and rename ci script #18
Conversation
The cross-build tests were silently skipping when only one build existed, and would compare CPU-to-CPU when GPU was unavailable. Now:
- Builds both CPU and GPU versions automatically for cross-build tests
- Fails (not skips) if GPU is unavailable when required
- Always generates fresh CPU reference before GPU comparison
- Renamed run_ci_local.sh -> ci.sh (used by both local and CI pipelines)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Following the SolverDeviceView pattern, make get_device_view() return valid host pointers in CPU builds instead of an invalid empty view. This is the taproot change that enables future unification of CPU/GPU turbulence model code paths. The change is backwards-compatible:
- GPU builds: unchanged behavior (returns GPU-mapped pointers)
- CPU builds: now returns valid host pointers
- NN models: unaffected (their GPU checks are inside #ifdef USE_GPU_OFFLOAD)
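The backwards-compatible shape of this change can be sketched as follows. This is a hedged illustration only: the struct fields, function signature, and the `USE_GPU_OFFLOAD` branch bodies are stand-ins, not the project's actual `TurbulenceDeviceView` or `get_device_view()`.

```cpp
#include <cstddef>

// Hypothetical minimal view: raw pointers valid on the active device.
struct TurbulenceDeviceView {
    double* nu_t = nullptr;
    const double* wall_distance = nullptr;
    std::size_t n_cells = 0;
    bool valid() const { return nu_t != nullptr && n_cells > 0; }
};

// Sketch of the pattern: GPU builds return device-mapped pointers;
// CPU builds now return plain host pointers instead of an empty view.
TurbulenceDeviceView get_device_view(double* nu_t_host,
                                     const double* wd_host,
                                     std::size_t n) {
    TurbulenceDeviceView v;
#ifdef USE_GPU_OFFLOAD
    // Device addresses obtained via OpenMP target mapping (elided here).
    v.nu_t = nu_t_host;          // stand-in for the mapped pointer
    v.wall_distance = wd_host;
#else
    v.nu_t = nu_t_host;          // host pointers are now valid too
    v.wall_distance = wd_host;
#endif
    v.n_cells = n;
    return v;
}
```

Because the view is populated either way, callers (e.g. consistency tests) need no conditional code paths.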
Replace inlined GPU computation with a call to mixing_length_cell_kernel(), which was already defined with #pragma omp declare target but wasn't being used. This eliminates ~15 lines of duplicate code and ensures the GPU and CPU paths use identical kernel logic. The mixing_length_cell_kernel() function (lines 34-64) computes:
- Strain rate magnitude from velocity gradients
- y+ and van Driest damping
- Mixing length with delta/2 cap
- Final eddy viscosity
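The four steps listed above can be sketched as a single declare-target function. This is an illustration under assumptions: the project's actual mixing_length_cell_kernel() writes through flat arrays rather than returning a value, and its parameter list differs.

```cpp
#include <cmath>

#pragma omp declare target
// Illustrative per-cell mixing-length kernel; name and signature are
// assumptions, not the project's actual mixing_length_cell_kernel().
inline double mixing_length_cell(double u_tau, double nu,
                                 double kappa, double A_plus, double delta,
                                 double y_wall,
                                 double dudx, double dudy,
                                 double dvdx, double dvdy) {
    // Strain-rate magnitude |S| = sqrt(2 S_ij S_ij) in 2D
    const double Sxx = dudx;
    const double Syy = dvdy;
    const double Sxy = 0.5 * (dudy + dvdx);
    const double S_mag =
        std::sqrt(2.0 * (Sxx * Sxx + Syy * Syy + 2.0 * Sxy * Sxy));

    // y+ and van Driest damping
    const double y_plus = y_wall * u_tau / nu;
    const double damping = 1.0 - std::exp(-y_plus / A_plus);

    // Mixing length, capped at delta/2
    double l_mix = kappa * y_wall * damping;
    if (l_mix > 0.5 * delta) l_mix = 0.5 * delta;

    // Final eddy viscosity: nu_t = l_mix^2 |S|
    return l_mix * l_mix * S_mag;
}
#pragma omp end declare target
```

Keeping the whole per-cell computation in one declare-target function is what lets both the GPU loop and a plain CPU loop call identical logic.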
Following the unified kernel pattern, extract the GEP eddy viscosity computation into a single gep_cell_kernel() function with #pragma omp declare target. Both GPU and CPU paths now call the same kernel function, eliminating duplicated logic. Key changes:
- Add gep_cell_kernel() with unified GEP variant logic
- GPU path: replace inlined code with kernel call
- CPU path: replace inlined code with kernel call (using a local wall_distance buffer for flat array access)
Extract the common EARSM output computation (tensor basis, Reynolds stresses, eddy viscosity) into a unified earsm_compute_output() kernel with #pragma omp declare target. All three EARSM variants (WJ, GS, Pope) now call this shared kernel after computing model-specific β coefficients. Key changes:
- Add earsm_compute_output() unified kernel (~60 lines)
- WJ kernel: replace 60 lines of inlined output with kernel call
- GS kernel: replace 50 lines of inlined output with kernel call
- Pope kernel: replace 120 lines (including debug logging) with kernel call
- Remove unused debug logging infrastructure and includes
- Net reduction: ~130 lines
- Add omp_is_initial_device() check to verify target regions execute on GPU
- Tighten warning threshold from exact 0.0 to 1e-14 (just above machine epsilon)
- Print GPU device count and verification status

This catches cases where GPU offload compiles but silently produces results identical to the CPU, which would indicate something is wrong.
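The two checks described above can be sketched like this. It is a hedged illustration, not the project's test code; `verify_gpu_execution` and `diff_is_suspicious` are hypothetical names, and only the threshold value `1e-14` comes from the commit.

```cpp
#include <cmath>
#ifdef _OPENMP
#include <omp.h>
#endif

// Diffs below this are suspicious: bit-identical CPU/GPU results would
// suggest the "GPU" run silently executed the CPU code path.
constexpr double MIN_EXPECTED_DIFF = 1e-14;

// Check (1): target regions really execute on a device.
bool verify_gpu_execution() {
#ifdef _OPENMP
    if (omp_get_num_devices() == 0) return false;  // no GPU visible
    int on_host = 1;
    #pragma omp target map(tofrom: on_host)
    { on_host = omp_is_initial_device(); }         // 0 when on the GPU
    return on_host == 0;
#else
    return false;  // no OpenMP at all
#endif
}

// Check (2): flag suspiciously small CPU-vs-GPU differences.
bool diff_is_suspicious(double max_abs_diff) {
    return max_abs_diff < MIN_EXPECTED_DIFF;
}
```

`omp_is_initial_device()` inside a `target` region is the standard way to detect silent host fallback; combined with the device count check it covers both failure modes the commit mentions.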
Extract sst_transport_cell_kernel() with #pragma omp declare target that computes all SST physics (gradients, F1 blending, production, advection, diffusion, cross-diffusion, time integration) for a single cell. Both GPU and CPU paths now call this unified kernel. Benefits:
- Single source of truth for SST transport physics
- Eliminates the multi-pass CPU approach (separate gradient, blending, production, advection, and diffusion loops)
- CPU and GPU use identical computation logic
- Easier to maintain and verify correctness

The unified kernel uses simple central-difference diffusion (same as the previous GPU path) rather than face-averaged coefficients. This is acceptable per user guidance prioritizing unified logic over micro-optimizations.
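The single-source-of-truth structure behind all of these kernel extractions can be shown with a toy example. The real sst_transport_cell_kernel() takes dozens of physics parameters; this stand-in just doubles a value, but the dispatch pattern (one declare-target kernel, called from both the GPU target loop and the plain CPU loop) is the point.

```cpp
// Illustrative pattern, not the actual SST signature: the physics
// lives in exactly one declare-target function.
#pragma omp declare target
inline void cell_kernel(int idx, const double* in, double* out) {
    out[idx] = 2.0 * in[idx];   // stand-in for the per-cell SST update
}
#pragma omp end declare target

void update(const double* in, double* out, int n, bool use_gpu) {
    if (use_gpu) {
        // GPU path: offloaded loop over cells
        #pragma omp target teams distribute parallel for \
            map(to: in[0:n]) map(from: out[0:n])
        for (int i = 0; i < n; ++i) cell_kernel(i, in, out);
    } else {
        // CPU path: same kernel, plain loop
        for (int i = 0; i < n; ++i) cell_kernel(i, in, out);
    }
}
```

Because both branches call the identical function, a fix to the physics automatically applies to both builds, which is what the review comments below repeatedly check for.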
📝 Walkthrough
Replaced local CI runner references with …

Changes
Sequence Diagram(s)
sequenceDiagram
participant Dev as Developer / CI Trigger
participant CI as scripts/ci.sh
participant BuilderCPU as build_cpu
participant BuilderGPU as build_gpu
participant GPU as GPU Device
participant Comparator as Comparator / Tests
rect rgba(0,128,96,0.06)
Note over CI: CI starts (local or workflow)
end
Dev->>CI: invoke ./scripts/ci.sh [--cpu / tests...]
CI->>CI: check_gpu_available()
alt GPU available
CI->>BuilderCPU: ensure_build(build_cpu, offload=OFF)
CI->>BuilderGPU: ensure_build(build_gpu, offload=ON)
CI->>BuilderCPU: run tests → generate cpu_reference
BuilderCPU-->>CI: cpu_reference artifacts
CI->>BuilderGPU: run tests on GPU (use cpu_reference)
BuilderGPU->>GPU: offloaded kernels execute
BuilderGPU-->>CI: gpu outputs
CI->>Comparator: compare gpu outputs vs cpu_reference
Comparator-->>CI: pass/fail + diffs
else No GPU
CI-->>Dev: skip/fail GPU cross-build tests (based on flags)
end
CI-->>Dev: final status, logs, artifacts
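The flow in the diagram above can be sketched as a script. This is a dry-run illustration under assumptions: build/test commands are echoed rather than executed, and the cmake flags and test-binary names are placeholders, not the real scripts/ci.sh interface.

```shell
#!/usr/bin/env bash
# Hedged sketch of the cross-build flow; the real scripts/ci.sh drives
# cmake and the project's actual test binaries.

run() { echo "+ $*"; }   # dry-run helper: print the step instead of executing

cross_build_test() {
    local gpu_available=$1
    if [ "$gpu_available" != "yes" ]; then
        echo "GPU required for cross-build tests" >&2
        return 1                                    # fail, don't skip
    fi
    run cmake -B build_cpu -DGPU_OFFLOAD=OFF        # ensure CPU build
    run cmake -B build_gpu -DGPU_OFFLOAD=ON         # ensure GPU build
    run ./build_cpu/tests --dump cpu_reference      # fresh CPU reference first
    run env OMP_TARGET_OFFLOAD=MANDATORY \
        ./build_gpu/tests --compare cpu_reference   # no silent host fallback
}
```

The `return 1` on a missing GPU mirrors the PR's fail-not-skip behavior; `OMP_TARGET_OFFLOAD=MANDATORY` makes the OpenMP runtime abort instead of silently falling back to the host.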
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
PR Compliance Guide 🔍
Below is a summary of compliance checks for this PR:

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
PR Code Suggestions ✨
Explore these optional code suggestions:
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/turbulence_baseline.cpp (1)
332-354: CPU path does not use the unified kernel, risking code drift.

The CPU path still has inline computation instead of calling `mixing_length_cell_kernel`. While the logic appears equivalent, this defeats the purpose of the "unified kernel" approach: if either path is modified, the other won't automatically get the fix. Consider refactoring the CPU path to also call the unified kernel for true consistency:
🔎 Suggested refactor
+ const int stride = mesh.total_Nx();
+ double* nu_t_ptr = nu_t.data().data();
+ const double* dudx_ptr = dudx_.data().data();
+ const double* dudy_ptr = dudy_.data().data();
+ const double* dvdx_ptr = dvdx_.data().data();
+ const double* dvdy_ptr = dvdy_.data().data();
+
  // CPU path
  for (int j = mesh.j_begin(); j < mesh.j_end(); ++j) {
    for (int i = mesh.i_begin(); i < mesh.i_end(); ++i) {
-     double y_wall = mesh.wall_distance(i, j);
-     double y_plus = y_wall * u_tau / nu_;
-
-     // Use same formulation as GPU (single expression)
-     double damping = 1.0 - std::exp(-y_plus / A_plus_);
-
-     double l_mix = kappa_ * y_wall * damping;
-     // Use same min operation as GPU
-     if (l_mix > 0.5 * delta_) {
-       l_mix = 0.5 * delta_;
-     }
-
-     double Sxx = dudx_(i, j);
-     double Syy = dvdy_(i, j);
-     double Sxy = 0.5 * (dudy_(i, j) + dvdx_(i, j));
-     double S_mag = std::sqrt(2.0 * (Sxx*Sxx + Syy*Syy + 2.0*Sxy*Sxy));
-
-     nu_t(i, j) = l_mix * l_mix * S_mag;
+     const int cell_idx = j * stride + i;
+     mixing_length_cell_kernel(
+       cell_idx, u_tau, nu_,
+       kappa_, A_plus_, delta_,
+       mesh.wall_distance(i, j),
+       dudx_ptr, dudy_ptr, dvdx_ptr, dvdy_ptr,
+       nu_t_ptr);
    }
  }

src/turbulence_earsm.cpp (1)
204-340: CPU path does not use the unified output kernel.

The CPU implementation in `EARSMClosure::compute_nu_t` still computes tensor basis and Reynolds stresses inline rather than calling `earsm_compute_output`. While the GPU-specific variants (WJ, GS, Pope) use the unified kernel, the base CPU path does not. This creates potential for drift between CPU and GPU implementations. Consider refactoring to use `earsm_compute_output` in the CPU path as well, or at minimum document why the paths differ.

scripts/ci.sh (1)
104-143: Robust build helper with proper error handling.

The `ensure_build` function handles configuration and build with appropriate error reporting. However, the static analysis tool flagged line 116:

🔎 Address SC2155: Declare and assign separately

- local orig_dir=$(pwd)
+ local orig_dir
+ orig_dir=$(pwd)

This prevents masking the return value of `pwd` if it were to fail (though unlikely).
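The SC2155 behavior the tool warns about is easy to demonstrate; a minimal sketch with illustrative function names:

```shell
# SC2155: `local var=$(cmd)` discards cmd's exit status, because the
# status you observe afterwards is that of the `local` builtin itself.
demo_masked() {
    local dir=$(false)    # exit status of $(false) is masked by `local`
    echo "status=$?"      # -> status=0
}

demo_preserved() {
    local dir
    dir=$(false)          # plain assignment carries the substitution's status
    echo "status=$?"      # -> status=1
}
```

Declaring and assigning separately is what lets `set -e` or an explicit `$?` check actually see the failure.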
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
.claude/rules.md
.github/scripts/gpu_ci_correctness.sbatch.template
.github/workflows/ci.yml
scripts/ci.sh
src/solver.cpp
src/turbulence_baseline.cpp
src/turbulence_earsm.cpp
src/turbulence_gep.cpp
src/turbulence_transport.cpp
tests/test_cpu_gpu_bitwise.cpp
tests/test_poisson_cpu_gpu_3d.cpp
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,cc,cxx}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx}: Do not add platform-specific tolerances with `#ifdef __APPLE__` or similar platform checks in tests
Include code comments explaining WHY, not WHAT; document numerical algorithms; note non-obvious optimizations
Files:
tests/test_cpu_gpu_bitwise.cpp
src/turbulence_gep.cpp
src/turbulence_earsm.cpp
src/solver.cpp
tests/test_poisson_cpu_gpu_3d.cpp
src/turbulence_transport.cpp
src/turbulence_baseline.cpp
**/*.{cpp,cc,cxx,h,hpp}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx,h,hpp}: Avoid overly strict floating-point comparisons (e.g., `==` for doubles); use appropriate tolerances based on algorithm
Fix all compiler warnings before pushing; use `[[maybe_unused]]` for intentionally unused variables in assertions
Use `const` for read-only references and mark methods `const` if they don't modify state
Use RAII for resource management; use smart pointers or RAII wrappers instead of manual `new`/`delete` or `malloc`/`free`
Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
Check return values and use exceptions for error conditions with informative error messages
Files:
tests/test_cpu_gpu_bitwise.cpp
src/turbulence_gep.cpp
src/turbulence_earsm.cpp
src/solver.cpp
tests/test_poisson_cpu_gpu_3d.cpp
src/turbulence_transport.cpp
src/turbulence_baseline.cpp
**/*.{sh,slurm,batch}
📄 CodeRabbit inference engine (.cursor/rules/lean-changes.mdc)
Always submit slurm jobs using the embers QOS
Files:
scripts/ci.sh
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp} : Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Applied to files:
.github/workflows/ci.yml
.claude/rules.md
tests/test_cpu_gpu_bitwise.cpp
scripts/ci.sh
.github/scripts/gpu_ci_correctness.sbatch.template
tests/test_poisson_cpu_gpu_3d.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
Applied to files:
.github/workflows/ci.yml
.claude/rules.md
scripts/ci.sh
.github/scripts/gpu_ci_correctness.sbatch.template
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Create feature branches from latest main; run `./test_before_ci.sh` to verify starting state before development
Applied to files:
.claude/rules.md
scripts/ci.sh
📚 Learning: 2025-12-25T20:38:56.549Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursor/rules/lean-debug.mdc:0-0
Timestamp: 2025-12-25T20:38:56.549Z
Learning: For GitHub CI debugging, run `./tools/ci_packet.sh <runid>` via terminal tool.
Applied to files:
.claude/rules.md
scripts/ci.sh
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Add tests for new features; update tests when changing behavior; don't commit broken tests
Applied to files:
.claude/rules.md
📚 Learning: 2025-12-25T20:38:56.549Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursor/rules/lean-debug.mdc:0-0
Timestamp: 2025-12-25T20:38:56.549Z
Learning: For local debugging, run `./tools/local_packet.sh <command...>` via terminal tool.
Applied to files:
.claude/rules.md
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx} : Do not add platform-specific tolerances with `#ifdef __APPLE__` or similar platform checks in tests
Applied to files:
.claude/rules.md
tests/test_cpu_gpu_bitwise.cpp
tests/test_poisson_cpu_gpu_3d.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp} : Fix all compiler warnings before pushing; use `[[maybe_unused]]` for intentionally unused variables in assertions
Applied to files:
.claude/rules.md
tests/test_cpu_gpu_bitwise.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp} : Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
Applied to files:
tests/test_cpu_gpu_bitwise.cpp
src/turbulence_earsm.cpp
src/solver.cpp
tests/test_poisson_cpu_gpu_3d.cpp
src/turbulence_transport.cpp
src/turbulence_baseline.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp} : Avoid overly strict floating-point comparisons (e.g., `==` for doubles); use appropriate tolerances based on algorithm
Applied to files:
tests/test_cpu_gpu_bitwise.cpp
tests/test_poisson_cpu_gpu_3d.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Explicitly upload data to GPU; check `USE_GPU_OFFLOAD` is defined; verify `omp_get_num_devices() > 0` at runtime for GPU offload code
Applied to files:
tests/test_cpu_gpu_bitwise.cpp
src/turbulence_gep.cpp
src/turbulence_earsm.cpp
tests/test_poisson_cpu_gpu_3d.cpp
src/turbulence_transport.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_physics_validation.cpp : Navier-Stokes solver validation tests must verify: viscous terms, pressure gradient, parabolic velocity profiles, divergence-free constraint, momentum balance, channel symmetry, cross-model consistency, and numerical stability
Applied to files:
src/turbulence_gep.cpp
tests/test_poisson_cpu_gpu_3d.cpp
src/turbulence_transport.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_tg_validation.cpp : Taylor-Green Vortex test must validate energy decay with <5% error over 100 timesteps using viscous terms, time integration, and periodic BCs
Applied to files:
tests/test_poisson_cpu_gpu_3d.cpp
🧬 Code graph analysis (3)
tests/test_cpu_gpu_bitwise.cpp (1)
tests/test_poisson_cpu_gpu_3d.cpp (1)
`cout` (160-169)
src/turbulence_gep.cpp (3)
include/gpu_utils.hpp (5)
`USE_GPU_OFFLOAD` (88-95), (98-104), (107-113), (116-123), (188-203)
include/turbulence_model.hpp (8)
`nu` (189-189), `nu` (189-189), `mesh` (132-140), `mesh` (161-177), `mesh` (161-169), `mesh` (184-184), `mesh` (184-184), `nu_` (190-190)
src/features.cpp (2)
`compute_gradients_from_mac` (26-59), (26-30)
src/solver.cpp (2)
include/solver.hpp (5)
`velocity_` (120-120), `velocity_` (127-127), `k_` (123-123), `omega_` (124-124), `nu_t_` (122-122)
include/fields.hpp (2)
`mesh_` (47-47), `mesh_` (127-127)
🪛 Shellcheck (0.11.0)
scripts/ci.sh
[warning] 116-116: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (23)
src/solver.cpp (1)
4438-4477: LGTM - CPU build now provides a valid device view for consistency tests.

The implementation correctly mirrors the GPU path by returning a fully populated `TurbulenceDeviceView` with host pointers. All required fields are set consistently with the GPU version at lines 4268-4308, and the `const_cast` pattern matches the existing `get_solver_view()` implementation. This enables CPU/GPU consistency testing without conditional code paths in turbulence models.

src/turbulence_baseline.cpp (1)
265-282: GPU path correctly uses the unified kernel.

The GPU path now calls `mixing_length_cell_kernel`, which consolidates the per-cell computation. This is a good refactor for maintainability and CPU/GPU consistency.

tests/test_poisson_cpu_gpu_3d.cpp (3)
23-26: Good addition of GPU verification includes and threshold constant.

The conditional `#include <omp.h>` and `MIN_EXPECTED_DIFF` constant are well-placed. The threshold of `1e-14` is appropriately above machine epsilon, providing a reasonable heuristic to detect if CPU and GPU are running identical code paths.

Also applies to: 32-34
274-293: Excellent GPU execution verification.

This proactive verification catches two critical failure modes:
- No GPU devices available (driver issues, misconfiguration)
- GPU present but offload silently falling back to host

The use of `omp_is_initial_device()` inside a target region is the correct way to detect host fallback. Based on learnings, verifying `omp_get_num_devices() > 0` at runtime is the recommended practice for GPU offload code.
361-365: Threshold-based warning is appropriate.

Replacing the exact-zero comparison with a threshold check is correct. This aligns with coding guidelines to avoid overly strict floating-point comparisons. The warning-but-pass behavior is suitable since suspiciously small diffs don't necessarily indicate a bug.
tests/test_cpu_gpu_bitwise.cpp (3)
25-27: Consistent GPU include guards and threshold constant.

Matches the pattern in `test_poisson_cpu_gpu_3d.cpp`, maintaining consistency across the test suite.

Also applies to: 35-37
301-320: GPU verification matches the 3D Poisson test pattern.

Consistent implementation of the GPU availability and execution verification across test files is good for maintainability.
371-376: Threshold-based warnings applied consistently across all field comparisons.

Each velocity component (u, v, w) and pressure uses the same `MIN_EXPECTED_DIFF` threshold for detecting suspiciously small differences. This uniform approach ensures consistent behavior across all field comparisons.

Also applies to: 396-398, 421-423, 446-448
src/turbulence_gep.cpp (2)
35-100: Well-designed unified GEP kernel.

The kernel is properly annotated with `#pragma omp declare target` guards and handles all the per-cell GEP computation, including variant selection, van Driest damping, and eddy viscosity calculation. The inline function approach allows the compiler to optimize while maintaining code consistency.
283-318: CPU path correctly uses the unified kernel.

Unlike `turbulence_baseline.cpp`, this file properly calls `gep_cell_kernel` from both GPU and CPU paths. The wall distance buffer creation (lines 299-308) correctly adapts the kernel's pointer-based interface for CPU use. This is the right approach for maintaining CPU/GPU consistency.

src/turbulence_transport.cpp (3)
55-185: Comprehensive SST transport cell kernel.

The kernel correctly implements the full SST k-ω physics: velocity gradients, F1 blending, production limiting, upwind advection, central diffusion, cross-diffusion, and time integration. While the function signature is lengthy (24+ parameters), this is inherent to the SST formulation and clearly documented with doxygen comments.
726-776: GPU path uses the unified kernel correctly.

The GPU kernel loop properly calculates all necessary indices and calls `sst_transport_cell_kernel`. The output values are written back to the device arrays immediately after the kernel call.
799-890: CPU path mirrors GPU path structure.

The CPU implementation correctly uses the same unified kernel, ensuring bit-for-bit consistency with the GPU path. The wall distance buffer is populated for the full domain (including ghost cells via the loop bounds on lines 821-826), matching the kernel's expectations.
src/turbulence_earsm.cpp (2)
707-793: Good unified EARSM output kernel.

The `earsm_compute_output` function consolidates the common output stage across all EARSM variants (WJ, GS, Pope). Key features:
- Re_t-based blending to fade nonlinear terms
- Tensor basis construction (S*, [S*,Ω*], S*²)
- Reynolds stress computation
- Equivalent eddy viscosity with fallback for near-zero Sxy
- Defensive NaN check (line 785)
This reduces code duplication in the GPU kernels.
880-885: All GPU EARSM kernels now use the unified output function.

The Wallin-Johansson, Gatski-Speziale, and Pope GPU kernels all call `earsm_compute_output` for the final stage, ensuring consistent output computation across variants.

Also applies to: 971-976, 1041-1045
scripts/ci.sh (3)
91-100: Good GPU availability check.

The `check_gpu_available` function correctly uses `nvidia-smi` to detect GPU presence. The double-check pattern (command exists AND succeeds) handles edge cases where nvidia-smi is installed but no GPU is present.
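The double-check pattern described here looks roughly like the following sketch; the real scripts/ci.sh may differ in detail.

```shell
# Both conditions must hold: nvidia-smi is installed AND it exits
# successfully (it returns non-zero when no usable GPU/driver is present).
check_gpu_available() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1
}
```

Checking `command -v` alone is not enough: on a login node or CI runner the binary can exist while no device is attached, which is exactly the case `nvidia-smi`'s own exit status catches.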
238-351: Comprehensive cross-build test implementation.

The `run_cross_build_test` function properly:
- Checks GPU availability and fails (not skips) if unavailable
- Uses build caching to avoid redundant builds
- Generates CPU reference first, then runs GPU comparison
- Uses `OMP_TARGET_OFFLOAD=MANDATORY` to prevent silent CPU fallback
- Reports detailed output on failure
The fail-not-skip behavior on missing GPU is intentional and correct for ensuring actual GPU testing.
179-233: Extended run_test with environment prefix support.

The optional `env_prefix` parameter (line 183) enables per-test environment variable injection, used for `OMP_TARGET_OFFLOAD=MANDATORY` on GPU utilization tests. This is a clean extension of the existing function.

.github/workflows/ci.yml (1)
43-45: CI workflow updated to use renamed script.

The workflow correctly references the new `ci.sh` script with the `--cpu` flag for CPU-only testing. This aligns with the script rename from `run_ci_local.sh` to `ci.sh`.

.claude/rules.md (3)
241-241: LGTM! All script references consistently updated.

All references to the CI script have been systematically updated from `run_ci_local.sh` to `ci.sh` across the workflow guidelines, quick reference, and emergency procedures sections. The changes maintain consistency throughout the documentation.

Also applies to: 254-254, 315-315, 349-349, 359-359
474-477: LGTM! Documentation reflects enhanced CI script functionality.

The updated documentation correctly captures the new GPU-aware capabilities of `ci.sh`:
- Auto-detection of GPU availability
- Explicit CPU-only mode (`--cpu`)
- GPU-specific test mode (`gpu`)

This aligns with the PR objective to implement "GPU-aware cross-build testing" and represents a meaningful enhancement over the previous script.
100-101: Documentation is correct; the referenced scripts exist in the repository.

The scripts `./scripts/ci.sh` and `./scripts/run_gpu_ci_test.sh` are both present and correctly documented in .claude/rules.md. The inconsistency noted was with outdated learnings that reference `test_before_ci.sh`, which no longer exists. The documentation accurately reflects the current repository structure.

.github/scripts/gpu_ci_correctness.sbatch.template (1)
58-62: Script rename verified and complete.

The updates correctly reference the renamed CI script `ci.sh`. The invocation pattern (`./scripts/ci.sh full`) is valid, the script exists and is executable, and no stale references to the old `run_ci_local.sh` remain in the codebase.
…kernel

Address reviewer feedback: CPU path in MixingLengthModel::update() now calls the same mixing_length_cell_kernel() used by the GPU path, eliminating potential for drift between implementations. Changes:
- Add <vector> include for wall_distance buffer
- Replace inline computation loop with unified kernel call
- Use same cell indexing pattern as GPU for consistency
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/turbulence_baseline.cpp (2)
345-354: Cache the wall distance buffer to avoid repeated allocations.

The CPU path allocates and populates `wall_dist_buf` on every `update()` call, while the GPU path pre-computes and caches `y_wall_flat_` as a member variable (lines 120, 122-128). This creates unnecessary allocations in CPU-only builds. Consider caching the wall distance buffer as a member variable (similar to `dudx_`, `dudy_`, etc.) to match the GPU optimization and improve CPU performance.

💡 Proposed refactor to cache wall distance buffer
Add a member variable to cache the wall distance buffer in the header file:
// In turbulence_baseline.hpp, add to MixingLengthModel class:
std::vector<double> wall_dist_cpu_;  // Cached wall distance buffer for CPU path

Then update the CPU path to lazy-initialize once:
- // Create wall_distance buffer for unified kernel
- const size_t total_cells = (size_t)mesh.total_Nx() * mesh.total_Ny();
- std::vector<double> wall_dist_buf(total_cells, 0.0);
- for (int j = 0; j < mesh.total_Ny(); ++j) {
-     for (int i = 0; i < mesh.total_Nx(); ++i) {
-         const int idx = j * stride + i;
-         wall_dist_buf[idx] = mesh.wall_distance(i, j);
+ // Create wall_distance buffer for unified kernel (lazy initialization)
+ const int total_cells = mesh.total_Nx() * mesh.total_Ny();
+ if (wall_dist_cpu_.empty() || static_cast<int>(wall_dist_cpu_.size()) != total_cells) {
+     wall_dist_cpu_.resize(total_cells, 0.0);
+     for (int j = 0; j < mesh.total_Ny(); ++j) {
+         for (int i = 0; i < mesh.total_Nx(); ++i) {
+             const int idx = j * stride + i;
+             wall_dist_cpu_[idx] = mesh.wall_distance(i, j);
+         }
      }
  }
- const double* wall_dist_ptr = wall_dist_buf.data();
+ const double* wall_dist_ptr = wall_dist_cpu_.data();
346-346: Minor: Inconsistent integer type for `total_cells`.

This line uses `size_t` while the rest of the codebase consistently uses `int` for cell counts (e.g., lines 82, 104, 115, 150, 229, 250, 334-335).

🔎 Proposed fix
- const size_t total_cells = (size_t)mesh.total_Nx() * mesh.total_Ny();
+ const int total_cells = mesh.total_Nx() * mesh.total_Ny();
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/turbulence_baseline.cpp
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,cc,cxx}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx}: Do not add platform-specific tolerances with `#ifdef __APPLE__` or similar platform checks in tests
Include code comments explaining WHY, not WHAT; document numerical algorithms; note non-obvious optimizations
Files:
src/turbulence_baseline.cpp
**/*.{cpp,cc,cxx,h,hpp}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx,h,hpp}: Avoid overly strict floating-point comparisons (e.g., `==` for doubles); use appropriate tolerances based on algorithm
Fix all compiler warnings before pushing; use `[[maybe_unused]]` for intentionally unused variables in assertions
Use `const` for read-only references and mark methods `const` if they don't modify state
Use RAII for resource management; use smart pointers or RAII wrappers instead of manual `new`/`delete` or `malloc`/`free`
Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
Check return values and use exceptions for error conditions with informative error messages
Files:
src/turbulence_baseline.cpp
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-and-test (ubuntu-latest, Release)
- GitHub Check: build-and-test (ubuntu-latest, Debug)
- GitHub Check: gpu-tests
🔇 Additional comments (3)
src/turbulence_baseline.cpp (3)
23-23: LGTM: Header addition supports CPU buffer allocation.

The `<vector>` include is required for the `wall_dist_buf` allocation in the CPU path (line 347).
266-284: LGTM: GPU path successfully uses unified kernel.

The GPU computation path now invokes the shared `mixing_length_cell_kernel`, eliminating code duplication while maintaining the same computational logic.
369-375: LGTM: CPU path correctly invokes unified kernel.

The CPU execution path successfully uses the same `mixing_length_cell_kernel` as the GPU path, achieving the PR objective of eliminating code duplication while maintaining computational consistency.
Extract unified cell kernels for all EARSM variants (WJ, GS, Pope) with #pragma omp declare target. Both GPU and CPU paths now call the same kernel functions, eliminating potential for drift. Added kernels:
- earsm_wj_cell_kernel(): Wallin-Johansson cubic solve + beta
- earsm_gs_cell_kernel(): Gatski-Speziale regularization + beta
- earsm_pope_cell_kernel(): Pope regularized model + beta

All kernels call earsm_compute_output() for tensor basis → output. Host path added to #else blocks so CPU builds use the same logic as GPU.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/turbulence_earsm.cpp (3)
735-736: Clarify that beta2 and beta3 modification is local.
The blending factors `beta2` and `beta3` are modified in-place. Since they are passed by value, this doesn't affect the caller, but adding a brief comment would improve readability.
🔎 Optional clarification comment

```diff
  // Apply Re_t blending: keep linear term (beta1), fade nonlinear terms (beta2, beta3)
+ // Note: beta2/beta3 are pass-by-value, so modification is local to this function
  beta2 *= alpha;
  beta3 *= alpha;
```
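The pass-by-value semantics the nitpick relies on can be shown in a two-line sketch (the function name is illustrative, not the repository's):

```cpp
#include <cassert>

// Modifying a by-value parameter only changes the callee's local copy;
// the caller's variable is untouched. This is why the in-place
// `beta2 *= alpha` inside the kernel is safe but worth a comment.
double fade_nonlinear(double beta2, double alpha) {
    beta2 *= alpha;   // local copy only
    return beta2;
}
```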
785-785: Consider using std::isfinite() for NaN check.
The NaN check `(nu_t_loc == nu_t_loc)` works but is less readable than `std::isfinite(nu_t_loc)`. If this pattern is intentional for GPU compatibility, consider adding a comment explaining why.
🔎 More readable alternative

```diff
- nu_t_loc = (nu_t_loc == nu_t_loc) ? nu_t_loc : 0.0; // NaN check
+ nu_t_loc = std::isfinite(nu_t_loc) ? nu_t_loc : 0.0; // NaN/Inf check
```
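Note the two checks are not strictly equivalent: the self-comparison trick only filters NaN, while `std::isfinite` also rejects ±infinity (and self-comparison can be optimized away under `-ffast-math`). A small sketch of the difference:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// (x == x) is false only for NaN, so infinities slip through.
double clamp_nan_only(double x)  { return (x == x) ? x : 0.0; }

// std::isfinite rejects NaN and +/-Inf, a slightly wider clamp.
double clamp_nonfinite(double x) { return std::isfinite(x) ? x : 0.0; }
```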
799-862: LGTM! Consider extracting common Re_t blending logic.The WJ cell kernel correctly implements the Wallin-Johansson model with proper regularization and clamping. The Re_t computation and alpha blending (lines 820-823) are duplicated across all three EARSM variants, which is acceptable for performance but could be extracted into a helper if maintainability becomes a concern.
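The helper extraction the reviewer suggests could look roughly like this. The blending formula below is a placeholder assumption, not the repository's actual Re_t/alpha expression:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper centralizing the Re_t blending duplicated across
// the WJ/GS/Pope kernels. Marked declare target so it stays callable
// from both CPU and GPU paths. Formula is illustrative only.
#pragma omp declare target
inline double ret_blend_alpha(double k, double omega, double nu,
                              double re_crit) {
    double re_t  = k / (nu * omega + 1e-30);  // turbulent Reynolds number
    double alpha = re_t / (re_t + re_crit);   // smooth 0 -> 1 blend
    return std::clamp(alpha, 0.0, 1.0);
}
#pragma omp end declare target
```

Each variant kernel would then call `ret_blend_alpha(...)` once instead of repeating the computation, trading a function call (inlined in practice) for a single point of maintenance.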
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/turbulence_earsm.cpp
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,cc,cxx}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx}: Do not add platform-specific tolerances with `#ifdef __APPLE__` or similar platform checks in tests
Include code comments explaining WHY, not WHAT; document numerical algorithms; note non-obvious optimizations
Files:
src/turbulence_earsm.cpp
**/*.{cpp,cc,cxx,h,hpp}
📄 CodeRabbit inference engine (.cursorrules)
**/*.{cpp,cc,cxx,h,hpp}: Avoid overly strict floating-point comparisons (e.g., `==` for doubles); use appropriate tolerances based on algorithm
Fix all compiler warnings before pushing; use `[[maybe_unused]]` for intentionally unused variables in assertions
Use `const` for read-only references and mark methods `const` if they don't modify state
Use RAII for resource management; use smart pointers or RAII wrappers instead of manual `new`/`delete` or `malloc`/`free`
Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
Check return values and use exceptions for error conditions with informative error messages
Files:
src/turbulence_earsm.cpp
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_physics_validation.cpp : Navier-Stokes solver validation tests must verify: viscous terms, pressure gradient, parabolic velocity profiles, divergence-free constraint, momentum balance, channel symmetry, cross-model consistency, and numerical stability
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Explicitly upload data to GPU; check `USE_GPU_OFFLOAD` is defined; verify `omp_get_num_devices() > 0` at runtime for GPU offload code
Applied to files:
src/turbulence_earsm.cpp
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to **/*.{cpp,cc,cxx,h,hpp} : Use `OmpDeviceBuffer` wrapper for GPU buffer management instead of manual allocation
Applied to files:
src/turbulence_earsm.cpp
🧬 Code graph analysis (1)
src/turbulence_earsm.cpp (3)
include/features.hpp (2)
- nu (129-129)
- b_xx (116-120)
include/turbulence_transport.hpp (5)
- nu (120-120)
- nu (120-120)
- k (187-187)
- C_mu (150-150)
- C_mu (150-150)
include/turbulence_baseline.hpp (2)
- C_mu (86-86)
- C_mu (86-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: gpu-tests
- GitHub Check: build-and-test (ubuntu-latest, Debug)
- GitHub Check: build-and-test (ubuntu-latest, Release)
🔇 Additional comments (5)
src/turbulence_earsm.cpp (5)
865-925: LGTM! GS model correctly implemented.The Gatski-Speziale cell kernel is correctly implemented with appropriate regularization and rotation correction. The strain/rotation computation and Re_t blending logic are duplicated from the WJ kernel, which maintains consistency across models.
928-970: LGTM! Pope model correctly implemented.The Pope quadratic cell kernel is correctly implemented with appropriate simplifications compared to the WJ/GS models. The computation is well-structured and consistent with the unified output kernel interface.
1001-1028: Excellent CPU/GPU unification for WJ model.
The modification successfully unifies the CPU and GPU code paths by calling the same `earsm_wj_cell_kernel` from both contexts. This eliminates code duplication and ensures consistent behavior across backends. The proper handling of `total_size` (marked unused in CPU builds) is good defensive programming.
1047-1074: Excellent CPU/GPU unification for GS model.
The GS model now uses the unified `earsm_gs_cell_kernel` for both CPU and GPU execution, ensuring consistency and reducing duplication. The implementation correctly follows the same pattern as the WJ model.
1089-1116: Excellent CPU/GPU unification for Pope model.
The Pope model successfully adopts the unified kernel approach. The CPU fallback is properly implemented with the same logic as the GPU path, ensuring bit-for-bit reproducibility across backends.
ci.sh:
- Fix SC2155: Split local declaration and assignment for orig_dir
- Add -DCMAKE_BUILD_TYPE=Release to ensure consistent build config

test_cpu_gpu_bitwise.cpp:
- Prevent GPU builds from running in --dump-prefix mode
- This avoids accidentally generating reference files with GPU build
- Clear error message directs user to rebuild with CPU

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
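The `--dump-prefix` guard this commit describes could look roughly as follows. The function name, flag handling, and message wording are hypothetical, not the repository's actual `test_cpu_gpu_bitwise.cpp` code:

```cpp
#include <cassert>
#include <string>

// Hypothetical guard: reference dumps must come from a CPU build, so a
// GPU run always compares against a CPU-generated baseline rather than
// accidentally diffing GPU output against itself.
bool allow_dump_mode(bool is_gpu_build, std::string* error) {
    if (is_gpu_build) {
        *error = "--dump-prefix requires a CPU build; rebuild with "
                 "-DUSE_GPU_OFFLOAD=OFF to generate reference files";
        return false;
    }
    return true;
}
```

In a real build, `is_gpu_build` would be derived from the `USE_GPU_OFFLOAD` compile definition rather than passed in at runtime.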
Actionable comments posted: 1
🧹 Nitpick comments (4)
scripts/ci.sh (4)
304-318: Verify that reference files were actually created.
After CPU reference generation succeeds (exit code 0), the script doesn't verify that the expected reference files actually exist. If the test binary succeeds but doesn't write files (e.g., due to a bug or permission issue), the GPU comparison in step 2 will fail with a confusing error.
🔎 Add reference file verification

```diff
 if [ $cpu_exit_code -ne 0 ]; then
     log_failure "$test_name (CPU reference generation failed, exit code: $cpu_exit_code)"
     echo "  Output (last 30 lines):"
     tail -30 "$output_file" | sed 's/^/    /'
     FAILED=$((FAILED + 1))
     FAILED_TESTS="${FAILED_TESTS}\n  - $test_name (CPU ref generation)"
     rm -f "$output_file"
     return 0
 fi
+
+# Verify at least one reference file was created
+local ref_files_count
+ref_files_count=$(find "$ref_dir" -name "${ref_prefix}*" -type f 2>/dev/null | wc -l)
+if [ "$ref_files_count" -eq 0 ]; then
+    log_failure "$test_name (No reference files created with prefix '${ref_prefix}')"
+    FAILED=$((FAILED + 1))
+    FAILED_TESTS="${FAILED_TESTS}\n  - $test_name (no ref files)"
+    rm -f "$output_file"
+    return 0
+fi
```
325-329: Document why OMP_TARGET_OFFLOAD=MANDATORY is needed.
Line 329 uses `OMP_TARGET_OFFLOAD=MANDATORY` but the comment only explains part of the purpose. This environment variable is critical for detecting GPU offload failures - without it, OpenMP silently falls back to CPU execution, making GPU tests meaningless.
🔎 Enhance comment

```diff
 # Run GPU comparison against CPU reference
-# MANDATORY ensures we fail if GPU offload doesn't work (no silent CPU fallback)
+# OMP_TARGET_OFFLOAD=MANDATORY ensures we fail if GPU offload doesn't work.
+# Without this, OpenMP silently falls back to CPU, making GPU testing meaningless.
+# This catches GPU unavailability, driver issues, or misconfigured offload.
 log_info "  Step 2: Running GPU and comparing against CPU reference..."
```
298-308: Consider cleaning reference directory before generating new files.
The script creates the reference directory but doesn't clean existing files. If a previous test run wrote files with different prefixes or the test's output format changed, stale files could cause confusion during debugging.
🔎 Add reference cleanup

```diff
 # Create reference directory
 local ref_dir="${PROJECT_DIR}/build_gpu/cpu_reference"
 mkdir -p "$ref_dir"
+
+# Clean any existing reference files for this test to avoid stale data
+rm -f "${ref_dir}/${ref_prefix}"*

 local output_file="/tmp/test_output_$$.txt"
```
354-362: Inconsistent cmake configuration with ensure_build.
Line 359 doesn't specify `-DCMAKE_BUILD_TYPE`, while `ensure_build` (line 123) hardcodes it to `Release`. This inconsistency means:
- Fallback builds will use CMake's default (often Debug)
- Cross-build tests will always use Release

For consistency, either both paths should specify the build type or both should omit it.
🔎 Align build configurations

```diff
 if [ ! -d "$BUILD_DIR" ]; then
     log_info "Build directory not found. Creating and building..."
     mkdir -p "$BUILD_DIR"
     cd "$BUILD_DIR"
-    cmake .. -DUSE_GPU_OFFLOAD=${USE_GPU} -DBUILD_TESTS=ON
+    cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_GPU_OFFLOAD=${USE_GPU} -DBUILD_TESTS=ON
     make -j"$(nproc)"
     cd "$PROJECT_DIR"
 fi
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
scripts/ci.sh
tests/test_cpu_gpu_bitwise.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_cpu_gpu_bitwise.cpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{sh,slurm,batch}
📄 CodeRabbit inference engine (.cursor/rules/lean-changes.mdc)
Always submit slurm jobs using the embers QOS
Files:
scripts/ci.sh
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Applies to tests/test_physics_validation.cpp : Navier-Stokes solver validation tests must verify: viscous terms, pressure gradient, parabolic velocity profiles, divergence-free constraint, momentum balance, channel symmetry, cross-model consistency, and numerical stability
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: For GPU-related changes, also run `./test_before_ci_gpu.sh` to run the complete GPU CI test suite locally
Applied to files:
scripts/ci.sh
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Run `./test_before_ci.sh` before every push to repository to test Debug build, Release build, and all unit tests
Applied to files:
scripts/ci.sh
📚 Learning: 2025-12-25T20:38:56.549Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursor/rules/lean-debug.mdc:0-0
Timestamp: 2025-12-25T20:38:56.549Z
Learning: For GitHub CI debugging, run `./tools/ci_packet.sh <runid>` via terminal tool.
Applied to files:
scripts/ci.sh
📚 Learning: 2025-12-25T20:38:43.379Z
Learnt from: CR
Repo: comp-physics/cfd-nn PR: 0
File: .cursorrules:0-0
Timestamp: 2025-12-25T20:38:43.379Z
Learning: Create feature branches from latest main; run `./test_before_ci.sh` to verify starting state before development
Applied to files:
scripts/ci.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: gpu-tests
- GitHub Check: build-and-test (ubuntu-latest, Release)
- GitHub Check: build-and-test (ubuntu-latest, Debug)
🔇 Additional comments (5)
scripts/ci.sh (5)
1-25: LGTM! Clear documentation of cross-build behavior.The updated documentation clearly explains the GPU requirement for cross-build tests and the two-step process (CPU reference generation + GPU comparison).
91-100: LGTM! GPU detection logic is sound.The function correctly checks for both nvidia-smi availability and successful execution.
180-234: LGTM! env_prefix parameter correctly implemented.
The optional environment variable prefix is properly handled with a default empty string and correctly passed to the `env` command when present.
425-434: LGTM! Cross-build tests properly gated on GPU availability.
The logic correctly skips cross-build tests when running in CPU-only mode (`--cpu` flag), with clear messaging explaining why GPU is required for CPU vs GPU comparisons.
441-444: LGTM! GPU utilization test uses OMP_TARGET_OFFLOAD=MANDATORY.
The GPU utilization test correctly uses `OMP_TARGET_OFFLOAD=MANDATORY` via the `env_prefix` parameter to ensure GPU execution is enforced, consistent with the cross-build test approach.
Reuse y_wall_flat_ member variable instead of creating a local wall_dist_buf on every update() call. Also fix size_t to int for consistency with rest of codebase. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
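The buffer-reuse fix this commit describes follows a common pattern, sketched below. The class and member names mirror the commit message (`y_wall_flat_`, `update`) but the surrounding code is illustrative, not the repository's actual implementation:

```cpp
#include <cassert>
#include <vector>

// Reuse a member buffer across update() calls instead of constructing a
// fresh std::vector (the old wall_dist_buf) on every call. The buffer is
// resized only when the mesh size changes, so steady-state iterations
// perform no allocation at all.
class MixingLengthModel {
public:
    void update(int n) {
        if (static_cast<int>(y_wall_flat_.size()) != n)
            y_wall_flat_.resize(n);  // allocates only on size change
        // ... fill y_wall_flat_ and compute eddy viscosity ...
    }
    const double* wall_dist() const { return y_wall_flat_.data(); }
    int size() const { return static_cast<int>(y_wall_flat_.size()); }

private:
    std::vector<double> y_wall_flat_;  // persists across calls
};
```

Beyond avoiding allocator traffic, a stable buffer address also plays better with GPU mappings, since a persistent host pointer can stay associated with its device copy.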
PR Type
Enhancement, Tests
Description
Unify CPU/GPU kernel code paths for turbulence models
- Shared cell kernels extracted with `#pragma omp declare target`
- Return valid host pointers in CPU TurbulenceDeviceView
Improve CPU/GPU consistency testing
Rename ci.sh and improve test infrastructure
- run_ci_local.sh → ci.sh (used by both local and CI pipelines)
- ensure_build() function to guarantee both CPU/GPU builds exist

Diagram Walkthrough
File Walkthrough
6 files
- Return valid host pointers in CPU TurbulenceDeviceView
- Replace inlined GPU code with unified mixing_length kernel
- Extract EARSM output computation into unified kernel
- Create unified GEP cell kernel for CPU/GPU
- Unify SST k-ω transport into single cell kernel
- Rename from run_ci_local.sh and improve cross-build testing

2 files
- Add GPU device verification and MIN_EXPECTED_DIFF threshold
- Add GPU device verification and MIN_EXPECTED_DIFF threshold

3 files
- Update script references from run_ci_local.sh to ci.sh
- Update script references from run_ci_local.sh to ci.sh
- Update script references from run_ci_local.sh to ci.sh

Summary by CodeRabbit
Tests
New Features
Refactor
Chores
✏️ Tip: You can customize this high-level summary in your review settings.