-
Notifications
You must be signed in to change notification settings - Fork 127
[Experimental PR] Self-Hosted Container Tests #1088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Experimental PR] Self-Hosted Container Tests #1088
Conversation
|
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds a scheduled/manual GitHub Actions workflow ( Changes
Sequence Diagram(s)sequenceDiagram
participant GH as GitHub Actions
participant Runner as Self-hosted Runner
participant Host as Runner Host
participant Apptainer as Apptainer
participant Container as Container Env
participant MFC as mfc.sh
GH->>Runner: trigger docker-test (cron/dispatch/push/PR)
Runner->>Host: prepare env, set NODE_OPTIONS, enforce timeout
Host->>Apptainer: select image by device/interface & launch
Apptainer->>Container: instantiate with NV/GPU flags, binds, env
Container->>MFC: run dry-run (mfc.sh --dry-run)
MFC-->>Container: dry-run output
Container->>Container: compute gpu_count, gpu_ids, thread counts
Container->>MFC: run tests (mfc.sh with device opts, retries)
MFC-->>Container: emit logs & exit status
Container->>Host: return logs/artifacts
Runner->>GH: print/upload logs, finish job
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (4)📚 Learning: 2025-11-24T21:50:46.909ZApplied to files:
📚 Learning: 2025-11-24T21:50:16.713ZApplied to files:
📚 Learning: 2025-11-24T21:50:46.909ZApplied to files:
📚 Learning: 2025-11-24T21:50:46.909ZApplied to files:
🧬 Code graph analysis (1).github/workflows/phoenix/container.sh (1)
🪛 Shellcheck (0.11.0).github/workflows/phoenix/container.sh[warning] 3-3: job_interface is referenced but not assigned. (SC2154) [warning] 4-4: job_device is referenced but not assigned. (SC2154) [warning] 18-18: job_slug is referenced but not assigned. (SC2154) 🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Nitpicks 🔍
|
| apptainer build mfc:latest-\$job_device.sif docker://sbryngelson/mfc:latest-mp-\$job_device | ||
| elif [ "$job_interface" = "acc" ]; then | ||
| apptainer build mfc:latest-\$job_device.sif docker://sbryngelson/mfc:latest-\$job_device | ||
| fi | ||
|
|
||
| CONTAINER="mfc:latest-\$job_device.sif" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Suggestion: Host-side variables are escaped (e.g. \$job_device) so they are not expanded by the shell on the host; this produces literal strings like "$job_device" for built image names and the CONTAINER variable, causing wrong container filenames and failed builds or execs. Remove the backslashes so variables expand on the host and quote the resulting values to avoid word-splitting. [logic error]
Severity Level: Minor
| apptainer build mfc:latest-\$job_device.sif docker://sbryngelson/mfc:latest-mp-\$job_device | |
| elif [ "$job_interface" = "acc" ]; then | |
| apptainer build mfc:latest-\$job_device.sif docker://sbryngelson/mfc:latest-\$job_device | |
| fi | |
| CONTAINER="mfc:latest-\$job_device.sif" | |
| apptainer build "mfc:latest-$job_device.sif" "docker://sbryngelson/mfc:latest-mp-$job_device" | |
| elif [ "$job_interface" = "acc" ]; then | |
| apptainer build "mfc:latest-$job_device.sif" "docker://sbryngelson/mfc:latest-$job_device" | |
| fi | |
| CONTAINER="mfc:latest-$job_device.sif" |
Why it matters? ⭐
This is a real logic bug: those backslashes prevent the host shell from expanding $job_device when running apptainer build and when setting CONTAINER, producing literal names like "mfc:latest-$job_device.sif" instead of the intended concrete filenames. Unescaping and quoting the expansions fixes the filenames and avoids word-splitting.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** .github/workflows/phoenix/container.sh
**Line:** 4:9
**Comment:**
*Logic Error: Host-side variables are escaped (e.g. `\$job_device`) so they are not expanded by the shell on the host; this produces literal strings like "$job_device" for built image names and the `CONTAINER` variable, causing wrong container filenames and failed builds or execs. Remove the backslashes so variables expand on the host and quote the resulting values to avoid word-splitting.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| [ "\$job_device" = "gpu" ] && NV_FLAG="--nv" | ||
|
|
||
| apptainer exec \$NV_FLAG --fakeroot --writable-tmpfs \ | ||
| --bind "\$SCRATCH_DIR":/scratch \ | ||
| --env job_slug="\$job_slug" \ | ||
| --env job_device="\$job_device" \ | ||
| "\$CONTAINER" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ Suggestion: The apptainer exec invocation and its arguments contain escaped variables (e.g. \$SCRATCH_DIR, "\$CONTAINER", \$NV_FLAG) so the host will pass literal tokens rather than expanded values; unescape these so the runtime receives actual paths, env values and the container name. [logic error]
Severity Level: Minor
| [ "\$job_device" = "gpu" ] && NV_FLAG="--nv" | |
| apptainer exec \$NV_FLAG --fakeroot --writable-tmpfs \ | |
| --bind "\$SCRATCH_DIR":/scratch \ | |
| --env job_slug="\$job_slug" \ | |
| --env job_device="\$job_device" \ | |
| "\$CONTAINER" \ | |
| [ "$job_device" = "gpu" ] && NV_FLAG="--nv" | |
| apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ | |
| --bind "$SCRATCH_DIR":/scratch \ | |
| --env job_slug="$job_slug" \ | |
| --env job_device="$job_device" \ | |
| "$CONTAINER" \ |
Why it matters? ⭐
Correct: the exec arguments (NV_FLAG, --bind, --env, container name) must be expanded by the host before calling apptainer. The current escaping passes literal tokens like "$SCRATCH_DIR" or "$CONTAINER" which is wrong. Unescaping these fixes the environment and path propagation into the container.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** .github/workflows/phoenix/container.sh
**Line:** 12:18
**Comment:**
*Logic Error: The apptainer exec invocation and its arguments contain escaped variables (e.g. `\$SCRATCH_DIR`, `"\$CONTAINER"`, `\$NV_FLAG`) so the host will pass literal tokens rather than expanded values; unescape these so the runtime receives actual paths, env values and the container name.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished reviewing your PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds weekly automated testing of MFC's Docker containers (CPU and GPU variants) on the Phoenix cluster at Georgia Tech. The testing uses Apptainer to pull and run the latest published Docker images, executing the full MFC test suite within each container to verify the containerized builds work correctly.
- Adds a new GitHub Actions workflow (
docker-test.yml) that runs weekly on Fridays - Creates a new script (
container.sh) that pulls Docker images via Apptainer and runs tests inside containers - Tests three configurations: GPU with OpenACC, GPU with OpenMP, and CPU
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
.github/workflows/docker-test.yml |
Defines the weekly workflow schedule and matrix for testing CPU and GPU containers with different parallelization interfaces |
.github/workflows/phoenix/container.sh |
Script that pulls Docker containers via Apptainer and executes MFC test suite inside them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues found across 2 files
Prompt for AI agents (all 3 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".github/workflows/docker-test.yml">
<violation number="1" location=".github/workflows/docker-test.yml:45">
P2: Log filename mismatch: The workflow expects `test-{device}-{interface}.out` but submit.sh generates `container-{device}-{interface}.out` based on the script basename. This will cause the 'Print Logs' step to fail with 'file not found'.</violation>
</file>
<file name=".github/workflows/phoenix/container.sh">
<violation number="1" location=".github/workflows/phoenix/container.sh:4">
P1: Inconsistent variable escaping: `$job_interface` is unescaped (expands normally) but `\$job_device` is escaped (won't expand, remains literal). Remove the backslashes to allow `$job_device` to expand properly.</violation>
<violation number="2" location=".github/workflows/phoenix/container.sh:44">
P1: Orphan `EOT` heredoc terminator without a corresponding heredoc start. This will cause a shell error when the script executes. Either remove this line or add the missing heredoc opening statement.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (6)
.github/workflows/phoenix/container.sh (5)
43-44: Remove the strayEOTtoken—it will cause a "command not found" error.Line 44 contains an orphan
EOTtoken with no corresponding here-document. The bash-ccommand string properly ends on line 43 with a closing single quote. TheEOTwill be executed as a shell command at runtime, causing the script to fail with "command not found."Apply this diff to remove the extraneous line:
./mfc.sh test --max-attempts 3 -a -j $n_test_threads $device_opts -- -c phoenix ' -EOT
1-2: Addset -efor fail-fast error handling.Shell scripts should exit immediately on command failure rather than continue silently. This ensures that errors in
apptainerinvocations or test execution are caught early instead of masking failures.Apply this diff to add error handling:
#!/bin/bash +set -e if [ "$job_interface" = "omp" ]; then
3-7: Add missing CPU case for container build.The script handles only GPU interfaces (
ompandacc), leaving the CPU case undefined. Whenjob_device=cpuandjob_interface=none(as configured in the workflow matrix on lines 28-29), the script falls through to theelseblock and attempts to build with undefined values, causing a build failure.Apply this diff to handle the CPU case:
if [ "$job_interface" = "omp" ]; then apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-mp-$job_device elif [ "$job_interface" = "acc" ]; then apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-$job_device +elif [ "$job_device" = "cpu" ]; then + apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-$job_device fi
9-18: Fix variable escaping in apptainer invocation—variables must expand on host.The variables on lines 14-18 are escaped with backslashes (
\$NV_FLAG,\$SCRATCH_DIR, etc.), which prevents the shell from expanding them. This means apptainer receives literal strings like"$CONTAINER"instead of the actual container name, causing container lookup failures and incorrect environment bindings.Variables set on the host (like
$CONTAINERon line 9) must be unescaped so they expand before being passed to apptainer.Apply this diff to fix the escaping:
CONTAINER="mfc:latest-$job_device.sif" NV_FLAG="" -[ "\$job_device" = "gpu" ] && NV_FLAG="--nv" +[ "$job_device" = "gpu" ] && NV_FLAG="--nv" -apptainer exec \$NV_FLAG --fakeroot --writable-tmpfs \ - --bind "\$SCRATCH_DIR":/scratch \ - --env job_slug="\$job_slug" \ - --env job_device="\$job_device" \ - "\$CONTAINER" \ +apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ + --bind "$SCRATCH_DIR":/scratch \ + --env job_slug="$job_slug" \ + --env job_device="$job_device" \ + "$CONTAINER" \
15-15: Define theSCRATCH_DIRenvironment variable before running this script.The
--bind "\$SCRATCH_DIR":/scratchmount on line 15 will fail ifSCRATCH_DIRis undefined, resulting in an invalid apptainer bind argument--bind "":/scratch. This variable is not defined in the script or passed via the SLURM environment. Either define it explicitly (e.g., from$TMPDIRor a hardcoded Phoenix scratch path) or add a fallback:NV_FLAG="" [ "\$job_device" = "gpu" ] && NV_FLAG="--nv" +SCRATCH_DIR="${SCRATCH_DIR:-.}" # Use SCRATCH_DIR if set, else current directory + apptainer exec \$NV_FLAG --fakeroot --writable-tmpfs \.github/workflows/docker-test.yml (1)
45-47: Fix log filename mismatch and add wait for async job completion.Two issues prevent reliable log retrieval:
Filename mismatch (critical): The "Print Logs" step expects
test-{device}-{interface}.out, butsubmit.shgenerates log files using the script basename, producingcontainer-{device}-{interface}.out. This will cause "file not found" errors.Race condition (major): The
submit.shscript submits a job to the cluster scheduler and returns immediately, but test execution happens asynchronously on the cluster. The "Print Logs" step runs without waiting, so the log file may not exist yet.Apply this diff to fix the filename and add explicit waiting:
- name: Pull & Test if: matrix.lbl == 'gt' run: bash .github/workflows/phoenix/submit.sh .github/workflows/phoenix/container.sh ${{ matrix.device }} ${{ matrix.interface }} + - name: Wait for job completion + if: matrix.lbl == 'gt' + run: | + timeout=3600 # 1 hour timeout + elapsed=0 + while [ $elapsed -lt $timeout ]; do + if [ -f "container-${{ matrix.device }}-${{ matrix.interface }}.out" ]; then + echo "Log file found" + exit 0 + fi + sleep 10 + elapsed=$((elapsed + 10)) + done + echo "Timeout waiting for job completion" + exit 1 + - name: Print Logs if: always() - run: cat test-${{ matrix.device }}-${{ matrix.interface }}.out + run: cat container-${{ matrix.device }}-${{ matrix.interface }}.out || echo "Log file not found"
🧹 Nitpick comments (2)
.github/workflows/phoenix/container.sh (1)
39-39: Modernize command substitution syntax from backticks to$(…).Line 39 uses deprecated backtick syntax for command substitution, which is inconsistent with the
$(...)syntax already used on lines 36-37. Backticks are harder to nest and read; the modern form is preferred in all modern bash style guides.Apply this diff:
- n_test_threads=`expr $gpu_count \* 2` + n_test_threads=$(expr $gpu_count \* 2)Alternatively, use arithmetic expansion for clearer intent:
- n_test_threads=`expr $gpu_count \* 2` + n_test_threads=$((gpu_count * 2)).github/workflows/docker-test.yml (1)
41-43: Remove redundant conditional—all matrix entries havelbl: 'gt'.The
if: matrix.lbl == 'gt'conditional on line 42 is redundant because all matrix configurations (commented-out and active) specifylbl: 'gt'. This conditional logic adds unnecessary complexity without providing flexibility. Either remove the conditional if all jobs should always run this step, or clarify the intent if other labels will be added in the future.Simplify by removing the conditional:
- name: Pull & Test - if: matrix.lbl == 'gt' run: bash .github/workflows/phoenix/submit.sh .github/workflows/phoenix/container.sh ${{ matrix.device }} ${{ matrix.interface }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
.github/workflows/bench.yml(0 hunks).github/workflows/cleanliness.yml(0 hunks).github/workflows/coverage.yml(0 hunks).github/workflows/deploy-tap.yml(0 hunks).github/workflows/docker-test.yml(1 hunks).github/workflows/docker.yml(0 hunks).github/workflows/docs.yml(0 hunks).github/workflows/formatting.yml(0 hunks).github/workflows/frontier/bench.sh(0 hunks).github/workflows/frontier/build.sh(0 hunks).github/workflows/frontier/submit-bench.sh(0 hunks).github/workflows/frontier/submit.sh(0 hunks).github/workflows/frontier/test.sh(0 hunks).github/workflows/homebrew.yml(0 hunks).github/workflows/line-count.yml(0 hunks).github/workflows/lint-source.yml(0 hunks).github/workflows/lint-toolchain.yml(0 hunks).github/workflows/phoenix/container.sh(1 hunks).github/workflows/pmd.yml(0 hunks).github/workflows/spelling.yml(0 hunks).github/workflows/test.yml(0 hunks)
💤 Files with no reviewable changes (19)
- .github/workflows/lint-source.yml
- .github/workflows/deploy-tap.yml
- .github/workflows/coverage.yml
- .github/workflows/cleanliness.yml
- .github/workflows/formatting.yml
- .github/workflows/lint-toolchain.yml
- .github/workflows/frontier/bench.sh
- .github/workflows/test.yml
- .github/workflows/frontier/submit-bench.sh
- .github/workflows/docker.yml
- .github/workflows/frontier/test.sh
- .github/workflows/line-count.yml
- .github/workflows/bench.yml
- .github/workflows/frontier/submit.sh
- .github/workflows/frontier/build.sh
- .github/workflows/docs.yml
- .github/workflows/homebrew.yml
- .github/workflows/spelling.yml
- .github/workflows/pmd.yml
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Draft a step-by-step plan before making changes; build after each step using `./mfc.sh build -t pre_process simulation -j $(nproc)`
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: After each successful build step, run focused tests using `./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336` instead of running all tests
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Applied to files:
.github/workflows/phoenix/container.sh
🪛 actionlint (1.7.9)
.github/workflows/docker-test.yml
26-26: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🪛 Shellcheck (0.11.0)
.github/workflows/phoenix/container.sh
[warning] 3-3: job_interface is referenced but not assigned.
(SC2154)
[warning] 4-4: job_device is referenced but not assigned.
(SC2154)
[warning] 9-9: CONTAINER appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: This expression is constant. Did you forget the $ on a variable?
(SC2050)
[warning] 12-12: NV_FLAG appears unused. Verify use (or export if used externally).
(SC2034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Georgia Tech | Phoenix (cpu)
| NODE_OPTIONS: ${{ matrix.lbl == 'gt' && '--max-old-space-size=2048' || '' }} | ||
| ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION: node16 | ||
| ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upgrade Node.js to a supported LTS version—Node 16 is end-of-life.
The workflow forces Node.js 16 and explicitly allows running insecure node versions. Node 16 reached end-of-life in September 2023 and contains known security vulnerabilities. Modern GitHub Actions runners include Node.js 20 LTS or later. Remove these version overrides and allow the runner's default Node.js to be used, or explicitly specify a current LTS version like node20.
Apply this diff to remove the insecure overrides:
env:
NODE_OPTIONS: ${{ matrix.lbl == 'gt' && '--max-old-space-size=2048' || '' }}
- ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION: node16
- ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: trueIf you need to enforce a specific Node.js version, use setup-node@v4 action in a step instead:
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'🤖 Prompt for AI Agents
.github/workflows/docker-test.yml lines 34-36: the workflow forces Node.js 16
and allows insecure Node versions; remove the unsafe overrides and either let
the runner default Node be used or explicitly set a current LTS via
setup-node@v4. Delete the ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION and
ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION lines, and if you must pin a version add
a setup-node@v4 step with node-version: '20' (or another supported LTS) instead
of forcing the runner environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
.github/workflows/phoenix/container.sh (2)
19-27: Passjob_interfaceto the container via--envso the GPU interface detection works.Inside the bash
-ccommand (single-quoted, lines 19–43), the script references$job_interfaceon line 24 to choose between OpenMP (mp) and OpenACC (acc). However,job_interfaceis not exported to the container environment (compare withjob_deviceandjob_slugon lines 16–17). Since the bash-cuses single quotes, shell variables won't expand from the outer scope—only environment variables will be accessible. Without--env job_interface="$job_interface", the conditional on line 24 will fail or behave unexpectedly.apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ --bind "$SCRATCH_DIR":/scratch \ --env job_slug="$job_slug" \ --env job_device="$job_device" \ + --env job_interface="$job_interface" \ "$CONTAINER" \
44-44: Remove the strayEOTline—it will cause a "command not found" error.The bash
-ccommand ends with the closing single quote on line 43, so the orphanedEOTtoken on line 44 will be executed as a shell command at runtime and fail../mfc.sh test --max-attempts 3 -a -j $n_test_threads $device_opts -- -c phoenix ' -EOT
🧹 Nitpick comments (3)
.github/workflows/phoenix/container.sh (3)
39-39: Replace deprecated backticks with modern$()command substitution syntax.The use of backticks for command substitution is outdated in modern Bash. Adopt the
$()syntax for consistency and readability.- n_test_threads=`expr $gpu_count \* 2` + n_test_threads=$(expr $gpu_count \* 2)
1-2: Addset -eto fail fast on errors.The script lacks error handling. Without
set -e, ifapptainer build,apptainer exec, or any MFC test command fails, the script will continue silently and the workflow may report false success. Compare with other Phoenix scripts (e.g.,submit.sh) which useset -eto exit immediately on errors.#!/bin/bash +set -e
3-7: Consider adding explicit handling for thecputest case.The current container image selection (lines 3–7) only explicitly handles the
ompandaccGPU interfaces. If acpujob runs (as indicated in the PR summary's matrix strategy), theelsebranch will attempt to buildmfc:latest-cpu. Verify that this image exists in the Docker registry, or add an explicitcpucase for clarity. For example:if [ "$job_interface" = "omp" ]; then apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-mp-$job_device elif [ "$job_interface" = "acc" ]; then apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-$job_device +elif [ "$job_device" = "cpu" ]; then + apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-cpu fi
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/phoenix/container.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Draft a step-by-step plan before making changes; build after each step using `./mfc.sh build -t pre_process simulation -j $(nproc)`
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: After each successful build step, run focused tests using `./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336` instead of running all tests
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Applied to files:
.github/workflows/phoenix/container.sh
🪛 Shellcheck (0.11.0)
.github/workflows/phoenix/container.sh
[warning] 3-3: job_interface is referenced but not assigned.
(SC2154)
[warning] 4-4: job_device is referenced but not assigned.
(SC2154)
[warning] 16-16: job_slug is referenced but not assigned.
(SC2154)
| apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ | ||
| --bind "$SCRATCH_DIR":/scratch \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and read the container.sh file
find . -name "container.sh" -type f | head -20Repository: MFlowCode/MFC
Length of output: 97
🏁 Script executed:
# List the phoenix workflows directory
fd . .github/workflows/phoenix/ 2>/dev/null | head -20Repository: MFlowCode/MFC
Length of output: 242
🏁 Script executed:
# Search for SCRATCH_DIR references
rg "SCRATCH_DIR" --no-headingRepository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
cat -n .github/workflows/phoenix/container.shRepository: MFlowCode/MFC
Length of output: 1522
🏁 Script executed:
# Search for calls to container.sh
rg "container\.sh" --no-heading -A 5 -B 5Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# List all YAML files in the workflows directory
fd "\.ya?ml$" .github/workflows/Repository: MFlowCode/MFC
Length of output: 90
🏁 Script executed:
# Find all YAML workflow files
find .github/workflows -name "*.yml" -o -name "*.yaml"Repository: MFlowCode/MFC
Length of output: 90
🏁 Script executed:
# Search for where container.sh is called
rg "container\.sh" .github/workflows/ -lRepository: MFlowCode/MFC
Length of output: 90
🏁 Script executed:
# Check the phoenix directory for workflow files
cat .github/workflows/phoenix/*.yml 2>/dev/null || echo "No .yml files found"
cat .github/workflows/phoenix/*.yaml 2>/dev/null || echo "No .yaml files found"Repository: MFlowCode/MFC
Length of output: 97
🏁 Script executed:
cat -n .github/workflows/docker-test.ymlRepository: MFlowCode/MFC
Length of output: 1852
🏁 Script executed:
# Search for any file that might call the container.sh script
rg "container\.sh" --no-heading -B 10 -A 10Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
cat -n .github/workflows/phoenix/submit.shRepository: MFlowCode/MFC
Length of output: 1943
🏁 Script executed:
# Also check for any environment variable definitions or exports
rg "SCRATCH_DIR\s*=" .github/workflows/ -B 5 -A 5Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Check if mfc.sh defines or sets SCRATCH_DIR
rg "SCRATCH_DIR" . --no-heading -B 5 -A 5Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Look at mfc.sh file, particularly the load command
cat -n mfc.sh | head -100Repository: MFlowCode/MFC
Length of output: 2473
🏁 Script executed:
# Search for SCRATCH_DIR in mfc.sh
rg "SCRATCH_DIR" mfc.sh -B 5 -A 5Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Check the modules.sh file which is called by "mfc.sh load"
cat -n toolchain/bootstrap/modules.sh | head -150Repository: MFlowCode/MFC
Length of output: 5022
🏁 Script executed:
# Search for SCRATCH in the entire repo more broadly
rg "SCRATCH" . --no-heading -B 3 -A 3Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Check if there are any environment setup scripts
fd "\.sh$" toolchain/ | head -20Repository: MFlowCode/MFC
Length of output: 326
🏁 Script executed:
# Check the toolchain/modules file which contains environment variables
cat -n toolchain/modules | grep -A 10 -B 5 "^p"Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Also search for SCRATCH in toolchain
rg "SCRATCH" toolchain/ --no-heading -B 3 -A 3Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Let's look at the full toolchain/modules file to understand the format
cat toolchain/modulesRepository: MFlowCode/MFC
Length of output: 3098
🏁 Script executed:
# Search for SLURM-related variables or SCRATCH references in comments
rg "SLURM|SCRATCH|TMPDIR" . --no-heading -B 2 -A 2Repository: MFlowCode/MFC
Length of output: 3570
🏁 Script executed:
# Check if there's a README or documentation that mentions SCRATCH_DIR
fd "README|readme|SETUP|setup" . -type f | head -10Repository: MFlowCode/MFC
Length of output: 227
🏁 Script executed:
# Look for any CI/CD documentation or workflow notes
find . -name "*.md" -type f | xargs rg "SCRATCH_DIR|container\.sh" 2>/dev/null || trueRepository: MFlowCode/MFC
Length of output: 39
Ensure SCRATCH_DIR environment variable is defined before use.
The script binds $SCRATCH_DIR to /scratch inside the container, but this variable is never defined in the codebase. It must be set by the self-hosted runner environment or the workflow. If not provided, the bind mount will fail. Define a fallback (e.g., using TMPDIR or a hardcoded path) to prevent runtime failures.
🤖 Prompt for AI Agents
.github/workflows/phoenix/container.sh around lines 14-15: the script uses
$SCRATCH_DIR in the apptainer bind but never guarantees it exists; set a safe
fallback before the apptainer exec (for example default SCRATCH_DIR to
${TMPDIR:-/tmp} if unset) and ensure the variable is exported/validated (exists
and is writable) so the bind mount cannot fail at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
.github/workflows/phoenix/container.sh (3)
43-44: Remove orphanEOTterminator.Line 44 contains a stray
EOTtoken with no matching heredoc start (<<EOT). This will be executed as a command and fail with a "command not found" error. The bash -c command correctly ends at line 43 with the closing single quote.Apply this diff:
./mfc.sh test --max-attempts 3 -a -j $n_test_threads $device_opts -- -c phoenix ' -EOT
14-18: Exportjob_interfaceas environment variable to container.
job_interfaceis used at line 24 inside the container, but it is not passed via--env. Since the bash -c command is single-quoted,$job_interfacewill not expand and the conditional will fail, breaking GPU interface detection.Apply this diff:
apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ --bind "$SCRATCH_DIR":/scratch \ --env job_slug="$job_slug" \ --env job_device="$job_device" \ + --env job_interface="$job_interface" \ "$CONTAINER" \
14-15: Define or validateSCRATCH_DIRbefore use.
SCRATCH_DIRis used in the bind mount but is never defined in this script or in the codebase. If the Phoenix cluster environment does not provide this variable, the apptainer bind will fail at runtime. Either define a fallback (e.g., usingTMPDIRor a hardcoded path) or add validation to ensure it exists.Apply this diff to add a safe fallback:
#!/bin/bash set -e +# Use SCRATCH_DIR if set, otherwise default to a safe temporary location +SCRATCH_DIR="${SCRATCH_DIR:-${TMPDIR:-/tmp}}" + if [ "$job_interface" = "omp" ]; then
🧹 Nitpick comments (2)
.github/workflows/phoenix/container.sh (2)
1-2: Add error handling to prevent silent failures.Without
set -e, failed apptainer or mfc.sh commands may go unnoticed, allowing the workflow to continue despite errors. This is especially critical for critical infrastructure like CI/CD pipelines.Apply this diff:
#!/bin/bash +set -e
39-39: Use modern$()syntax instead of deprecated backticks.Line 39 uses backticks for command substitution, which is deprecated. Lines 36–37 already use the modern
$(...)syntax; for consistency, update this line as well.Apply this diff:
- n_test_threads=`expr $gpu_count \* 2` + n_test_threads=$((gpu_count * 2))Note:
$((...))is preferred overexprfor arithmetic and is more efficient.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/phoenix/container.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Draft a step-by-step plan before making changes; build after each step using `./mfc.sh build -t pre_process simulation -j $(nproc)`
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: After each successful build step, run focused tests using `./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336` instead of running all tests
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Applied to files:
.github/workflows/phoenix/container.sh
🪛 Shellcheck (0.11.0)
.github/workflows/phoenix/container.sh
[warning] 3-3: job_interface is referenced but not assigned.
(SC2154)
[warning] 4-4: job_device is referenced but not assigned.
(SC2154)
[warning] 16-16: job_slug is referenced but not assigned.
(SC2154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Georgia Tech | Phoenix (cpu)
🔇 Additional comments (2)
.github/workflows/phoenix/container.sh (2)
3-7: Container image selection logic is correct.The conditional build correctly differentiates between omp and acc interfaces, pulling appropriate images from the registry.
9-12: GPU flag and container name setup is correct.The NV_FLAG is properly set only for GPU jobs, and CONTAINER name matches the built image.
Added a new matrix configuration for GPU testing.
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
.github/workflows/phoenix/container.sh (3)
44-46: Critical: Remove the orphanEOTline.Line 45 closes the bash -c command with a single quote
'. Line 46 contains a strayEOTtoken with no matching heredoc, which will cause a "command not found: EOT" runtime error when the script executes.Remove line 46:
./mfc.sh test --max-attempts 3 -a -j $n_test_threads $device_opts -- -c phoenix ' -EOT \ No newline at end of file
14-17: Major: Validate thatSCRATCH_DIRis defined before use.Line 17 binds
$SCRATCH_DIRinto the container as/scratch, but the variable is never defined or validated in this script. If the self-hosted Phoenix runner does not exportSCRATCH_DIR, the bind mount will fail or bind an incorrect path, causing the container execution to fail silently or incorrectly.Add validation at the start of the script to ensure
SCRATCH_DIRis set, or define a safe fallback:#!/bin/bash +set -e + +# Ensure SCRATCH_DIR is defined; fall back to TMPDIR or /tmp if not set +export SCRATCH_DIR="${SCRATCH_DIR:-${TMPDIR:-/tmp}}"
16-29: Critical:job_interfaceis not passed to the container but is used inside it.Lines 26 and 28 reference
$job_interfaceinside the bash -c command block, but the variable is not exported via--env(onlyjob_slugandjob_deviceare on lines 18–19). Inside the container,$job_interfacewill be empty or undefined, breaking GPU interface detection logic.Add
job_interfaceto the environment variables:apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ --bind "$SCRATCH_DIR":/scratch \ --env job_slug="$job_slug" \ --env job_device="$job_device" \ + --env job_interface="$job_interface" \ "$CONTAINER" \.github/workflows/docker-test.yml (1)
34-36: Major: Remove Node.js 16 EOL configuration and explicitly allow insecure versions.Node.js 16 reached end-of-life in September 2023 and contains known security vulnerabilities. Forcing the runner to use Node 16 and explicitly allowing insecure versions (lines 35–36) introduces unnecessary security risk. Modern GitHub Actions runners include Node 20 LTS or later by default.
Remove the version override lines and let the runner's default Node.js be used, or explicitly pin a current LTS version:
env: NODE_OPTIONS: ${{ matrix.lbl == 'gt' && '--max-old-space-size=2048' || '' }} - ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION: node16 - ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true steps: - name: CloneIf you must enforce a specific Node.js version, add a
setup-node@v4step in the workflow instead of forcing it via environment variables.
🧹 Nitpick comments (2)
.github/workflows/phoenix/container.sh (1)
1-1: Use modern$(...)command substitution syntax and addset -efor safety.
- Line 41 uses backticks for command substitution, which is deprecated. Lines 38–39 already use the modern
$(...)syntax—keep it consistent.- The script lacks
set -e, which means failures in apptainer or MFC test commands may go unnoticed, allowing the workflow to proceed despite errors.Apply this diff to add error handling and modernize syntax:
#!/bin/bash +set -e if [ "$job_interface" = "omp" ]; then apptainer build mfc:latest-$job_device.sif docker://sbryngelson/mfc:latest-mp-$job_device @@ -38,7 +40,7 @@ if [ "$job_device" = "gpu" ]; then gpu_count=$(nvidia-smi -L | wc -l) # number of GPUs on node gpu_ids=$(seq -s " " 0 $(($gpu_count-1))) # 0,1,2,...,gpu_count-1 device_opts="-g $gpu_ids" - n_test_threads=`expr $gpu_count \* 2` + n_test_threads=$(expr $gpu_count \* 2) fiAlso applies to: 41-41
.github/workflows/docker-test.yml (1)
42-42: Optional: Simplify redundant matrix label condition.Line 42 checks
if: matrix.lbl == 'gt', but all active and commented-out matrix entries havelbl: 'gt'(lines 18, 23, 27). The conditional adds complexity without filtering. If all jobs should always run this step, remove the condition. If this condition is intentional for future extensibility with other labels, consider adding a clarifying comment.- name: Pull & Test - if: matrix.lbl == 'gt' run: bash .github/workflows/phoenix/submit.sh .github/workflows/phoenix/container.sh ${{ matrix.device }} ${{ matrix.interface }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/docker-test.yml(1 hunks).github/workflows/phoenix/container.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Draft a step-by-step plan before making changes; build after each step using `./mfc.sh build -t pre_process simulation -j $(nproc)`
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: After each successful build step, run focused tests using `./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336` instead of running all tests
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Applied to files:
.github/workflows/phoenix/container.sh
🪛 actionlint (1.7.9)
.github/workflows/docker-test.yml
18-18: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🪛 Shellcheck (0.11.0)
.github/workflows/phoenix/container.sh
[warning] 3-3: job_interface is referenced but not assigned.
(SC2154)
[warning] 4-4: job_device is referenced but not assigned.
(SC2154)
[warning] 18-18: job_slug is referenced but not assigned.
(SC2154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
|
Almost done with the implementation, but it requires a little bit of debugging. Overall, the concept is shown to be feasible as far as I am concerned. Will deprecate for now and reinstate if needed. |
Comment out the unset LD_PRELOAD command in container.sh
[ 16%] Building Fortran object CMakeFiles/syscheck_lib.dir/modules/syscheck/m_thermochem.f90.o
[ 33%] Building Fortran object CMakeFiles/syscheck_lib.dir/fypp/syscheck/syscheck.fpp.f90.o
nvfortran-Fatal-nvfortran-Fatal-The value of CUDAROOT is not a directory: /usr/local/pace-apps/spack/packages/linux-rhel9-x86_64_v3/gcc-11.3.1/cuda-12.1.1-ebglvvqo7uhjvhvff2qlsjtjd54louafThe value of CUDAROOT is not a directory: /usr/local/pace-apps/spack/packages/linux-rhel9-x86_64_v3/gcc-11.3.1/cuda-12.1.1-ebglvvqo7uhjvhvff2qlsjtjd54louaf
nvfortran-Error-nvfortran-Error-A CUDA toolkit matching the current driver version (12.9) or a supported older version (11.0 or 12.3) was not installed with this HPC SDK.A CUDA toolkit matching the current driver version (12.9) or a supported older version (11.0 or 12.3) was not installed with this HPC SDK.
gmake[3]: *** [CMakeFiles/syscheck_lib.dir/build.make:80: CMakeFiles/syscheck_lib.dir/modules/syscheck/m_thermochem.f90.o] Error 2
gmake[3]: *** Waiting for unfinished jobs....
gmake[3]: *** [CMakeFiles/syscheck_lib.dir/build.make:93: CMakeFiles/syscheck_lib.dir/fypp/syscheck/syscheck.fpp.f90.o] Error 2
gmake[2]: *** [CMakeFiles/Makefile2:111: CMakeFiles/syscheck_lib.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:92: CMakeFiles/syscheck.dir/rule] Error 2
gmake: *** [Makefile:170: syscheck] Error 2
Error: Failed to build the syscheck target.
Terminated |
Comment out the loading of mfc.sh in submit.sh
Updated Docker test workflow to include new interface configurations.
|
CodeAnt AI is running Incremental review Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
.github/workflows/test.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level Suggestion
The PR should be revised to restore the many CI workflows that were deleted, as their removal severely damages test coverage and repository automation. Only the new container testing workflow should be added, not replace the existing infrastructure. [High-level, importance: 10]
Solution Walkthrough:
Before:
# .github/workflows/
- bench.yml
- cleanliness.yml
- coverage.yml
- docker.yml
- docs.yml
- test.yml
- ... (12 other workflow files deleted)
+ docker-test.yml
After:
# .github/workflows/
- bench.yml (reinstated)
- cleanliness.yml (reinstated)
- coverage.yml (reinstated)
- docker.yml (reinstated)
- docs.yml (reinstated)
- test.yml (reinstated)
- ... (all other workflows reinstated)
+ docker-test.yml (added)
| - lbl: 'gt' | ||
| cluster_name: 'Georgia Tech | Phoenix' | ||
| device: 'gpu' | ||
| interface: 'mp' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: In .github/workflows/docker-test.yml, change the interface for the GPU test case from 'mp' to 'omp'. This ensures it matches the value expected by the .github/workflows/phoenix/container.sh script. [possible issue, importance: 7]
| - lbl: 'gt' | |
| cluster_name: 'Georgia Tech | Phoenix' | |
| device: 'gpu' | |
| interface: 'mp' | |
| - lbl: 'gt' | |
| cluster_name: 'Georgia Tech | Phoenix' | |
| device: 'gpu' | |
| interface: 'omp' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
.github/workflows/phoenix/container.sh (1)
16-20: Passjob_interfaceto the container environment.The script references
$job_interfaceinside the container (lines 26-29) but only passesjob_slugandjob_devicevia--env. This will cause the interface-specific build options to fail silently.Apply this diff:
apptainer exec $NV_FLAG --fakeroot --writable-tmpfs \ --bind "$SCRATCH_DIR":/scratch \ --env job_slug="$job_slug" \ --env job_device="$job_device" \ + --env job_interface="$job_interface" \ "$CONTAINER" \Based on past review comments flagging this issue.
.github/workflows/docker-test.yml (1)
35-36: Node.js 16 end-of-life security risk—extensively flagged in previous reviews.These lines force Node.js 16 (EOL September 2023) and allow insecure versions, creating known security vulnerabilities. This issue has been thoroughly documented by multiple reviewers with detailed remediation steps.
See prior review comments from coderabbitai and qodo-code-review for complete analysis and fix suggestions.
🧹 Nitpick comments (1)
.github/workflows/phoenix/container.sh (1)
14-14: Clarify the intent of the LD_PRELOAD comment.The comment suggests
LD_PRELOADshould be unset (per PR objectives, to suppress xalt warnings), but nounsetcommand is present.If the intent is to unset the variable, apply this diff:
-# unset LD_PRELOAD +unset LD_PRELOAD || trueIf this is intentionally commented out for debugging, consider adding a note explaining why.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/docker-test.yml(1 hunks).github/workflows/phoenix/container.sh(1 hunks).github/workflows/phoenix/submit.sh(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Draft a step-by-step plan before making changes; build after each step using `./mfc.sh build -t pre_process simulation -j $(nproc)`
Applied to files:
.github/workflows/phoenix/submit.sh.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to src/simulation/**/*.{fpp,f90} : Ensure GPU code compiles with Cray ftn, NVIDIA nvfortran, GNU gfortran, and Intel ifx/ifort compilers
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: After each successful build step, run focused tests using `./mfc.sh test -j $(nproc) -f EA8FA07E -t 9E2CA336` instead of running all tests
Applied to files:
.github/workflows/phoenix/container.sh
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability
Applied to files:
.github/workflows/phoenix/container.sh
🪛 actionlint (1.7.9)
.github/workflows/docker-test.yml
18-18: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
22-22: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
26-26: label "gt" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
🪛 Shellcheck (0.11.0)
.github/workflows/phoenix/container.sh
[warning] 3-3: job_interface is referenced but not assigned.
(SC2154)
[warning] 4-4: job_device is referenced but not assigned.
(SC2154)
[warning] 18-18: job_slug is referenced but not assigned.
(SC2154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Georgia Tech | Phoenix (gpu-acc)
🔇 Additional comments (1)
.github/workflows/phoenix/submit.sh (1)
60-60: No action needed. The commented-outmfc.sh loadstep is intentional for container-based execution.Since submit.sh submits container.sh, which runs all work inside an Apptainer container, the host environment modules loaded by
mfc.sh loadare unnecessary. The container is self-contained with all dependencies pre-installed and pre-configured in the Dockerfile.Likely an incorrect or invalid review comment.
|
It took <50 min ea. which is perfect for the intended purpose. |
User description
User description
User description
Description
Weekly tests on the latest gpu/cpu containers on Phoenix.
PR Type
Tests
Description
Add weekly Docker container testing workflow on Phoenix cluster
Test GPU and CPU containers with OpenMP and ACC interfaces
Execute MFC test suite with GPU-aware device configuration
Run tests on Friday midnight UTC schedule with manual trigger option
Diagram Walkthrough
File Walkthrough
docker-test.yml
Weekly Docker container testing workflow definition.github/workflows/docker-test.yml
UTC
labels
container.sh
Container build and MFC test execution script.github/workflows/phoenix/container.sh
and interface
execution
enumeration
CodeAnt-AI Description
Add weekly self-hosted container tests on the Phoenix cluster
What Changed
Impact
✅ Earlier detection of container regressions✅ Fewer late CI failures caused by missing GPU/CPU checks✅ Clearer per-job test logs for GPU and CPU container runs💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.
PR Type
Other
Description
Removed multiple GitHub Actions workflows (bench, cleanliness, coverage, deploy-tap, docker, docs, formatting, homebrew, line-count, lint-source, lint-toolchain, pmd, spelling, test)
Added new
docker-test.ymlworkflow for weekly container testing on Phoenix clusterCreated
container.shscript for building and testing Apptainer containers with GPU/CPU supportRemoved Frontier cluster test scripts (build.sh, bench.sh, submit.sh, test.sh)
Modified Phoenix
submit.shto comment out mfc.sh load commandDiagram Walkthrough
File Walkthrough
2 files
New weekly Docker container testing workflowContainer build and MFC test execution script1 files
Comment out mfc.sh load command19 files
Removed Frontier benchmark scriptRemoved Frontier build scriptRemoved Frontier benchmark submission scriptRemoved Frontier submission scriptRemoved Frontier test scriptRemoved benchmark workflowRemoved code cleanliness check workflowRemoved coverage check workflowRemoved Homebrew tap deployment workflowRemoved containerization workflowRemoved documentation build workflowRemoved code formatting workflowRemoved Homebrew formula test workflowRemoved line count workflowRemoved source linting workflowRemoved toolchain linting workflowRemoved PMD source DRYness workflowRemoved spell check workflowRemoved main test suite workflow