ci(cuda): limit parallel jobs to avoid OOM during CUDA build#841
ci(cuda): limit parallel jobs to avoid OOM during CUDA build#841doringeman wants to merge 1 commit intomainfrom
Conversation
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Since
cmake --buildalready parallelizes when using Ninja, consider usingCMAKE_BUILD_PARALLEL_LEVEL(or Ninja’s-jviaCMAKE_BUILD_TOOL) to limit jobs instead of adding-jhere, which can be less portable across generators. - The
$(nproc --ignore=2)expression assumes GNU coreutils; if these Docker builds may run on non-GNU environments, you might want to guard this or use a simpler$(nproc)with a fixed upper bound to avoid portability issues.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Since `cmake --build` already parallelizes when using Ninja, consider using `CMAKE_BUILD_PARALLEL_LEVEL` (or Ninja’s `-j` via `CMAKE_BUILD_TOOL`) to limit jobs instead of adding `-j` here, which can be less portable across generators.
- The `$(nproc --ignore=2)` expression assumes GNU coreutils; if these Docker builds may run on non-GNU environments, you might want to guard this or use a simpler `$(nproc)` with a fixed upper bound to avoid portability issues.
## Individual Comments
### Comment 1
<location path="llamacpp/native/cuda.Dockerfile" line_range="40" />
<code_context>
-S ." > cmake-flags
RUN cmake $(cat cmake-flags)
-RUN cmake --build build --config Release
+RUN cmake --build build --config Release -j$(nproc --ignore=2)
RUN cmake --install build --config Release --prefix install
</code_context>
<issue_to_address>
**issue:** Guard against `nproc --ignore=2` returning 0 parallel jobs on small-core machines.
On 1–2 core machines, `nproc --ignore=2` can return 0, yielding `-j0`, which some build tools treat as unlimited parallelism or otherwise special-case. To avoid this, ensure a minimum of 1 job, e.g.:
```dockerfile
RUN cmake --build build --config Release -j"$(nproc --ignore=2 || echo 1)"
```
or use a small shell expression to compute `max(1, nproc-2)` explicitly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| -S ." > cmake-flags | ||
| RUN cmake $(cat cmake-flags) | ||
| RUN cmake --build build --config Release | ||
| RUN cmake --build build --config Release -j$(nproc --ignore=2) |
There was a problem hiding this comment.
issue: Guard against nproc --ignore=2 returning 0 parallel jobs on small-core machines.
On 1–2 core machines, nproc --ignore=2 can return 0, yielding -j0, which some build tools treat as unlimited parallelism or otherwise special-case. To avoid this, ensure a minimum of 1 job, e.g.:
RUN cmake --build build --config Release -j"$(nproc --ignore=2 || echo 1)"or use a small shell expression to compute max(1, nproc-2) explicitly.
There was a problem hiding this comment.
Code Review
This pull request updates the cuda.Dockerfile to parallelize the CMake build process by utilizing available CPU cores. While this improves build performance, a critical issue was identified regarding the use of nproc --ignore=2, which could return a non-positive value on environments with two or fewer cores and cause the build to fail. A suggestion was provided to ensure at least one build job is always specified using a shell arithmetic check.
| -S ." > cmake-flags | ||
| RUN cmake $(cat cmake-flags) | ||
| RUN cmake --build build --config Release | ||
| RUN cmake --build build --config Release -j$(nproc --ignore=2) |
There was a problem hiding this comment.
Using nproc --ignore=2 can result in a negative or zero value if the build environment has 2 or fewer cores, causing the build command to fail. It is safer to use a shell arithmetic expansion to ensure a minimum of 1 job.
RUN cmake --build build --config Release -j$(nproc --ignore=2 | awk '{print ($1 < 1 ? 1 : $1)}')
No description provided.