ci(cuda): limit parallel jobs to avoid OOM during CUDA build #840
doringeman wants to merge 1 commit into docker:main from
Conversation
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Hey - I've left some high level feedback:
- Using `-j$(nproc --ignore=2)` assumes the base image's `nproc` supports `--ignore`; consider either verifying this for all target environments or using a more portable pattern (e.g., a fixed max or a small shell calculation) to avoid build-time failures.
Code Review
This pull request modifies `llamacpp/native/cuda.Dockerfile` to parallelize the build process using `nproc`. Feedback indicates that using `nproc` inside a container can lead to Out-Of-Memory (OOM) issues on high-core hosts because it reflects host resources rather than container limits. It is recommended to use a fixed, conservative job limit to ensure stability across different environments.
```diff
     -S ." > cmake-flags
 RUN cmake $(cat cmake-flags)
-RUN cmake --build build --config Release
+RUN cmake --build build --config Release -j$(nproc --ignore=2)
```
Using `nproc` inside a Docker container typically reports the host's CPU count rather than the container's allocated resources or available memory. On high-core hosts, `nproc --ignore=2` will still spawn a large number of parallel jobs, likely leading to the exact OOM issues this PR aims to avoid. To ensure build stability across various CI environments, it is safer to use a fixed, conservative limit (e.g., `-j 4`), consistent with the approach taken in `llamacpp/native/generic.Dockerfile`.
```dockerfile
RUN cmake --build build --config Release -j 4
```
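If a dynamic job count is still preferred, the "small shell calculation" mentioned in the review could be sketched portably, without relying on the `--ignore` flag. This is an illustrative sketch only; the cap of 4 is an assumption chosen to match the fixed limit suggested above:

```shell
# Sketch: cap the parallel job count at a conservative maximum (4 here)
# without using the non-portable `nproc --ignore` flag.
JOBS=$(nproc 2>/dev/null || echo 1)   # fall back to 1 if nproc is unavailable
if [ "$JOBS" -gt 4 ]; then JOBS=4; fi
echo "Building with -j$JOBS"
```

The trade-off is predictability: a hard-coded `-j 4` behaves identically on every runner, while the calculation above adapts to small hosts but still needs a cap to stay safe on large ones.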
References
- Pragmatism: The solution should match the complexity of the problem and avoid failure modes like OOM on large hosts. A fixed limit is more robust in containerized environments. (link)
No description provided.