
Conversation

@LalatenduMohanty
Member

Calculate optimal max_workers by analyzing the dependency graph to find the maximum number of packages that can be built in parallel.

Uses the smaller of CPU-based default and graph-based maximum to avoid allocating idle worker threads.

closes #880

Co-Authored-By: Claude <claude@anthropic.com>

Signed-off-by: Lalatendu Mohanty <lmohanty@redhat.com>
@LalatenduMohanty LalatenduMohanty requested a review from a team as a code owner December 12, 2025 19:06
@mergify mergify bot added the ci label Dec 12, 2025
cpu_default = min(32, (os.cpu_count() or 1) + 4)
optimal_workers = min(cpu_default, max_parallelism)
logger.info(
    "graph allows max %i parallel builds, using %i workers (cpu default: %i)",
    max_parallelism,
    optimal_workers,
    cpu_default,
)
Member
I don't know that I would understand this log message if I was reading it without having seen the code. Could you log each value separately with a short description before going through this if statement, then log the actual selected worker pool size being returned? Something like the messages below, for example.

You could add messages like "batch size from graph exceeds CPU count" or whatever to help with debugging, too, but just having each number separately would make it easier to understand.

"batch size from graph: %d"
"CPU count: %d"
"minimum pool size: %d" (from cpu_default)
"user requested pool size: %d"
"optimal worker pool size: %d"

Analyzes the dependency graph to determine the maximum number of packages
that can be built in parallel at any point. Uses this to optimize the
number of worker threads, avoiding wasteful allocation of idle workers.
Member
I'm not 100% sure I understand this change. I think you're saying the batch size from the graph should be factored in because there's no point in setting up a worker pool larger than the number of jobs we would try to run at one time. Is that it?

@tiran had an idea at one point to continuously add tasks to the pool as packages finished their build. I don't know if we ever implemented that. If we did that, we might end up wanting to build more than the maximum batch size because finishing 1 item in a batch might let us build another set of items that would be considered to be in another batch based on this logic.

Member Author

> I think you're saying the batch size from the graph should be factored in because there's no point in setting up a worker pool larger than the number of jobs we would try to run at one time. Is that it?

Yes. If the graph only ever allows 2 parallel builds but we set up 6 workers, the extra workers are never used. It is not a critical issue, just a code improvement.

Member

Does Python's worker pool implementation actually create all of the workers based on the size passed in, or does it just limit the number of workers to the size given?
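For what it's worth, CPython's `concurrent.futures.ThreadPoolExecutor` spawns worker threads lazily as tasks are submitted, up to `max_workers`; it does not create the full pool up front. A quick sketch demonstrating this (it peeks at the private `_threads` attribute, so it is a CPython implementation detail, not guaranteed API):

```python
import concurrent.futures
import time

# CPython's ThreadPoolExecutor adds a worker thread per submitted task
# (up to max_workers) only when no idle worker is available; it does
# not pre-spawn all workers. `_threads` is a private CPython detail,
# inspected here purely for demonstration.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    before = len(pool._threads)  # no tasks submitted yet
    futures = [pool.submit(time.sleep, 0.05) for _ in range(2)]
    after = len(pool._threads)   # threads created for 2 tasks, not 8

print(before, after)
```

So an oversized `max_workers` mostly wastes potential rather than actual threads, though capping it to the graph's parallelism still makes the configuration honest.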



Successfully merging this pull request may close these issues.

build-parallel: Optimize max_workers based on dependency graph parallelism