[RQD][FIX] Fix hyperthreading cores reservation #2124
Problem
RQD crashes when launching hyperthreading jobs:
Root Cause
The issue was caused by an inconsistency between two core counting mechanisms:
- self.cores.idle_cores: counts logical cores for initial validation
- avail_cores_count in reserveHT(): counts actual physical hyperthreading cores

This inconsistency allowed the initial core availability check to pass (sufficient logical cores), but the hyperthreading reservation would fail (insufficient physical HT cores).
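To make the mismatch concrete, here is a minimal sketch. It is not RQD code: the topology layout, the busy set, and both helper functions are hypothetical, and it assumes each physical core exposes two logical threads and that an HT reservation needs a physical core with all of its threads idle.

```python
# Hypothetical host: physical core id -> its two logical thread ids.
topology = {0: (0, 4), 1: (1, 5), 2: (2, 6), 3: (3, 7)}

# Logical CPUs already in use by other frames (illustrative values).
busy = {0, 1, 2}


def idle_logical_count(topology, busy):
    """Count idle logical CPUs (stand-in for a logical idle-core counter)."""
    return sum(1 for threads in topology.values() for cpu in threads if cpu not in busy)


def idle_physical_count(topology, busy):
    """Count physical cores whose threads are all idle (stand-in for an HT check)."""
    return sum(1 for threads in topology.values() if not any(cpu in busy for cpu in threads))


requested_physical = 2  # the job wants 2 physical cores, i.e. 4 logical threads

# The logical check passes: 5 idle logical CPUs >= 4 requested.
logical_ok = idle_logical_count(topology, busy) >= requested_physical * 2

# The physical check fails: only core 3 has both threads idle.
physical_ok = idle_physical_count(topology, busy) >= requested_physical
```

With these values the first check succeeds and the second does not, which is the kind of disagreement described above.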
Solution
Fix the validation flow for hyperthreading workloads:
- CUE_THREADABLE=1 jobs: Check HT core availability FIRST, before any reservation attempts

New Flow:
This prevents the mismatch where logical cores are available but physical HT cores are not.
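The check-first idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: InsufficientCores, reserve_threadable, and the topology mapping are hypothetical names, and the sketch assumes two logical threads per physical core.

```python
class InsufficientCores(Exception):
    """Hypothetical error for a request that cannot be satisfied."""


def reserve_threadable(topology, busy, requested_physical):
    """Reserve whole physical cores, validating availability before mutating state.

    topology: physical core id -> tuple of logical thread ids (assumed layout)
    busy: set of logical thread ids already in use; updated only on success
    """
    # Step 1: count physical cores whose threads are all idle.
    free = sorted(
        core for core, threads in topology.items()
        if not any(cpu in busy for cpu in threads)
    )

    # Step 2: fail early, before any reservation attempt.
    if len(free) < requested_physical:
        raise InsufficientCores(
            "requested %d physical cores, %d available" % (requested_physical, len(free))
        )

    # Step 3: reserve every thread of the chosen physical cores.
    reserved = []
    for core in free[:requested_physical]:
        busy.update(topology[core])
        reserved.extend(topology[core])
    return sorted(reserved)


topology = {0: (0, 4), 1: (1, 5), 2: (2, 6), 3: (3, 7)}
busy = {0}
reserved = reserve_threadable(topology, busy, 2)
```

Because the availability check runs first and raises before any state changes, a failed request leaves the busy set untouched instead of failing halfway through a reservation.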
Benefits
This helps a lot on our end: before this fix, only half of the cores available on a host were actively used. The change proposed here fixes that issue.
Bonus
Sort tasksets in ascending order. They are easier to read when logged.
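As a rough illustration of the sorting, assuming a taskset is a collection of integer logical CPU ids that gets logged as a comma-separated string (format_taskset is a hypothetical helper, not an RQD function):

```python
def format_taskset(cpu_ids):
    """Return the CPU ids as a comma-separated string in ascending numeric order."""
    # Sort as integers so that 10 comes after 2, not before it.
    return ",".join(str(cpu) for cpu in sorted(int(cpu) for cpu in cpu_ids))


line = format_taskset([10, 2, 7, 1])
```

Sorting numerically rather than as strings matters once ids reach two digits, since a string sort would place "10" before "2".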