@anton-ubi anton-ubi commented Dec 19, 2025

Problem

RQD crashes when launching hyperthreading jobs:

CRITICAL: Not launching, insufficient hyperthreading cores to reserve based on frameCores (5 < 8.0)

Root Cause

The issue is caused by an inconsistency between two core-counting mechanisms:

  1. self.cores.idle_cores - Counts logical cores for initial validation
  2. avail_cores_count in reserveHT() - Counts actual physical hyperthreading cores

Because of this inconsistency, the initial core availability check could pass (enough logical cores were idle) while the subsequent hyperthreading reservation failed (not enough physical HT cores were free).
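To illustrate how the two counts can diverge, here is a minimal sketch. It is not RQD code: the host model (a dict mapping a physical core id to the busy flags of its hardware threads) and both helper names are hypothetical, and it assumes that an HT reservation needs physical cores whose threads are all idle.

```python
# Hypothetical host model: physical core id -> busy flag per hardware thread.
# This only illustrates the mismatch; RQD's real bookkeeping differs.

def count_idle_logical(cores):
    """Count idle hardware threads, regardless of which core they sit on."""
    return sum(1 for threads in cores.values() for busy in threads if not busy)


def count_idle_physical(cores):
    """Count physical cores whose hardware threads are all idle."""
    return sum(1 for threads in cores.values() if not any(threads))


cores = {
    0: [True, False],   # one thread busy: not usable as a whole HT core
    1: [True, False],
    2: [False, False],  # fully idle
    3: [False, False],
}

needed_physical = 3      # the frame wants 3 whole cores (6 threads)
needed_logical = needed_physical * 2

# The logical check passes, the physical check does not.
print(count_idle_logical(cores) >= needed_logical)    # True
print(count_idle_physical(cores) >= needed_physical)  # False
```

Under these assumptions, a check based only on the logical count accepts the frame even though the reservation step cannot find enough whole cores.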

Solution

Fix the validation flow for hyperthreading workloads:

  • For CUE_THREADABLE=1 jobs: Check HT core availability FIRST, before any reservation attempts
  • For regular jobs: Keep existing logical core validation

New Flow:

  • HT jobs → Validate HT cores → Reserve if available → Set CPU_LIST
  • Regular jobs → Validate logical cores → Reserve normally

This prevents the mismatch where logical cores are available but physical HT cores are not.
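The flow above can be sketched roughly as follows. This is a simplified illustration rather than the actual change: `plan_launch`, `InsufficientCores`, and the parameters are hypothetical names, and the comma-separated format used for `CPU_LIST` is an assumption.

```python
class InsufficientCores(Exception):
    """Raised when a frame cannot be launched with the cores currently free."""


def plan_launch(env, needed, idle_logical, idle_ht_cores):
    """Return a copy of env ready for launch, or raise InsufficientCores.

    env            -- frame environment (dict of str -> str)
    needed         -- number of cores the frame requests
    idle_logical   -- count of idle logical cores (regular jobs)
    idle_ht_cores  -- set of ids of free physical HT cores (HT jobs)
    """
    if env.get("CUE_THREADABLE") == "1":
        # HT job: validate physical HT cores first, before reserving anything.
        if len(idle_ht_cores) < needed:
            raise InsufficientCores(
                f"insufficient HT cores ({len(idle_ht_cores)} < {needed})"
            )
        reserved = sorted(idle_ht_cores)[:needed]
        # Assumed format: comma-separated core ids.
        return {**env, "CPU_LIST": ",".join(str(c) for c in reserved)}

    # Regular job: keep the logical-core validation.
    if idle_logical < needed:
        raise InsufficientCores(
            f"insufficient logical cores ({idle_logical} < {needed})"
        )
    return dict(env)
```

With this ordering, an HT job that cannot get enough physical cores is rejected up front instead of passing the logical check and failing later during reservation.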

Benefits

This helps a lot on our end: before this fix, only half of the cores available on a host were actively used. The change proposed here fixes that.

Bonus

Sort tasksets in ascending order. They are easier to read when logged.
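As a small sketch of what ascending order means here, assuming a taskset is a collection of core ids that may arrive as strings (the helper name `format_taskset` is hypothetical): the ids should be compared as integers, since sorting the strings directly would place "10" before "2".

```python
def format_taskset(cpu_ids):
    """Return the core ids as a comma-separated string in ascending numeric order."""
    return ",".join(str(i) for i in sorted(int(c) for c in cpu_ids))


print(format_taskset(["10", "2", "3"]))  # 2,3,10
```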
