blk: honor isolcpus configuration#871

Open: blktests-ci[bot] wants to merge 8 commits into linus-master_base from series/1093842=>linus-master

blktests-ci Bot commented May 22, 2026

Pull request for series with
subject: blk: honor isolcpus configuration
version: 15
url: https://patchwork.kernel.org/project/linux-block/list/?series=1099060

blktests-ci Bot commented May 22, 2026

Upstream branch: 6779b50
series: https://patchwork.kernel.org/project/linux-block/list/?series=1099060
version: 15

igaw and others added 7 commits May 23, 2026 06:24
The calculation of the upper limit for queues does not depend solely on
the number of online CPUs; for example, the isolcpus kernel
command-line option must also be considered.

To account for this, the block layer provides a helper function to
retrieve the maximum number of queues. Use it to set an appropriate
upper queue number limit.
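
As a rough user-space illustration of the idea (not the kernel helper itself), the upper limit can be modelled as the number of candidate CPUs that are also housekeeping CPUs, optionally capped by a driver maximum. The function name `num_queues` and its parameters are hypothetical. The fallback to 1 follows the "safe fallback of 1" note in this series' changelog.

```python
def num_queues(candidate_cpus, housekeeping_cpus, max_queues=0):
    """Model an upper bound on the queue count that honors CPU isolation.

    Hypothetical helper: the names and signature are assumptions made for
    this sketch, not the block layer API.
    """
    # Only CPUs that are both candidates and housekeeping CPUs count.
    usable = set(candidate_cpus) & set(housekeeping_cpus)
    # Never report zero queues, even if the intersection is empty.
    limit = max(len(usable), 1)
    # A max_queues of 0 means "no driver-imposed cap" in this model.
    if max_queues > 0:
        limit = min(limit, max_queues)
    return limit
```

With 16 CPUs and isolcpus=io_queue,2-3,6-7,12-13, this model yields 10 queues, one per housekeeping CPU.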

This patch brings aacraid in line with the API migration initiated for
other SCSI drivers in commit 94970cf ("scsi: use block layer
helpers to calculate num of queues").

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[atomlin: Drop "Fixes:" tag; indicate alignment with other SCSI drivers]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
The core scheduler recently transitioned to compiling SMP data
structures unconditionally to reduce code complexity - see commit
cac5cef ("sched/smp: Make SMP unconditional").

In alignment with this philosophy of reducing dual-path maintenance,
this patch removes the #ifdef CONFIG_SMP guards and the dedicated !SMP
fallback logic here.

While the !SMP path provided a slightly simpler execution flow for
uniprocessor kernels (avoiding SMP-specific overhead), maintaining these
separate code paths adds unnecessary complexity and testing burden.
Removing these guards simplifies the codebase by standardizing entirely
on the SMP logic, which safely resolves to single-CPU operations on UP
configurations.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[atomlin: Updated commit message to clarify !SMP removal context]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
This commit introduces group_mask_cpus_evenly(), which allows callers to
distribute a specific CPU mask evenly across groups. It serves as a bounded
version of group_cpus_evenly().

While group_cpus_evenly() operates on the global cpu_possible_mask,
group_mask_cpus_evenly() confines the distribution strictly within the
boundaries of the caller-provided mask. It preserves the kernel's native
two-stage spreading logic: it first prioritises CPUs that are physically
present (cpu_present_mask) to prevent I/O starvation, and then distributes
any remaining vectors to non-present CPUs to maintain hotplug safety.
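
The two-stage spreading can be sketched in user space as below. This is a deliberately simplified model: it ignores NUMA nodes and sibling topology, which the kernel implementation takes into account, and the names `group_mask_evenly` and `_spread` are hypothetical. Returning `None` for zero groups or an empty mask mirrors the NULL return noted in the changelog of this commit.

```python
def _spread(cpus, groups, start):
    """Place `cpus` in near-equal contiguous chunks, starting at group `start`."""
    n = len(groups)
    if not cpus:
        return start
    chunks = min(len(cpus), n)
    base, extra = divmod(len(cpus), chunks)
    pos = 0
    for i in range(chunks):
        size = base + (1 if i < extra else 0)
        groups[(start + i) % n].extend(cpus[pos:pos + size])
        pos += size
    # The next stage continues with the group after the last one used.
    return (start + chunks) % n


def group_mask_evenly(numgrps, mask, present):
    """Split `mask` into `numgrps` groups, present CPUs first (simplified)."""
    mask = sorted(set(mask))      # snapshot of the caller-provided mask
    present = set(present)
    if numgrps == 0 or not mask:
        return None               # models the NULL return for degenerate input
    groups = [[] for _ in range(numgrps)]
    stage_one = [c for c in mask if c in present]      # physically present
    stage_two = [c for c in mask if c not in present]  # hotplug candidates
    start = _spread(stage_one, groups, 0)
    _spread(stage_two, groups, start)
    return groups
```

For example, `group_mask_evenly(4, range(4), {0, 1})` places the present CPUs 0 and 1 in the first two groups and the non-present CPUs 2 and 3 in the remaining two.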

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[atomlin:
    - Added check for numgrps == 0
    - Updated commit message to resolve typo
    - Removed unused <linux/sched/isolation.h>
    - Fix TOCTOU race by caching the provided mask
    - Removed ineffective data_race() annotations around cpumask pointers
    - Implemented two-stage grouping logic to prioritise physically
      present CPUs, mirroring group_cpus_evenly()
    - Fix division-by-zero bug by ensuring group_mask_cpus_evenly()
      returns NULL instead of an empty array when evaluated against an
      empty mask]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Multiqueue drivers spread I/O queues across all CPUs for optimal
performance. However, these drivers are not aware of CPU isolation
requirements and will distribute queues without considering the isolcpus
configuration.

Introduce a new isolcpus mask that allows users to define which CPUs
should have I/O queues assigned. This is similar to managed_irq, but
intended for drivers that do not use the managed IRQ infrastructure.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Extend the capabilities of the generic CPU to hardware queue (hctx)
mapping code, so it maps housekeeping CPUs and isolated CPUs to the
hardware queues evenly.

A hctx is only operational when there is at least one online
housekeeping CPU assigned (aka active_hctx). Thus, check the final
mapping to ensure that no hctx has only offline housekeeping CPUs and
online isolated CPUs.

Example mapping result:

  16 online CPUs

  isolcpus=io_queue,2-3,6-7,12-13

Queue mapping:
        hctx0: default 0 2
        hctx1: default 1 3
        hctx2: default 4 6
        hctx3: default 5 7
        hctx4: default 8 12
        hctx5: default 9 13
        hctx6: default 10
        hctx7: default 11
        hctx8: default 14
        hctx9: default 15

IRQ mapping:
        irq 42 affinity 0 effective 0  nvme0q0
        irq 43 affinity 0 effective 0  nvme0q1
        irq 44 affinity 1 effective 1  nvme0q2
        irq 45 affinity 4 effective 4  nvme0q3
        irq 46 affinity 5 effective 5  nvme0q4
        irq 47 affinity 8 effective 8  nvme0q5
        irq 48 affinity 9 effective 9  nvme0q6
        irq 49 affinity 10 effective 10  nvme0q7
        irq 50 affinity 11 effective 11  nvme0q8
        irq 51 affinity 14 effective 14  nvme0q9
        irq 52 affinity 15 effective 15  nvme0q10
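
The queue mapping listed above can be reproduced with a small user-space model. It assumes one hctx per housekeeping CPU and round-robin placement of the isolated CPUs. Both assumptions are chosen only because they match this listing; the patch itself relies on group_mask_cpus_evenly(). The names `map_queues` and `is_active` are hypothetical.

```python
def map_queues(online, isolated):
    """Model: one hctx per housekeeping CPU, isolated CPUs added round-robin."""
    isolated = set(isolated)
    housekeeping = [c for c in sorted(online) if c not in isolated]
    iso = [c for c in sorted(online) if c in isolated]
    if not housekeeping:
        raise ValueError("at least one housekeeping CPU is required")
    hctx = [[c] for c in housekeeping]
    for i, cpu in enumerate(iso):
        hctx[i % len(hctx)].append(cpu)
    return hctx


def is_active(hctx_cpus, online, isolated):
    """A hctx is active if it has at least one online housekeeping CPU."""
    return any(c in online and c not in isolated for c in hctx_cpus)


if __name__ == "__main__":
    iso = {2, 3, 6, 7, 12, 13}
    for i, cpus in enumerate(map_queues(range(16), iso)):
        print(f"hctx{i}: default {' '.join(map(str, cpus))}")
```

Every hctx in the resulting map contains a housekeeping CPU, so `is_active()` holds for all of them while those CPUs stay online.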

A corner case is when the number of online CPUs and present CPUs
differ and the driver asks for fewer queues than online CPUs, e.g.

  8 online CPUs, 16 possible CPUs

  isolcpus=io_queue,2-3,6-7,12-13
  virtio_blk.num_request_queues=2

Queue mapping:
        hctx0: default 0 1 2 3 4 5 6 7 8 12 13
        hctx1: default 9 10 11 14 15

IRQ mapping:
        irq 27 affinity 0 effective 0 virtio0-config
        irq 28 affinity 0-1,4-5,8 effective 5 virtio0-req.0
        irq 29 affinity 9-11,14-15 effective 0 virtio0-req.1

Note that for the normal/default configuration (!isolcpus) the mapping
will change for systems which have non-hyperthreading CPUs. The main
assignment loop now relies completely on group_mask_cpus_evenly() to do
the right thing. The old code would distribute the CPUs linearly over
the hardware contexts:

queue mapping for /dev/nvme0n1
        hctx0: default 0 8
        hctx1: default 1 9
        hctx2: default 2 10
        hctx3: default 3 11
        hctx4: default 4 12
        hctx5: default 5 13
        hctx6: default 6 14
        hctx7: default 7 15

The new code assigns each hardware context the map generated by the
group_mask_cpus_evenly() function:

queue mapping for /dev/nvme0n1
        hctx0: default 0 1
        hctx1: default 2 3
        hctx2: default 4 5
        hctx3: default 6 7
        hctx4: default 8 9
        hctx5: default 10 11
        hctx6: default 12 13
        hctx7: default 14 15

In case of hyperthreading CPUs, the resulting map stays the same.
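
The difference between the two listings above can be modelled in a few lines. This sketch only reproduces the shape of the two maps for a flat topology. The function names are hypothetical, and neither function is the kernel's actual algorithm.

```python
def linear_map(nr_cpus, nr_queues):
    """Old-style shape: CPU n lands on queue n % nr_queues."""
    hctx = [[] for _ in range(nr_queues)]
    for cpu in range(nr_cpus):
        hctx[cpu % nr_queues].append(cpu)
    return hctx


def grouped_map(nr_cpus, nr_queues):
    """New-style shape: each queue gets a contiguous, near-equal CPU range."""
    base, extra = divmod(nr_cpus, nr_queues)
    hctx, cpu = [], 0
    for q in range(nr_queues):
        size = base + (1 if q < extra else 0)
        hctx.append(list(range(cpu, cpu + size)))
        cpu += size
    return hctx
```

For 16 CPUs and 8 queues, `linear_map` pairs CPU 0 with CPU 8, while `grouped_map` pairs CPU 0 with CPU 1.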

Signed-off-by: Daniel Wagner <wagi@kernel.org>
[atomlin:
    - Updated blk_mq_validate() to use test_bit() for the new bitmap
    - Replaced __free cleanups with traditional goto unwinding to align
      with subsystem styling
    - Updated blk_mq_map_fallback() to use qmap->queue_offset ensuring
      secondary maps do not incorrectly route to the primary default map
    - Added a bitmap_empty() check to prevent out-of-bounds CPU routing
      when all mapped CPUs are offline
    - Migrated active_hctx to a dynamically sized bitmap to fix an
      out-of-bounds write when hardware queues exceed the system CPU
      count
    - Fixed absolute vs. relative hardware queue index mix-up in
      blk_mq_map_queues() and validation checks
    - Fixed typographical errors
    - Reduced stack frame size of blk_mq_num_queues()
    - Resolved a TOCTOU race against CPU hotplug events by snapshotting
      cpu_online_mask to ensure mapping and validation phases agree
    - Corrected a loop overwrite bug in blk_mq_map_queues() by iterating
      directly over masks to prevent orphaned queues from being activated
    - Restored topology-aware multi-queue fallback in
      blk_mq_map_hw_queues() by correctly routing missing IRQ affinity
      masks to the map_software path instead of the naive fallback
    - Fixed a silent validation bypass in blk_mq_map_hw_queues() caused by
      overlapping IRQ affinity masks by evaluating the active_hctx bitmap
      in a secondary pass
    - Hardened isolation logic in blk_mq_map_hw_queues() to require online
      housekeeping CPUs before marking a hardware queue as active
    - Enforce safe fallback of 1 when the intersection evaluates to 0]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
When isolcpus=io_queue is enabled and the last housekeeping CPU
for a given hctx goes offline, no CPU would be left to handle I/O.
To prevent I/O stalls, disallow offlining housekeeping CPUs that are
still serving isolated CPUs.
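
The offline check can be sketched as a pure function over one hctx. This is a user-space model under the assumption that the decision depends only on the CPUs mapped to that hctx. The name `can_offline` and its parameters are hypothetical.

```python
def can_offline(cpu, hctx_cpus, online, housekeeping):
    """Return False if offlining `cpu` would strand online isolated CPUs.

    Model only: `hctx_cpus` is the full set of CPUs mapped to one hctx,
    including isolated ones.
    """
    # Isolated CPUs and CPUs not mapped to this hctx are never blocked here.
    if cpu not in housekeeping or cpu not in hctx_cpus:
        return True
    # Another online housekeeping CPU can keep serving the queue.
    if any(c != cpu and c in online and c in housekeeping for c in hctx_cpus):
        return True
    # `cpu` is the last housekeeping CPU: refuse only if an isolated CPU
    # mapped to this hctx is still online.
    return not any(c in online and c not in housekeeping for c in hctx_cpus)
```

In this model, a hctx serving CPUs 0 (housekeeping) and 2 (isolated) blocks offlining CPU 0 until CPU 2 has gone offline.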

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[atomlin:
    - Removed duplicate paragraph from commit message
    - Allow offlining of non-housekeeping CPUs
    - Fix logic flaw that prematurely rejected valid offline requests
    - Iterated over cpu_online_mask and manually reverse-mapped CPUs to
      correctly detect isolated CPUs, as blk_mq_map_swqueue()
      intentionally prunes them from hctx->cpumask
    - Drop hctx->queue->disk->disk_name from warning to avoid UAF bug
    - Ensure isolation constraints are only enforced for CPUs actively
      mapped to the evaluated hardware queue
    - Correct pr_warn format specifier]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
At present, the managed interrupt spreading algorithm distributes vectors
across all available CPUs within a given node or system. On systems
employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour
defeats the primary purpose of isolation by routing hardware interrupts
(such as NVMe completion queues) directly to isolated cores.

Update irq_create_affinity_masks() to respect the housekeeping CPU mask.
By passing the HK_TYPE_IO_QUEUE mask directly to the topological
distribution function (group_mask_cpus_evenly()), we ensure that managed
interrupts are kept strictly off isolated CPUs.

This patch additionally addresses the architectural constraints of
restricted vector distribution:

    1.  Vector Limits and Overrides: Updated irq_calc_affinity_vectors()
        to strictly bound the maximum number of allocated vectors to the
        weight of the housekeeping mask. This correctly overrides
        drivers providing a calc_sets() callback, preventing them from
        wasting memory on dead hardware queues that cannot be routed to
        isolated CPUs.

    2.  Multi-set Alignment and Leak Prevention: When isolation
        constraints result in fewer available masks than requested
        vectors for a given set, the remaining vector slots are padded
        with the housekeeping mask. This replaces the historical
        irq_default_affinity padding, ensuring excess managed queues do
        not leak interrupts onto isolated CPUs.

    3.  Minimum Vector Safety Net: To prevent fatal -ENOSPC device probe
        aborts on heavily isolated systems (where the housekeeping CPU
        count might be lower than a device's structural minimum), the
        final vector calculation is safeguarded to never drop below
        minvec. Queues will safely share the available housekeeping CPUs
        instead of failing the probe.

    4.  Zero Overhead: The housekeeping mask is conditionally assigned
        via a direct pointer, completely avoiding temporary mask
        allocations (e.g., alloc_cpumask_var) and bitwise operations
        when CPU isolation is disabled. This guarantees zero performance
        or memory overhead for standard configurations.
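
Points 1 to 3 above can be illustrated with a short user-space model. The function names are hypothetical. The `reserved` parameter is an assumption of this sketch, standing in for vectors that are excluded from spreading, such as a driver's admin or config interrupt.

```python
def calc_vectors(minvec, maxvec, reserved, hk_weight):
    """Model the vector count: capped by housekeeping CPUs, never below minvec."""
    if reserved > minvec:
        return 0                      # the request cannot be satisfied at all
    # Point 1: bound the spread vectors by the housekeeping mask weight.
    spread = min(hk_weight, maxvec - reserved)
    # Point 3: never drop below the device's structural minimum.
    return max(reserved + spread, minvec)


def pad_set(masks, set_size, hk_mask):
    """Point 2: pad unused slots of a set with the housekeeping mask."""
    missing = max(set_size - len(masks), 0)
    return list(masks) + [set(hk_mask) for _ in range(missing)]
```

With one reserved vector and ten housekeeping CPUs, `calc_vectors(2, 33, 1, 10)` yields 11. On a heavily isolated system, `calc_vectors(8, 33, 1, 2)` still returns 8, so the queues share the two housekeeping CPUs instead of the probe failing.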

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>

blktests-ci Bot commented May 23, 2026

Upstream branch: 79bd2dd
series: https://patchwork.kernel.org/project/linux-block/list/?series=1099060
version: 15

The io_queue flag informs multiqueue device drivers where to place
hardware queues. Document this new flag in the isolcpus
command-line argument description.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[atomlin:
    - Refined io_queue kernel parameter documentation
    - Removed an inaccurate claim in the documentation stating
      that io_queue takes precedence over managed_irq]
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
blktests-ci Bot force-pushed the series/1093842=>linus-master branch from fb8a974 to 5c030f4 on May 23, 2026 06:25