block: enable RWF_DONTCACHE for block devices #835

Open
blktests-ci[bot] wants to merge 4 commits into linus-master_base from series/1095033=>linus-master

blktests-ci Bot commented May 14, 2026

Pull request for series with
subject: block: enable RWF_DONTCACHE for block devices
version: 6
url: https://patchwork.kernel.org/project/linux-block/list/?series=1095033

blktests-ci Bot commented May 14, 2026

Upstream branch: aa54b1d
series: https://patchwork.kernel.org/project/linux-block/list/?series=1095033
version: 6

blktests-ci Bot commented May 15, 2026

Upstream branch: 70eda68
series: https://patchwork.kernel.org/project/linux-block/list/?series=1095033
version: 6

blktests-ci Bot force-pushed the series/1095033=>linus-master branch from 701593b to 323483e on May 15, 2026 07:58
tzussman added 4 commits May 18, 2026 07:28
Some bio completion handlers need to run from preemptible task context,
but bio_endio() may be called from IRQ context (e.g., buffer_head
writeback). Callers need a way to ensure their callback eventually runs
from a sleepable context. Add infrastructure for that, in two forms:

  1. BIO_COMPLETE_IN_TASK, a bio flag the submitter sets when it knows
     in advance that its callback needs task context (e.g., dropbehind
     writeback). bio_endio() sees the flag and offloads completion to a
     worker automatically.

  2. bio_complete_in_task(), a helper that completion callbacks can
     invoke from within bi_end_io() when the deferral decision is
     dynamic (e.g., fserror reporting).

Both share a per-CPU batch list drained by a delayed work item on a
WQ_PERCPU workqueue. Producers push the bio onto the local CPU's batch
and schedule the work item, which then dispatches each bio's bi_end_io()
from task context. The delayed work item uses a 1-jiffy delay to allow
batches of completions to accumulate before processing.
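
As a rough illustration of the batching scheme described above, here is a small userspace model in C. It is a sketch only, not the kernel code from this series: `struct model_bio`, `struct model_batch`, and every `model_*` function are hypothetical names, the `in_atomic` argument stands in for the result of bio_in_atomic(), and `work_scheduled` stands in for queueing the delayed work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: only the fields the model needs. */
struct model_bio {
	struct model_bio *next;
	bool complete_in_task;                 /* models BIO_COMPLETE_IN_TASK */
	void (*end_io)(struct model_bio *bio); /* models bi_end_io() */
};

/* Models one CPU's batch list plus the "work item is queued" state. */
struct model_batch {
	struct model_bio *head;
	struct model_bio **tail;
	bool work_scheduled;
};

static int model_completions; /* bumped by the demo callback below */

static void model_count_completion(struct model_bio *bio)
{
	(void)bio;
	model_completions++;
}

static void model_batch_init(struct model_batch *b)
{
	b->head = NULL;
	b->tail = &b->head;
	b->work_scheduled = false;
}

/*
 * Models bio_endio(): a flagged bio that completes in atomic context is
 * appended to the batch and the work item is marked as scheduled.
 * Anything else completes immediately.
 */
static void model_bio_endio(struct model_batch *b, struct model_bio *bio,
			    bool in_atomic)
{
	assert(bio->end_io != NULL);
	if (bio->complete_in_task && in_atomic) {
		bio->next = NULL;
		*b->tail = bio;
		b->tail = &bio->next;
		b->work_scheduled = true; /* kernel: queue the delayed work */
		return;
	}
	bio->end_io(bio);
}

/*
 * Models the work item: detach the whole batch first, then run each
 * callback. Returns the number of bios completed.
 */
static int model_batch_drain(struct model_batch *b)
{
	struct model_bio *bio = b->head;
	int n = 0;

	model_batch_init(b);
	while (bio) {
		struct model_bio *next = bio->next;

		bio->end_io(bio);
		bio = next;
		n++;
	}
	return n;
}
```

Detaching the list before running callbacks keeps the model simple: a callback that queues further bios would land in a fresh batch instead of the one being walked. Whether the real worker does the same is not visible in this conversation.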

Both methods are gated on bio_in_atomic(), which returns true in any
context where a sleeping bi_end_io() is unsafe, including
non-preemptible task context. This logic is copied from commit
c99fab6 ("erofs: fix atomic context detection when
!CONFIG_DEBUG_LOCK_ALLOC").
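
The body of bio_in_atomic() is not shown in this conversation. The following userspace model is a sketch under the assumption that it mirrors the shape of the erofs check cited above: treat an RCU read-side section on a preemptible kernel as atomic, answer conservatively when the preempt count is unavailable, and otherwise report whether the context is non-preemptible. `struct model_ctx` and `model_in_atomic()` are hypothetical names; each field stands in for a kernel config option or helper named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the execution context. */
struct model_ctx {
	bool preemption_enabled;    /* models IS_ENABLED(CONFIG_PREEMPTION) */
	bool preempt_count_enabled; /* models IS_ENABLED(CONFIG_PREEMPT_COUNT) */
	int rcu_preempt_depth;      /* models rcu_preempt_depth() */
	bool preemptible;           /* models preemptible() */
};

/* Returns true when a sleeping completion callback would be unsafe. */
static bool model_in_atomic(const struct model_ctx *c)
{
	assert(c != NULL);
	/* Inside an RCU read-side critical section on a preemptible kernel. */
	if (c->preemption_enabled && c->rcu_preempt_depth > 0)
		return true;
	/* Without a preempt count the context cannot be known: assume atomic. */
	if (!c->preempt_count_enabled)
		return true;
	/* IRQ context and non-preemptible task context both land here. */
	return !c->preemptible;
}
```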

Two CPU hotplug callbacks are used to drain remaining bios from the
departing CPU's batch, while maintaining the per-CPU behavior. The
CPUHP_AP_ONLINE_DYN callback disables the per-CPU delayed work while the
CPU is still online, preventing it from running on an unbound worker
later. CPUHP_BP_PREPARE_DYN then drains any bios added between disabling
the work item and CPU offline.

Link: https://lore.kernel.org/all/20260409160243.1008358-1-hch@lst.de/
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tal Zussman <tz2294@columbia.edu>

Set BIO_COMPLETE_IN_TASK on iomap writeback bios when a dropbehind folio
is added. This ensures that bi_end_io runs in task context, where
folio_end_dropbehind() can safely invalidate folios.

With the bio layer now handling task-context deferral generically,
IOMAP_IOEND_DONTCACHE is no longer needed, as XFS no longer needs to
route DONTCACHE ioends through its completion workqueue. Remove the flag
and its NOMERGE entry.

Without the NOMERGE entry, regular I/Os that get merged with a dropbehind
folio will also have their completion deferred to task context.

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Add block_write_begin_iocb() which threads the kiocb through to
__filemap_get_folio() so that buffer_head-based I/O can use DONTCACHE
behavior. When the iocb has IOCB_DONTCACHE set, FGP_DONTCACHE is
passed to mark the folio for dropbehind. The existing
block_write_begin() is preserved as a wrapper that passes a NULL iocb.

Set BIO_COMPLETE_IN_TASK in submit_bh_wbc() when the folio has
dropbehind set, so that buffer_head writeback completions get deferred
to task context.

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Block device buffered reads and writes already pass through
filemap_read() and iomap_file_buffered_write() respectively, both of
which handle IOCB_DONTCACHE. Enable RWF_DONTCACHE for block device files
by setting FOP_DONTCACHE in def_blk_fops.

For CONFIG_BUFFER_HEAD=y paths, use block_write_begin_iocb() in
blkdev_write_begin() to thread the kiocb through so that buffer_head
writeback gets dropbehind support.

CONFIG_BUFFER_HEAD=n paths are handled by the previously added iomap
BIO_COMPLETE_IN_TASK support.

This support is useful for databases that operate on raw block devices,
among other userspace applications.

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
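
From userspace, the flag is requested per call through pwritev2(). The sketch below shows one way an application might do that, under a few assumptions: `write_dontcache()` and `demo_roundtrip()` are hypothetical helper names, the fallback to a plain pwrite() is a design choice of this example and not something the series prescribes, and the demo writes to a temporary regular file so that it runs without a block device or a patched kernel. The fallback `#define` covers C libraries whose headers predate the flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080 /* value from the Linux UAPI <linux/fs.h> */
#endif

/*
 * Write len bytes at offset off, asking the kernel to drop the written
 * pages from the page cache once writeback completes. If the kernel or
 * the file rejects the flag, retry as an ordinary cached write; a real
 * error unrelated to the flag will then be reported by pwrite() itself.
 */
static ssize_t write_dontcache(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);
	ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);
	if (ret < 0 &&
	    (errno == EOPNOTSUPP || errno == ENOSYS || errno == EINVAL))
		ret = pwrite(fd, buf, len, off);
	return ret;
}

/* Round-trip through a temporary regular file; returns 0 on success. */
static int demo_roundtrip(void)
{
	char path[] = "/tmp/dontcache-demo-XXXXXX";
	const char msg[] = "uncached write";
	char back[sizeof(msg)] = { 0 };
	int fd = mkstemp(path);
	int ok;

	if (fd < 0)
		return -1;
	ok = write_dontcache(fd, msg, sizeof(msg), 0) == (ssize_t)sizeof(msg) &&
	     pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
	     memcmp(msg, back, sizeof(msg)) == 0;
	close(fd);
	unlink(path);
	return ok ? 0 : -1;
}
```

With this series applied, the same `write_dontcache()` call on a file descriptor for a block device node would be expected to take the dropbehind path instead of falling back. Callers that need to know which path was taken could return the errno from the first attempt instead of hiding it.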

blktests-ci Bot commented May 18, 2026

Upstream branch: 70eda68
series: https://patchwork.kernel.org/project/linux-block/list/?series=1095033
version: 6

blktests-ci Bot force-pushed the series/1095033=>linus-master branch from 323483e to 0521c96 on May 18, 2026 07:28