block,btrfs: fix frozen-superblock strand on device add/remove/replace#967
Open
blktests-ci[bot] wants to merge 6 commits into
Open
block,btrfs: fix frozen-superblock strand on device add/remove/replace#967blktests-ci[bot] wants to merge 6 commits into
blktests-ci[bot] wants to merge 6 commits into
Conversation
Author
|
Upstream branch: 062871f |
btrfs_dev_replace_start() opens the replacement target with btrfs_init_dev_replace_tgtdev(), which adds it to the device list and opens its block device. Every error path after that point reaches the 'leave' label to tear the target back down with btrfs_destroy_dev_replace_tgtdev() - except the mark_block_group_to_copy() failure, which returns directly. The target is then leaked: it stays on the device list with its block device held until the filesystem is unmounted. Goto leave like the other post-open error paths so the target is destroyed. Fixes: 78ce9fc ("btrfs: zoned: mark block groups to copy for device-replace") Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Add bdev_deny_freeze() and bdev_allow_freeze(), modeled on deny_write_access()/allow_write_access(). bd_fsfreeze_count becomes a signed counter: > 0 counts active freezes, < 0 counts deniers, and the two regimes are mutually exclusive. bdev_freeze() refuses with -EBUSY while a deny is held, and bdev_deny_freeze() refuses while the device is frozen. A filesystem that mutates a device's membership (a btrfs device add, remove or replace) denies freezing on the device for the duration, so a claim a freeze walk might act on is never added or torn down behind the freezer's back. The deny/allow helpers are a single atomic on bd_fsfreeze_count and take no lock, so they can be called while holding s_umount without inverting against bdev_freeze()'s bd_fsfreeze_mutex -> s_umount order. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
bdev_fput() yields the holder claim and then closes the file, which is a deferred operation. Split the yield half into bdev_yield_claim() so a caller can give up the holder while the file - and therefore the block device - is still open, act on the device, and only then bdev_fput(). A filesystem that made a device unfreezable for a membership change with bdev_deny_freeze() undoes the deny on release with bdev_yield_claim(bdev_file); bdev_allow_freeze(file_bdev(bdev_file)); bdev_fput(bdev_file); Re-allowing only after the holder is yielded avoids stranding the filesystem on a racing freeze, and doing it while the file is still open avoids touching the block device after bdev_fput(). bdev_fput() yields again, which is a no-op once the claim has already been given up. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
btrfs_rm_device() runs under mnt_want_write_file(), but the claim on the removed device is released by the ioctl after mnt_drop_write_file(), so a bdev_freeze() racing that window could freeze the filesystem through the device just as its claim is torn down, leaving nothing for bdev_thaw() to rebalance. The window cannot be closed by reordering the teardown. btrfs_rm_device() hands the final bdev_fput() back to the ioctl, run only after mnt_drop_write_file(), because bdev_release() takes the disk ->open_mutex and its dependency chain, which must not nest under the superblock's freeze/write protection -- freeze_super() drops s_umount before draining writers precisely to keep sb_start_write ordered above s_umount. Holding mnt_want_write across bdev_fput() would reintroduce that inversion, so the holder teardown is forced outside the write-protected section. A freeze landing in the resulting gap resolves the still-live holder, rides in, and strands when the claim is released; no ordering of the close against the drop removes the gap. The device itself therefore has to refuse freezing for the whole removal. Deny freezing the device for the duration of the removal: bdev_deny_freeze() at the start of btrfs_rm_device() (it cannot be frozen yet, the ioctl holds the write count), and release it through btrfs_release_device_allow_freeze() in the ioctls on success, or bdev_allow_freeze() on the error paths that keep the device a member. A device frozen before the removal begins is refused with -EBUSY. btrfs_release_device_allow_freeze() yields the holder, re-allows freezing, then closes the device, so the re-allow neither strands the filesystem on a racing freeze nor touches the block device after the final fput. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Author
|
Upstream branch: 062871f |
btrfs_init_new_device() opens and claims the new device on a live superblock without holding the write count, so a bdev_freeze() racing the window between the claim being published and the device becoming a member could freeze the filesystem through a claim the add may still abort and tear down. Add btrfs_open_device_deny_freeze(): it opens the device once non-exclusively to take the freeze deny, then claims it by the same dev_t, so the holder is only ever published while the device is already unfreezable. Keep it denied until the add is durable: bdev_allow_freeze() on each success return (the device is now a committed member), btrfs_release_device_allow_freeze() on the error unwind. The deny spans the whole add, including the seeding tail whose late failures still release the device. A device already frozen when the add starts is refused with -EBUSY. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
A device replace opens a target and, on success, frees the source on a live filesystem from btrfs_dev_replace_finishing() - which cannot fail and also runs from a kthread on mount resume. A bdev_freeze() racing the source free or the target swap-in would freeze the filesystem through a claim that is being torn down or replaced, leaving nothing for bdev_thaw() to rebalance. Make both devices unfreezable for the whole replace, with the invariant that a STARTED replace holds one deny on each device and any other state holds none. The target is denied at open (btrfs_open_device_deny_freeze(), undone on btrfs_init_dev_replace_tgtdev()'s error unwind); the source is denied at the start of btrfs_dev_replace_start(), before mark_block_group_to_copy() so every 'leave' unwind sees both denied. The deny tracks the STARTED state and is dropped whenever the replace leaves it: btrfs_dev_replace_finishing() re-allows the target it makes a member and frees the source through btrfs_close_bdev(allow_freeze=true), and its scrub-error path re-allows both as it cancels. Its early failures (before the device swap) keep the replace STARTED and resumable, so both stay denied. Suspending for unmount re-allows both, so they are reopened freezable at the next mount where btrfs_resume_dev_replace_async() re-denies them (staying suspended if a device is frozen right then); a replace cancelled from the suspended state therefore destroys the target without allowing. btrfs_close_bdev() and btrfs_destroy_dev_replace_tgtdev() take an allow_freeze argument to carry this distinction; the unmount path (btrfs_close_one_device()) passes false. On resume, a failed kthread_run() re-allows both devices and goes through the suspend path, resetting the replace to SUSPENDED and finishing the exclusive operation instead of returning straight away. The (re)mount still aborts on that error; routing it through suspend keeps the deny balanced against the unmount teardown and additionally drops BTRFS_EXCLOP_DEV_REPLACE, closing a pre-existing leak that was harmless on the failed mount that frees the fs but would have wedged future exclusive operations after a failed remount-rw. Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
f7d71cb to
fdf18d2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull request for series with
subject: block,btrfs: fix frozen-superblock strand on device add/remove/replace
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1111715