nvme: check MUD support before firmware commit #3403
Open
guzebing1612-dev wants to merge 1 commit into
Firmware Commit returns the Multiple Update Detected value in the command
completion. nvme-cli currently identifies the controller after the
command completes to decide whether to print that value.
That post-command Identify is fragile for immediate activation. A possible
Linux race looks like:
nvme-cli thread                       nvme driver AEN work
---------------                       --------------------
libnvme_exec_admin_passthru()
  -> Firmware Commit succeeds
                                      nvme_handle_aen_notice()
                                        -> FW_ACT_STARTING
                                        -> nvme_change_ctrl_state(RESETTING)
                                        -> nvme_fw_act_work()
                                           -> nvme_quiesce_io_queues()
                                           -> wait for activation
fw_commit_print_mud()
  -> fw_commit_support_mud()
     -> nvme_identify_ctrl()
        -> admin ioctl passthru
           -> nvme_user_cmd*()
              -> blk_mq_alloc_request()
              -> __nvme_check_ready()
                 rejects user admin command while resetting
nvme-cli then prints "identify-ctrl: ..." after the successful
fw-commit output. The extra error makes it unclear whether Firmware
Commit itself failed, even though the command completion was already
successful.
The post-command Identify can also observe the wrong capability. MUD is
a bit in the Firmware Commit completion, so it should be interpreted
using the controller capability that applied when that command was
processed. If the old firmware does not report support for firmware
image overlap but the new firmware does, a post-activation Identify can
make nvme-cli interpret the old command completion using the new
firmware capability.
Read and cache SMUD before submitting Firmware Commit, and use the
cached value to decide whether to print the completion MUD value. This
ties MUD reporting to the firmware revision that accepted the command
and avoids a post-success Identify during firmware activation.
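
The ordering described above can be sketched as follows. This is a minimal illustration, not the code in this patch: `fw_commit_with_cached_smud()`, the two callback types, and the `fake_*` test doubles are hypothetical names, and the sketch assumes that SMUD is bit 5 of the Identify Controller FRMW byte. Treating a failed pre-commit Identify as "MUD not supported" is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SMUD is bit 5 of the Identify Controller FRMW byte. */
#define FRMW_SMUD_BIT 5

/* Hypothetical stand-ins for the real Identify and Firmware Commit calls. */
typedef int (*identify_frmw_fn)(void *ctx, uint8_t *frmw);
typedef int (*fw_commit_fn)(void *ctx, uint32_t *result);

/*
 * Read SMUD first, then submit Firmware Commit, and interpret the
 * completion with the cached capability. No Identify is issued after
 * the commit. Returns the Firmware Commit status.
 */
static int fw_commit_with_cached_smud(void *ctx, identify_frmw_fn identify,
                                      fw_commit_fn commit,
                                      bool *mud_valid, uint32_t *mud)
{
    uint8_t frmw = 0;
    uint32_t result = 0;
    bool smud;
    int err;

    /* Pre-commit Identify: the controller is not yet resetting. */
    err = identify(ctx, &frmw);
    smud = (err == 0) && (((frmw >> FRMW_SMUD_BIT) & 0x1) != 0);

    err = commit(ctx, &result);

    /* MUD is meaningful only if the commit succeeded and SMUD was set. */
    *mud_valid = (err == 0) && smud;
    *mud = *mud_valid ? result : 0;
    return err;
}

/* Test doubles: a fake controller that rejects Identify after the commit. */
struct fake_ctrl {
    uint8_t frmw;    /* FRMW reported by the running (old) firmware */
    uint32_t mud;    /* value returned in the commit completion */
    bool resetting;  /* set once activation has started */
};

static int fake_identify(void *ctx, uint8_t *frmw)
{
    struct fake_ctrl *c = ctx;

    if (c->resetting)
        return -1;   /* models the rejection while resetting */
    *frmw = c->frmw;
    return 0;
}

static int fake_commit(void *ctx, uint32_t *result)
{
    struct fake_ctrl *c = ctx;

    c->resetting = true;   /* immediate activation begins */
    *result = c->mud;
    return 0;
}
```

With the fake controller, an Identify issued after `fake_commit()` fails, while the cached path still reports the MUD value because the capability was read beforehand.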
Fixes: b447292 ("nvme: Check fw-commit command support MUD")
Reviewed-by: Tokunori Ikegami <ikegami.t@gmail.com>
Signed-off-by: Guzebing <guzebing@bytedance.com>
This PR follows up on #3396 and the mailing-list patch:
https://lore.kernel.org/linux-nvme/20260519134542.1841435-1-guzebing1612@gmail.com/
The current code checks `FRMW.SMUD` after Firmware Commit succeeds, which may issue an Identify Controller command after firmware activation has started.

This patch caches `FRMW.SMUD` before submitting Firmware Commit and uses the cached value when printing the completion MUD value. This avoids issuing Identify Controller after activation has started, and makes MUD handling use the pre-activation controller capability.

Closes #3396
Tested: