MCTP recoverability chokes the main thread  #14

@drakedog2008

When the MCTP layer of the device is working but the NVMe-MI layer is not, nvmed gets stuck in the MCTP recovery loop and consumes most of the CPU time on the main thread. D-Bus clients then time out when they:

query the unhealthy status of the bad device.
query the health status of the other, good devices.

root@mvbce5-nfd11:~# time busctl tree xyz.openbmc_project.NVMe
└─ /xyz
  └─ /xyz/openbmc_project
    ├─ /xyz/openbmc_project/inventory
    │ └─ /xyz/openbmc_project/inventory/system
    │   └─ /xyz/openbmc_project/inventory/system/board
    │     ├─ /xyz/openbmc_project/inventory/system/board/Chassis_1
    │     │ ├─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_1
    │     │ ├─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2
    │     │ │ ├─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2/controllers
    │     │ │ │ ├─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2/controllers/0
    │     │ │ │ ├─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2/controllers/1
    │     │ │ │ └─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2/controllers/2
    │     │ │ └─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2/volumes
    │     │ │   └─ /xyz/openbmc_project/inventory/system/board/Chassis_1/Chassis_1_NVMe_2/volumes/1
 ...
    └─ /xyz/openbmc_project/sensors
      └─ /xyz/openbmc_project/sensors/temperature
        ├─ /xyz/openbmc_project/sensors/temperature/NVMe_2_NVMe_1
...
real	0m19.398s
user	0m0.080s
sys	0m0.021s
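
To illustrate why a recovery loop on the main thread can delay unrelated D-Bus calls this much, here is a toy model of a single-threaded event loop. It is not nvmed code. The function `simulate` and all of its parameters and numbers are hypothetical, and it assumes that each recovery attempt blocks the loop for a fixed time and that a client operation such as `busctl tree` issues several requests in sequence.

```python
def simulate(num_requests, attempt_cost_ms, request_cost_ms, retry_delay_ms):
    """Return the simulated time (ms) at which the last client request finishes.

    Toy model, assumptions only: one thread runs everything; a recovery
    attempt blocks for attempt_cost_ms and becomes due again retry_delay_ms
    after it finishes; the loop serves one client request per iteration.
    """
    clock = 0
    next_recovery = 0  # the first recovery attempt is due immediately
    pending = num_requests
    while pending > 0:
        if next_recovery <= clock:
            # The recovery attempt blocks the only thread.
            clock += attempt_cost_ms
            next_recovery = clock + retry_delay_ms
        # Serve one client request.
        clock += request_cost_ms
        pending -= 1
    return clock


# Recovery re-armed immediately: every request waits behind one attempt.
tight = simulate(num_requests=20, attempt_cost_ms=1000,
                 request_cost_ms=10, retry_delay_ms=0)

# Recovery re-armed after a delay: only the first request is held up.
spaced = simulate(num_requests=20, attempt_cost_ms=1000,
                  request_cost_ms=10, retry_delay_ms=5000)

print(tight, spaced)
```

In this model the client's total latency grows with the number of requests times the cost of one recovery attempt whenever the retry is re-armed without delay. Whether nvmed's actual recovery path behaves this way would need to be confirmed by profiling the main thread.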