    RDMA/hns: Fix an cmd queue issue when resetting · 3ec5f54f
    Yangyang Li authored
    If an IMP reset caused by hardware errors and an hns RoCE driver reset
    occur at the same time, there is a possibility that the IMP will stop
    processing commands, leaving the hardware unusable. The logs are as
    follows:
    
     hns3 0000:fd:00.1: cleaned 0, need to clean 1
     hns3 0000:fd:00.1: firmware version query failed -11
     hns3 0000:fd:00.1: Cmd queue init failed
     hns3 0000:fd:00.1: Upgrade reset level
     hns3 0000:fd:00.1: global reset interrupt
    
    The hns NIC driver divides the reset process into 3 statuses:
    initialization, hardware resetting and software resetting. The RoCE driver
    gets the reset status through interfaces provided by the NIC driver, and
    commands will not be sent to the IMP if the driver is in any of the above
    statuses. The main reason for this issue is that there is a time gap
    between status 1 and 2: if the RoCE driver sends commands to the IMP
    during this gap, the IMP will stop working because it is not ready.
    
    To eliminate the time gap, the hns NIC driver has added a new interface in
    commit a4de0228 ("net: hns3: provide .get_cmdq_stat interface for the
    client"), so the RoCE driver can ensure that no commands will be sent during
    resetting.
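
    The check described above can be sketched in user-space C as follows. This
    is a minimal illustration, not the code in hns_roce_hw_v2.c: struct
    nic_handle, struct nic_ops, in_reset_status(), mock_get_cmdq_stat() and
    roce_cmd_send() are hypothetical names, and only .get_cmdq_stat comes from
    the commit message. It also assumes that .get_cmdq_stat returns true while
    the command queue is disabled and that this stays true across the gap
    between status 1 and 2.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the handle the NIC driver gives its client. */
struct nic_handle {
	bool initializing;  /* status 1: reset initialization */
	bool hw_resetting;  /* status 2: hardware resetting */
	bool sw_resetting;  /* status 3: software resetting */
	bool cmdq_disabled; /* assumption: set for the whole reset, gap included */
};

/* Hypothetical ops table; only get_cmdq_stat is named in the commit message. */
struct nic_ops {
	bool (*get_cmdq_stat)(const struct nic_handle *h);
};

/* Mock callback. Assumption: true means "command queue is disabled". */
bool mock_get_cmdq_stat(const struct nic_handle *h)
{
	return h->cmdq_disabled;
}

/* Status-only check: it cannot see the gap between status 1 and 2. */
bool in_reset_status(const struct nic_handle *h)
{
	return h->initializing || h->hw_resetting || h->sw_resetting;
}

/* Returns 0 if a command may be posted, -EBUSY if it must be held back. */
int roce_cmd_send(const struct nic_ops *ops, const struct nic_handle *h)
{
	if (in_reset_status(h))
		return -EBUSY;

	/* Extra check: refuse to send while the command queue is disabled. */
	if (ops->get_cmdq_stat && ops->get_cmdq_stat(h))
		return -EBUSY;

	return 0; /* a real driver would post the command to the IMP here */
}
```

    With a handle that has no status flag set but cmdq_disabled == true (the
    gap), in_reset_status() alone would let the command through, while
    roce_cmd_send() returns -EBUSY as long as the callback is provided.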
    
    Link: https://lore.kernel.org/r/1592314778-52822-1-git-send-email-liweihang@huawei.com
    Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
    Signed-off-by: Weihang Li <liweihang@huawei.com>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
hns_roce_hw_v2.c 189 KB