    net/mlx4_core: Activate reset flow upon fatal command cases · f5aef5aa
    Yishai Hadas authored
    We activate the reset flow upon fatal command errors, i.e. when the device
    enters an erroneous state and must be reset.
    
    The following cases are assumed to be fatal: a FW command timed out, FW
    returned an error on a closing command, or PCI is offline when posting a
    command or while one is pending.
    
    In those cases we place the device into an error state: chip is reset, pending
    commands are awakened and completed immediately. Subsequent commands will
    return immediately.
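
    The error-state handling described above can be sketched in plain userspace
    C. This is a minimal illustration and not the driver's code: struct
    sketch_dev, struct pending_cmd, MAX_PENDING and both functions are
    hypothetical names, the chip reset is reduced to a counter, and every
    pending command is completed with -EIO for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4  /* hypothetical size of the pending-command table */

/* Hypothetical stand-in for one in-flight command context. */
struct pending_cmd {
    bool in_use;   /* slot holds a command waiting on FW */
    bool done;     /* set once the waiter has been completed */
    int  result;   /* status handed back to the waiter */
};

/* Hypothetical, simplified device; not the real struct mlx4_dev. */
struct sketch_dev {
    bool in_error;                        /* device is in the error state */
    int  resets;                          /* number of chip resets issued */
    struct pending_cmd pending[MAX_PENDING];
};

/* Place the device into the error state: reset once, complete all waiters. */
static void sketch_enter_error_state(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (dev->in_error)
        return;                 /* assumption: a second entry is a no-op */

    dev->in_error = true;
    dev->resets++;              /* stands in for the chip reset */

    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_cmd *c = &dev->pending[i];

        if (c->in_use && !c->done) {
            c->result = -EIO;   /* simplification: one status for all waiters */
            c->done = true;     /* "awaken" the waiter */
        }
    }
}

/* Post a command; once the device is in error state, return immediately. */
static int sketch_post_cmd(struct sketch_dev *dev, bool fatal_condition)
{
    assert(dev != NULL);
    if (dev->in_error)
        return -EIO;

    if (fatal_condition) {      /* e.g. FW timeout or PCI offline */
        sketch_enter_error_state(dev);
        return -EIO;
    }
    return 0;
}
```

    In this sketch the first fatal condition triggers exactly one reset; later
    callers only observe the error flag and return without touching hardware.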
    
    The return code in the above cases will depend on the command. Commands which
    free and close resources will return success (because the chip was reset, so
    callers may safely free their kernel resources). Other commands will return -EIO.
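
    The return-code rule above amounts to a small mapping from command type to
    status. The following is a sketch under stated assumptions: the enum and
    its four command kinds are hypothetical placeholders, whereas the driver
    itself classifies actual FW opcodes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical command classes; the real driver works with FW opcodes. */
enum sketch_cmd_kind {
    SKETCH_CMD_CREATE,  /* allocates or opens a resource */
    SKETCH_CMD_QUERY,   /* reads device state */
    SKETCH_CMD_CLOSE,   /* closes a resource */
    SKETCH_CMD_FREE,    /* frees a resource */
};

/*
 * Status returned for a command issued (or pending) while the device is in
 * the error state. Teardown commands report success because the chip was
 * reset, so the caller may free its kernel resources; anything else fails.
 */
static int sketch_error_state_ret(enum sketch_cmd_kind kind)
{
    switch (kind) {
    case SKETCH_CMD_CLOSE:
    case SKETCH_CMD_FREE:
        return 0;
    case SKETCH_CMD_CREATE:
    case SKETCH_CMD_QUERY:
        return -EIO;
    }
    assert(0 && "unknown command kind");
    return -EIO;
}
```

    Returning success for teardown commands lets existing cleanup paths run to
    completion after a reset instead of leaking resources on an error return.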
    
    Since the device's state is marked as error, the catas poller will detect
    this and restart the device's software stack (as is done when a FW internal
    error is directly detected). The device state is protected by a persistent
    mutex that lives on its mlx4_dev; as such, the hcr_mutex is no longer
    needed and is removed.
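
    The interaction between the error mark and the poller can be illustrated
    with a userspace sketch. It assumes a pthread mutex in place of the kernel
    mutex, and struct sketch_persist, sketch_mark_error() and
    sketch_poll_once() are hypothetical names; the restart is reduced to a
    counter, and clearing the flag inside the poll step is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical persistent per-device state that outlives a restart. */
struct sketch_persist {
    pthread_mutex_t state_mutex;  /* protects the fields below */
    bool internal_error;          /* set when a fatal command case occurs */
    int  restarts;                /* number of software-stack restarts */
};

/* Command path: mark the device as being in the error state. */
static void sketch_mark_error(struct sketch_persist *p)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->state_mutex);
    p->internal_error = true;
    pthread_mutex_unlock(&p->state_mutex);
}

/*
 * One iteration of a poller: if the error mark is set, restart the stack
 * (here: bump a counter) and clear the mark. Returns true if it restarted.
 */
static bool sketch_poll_once(struct sketch_persist *p)
{
    bool restarted = false;

    assert(p != NULL);
    pthread_mutex_lock(&p->state_mutex);
    if (p->internal_error) {
        p->restarts++;
        p->internal_error = false;
        restarted = true;
    }
    pthread_mutex_unlock(&p->state_mutex);
    return restarted;
}
```

    Because both the command path and the poller take the same lock, a single
    error mark leads to a single restart in this sketch.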
    Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
mcg.c 41.4 KB