    ice: reset first in crash dump kernels · 0288c3e7
    Jesse Brandeburg authored
    When the system boots into the crash dump kernel after a panic, the ice
    networking device may still have pending transactions that can cause errors
    or machine checks when the device is re-enabled. This can prevent the crash
    dump kernel from loading the driver or collecting the crash data.
    
    To avoid this issue, perform a function level reset (FLR) on the ice device
    via PCIe config space before enabling it on the crash kernel. This will
    clear any outstanding transactions and stop all queues and interrupts.
    Restore the config space after the FLR; testing showed that the driver
    fails to load otherwise.
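
The ordering described above (save config space, FLR, restore config space, then enable the device) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver's code: `struct mock_pci_dev`, the `mock_*` helpers, `sketch_probe()` and the `in_kdump_kernel` flag are all hypothetical stand-ins invented here, and the four-word `config` array is a toy substitute for real PCIe config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace stand-in for a PCI function (not struct pci_dev). */
struct mock_pci_dev {
    unsigned int config[4]; /* toy model of the live config space */
    unsigned int saved[4];  /* copy taken before the reset */
    bool enabled;           /* set once the device has been enabled */
    char log[64];           /* records the order of operations */
};

/* Append a step name to the log, never overflowing the buffer. */
static void note(struct mock_pci_dev *d, const char *step)
{
    strncat(d->log, step, sizeof(d->log) - strlen(d->log) - 1);
}

/* Keep a copy of config space so it can be put back after the reset. */
static void mock_save_state(struct mock_pci_dev *d)
{
    memcpy(d->saved, d->config, sizeof(d->saved));
    note(d, "save;");
}

/* Model of an FLR: the reset wipes the function's config space. */
static int mock_flr(struct mock_pci_dev *d)
{
    memset(d->config, 0, sizeof(d->config));
    note(d, "flr;");
    return 0; /* the mock always succeeds; a real reset can fail */
}

/* Write the saved copy back into config space. */
static void mock_restore_state(struct mock_pci_dev *d)
{
    memcpy(d->config, d->saved, sizeof(d->config));
    note(d, "restore;");
}

/* Stand-in for the enable step that crashed in the original report. */
static int mock_enable_device(struct mock_pci_dev *d)
{
    d->enabled = true;
    note(d, "enable;");
    return 0;
}

/*
 * Probe-time ordering: only in the crash dump kernel, reset the function
 * first and restore its config space, then enable it as usual.
 */
int sketch_probe(struct mock_pci_dev *d, bool in_kdump_kernel)
{
    int err;

    assert(d != NULL);

    if (in_kdump_kernel) {
        mock_save_state(d);
        err = mock_flr(d);
        if (err)
            return err;
        mock_restore_state(d);
    }

    return mock_enable_device(d);
}
```

Calling `sketch_probe()` with `in_kdump_kernel` set to true leaves the steps in `log` in the order save, FLR, restore, enable, with `config` unchanged from its initial contents. With the flag false, only the enable step runs, so a normal boot is unaffected.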
    
    The following sequence causes the original issue:
    - Load the ice driver with modprobe ice
    - Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
    - Trigger a crash with echo c > /proc/sysrq-trigger
    - Load the ice driver again (or let it load automatically) with modprobe ice
    - The system crashes again during pcim_enable_device()
    
    Fixes: 837f08fd ("ice: Add basic driver framework for Intel(R) E800 Series")
    Reported-by: Vishal Agrawal <vagrawal@redhat.com>
    Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
    Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
    Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
    Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
    Link: https://lore.kernel.org/r/20231011233334.336092-3-jacob.e.keller@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ice_main.c 246 KB