    scsi: lpfc: Fix list corruption on the completion queue. · 8931c73b
    James Smart authored
    Enabling list_debug showed that the driver's txcmplq was suffering list
    corruption: the iocb free list gets cross-linked with the pring's
    txcmplq. Most systems will run for a while after the corruption, but
    will eventually crash when a SCSI EH reset occurs and the driver
    attempts to flush the txcmplq. The flush gets stuck in an endless loop.
    
    The problem is that the abort handler does not hold the SLI4 ring lock
    while validating the IO, so the IO can complete while the driver is
    still preparing the abort. When the erroneously generated abort
    completes, it holds pointers to the original IO, which has already
    completed, and manipulating that IO a second time corrupts the list.
    
    Correct this by taking the ring lock early in the abort handler so the
    erroneous abort is not sent if the IO has completed or is completing.
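
The locking rule described above can be illustrated with a minimal userspace sketch. This is not the lpfc code: `struct iocb`, `struct ring`, the `on_txcmplq` flag, and the `ring_*` helpers are hypothetical stand-ins, and a pthread mutex plays the role of the SLI4 ring lock. The point it shows is only that the abort path checks whether the IO is still outstanding while holding the same lock the completion path uses to unlink it, so an abort is never issued for an IO that has already completed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures; not the real lpfc types. */
struct iocb {
    struct iocb *prev, *next;   /* links on the completion queue */
    bool on_txcmplq;            /* true while the IO is outstanding */
};

struct ring {
    pthread_mutex_t lock;       /* plays the role of the SLI4 ring lock */
    struct iocb txcmplq;        /* sentinel head of the completion queue */
    int aborts_issued;          /* counts aborts that were actually sent */
};

static void ring_init(struct ring *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->txcmplq.prev = r->txcmplq.next = &r->txcmplq;
    r->txcmplq.on_txcmplq = false;
    r->aborts_issued = 0;
}

/* Queue an IO on the completion queue. The iocb must start zero-initialized. */
static void ring_submit(struct ring *r, struct iocb *io)
{
    pthread_mutex_lock(&r->lock);
    assert(!io->on_txcmplq);
    io->prev = r->txcmplq.prev;
    io->next = &r->txcmplq;
    r->txcmplq.prev->next = io;
    r->txcmplq.prev = io;
    io->on_txcmplq = true;
    pthread_mutex_unlock(&r->lock);
}

/* Completion path: unlink the IO at most once, under the ring lock. */
static void ring_complete(struct ring *r, struct iocb *io)
{
    pthread_mutex_lock(&r->lock);
    if (io->on_txcmplq) {
        io->prev->next = io->next;
        io->next->prev = io->prev;
        io->prev = io->next = io;
        io->on_txcmplq = false;
    }
    pthread_mutex_unlock(&r->lock);
}

/*
 * Abort path: take the lock first, then validate. If the IO has already
 * completed, no abort is issued. Returns true if an abort was sent.
 */
static bool ring_abort(struct ring *r, struct iocb *io)
{
    bool sent = false;

    pthread_mutex_lock(&r->lock);
    if (io->on_txcmplq) {
        r->aborts_issued++;     /* stands in for building and sending the abort */
        sent = true;
    }
    pthread_mutex_unlock(&r->lock);
    return sent;
}

/* Count the entries currently on the completion queue. */
static size_t ring_count(struct ring *r)
{
    size_t n = 0;

    pthread_mutex_lock(&r->lock);
    for (struct iocb *p = r->txcmplq.next; p != &r->txcmplq; p = p->next)
        n++;
    pthread_mutex_unlock(&r->lock);
    return n;
}
```

In this sketch, validating before taking the lock would reopen the window the commit describes: the IO could be unlinked between the check and the abort being sent.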
    
    Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
    Signed-off-by: James Smart <james.smart@broadcom.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_scsi.c 180 KB