    [SCSI] aacraid: Improved error handling · 03d44337
    Mark Haverkamp authored
    Received from Mark Salyzyn,
    
    This set of fixes improves the error-handling stability of the driver. A common
    manifestation of the problems is a NULL pointer dereference in the interrupt
    handler when referencing portions of the scsi command context, or in the
    scsi_done handling when an offlined device is referenced.
    
    The aacraid driver currently does not get notification of orphaned command
    completions due to devices going offline. The driver also fails to handle
    commands that are finished by the error handler; these can complete again
    later at the hands of the adapter, resulting in completion of an invalid scsi
    command context. Test Unit Ready calls are aborted on the assumption that the
    abort was successful when it was not, and thus when the interrupt from the
    adapter occurs, it references an invalid command context. We add a TIMED_OUT
    flag to inform the aacraid FIB context that the interrupt service should merely
    release the driver resources and not complete the command up. We take advantage
    of this in the abort handler as well for select abortable commands. We also
    detect and react if a command that cannot be aborted is still outstanding to
    the controller when it is reissued by the retry mechanism.
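
The TIMED_OUT mechanism described above can be sketched roughly as follows. This is a minimal user-space illustration, not the driver's code: every name here (`SKETCH_FIB_FLAG_TIMED_OUT`, `struct sketch_fib`, `struct sketch_cmd`, and the `sketch_*` functions) is hypothetical, and locking and the real adapter interface are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; stands in for the TIMED_OUT flag from the text. */
#define SKETCH_FIB_FLAG_TIMED_OUT 0x1u

/* Stand-in for a scsi command; counts how often it was completed upward. */
struct sketch_cmd {
    int done_calls;
};

/* Stand-in for a FIB context owned by the driver. */
struct sketch_fib {
    unsigned int flags;
    bool in_use;
    struct sketch_cmd *cmd; /* may be stale once the error handler finished it */
};

/* Return the context to the driver; releasing twice is a bug. */
static void sketch_fib_release(struct sketch_fib *fib)
{
    assert(fib->in_use);
    fib->in_use = false;
    fib->cmd = NULL;
    fib->flags = 0;
}

/* Error-handler side: the command was finished elsewhere, so mark the
 * context instead of freeing it while the adapter may still answer. */
static void sketch_mark_timed_out(struct sketch_fib *fib)
{
    fib->flags |= SKETCH_FIB_FLAG_TIMED_OUT;
}

/* Interrupt side: a flagged context is only released, and its command
 * pointer is never touched; an unflagged one is completed upward first. */
static void sketch_isr_complete(struct sketch_fib *fib)
{
    if (fib->flags & SKETCH_FIB_FLAG_TIMED_OUT) {
        sketch_fib_release(fib);
        return;
    }
    fib->cmd->done_calls++;
    sketch_fib_release(fib);
}
```

In this sketch the flagged path returns before reading `fib->cmd`, which is the property the commit message is after: a late adapter interrupt frees driver resources without referencing a command context that has already been finished.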
    
    Signed-off-by: Mark Haverkamp <markh@linux-foundation.org>
    Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
commsup.c 43.5 KB