    hpsa: rework controller command submission · 25163bd5
    Webb Scales authored
    Allow driver-initiated commands to have a timeout.  This change does not
    yet act on timeouts for such commands.
    
    We send a reset in order to get rid of a command we want to abort.
    If we make the reset return on the same reply queue as the command we want
    to abort, the completion of the aborted command cannot race with the
    completion of the reset command.
    
    Rename hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd(), since
    this function is the interface for issuing commands to the controller and
    not the "core" of that implementation.  Add a parameter to it which allows
    the caller to specify the reply queue to be used.  Modify existing callers
    to specify the default reply queue.
    
    Rename __hpsa_scsi_do_simple_cmd_core() to hpsa_scsi_do_simple_cmd_core(),
    since this routine is the "core" implementation of the "do simple command"
    function and there is no longer any other function with a similar name.
    Modify the existing callers of this routine (other than
    hpsa_scsi_do_simple_cmd()) to instead call hpsa_scsi_do_simple_cmd(), since
    it will now accept the reply_queue parameter, and it provides a controller
    lock-up check.  (Also, tweak two related message strings to make them
    distinct from each other.)
    
    Submitting a command to a locked up controller always results in a timeout,
    so check for controller lock-up before submitting.
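
A minimal sketch of the check-before-submit idea, assuming a simplified
controller structure. The names sketch_ctlr, sketch_submit and the status
values are hypothetical stand-ins, not the hpsa driver's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch only. */
enum sketch_status { SKETCH_OK = 0, SKETCH_LOCKED_UP = -1 };

/* Hypothetical, heavily simplified controller state. */
struct sketch_ctlr {
    bool locked_up;  /* set elsewhere by a lockup detector */
    int  submitted;  /* number of commands handed to the hardware */
};

/*
 * Refuse to submit when the controller is already known to be locked up,
 * so the caller fails immediately instead of waiting for a timeout.
 */
static enum sketch_status sketch_submit(struct sketch_ctlr *h)
{
    assert(h != NULL);
    if (h->locked_up)
        return SKETCH_LOCKED_UP;
    h->submitted++;  /* stands in for queuing the command to hardware */
    return SKETCH_OK;
}
```

The point of the early return is that nothing is queued once the lockup
flag is set, so no command is left waiting on hardware that will not answer.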
    
    This is to enable fixing a race between command completions and
    abort completions on different reply queues in a subsequent patch.
    We want to be able to specify which reply queue an abort completion
    should occur on so that it cannot race the completion of the command
    it is trying to abort.
    
    The following race was possible in theory:
    
      1. Abort command is sent to hardware.
      2. Command to be aborted simultaneously completes on another
         reply queue.
      3. Hardware receives the abort command, decides the command has already
         completed, and indicates this to the driver via yet another
         reply queue.
      4. Driver processes the abort completion, finds that the hardware does
         not know about the command, concludes that the command therefore
         cannot complete, and returns SUCCESS, indicating to the mid-layer
         that the scsi_cmnd may be re-used.
      5. Command from step 2 is processed and completed back to the SCSI
         mid-layer (after we already promised that would never happen).
    
    Fix by forcing aborts to complete on the same reply queue as the command
    they are aborting.
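
The fix can be illustrated roughly as follows. This is a sketch under
assumptions, not the driver's code: sketch_cmd, sketch_pick_queue,
sketch_prepare_abort and the SKETCH_DEFAULT_REPLY_QUEUE sentinel are
hypothetical names, and the "default means pick by CPU" policy is assumed
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "let the driver choose a queue". */
#define SKETCH_DEFAULT_REPLY_QUEUE (-1)

/* Hypothetical command: only the reply queue matters here. */
struct sketch_cmd {
    int reply_queue;
};

/* Resolve a requested queue to a concrete index in [0, nqueues). */
static int sketch_pick_queue(int requested, int cpu, int nqueues)
{
    assert(nqueues > 0 && cpu >= 0);
    if (requested == SKETCH_DEFAULT_REPLY_QUEUE)
        return cpu % nqueues;     /* assumed default policy */
    return requested % nqueues;   /* caller-specified queue */
}

/*
 * Force the abort to complete on the same reply queue as its victim.
 * Completions on a single queue are processed in order, so the abort
 * completion cannot overtake the victim's completion.
 */
static void sketch_prepare_abort(struct sketch_cmd *abort_cmd,
                                 const struct sketch_cmd *victim)
{
    assert(abort_cmd != NULL && victim != NULL);
    assert(victim->reply_queue >= 0);  /* victim already resolved */
    abort_cmd->reply_queue = victim->reply_queue;
}
```

With both completions on one queue, step 3 of the race above can no longer
be observed by the driver before step 2.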
    
    Piggybacking device rescanning functionality onto the lockup
    detection thread is not a good idea: if the controller locks up
    during device rescanning, the thread could get stuck and the
    lockup would go undetected.  Use separate work queues for device
    rescanning and lockup detection.
    
    Detect controller lockup in abort handler.
    
    After a lockup is detected, return DID_NO_CONNECT, which results in
    immediate termination of commands, rather than DID_ERROR, which results
    in retries.
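
As a rough illustration of the result-code choice, the sketch below builds
a SCSI result word with the host byte in bits 16-23. The SKETCH_ constants
are local copies (assumed to match the DID_NO_CONNECT and DID_ERROR values
in the Linux SCSI headers), and sketch_fail_result is a hypothetical helper,
not a driver function.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies for a self-contained sketch; assumed to mirror the
 * host byte codes defined by the Linux SCSI layer. */
#define SKETCH_DID_NO_CONNECT 0x01
#define SKETCH_DID_ERROR      0x07

/*
 * Pick the host byte for a failed command: a locked-up controller gets
 * the "no connect" code so the command is terminated at once, while
 * other failures keep the generic error code, which may be retried.
 */
static int sketch_fail_result(bool controller_locked_up)
{
    int host_byte = controller_locked_up ? SKETCH_DID_NO_CONNECT
                                         : SKETCH_DID_ERROR;
    return host_byte << 16;  /* host byte occupies bits 16-23 */
}
```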
    
    Modify detect_controller_lockup() to return the result, to remove the need for
    a separate check.
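
One way to picture a detector that returns its result is sketched below.
The heartbeat comparison is an assumption made for illustration, and
sketch_hb and sketch_detect_lockup are hypothetical names; the commit
itself only states that detect_controller_lockup() now returns the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical detector state. */
struct sketch_hb {
    uint32_t last_heartbeat;  /* counter value seen on the previous check */
    bool     locked_up;       /* sticky once a lockup has been seen */
};

/*
 * Returns true if the controller is considered locked up. Because the
 * verdict is returned directly, the caller does not need a second call
 * to read the flag back.
 */
static bool sketch_detect_lockup(struct sketch_hb *h, uint32_t heartbeat_now)
{
    assert(h != NULL);
    if (h->locked_up)
        return true;                      /* already known */
    if (heartbeat_now == h->last_heartbeat) {
        h->locked_up = true;              /* counter did not advance */
        return true;
    }
    h->last_heartbeat = heartbeat_now;    /* healthy: remember new value */
    return false;
}
```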
    
    Reviewed-by: Scott Teel <scott.teel@pmcs.com>
    Reviewed-by: Kevin Barnett <kevin.barnett@pmcs.com>
    Signed-off-by: Webb Scales <webbnh@hp.com>
    Signed-off-by: Don Brace <don.brace@pmcs.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: James Bottomley <JBottomley@Odin.com>