    [SCSI] Fix 'Device not ready' issue on mpt2sas · 14216561
    James Bottomley authored
    This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.
    
    SAT-2 says (section 8.12.2)
    
            if the device is in the stopped state as the result of
            processing a START STOP UNIT command (see 9.11), then the SATL
            shall terminate the TEST UNIT READY command with CHECK CONDITION
            status with the sense key set to NOT READY and the additional
            sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
            REQUIRED;
    
    The mpt2sas internal SATL seems to implement this.  The result is very
    confusing standby behaviour (using hdparm -y).  If you suspend a drive and
    then send another command, it usually wakes up.  However, if the next command
    is a TEST UNIT READY, the SATL sees that the drive is suspended and proceeds
    to follow the SATL rules for this, returning NOT READY to all subsequent
    commands.  This means that the ordering of TEST UNIT READY is crucial: if you
    send a TUR and then a command, you get NOT READY back for both.  If you send
    a command and then a TUR, you get GOOD status because the preceding command
    woke the drive.
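
    The ordering dependence can be illustrated with a small stand-alone model.
    This is only a sketch of the behaviour described above, not mpt2sas
    firmware or driver code: struct satl_model, its two flags and
    satl_submit() are hypothetical names invented for the illustration.  Only
    the opcode and status values are taken from the SCSI standards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opcode and status values as defined by the SCSI standards. */
enum { OP_TEST_UNIT_READY = 0x00, OP_READ_10 = 0x28 };
enum { STATUS_GOOD = 0x00, STATUS_CHECK_CONDITION = 0x02 };

/* Hypothetical model of how the SATL tracks one SATA drive. */
struct satl_model {
    bool drive_standby;  /* drive was put into standby, e.g. by hdparm -y */
    bool satl_stopped;   /* SATL has noticed the standby and now reports
                          * NOT READY for every command */
};

/* Return the SCSI status the modelled SATL gives for one command. */
static int satl_submit(struct satl_model *s, unsigned char opcode)
{
    assert(s != NULL);

    /* Once the SATL has seen the stopped state, everything fails. */
    if (s->satl_stopped)
        return STATUS_CHECK_CONDITION;

    if (opcode == OP_TEST_UNIT_READY) {
        if (s->drive_standby) {
            /* A TUR does not wake the drive; the SATL records the
             * stopped state and answers NOT READY from here on. */
            s->satl_stopped = true;
            return STATUS_CHECK_CONDITION;
        }
        return STATUS_GOOD;
    }

    /* Any other command reaches the drive and wakes it up. */
    s->drive_standby = false;
    return STATUS_GOOD;
}
```

    Starting from a drive in standby, a TUR followed by a READ fails both
    commands in this model, while a READ followed by a TUR succeeds for both.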
    
    This bit us badly because
    
    commit 85ef06d1
    Author: Tejun Heo <tj@kernel.org>
    Date:   Fri Jul 1 16:17:47 2011 +0200
    
        block: flush MEDIA_CHANGE from drivers on close(2)
    
    changed our ordering of TEST UNIT READY commands, meaning that SATA drives
    connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
    SATL sees the suspend *before* the drives get awoken by the next ATA command),
    resulting in lots of failed commands.
    
    The standard is completely nuts forcing this inconsistent behaviour, but we
    have to work around it.
    
    The fix for this is twofold:
    
       1. Set the allow_restart flag so we wake the drive when we see it has been
          suspended
    
       2. Return all TEST UNIT READY status directly to the mid layer without any
          further error handling.  This prevents us from triggering error
          handling which may offline the device just because of a media check TUR.
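
    The two points can be sketched as a single decision function.  This is a
    simplified stand-alone model and not the code in scsi_error.c: struct
    cmd_model, struct sense, enum disposition and check_sense() are
    hypothetical names.  The sense values are the standard ones for NOT READY
    (sense key 02h) with LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED
    (ASC 04h, ASCQ 02h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { OP_TEST_UNIT_READY = 0x00 };   /* SCSI opcode */
enum { SENSE_KEY_NOT_READY = 0x02 };  /* SCSI sense key */

/* Hypothetical outcome of examining a CHECK CONDITION. */
enum disposition { RETURN_TO_MIDLAYER, START_ERROR_HANDLING };

/* Decoded sense data: sense key, additional sense code and qualifier. */
struct sense { unsigned char key, asc, ascq; };

/* Hypothetical stand-in for a command and its device's settings. */
struct cmd_model {
    unsigned char opcode;
    bool allow_restart;  /* point 1: device may be woken with START UNIT */
};

/* True for LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED. */
static bool needs_start_unit(const struct sense *s)
{
    return s->key == SENSE_KEY_NOT_READY && s->asc == 0x04 && s->ascq == 0x02;
}

static enum disposition check_sense(const struct cmd_model *c,
                                    const struct sense *s)
{
    assert(c != NULL && s != NULL);

    /* Point 2: a TUR result goes straight back to the mid layer, so a
     * media check alone can never trigger error handling. */
    if (c->opcode == OP_TEST_UNIT_READY)
        return RETURN_TO_MIDLAYER;

    /* Point 1: for other commands, a stopped drive is handed to error
     * handling, which can restart it when allow_restart is set. */
    if (c->allow_restart && needs_start_unit(s))
        return START_ERROR_HANDLING;

    return RETURN_TO_MIDLAYER;
}
```

    In this model a TUR that hits a stopped drive is simply reported, while
    the next ordinary command on a device with allow_restart set is the one
    that leads to the drive being woken.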

    Reported-by: Matthias Prager <linux@matthiasprager.de>
    Cc: stable@vger.kernel.org
    Signed-off-by: James Bottomley <JBottomley@Parallels.com>