• Niklas Cassel's avatar
    ata: libata-core: do not issue non-internal commands once EH is pending · e20e81a2
    Niklas Cassel authored
    While the ATA specification states that a device should return command
    aborted for all commands queued after the device has entered error state,
    since ATA only keeps the sense data for the latest command (in non-NCQ
    case), we really don't want to send block layer commands to the device
    after it has entered error state. (Only ATA EH commands should be sent,
    to read the sense data etc.)
    
    Currently, scsi_queue_rq() will check if scsi_host_in_recovery()
    (state is SHOST_RECOVERY), and if so, it will _not_ issue a command via:
    scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
    -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
    
    Before commit e494f6a7 ("[SCSI] improved eh timeout handler"),
    when receiving a TFES error IRQ, the call chain looked like this:
    ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
    ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
    blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
    scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
    
    Which meant that as soon as an error IRQ was serviced, SHOST_RECOVERY
    would be set.
    
    However, after commit e494f6a7 ("[SCSI] improved eh timeout handler"),
    scsi_times_out() will instead call scsi_abort_command() which will queue
    delayed work, and the worker function scmd_eh_abort_handler() will call
    scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
    
    So now, after the TFES error IRQ has been serviced, we need to wait for
    the SCSI workqueue to run its work before SHOST_RECOVERY gets set.
    
    It is worth noting that, even before commit e494f6a7 ("[SCSI] improved
    eh timeout handler"), we could receive an error IRQ from the time when
    scsi_queue_rq() checks scsi_host_in_recovery(), to the time when
    ata_scsi_queuecmd() is actually called.
    
    In order to handle both the delayed setting of SHOST_RECOVERY and the
    window where we can receive an error IRQ, add a check against
    ATA_PFLAG_EH_PENDING (which gets set when servicing the error IRQ),
    inside ata_scsi_queuecmd() itself, while holding the ap->lock.
    (Since the ap->lock is held while servicing IRQs.)
    
    Fixes: e494f6a7 ("[SCSI] improved eh timeout handler")
    Signed-off-by: default avatarNiklas Cassel <niklas.cassel@wdc.com>
    Tested-by: default avatarJohn Garry <john.g.garry@oracle.com>
    Signed-off-by: default avatarDamien Le Moal <damien.lemoal@opensource.wdc.com>
    e20e81a2
libata-scsi.c 114 KB