• Steffen Maier's avatar
    scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices · 242ec145
    Steffen Maier authored
    Suppose more than one non-NPIV FCP device is active on the same channel.
    Send I/O to storage and have some of the pending I/O run into a SCSI
    command timeout, e.g. due to bit errors on the fibre. Now the error
    situation stops. However, we saw FCP requests continue to timeout in the
    channel. The abort will be successful, but the subsequent TUR fails.
    Scsi_eh starts. The LUN reset fails. The target reset fails.  The host
    reset only did an FCP device recovery. However, for non-NPIV FCP devices,
    this does not close and reopen ports on the SAN-side if other non-NPIV FCP
    device(s) share the same open ports.
    
    In order to resolve the continuing FCP request timeouts, we need to
    explicitly close and reopen ports on the SAN-side.
    
    This was missing since the beginning of zfcp in v2.6.0 history commit
    ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.").
    
    Note: The FSF requests for forced port reopen could run into FSF request
    timeouts due to other reasons. This would trigger an internal FCP device
    recovery. Pending forced port reopen recoveries would get dismissed. So
    some ports might not get fully reopened during this host reset handler.
    However, subsequent I/O would trigger the above described escalation and
    eventually all ports would be forced reopen to resolve any continuing FCP
    request timeouts due to earlier bit errors.
    Signed-off-by: default avatarSteffen Maier <maier@linux.ibm.com>
    Fixes: 1da177e4 ("Linux-2.6.12-rc2")
    Cc: <stable@vger.kernel.org> #3.0+
    Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
    Reviewed-by: default avatarBenjamin Block <bblock@linux.ibm.com>
    Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
    242ec145
zfcp_erp.c 49.4 KB