Commits · 03b1a06203a1f283ad8a20dce78f0f5f17eaeb88 · nexedi / linux

28 Mar, 2017 1 commit

scsi: osd_uld: remove an unneeded NULL check · 03b1a062

Dan Carpenter authored Mar 23, 2017

We don't call the remove() function unless probe() succeeds so "oud"
can't be NULL here.  Plus, if it were NULL, we dereference it on the
next line so it would crash anyway.

[mkp: applied by hand]
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

03b1a062

23 Mar, 2017 37 commits

scsi: ipr: Driver version 2.6.4 · 16a20b52

Brian King authored Mar 15, 2017

Bump driver version
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

16a20b52

scsi: ipr: Fix SATA EH hang · ef97d8ae

Brian King authored Mar 15, 2017

This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap->eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH
is driven completely by ipr and SCSI. The SCSI EH thread ends up calling
ipr's eh_device_reset_handler, which then calls
ata_std_error_handler. This ends up calling ipr_sata_reset, which issues
a reset to the device. This should result in all pending commands
getting failed back and having ata_qc_complete called for them, which
should end up clearing ATA_QCFLAG_FAILED as qc->flags gets zeroed in
ata_qc_free.  This ensures that when we end up in ata_eh_finish, we
don't do anything more with the command.

On adapters that only support a single interrupt and when running with
two MSI-X vectors or less, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending the
response to the SATA reset command.  On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and reset response may be processed on different
HRRQs.

If ipr returns from ipr_sata_reset before the outstanding command was
returned, this sends us down the path of __ata_eh_qc_complete which then
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap->eh_done_q, which then will sit there
forever and we will be wedged.

This patch fixes this up by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA device.
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ef97d8ae

scsi: ipr: Error path locking fixes · f646f325

Brian King authored Mar 15, 2017

This patch closes up some potential race conditions observed in the
error handling paths in ipr while debugging an issue resulting in a hang
with SATA error handling. These patches ensure we are holding the
correct lock when adding and removing commands from the free and pending
queues in some error scenarios.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f646f325

scsi: ipr: Fix abort path race condition · 439ae285

Brian King authored Mar 15, 2017

This fixes a race condition in the error handlomg paths of ipr. While a
command is outstanding to the adapter, it is placed on a pending queue
for the hrrq it is associated with, while holding the HRRQ lock. When a
command is completed, it is removed from the pending queue, under HRRQ
lock, and placed on a local list. This list is then iterated through
without any locks and each command's done function is invoked, inside of
which, the command gets returned to the free list while grabbing the
HRRQ lock. This fixes two race conditions when commands have been
removed from the pending list but have not yet been added to the free
list. Both of these changes fix race conditions that could result in
returning success from eh_abort_handler and then later calling scsi_done
for the same request.

The first race condition is in ipr_cancel_op. It looks through each
pending queue to see if the command to be aborted is still outstanding
or not. Rather than looking on the pending queue, reverse the logic to
check to look for commands that are NOT on the free queue. The second
race condition can occur when in ipr_wait_for_ops where we are waiting
for responses for commands we've aborted.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

439ae285

scsi: ipr: Remove redundant initialization · 960e9648

Brian King authored Mar 15, 2017

Removes some code in __ipr_eh_dev_reset which was modifying the ipr_cmd
done function. This should have already been setup at command allocation
time and if its since been changed, it means we are in the ipr_erp*
functions and need to wait for them to complete and don't want to
override that here.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

960e9648

scsi: ipr: Fix missed EH wakeup · 66a0d59c

Brian King authored Mar 15, 2017

Following a command abort or device reset, ipr's EH handlers wait for
the commands getting aborted to get sent back from the adapter prior to
returning from the EH handler. This fixes up some cases where the
completion handler was not getting called, which would have resulted in
the EH thread waiting until it timed out, greatly extending EH time.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

66a0d59c

scsi: hisi_sas: add is_sata_phy_v2_hw() · 4935933e

Xiaofei Tan authored Mar 23, 2017

Add helper function is_sata_phy_v2_hw() to judge whether the attached
device is SATA disk for a root PHY.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4935933e

scsi: hisi_sas: use dev_is_sata to identify SATA or SAS disk · 6073b771

Xiang Chen authored Mar 23, 2017

When SMP IO is sent, sas_protocol_ata couldn't judge whether the disk is
SATA or SAS disk.  So use dev_is_sata to identify SATA or SAS disk.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6073b771

scsi: hisi_sas: check hisi_sas_lu_reset() error message · 14d3f397

John Garry authored Mar 23, 2017

Unless we actually get some sort of failure in hisi_sas_lu_reset(),
don't print a message.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

14d3f397

scsi: hisi_sas: release SMP slot in lldd_abort_task · ccbfe5a0

Xiang Chen authored Mar 23, 2017

When an SMP task timeouts, it will call lldd_abort_task to release the
associated slot, and then will release the sas_task.

Currently in lldd_abort_task, if we fail to internally abort IO, then
the slot of SMP IO is not released, but sas_task will still be later
released, so the slot's sas_task is NULL, which will cause NULL pointer
when hisi_sas_slot_task_free happens later.

To resolve, check the return value of internal abort, and release the
slot if it failed.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ccbfe5a0

scsi: hisi_sas: add hisi_sas_clear_nexus_ha() · 8b05ad6a

John Garry authored Mar 23, 2017

Add function for upper-layer to reset controller when all else fails.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8b05ad6a

scsi: hisi_sas: rename hisi_sas_link_timeout_{enable, disable}_link · 4df642db

John Garry authored Mar 23, 2017

For consistency, remove the "hisi_sas_" prefix.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4df642db

scsi: hisi_sas: handle PHY UP+DOWN simultaneous irq · 981843c6

Xiaofei Tan authored Mar 23, 2017

Handle the situation that PHY UP and DOWN irq happen simultaneously.
There is no mechanism of SoC HW to ensure this situation will never
happen. So, we add this handle just in case.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

981843c6

scsi: hisi_sas: some modifications to v2 hw reg init values · f1dc7518

John Garry authored Mar 23, 2017

This patch includes:
(1) Disable transport layer retry
(2) Support CQ time and count interrupt coal
(3) fix link FIFO full issue
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Zhao Nenglong <zhaonenglong@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f1dc7518

scsi: hisi_sas: process error codes according to their priority · 634a9585

Xiang Chen authored Mar 23, 2017

There are some rules to decide which error code has the high priority
when errors happen together:

(1) Error phase of CQ decides the error happens on RX or TX;

(2) For TX error, when DMA/TRANS TX error happen simultaneously, the
    priority of DMA TX error is higher than TRANS TX error, so for the
    priority of TX error: DW2 (DMA TX part) > DW0;

(3) For RX error, when TRANS/DMA/SIPC RX error happen simultaneously,
    the priority of TRANS RX error is higher than DMA and SIPC RX error,
    and we should also keep the rules (the priority of DW3 > DW2), so
    for the priority of RX error: DW1 > DW3 > DW2(SIPC RX part);

(4) There are also a priority we should keep in the same error type.

So, modify slot error code to handle this.

In addition to this, some some error codes are modified according to
recommendation from SoC designer.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

634a9585

scsi: hisi_sas: remove task free'ing for timeouts · 6fcdda80

John Garry authored Mar 23, 2017

When a TMF or internal abort times-out, do not free slot. We expect this
to be done upon later escalated error handling.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6fcdda80

scsi: hisi_sas: fix some sas_task.task_state_lock locking · 54c9dd2d

John Garry authored Mar 23, 2017

Some more locking needs to be added/modified for when
read-modify-writing sas_task.task_state_flags.

Note: since we can attempt to grab this lock in interrupt
      context we should use irq variant of spin_lock.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

54c9dd2d

scsi: hisi_sas: free slots after hardreset · 6131243a

Xiang Chen authored Mar 23, 2017

After hardreset, we clear up IOs of remote disks, so we need to free
those slots in LLDD.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6131243a

scsi: hisi_sas: check for SAS_TASK_STATE_ABORTED in slot complete · a305f337

John Garry authored Mar 23, 2017

Check in slot_complete_v2_hw() for whether a task has already been
completed by upper layer.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a305f337

scsi: hisi_sas: hardreset for SATA disk in LU reset · 055945df

John Garry authored Mar 23, 2017

When issuing an LU reset for a SATA target, issue an internal abort and
a hard reset.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

055945df

scsi: hisi_sas: modify hisi_sas_abort_task() for SSP · c35279f2

John Garry authored Mar 23, 2017

Currently an internal abort is executed regardless of the result of the
TMF. We should also check the result of the internal abort to see if we
should free the slot.

So change the status code STAT_IO_COMPLETE to TMF_RESP_FUNC_SUCC,
meaning the slot has been successfully aborted.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c35279f2

scsi: hisi_sas: modify error handling for v2 hw · fc866951

Xiang Chen authored Mar 23, 2017

For error codes which need abort-and-retry, simulate IO timeout and let
SCSI+ATA layers process those errors.

Previously for SSP, we should try to abort the IO in the LLDD and then
pass back to upper layer, but sometimes this would also error. So
Instead of adding special error handling for this scenario in the LLDD,
allow the upper layer to handle completely.

No performance hit is seen by taking this approach.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fc866951

scsi: hisi_sas: only reset link for PHY_FUNC_LINK_RESET · b4c67a6c

John Garry authored Mar 23, 2017

We currently do a hard reset for a link reset. Change this to do a link
reset only.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b4c67a6c

scsi: hisi_sas: error hisi_sas_task_prep() when port down · ddabca21

John Garry authored Mar 23, 2017

When sas_port is NULL, then return SAS_PHY_DOWN.

In addition, when the sas_dev is gone then explicitly return
SAS_PHY_DOWN.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ddabca21

scsi: hisi_sas: remove hisi_sas_port_deformed() · 405314df

John Garry authored Mar 23, 2017

Currently when a root PHY is deformed from a asd_sas_port we try to
release the slots in the LLDD, and fail.

Regardless, it is not right to release this early.

This patch removes the deformed function. As it was before, port
deformation is still done in hisi_sas_phy_down().

It would be nice to actually remove the hisi_sas_port_{de}formed() pair,
however we cannot as we need to know the asd_sas_port index libsas has
associated with an asd_sas_phy.

The hw does actually generate a port id for a PHY, but this seems to a
random number, so ignored for this purpose.

This patch also changes the code to link slots to the hisi_sas_device,
and not hisi_sas_port.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

405314df

scsi: hisi_sas: add softreset function for SATA disk · 7c594f04

Xiang Chen authored Mar 23, 2017

Add softreset to clear IO after internal abort device for SATA disk.

The SATA error handling for the controller is based on device internal
abort and softreset function.

The controller does not support internal abort for single IO, so we need
to execute internal abort for device.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7c594f04

scsi: hisi_sas: move PHY init to hisi_sas_scan_start() · 396b8044

John Garry authored Mar 23, 2017

Relocate the PHY init code from LLDD hw init path to
hisi_sas_scan_start().
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

396b8044

scsi: hisi_sas: add controller reset · 06ec0fb9

Xiang Chen authored Mar 23, 2017

There are some scenarios that we need to warm-reset to reset registers
of SAS controller. During reset we disable interrupts/DQs/PHYs, and
after reset we re-init the hardware and rescan the topology to see if
anything changed.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

06ec0fb9

scsi: hisi_sas: add to_hisi_sas_port() · 2e244f0f

John Garry authored Mar 23, 2017

Introduce function to get hisi_sas_port from asd_sas_port.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2e244f0f

scsi: fnic: bug fix for fip.fip_subcode in fnic_fcoe_send_vlan_req · b8e1aa3c

Satish Kharat authored Mar 22, 2017

This is a bug introduced when they moved the fip subcodes to central
place. Was sending FIP_SC_VL_NOTE in fip.fip_subcode for VLAN request in
fnic_fcoe_send_vlan_req. Change is to use FIP_SC_VL_REQ instead.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b8e1aa3c

scsi: fnic: Adding debug IO and Abort latency counter to fnic stats · 445d2960

Satish Kharat authored Feb 28, 2017

The IO and Abort latency counter counts the time taken to complete the
IO and abort command into broad buckets. This is not intended for
performance measurement, just a debug statistic.  current_max_io_time
tries to keep track of the maximum time an IO has taken to complete if
it is > 30sec.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

445d2960

scsi: fnic: Adding Check Condition counter to misc fnicstats · 39fcbbc0

Satish Kharat authored Feb 28, 2017

Just a simple counter of number of check conditions encountered on that
host.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

39fcbbc0

scsi: fnic: Avoid false out-of-order detection for aborted command · b9202b4a

Satish Kharat authored Feb 28, 2017

If SCSI-ML has already issued abort on a command i.e
FNIC_IOREQ_ABTS_PENDING is set and we get a IO completion, avoid this
being flagged as out-of-order completion by setting the FNIC_IO_DONE
flag in fnic_fcpio_icmnd_cmpl_handler
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b9202b4a

scsi: fnic: Fix for "Number of Active IOs" in fnicstats becoming negative · 7ef539c8

Satish Kharat authored Feb 28, 2017

Fixing the IO stats update (Active IOs and IO completion) to prevent
"Number of Active IOs" from becoming negative in the fnistats output.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7ef539c8

scsi: fnic: minor cleanup in fnic_fcpio_itmf_cmpl_handler, removing else case · ccc6d704

Satish Kharat authored Feb 28, 2017

Getting rid of else case to make the flow look bit simpler.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ccc6d704

scsi: fnic: Ratelimit printks to avoid flooding when vlan is not set by the switch.i · b43abcbb

Satish Kharat authored Feb 28, 2017

This is to avoid the log from being filled with vlan discovery messages
when there is no vlan configured on the switch.
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b43abcbb

scsi: fnic: switch to pci_alloc_irq_vectors · cca678df

Christoph Hellwig authored Feb 01, 2017

Not a full cleanup for the IRQ code, for that we'd need to know if the
max number of the various CQ types is going to stay 1 forever.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cca678df

16 Mar, 2017 1 commit

scsi: esas2r: Remove redundant NULL check on buffer · 33bd712b

Colin Ian King authored Mar 15, 2017

Buffer is a pointer to the static char array event_buffer and therefore
can never be null, so the check is redundant. Remove it.

Detected by CoverityScan, CID#1077556 ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Bradley Grove <bgrove@attotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

33bd712b

15 Mar, 2017 1 commit

scsi: remove incorrect __exit markups · ba21222d

Dmitry Torokhov authored Mar 01, 2017

Even if bus is not hot-pluggable, devices can be unbound from the driver
via sysfs, so we should not be using __exit annotations on remove()
methods. The only exception is drivers registered with
platform_driver_probe() which specifically disables sysfs bind/unbind
attributes.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ba21222d