Commits · 769031c92c8b85bd886f7e064430b7b512c68728 · Kirill Smelkov / linux

15 May, 2018 2 commits

scsi: target: target/file: Add support of direct and async I/O · 769031c9

Andrei Vagin authored Mar 21, 2018

There are two advantages:

* Direct I/O allows to avoid the write-back cache, so it reduces affects
  to other processes in the system.
* Async I/O allows to handle a few commands concurrently.

DIO + AIO shows a better perfomance for random write operations:

Mode: O_DSYNC Async: 1
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
  WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec

Mode: O_DSYNC Async: 0
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
  WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec

Known issue:

DIF (PI) emulation doesn't work when a target uses async I/O, because
DIF metadata is saved in a separate file, and it is another non-trivial
task how to synchronize writing in two files, so that a following read
operation always returns a consisten metadata for a specified block.

Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

769031c9

scsi: libosd: Remove VLA usage · 97e06618

Kees Cook authored May 02, 2018

On the quest to remove all VLAs from the kernel[1] this rearranges the
code to avoid a VLA warning under -Wvla (gcc doesn't recognize "const"
variables as not triggering VLA creation). Additionally cleans up
variable naming to avoid 80 character column limit.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comSigned-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

97e06618

08 May, 2018 38 commits

scsi: tcmu: refactor nl wr_cache attr with new helpers · 33d065cc

Zhu Lingshan authored May 02, 2018

use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event attribute
TCMU_ATTR_WRITECACHE(belongs to TCMU_CMD_RECONFIG_DEVICE) which is also
emulate_write_cache in configFS.

Removed tcmu_netlink_event() since we have new netlink
events helpers now.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

33d065cc

scsi: tcmu: refactor nl dev_size attr with new helpers · 84e28506

Zhu Lingshan authored May 02, 2018

use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event attribute
TCMU_ATTR_DEV_SIZE(belongs to TCMU_CMD_RECONFIG_DEVICE) which is also
dev_size in configFS.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

84e28506

scsi: tcmu: refactor nl dev_cfg attr with new nl helpers · 02ccfb54

Zhu Lingshan authored May 02, 2018

use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event attribute
TCMU_ATTR_DEV_CFG(belongs to TCMU_CMD_RECONFIG_DEVICE) which is also
dev_config in configFS.
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

02ccfb54

scsi: tcmu: refactor rm_device cmd with new nl helpers · f892bd8e

Zhu Lingshan authored May 02, 2018

use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event TCMU_CMD_REMOVED_DEVICE
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f892bd8e

scsi: tcmu: refactor add_device cmd with new nl helpers · e0c240ac

Zhu Lingshan authored May 02, 2018

use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event TCMU_CMD_ADDED_DEVICE
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e0c240ac

scsi: tcmu: add new netlink events helpers · 0e5aee39

Zhu Lingshan authored May 02, 2018

Add new netlink events helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send(). These new functions intend to replace
existing netlink events helper function tcmu_netlink_event().

The existing function tcmu_netlink_event() works well for events like
TCMU_ADDED_DEVICE and TCMU_REMOVED_DEVICE which only has one netlink
attribute. But if there is a command requires more than one attributes
to send out, we have to use a struct to adapt the paremeter
reconfig_data, it is hard to use one struct or a union in one struct to
adapt every command with different attributes, it may get long and ugly.

With the new two functions, we can call tcmu_netlink_event_init() to
initialize a netlink event, then add all attributes we need by using
nla_put_xxx(), at last use tcmu_netlink_event_send() to send it out. So
that we don't need to use a long struct or union if we want to send
mulitple attributes for different commands.

[mkp: typos]
Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0e5aee39

scsi: 3w-xxxx: fix a missing-check bug · 9899e4d3

Wenwen Wang authored May 07, 2018

In tw_chrdev_ioctl(), the length of the data buffer is firstly copied
from the userspace pointer 'argp' and saved to the kernel object
'data_buffer_length'. Then a security check is performed on it to make
sure that the length is not more than 'TW_MAX_IOCTL_SECTORS *
512'. Otherwise, an error code -EINVAL is returned. If the security
check is passed, the entire ioctl command is copied again from the
'argp' pointer and saved to the kernel object 'tw_ioctl'. Then, various
operations are performed on 'tw_ioctl' according to the 'cmd'. Given
that the 'argp' pointer resides in userspace, a malicious userspace
process can race to change the buffer length between the two
copies. This way, the user can bypass the security check and inject
invalid data buffer length. This can cause potential security issues in
the following execution.

This patch checks for capable(CAP_SYS_ADMIN) in tw_chrdev_open() to
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9899e4d3

scsi: 3w-9xxx: fix a missing-check bug · c9318a3e

Wenwen Wang authored May 07, 2018

In twa_chrdev_ioctl(), the ioctl driver command is firstly copied from
the userspace pointer 'argp' and saved to the kernel object
'driver_command'.  Then a security check is performed on the data buffer
size indicated by 'driver_command', which is
'driver_command.buffer_length'. If the security check is passed, the
entire ioctl command is copied again from the 'argp' pointer and saved
to the kernel object 'tw_ioctl'. Then, various operations are performed
on 'tw_ioctl' according to the 'cmd'. Given that the 'argp' pointer
resides in userspace, a malicious userspace process can race to change
the buffer size between the two copies. This way, the user can bypass
the security check and inject invalid data buffer size. This can cause
potential security issues in the following execution.

This patch checks for capable(CAP_SYS_ADMIN) in twa_chrdev_open()t o
avoid the above issues.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c9318a3e

scsi: mpt3sas: fix header path in ioctl documentation · a217b311

Tomohiro Kusumi authored May 04, 2018

MPT2_MAGIC_NUMBER as well as drivers/scsi/mpt2sas/mpt2sas_ctl.h were
removed to reuse mpt3sas code since commit 09ec55ed ("mpt2sas: Remove
.c and .h files from mpt2sas driver").
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a217b311

scsi: mpt3sas: remove obsolete path "drivers/scsi/mpt2sas/" from MAINTAINERS · f1367896

Tomohiro Kusumi authored May 04, 2018

drivers/scsi/mpt2sas/ no longer exists after commit
c84b06a4 ("mpt3sas: Single driver module which supports both SAS 2.0 &
SAS 3.0 HBAs") merged/removed it.
Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@osnexus.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f1367896

scsi: megaraid: silence a static checker bug · 27e833da

Dan Carpenter authored May 03, 2018

If we had more than 32 megaraid cards then it would cause memory
corruption.  That's not likely, of course, but it's handy to enforce it
and make the static checker happy.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

27e833da

scsi: mptsas: fix spelling mistake: "matchs" -> "matches" · c09a21d8

Colin Ian King authored May 03, 2018

Trivial fix to spelling mistake in warning message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c09a21d8

scsi: lpfc: fix spelling mistakes: "mabilbox" and "maibox" · 7afc0ce9

Colin Ian King authored May 03, 2018

Trivial fix to spelling mistakes in lpfc_printf_log log message

"mabilbox" -> "mailbox"
"maibox" -> "mailbox"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7afc0ce9

scsi: qla2xxx: remove the unused tcm_qla2xxx_cmd_wq · 4b83cb8b

Andrei Vagin authored May 02, 2018

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4b83cb8b

scsi: mptfusion: fix spelling mistake: "initators" -> "initiators" · b9315530

Colin Ian King authored May 02, 2018

Trivial fix to spelling mistake in text string.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b9315530

scsi: hisi_sas: workaround a v3 hw hilink bug · f70c1251

Xiaofei Tan authored May 02, 2018

There is an SoC bug of v3 hw development version. When hot- unplugging a
directly attached disk, the PHY down interrupt may not happen. It is
very easy to appear on some boards.

When this issue occurs, the controller will receive many invalid dword
frames, and the "alos" fields of register HILINK_ERR_DFX can indicate
that disk was unplugged.

As an workaround solution, this patch detects this issue in the channel
interrupt, and workaround it by following steps:

 - Disable the PHY
 - Clear error code and interrupt
 - Enable the PHY

Then the HW will reissue PHY down interrupt.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f70c1251

scsi: hisi_sas: add readl poll timeout helper wrappers · 9b8addf3

John Garry authored May 02, 2018

It is common to use readl poll timeout helpers in the driver, so create
custom wrappers.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9b8addf3

scsi: hisi_sas: remove redundant handling to event95 for v3 · bf081d5d

Xiaofei Tan authored May 02, 2018

Event95 is used for DFX purpose. The relevant bit for this interrupt in
the ENT_INT_SRC_MSK3 register has been disabled, so remove the
processing.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

bf081d5d

scsi: hisi_sas: config ATA de-reset as an constrained command for v3 hw · 94135327

Xiang Chen authored May 02, 2018

As a unconstrained command, a command can be sent to SATA disk even if
SATA disk status is BUSY, ERR or DRQ.

If an ATA reset assert is successful but ATA reset de-assert fails, then
it will retry the reset de-assert. If reset de- assert retry is
successful, we think it is okay to probe the device but actually it
still has Err status.

Apparently we need to retry the ATA reset assertion and de- assertion
instead for this mentioned scenario.

As such, we config ATA reset assert as a constrained command, if ATA
reset de-assert fails, then ATA reset de-assert retry will also
fail. Then we will retry the proper process of ATA reset assert and
de-assert again.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

94135327

scsi: hisi_sas: update PHY linkrate after a controller reset · c2c1d9de

Xiang Chen authored May 02, 2018

After the controller is reset, we currently may not honour the PHY max
linkrate set via sysfs, in that after a reset we always revert to max
linkrate of 12Gbps, ignoring the value set via sysfs.

This patch modifies to policy to set the programmed PHY linkrate,
honouring the max linkrate programmed via sysfs.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c2c1d9de

scsi: hisi_sas: stop controller timer for reset · 6f7c32d6

John Garry authored May 02, 2018

We should only have the timer enabled after PHY up after controller
reset, so disable prior to reset.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6f7c32d6

scsi: hisi_sas: check sas_dev gone earlier in hisi_sas_abort_task() · c6ef8954

Xiang Chen authored May 02, 2018

It is possible to dereference a NULL-pointer in hisi_sas_abort_task() in
special scenario when the device has been removed.

If an SMP task times-out, it will call hisi_sas_abort_task() to
recover. And currently there is a check in hisi_sas_abort_task() to
avoid the situation of processing the abort for the removed device.

However we have an ordering problem, in that we may reference a task for
the removed device before checking if the device has been removed.

Fix this by only referencing the sas_dev after we know it is still
present.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c6ef8954

scsi: hisi_sas: fix PI memory size · a14da7a2

Xiang Chen authored May 02, 2018

There are 28 bytes of protection information record of SSP for v3 hw, 16
bytes for v2 hw, and probably 24 for v1 hw (forgotten now).

So use a value big enough in hisi_sas_command_table_ssp.prot to cover
all cases.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a14da7a2

scsi: hisi_sas: check host frozen before calling "done" function · cd938e53

Xiang Chen authored May 02, 2018

When the host is frozen in SCSI EH state, at any point after the LLDD
sets SAS_TASK_STATE_DONE for the sas_task task state, libsas may free
the task; see sas_scsi_find_task().

This puts the LLDD in a difficult position, in that once it sets
SAS_TASK_STATE_DONE for the task state it should not reference the
sas_task again. But the LLDD needs will check the sas_task indirectly in
calling task->task_done()->sas_scsi_task_done() or sas_ata_task_done()
(to check if the host is frozen state actually).

And the LLDD cannot set SAS_TASK_STATE_DONE for the task state after
task->task_done() is called (as the sas_task is free'd at this point).

This situation would seem to be a problem made by libsas.

To work around, check in the LLDD whether the host is in frozen state to
ensure it is ok to call task->task_done() function. If in the frozen
state, we rely on SCSI EH and libsas to free the sas_task directly.

We do not do this for the following IO types:

 - SMP - they are managed in libsas directly, outside SCSI EH
 - Any internally originated IO, for similar reason
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cd938e53

scsi: hisi_sas: Add some checks to avoid free'ing a sas_task twice · b81b6cce

Xiang Chen authored May 02, 2018

If the SCSI host enters EH, any pending IO will be processed by SCSI
EH. However it is possible that SCSI EH will try to abort the IO and
also at the same time the IO completes in the driver. In this situation
there is a small chance of freeing the sas_task twice.

Then if another IO re-uses freed sas_task before the second time of
free'ing sas_task, it is possible to free incorrect sas_task.

To avoid this situation, add some checks to increase reliability. The
sas_task task state flag SAS_TASK_STATE_ABORTED is used to mutually
protect the LLDD and libsas freeing the task.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b81b6cce

scsi: hisi_sas: optimise the usage of DQ locking · 24cf4361

Xiang Chen authored May 02, 2018

In the DQ tasklet processing it is not necessary to take the DQ lock, as
there is no contention between adding slots to the CQ and removing slots
from the matching DQ.

In addition, since we run each DQ in a separate tasklet context, there
would be no possible contention between DQ processing running for the
same queue in parallel.

It is still necessary to take hisi_hba lock when free'ing slots.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

24cf4361

scsi: lpfc: Comment cleanup regarding Broadcom copyright header · 3e21d1cb

James Smart authored May 04, 2018

Fix small formatting and wording nits in Broadcom copyright header
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

3e21d1cb

scsi: lpfc: update driver version to 12.0.0.3 · e65d8c34

James Smart authored May 04, 2018

Update the driver version to 12.0.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e65d8c34

scsi: lpfc: Enhance log messages when reporting CQE errors · 11f0e34f

James Smart authored May 04, 2018

Enhance log messages for CQEs as they were not reporting certain fields.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

11f0e34f

scsi: lpfc: Fix up log messages and stats counters in IO submit code path · 44c2757b

James Smart authored May 04, 2018

Fix up log messages and add an fcp error stat counter in the IO submit
code path to make diagnosing problems easier
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

44c2757b

scsi: lpfc: Driver NVME load fails when CPU cnt > WQ resource cnt · d38f33b3

James Smart authored May 04, 2018

If the cpu count is larger than the number of WQ resources available,
adapter attachment eventually failes due to a WQ_CREATE failure.

Calculate the number of WQs desired (which initializes to cpu count)
after accounting for the number of queues the adapter supports and the
number allocated to SCSI and the control/ELS path, and scale down if
necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d38f33b3

scsi: lpfc: Handle new link fault code returned by adapter firmware. · 23288b78

James Smart authored May 04, 2018

The driver encounters a link event ACQE with a fault code it doesn't
recognize, it logs an "Invalid" fault type and futher treats the unknown
value as a mailbox command failure.  First off, there is no "invalid"
value, only values that are unknown. Secondly, the fault code doesn't
indicate status - the rest of the ACQE contains that status so there is
no reason to "fail the commands".

Change the "Invalid" to "Unknown". There is no "invalid" code value.

Separate fault code parsing and message genaration from any mbx handling
status.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

23288b78

scsi: lpfc: Correct fw download error message · a72d56b2

James Smart authored May 04, 2018

In situations when the firmware image in inappropriate for the chip
type, initial validation checks were light, allowing the checks to pass,
thus allowing the firmware to be downloaded.  Eventually, after the
download, the chip rejects the firmware but it is logged as a generic
firmware download error.

Revise the initial checks to validate the image vs asic type so that the
correct message is displayed and the download process is avoided.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a72d56b2

scsi: lpfc: enhance LE data structure copies to hardware · 48f8fdb4

James Smart authored May 04, 2018

The driver builds the control structures in host memory using
definitions that are based on 32-bit words. After building the structure
it is then written to the adapter.

This patch slightly optimizes LE hosts by copying the structures via
64-bit copies. This is doable as the adapter interface is LE thus there
is no byteswapping as the copy is performed.

The same optimization would be nice on BE systems, but when byteswapping
occurs, it swaps 32-bit words as well, thus trashing the control
structure. Given amount of code that is dependent upon the 32-bit word
definition, it was decided to not change things for the minor
optimization. Thus PPC 64-bit systems sticks with doing 32-bit copies.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

48f8fdb4

scsi: lpfc: Change IO submit return to EBUSY if remote port is recovering · cd240071

James Smart authored May 04, 2018

I/O submission paths in the lpfc nvme path are rejecting the io with an
error code that reflects back to the callee as a hard io failure. Many
of these conditions are transient and would likely resolve if retried.

Correct by returning -EBUSY, which the FC transport triggers off of to
return busy status codes to the blk-mq layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cd240071

scsi: qedf: Update version number to 8.33.16.20 · ab7ad49d

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ab7ad49d

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5d1c8b5b

scsi: qedf: Add more defensive checks for concurrent error conditions · f3690a89

Chad Dupuis authored Apr 25, 2018

During an uplink toggle test all error handling is done via timeout and
firmware error conditions which can occur concurrently:

 - SCSI layer timeouts
 - Error detect CQEs
 - Firmware detected underruns
 - ABTS timeouts

All these concurrent events require more defensive checks in the driver
including:

 - Check both internally and externally generated aborts to make sure the
   xid is not already been aborted in another context or in cleanup.

 - Check back pointers in qedf_cmd_timeout to verify the context of the
   io_req, fcport and qedf_ctx

 - Check rport state in host reset handler to not reset the whole host
   if the rport is already uploaded or in the process of relogin

 - Check to state for an fcport before initiating a middle path ELS
   request
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f3690a89