Commits · 9b8addf3024eb57a215d0af2e1c95cd44b94ccab · Kirill Smelkov / linux

08 May, 2018 40 commits

scsi: hisi_sas: add readl poll timeout helper wrappers · 9b8addf3

John Garry authored May 02, 2018

It is common to use readl poll timeout helpers in the driver, so create
custom wrappers.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9b8addf3

scsi: hisi_sas: remove redundant handling to event95 for v3 · bf081d5d

Xiaofei Tan authored May 02, 2018

Event95 is used for DFX purpose. The relevant bit for this interrupt in
the ENT_INT_SRC_MSK3 register has been disabled, so remove the
processing.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

bf081d5d

scsi: hisi_sas: config ATA de-reset as an constrained command for v3 hw · 94135327

Xiang Chen authored May 02, 2018

As a unconstrained command, a command can be sent to SATA disk even if
SATA disk status is BUSY, ERR or DRQ.

If an ATA reset assert is successful but ATA reset de-assert fails, then
it will retry the reset de-assert. If reset de- assert retry is
successful, we think it is okay to probe the device but actually it
still has Err status.

Apparently we need to retry the ATA reset assertion and de- assertion
instead for this mentioned scenario.

As such, we config ATA reset assert as a constrained command, if ATA
reset de-assert fails, then ATA reset de-assert retry will also
fail. Then we will retry the proper process of ATA reset assert and
de-assert again.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

94135327

scsi: hisi_sas: update PHY linkrate after a controller reset · c2c1d9de

Xiang Chen authored May 02, 2018

After the controller is reset, we currently may not honour the PHY max
linkrate set via sysfs, in that after a reset we always revert to max
linkrate of 12Gbps, ignoring the value set via sysfs.

This patch modifies to policy to set the programmed PHY linkrate,
honouring the max linkrate programmed via sysfs.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c2c1d9de

scsi: hisi_sas: stop controller timer for reset · 6f7c32d6

John Garry authored May 02, 2018

We should only have the timer enabled after PHY up after controller
reset, so disable prior to reset.
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6f7c32d6

scsi: hisi_sas: check sas_dev gone earlier in hisi_sas_abort_task() · c6ef8954

Xiang Chen authored May 02, 2018

It is possible to dereference a NULL-pointer in hisi_sas_abort_task() in
special scenario when the device has been removed.

If an SMP task times-out, it will call hisi_sas_abort_task() to
recover. And currently there is a check in hisi_sas_abort_task() to
avoid the situation of processing the abort for the removed device.

However we have an ordering problem, in that we may reference a task for
the removed device before checking if the device has been removed.

Fix this by only referencing the sas_dev after we know it is still
present.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c6ef8954

scsi: hisi_sas: fix PI memory size · a14da7a2

Xiang Chen authored May 02, 2018

There are 28 bytes of protection information record of SSP for v3 hw, 16
bytes for v2 hw, and probably 24 for v1 hw (forgotten now).

So use a value big enough in hisi_sas_command_table_ssp.prot to cover
all cases.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a14da7a2

scsi: hisi_sas: check host frozen before calling "done" function · cd938e53

Xiang Chen authored May 02, 2018

When the host is frozen in SCSI EH state, at any point after the LLDD
sets SAS_TASK_STATE_DONE for the sas_task task state, libsas may free
the task; see sas_scsi_find_task().

This puts the LLDD in a difficult position, in that once it sets
SAS_TASK_STATE_DONE for the task state it should not reference the
sas_task again. But the LLDD needs will check the sas_task indirectly in
calling task->task_done()->sas_scsi_task_done() or sas_ata_task_done()
(to check if the host is frozen state actually).

And the LLDD cannot set SAS_TASK_STATE_DONE for the task state after
task->task_done() is called (as the sas_task is free'd at this point).

This situation would seem to be a problem made by libsas.

To work around, check in the LLDD whether the host is in frozen state to
ensure it is ok to call task->task_done() function. If in the frozen
state, we rely on SCSI EH and libsas to free the sas_task directly.

We do not do this for the following IO types:

 - SMP - they are managed in libsas directly, outside SCSI EH
 - Any internally originated IO, for similar reason
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cd938e53

scsi: hisi_sas: Add some checks to avoid free'ing a sas_task twice · b81b6cce

Xiang Chen authored May 02, 2018

If the SCSI host enters EH, any pending IO will be processed by SCSI
EH. However it is possible that SCSI EH will try to abort the IO and
also at the same time the IO completes in the driver. In this situation
there is a small chance of freeing the sas_task twice.

Then if another IO re-uses freed sas_task before the second time of
free'ing sas_task, it is possible to free incorrect sas_task.

To avoid this situation, add some checks to increase reliability. The
sas_task task state flag SAS_TASK_STATE_ABORTED is used to mutually
protect the LLDD and libsas freeing the task.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b81b6cce

scsi: hisi_sas: optimise the usage of DQ locking · 24cf4361

Xiang Chen authored May 02, 2018

In the DQ tasklet processing it is not necessary to take the DQ lock, as
there is no contention between adding slots to the CQ and removing slots
from the matching DQ.

In addition, since we run each DQ in a separate tasklet context, there
would be no possible contention between DQ processing running for the
same queue in parallel.

It is still necessary to take hisi_hba lock when free'ing slots.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

24cf4361

scsi: lpfc: Comment cleanup regarding Broadcom copyright header · 3e21d1cb

James Smart authored May 04, 2018

Fix small formatting and wording nits in Broadcom copyright header
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

3e21d1cb

scsi: lpfc: update driver version to 12.0.0.3 · e65d8c34

James Smart authored May 04, 2018

Update the driver version to 12.0.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e65d8c34

scsi: lpfc: Enhance log messages when reporting CQE errors · 11f0e34f

James Smart authored May 04, 2018

Enhance log messages for CQEs as they were not reporting certain fields.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

11f0e34f

scsi: lpfc: Fix up log messages and stats counters in IO submit code path · 44c2757b

James Smart authored May 04, 2018

Fix up log messages and add an fcp error stat counter in the IO submit
code path to make diagnosing problems easier
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

44c2757b

scsi: lpfc: Driver NVME load fails when CPU cnt > WQ resource cnt · d38f33b3

James Smart authored May 04, 2018

If the cpu count is larger than the number of WQ resources available,
adapter attachment eventually failes due to a WQ_CREATE failure.

Calculate the number of WQs desired (which initializes to cpu count)
after accounting for the number of queues the adapter supports and the
number allocated to SCSI and the control/ELS path, and scale down if
necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d38f33b3

scsi: lpfc: Handle new link fault code returned by adapter firmware. · 23288b78

James Smart authored May 04, 2018

The driver encounters a link event ACQE with a fault code it doesn't
recognize, it logs an "Invalid" fault type and futher treats the unknown
value as a mailbox command failure.  First off, there is no "invalid"
value, only values that are unknown. Secondly, the fault code doesn't
indicate status - the rest of the ACQE contains that status so there is
no reason to "fail the commands".

Change the "Invalid" to "Unknown". There is no "invalid" code value.

Separate fault code parsing and message genaration from any mbx handling
status.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

23288b78

scsi: lpfc: Correct fw download error message · a72d56b2

James Smart authored May 04, 2018

In situations when the firmware image in inappropriate for the chip
type, initial validation checks were light, allowing the checks to pass,
thus allowing the firmware to be downloaded.  Eventually, after the
download, the chip rejects the firmware but it is logged as a generic
firmware download error.

Revise the initial checks to validate the image vs asic type so that the
correct message is displayed and the download process is avoided.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a72d56b2

scsi: lpfc: enhance LE data structure copies to hardware · 48f8fdb4

James Smart authored May 04, 2018

The driver builds the control structures in host memory using
definitions that are based on 32-bit words. After building the structure
it is then written to the adapter.

This patch slightly optimizes LE hosts by copying the structures via
64-bit copies. This is doable as the adapter interface is LE thus there
is no byteswapping as the copy is performed.

The same optimization would be nice on BE systems, but when byteswapping
occurs, it swaps 32-bit words as well, thus trashing the control
structure. Given amount of code that is dependent upon the 32-bit word
definition, it was decided to not change things for the minor
optimization. Thus PPC 64-bit systems sticks with doing 32-bit copies.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

48f8fdb4

scsi: lpfc: Change IO submit return to EBUSY if remote port is recovering · cd240071

James Smart authored May 04, 2018

I/O submission paths in the lpfc nvme path are rejecting the io with an
error code that reflects back to the callee as a hard io failure. Many
of these conditions are transient and would likely resolve if retried.

Correct by returning -EBUSY, which the FC transport triggers off of to
return busy status codes to the blk-mq layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cd240071

scsi: qedf: Update version number to 8.33.16.20 · ab7ad49d

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ab7ad49d

scsi: qedf: Update copyright for 2018 · 5d1c8b5b

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5d1c8b5b

scsi: qedf: Add more defensive checks for concurrent error conditions · f3690a89

Chad Dupuis authored Apr 25, 2018

During an uplink toggle test all error handling is done via timeout and
firmware error conditions which can occur concurrently:

 - SCSI layer timeouts
 - Error detect CQEs
 - Firmware detected underruns
 - ABTS timeouts

All these concurrent events require more defensive checks in the driver
including:

 - Check both internally and externally generated aborts to make sure the
   xid is not already been aborted in another context or in cleanup.

 - Check back pointers in qedf_cmd_timeout to verify the context of the
   io_req, fcport and qedf_ctx

 - Check rport state in host reset handler to not reset the whole host
   if the rport is already uploaded or in the process of relogin

 - Check to state for an fcport before initiating a middle path ELS
   request
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f3690a89

scsi: qedf: Set the UNLOADING flag when removing a vport · 4f4616ce

Chad Dupuis authored Apr 25, 2018

Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted.  This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.

Fixes the crash:

PID: 106676  TASK: ffff9a436aa90000  CPU: 12  COMMAND: "multipathd"
 #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
 #1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
 #2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
 #3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
 #4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
 #5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
 #6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
 #7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
 #8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
 #9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
    [exception RIP: qedf_init_task+61]
    RIP: ffffffffc0e13c2d  RSP: ffff9a43567d38d0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffbe920472c738  RCX: ffff9a434fa0e3e8
    RDX: ffff9a434f695280  RSI: ffffbe920472c738  RDI: ffff9a43aa359c80
    RBP: ffff9a43567d3950   R8: 0000000000000c15   R9: ffff9a3fb09b9880
    R10: ffff9a434fa0e3e8  R11: ffff9a43567d35ce  R12: 0000000000000000
    R13: ffff9a434f695280  R14: ffff9a43aa359c80  R15: ffff9a3fb9e005c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4f4616ce

scsi: qedf: Add additional checks when restarting an rport due to ABTS timeout · 92bbccdf

Chad Dupuis authored Apr 25, 2018

There are a couple of kernel cases when we restart a remote port due to
ABTS timeout that we need to handle:

 1. Flush any outstanding ABTS requests when flushing I/Os so that we do
    not hold up the eh_abort handler indefinitely causing process hangs.

 2. Check if we are currently uploading a connection before issuing an
    ABTS.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

92bbccdf

scsi: qedf: If qed fails to enable MSI-X fail PCI probe · 96673e1e

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

96673e1e

scsi: qedf: Honor default_prio module parameter even if DCBX does not converge · 65b7beca
Chad Dupuis authored Apr 25, 2018
```
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
```
65b7beca

scsi: qedf: Improve firmware debug dump handling · 4b9b7fab

Chad Dupuis authored Apr 25, 2018

Get all firmware debug data instead of just a grc dump.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4b9b7fab

scsi: qedf: Remove setting DCBX pending during soft context reset · f9a4a7f2

Saurav Kashyap authored Apr 25, 2018

PROBLEM DESCRIPTION:

According to the logs, STAG was changing and it was triggering soft
reset.  In soft reset we used to virtual link down and up and also we
were disabling DCBx flag. Since this was virtual link flap, DCBx never
used to converge again.

SOLUTION:

Code change is to remove disabling DCBx flag from soft reset.
Signed-off-by: Saurav Kashyap <saurav.kashyap@cavium.com>
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f9a4a7f2

scsi: qedf: Add task id to kref_get_unless_zero() debug messages when flushing requests · 8025c842

Chad Dupuis authored Apr 25, 2018

Helps to corroborate which requests we can't get reference on and if
it's real bug or not.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8025c842

scsi: qedf: Check if link is already up when receiving a link up event from qed · 3f9de7f0

Chad Dupuis authored Apr 25, 2018

[mkp: typo]
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

3f9de7f0

scsi: qedf: Return request as DID_NO_CONNECT if MSI-X is not enabled · a8f192bc

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a8f192bc

scsi: qedf: Release RRQ reference correctly when RRQ command times out · adf48842

Chad Dupuis authored Apr 25, 2018

When an RRQ request times out the reference is not getting decremented
correctly as there are still ELS commands leftover when we flush any
pending I/Os during offload:

[  281.788553] [0000:21:00.3]:[qedf_cmd_timeout:58]:4: ELS timeout, xid=0x96a.
...
[  281.788553] [0000:21:00.3]:[qedf_cmd_timeout:58]:4: ELS timeout, xid=0x96a.
[  281.788772] [0000:21:00.3]:[qedf_rrq_compl:182]:4: Entered.
[  281.788774] [0000:21:00.3]:[qedf_rrq_compl:200]:4: rrq_compl: orig io = ffffc90004c556f8, orig xid = 0x81b, rrq_xid = 0x96a, refcount=1
...
[  331.448032] [0000:21:00.3]:[qedf_flush_els_req:1512]:4: Flushing ELS request xid=0x96a refcount=2.

The fix is to call kref_put on the rrq_req in case of timeout as the
timeout handler will call rrq_compl directly vs. a normal completion
where it is call from els_compl.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

adf48842

scsi: qedf: Honor priority from DCBX FCoE App tag · 84b2ba6e

Chad Dupuis authored Apr 25, 2018

We currently hard code the priority in the 8021q tag to 3 for FCoE
traffic.  The vast majority of the time this is fine but if the priority
is something else besides 3, any VLAN ID comparison either in the
non-offload path or offload path will fail and cause dropped frames
where none are expected.

Change the behavior so that the driver default is 3 if we do not get any
DCBX convergence.

If DCBX does converge, then set the FIP/FCoE priority in the following
manner:

 1. If the qedf_default_prio modparam is set use that
 2. If the DCBX FCoE priority is not in range (0..7) use 3
 3. Use the DCBX FCoE priority we get in the driver's DCBX handler
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

84b2ba6e

scsi: qedf: Add dcbx_not_wait module parameter so we won't wait for DCBX... · ba17d379

Chad Dupuis authored Apr 25, 2018

scsi: qedf: Add dcbx_not_wait module parameter so we won't wait for DCBX convergence to start discovery

This module parameter is to work around cases where we do not receive
the DCBX handler notification from qed but discovery is still possible
if we send out a FIP VLAN request irregardless of the DCBX state.

[mkp: zeroday warning]
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ba17d379

scsi: qedf: Sanity check FCoE/FIP priority value to make sure it's between 0 and 7 · a93755cf
Chad Dupuis authored Apr 25, 2018
```
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
```
a93755cf

scsi: qedf: Add check for offload before flushing I/Os for target · 766639ca

Chad Dupuis authored Apr 25, 2018

We need to check that a fcport is offloaded before we try to flush any
requests.  No doing so could lead to undefined results and most likely a
crash.

Fixes the oops:

[  343.971886] [0000:42:00.3]:[qedf_execute_tmf:2070]:8: wait for tm_cmpl timeout!
[  343.971933] BUG: unable to handle kernel paging request at 00000000000024a8
[  343.971949] IP: [<ffffffffa06b8cc6>] qedf_flush_active_ios+0x46/0x260 [qedf]
[  343.971952] PGD 42c569067 PUD 4160fe067 PMD 0
[  343.971954] Oops: 0000 [#1] SMP
[  343.972008] Modules linked in: qedf(OEX) qed(OEX) bnx2i cnic fuse af_packet iscsi_ibft msr xfs intel_rapl sb_edac edac_core x86_pkg_temp_thermal bnx2x geneve intel_powerclamp vxlan coretemp ipmi_ssif ipmi_devintf kvm_intel kvm libiscsi joydev irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel tg3 ip6_udp_tunnel udp_tunnel mdio libcrc32c iTCO_wdt scsi_transport_iscsi uio drbg iTCO_vendor_support iscsi_boot_sysfs dcdbas(X) ipmi_si ansi_cprng aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper ptp pps_core pcspkr libphy lpc_ich mfd_core cryptd fjes wmi ipmi_msghandler button crc8 libfcoe libfc scsi_transport_fc mei_me mei shpchp processor acpi_pad btrfs xor hid_generic usbhid raid6_pq sd_mod sr_mod cdrom mgag200 crc32c_intel i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[  343.972020]  fb_sys_fops ttm ahci ehci_pci libahci ehci_hcd drm libata usbcore megaraid_sas usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4 [last unloaded: qedf]
[  343.972022] Supported: Yes, External
[  343.972026] CPU: 30 PID: 12777 Comm: sg_reset Tainted: G        W  OE    X 4.4.73-5-default #1
[  343.972027] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.1.3 11/20/2013
[  343.972029] task: ffff88018dfc0e80 ti: ffff88042bd7c000 task.ti: ffff88042bd7c000
[  343.972036] RIP: 0010:[<ffffffffa06b8cc6>]  [<ffffffffa06b8cc6>] qedf_flush_active_ios+0x46/0x260 [qedf]
[  343.972038] RSP: 0018:ffff88042bd7fbe0  EFLAGS: 00010286
[  343.972039] RAX: 0000000000000000 RBX: ffff88042ce37800 RCX: 0000000000000400
[  343.972040] RDX: 000000000000060e RSI: ffffffffa06be830 RDI: ffff8807e5072cc0
[  343.972041] RBP: 0000000000001000 R08: ffffffffa06bff4d R09: ffff88018dd84580
[  343.972042] R10: 000000000000018b R11: 0000000000000002 R12: 0000000000002003
[  343.972043] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8807e5072cc0
[  343.972046] FS:  00007fc1c8809700(0000) GS:ffff88042fbc0000(0000) knlGS:0000000000000000
[  343.972048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  343.972049] CR2: 00000000000024a8 CR3: 00000004236ec000 CR4: 00000000001406e0
[  343.972050] Stack:
[  343.972053]  504c78750607e154 ffffffff810a7d10 ffff88042ce37800 0000000000000010
[  343.972055]  0000000000002003 ffff8807ff480c48 ffff8807e5072cc0 ffffc90004ec4ff8
[  343.972057]  ffffffffa06b9b86 ffff880800000010 0000000000000282 ffff88042ce37800
[  343.972058] Call Trace:
[  343.972094]  [<ffffffffa06b9b86>] qedf_initiate_tmf+0x346/0x3e0 [qedf]
[  343.972120]  [<ffffffffa000fa06>] scsi_try_bus_device_reset+0x26/0x40 [scsi_mod]
[  343.972133]  [<ffffffffa001038e>] scsi_ioctl_reset+0x13e/0x260 [scsi_mod]
[  343.972145]  [<ffffffffa000f416>] scsi_ioctl+0x136/0x3d0 [scsi_mod]
[  343.972154]  [<ffffffff812ff6eb>] blkdev_ioctl+0x6bb/0x950
[  343.972164]  [<ffffffff8123cfed>] block_ioctl+0x3d/0x40
[  343.972170]  [<ffffffff81217e2d>] do_vfs_ioctl+0x2cd/0x4a0
[  343.972186]  [<ffffffff81218074>] SyS_ioctl+0x74/0x80
[  343.972193]  [<ffffffff8160916e>] entry_SYSCALL_64_fastpath+0x12/0x6d
[  343.975285] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

766639ca

scsi: qedf: Fix VLAN display when printing sent FIP frames · 15a93de7

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

15a93de7

scsi: qedf: Add missing skb frees in error path · f32803bb

Chad Dupuis authored Apr 25, 2018

Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f32803bb

scsi: qedf: Increase the number of default FIP VLAN request retries to 60 · c3ef86f3

Chad Dupuis authored Apr 25, 2018

Some configurations need more than 30 seconds to respond to a FIP VLAN
request so increase the default to 60 seconds.
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c3ef86f3

scsi: qedf: Synchronize rport restarts when multiple ELS commands time out · 44c7c859

Chad Dupuis authored Apr 25, 2018

If multiple ELS commands time out, such as aborts, they could all try to
restart the same rport and the same time.  This could mean multiple
multiple processes trying to clean up any outstanding commands or trying
to upload the same port.

Add a new flag (QEDF_RPORT_IN_RESET) and check other fcport state flags
before trying to reset the port.

Fixes the crash:

[17501.824701] ------------[ cut here ]------------
[17501.824733] kernel BUG at include/asm-generic/dma-mapping-common.h:65!
[17501.824760] invalid opcode: 0000 [#1] SMP
[17501.824781] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ses enclosure dm_service_time vfat fat sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass joydev btrfs hpilo raid6_pq iTCO_wdt iTCO_vendor_support xor hpwdt ipmi_ssif sg crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ioatdma lpc_ich glue_helper ablk_helper i2c_i801 shpchp cryptd ipmi_si pcspkr acpi_power_meter ipmi_devintf pcc_cpufreq dca wmi ipmi_msghandler dm_multipath nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod
[17501.825119]  crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qedf(OE) drm libfcoe ahci qedi(OE) crct10dif_pclmul libfc libahci uio crct10dif_common crc32c_intel libiscsi libata scsi_transport_iscsi scsi_transport_fc tg3 qede(OE) scsi_tgt hpsa qed(OE) i2c_core ptp scsi_transport_sas pps_core iscsi_boot_sysfs dm_mirror dm_region_hash dm_log dm_mod
[17501.825292] CPU: 8 PID: 10531 Comm: kworker/u96:1 Tainted: G           OE  ------------   3.10.0-693.el7.x86_64 #1
[17501.825330] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 06/02/2016
[17501.825372] Workqueue: fc_rport_eq fc_rport_work [libfc]
[17501.825395] task: ffff88101bca8000 ti: ffff881025278000 task.ti: ffff881025278000
[17501.825424] RIP: 0010:[<ffffffffc042def9>]  [<ffffffffc042def9>] qedf_unmap_sg_list.isra.15+0x89/0x90 [qedf]
[17501.825471] RSP: 0018:ffff88102527bb98  EFLAGS: 00010212
[17501.825493] RAX: ffff8800224eac00 RBX: ffffc9000cd05210 RCX: 0000000000001000
[17501.825520] RDX: 000000007e655e40 RSI: 0000000000001000 RDI: ffff88107fe3b098
[17501.826683] RBP: ffff88102527bba0 R08: ffffffff81a13200 R09: 0000000000000286
[17501.827747] R10: 0000000000000004 R11: 0000000000000005 R12: ffffc9000cd051b8
[17501.828804] R13: ffff881037640c28 R14: 0000000000000007 R15: ffffc9000cd05200
[17501.829850] FS:  0000000000000000(0000) GS:ffff88103fa00000(0000) knlGS:0000000000000000
[17501.830910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17501.831966] CR2: 00007f9b94005f38 CR3: 00000000019f2000 CR4: 00000000003407e0
[17501.833027] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17501.834087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17501.835142] Stack:
[17501.836201]  ffff881033ddbb80 ffff88102527bc30 ffffffffc042f834 0000000000002710
[17501.837264]  ffff88102527bbd0 ffffffff8133d9dd ffffc9000cd052a0 ffff88102527bc30
[17501.838325]  ffffffff816a9c65 0000000000000001 ffff88101bca8000 ffffffff810c4810
[17501.839388] Call Trace:
[17501.840446]  [<ffffffffc042f834>] qedf_scsi_done+0x54/0x1d0 [qedf]
[17501.841504]  [<ffffffff8133d9dd>] ? list_del+0xd/0x30
[17501.842537]  [<ffffffff816a9c65>] ? wait_for_completion_timeout+0x125/0x140
[17501.843560]  [<ffffffff810c4810>] ? wake_up_state+0x20/0x20
[17501.844577]  [<ffffffffc0430311>] qedf_initiate_cleanup+0x2e1/0x310 [qedf]
[17501.845587]  [<ffffffffc04305fe>] qedf_flush_active_ios+0x10e/0x260 [qedf]
[17501.846612]  [<ffffffffc042892f>] qedf_cleanup_fcport+0x5f/0x370 [qedf]
[17501.847613]  [<ffffffffc04292d8>] qedf_rport_event_handler+0x398/0x950 [qedf]
[17501.848602]  [<ffffffff810cdc7c>] ? dequeue_entity+0x11c/0x5d0
[17501.849581]  [<ffffffff81098a2b>] ? __internal_add_timer+0xab/0x130
[17501.850555]  [<ffffffff810ce54e>] ? dequeue_task_fair+0x41e/0x660
[17501.851528]  [<ffffffffc03241a4>] fc_rport_work+0xf4/0x6c0 [libfc]
[17501.852490]  [<ffffffff810a881a>] process_one_work+0x17a/0x440
[17501.853446]  [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

44c7c859