Commits · a45f937110fa6b0c2c06a5d3ef026963a5759050 · Kirill Smelkov / linux

08 Jun, 2021 6 commits

scsi: ufs: Optimize host lock on transfer requests send/compl paths · a45f9371

Can Guo authored May 24, 2021

Current UFS IRQ handler is completely wrapped by host lock, and because
ufshcd_send_command() is also protected by host lock, when IRQ handler
fires, not only the CPU running the IRQ handler cannot send new requests,
the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
no longer required to call __ufshcd_transfer_req_compl(). As per test, the
optimization can bring considerable gain to random read/write performance.

Link: https://lore.kernel.org/r/1621845419-14194-3-git-send-email-cang@codeaurora.org
Cc: Stanley Chu <stanley.chu@mediatek.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a45f9371

scsi: ufs: Remove a redundant command completion logic in error handler · 1cca0c3f

Can Guo authored May 24, 2021

ufshcd_host_reset_and_restore() anyways completes all pending requests
before starts re-probing, so there is no need to complete the command on
the highest bit in tr_doorbell in advance.

Link: https://lore.kernel.org/r/1621845419-14194-2-git-send-email-cang@codeaurora.orgReviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1cca0c3f

scsi: scsi_dh_alua: Fix signedness bug in alua_rtpg() · 80927822

Dan Carpenter authored Jun 03, 2021

The "retval" variable needs to be signed for the error handling to work.

Link: https://lore.kernel.org/r/YLjMEAFNxOas1mIp@mwanda
Fixes: 7e26e3ea ("scsi: scsi_dh_alua: Check for negative result value")
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

80927822

scsi: ufs: core: Remove irrelevant reference to non-existing doc · 8b1afb7a

Avri Altman authored Jun 03, 2021

Remove all references to the description of __ufshcd_wl_{suspend,resume} as
no such description exist.

Fixes: b294ff3e (scsi: ufs: core: Enable power management for wlun)
Link: https://lore.kernel.org/r/20210603122209.635799-1-avri.altman@wdc.comSigned-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8b1afb7a

scsi: fcoe: Statically initialize flogi_maddr · ebab8e09

Kees Cook authored Jun 02, 2021

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy() avoid using an inline const buffer
argument and instead just statically initialize the destination array
directly.

Link: https://lore.kernel.org/r/20210602180000.3326448-1-keescook@chromium.orgReviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ebab8e09

scsi: qedf: Update the max_id value in host structure · 1b67f3d7

Saurav Kashyap authored Jun 02, 2021

host->max_id defines the maximum target id that the SCSI midlayer will
attempt to manually scan. The default is 8. Update the value to the max
sessions the driver supports.

[mkp: applied by hand]

Link: https://lore.kernel.org/r/20210602104653.17278-1-jhasan@marvell.comReviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1b67f3d7

03 Jun, 2021 2 commits

scsi: core: Change the type of the second argument of scsi_host_complete_all_commands() · 62af0ee9

Bart Van Assche authored May 23, 2021

Allow the compiler to verify the type of the second argument passed to
scsi_host_complete_all_commands().

Link: https://lore.kernel.org/r/20210524025457.11299-4-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

62af0ee9

scsi: core: Introduce enums for the SAM and host status codes · 149d0e48

Bart Van Assche authored May 23, 2021

Make it possible for the compiler to verify whether SAM and host
status codes are used correctly.

[mkp: resolve conflicts with Hannes' SCSI result series]

Link: https://lore.kernel.org/r/20210524025457.11299-3-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

149d0e48

02 Jun, 2021 32 commits

scsi: libsas: Introduce more SAM status code aliases in enum exec_status · d377f415

Bart Van Assche authored May 23, 2021

This patch prepares for converting SAM status codes into an enum. Without
this patch converting SAM status codes into an enumeration type would
trigger complaints about enum type mismatches for the SAS code.

Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d377f415

Merge branch '5.14/scsi-result' into 5.14/scsi-staging · 1ff28f22

Martin K. Petersen authored Jun 02, 2021

Include Hannes' SCSI command result rework in the staging branch.

[mkp: remove DRIVER_SENSE from mpi3mr]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1ff28f22

scsi: qedi: Wake up if cmd_cleanup_req is set · ed1b86ba

Mike Christie authored May 25, 2021

If we got a response then we should always wake up the conn. For both the
cmd_cleanup_req == 0 or cmd_cleanup_req > 0, we shouldn't dig into
iscsi_itt_to_task because we don't know what the upper layers are doing.

We can also remove the qedi_clear_task_idx call here because once we signal
success libiscsi will loop over the affected commands and end up calling
the cleanup_task callout which will release it.

Link: https://lore.kernel.org/r/20210525181821.7617-29-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ed1b86ba

scsi: qedi: Complete TMF works before disconnect · b40f3894

Mike Christie authored May 25, 2021

We need to make sure that abort and reset completion work has completed
before ep_disconnect returns. After ep_disconnect we can't manipulate
cmds because libiscsi will call conn_stop and take onwership.

We are trying to make sure abort work and reset completion work has
completed before we do the cmd clean up in ep_disconnect. The problem is
that:

1. the work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work was
still pending we would not see the bit set. We need to do this before
the work is queued.

2. If we had multiple works queued then we could break from the loop in
qedi_ep_disconnect early because when abort work 1 completes it could
clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that
before work 2 has run.

3. A TMF reset completion work could run after ep_disconnect starts
cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq
-> qedi_cleanup_all_io would might think it's done cleaning up cmds,
but the reset completion work could still be running. We then return
from ep_disconnect while still doing cleanup.

This replaces the bit with a counter to track the number of queued TMF
works, and adds a bool to prevent new works from starting from the
completion path once a ep_disconnect starts.

Link: https://lore.kernel.org/r/20210525181821.7617-28-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b40f3894

scsi: qedi: Pass send_iscsi_tmf task to abort · 60a0d379

Mike Christie authored May 25, 2021

qedi_abort_work knows what task to abort so just pass it to send_iscsi_tmf.

Link: https://lore.kernel.org/r/20210525181821.7617-27-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

60a0d379

scsi: qedi: Fix cleanup session block/unblock use · 0c72191d

Mike Christie authored May 25, 2021

Drivers shouldn't be calling block/unblock session for cmd cleanup because
the functions can change the session state from under libiscsi. This adds
a new a driver level bit so it can block all I/O the host while it drains
the card.

Link: https://lore.kernel.org/r/20210525181821.7617-26-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0c72191d

scsi: qedi: Fix TMF session block/unblock use · 2819b4ae

Mike Christie authored May 25, 2021

Drivers shouldn't be calling block/unblock session for tmf handling because
the functions can change the session state from under libiscsi.
iscsi_queuecommand's call to iscsi_prep_scsi_cmd_pdu->
iscsi_check_tmf_restrictions will prevent new cmds from being sent to qedi
after we've started handling a TMF. So we don't need to try and block it in
the driver, and we can remove these block calls.

Link: https://lore.kernel.org/r/20210525181821.7617-25-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2819b4ae

scsi: qedi: Use GFP_NOIO for TMF allocation · 140d63b7

Mike Christie authored May 25, 2021

We run from a workqueue with no locks held so use GFP_NOIO.

Link: https://lore.kernel.org/r/20210525181821.7617-24-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

140d63b7

scsi: qedi: Fix TMF tid allocation · f7eea752

Mike Christie authored May 25, 2021

qedi_iscsi_abort_work and qedi_tmf_work both allocate a tid then call
qedi_send_iscsi_tmf which also allocates a tid. This removes the tid
allocation from the callers.

Link: https://lore.kernel.org/r/20210525181821.7617-23-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f7eea752

scsi: qedi: Fix use after free during abort cleanup · 5b04d050

Mike Christie authored May 25, 2021

If qedi_tmf_work's qedi_wait_for_cleanup_request call times out we will
also force the clean up of the qedi_work_map but
qedi_process_cmd_cleanup_resp could still be accessing the qedi_cmd.

To fix this issue we extend where we hold the tmf_work_lock and back_lock
so the qedi_process_cmd_cleanup_resp access is serialized with the cleanup
done in qedi_tmf_work and any completion handling for the iscsi_task.

Link: https://lore.kernel.org/r/20210525181821.7617-22-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5b04d050

scsi: qedi: Fix race during abort timeouts · 2ce00236

Mike Christie authored May 25, 2021

If the SCSI cmd completes after qedi_tmf_work calls iscsi_itt_to_task then
the qedi qedi_cmd->task_id could be freed and used for another cmd. If we
then call qedi_iscsi_cleanup_task with that task_id we will be cleaning up
the wrong cmd.

Wait to release the task_id until the last put has been done on the
iscsi_task. Because libiscsi grabs a ref to the task when sending the
abort, we know that for the non-abort timeout case that the task_id we are
referencing is for the cmd that was supposed to be aborted.

A latter commit will fix the case where the abort times out while we are
running qedi_tmf_work.

Link: https://lore.kernel.org/r/20210525181821.7617-21-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2ce00236

scsi: qedi: Fix null ref during abort handling · 5777b7f0

Mike Christie authored May 25, 2021

If qedi_process_cmd_cleanup_resp finds the cmd it frees the work and sets
list_tmf_work to NULL, so qedi_tmf_work should check if list_tmf_work is
non-NULL when it wants to force cleanup.

Link: https://lore.kernel.org/r/20210525181821.7617-20-michael.christie@oracle.comReviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

5777b7f0

scsi: iscsi: Move pool freeing · a1f3486b

Mike Christie authored May 25, 2021

This doesn't fix any bugs, but it makes more sense to free the pool after
we have removed the session. At that time we know nothing is touching any
of the session fields, because all devices have been removed and scans are
stopped.

Link: https://lore.kernel.org/r/20210525181821.7617-19-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a1f3486b

scsi: iscsi: Hold task ref during TMF timeout handling · 99b06033

Mike Christie authored May 25, 2021

For aborts, qedi needs to cleanup the FW then send the TMF from a worker
thread. While it's doing these the cmd could complete normally and the TMF
could time out. libiscsi would then complete the iscsi_task which will call
into the driver to cleanup the driver level resources while it still might
be accessing them for the cleanup/abort.

This has iscsi_eh_abort keep the iscsi_task ref if the TMF times out, so
qedi does not have to worry about if the task is being freed while in use
and does not need to get its own ref.

Link: https://lore.kernel.org/r/20210525181821.7617-18-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

99b06033

scsi: iscsi: Flush block work before unblock · 7ce9fc5e

Mike Christie authored May 25, 2021

We set the max_active iSCSI EH works to 1, so all work is going to execute
in order by default. However, userspace can now override this in sysfs. If
max_active > 1, we can end up with the block_work on CPU1 and
iscsi_unblock_session running the unblock_work on CPU2 and the session and
target/device state will end up out of sync with each other.

This adds a flush of the block_work in iscsi_unblock_session.

Link: https://lore.kernel.org/r/20210525181821.7617-17-michael.christie@oracle.com
Fixes: 1d726aa6 ("scsi: iscsi: Optimize work queue flush use")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7ce9fc5e

scsi: iscsi: Fix completion check during abort races · f6f96457

Mike Christie authored May 25, 2021

We have a ref to the task being aborted, so SCp.ptr will never be NULL. We
need to use iscsi_task_is_completed to check for the completed state.

Link: https://lore.kernel.org/r/20210525181821.7617-16-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

f6f96457

scsi: iscsi: Fix shost->max_id use · bdd4aad7

Mike Christie authored May 25, 2021

The iscsi offload drivers are setting the shost->max_id to the max number
of sessions they support. The problem is that max_id is not the max number
of targets but the highest identifier the targets can have. To use it to
limit the number of targets we need to set it to max sessions - 1, or we
can end up with a session we might not have preallocated resources for.

Link: https://lore.kernel.org/r/20210525181821.7617-15-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

bdd4aad7

scsi: iscsi: Fix conn use after free during resets · ec29d0ac

Mike Christie authored May 25, 2021

If we haven't done a unbind target call we can race where
iscsi_conn_teardown wakes up the EH thread and then frees the conn while
those threads are still accessing the conn ehwait.

We can only do one TMF per session so this just moves the TMF fields from
the conn to the session. We can then rely on the
iscsi_session_teardown->iscsi_remove_session->__iscsi_unbind_session call
to remove the target and it's devices, and know after that point there is
no device or scsi-ml callout trying to access the session.

Link: https://lore.kernel.org/r/20210525181821.7617-14-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

ec29d0ac

scsi: iscsi: Get ref to conn during reset handling · fda290c5

Mike Christie authored May 25, 2021

The comment in iscsi_eh_session_reset is wrong and we don't wait for the
EH to complete before tearing down the conn. This has us get a ref to the
conn when we are not holding the eh_mutex/frwd_lock so it does not get
freed from under us.

Link: https://lore.kernel.org/r/20210525181821.7617-13-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

fda290c5

scsi: iscsi: Have abort handler get ref to conn · d39df158

Mike Christie authored May 25, 2021

If SCSI midlayer is aborting a task when we are tearing down the conn we
could free the conn while the abort thread is accessing the conn. This has
the abort handler get a ref to the conn so it won't be freed from under it.

Note: this is not needed for device/target reset because we are holding the
eh_mutex when accessing the conn.

Link: https://lore.kernel.org/r/20210525181821.7617-12-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

d39df158

scsi: iscsi: Add iscsi_cls_conn refcount helpers · b1d19e8c

Mike Christie authored May 25, 2021

There are a couple places where we could free the iscsi_cls_conn while it's
still in use. This adds some helpers to get/put a refcount on the struct
and converts an exiting user. Subsequent commits will then use the helpers
to fix 2 bugs in the eh code.

Link: https://lore.kernel.org/r/20210525181821.7617-11-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b1d19e8c

scsi: iscsi: iscsi_tcp: Start socket shutdown during conn stop · 788b71c5

Mike Christie authored May 25, 2021

Make sure the conn socket shutdown starts before we start the timer to fail
commands to upper layers.

Link: https://lore.kernel.org/r/20210525181821.7617-10-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

788b71c5

scsi: iscsi: iscsi_tcp: Set no linger · c0920cd3

Mike Christie authored May 25, 2021

Userspace (open-iscsi based tools at least) sets no linger on the socket to
prevent stale data from being sent. However, with the in-kernel cleanup if
userspace is not up the sockfd_put will release the socket without having
set that sockopt.

iscsid sets that opt at socket close time, but it seems ok to set this at
setup time in the kernel for all tools.

Link: https://lore.kernel.org/r/20210525181821.7617-9-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

c0920cd3

scsi: iscsi: Fix in-kernel conn failure handling · 23d6fefb

Mike Christie authored May 25, 2021

Commit 0ab71045 ("scsi: iscsi: Perform connection failure entirely in
kernel space") has the following regressions/bugs that this patch fixes:

1. It can return cmds to upper layers like dm-multipath where that can
retry them. After they are successful the fs/app can send new I/O to the
same sectors, but we've left the cmds running in FW or in the net layer.
We need to be calling ep_disconnect if userspace is not up.

This patch only fixes the issue for offload drivers. iscsi_tcp will be
fixed in separate commit because it doesn't have a ep_disconnect call.

2. The drivers that implement ep_disconnect expect that it's called before
conn_stop. Besides crashes, if the cleanup_task callout is called before
ep_disconnect it might free up driver/card resources for session1 then they
could be allocated for session2. But because the driver's ep_disconnect is
not called it has not cleaned up the firmware so the card is still using
the resources for the original cmd.

3. The stop_conn_work_fn can run after userspace has done its recovery and
we are happily using the session. We will then end up with various bugs
depending on what is going on at the time.

We may also run stop_conn_work_fn late after userspace has called stop_conn
and ep_disconnect and is now going to call start/bind conn. If
stop_conn_work_fn runs after bind but before start, we would leave the conn
in a unbound but sort of started state where IO might be allowed even
though the drivers have been set in a state where they no longer expect
I/O.

4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in
kernel stop_conn function is breaking userspace. We should have been doing
this for the caller.

Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com
Fixes: 0ab71045 ("scsi: iscsi: Perform connection failure entirely in kernel space")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

23d6fefb

scsi: iscsi: Rel ref after iscsi_lookup_endpoint() · 9e5fe170

Mike Christie authored May 25, 2021

Subsequent commits allow the kernel to do ep_disconnect. In that case we
will have to get a proper refcount on the ep so one thread does not delete
it from under another.

Link: https://lore.kernel.org/r/20210525181821.7617-7-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9e5fe170

scsi: iscsi: Use system_unbound_wq for destroy_work · b25b957d

Mike Christie authored May 25, 2021

Use the system_unbound_wq for async session destruction. We don't need a
dedicated workqueue for async session destruction because:

1. perf does not seem to be an issue since we only allow 1 active work.

2. it does not have deps with other system works and we can run them in
parallel with each other.

Link: https://lore.kernel.org/r/20210525181821.7617-6-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b25b957d

scsi: iscsi: Force immediate failure during shutdown · 06c203a5

Mike Christie authored May 25, 2021

If the system is not up, we can just fail immediately since iscsid is not
going to ever answer our netlink events. We are already setting the
recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will
block the session and start the recovery timer, because for that flag
userspace will do the unbind and destroy events which would remove the
devices and wake up and kill the eh.

Since the conn is dead and the system is going dowm this just has us use
STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately. However, if
the user has set the recovery_tmo=-1 we let the system hang like they
requested since they might have used that setting for specific reasons
(one known reason is for buggy cluster software).

Link: https://lore.kernel.org/r/20210525181821.7617-5-michael.christie@oracle.comSigned-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

06c203a5

scsi: iscsi: Drop suspend calls from ep_disconnect · 27e98628

Mike Christie authored May 25, 2021

libiscsi will now suspend the send/tx queue for the drivers so we can drop
it from the drivers ep_disconnect.

Link: https://lore.kernel.org/r/20210525181821.7617-4-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

27e98628

scsi: iscsi: Stop queueing during ep_disconnect · 891e2639

Mike Christie authored May 25, 2021

During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new
I/O but every driver except cxgbi and iscsi_tcp can still get I/O from
__iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before
ep_disconnect. This could happen if we were terminating the session, and
the logout timed out before it was even sent to libiscsi.

Fix the issue by adding a helper which reverses the bind_conn call that
allows new I/O to be queued. Drivers implementing ep_disconnect can use this
to make sure new I/O is not queued to them when handling the disconnect.

Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

891e2639

scsi: iscsi: Add task completion helper · 1486a4f5

Mike Christie authored May 25, 2021

This adds a helper to detect if a cmd has completed but is not yet freed.

Link: https://lore.kernel.org/r/20210525181821.7617-2-michael.christie@oracle.comReviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

1486a4f5

scsi: megaraid_sas: Update driver version to 07.717.02.00-rc1 · 6143f6f6

Chandrakanth Patil authored May 28, 2021

Link: https://lore.kernel.org/r/20210528131307.25683-6-chandrakanth.patil@broadcom.comSigned-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

6143f6f6

scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs · 9bedd36e

Chandrakanth Patil authored May 28, 2021

While reenabling the IRQ after IRQ poll there may be a small window for the
firmware to post the replies with interrupts raised. In that case the
driver will not see the interrupts which leads to I/O timeout.

This issue only happens when there are many I/O completions on a single
reply queue. This forces the driver to switch between the interrupt and IRQ
context.

Make the driver process the reply queue one more time after enabling the
IRQ.

Link: https://lore.kernel.org/linux-scsi/20201102072746.27410-1-sreekanth.reddy@broadcom.com/
Link: https://lore.kernel.org/r/20210528131307.25683-5-chandrakanth.patil@broadcom.com
Cc: Tomas Henzl <thenzl@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

9bedd36e