Commits · 948afc69615167a3c82430f99bfd046332b89912 · Kirill Smelkov / linux

25 Apr, 2023 3 commits

scsi: ufs: core: mcq: Fix &hwq->cq_lock deadlock issue · 948afc69

Alice Chao authored Apr 24, 2023

When ufshcd_err_handler() is executed, CQ event interrupt can enter waiting
for the same lock. This can happen in ufshcd_handle_mcq_cq_events() and
also in ufs_mtk_mcq_intr(). The following warning message will be generated
when &hwq->cq_lock is used in IRQ context with IRQ enabled. Use
ufshcd_mcq_poll_cqe_lock() with spin_lock_irqsave instead of spin_lock to
resolve the deadlock issue.

[name:lockdep&]WARNING: inconsistent lock state
[name:lockdep&]--------------------------------
[name:lockdep&]inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[name:lockdep&]kworker/u16:4/260 [HC0[0]:SC0[0]:HE1:SE1] takes:
  ffffff8028444600 (&hwq->cq_lock){?.-.}-{2:2}, at:
ufshcd_mcq_poll_cqe_lock+0x30/0xe0
[name:lockdep&]{IN-HARDIRQ-W} state was registered at:
  lock_acquire+0x17c/0x33c
  _raw_spin_lock+0x5c/0x7c
  ufshcd_mcq_poll_cqe_lock+0x30/0xe0
  ufs_mtk_mcq_intr+0x60/0x1bc [ufs_mediatek_mod]
  __handle_irq_event_percpu+0x140/0x3ec
  handle_irq_event+0x50/0xd8
  handle_fasteoi_irq+0x148/0x2b0
  generic_handle_domain_irq+0x4c/0x6c
  gic_handle_irq+0x58/0x134
  call_on_irq_stack+0x40/0x74
  do_interrupt_handler+0x84/0xe4
  el1_interrupt+0x3c/0x78
<snip>

Possible unsafe locking scenario:
       CPU0
       ----
  lock(&hwq->cq_lock);
  <Interrupt>
    lock(&hwq->cq_lock);
  *** DEADLOCK ***
2 locks held by kworker/u16:4/260:

[name:lockdep&]
 stack backtrace:
CPU: 7 PID: 260 Comm: kworker/u16:4 Tainted: G S      W  OE
6.1.17-mainline-android14-2-g277223301adb #1
Workqueue: ufs_eh_wq_0 ufshcd_err_handler

 Call trace:
  dump_backtrace+0x10c/0x160
  show_stack+0x20/0x30
  dump_stack_lvl+0x98/0xd8
  dump_stack+0x20/0x60
  print_usage_bug+0x584/0x76c
  mark_lock_irq+0x488/0x510
  mark_lock+0x1ec/0x25c
  __lock_acquire+0x4d8/0xffc
  lock_acquire+0x17c/0x33c
  _raw_spin_lock+0x5c/0x7c
  ufshcd_mcq_poll_cqe_lock+0x30/0xe0
  ufshcd_poll+0x68/0x1b0
  ufshcd_transfer_req_compl+0x9c/0xc8
  ufshcd_err_handler+0x3bc/0xea0
  process_one_work+0x2f4/0x7e8
  worker_thread+0x234/0x450
  kthread+0x110/0x134
  ret_from_fork+0x10/0x20

Fixes: ed975065 ("scsi: ufs: core: mcq: Add completion support in poll")
Reviewed-by: Can Guo <quic_cang@quicinc.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Alice Chao <alice.chao@mediatek.com>
Link: https://lore.kernel.org/r/20230424080400.8955-1-alice.chao@mediatek.comReviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

948afc69

scsi: ipr: Remove several unused variables · 392e4daa

Tom Rix authored Apr 20, 2023

gcc with W=1 reports
drivers/scsi/ipr.c: In function ‘ipr_init_res_entry’:
drivers/scsi/ipr.c:1104:22: error: variable ‘proto’
  set but not used [-Werror=unused-but-set-variable]
 1104 |         unsigned int proto;
      |                      ^~~~~
drivers/scsi/ipr.c: In function ‘ipr_update_res_entry’:
drivers/scsi/ipr.c:1261:22: error: variable ‘proto’
  set but not used [-Werror=unused-but-set-variable]
 1261 |         unsigned int proto;
      |                      ^~~~~
drivers/scsi/ipr.c: In function ‘ipr_change_queue_depth’:
drivers/scsi/ipr.c:4417:36: error: variable ‘res’
  set but not used [-Werror=unused-but-set-variable]
 4417 |         struct ipr_resource_entry *res;
      |                                    ^~~

These variables are not used, so remove them. The lock around res is not
needed so remove that. This makes ioa_cfg and lock_flags unneeded so remove
them as well.

Fixes: 65a15d65 ("scsi: ipr: Remove SATA support")
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230420125035.3888188-1-trix@redhat.comAcked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

392e4daa

scsi: pm80xx: Log device registration · 81221ab7

Akshat Jain authored Apr 11, 2023

Log combination of phy_id and device_id in device registration response.
Signed-off-by: Akshat Jain <akshatzen@google.com>
Signed-off-by: Pranav Prasad <pranavpp@google.com>
Link: https://lore.kernel.org/r/20230411230650.1760757-1-pranavpp@google.comAcked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

81221ab7

19 Apr, 2023 2 commits

scsi: ipr: Remove SATA support · 65a15d65

Brian King authored Apr 12, 2023

Linux SATA support in ipr has always been limited to SATA DVDs. The last
systems that had the option of including a SATA DVD was Power 8, which have
been withdrawn for some time now, so this support can be removed.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20230412174015.114764-1-brking@linux.vnet.ibm.comReviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

65a15d65

scsi: scsi_debug: Abort commands from scsi_debug_device_reset() · 0c028b6a

John Garry authored Apr 16, 2023

Currently scsi_debug_device_reset() does not do much apart from setting the
SDEBUG_UA_POR ("Power on, reset, or bus device reset") flag, which is
eventually passed back to the SCSI midlayer later for a "unit attention"
command.

There is a report that blktest scsi/007 test fails due to commit
1107c7b2 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd").
The problem there is that there are dangling scsi_debug queued commands
when we attempt to remove the driver.

scsi/007 test triggers SCSI EH and attempts to abort a timed-out command.
Function scsi_debug_device_reset() is called as part of the EH, but does
not deal with outstanding erroneous command. Prior to the named commit,
removing the driver caused all dangling queued commands to be stopped -
this should have not been necessary.

Fix by aborting outstanding commands on a scsi_device basis from
scsi_debug_device_reset().

Fixes: 1107c7b2 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@intel.comSigned-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230416175654.159163-1-john.g.garry@oracle.comReviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

0c028b6a

12 Apr, 2023 11 commits

scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command() · 3c85f087

Avri Altman authored Mar 29, 2023

Make sqe_base_addr the UTRD pointer it is, instead of an opaque void *.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20230329101303.18377-3-avri.altman@wdc.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

3c85f087

scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately · 4de243c4

Avri Altman authored Mar 29, 2023

Allow Sparse and such to know that the hwq lock should be held here.
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Link: https://lore.kernel.org/r/20230329101303.18377-2-avri.altman@wdc.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

4de243c4

scsi: cxlflash: s/semahpore/semaphore/ · cabb6374

Geert Uytterhoeven authored Apr 11, 2023

Fix misspellings of "semaphore".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/d7d04004b818d7ab5d62002f286b0a1b0b493193.1681208251.git.geert+renesas@glider.beSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

cabb6374

scsi: lpfc: Silence an incorrect device output · 8bfb89f6

Jun Chen authored Apr 10, 2023

In lpfc_sli4_pci_mem_unset(), case LPFC_SLI_INTF_IF_TYPE_1 does not have a
break statement, resulting in an incorrect device output.

Fix this by adding a break statement before the default option.
Signed-off-by: Jun Chen <jun_c@hust.edu.cn>
Link: https://lore.kernel.org/r/20230410023724.3209455-1-jun_c@hust.edu.cnReviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

8bfb89f6

scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation · 2acc635a

Ranjan Kumar authored Apr 06, 2023

Driver uses spin lock without irqsave when it needs to acquire a chain
frame. This is done to protect chain frame allocation from multiple
submission threads. If there is any I/O queued from an interrupt context,
and if that requires a chain frame, and if the chain lock is held by the CPU
which got interrupted, then there will be a possible deadlock.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230406101819.10109-1-ranjan.kumar@broadcom.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

2acc635a

scsi: scsi_debug: Fix missing error code in scsi_debug_init() · b32283d7

Harshit Mogalapalli authored Apr 06, 2023

Smatch reports: drivers/scsi/scsi_debug.c:6996
scsi_debug_init() warn: missing error code 'ret'

Although it is unlikely that KMEM_CACHE might fail, but if it does then ret
might be zero. So to fix this explicitly mark ret as "-ENOMEM" and then
goto driver_unreg.

Fixes: 1107c7b2 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20230406074607.3637097-1-harshit.m.mogalapalli@oracle.comReviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

b32283d7

scsi: hisi_sas: Work around build failure in suspend function · e01e2290

Arnd Bergmann authored Apr 05, 2023

The suspend/resume functions in this driver seem to have multiple problems,
the latest one just got introduced by a bugfix:

drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw':
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count'
5142 | if (atomic_read(&device->power.usage_count)) {

As far as I can tell, the 'usage_count' is not meant to be accessed by
device drivers at all, though I don't know what the driver is supposed to
do instead.

Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and
marking functions as __maybe_unused to avoid warnings about unused
functions. This should probably be changed to using
DEFINE_RUNTIME_DEV_PM_OPS().

Both changes require actually understanding what the driver needs to do,
and being able to test this, so instead here is the simplest patch to make
it pass the randconfig builds instead.

Fixes: e368d38c ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.orgReviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

e01e2290

scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup() · 91a0c0c1

Shuchang Li authored Apr 04, 2023

When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns false, drbl_regs_memmap_p is not remapped. This passes a NULL
pointer to iounmap(), which can trigger a WARN() on certain arches.

When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4)
returns true, drbl_regs_memmap_p may has been remapped and
ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a
NULL pointer to iounmap().

To fix these issues, we need to add null checks before iounmap(), and
change some goto labels.

Fixes: 1351e69f ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Shuchang Li <lishuchang@hust.edu.cn>
Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cnReviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

91a0c0c1

scsi: mpt3sas: Fix an issue when driver is being removed · 85140baf

Tomas Henzl authored Apr 03, 2023

Warnings may be logged during driver removal:

mpt3sas 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT ..,

Fix this by deallocating DMA memory later.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Link: https://lore.kernel.org/r/20230403184736.6399-1-thenzl@redhat.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

85140baf

scsi: mpt3sas: Remove HBA BIOS version in the kernel log · 3fc5d6d6

Ranjan Kumar authored Mar 22, 2023

This is done to avoid ambiguity between BIOS and UEFI versions. Management
tools can be used for getting accurate firmware version information.
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230322092713.6961-1-ranjan.kumar@broadcom.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

3fc5d6d6

scsi: target: core: Fix invalid memory access · a0fde512

Maurizio Lombardi authored Apr 07, 2023

nr_attrs should start counting from zero, otherwise we will end up
dereferencing an invalid memory address.

$ targetcli /loopback create

 general protection fault
 RIP: 0010:configfs_create_file+0x12/0x70
 Call Trace:
  <TASK>
  configfs_attach_item.part.0+0x5f/0x150
  configfs_attach_group.isra.0+0x49/0x120
  configfs_mkdir+0x24f/0x4d0
  vfs_mkdir+0x192/0x240
  do_mkdirat+0x131/0x160
  __x64_sys_mkdir+0x48/0x70
  do_syscall_64+0x5c/0x90

Fixes: 31177b74 ("scsi: target: core: Add RTPI attribute for target port")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Link: https://lore.kernel.org/r/20230407130033.556644-1-mlombard@redhat.comAcked-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

a0fde512

03 Apr, 2023 24 commits

Merge patch series "Fix shost command overloading issues" · dc70c961

Martin K. Petersen authored Apr 02, 2023

John Garry <john.g.garry@oracle.com> says:

It's easy to get scsi_debug to error on throughput testing when we have
multiple shosts:

$ lsscsi
[7:0:0:0]       disk    Linux   scsi_debug      0191
[0:0:0:0]       disk    Linux   scsi_debug      0191

$ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read
--bs=4k --iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg
--eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error
jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.28
Starting 40 processes
[   27.521809] hrtimer: interrupt took 33067 ns
[   27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[   27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
fio: io_u error [   27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00
on file /dev/sda[   27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s

The issue is related to how the driver manages submit queues and tags. A
single array of submit queues - sdebug_q_arr - with its own set of tags is
shared among all shosts. As such, for occasions when we have more than one
host it is possible to overload the submit queues and run out of tags.

Another separate issue that we may reduce the shost submit queue depth,
sdebug_max_queue, dynamically causing the shost to be overloaded. How many
IOs which the shost may be sent is fixed at can_queue at init time, which
is the same initial value for sdebug_max_queue. So reducing
sdebug_max_queue means that the shost may be sent more IOs than it is
configured to handle, causing overloading.

This series removes the scsi_debug submit queue concept and uses
pre-existing APIs to manage and examine tags, like scsi_block_requests()
and blk_mq_tagset_busy_iter(). Using standard APIs makes the driver more
maintainable and extensible in future.

A restriction is also added to allow sdebug_max_queue only be modified when
no shosts are present, i.e. we need to remove shosts, modify
sdebug_max_queue, and then re-add the shosts.

Link: https://lore.kernel.org/r/20230327074310.1862889-1-john.g.garry@oracle.comSigned-off-by: Martin K. Petersen <martin.petersen@oracle.com>

dc70c961

scsi: scsi_debug: Drop sdebug_queue · f1437cd1