- 26 Feb, 2019 3 commits
-
-
Hannes Reinecke authored
The change to use dma_set_mask_and_coherent() incorrectly made a second call with the 32 bit DMA mask value when the call with the 64 bit DMA mask value succeeded. Fixes: b1fa1229 ("scsi: 3w-sas: fully convert to the generic DMA API") Cc: <stable@vger.kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Hannes Reinecke authored
The change to use dma_set_mask_and_coherent() incorrectly made a second call with the 32 bit DMA mask value when the call with the 64 bit DMA mask value succeeded. Fixes: b000bced ("scsi: 3w-9xxx: fully convert to the generic DMA API") Cc: <stable@vger.kernel.org> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Hannes Reinecke authored
The change to use dma_set_mask_and_coherent() incorrectly made a second call with the 32 bit DMA mask value when the call with the 64 bit DMA mask value succeeded. This resulted in NVMe/FC connections failing due to corrupted data buffers, and various other SCSI/FCP I/O errors. Fixes: f30e1bfd ("scsi: lpfc: use dma_set_mask_and_coherent") Cc: <stable@vger.kernel.org> Suggested-by: Don Dutile <ddutile@redhat.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 16 Feb, 2019 4 commits
-
-
Martin Wilck authored
Up to 4.12, __scsi_error_from_host_byte() would reset the host byte to DID_OK for various cases including DID_NEXUS_FAILURE. Commit 2a842aca ("block: introduce new block status code type") replaced this function with scsi_result_to_blk_status() and removed the host-byte resetting code for the DID_NEXUS_FAILURE case. As the line set_host_byte(cmd, DID_OK) was preserved for the other cases, I suppose this was an editing mistake. The fact that the host byte remains set after 4.13 is causing problems with the sg_persist tool, which now returns success rather then exit status 24 when a RESERVATION CONFLICT error is encountered. Fixes: 2a842aca "block: introduce new block status code type" Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
John Garry authored
The sysfs phy_identifier attribute for a sas_end_device comes from the rphy phy_identifier value. Currently this is not being set for rphys with an end device attached, so we see incorrect symlinks from systemd disk/by-path: root@localhost:~# ls -l /dev/disk/by-path/ total 0 lrwxrwxrwx 1 root root 9 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0 -> ../../sdb lrwxrwxrwx 1 root root 10 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0-part1 -> ../../sdb1 lrwxrwxrwx 1 root root 10 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0-part2 -> ../../sdb2 lrwxrwxrwx 1 root root 10 Feb 13 12:26 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy0-lun-0-part3 -> ../../sdc3 Indeed, each sas_end_device phy_identifier value is 0: root@localhost:/# more sys/class/sas_device/end_device-0\:0\:2/phy_identifier 0 root@localhost:/# more sys/class/sas_device/end_device-0\:0\:10/phy_identifier 0 This patch fixes the discovery code to set the phy_identifier. With this, we now get proper symlinks: root@localhost:~# ls -l /dev/disk/by-path/ total 0 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy10-lun-0 -> ../../sdg lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy11-lun-0 -> ../../sdh lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy2-lun-0 -> ../../sda lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy2-lun-0-part1 -> ../../sda1 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy3-lun-0 -> ../../sdb lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy3-lun-0-part1 -> ../../sdb1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy3-lun-0-part2 -> ../../sdb2 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0 -> ../../sdc lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0-part1 -> ../../sdc1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0-part2 -> ../../sdc2 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy4-lun-0-part3 -> ../../sdc3 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy5-lun-0 -> ../../sdd lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0 -> ../../sde lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0-part1 -> ../../sde1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0-part2 -> ../../sde2 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy7-lun-0-part3 -> ../../sde3 lrwxrwxrwx 1 root root 9 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0 -> ../../sdf lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0-part1 -> ../../sdf1 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0-part2 -> ../../sdf2 lrwxrwxrwx 1 root root 10 Feb 13 11:53 platform-HISI0162:01-sas-exp0x500e004aaaaaaa1f-phy8-lun-0-part3 -> ../../sdf3 Fixes: 2908d778 ("[SCSI] aic94xx: new driver") Reported-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Tested-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Masato Suzuki authored
The function sd_zbc_do_report_zones() issues a REPORT ZONES command with a buffer size calculated based on the number of zones requested by the caller. This value should however not exceed the capabilities of the hardware maximum command size, that is, should not exceed the max_hw_sectors limit of the device. This problem leads to failures of report zones commands when re-validating disks with some SAS HBAs. Fix this by limiting a report zone command buffer size to the minimum of the device max_hw_sectors and calculated value based on the requested number of zones. This does not change the semantic of the report_zones file operation as report zones can always return less zone reports than requested. Short reports are handled using a loop execution of the report_zones file operation in the function blk_report_zones(). [Damien] Before patch 'e76239a3 ("block: add a report_zones method")', report zones buffer allocation was limited to max_sectors when allocated in blk_report_zones(). This however does not consider the actual format of the device reply which is interface dependent. Limiting the allocation based on the size of the expected reply format rather than the size of the array of generic sturct blkzone passed by blk_report_zones() makes more sense. Fixes: e76239a3 ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Anoob Soman authored
When a target sends Check Condition, whilst initiator is busy xmiting re-queued data, could lead to race between iscsi_complete_task() and iscsi_xmit_task() and eventually crashing with the following kernel backtrace. [3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 [3326150.987549] ALERT: IP: [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi] [3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0 [3326150.987582] WARN: Oops: 0002 [#1] SMP [3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin [3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1 [3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016 [3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi] [3326150.987799] WARN: task: ffff8801d50f3800 ti: ffff8801f5458000 task.ti: ffff8801f5458000 [3326150.987810] WARN: RIP: e030:[<ffffffffa05ce70d>] [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi] [3326150.987825] WARN: RSP: e02b:ffff8801f545bdb0 EFLAGS: 00010246 [3326150.987831] WARN: RAX: 00000000ffffffc3 RBX: ffff880282d2ab20 RCX: ffff88026b6ac480 [3326150.987842] WARN: RDX: 0000000000000000 RSI: 00000000fffffe01 RDI: ffff880282d2ab20 [3326150.987852] WARN: RBP: ffff8801f545bdc8 R08: 0000000000000000 R09: 0000000000000008 [3326150.987862] WARN: R10: 0000000000000000 R11: 000000000000fe88 R12: 0000000000000000 [3326150.987872] WARN: R13: ffff880282d2abe8 R14: ffff880282d2abd8 R15: ffff880282d2ac08 [3326150.987890] WARN: FS: 00007f5a866b4840(0000) GS:ffff88028a640000(0000) knlGS:0000000000000000 [3326150.987900] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [3326150.987907] WARN: CR2: 0000000000000078 CR3: 0000000070244000 CR4: 0000000000042660 [3326150.987918] WARN: Stack: [3326150.987924] WARN: ffff880282d2ad58 ffff880282d2ab20 ffff880282d2abe8 ffff8801f545be18 [3326150.987938] WARN: ffffffffa05cea90 ffff880282d2abf8 ffff88026b59cc80 ffff88026b59cc00 [3326150.987951] WARN: ffff88022acf32c0 ffff880289491800 ffff880255a80800 0000000000000400 [3326150.987964] WARN: Call Trace: [3326150.987975] WARN: [<ffffffffa05cea90>] iscsi_xmitworker+0x2f0/0x360 [libiscsi] [3326150.987988] WARN: [<ffffffff8108862c>] process_one_work+0x1fc/0x3b0 [3326150.987997] WARN: [<ffffffff81088f95>] worker_thread+0x2a5/0x470 [3326150.988006] WARN: [<ffffffff8159cad8>] ? __schedule+0x648/0x870 [3326150.988015] WARN: [<ffffffff81088cf0>] ? rescuer_thread+0x300/0x300 [3326150.988023] WARN: [<ffffffff8108ddf5>] kthread+0xd5/0xe0 [3326150.988031] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110 [3326150.988040] WARN: [<ffffffff815a0bcf>] ret_from_fork+0x3f/0x70 [3326150.988048] WARN: [<ffffffff8108dd20>] ? kthread_stop+0x110/0x110 [3326150.988127] ALERT: RIP [<ffffffffa05ce70d>] iscsi_xmit_task+0x2d/0xc0 [libiscsi] [3326150.988138] WARN: RSP <ffff8801f545bdb0> [3326150.988144] WARN: CR2: 0000000000000078 [3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]--- Commit 6f8830f5 ("scsi: libiscsi: add lock around task lists to fix list corruption regression") introduced "taskqueuelock" to fix list corruption during the race, but this wasn't enough. Re-setting of conn->task to NULL, could race with iscsi_xmit_task(). iscsi_complete_task() { .... if (conn->task == task) conn->task = NULL; } conn->task in iscsi_xmit_task() could be NULL and so will be task. __iscsi_get_task(task) will crash (NullPtr de-ref), trying to access refcount. iscsi_xmit_task() { struct iscsi_task *task = conn->task; __iscsi_get_task(task); } This commit will take extra conn->session->back_lock in iscsi_xmit_task() to ensure iscsi_xmit_task() waits for iscsi_complete_task(), if iscsi_complete_task() wins the race. If iscsi_xmit_task() wins the race, iscsi_xmit_task() increments task->refcount (__iscsi_get_task) ensuring iscsi_complete_task() will not iscsi_free_task(). Signed-off-by: Anoob Soman <anoob.soman@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 13 Feb, 2019 2 commits
-
-
Bill Kuzeja authored
In qla2x00_async_tm_cmd, we reference off sp after it has been freed. This caused a panic on a system running a slub debug kernel. Since fcport is passed in anyways, just use that instead. Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com> Acked-by: Giridhar Malavali <gmalavali@marvell.com> Acked-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
James Bottomley authored
The problem is that the default for MQ is not to gather entropy, whereas the default for the legacy queue was always to gather it. The original attempt to fix entropy gathering for rotational disks under MQ added an else branch in sd_read_block_characteristics(). Unfortunately, the entire check isn't reached if the device has no characteristics VPD page. Since this page was only introduced in SBC-3 and its optional anyway, most less expensive rotational disks don't have one, meaning they all stopped gathering entropy when we made MQ the default. In a wholly unrelated change, openssl and openssh won't function until the random number generator is initialised, meaning lots of people have been seeing large delays before they could log into systems with default MQ kernels due to this lack of entropy, because it now can take tens of minutes to initialise the kernel random number generator. The fix is to set the non-rotational and add-randomness flags unconditionally early on in the disk initialization path, so they can be reset only if the device actually reports being non-rotational via the VPD page. Reported-by: Mikael Pettersson <mikpelinux@gmail.com> Fixes: 83e32a59 ("scsi: sd: Contribute to randomness when running rotational device") Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Xuewei Zhang <xueweiz@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 05 Feb, 2019 4 commits
-
-
Vaibhav Jain authored
Presently when an error is encountered during probe of the cxlflash adapter, a deadlock is seen with cpu thread stuck inside cxlflash_remove(). Below is the trace of the deadlock as logged by khungtaskd: cxlflash 0006:00:00.0: cxlflash_probe: init_afu failed rc=-16 INFO: task kworker/80:1:890 blocked for more than 120 seconds. Not tainted 5.0.0-rc4-capi2-kexec+ #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/80:1 D 0 890 2 0x00000808 Workqueue: events work_for_cpu_fn Call Trace: 0x4d72136320 (unreliable) __switch_to+0x2cc/0x460 __schedule+0x2bc/0xac0 schedule+0x40/0xb0 cxlflash_remove+0xec/0x640 [cxlflash] cxlflash_probe+0x370/0x8f0 [cxlflash] local_pci_probe+0x6c/0x140 work_for_cpu_fn+0x38/0x60 process_one_work+0x260/0x530 worker_thread+0x280/0x5d0 kthread+0x1a8/0x1b0 ret_from_kernel_thread+0x5c/0x80 INFO: task systemd-udevd:5160 blocked for more than 120 seconds. The deadlock occurs as cxlflash_remove() is called from cxlflash_probe() without setting 'cxlflash_cfg->state' to STATE_PROBED and the probe thread starts to wait on 'cxlflash_cfg->reset_waitq'. Since the device was never successfully probed the 'cxlflash_cfg->state' never changes from STATE_PROBING hence the deadlock occurs. We fix this deadlock by setting the variable 'cxlflash_cfg->state' to STATE_PROBED in case an error occurs during cxlflash_probe() and just before calling cxlflash_remove(). Cc: stable@vger.kernel.org Fixes: c21e0bbf("cxlflash: Base support for IBM CXL Flash Adapter") Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ross Lagerwall authored
This reverts commit bbc0f8bd. It added a warning whose intent was to check whether the rport was still linked into the peer list. It doesn't work as intended and gives false positive warnings for two reasons: 1) If the rport is never linked into the peer list it will not be considered empty since the list_head is never initialized. 2) If the rport is deleted from the peer list using list_del_rcu(), then the list_head is in an undefined state and it is not considered empty. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Damien Le Moal authored
Commit bf505456 ("block: Introduce blk_revalidate_disk_zones()") inadvertently broke the message output of sd_zbc_print_zones() because the zone information initialization of the scsi disk structure was moved to the second scan run while sd_zbc_print_zones() is called on the first scan. This leads to the following incorrect message to be printed for any ZBC or ZAC zoned disks. "...[sdX] 4294967295 zones of 0 logical blocks + 1 runt zone" Fix this by initializing sdkp zone size and number of zones early on the first scan. This does not impact the execution of blk_revalidate_zones(). This functions is still called only once the block device capacity is set on the second revalidate run on boot, or if the disk zone configuration changed (i.e. the disk changed). Fixes: bf505456 ("block: Introduce blk_revalidate_disk_zones()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
David Disseldorp authored
pi_prot_format conversion to write-only caused userspace breakage. Make the ConfigFS path readable again and hardcode the "0\n" content, matching previous output. Fixes: 6baca760 ("scsi: target: drop unused pi_prot_format attribute storage") Link: https://bugzilla.redhat.com/show_bug.cgi?id=1667505Reported-by: Lee Duncan <lduncan@suse.com> Reported-by: Laura Abbott <labbott@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 01 Feb, 2019 1 commit
-
-
James Bottomley authored
The aic94xx driver is currently failing to load with errors like sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.3/0000:07:02.0/revision' Because the PCI code had recently added a file named 'revision' to every PCI device. Fix this by renaming the aic94xx revision file to aic_revision. This is safe to do for us because as far as I can tell, there's nothing in userspace relying on the current aic94xx revision file so it can be renamed without breaking anything. Fixes: 702ed3be (PCI: Create revision file in sysfs) Cc: stable@vger.kernel.org Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 29 Jan, 2019 5 commits
-
-
Dan Carpenter authored
The "hostdata->dev" pointer is NULL here. We set "hostdata->dev = dev;" later in the function and we also use "hostdata->dev" when we call dma_free_attrs() in NCR_700_release(). This bug predates git version control. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Dan Carpenter authored
There are two issues here. First if cmgr->hba is not set early enough then it leads to a NULL dereference. Second if we don't completely initialize cmgr->io_bdt_pool[] then we end up dereferencing uninitialized pointers. Fixes: 853e2bd2 ("[SCSI] bnx2fc: Broadcom FCoE offload driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Douglas Gilbert authored
The WRITE SAME(10) and (16) implementations didn't take account of the buffer wrap required when the virtual_gb parameter is greater than 0. Fix that and rename the fake_store() function to lba2fake_store() to lessen confusion with the global fake_storep pointer. Bump version date. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reported-by: Bart Van Assche <bvanassche@acm.org> Tested by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ming Lu authored
The issue to be fixed in this commit is when libfc found it received a invalid FLOGI response from FC switch, it would return without freeing the fc frame, which is just the skb data. This would cause memory leak if FC switch keeps sending invalid FLOGI responses. This fix is just to make it execute `fc_frame_free(fp)` before returning from function `fc_lport_flogi_resp`. Signed-off-by: Ming Lu <ming.lu@citrix.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Steffen Maier authored
Since v2.6.35 commit 68322984 ("[SCSI] zfcp: Report scatter-gather limits to SCSI and block layer"), zfcp set dma_parms.max_segment_size == PAGE_SIZE (but without using the setter dma_set_max_seg_size()) and scsi_host_template.dma_boundary == PAGE_SIZE - 1. v5.0-rc1 commit 50c2e910 ("scsi: introduce a max_segment_size host_template parameters") introduced a new field scsi_host_template.max_segment_size. If an LLDD such as zfcp does not set it, scsi_host_alloc() uses BLK_MAX_SEGMENT_SIZE = 65536 for Scsi_Host.max_segment_size. __scsi_init_queue() announced the minimum of Scsi_Host.max_segment_size and dma_parms.max_segment_size to the block layer. For zfcp: min(65536, 4096) == 4096 which was still good. v5.0 commit a8cf59a6 ("scsi: communicate max segment size to the DMA mapping code") announces Scsi_Host.max_segment_size to the block layer and overwrites dma_parms.max_segment_size with Scsi_Host.max_segment_size. For zfcp dma_parms.max_segment_size == Scsi_Host.max_segment_size == 65536 which is also reflected in block queue limits. $ cd /sys/bus/ccw/drivers/zfcp $ cd 0.0.3c40/host5/rport-5:0-4/target5:0:4/5:0:4:10/block/sdi/queue $ cat max_segment_size 65536 Zfcp I/O still works because dma_boundary implicitly still keeps the effective max segment size <= PAGE_SIZE. However, dma_boundary does not seem visible to user space, but max_segment_size is visible and shows a misleading wrong value. Fix it and inherit the stable tag of a8cf59a6. Devices on our bus ccw support DMA but no DMA mapping. Of multiple device types on the ccw bus, only zfcp needs dma_parms for SCSI limits. So, leave dma_parms setup in zfcp and do not move it to the bus. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 50c2e910 ("scsi: introduce a max_segment_size host_template parameters") Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 23 Jan, 2019 6 commits
-
-
Marc Gonzalez authored
memcpy_fromio() doesn't provide any control over access size. For example, on arm64, it is implemented using readb and readq. This may trigger a synchronous external abort: [ 3.729943] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP [ 3.737000] Modules linked in: [ 3.744371] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G S 4.20.0-rc4 #16 [ 3.747413] Hardware name: Qualcomm Technologies, Inc. MSM8998 v1 MTP (DT) [ 3.755295] pstate: 00000005 (nzcv daif -PAN -UAO) [ 3.761978] pc : __memcpy_fromio+0x68/0x80 [ 3.766718] lr : ufshcd_dump_regs+0x50/0xb0 [ 3.770767] sp : ffff00000807ba00 [ 3.774830] x29: ffff00000807ba00 x28: 00000000fffffffb [ 3.778344] x27: ffff0000089db068 x26: ffff8000f6e58000 [ 3.783728] x25: 000000000000000e x24: 0000000000000800 [ 3.789023] x23: ffff8000f6e587c8 x22: 0000000000000800 [ 3.794319] x21: ffff000008908368 x20: ffff8000f6e1ab80 [ 3.799615] x19: 000000000000006c x18: ffffffffffffffff [ 3.804910] x17: 0000000000000000 x16: 0000000000000000 [ 3.810206] x15: ffff000009199648 x14: ffff000089244187 [ 3.815502] x13: ffff000009244195 x12: ffff0000091ab000 [ 3.820797] x11: 0000000005f5e0ff x10: ffff0000091998a0 [ 3.826093] x9 : 0000000000000000 x8 : ffff8000f6e1ac00 [ 3.831389] x7 : 0000000000000000 x6 : 0000000000000068 [ 3.836676] x5 : ffff8000f6e1abe8 x4 : 0000000000000000 [ 3.841971] x3 : ffff00000928c868 x2 : ffff8000f6e1abec [ 3.847267] x1 : ffff00000928c868 x0 : ffff8000f6e1abe8 [ 3.852567] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____)) [ 3.857900] Call trace: [ 3.864473] __memcpy_fromio+0x68/0x80 [ 3.866683] ufs_qcom_dump_dbg_regs+0x1c0/0x370 [ 3.870522] ufshcd_print_host_regs+0x168/0x190 [ 3.874946] ufshcd_init+0xd4c/0xde0 [ 3.879459] ufshcd_pltfrm_init+0x3c8/0x550 [ 3.883264] ufs_qcom_probe+0x24/0x60 [ 3.887188] platform_drv_probe+0x50/0xa0 Assuming aligned 32-bit registers, let's use readl, after making sure that 'offset' and 'len' are indeed multiples of 4. Fixes: ba80917d ("scsi: ufs: ufshcd_dump_regs to use memcpy_fromio") Cc: <stable@vger.kernel.org> Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Reviewed-by: Jeffrey Hugo <jhugo@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Tested-by: Evan Green <evgreen@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiubo Li authored
Fixes: a94a2572 ("scsi: tcmu: avoid cmd/qfull timers updated whenever a new cmd comes") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mike Christie <mchristi@redhat.com> Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Varun Prakash authored
Assign fc_vport to ln->fc_vport before calling csio_fcoe_alloc_vnp() to avoid a NULL pointer dereference in csio_vport_set_state(). ln->fc_vport is dereferenced in csio_vport_set_state(). Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ewan D. Milne authored
We cannot wait on a completion object in the lpfc_nvme_targetport structure in the _destroy_targetport() code path because the NVMe/fc transport will free that structure immediately after the .targetport_delete() callback. This results in a use-after-free, and a hang if slub_debug=FZPU is enabled. Fix this by putting the completion on the stack. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ewan D. Milne authored
We cannot wait on a completion object in the lpfc_nvme_lport structure in the _destroy_localport() code path because the NVMe/fc transport will free that structure immediately after the .localport_delete() callback. This results in a use-after-free, and a hang if slub_debug=FZPU is enabled. Fix this by putting the completion on the stack. Signed-off-by: Ewan D. Milne <emilne@redhat.com> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Christoph Hellwig authored
When a host driver sets a maximum segment size we should not only propagate that setting to the block layer, which can merge segments, but also to the DMA mapping layer which can merge segments as well. Fixes: 50c2e910 ("scsi: introduce a max_segment_size host_template parameters") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 12 Jan, 2019 9 commits
-
-
Varun Prakash authored
In case of ->set_param() and ->bind_conn() cxgb4i driver does not wait for cmd completion, this can create race conditions, to avoid this add wait_for_completion(). Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Thomas Bogendoerfer authored
After Commit 54aed4dd ("MIPS: IP27: use dma_direct_ops") qla1280 driver failed on SGI IP27 machines with qla1280: QLA1040 found on PCI bus 0, dev 0 qla1280 0000:00:00.0: enabling device (0006 -> 0007) qla1280: Failed to get request memory qla1280: probe of 0000:00:00.0 failed with error -12 Reason is that SGI IP27 always generates 64bit DMA addresses and has no fallback mode for 32bit DMA addresses implemented. QLA1280 supports 64bit addressing for all DMA accesses so setting coherent mask to 64bit fixes the issue. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Avri Altman authored
Albeit we no longer rely on those hard-coded descriptor sizes, we still use them as our defaults, so better get it right. While adding its sysfs entries, we forgot to update the geometry descriptor size. It is 0x48 according to UFS2.1, and wasn't changed in UFS3.0. [mkp: typo] Fixes: c720c091 (scsi: ufs: sysfs: geometry descriptor) Signed-off-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Shivasharan S authored
commit 272652fc ("scsi: megaraid_sas: add retry logic in megasas_readl") missed changing readl to megasas_readl in megasas_clear_intr_fusion(). For Aero controllers, reads of outbound_intr_status register needs to be retried. Reported-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Manish Rangankar authored
When the driver finds invalid destination MAC for the first un-reachable target, and before completes the PATH_REQ operation, set new ep_state to OFFLDCONN_NONE so that as part of driver ep_poll mechanism, the upper open-iscsi layer is notified to complete the login process on the first un-reachable target and thus proceed login to other reachable targets. Signed-off-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Stanley Chu authored
hba->is_sys_suspended is set after successful system suspend but not clear after successful system resume. According to current behavior, hba->is_sys_suspended will not be set if host is runtime-suspended but not system-suspended. Thus we shall aligh the same policy: clear this flag even if host remains runtime-suspended after ufshcd_system_resume is successfully returned. Simply fix this flag to correct host status logs. Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Ming Lei authored
When SCSI-MQ is enabled, in some case system would present nr_possible_cpus() which is greater than requested vectors by the driver. This results into driver being able to get larger number of MSI-X vectors than actual online CPUs. Driver then uses pci_alloc_irq_vectors_affinity() to assign 1:1 mapping and affinity for each MSI-x vector to CPUs. When the command is submitted using MSI-x vector, assigned to offline CPU, it results in an ABTS and system hang. This hang is result of a driver not being able to process interrupt on a vector assigned to an Off-line CPUs This patch fixes this issue by setting irq_offset value for the blk_mq_pci_map_queues() to use only those CPUs which has CPU mask affinity assigned and are online. By using the irq_offset value, driver will allow online cpumask to decide which vectors are used in blk_mq_pci_map_queues(). Fixes: 5601236b ("scsi: qla2xxx: Add Block Multi Queue functionality.") Cc: <stable@vger.kernel.org> #4.19 Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Himanshu Madhani <hmadhani@marvell.com> Tested-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
John Garry authored
Currently we set the protection parameters after calling scsi_add_host() for v3 hw. They should be set beforehand, so make this change. Appearantly this fixes our DIX issue (not mainline yet) also, but more testing required. Fixes: d6a9000b ("scsi: hisi_sas: Add support for DIF feature for v2 hw") Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Xiubo Li authored
Currently there is one cmd timeout timer and one qfull timer for each udev, and whenever any new command is coming in we will update the cmd timer or qfull timer. For some corner cases the timers are always working only for the ringbuffer's and full queue's newest cmd. That's to say the timer won't be fired even if one cmd has been stuck for a very long time and the deadline is reached. This fix will keep the cmd/qfull timers to be pended for the oldest cmd in ringbuffer and full queue, and will update them with the next cmd's deadline only when the old cmd's deadline is reached or removed from the ringbuffer and full queue. Signed-off-by: Xiubo Li <xiubli@redhat.com> Acked-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 09 Jan, 2019 6 commits
-
-
Logan Gunthorpe authored
scsi_mq_setup_tags(), which is called by scsi_add_host(), calculates the command size to allocate based on the prot_capabilities. In the isci driver, scsi_host_set_prot() is called after scsi_add_host() so the command size gets calculated to be smaller than it needs to be. Eventually, scsi_mq_init_request() locates the 'prot_sdb' after the command assuming it was sized correctly and a buffer overrun may occur. However, seeing blk_mq_alloc_rqs() rounds up to the nearest cache line size, the mistake can go unnoticed. The bug was noticed after the struct request size was reduced by commit 9d037ad7 ("block: remove req->timeout_list") Which likely reduced the allocated space for the request by an entire cache line, enough that the overflow could be hit and it caused a panic, on boot, at: RIP: 0010:t10_pi_complete+0x77/0x1c0 Call Trace: <IRQ> sd_done+0xf5/0x340 scsi_finish_command+0xc3/0x120 blk_done_softirq+0x83/0xb0 __do_softirq+0xa1/0x2e6 irq_exit+0xbc/0xd0 call_function_single_interrupt+0xf/0x20 </IRQ> sd_done() would call scsi_prot_sg_count() which reads the number of entities in 'prot_sdb', but seeing 'prot_sdb' is located after the end of the allocated space it reads a garbage number and erroneously calls t10_pi_complete(). To prevent this, the calls to scsi_host_set_prot() are moved into isci_host_alloc() before the call to scsi_add_host(). Out of caution, also move the similar call to scsi_host_set_guard(). Fixes: 3d2d7525 ("[SCSI] isci: T10 DIF support") Link: http://lkml.kernel.org/r/da851333-eadd-163a-8c78-e1f4ec5ec857@deltatee.comSigned-off-by: Logan Gunthorpe <logang@deltatee.com> Cc: Intel SCU Linux support <intel-linux-scu@intel.com> Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Gustavo A. R. Silva authored
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Notice that, in this particular case, I replaced "Drop thru" and "Fall Thru" with "fall through" annotations, which is what GCC is expecting to find. Also, in some cases a dash is added as a token in order to separate the "fall through" annotation from the rest of the comment on the same line, which is what GCC is expecting to find. Addresses-Coverity-ID: 114979 ("Missing break in switch") Addresses-Coverity-ID: 114980 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Gustavo A. R. Silva authored
Fix boolean expression by using logical AND operator '&&' instead of bitwise operator '&'. This issue was detected with the help of Coccinelle. Fixes: 1e46731e ("scsi: smartpqi: check for null device pointers") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Stanley Chu authored
The commit 356fd266 ("scsi: Set request queue runtime PM status back to active on resume") fixed up the inconsistent RPM status between request queue and device. However changing request queue RPM status shall be done only on successful resume, otherwise status may be still inconsistent as below, Request queue: RPM_ACTIVE Device: RPM_SUSPENDED This ends up soft lockup because requests can be submitted to underlying devices but those devices and their required resource are not resumed. For example, After above inconsistent status happens, IO request can be submitted to UFS device driver but required resource (like clock) is not resumed yet thus lead to warning as below call stack, WARN_ON(hba->clk_gating.state != CLKS_ON); ufshcd_queuecommand scsi_dispatch_cmd scsi_request_fn __blk_run_queue cfq_insert_request __elv_add_request blk_flush_plug_list blk_finish_plug jbd2_journal_commit_transaction kjournald2 We may see all behind IO requests hang because of no response from storage host or device and then soft lockup happens in system. In the end, system may crash in many ways. Fixes: 356fd266 (scsi: Set request queue runtime PM status back to active on resume) Cc: stable@vger.kernel.org Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Julia Lawall authored
Delete tab aligning a statement with the right hand side of a preceding assignment rather than the left hand side. Found with the help of Coccinelle. [mkp: added space] Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
YueHaibing authored
The return code should be check while qla4xxx_copy_from_fwddb_param fails. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Manish Rangankar <mrangankar@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-