1. 28 Mar, 2019 6 commits
    • Steffen Maier's avatar
      scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN · c8206579
      Steffen Maier authored
      If an incoming ELS of type RSCN contains more than one element, zfcp
      suboptimally causes repeated erp trigger NOP trace records for each
      previously failed port. These could be ports that went away.  It loops over
      each RSCN element, and for each of those in an inner loop over all
      zfcp_ports.
      
      The trigger to recover failed ports should be just the reception of some
      RSCN, no matter how many elements it has. So we can loop over failed ports
      separately, and only then loop over each RSCN element to handle the
      non-failed ports.
      
      The call chain was:
      
        zfcp_fc_incoming_rscn
          for (i = 1; i < no_entries; i++)
            _zfcp_fc_incoming_rscn
              list_for_each_entry(port, &adapter->port_list, list)
                if (masked port->d_id match) zfcp_fc_test_link
                if (!port->d_id) zfcp_erp_port_reopen "fcrscn1"   <===
      
      In order the reduce the "flooding" of the REC trace area in such cases, we
      factor out handling the failed ports to be outside of the entries loop:
      
        zfcp_fc_incoming_rscn
          if (no_entries > 1)                                     <===
            list_for_each_entry(port, &adapter->port_list, list)  <===
              if (!port->d_id) zfcp_erp_port_reopen "fcrscn1"     <===
          for (i = 1; i < no_entries; i++)
            _zfcp_fc_incoming_rscn
              list_for_each_entry(port, &adapter->port_list, list)
                if (masked port->d_id match) zfcp_fc_test_link
      
      Abbreviated example trace records before this code change:
      
      Tag            : fcrscn1
      WWPN           : 0x500507630310d327
      ERP want       : 0x02
      ERP need       : 0x02
      
      Tag            : fcrscn1
      WWPN           : 0x500507630310d327
      ERP want       : 0x02
      ERP need       : 0x00                 NOP => superfluous trace record
      
      The last trace entry repeats if there are more than 2 RSCN elements.
      Signed-off-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Reviewed-by: default avatarBenjamin Block <bblock@linux.ibm.com>
      Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      c8206579
    • Steffen Maier's avatar
      scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices · 242ec145
      Steffen Maier authored
      Suppose more than one non-NPIV FCP device is active on the same channel.
      Send I/O to storage and have some of the pending I/O run into a SCSI
      command timeout, e.g. due to bit errors on the fibre. Now the error
      situation stops. However, we saw FCP requests continue to timeout in the
      channel. The abort will be successful, but the subsequent TUR fails.
      Scsi_eh starts. The LUN reset fails. The target reset fails.  The host
      reset only did an FCP device recovery. However, for non-NPIV FCP devices,
      this does not close and reopen ports on the SAN-side if other non-NPIV FCP
      device(s) share the same open ports.
      
      In order to resolve the continuing FCP request timeouts, we need to
      explicitly close and reopen ports on the SAN-side.
      
      This was missing since the beginning of zfcp in v2.6.0 history commit
      ea127f97 ("[PATCH] s390 (7/7): zfcp host adapter.").
      
      Note: The FSF requests for forced port reopen could run into FSF request
      timeouts due to other reasons. This would trigger an internal FCP device
      recovery. Pending forced port reopen recoveries would get dismissed. So
      some ports might not get fully reopened during this host reset handler.
      However, subsequent I/O would trigger the above described escalation and
      eventually all ports would be forced reopen to resolve any continuing FCP
      request timeouts due to earlier bit errors.
      Signed-off-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: <stable@vger.kernel.org> #3.0+
      Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
      Reviewed-by: default avatarBenjamin Block <bblock@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      242ec145
    • Steffen Maier's avatar
      scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host · fe67888f
      Steffen Maier authored
      An already deleted SCSI device can exist on the Scsi_Host and remain there
      because something still holds a reference.  A new SCSI device with the same
      H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created.  When
      we try to unblock an rport, we still find the deleted SCSI device and
      return early because the zfcp_scsi_dev of that SCSI device is not
      ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if
      the new proper SCSI device would be in good state.
      
      Therefore, skip deleted SCSI devices when iterating the sdevs of the shost.
      [cf. __scsi_device_lookup{_by_target}() or scsi_device_get()]
      
      The following abbreviated trace sequence can indicate such problem:
      
      Area           : REC
      Tag            : ersfs_3
      LUN            : 0x4045400300000000
      WWPN           : 0x50050763031bd327
      LUN status     : 0x40000000     not ZFCP_STATUS_COMMON_UNBLOCKED
      Ready count    : n		not incremented yet
      Running count  : 0x00000000
      ERP want       : 0x01
      ERP need       : 0xc1		ZFCP_ERP_ACTION_NONE
      
      Area           : REC
      Tag            : ersfs_3
      LUN            : 0x4045400300000000
      WWPN           : 0x50050763031bd327
      LUN status     : 0x41000000
      Ready count    : n+1
      Running count  : 0x00000000
      ERP want       : 0x01
      ERP need       : 0x01
      
      ...
      
      Area           : REC
      Level          : 4		only with increased trace level
      Tag            : ertru_l
      LUN            : 0x4045400300000000
      WWPN           : 0x50050763031bd327
      LUN status     : 0x40000000
      Request ID     : 0x0000000000000000
      ERP status     : 0x01800000
      ERP step       : 0x1000
      ERP action     : 0x01
      ERP count      : 0x00
      
      NOT followed by a trace record with tag "scpaddy"
      for WWPN 0x50050763031bd327.
      Signed-off-by: default avatarSteffen Maier <maier@linux.ibm.com>
      Fixes: 6f2ce1c6 ("scsi: zfcp: fix rport unblock race with LUN recovery")
      Cc: <stable@vger.kernel.org> #2.6.32+
      Reviewed-by: default avatarJens Remus <jremus@linux.ibm.com>
      Reviewed-by: default avatarBenjamin Block <bblock@linux.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      fe67888f
    • Martin K. Petersen's avatar
      scsi: sd: Quiesce warning if device does not report optimal I/O size · 1d5de5bd
      Martin K. Petersen authored
      Commit a83da8a4 ("scsi: sd: Optimal I/O size should be a multiple
      of physical block size") split one conditional into several separate
      statements in an effort to provide more accurate warning messages when
      a device reports a nonsensical value. However, this reorganization
      accidentally dropped the precondition of the reported value being
      larger than zero. This lead to a warning getting emitted on devices
      that do not report an optimal I/O size at all.
      
      Remain silent if a device does not report an optimal I/O size.
      
      Fixes: a83da8a4 ("scsi: sd: Optimal I/O size should be a multiple of physical block size")
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarHussam Al-Tayeb <ht990332@gmx.com>
      Tested-by: default avatarHussam Al-Tayeb <ht990332@gmx.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      1d5de5bd
    • Bart Van Assche's avatar
      scsi: sd: Fix a race between closing an sd device and sd I/O · c14a5726
      Bart Van Assche authored
      The scsi_end_request() function calls scsi_cmd_to_driver() indirectly and
      hence needs the disk->private_data pointer. Avoid that that pointer is
      cleared before all affected I/O requests have finished. This patch avoids
      that the following crash occurs:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      Call trace:
       scsi_mq_uninit_cmd+0x1c/0x30
       scsi_end_request+0x7c/0x1b8
       scsi_io_completion+0x464/0x668
       scsi_finish_command+0xbc/0x160
       scsi_eh_flush_done_q+0x10c/0x170
       sas_scsi_recover_host+0x84c/0xa98 [libsas]
       scsi_error_handler+0x140/0x5b0
       kthread+0x100/0x12c
       ret_from_fork+0x10/0x18
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reported-by: default avatarJason Yan <yanaijie@huawei.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      c14a5726
    • zhengbin's avatar
      scsi: core: Run queue when state is set to running after being blocked · 70fc085c
      zhengbin authored
      Use dd to test a SCSI device:
      
        1. echo "blocked" >/sys/block/sda/device/state
        2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
        3. echo "running" >/sys/block/sda/device/state
      
      dd should finish this work after step 3, but it hangs.
      
      After step2, the call chain is this:
      
      blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq
      
      prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will
      transition it to BLK_STS_DEV_RESOURCE which means that driver can
      guarantee that IO dispatch will be triggered in future when the
      resource is available.  Need to follow the rule if we set the device
      state to running.
      
      [mkp: tweaked commit description and code comment as suggested by Bart]
      Signed-off-by: default avatarzhengbin <zhengbin13@huawei.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      70fc085c
  2. 26 Mar, 2019 3 commits
  3. 21 Mar, 2019 2 commits
    • Tyrel Datwyler's avatar
      scsi: ibmvscsi: Fix empty event pool access during host removal · 7f5203c1
      Tyrel Datwyler authored
      The event pool used for queueing commands is destroyed fairly early in the
      ibmvscsi_remove() code path. Since, this happens prior to the call so
      scsi_remove_host() it is possible for further calls to queuecommand to be
      processed which manifest as a panic due to a NULL pointer dereference as
      seen here:
      
      PANIC: "Unable to handle kernel paging request for data at address
      0x00000000"
      
      Context process backtrace:
      
      DSISR: 0000000042000000 ????Syscall Result: 0000000000000000
      4 [c000000002cb3820] memcpy_power7 at c000000000064204
      [Link Register] [c000000002cb3820] ibmvscsi_send_srp_event at d000000003ed14a4
      5 [c000000002cb3920] ibmvscsi_send_srp_event at d000000003ed14a4 [ibmvscsi] ?(unreliable)
      6 [c000000002cb39c0] ibmvscsi_queuecommand at d000000003ed2388 [ibmvscsi]
      7 [c000000002cb3a70] scsi_dispatch_cmd at d00000000395c2d8 [scsi_mod]
      8 [c000000002cb3af0] scsi_request_fn at d00000000395ef88 [scsi_mod]
      9 [c000000002cb3be0] __blk_run_queue at c000000000429860
      10 [c000000002cb3c10] blk_delay_work at c00000000042a0ec
      11 [c000000002cb3c40] process_one_work at c0000000000dac30
      12 [c000000002cb3cd0] worker_thread at c0000000000db110
      13 [c000000002cb3d80] kthread at c0000000000e3378
      14 [c000000002cb3e30] ret_from_kernel_thread at c00000000000982c
      
      The kernel buffer log is overfilled with this log:
      
      [11261.952732] ibmvscsi: found no event struct in pool!
      
      This patch reorders the operations during host teardown. Start by calling
      the SRP transport and Scsi_Host remove functions to flush any outstanding
      work and set the host offline. LLDD teardown follows including destruction
      of the event pool, freeing the Command Response Queue (CRQ), and unmapping
      any persistent buffers. The event pool destruction is protected by the
      scsi_host lock, and the pool is purged prior of any requests for which we
      never received a response. Finally, move the removal of the scsi host from
      our global list to the end so that the host is easily locatable for
      debugging purposes during teardown.
      
      Cc: <stable@vger.kernel.org> # v2.6.12+
      Signed-off-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      7f5203c1
    • Tyrel Datwyler's avatar
      scsi: ibmvscsi: Protect ibmvscsi_head from concurrent modificaiton · 7205981e
      Tyrel Datwyler authored
      For each ibmvscsi host created during a probe or destroyed during a remove
      we either add or remove that host to/from the global ibmvscsi_head
      list. This runs the risk of concurrent modification.
      
      This patch adds a simple spinlock around the list modification calls to
      prevent concurrent updates as is done similarly in the ibmvfc driver and
      ipr driver.
      
      Fixes: 32d6e4b6 ("scsi: ibmvscsi: add vscsi hosts to global list_head")
      Cc: <stable@vger.kernel.org> # v4.10+
      Signed-off-by: default avatarTyrel Datwyler <tyreld@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      7205981e
  4. 20 Mar, 2019 1 commit
  5. 19 Mar, 2019 4 commits
    • Himanshu Madhani's avatar
      scsi: qla2xxx: Fix NULL pointer crash due to stale CPUID · ac444b4f
      Himanshu Madhani authored
      This patch fixes crash due to NULL pointer derefrence because CPU pointer
      is not set and used by driver.  Instead, driver is passes CPU as tag via
      ha->isp_ops->{lun_reset|target_reset}
      
      [   30.160780] qla2xxx [0000:a0:00.1]-8038:9: Cable is unplugged...
      [   69.984045] qla2xxx [0000:a0:00.0]-8009:8: DEVICE RESET ISSUED nexus=8:0:0 cmd=00000000b0d62f46.
      [   69.992849] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
      [   70.000680] PGD 0 P4D 0
      [   70.003232] Oops: 0000 [#1] SMP PTI
      [   70.006727] CPU: 2 PID: 6714 Comm: sg_reset Kdump: loaded Not tainted 4.18.0-67.el8.x86_64 #1
      [   70.015258] Hardware name: NEC Express5800/T110j [N8100-2758Y]/MX32-PH0-NJ, BIOS F11 02/13/2019
      [   70.024016] RIP: 0010:blk_mq_rq_cpu+0x9/0x10
      [   70.028315] Code: 01 58 01 00 00 48 83 c0 28 48 3d 80 02 00 00 75 ab c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
       8b 47 08 <8b> 40 40 c3 0f 1f 00 0f 1f 44 00 00 48 83 ec 10 48 c7 c6 20 6e 7c
      [   70.047087] RSP: 0018:ffff99a481487d58 EFLAGS: 00010246
      [   70.052322] RAX: 0000000000000000 RBX: ffffffffc041b08b RCX: 0000000000000000
      [   70.059466] RDX: 0000000000000000 RSI: ffff8d10b6b16898 RDI: ffff8d10b341e400
      [   70.066615] RBP: ffffffffc03a6bd0 R08: 0000000000000415 R09: 0000000000aaaaaa
      [   70.073765] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d10b341e528
      [   70.080914] R13: ffff8d10aadefc00 R14: ffff8d0f64efa998 R15: ffff8d0f64efa000
      [   70.088083] FS:  00007f90a201e540(0000) GS:ffff8d10b6b00000(0000) knlGS:0000000000000000
      [   70.096188] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   70.101959] CR2: 0000000000000040 CR3: 0000000268886005 CR4: 00000000003606e0
      [   70.109127] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   70.116277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   70.123425] Call Trace:
      [   70.125896]  __qla2xxx_eh_generic_reset+0xb1/0x220 [qla2xxx]
      [   70.131572]  scsi_ioctl_reset+0x1f5/0x2a0
      [   70.135600]  scsi_ioctl+0x18e/0x397
      [   70.139099]  ? sd_ioctl+0x7c/0x100 [sd_mod]
      [   70.143287]  blkdev_ioctl+0x32b/0x9f0
      [   70.146954]  ? __check_object_size+0xa3/0x181
      [   70.151323]  block_ioctl+0x39/0x40
      [   70.154735]  do_vfs_ioctl+0xa4/0x630
      [   70.158322]  ? syscall_trace_enter+0x1d3/0x2c0
      [   70.162769]  ksys_ioctl+0x60/0x90
      [   70.166104]  __x64_sys_ioctl+0x16/0x20
      [   70.169859]  do_syscall_64+0x5b/0x1b0
      [   70.173532]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [   70.178587] RIP: 0033:0x7f90a1b3445b
      [   70.182183] Code: 0f 1e fa 48 8b 05 2d aa 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
       00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd a9 2c 00 f7 d8 64 89 01 48
      [   70.200956] RSP: 002b:00007fffdca88b68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [   70.208535] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f90a1b3445b
      [   70.215684] RDX: 00007fffdca88b84 RSI: 0000000000002284 RDI: 0000000000000003
      [   70.222833] RBP: 00007fffdca88ca8 R08: 00007fffdca88b84 R09: 0000000000000000
      [   70.229981] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffdca88b84
      [   70.237131] R13: 0000000000000000 R14: 000055ab09b0bd28 R15: 0000000000000000
      [   70.244284] Modules linked in: nft_chain_route_ipv4 xt_CHECKSUM nft_chain_nat_ipv4 ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack_ipv4
       nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 nft_counter nft_compat tun bridge stp llc nf_tables nfnetli
      nk devlink sunrpc vfat fat intel_rapl intel_pmc_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm wmi_bmof iTCO_wdt iTCO_
      vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif intel_cstate intel_uncore intel_rapl_perf ipmi_si jo
      ydev pcspkr ipmi_devintf sg wmi ipmi_msghandler video acpi_power_meter acpi_pad mei_me i2c_i801 mei ip_tables ext4 mbcache jbd2 sr_mod cd
      rom sd_mod qla2xxx ast i2c_algo_bit drm_kms_helper nvme_fc syscopyarea sysfillrect uas sysimgblt fb_sys_fops nvme_fabrics ttm
      [   70.314805]  usb_storage nvme_core crc32c_intel scsi_transport_fc ahci drm libahci tg3 libata megaraid_sas pinctrl_cannonlake pinctrl_
      intel
      [   70.327335] CR2: 0000000000000040
      
      Fixes: 9cf2bab6 ("block: kill request ->cpu member")
      Signed-off-by: default avatarHimanshu Madhani <hmadhani@marvell.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      ac444b4f
    • Quinn Tran's avatar
      scsi: qla2xxx: Fix FC-AL connection target discovery · 4705f10e
      Quinn Tran authored
      Commit 7f147f9b ("scsi: qla2xxx: Fix N2N target discovery with Local
      loop") fixed N2N target discovery for local loop.  However, same code is
      used for FC-AL discovery as well. Added check to make sure we are bypassing
      area and domain check only in N2N topology for target discovery.
      
      Fixes: 7f147f9b ("scsi: qla2xxx: Fix N2N target discovery with Local loop")
      Cc: stable@vger.kernel.org # 5.0+
      Signed-off-by: default avatarQuinn Tran <qtran@marvell.com>
      Signed-off-by: default avatarHimanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      4705f10e
    • Bart Van Assche's avatar
      scsi: core: Avoid that a kernel warning appears during system resume · 17605afa
      Bart Van Assche authored
      Since scsi_device_quiesce() skips SCSI devices that have another state than
      RUNNING, OFFLINE or TRANSPORT_OFFLINE, scsi_device_resume() should not
      complain about SCSI devices that have been skipped. Hence this patch.  This
      patch avoids that the following warning appears during resume:
      
      WARNING: CPU: 3 PID: 1039 at blk_clear_pm_only+0x2a/0x30
      CPU: 3 PID: 1039 Comm: kworker/u8:49 Not tainted 5.0.0+ #1
      Hardware name: LENOVO 4180F42/4180F42, BIOS 83ET75WW (1.45 ) 05/10/2013
      Workqueue: events_unbound async_run_entry_fn
      RIP: 0010:blk_clear_pm_only+0x2a/0x30
      Call Trace:
       ? scsi_device_resume+0x28/0x50
       ? scsi_dev_type_resume+0x2b/0x80
       ? async_run_entry_fn+0x2c/0xd0
       ? process_one_work+0x1f0/0x3f0
       ? worker_thread+0x28/0x3c0
       ? process_one_work+0x3f0/0x3f0
       ? kthread+0x10c/0x130
       ? __kthread_create_on_node+0x150/0x150
       ? ret_from_fork+0x1f/0x30
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Martin Steigerwald <martin@lichtvoll.de>
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Tested-by: default avatarJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Fixes: 3a0a5299 ("block, scsi: Make SCSI quiesce and resume work reliably") # v4.15
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      17605afa
    • Bart Van Assche's avatar
      scsi: core: Also call destroy_rcu_head() for passthrough requests · db983f6e
      Bart Van Assche authored
      cmd->rcu is initialized by scsi_initialize_rq(). For passthrough
      requests, blk_get_request() calls scsi_initialize_rq(). For filesystem
      requests, scsi_init_command() calls scsi_initialize_rq(). Make sure
      that destroy_rcu_head() is called for passthrough requests.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Reported-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      db983f6e
  6. 18 Mar, 2019 1 commit
    • Maurizio Lombardi's avatar
      scsi: iscsi: flush running unbind operations when removing a session · 165aa2bf
      Maurizio Lombardi authored
      In some cases, the iscsi_remove_session() function is called while an
      unbind_work operation is still running.  This may cause a situation where
      sysfs objects are removed in an incorrect order, triggering a kernel
      warning.
      
      [  605.249442] ------------[ cut here ]------------
      [  605.259180] sysfs group 'power' not found for kobject 'target2:0:0'
      [  605.321371] WARNING: CPU: 1 PID: 26794 at fs/sysfs/group.c:235 sysfs_remove_group+0x76/0x80
      [  605.341266] Modules linked in: dm_service_time target_core_user target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod nls_utf8 isofs ppdev bochs_drm nfit ttm libnvdimm drm_kms_helper syscopyarea sysfillrect sysimgblt joydev pcspkr fb_sys_fops drm i2c_piix4 sg parport_pc parport xfs libcrc32c dm_multipath sr_mod sd_mod cdrom ata_generic 8021q garp mrp ata_piix stp crct10dif_pclmul crc32_pclmul llc libata crc32c_intel virtio_net net_failover ghash_clmulni_intel serio_raw failover sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
      [  605.627479] CPU: 1 PID: 26794 Comm: kworker/u32:2 Not tainted 4.18.0-60.el8.x86_64 #1
      [  605.721401] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
      [  605.823651] Workqueue: scsi_wq_2 __iscsi_unbind_session [scsi_transport_iscsi]
      [  605.830940] RIP: 0010:sysfs_remove_group+0x76/0x80
      [  605.922907] Code: 48 89 df 5b 5d 41 5c e9 38 c4 ff ff 48 89 df e8 e0 bf ff ff eb cb 49 8b 14 24 48 8b 75 00 48 c7 c7 38 73 cb a7 e8 24 77 d7 ff <0f> 0b 5b 5d 41 5c c3 0f 1f 00 0f 1f 44 00 00 41 56 41 55 41 54 55
      [  606.122304] RSP: 0018:ffffbadcc8d1bda8 EFLAGS: 00010286
      [  606.218492] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  606.326381] RDX: ffff98bdfe85eb40 RSI: ffff98bdfe856818 RDI: ffff98bdfe856818
      [  606.514498] RBP: ffffffffa7ab73e0 R08: 0000000000000268 R09: 0000000000000007
      [  606.529469] R10: 0000000000000000 R11: ffffffffa860d9ad R12: ffff98bdf978e838
      [  606.630535] R13: ffff98bdc2cd4010 R14: ffff98bdc2cd3ff0 R15: ffff98bdc2cd4000
      [  606.824707] FS:  0000000000000000(0000) GS:ffff98bdfe840000(0000) knlGS:0000000000000000
      [  607.018333] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  607.117844] CR2: 00007f84b78ac024 CR3: 000000002c00a003 CR4: 00000000003606e0
      [  607.117844] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  607.420926] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  607.524236] Call Trace:
      [  607.530591]  device_del+0x56/0x350
      [  607.624393]  ? ata_tlink_match+0x30/0x30 [libata]
      [  607.727805]  ? attribute_container_device_trigger+0xb4/0xf0
      [  607.829911]  scsi_target_reap_ref_release+0x39/0x50
      [  607.928572]  scsi_remove_target+0x1a2/0x1d0
      [  608.017350]  __iscsi_unbind_session+0xb3/0x160 [scsi_transport_iscsi]
      [  608.117435]  process_one_work+0x1a7/0x360
      [  608.132917]  worker_thread+0x30/0x390
      [  608.222900]  ? pwq_unbound_release_workfn+0xd0/0xd0
      [  608.323989]  kthread+0x112/0x130
      [  608.418318]  ? kthread_bind+0x30/0x30
      [  608.513821]  ret_from_fork+0x35/0x40
      [  608.613909] ---[ end trace 0b98c310c8a6138c ]---
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Acked-by: default avatarChris Leech <cleech@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      165aa2bf
  7. 17 Mar, 2019 14 commits
  8. 16 Mar, 2019 9 commits
    • Linus Torvalds's avatar
      Merge tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · a9dce667
      Linus Torvalds authored
      Pull pidfd system call from Christian Brauner:
       "This introduces the ability to use file descriptors from /proc/<pid>/
        as stable handles on struct pid. Even if a pid is recycled the handle
        will not change. For a start these fds can be used to send signals to
        the processes they refer to.
      
        With the ability to use /proc/<pid> fds as stable handles on struct
        pid we can fix a long-standing issue where after a process has exited
        its pid can be reused by another process. If a caller sends a signal
        to a reused pid it will end up signaling the wrong process.
      
        With this patchset we enable a variety of use cases. One obvious
        example is that we can now safely delegate an important part of
        process management - sending signals - to processes other than the
        parent of a given process by sending file descriptors around via scm
        rights and not fearing that the given process will have been recycled
        in the meantime. It also allows for easy testing whether a given
        process is still alive or not by sending signal 0 to a pidfd which is
        quite handy.
      
        There has been some interest in this feature e.g. from systems
        management (systemd, glibc) and container managers. I have requested
        and gotten comments from glibc to make sure that this syscall is
        suitable for their needs as well. In the future I expect it to take on
        most other pid-based signal syscalls. But such features are left for
        the future once they are needed.
      
        This has been sitting in linux-next for quite a while and has not
        caused any issues. It comes with selftests which verify basic
        functionality and also test that a recycled pid cannot be signaled via
        a pidfd.
      
        Jon has written about a prior version of this patchset. It should
        cover the basic functionality since not a lot has changed since then:
      
            https://lwn.net/Articles/773459/
      
        The commit message for the syscall itself is extensively documenting
        the syscall, including it's functionality and extensibility"
      
      * tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        selftests: add tests for pidfd_send_signal()
        signal: add pidfd_send_signal() syscall
      a9dce667
    • Linus Torvalds's avatar
      Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · f67e3fb4
      Linus Torvalds authored
      Pull device-dax updates from Dan Williams:
       "New device-dax infrastructure to allow persistent memory and other
        "reserved" / performance differentiated memories, to be assigned to
        the core-mm as "System RAM".
      
        Some users want to use persistent memory as additional volatile
        memory. They are willing to cope with potential performance
        differences, for example between DRAM and 3D Xpoint, and want to use
        typical Linux memory management apis rather than a userspace memory
        allocator layered over an mmap() of a dax file. The administration
        model is to decide how much Persistent Memory (pmem) to use as System
        RAM, create a device-dax-mode namespace of that size, and then assign
        it to the core-mm. The rationale for device-dax is that it is a
        generic memory-mapping driver that can be layered over any "special
        purpose" memory, not just pmem. On subsequent boots udev rules can be
        used to restore the memory assignment.
      
        One implication of using pmem as RAM is that mlock() no longer keeps
        data off persistent media. For this reason it is recommended to enable
        NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
        at rest. We considered making this recommendation an actively enforced
        requirement, but in the end decided to leave it as a distribution /
        administrator policy to allow for emulation and test environments that
        lack security capable NVDIMMs.
      
        Summary:
      
         - Replace the /sys/class/dax device model with /sys/bus/dax, and
           include a compat driver so distributions can opt-in to the new ABI.
      
         - Allow for an alternative driver for the device-dax address-range
      
         - Introduce the 'kmem' driver to hotplug / assign a device-dax
           address-range to the core-mm.
      
         - Arrange for the device-dax target-node to be onlined so that the
           newly added memory range can be uniquely referenced by numa apis"
      
      NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
      we currently have special - and very annoying rules in the kernel about
      accessing PMEM only with the "MC safe" accessors, because machine checks
      inside the regular repeat string copy functions can be fatal in some
      (not described) circumstances.
      
      And apparently the PMEM modules can cause that a lot more than regular
      RAM.  The argument is that this happens because PMEM doesn't necessarily
      get scrubbed at boot like RAM does, but that is planned to be added for
      the user space tooling.
      
      Quoting Dan from another email:
       "The exposure can be reduced in the volatile-RAM case by scanning for
        and clearing errors before it is onlined as RAM. The userspace tooling
        for that can be in place before v5.1-final. There's also runtime
        notifications of errors via acpi_nfit_uc_error_notify() from
        background scrubbers on the DIMM devices. With that mechanism the
        kernel could proactively clear newly discovered poison in the volatile
        case, but that would be additional development more suitable for v5.2.
      
        I understand the concern, and the need to highlight this issue by
        tapping the brakes on feature development, but I don't see PMEM as RAM
        making the situation worse when the exposure is also there via DAX in
        the PMEM case. Volatile-RAM is arguably a safer use case since it's
        possible to repair pages where the persistent case needs active
        application coordination"
      
      * tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        device-dax: "Hotplug" persistent memory for use like normal RAM
        mm/resource: Let walk_system_ram_range() search child resources
        mm/memory-hotplug: Allow memory resources to be children
        mm/resource: Move HMM pr_debug() deeper into resource code
        mm/resource: Return real error codes from walk failures
        device-dax: Add a 'modalias' attribute to DAX 'bus' devices
        device-dax: Add a 'target_node' attribute
        device-dax: Auto-bind device after successful new_id
        acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
        device-dax: Add /sys/class/dax backwards compatibility
        device-dax: Add support for a dax override driver
        device-dax: Move resource pinning+mapping into the common driver
        device-dax: Introduce bus + driver model
        device-dax: Start defining a dax bus model
        device-dax: Remove multi-resource infrastructure
        device-dax: Kill dax_region base
        device-dax: Kill dax_region ida
      f67e3fb4
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 477558d7
      Linus Torvalds authored
      Pull more SCSI updates from James Bottomley:
       "This is the final round of mostly small fixes and performance
        improvements to our initial submit.
      
        The main regression fix is the ia64 simscsi build failure which was
        missed in the serial number elimination conversion"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
        scsi: ia64: simscsi: use request tag instead of serial_number
        scsi: aacraid: Fix performance issue on logical drives
        scsi: lpfc: Fix error codes in lpfc_sli4_pci_mem_setup()
        scsi: libiscsi: Hold back_lock when calling iscsi_complete_task
        scsi: hisi_sas: Change SERDES_CFG init value to increase reliability of HiLink
        scsi: hisi_sas: Send HARD RESET to clear the previous affiliation of STP target port
        scsi: hisi_sas: Set PHY linkrate when disconnected
        scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw
        scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
        scsi: hisi_sas: Change return variable type in phy_up_v3_hw()
        scsi: qla2xxx: check for kstrtol() failure
        scsi: lpfc: fix 32-bit format string warning
        scsi: lpfc: fix unused variable warning
        scsi: target: tcmu: Switch to bitmap_zalloc()
        scsi: libiscsi: fall back to sendmsg for slab pages
        scsi: qla2xxx: avoid printf format warning
        scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset
        scsi: lpfc: Correct __lpfc_sli_issue_iocb_s4 lockdep check
        scsi: ufs: hisi: fix ufs_hba_variant_ops passing
        scsi: qla2xxx: Fix panic in qla_dfs_tgt_counters_show
        ...
      477558d7
    • Linus Torvalds's avatar
      Merge tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block · 11efae35
      Linus Torvalds authored
      Pull more block layer changes from Jens Axboe:
       "This is a collection of both stragglers, and fixes that came in after
        I finalized the initial pull. This contains:
      
         - An MD pull request from Song, with a few minor fixes
      
         - Set of NVMe patches via Christoph
      
         - Pull request from Konrad, with a few fixes for xen/blkback
      
         - pblk fix IO calculation fix (Javier)
      
         - Segment calculation fix for pass-through (Ming)
      
         - Fallthrough annotation for blkcg (Mathieu)"
      
      * tag 'for-5.1/block-post-20190315' of git://git.kernel.dk/linux-block: (25 commits)
        blkcg: annotate implicit fall through
        nvme-tcp: support C2HData with SUCCESS flag
        nvmet: ignore EOPNOTSUPP for discard
        nvme: add proper write zeroes setup for the multipath device
        nvme: add proper discard setup for the multipath device
        nvme: remove nvme_ns_config_oncs
        nvme: disable Write Zeroes for qemu controllers
        nvmet-fc: bring Disconnect into compliance with FC-NVME spec
        nvmet-fc: fix issues with targetport assoc_list list walking
        nvme-fc: reject reconnect if io queue count is reduced to zero
        nvme-fc: fix numa_node when dev is null
        nvme-fc: use nr_phys_segments to determine existence of sgl
        nvme-loop: init nvmet_ctrl fatal_err_work when allocate
        nvme: update comment to make the code easier to read
        nvme: put ns_head ref if namespace fails allocation
        nvme-trace: fix cdw10 buffer overrun
        nvme: don't warn on block content change effects
        nvme: add get-feature to admin cmds tracer
        md: Fix failed allocation of md_register_thread
        It's wrong to add len to sector_nr in raid10 reshape twice
        ...
      11efae35
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 465c209d
      Linus Torvalds authored
      Pull NFS client bugfixes from Trond Myklebust:
       "Highlights include:
      
        Bugfixes:
         - Fix an Oops in SUNRPC back channel tracepoints
         - Fix a SUNRPC client regression when handling oversized replies
         - Fix the minimal size for SUNRPC reply buffer allocation
         - rpc_decode_header() must always return a non-zero value on error
         - Fix a typo in pnfs_update_layout()
      
        Cleanup:
         - Remove redundant check for the reply length in call_decode()"
      
      * tag 'nfs-for-5.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        SUNRPC: Remove redundant check for the reply length in call_decode()
        SUNRPC: Handle the SYSTEM_ERR rpc error
        SUNRPC: rpc_decode_header() must always return a non-zero value on error
        SUNRPC: Use the ENOTCONN error on socket disconnect
        SUNRPC: Fix the minimal size for reply buffer allocation
        SUNRPC: Fix a client regression when handling oversized replies
        pNFS: Fix a typo in pnfs_update_layout
        fix null pointer deref in tracepoints in back channel
      465c209d
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a9c55d58
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "One fix to prevent runtime allocation of 16GB pages when running in a
        VM (as opposed to bare metal), because it doesn't work.
      
        A small fix to our recently added KCOV support to exempt some more
        code from being instrumented.
      
        Plus a few minor build fixes, a small dead code removal and a
        defconfig update.
      
        Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Christophe Leroy,
        Jason Yan, Joel Stanley, Mahesh Salgaonkar, Mathieu Malaterre"
      
      * tag 'powerpc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64s: Include <asm/nmi.h> header file to fix a warning
        powerpc/powernv: Fix compile without CONFIG_TRACEPOINTS
        powerpc/mm: Disable kcov for SLB routines
        powerpc: remove dead code in head_fsl_booke.S
        powerpc/configs: Sync skiroot defconfig
        powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration
      a9c55d58
    • Linus Torvalds's avatar
      Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 92497350
      Linus Torvalds authored
      Pull vfs mount infrastructure fix from Al Viro:
       "Fixup for sysfs braino.
      
        Capabilities checks for sysfs mount do include those on netns, but
        only if CONFIG_NET_NS is enabled. Sorry, should've caught that
        earlier..."
      
      * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fix sysfs_init_fs_context() in !CONFIG_NET_NS case
      92497350
    • Al Viro's avatar
      fix sysfs_init_fs_context() in !CONFIG_NET_NS case · ab81dabd
      Al Viro authored
      Permission checks on current's netns should be done only when
      netns are enabled.
      Reported-by: default avatarDominik Brodowski <linux@dominikbrodowski.net>
      Fixes: 23bf1b6bSigned-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      ab81dabd
    • Linus Torvalds's avatar
      Merge tag '5.1-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 9c7dc824
      Linus Torvalds authored
      Pull more smb3 updates from Steve French:
       "Various tracing and debugging improvements, crediting fixes, some
        cleanup, and important fallocate fix (fixes three xfstests) and lock
        fix.
      
        Summary:
      
         - Various additional dynamic tracing tracepoints
      
         - Debugging improvements (including ability to query the server via
           SMB3 fsctl from userspace tools which can help with stats and
           debugging)
      
         - One minor performance improvement (root directory inode caching)
      
         - Crediting (SMB3 flow control) fixes
      
         - Some cleanup (docs and to mknod)
      
         - Important fixes: one to smb3 implementation of fallocate zero range
           (which fixes three xfstests) and a POSIX lock fix"
      
      * tag '5.1-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: (22 commits)
        CIFS: fix POSIX lock leak and invalid ptr deref
        SMB3: Allow SMB3 FSCTL queries to be sent to server from tools
        cifs: fix incorrect handling of smb2_set_sparse() return in smb3_simple_falloc
        smb2: fix typo in definition of a few error flags
        CIFS: make mknod() an smb_version_op
        cifs: minor documentation updates
        cifs: remove unused value pointed out by Coverity
        SMB3: passthru query info doesn't check for SMB3 FSCTL passthru
        smb3: add dynamic tracepoints for simple fallocate and zero range
        cifs: fix smb3_zero_range so it can expand the file-size when required
        cifs: add SMB2_ioctl_init/free helpers to be used with compounding
        smb3: Add dynamic trace points for various compounded smb3 ops
        cifs: cache FILE_ALL_INFO for the shared root handle
        smb3: display volume serial number for shares in /proc/fs/cifs/DebugData
        cifs: simplify how we handle credits in compound_send_recv()
        smb3: add dynamic tracepoint for timeout waiting for credits
        smb3: display security information in /proc/fs/cifs/DebugData more accurately
        cifs: add a timeout argument to wait_for_free_credits
        cifs: prevent starvation in wait_for_free_credits for multi-credit requests
        cifs: wait_for_free_credits() make it possible to wait for >=1 credits
        ...
      9c7dc824