1. 27 May, 2020 6 commits
    • scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim · 7e7cd796
      Gabriel Krisman Bertazi authored
      iSCSI can deadlock when a management command submitted via the netlink
      socket sleeps on an allocation while holding rx_queue_mutex, and that
      allocation triggers a memory reclaim that writes back to a failed iSCSI
      device.  Because rx_queue_mutex is held, the recovery procedure can never
      make progress to recover the failed disk or abort the outstanding IO
      operations needed to complete the reclaim, so the system locks up.
      
      However, just marking all allocations under rx_queue_mutex as GFP_NOIO
      (or flagging the userspace process with something like
      PF_MEMALLOC_NOIO) is not enough, since the iSCSI command code relies on
      other subsystems that try to grab mutexes held by threads doing GFP_IO
      allocations, leading to the same deadlock. One instance of this situation
      can be observed in the backtraces below, stitched together from multiple
      bug reports, involving the kobj uevent sent when a session is created.
      
      The root of the problem is not the fact that iSCSI does GFP_IO allocations,
      that is acceptable. The actual problem is that rx_queue_mutex has a very
      large granularity, covering every unrelated netlink command execution at
      the same time as the error recovery path.
      
      The proposed fix leverages the recently added mechanism to stop failed
      connections from the kernel, by enabling it to execute even though a
      management command from the netlink socket is being run (rx_queue_mutex is
      held), provided that the command is known to be safe.  It splits the
      rx_queue_mutex in two mutexes, one protecting from concurrent command
      execution from the netlink socket, and one protecting stop_conn from racing
      with other connection management operations that might conflict with it.
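
      The split described above can be modelled in user space. The sketch
      below is a hypothetical pthread model, not the kernel code: the name
      conn_mutex, the two command types and the helper functions are all
      assumptions made for illustration, and trylock is used only so the
      effect is visible from a single thread.

      ```c
      #include <pthread.h>
      #include <stdbool.h>

      /* Hypothetical user-space model of the two-mutex split. */
      static pthread_mutex_t rx_queue_mutex = PTHREAD_MUTEX_INITIALIZER; /* serializes netlink commands */
      static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;     /* guards connection management */

      static int conns_stopped;

      /* Assumed command types: one safe for stop_conn to run beside, one not. */
      enum cmd { CMD_CREATE_SESSION, CMD_DESTROY_CONN };

      /* Start a management command arriving from the netlink socket. */
      static void cmd_begin(enum cmd c)
      {
          pthread_mutex_lock(&rx_queue_mutex);
          /* Only commands that conflict with stop_conn take the second mutex. */
          if (c == CMD_DESTROY_CONN)
              pthread_mutex_lock(&conn_mutex);
      }

      /* Finish a management command, releasing locks in reverse order. */
      static void cmd_end(enum cmd c)
      {
          if (c == CMD_DESTROY_CONN)
              pthread_mutex_unlock(&conn_mutex);
          pthread_mutex_unlock(&rx_queue_mutex);
      }

      /*
       * Recovery path: it never touches rx_queue_mutex, so a sleeping
       * "safe" command cannot block it. Returns true if the connection
       * could be stopped right now.
       */
      static bool recovery_try_stop_conn(void)
      {
          if (pthread_mutex_trylock(&conn_mutex) != 0)
              return false;
          conns_stopped++;
          pthread_mutex_unlock(&conn_mutex);
          return true;
      }
      ```

      In this model, recovery succeeds while a CMD_CREATE_SESSION command
      holds rx_queue_mutex, and is held off only while a conflicting
      CMD_DESTROY_CONN command is in flight.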
      
      It is not very pretty, but it is the simplest way to resolve the deadlock.
      I considered making it a lock per connection, but some external mutex would
      still be needed to deal with iscsi_if_destroy_conn.
      
      The patch was tested by forcing a memory shrinker (unrelated, but used
      bufio/dm-verity) to reclaim iSCSI pages every time
      ISCSI_UEVENT_CREATE_SESSION happens, which is reasonable to simulate
      reclaims that might happen with GFP_KERNEL on that path.  Then, a faulty
      hung target causes a connection to fail during intensive IO, at the same
      time a new session is added by iscsid.
      
      The following stacktraces are stitched together from several bug reports
      and show a case where the deadlock can happen.
      
       iSCSI-write
               holding: rx_queue_mutex
               waiting: uevent_sock_mutex
      
               kobject_uevent_env+0x1bd/0x419
               kobject_uevent+0xb/0xd
               device_add+0x48a/0x678
               scsi_add_host_with_dma+0xc5/0x22d
               iscsi_host_add+0x53/0x55
               iscsi_sw_tcp_session_create+0xa6/0x129
               iscsi_if_rx+0x100/0x1247
               netlink_unicast+0x213/0x4f0
               netlink_sendmsg+0x230/0x3c0
      
       iscsi_fail iscsi_conn_failure
               waiting: rx_queue_mutex
      
               schedule_preempt_disabled+0x325/0x734
               __mutex_lock_slowpath+0x18b/0x230
               mutex_lock+0x22/0x40
               iscsi_conn_failure+0x42/0x149
               worker_thread+0x24a/0xbc0
      
       EventManager_
               holding: uevent_sock_mutex
               waiting: dm_bufio_client->lock
      
               dm_bufio_lock+0xe/0x10
               shrink+0x34/0xf7
               shrink_slab+0x177/0x5d0
               do_try_to_free_pages+0x129/0x470
               try_to_free_mem_cgroup_pages+0x14f/0x210
               memcg_kmem_newpage_charge+0xa6d/0x13b0
               __alloc_pages_nodemask+0x4a3/0x1a70
               fallback_alloc+0x1b2/0x36c
               __kmalloc_node_track_caller+0xb9/0x10d0
               __alloc_skb+0x83/0x2f0
               kobject_uevent_env+0x26b/0x419
               dm_kobject_uevent+0x70/0x79
               dev_suspend+0x1a9/0x1e7
               ctl_ioctl+0x3e9/0x411
               dm_ctl_ioctl+0x13/0x17
               do_vfs_ioctl+0xb3/0x460
               SyS_ioctl+0x5e/0x90
      
       MemcgReclaimerD
               holding: dm_bufio_client->lock
               waiting: stuck io to finish (needs iscsi_fail thread to progress)
      
               schedule at ffffffffbd603618
               io_schedule at ffffffffbd603ba4
               do_io_schedule at ffffffffbdaf0d94
               __wait_on_bit at ffffffffbd6008a6
               out_of_line_wait_on_bit at ffffffffbd600960
               wait_on_bit.constprop.10 at ffffffffbdaf0f17
               __make_buffer_clean at ffffffffbdaf18ba
               __cleanup_old_buffer at ffffffffbdaf192f
               shrink at ffffffffbdaf19fd
               do_shrink_slab at ffffffffbd6ec000
               shrink_slab at ffffffffbd6ec24a
               do_try_to_free_pages at ffffffffbd6eda09
               try_to_free_mem_cgroup_pages at ffffffffbd6ede7e
               mem_cgroup_resize_limit at ffffffffbd7024c0
               mem_cgroup_write at ffffffffbd703149
               cgroup_file_write at ffffffffbd6d9c6e
               sys_write at ffffffffbd6662ea
               system_call_fastpath at ffffffffbdbc34a2
      
      Link: https://lore.kernel.org/r/20200520022959.1912856-1-krisman@collabora.com
      Reported-by: Khazhismel Kumykov <khazhy@google.com>
      Reviewed-by: Lee Duncan <lduncan@suse.com>
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: ufs: Fix WriteBooster flush during runtime suspend · 51dd905b
      Stanley Chu authored
      Currently UFS host driver promises VCC supply if UFS device needs to do
      WriteBooster flush during runtime suspend.
      
      However the UFS specification mentions:
      
      "While the flushing operation is in progress, the device is in Active power
      mode."
      
      Therefore the UFS host driver needs to promise more: keep the UFS device
      in Active power mode, because the device will not do any flush once it
      enters Sleep or PowerDown power mode.  The same promise applies if the
      device needs urgent BKOP during runtime suspend.
      
      Fix this by not changing device power mode if WriteBooster flush or urgent
      BKOP is required in ufshcd_suspend().
      
      Now, if device finishes its job but is not resumed for a very long time,
      system will have unnecessary power drain because VCC is still supplied. A
      method to re-check the threshold of keeping VCC supply is required to fix
      the power drain. However, the threshold re-check needs to re-activate the
      link first because the decision depends on the latest device status.
      
      Also introduce a delayed work to force runtime resume after a certain delay
      during runtime suspend. This makes the threshold re-check happen naturally
      on entry to the next runtime suspend. The device can continue its
      WriteBooster flush or urgent BKOP jobs soon after being resumed if it has
      no upcoming requests and the link enters hibern8 state, either by
      Auto-Hibern8 or by hibern8 during the clk-gating scheme. This solution not
      only prevents power drain but also makes as much time as possible
      available for the device's background jobs.
      
      Link: https://lore.kernel.org/r/20200522083212.4008-5-stanley.chu@mediatek.com
      Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
      Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: ufs: Fix index of attributes query for WriteBooster feature · e31011ab
      Stanley Chu authored
      For WriteBooster feature related attributes, the index used by query shall
      be LUN ID if LU Dedicated buffer mode is enabled.
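
      The rule can be sketched as a small selection function. This is a
      minimal illustration, not the driver's code: the struct, field and
      enum names are hypothetical, and returning index 0 in shared buffer
      mode is an assumption.

      ```c
      #include <stdint.h>

      /* Hypothetical buffer modes; the names are illustrative only. */
      enum wb_buf_mode {
          WB_BUF_MODE_LU_DEDICATED,
          WB_BUF_MODE_SHARED
      };

      /* Hypothetical subset of per-device WriteBooster information. */
      struct wb_dev_info {
          enum wb_buf_mode buffer_type;
          uint8_t dedicated_lun; /* LUN that owns the dedicated buffer */
      };

      /* Pick the index to use when querying WriteBooster attributes. */
      static uint8_t wb_query_index(const struct wb_dev_info *info)
      {
          if (info->buffer_type == WB_BUF_MODE_LU_DEDICATED)
              return info->dedicated_lun;
          return 0; /* assumption: index 0 when the buffer is shared */
      }
      ```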
      
      Link: https://lore.kernel.org/r/20200522083212.4008-4-stanley.chu@mediatek.com
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
      Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: ufs: Allow WriteBooster on UFS 2.2 devices · c7cee3e7
      Stanley Chu authored
      According to the UFS specification, WriteBooster is officially supported by
      UFS 2.2.
      
      Since the UFS 2.2 specification has been finalized in JEDEC and such
      devices have also shown up in the market, modify the checking rule for
      ufshcd_wb_probe() to allow these devices to enable WriteBooster.
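
      A relaxed version check of this kind might look like the sketch
      below. It is not the body of ufshcd_wb_probe(): the helper name and
      macros are hypothetical, the hex encoding of the spec version
      (0x0220 for 2.2) is an assumption, and so is the 3.1 baseline that
      the 2.2 case is added to.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Assumed encoding: major version in the high byte, minor below it. */
      #define SPEC_VERSION_2_2 0x0220
      #define SPEC_VERSION_3_1 0x0310 /* assumed earlier baseline */

      /* Hypothetical helper: may this spec version enable WriteBooster? */
      static bool wb_allowed_by_spec_version(uint16_t spec_version)
      {
          return spec_version >= SPEC_VERSION_3_1 ||
                 spec_version == SPEC_VERSION_2_2;
      }
      ```

      An exact match on 2.2 is used here instead of lowering the threshold,
      so that versions between 2.2 and the assumed baseline stay excluded.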
      
      Link: https://lore.kernel.org/r/20200522083212.4008-3-stanley.chu@mediatek.com
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
      Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: ufs: Remove unnecessary memset for dev_info · 3a66ae51
      Stanley Chu authored
      The whole UFS host instance has been zero-initialized by scsi_host_alloc(),
      thus UFS driver does not need to clear "dev_info" member specifically in
      ufshcd_device_params_init().
      
      Simply remove the unnecessary code.
      
      Link: https://lore.kernel.org/r/20200522083212.4008-2-stanley.chu@mediatek.com
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
      Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: ufs-qcom: Fix scheduling while atomic issue · 3be60b56
      Jeffrey Hugo authored
      ufs_qcom_dump_dbg_regs() uses usleep_range, a sleeping function, but can be
      called from atomic context in the following flow:
      
      ufshcd_intr -> ufshcd_sl_intr -> ufshcd_check_errors ->
      ufshcd_print_host_regs -> ufshcd_vops_dbg_register_dump ->
      ufs_qcom_dump_dbg_regs
      
      This causes a boot crash on the Lenovo Miix 630 when the interrupt is
      handled on the idle thread.
      
      Fix the issue by switching to udelay().
      
      Link: https://lore.kernel.org/r/20200525204125.46171-1-jeffrey.l.hugo@gmail.com
      Fixes: 9c46b867 ("scsi: ufs-qcom: dump additional testbus registers")
      Reviewed-by: Bean Huo <beanhuo@micron.com>
      Reviewed-by: Avri Altman <avri.altman@wdc.com>
      Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 26 May, 2020 7 commits
  3. 20 May, 2020 27 commits