1. 24 May, 2022 2 commits
    • bcache: improve multithreaded bch_sectors_dirty_init() · 4dc34ae1
      Coly Li authored
      Commit b144e45f ("bcache: make bch_sectors_dirty_init() to be
      multithreaded") makes bch_sectors_dirty_init() much faster when
      counting dirty sectors by iterating all dirty keys in the btree.
      But it is not in ideal shape yet and can still be improved.

      This patch makes the following changes to improve the current parallel
      dirty keys iteration on the btree:
      - Add a read lock to the root node while multiple threads iterate the
        btree, to prevent the root node from being split by I/Os from other
        registered bcache devices.
      - Remove the local variable "char name[32]" and generate the kernel
        thread name string directly when calling kthread_run().
      - Allocate "struct bch_dirty_init_state state" directly on the stack and
        avoid the unnecessary dynamic memory allocation for it.
      - Decrease BCH_DIRTY_INIT_THRD_MAX from 64 to 12, which is enough.
      - Increase &state->started to count a created kernel thread only after
        it has been created successfully.
      - When waiting for all dirty key counting threads to finish, use
        wait_event() instead of wait_event_interruptible().

      With the above changes, the code is clearer and some potential error
      conditions are avoided.
      
      Fixes: b144e45f ("bcache: make bch_sectors_dirty_init() to be multithreaded")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220524102336.10684-3-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
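The thread-accounting items in the list above (stack-allocated state, counting a thread only after it was created, and an uninterruptible wait for all workers) can be sketched in userspace C. This is only an illustration under stated assumptions: it uses POSIX threads instead of kthread_run() and wait_event(), it splits a flat array of per-key sector counts by stride instead of walking a btree, and every name in it (dirty_init_state, dirty_init_info, count_dirty_sectors, DIRTY_INIT_THRD_MAX) is hypothetical rather than taken from the bcache source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DIRTY_INIT_THRD_MAX 12  /* hypothetical cap, mirroring the value in the commit */

/* Shared state; lives on the caller's stack, as the commit does for its state. */
struct dirty_init_state {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    int             started;   /* written only by the creating thread */
    int             finished;  /* protected by lock */
};

/* Per-worker slice description and result. */
struct dirty_init_info {
    struct dirty_init_state *state;
    const long *sectors;
    size_t nkeys, first, stride;
    long sum;
    pthread_t thread;
};

static void *dirty_init_worker(void *arg)
{
    struct dirty_init_info *info = arg;

    for (size_t i = info->first; i < info->nkeys; i += info->stride)
        info->sum += info->sectors[i];

    pthread_mutex_lock(&info->state->lock);
    info->state->finished++;
    pthread_cond_signal(&info->state->wait);
    pthread_mutex_unlock(&info->state->lock);
    return NULL;
}

/* Returns the total, or -1 if some worker could not be created. */
long count_dirty_sectors(const long *sectors, size_t nkeys, int nthreads)
{
    struct dirty_init_state state;
    struct dirty_init_info infos[DIRTY_INIT_THRD_MAX];
    long total = 0;
    int failed = 0;

    assert(sectors != NULL || nkeys == 0);
    if (nthreads < 1)
        nthreads = 1;
    if (nthreads > DIRTY_INIT_THRD_MAX)
        nthreads = DIRTY_INIT_THRD_MAX;

    pthread_mutex_init(&state.lock, NULL);
    pthread_cond_init(&state.wait, NULL);
    state.started = 0;
    state.finished = 0;

    for (int i = 0; i < nthreads; i++) {
        infos[i] = (struct dirty_init_info){
            .state = &state, .sectors = sectors, .nkeys = nkeys,
            .first = (size_t)i, .stride = (size_t)nthreads, .sum = 0,
        };
        if (pthread_create(&infos[i].thread, NULL,
                           dirty_init_worker, &infos[i]) != 0) {
            failed = 1;
            break;
        }
        state.started++;  /* count the worker only after creation succeeded */
    }

    /* Wait, without an early exit, until every started worker has finished. */
    pthread_mutex_lock(&state.lock);
    while (state.finished < state.started)
        pthread_cond_wait(&state.wait, &state.lock);
    pthread_mutex_unlock(&state.lock);

    /* Joining reclaims thread resources; the stack state outlives all workers. */
    for (int i = 0; i < state.started; i++) {
        pthread_join(infos[i].thread, NULL);
        total += infos[i].sum;
    }

    pthread_cond_destroy(&state.wait);
    pthread_mutex_destroy(&state.lock);
    return failed ? -1 : total;
}
```

Because the function never returns before every started worker has finished, keeping the shared state on the stack is safe in this sketch. Incrementing the counter only after a successful create means a failed creation cannot leave the waiter expecting a worker that never ran.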
    • bcache: improve multithreaded bch_btree_check() · 62253644
      Coly Li authored
      Commit 8e710227 ("bcache: make bch_btree_check() to be
      multithreaded") makes bch_btree_check() much faster when checking
      all btree nodes during cache device registration. But it is not in
      ideal shape yet and can still be improved.

      This patch makes the following changes to improve the current parallel
      btree node check by multiple threads in bch_btree_check():
      - Add a read lock to the root node while checking all the btree nodes
        with multiple threads. Although it is not mandatory currently, it is
        good to have a read lock in the code logic.
      - Remove the local variable 'char name[32]' and generate the kernel
        thread name string directly when calling kthread_run().
      - Allocate the local variable "struct btree_check_state check_state" on
        the stack and avoid the unnecessary dynamic memory allocation for it.
      - Reduce BCH_BTR_CHKTHREAD_MAX from 64 to 12, which is enough.
      - Increase check_state->started to count a created kernel thread only
        after it has been created successfully.
      - When waiting for all checking kernel threads to finish, use
        wait_event() instead of wait_event_interruptible().

      With this change, the code is clearer and some potential error
      conditions are avoided.
      
      Fixes: 8e710227 ("bcache: make bch_btree_check() to be multithreaded")
      Signed-off-by: Coly Li <colyli@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220524102336.10684-2-colyli@suse.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 23 May, 2022 6 commits
    • Merge branch 'md-next' of... · df7e7f2b
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-5.19/drivers
      
      Pull MD updates from Song:
      
      "- Remove uses of bdevname, by Christoph Hellwig;
       - Bug fixes by Guoqing Jiang, and Xiao Ni."
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: fix double free of io_acct_set bioset
        md: Don't set mddev private to NULL in raid0 pers->free
        md: remove most calls to bdevname
        md: protect md_unregister_thread from reentrancy
        md: don't unregister sync_thread with reconfig_mutex held
    • md: fix double free of io_acct_set bioset · 42b805af
      Xiao Ni authored
      io_acct_set is now allocated and freed in the personality. Remove the
      code that frees io_acct_set in md_free and md_stop.
      
      Fixes: 0c031fd3 (md: Move alloc/free acct bioset in to personality)
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: Don't set mddev private to NULL in raid0 pers->free · 0f2571ad
      Xiao Ni authored
      In the normal stop process, the call sequence is:
         do_md_stop
            |
         __md_stop (pers->free(); mddev->private=NULL)
            |
         md_free (free mddev)
      __md_stop sets mddev->private to NULL after pers->free. The raid device
      is stopped and the mddev memory is freed. But in reshape, the mddev is
      not freed and is still used by the new raid.

      In reshape, it first sets mddev->private to new_pers and then runs
      old_pers->free(). Now raid0 sets mddev->private to NULL in raid0_free,
      so the new raid can't work anymore. It will panic with a NULL pointer
      dereference when mddev->private is dereferenced.
      
      It can panic like this:
      [63010.814972] kernel BUG at drivers/md/raid10.c:928!
      [63010.819778] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [63010.825011] CPU: 3 PID: 44437 Comm: md0_resync Kdump: loaded Not tainted 5.14.0-86.el9.x86_64 #1
      [63010.833789] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.15.0 09/11/2020
      [63010.841440] RIP: 0010:raise_barrier+0x161/0x170 [raid10]
      [63010.865508] RSP: 0018:ffffc312408bbc10 EFLAGS: 00010246
      [63010.870734] RAX: 0000000000000000 RBX: ffffa00bf7d39800 RCX: 0000000000000000
      [63010.877866] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffa00bf7d39800
      [63010.884999] RBP: 0000000000000000 R08: fffffa4945e74400 R09: 0000000000000000
      [63010.892132] R10: ffffa00eed02f798 R11: 0000000000000000 R12: ffffa00bbc435200
      [63010.899266] R13: ffffa00bf7d39800 R14: 0000000000000400 R15: 0000000000000003
      [63010.906399] FS:  0000000000000000(0000) GS:ffffa00eed000000(0000) knlGS:0000000000000000
      [63010.914485] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [63010.920229] CR2: 00007f5cfbe99828 CR3: 0000000105efe000 CR4: 00000000003506e0
      [63010.927363] Call Trace:
      [63010.929822]  ? bio_reset+0xe/0x40
      [63010.933144]  ? raid10_alloc_init_r10buf+0x60/0xa0 [raid10]
      [63010.938629]  raid10_sync_request+0x756/0x1610 [raid10]
      [63010.943770]  md_do_sync.cold+0x3e4/0x94c
      [63010.947698]  md_thread+0xab/0x160
      [63010.951024]  ? md_write_inc+0x50/0x50
      [63010.954688]  kthread+0x149/0x170
      [63010.957923]  ? set_kthread_struct+0x40/0x40
      [63010.962107]  ret_from_fork+0x22/0x30
      
      Removing the code that sets mddev->private to NULL in raid0 fixes the
      problem.
      
      Fixes: 0c031fd3 (md: Move alloc/free acct bioset in to personality)
      Reported-by: Fine Fan <ffan@redhat.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
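The ordering problem described in this commit can be sketched in plain C. This is an illustration only, assuming a reshape path that installs the new private pointer before calling the old personality's free hook; the types and functions (mddev_sketch, reshape_sketch, free_and_clear, free_only) are hypothetical and are not the md code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device structure. */
struct mddev_sketch {
    void *private;
};

typedef void (*pers_free_fn)(struct mddev_sketch *mddev, void *priv);

/* Behaviour before the fix: the free hook also clears mddev->private. */
void free_and_clear(struct mddev_sketch *mddev, void *priv)
{
    free(priv);
    mddev->private = NULL;  /* clobbers the pointer reshape just installed */
}

/* Behaviour after the fix: the free hook only releases its own data. */
void free_only(struct mddev_sketch *mddev, void *priv)
{
    (void)mddev;
    free(priv);
}

/* Reshape installs the new private data first, then frees the old one. */
void reshape_sketch(struct mddev_sketch *mddev, void *new_priv,
                    pers_free_fn old_free)
{
    void *old_priv;

    assert(mddev != NULL);
    old_priv = mddev->private;
    mddev->private = new_priv;
    old_free(mddev, old_priv);
}
```

With free_only the device still points at the new private data after the reshape. With free_and_clear it ends up with NULL, which is the state the commit message says leads to the panic.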
    • md: remove most calls to bdevname · 913cce5a
      Christoph Hellwig authored
      Use the %pg format specifier to save on stack consumption and code size.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: protect md_unregister_thread from reentrancy · 1e267742
      Guoqing Jiang authored
      Generally, md_unregister_thread is called with reconfig_mutex held, but
      raid_message in dm-raid doesn't hold reconfig_mutex to unregister the
      thread, so md_unregister_thread can in theory be called simultaneously
      from two call sites.

      After the previous commit, which removes the protection of reconfig_mutex
      for md_unregister_thread completely, the potential issue could be worse
      than before.

      Let's take pers_lock at the beginning of the function to protect it
      against reentrancy.
      Reported-by: Donald Buczek <buczek@molgen.mpg.de>
      Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Song Liu <song@kernel.org>
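The idea of taking a lock at the start of the unregister path, so that only one of two concurrent callers ever sees the thread pointer, can be sketched in userspace C. This is an assumption-laden illustration: it uses a pthread mutex where the commit refers to pers_lock, and the names (thread_sketch, pers_lock_sketch, unregister_thread_sketch) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-array thread object. */
struct thread_sketch {
    int id;
};

static pthread_mutex_t pers_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/*
 * Detach the thread object from *threadp under the lock, then free it.
 * Returns 1 if this caller released the object, 0 if another caller
 * (or an earlier call) had already taken it.
 */
int unregister_thread_sketch(struct thread_sketch **threadp)
{
    struct thread_sketch *thread;

    assert(threadp != NULL);

    pthread_mutex_lock(&pers_lock_sketch);
    thread = *threadp;
    *threadp = NULL;  /* later callers now observe NULL */
    pthread_mutex_unlock(&pers_lock_sketch);

    if (!thread)
        return 0;

    free(thread);     /* only one caller can reach this for a given object */
    return 1;
}
```

Reading and clearing the pointer inside one critical section is what makes a second call harmless: it finds NULL and returns without freeing anything.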
    • md: don't unregister sync_thread with reconfig_mutex held · 8b48ec23
      Guoqing Jiang authored
      Unregistering sync_thread doesn't need to hold reconfig_mutex since it
      doesn't reconfigure the array.

      Holding it can also cause a deadlock for raid5 as follows:

      1. Process A tries to reap the sync thread with reconfig_mutex held after
         echo idle to sync_action.
      2. The raid5 sync thread is blocked if there are too many active stripes.
      3. SB_CHANGE_PENDING is set (because write IO comes from the upper layer),
         which prevents the number of active stripes from decreasing.
      4. SB_CHANGE_PENDING can't be cleared since md_check_recovery is not able
         to hold reconfig_mutex.

      More details in the link:
      https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@molgen.mpg.de/T/#t

      Also add one parameter to md_reap_sync_thread since it could be called by
      dm-raid, which doesn't hold reconfig_mutex.
      Reported-and-tested-by: Donald Buczek <buczek@molgen.mpg.de>
      Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Song Liu <song@kernel.org>
  3. 21 May, 2022 1 commit
  4. 20 May, 2022 1 commit
  5. 19 May, 2022 1 commit
    • nvme: set non-mdts limits in nvme_scan_work · 78288665
      Chaitanya Kulkarni authored
      In the current implementation we set the non-mdts limits by calling
      nvme_init_non_mdts_limits() from nvme_init_ctrl_finish().
      This also tries to set the limits for the discovery controller, which
      has no I/O queues, resulting in the warning message reported by
      nvme_log_error() when running blktests nvme/002:
      
      [ 2005.155946] run blktests nvme/002 at 2022-04-09 16:57:47
      [ 2005.192223] loop: module loaded
      [ 2005.196429] nvmet: adding nsid 1 to subsystem blktests-subsystem-0
      [ 2005.200334] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
      
      <------------------------------SNIP---------------------------------->
      
      [ 2008.958108] nvmet: adding nsid 1 to subsystem blktests-subsystem-997
      [ 2008.962082] nvmet: adding nsid 1 to subsystem blktests-subsystem-998
      [ 2008.966102] nvmet: adding nsid 1 to subsystem blktests-subsystem-999
      [ 2008.973132] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN testhostnqn.
      *[ 2008.973196] nvme1: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2) MORE DNR*
      [ 2008.974595] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery"
      [ 2009.103248] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
      
      Move the call of nvme_init_non_mdts_limits() to nvme_scan_work() after
      we verify that I/O queues are created since that is a converging point
      for each transport where these limits are actually used.
      
      1. FC :
      nvme_fc_create_association()
       ...
       nvme_fc_create_io_queues(ctrl);
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
      2. PCIe:-
      nvme_reset_work()
       ...
       nvme_setup_io_queues()
        nvme_create_io_queues()
         nvme_alloc_queue()
       ...
       nvme_start_ctrl()
        nvme_scan_queue()
         nvme_scan_work()
      
      3. RDMA :-
      nvme_rdma_setup_ctrl
       ...
        nvme_rdma_configure_io_queues
        ...
        nvme_start_ctrl()
         nvme_scan_queue()
          nvme_scan_work()
      
      4. TCP :-
      nvme_tcp_setup_ctrl
       ...
        nvme_tcp_configure_io_queues
        ...
        nvme_start_ctrl()
         nvme_scan_queue()
          nvme_scan_work()
      
      * nvme_scan_work()
      ...
      nvme_validate_or_alloc_ns()
        nvme_alloc_ns()
         nvme_update_ns_info()
          nvme_update_disk_info()
           nvme_config_discard() <---
           blk_queue_max_write_zeroes_sectors() <---
      Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  6. 18 May, 2022 2 commits
    • nvme: add support for TP4084 - Time-to-Ready Enhancements · 354201c5
      Christoph Hellwig authored
      Add support for using longer timeouts during controller initialization
      and letting the controller come up with namespaces that are not ready
      for I/O yet. We skip these not-ready namespaces during scanning and
      only bring them online once another scan is kicked off by the AEN that
      is set when the NRDY bit gets set in the I/O Command Set Independent
      Identify Namespace Data Structure. This asynchronous probing avoids
      blocking the kernel boot when controllers take a very long time to
      recover after unclean shutdowns (up to minutes).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
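The scan behaviour this commit describes (skip namespaces that are not ready, pick them up on a later rescan) can be sketched in plain C. This is a simplified model, not the NVMe driver: readiness is reduced to a boolean flag, the rescan is just a second call, and the names (ns_sketch, scan_namespaces_sketch) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace record; "ready" models the device-reported state. */
struct ns_sketch {
    unsigned int nsid;
    bool ready;   /* namespace can accept I/O */
    bool online;  /* namespace has been brought online by a scan */
};

/*
 * One scan pass: bring every ready, not-yet-online namespace online and
 * skip the ones that are not ready instead of blocking on them.
 * Returns how many namespaces this pass brought online.
 */
size_t scan_namespaces_sketch(struct ns_sketch *list, size_t n)
{
    size_t brought_online = 0;

    assert(list != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!list[i].ready)
            continue;              /* left for a later rescan */
        if (!list[i].online) {
            list[i].online = true;
            brought_online++;
        }
    }
    return brought_online;
}
```

In this model a later event that marks a namespace ready only has to trigger another pass. Namespaces that are already online are left alone, so repeated scans are cheap.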
    • Merge tag 'nvme-5.19-2022-05-18' of git://git.infradead.org/nvme into for-5.19/drivers · da14f237
      Jens Axboe authored
      Pull NVMe updates from Christoph:
      
      "nvme updates for Linux 5.19
      
       - tighten the PCI presence check (Stefan Roese)
       - fix a potential NULL pointer dereference in an error path
         (Kyle Miller Smith)
       - fix interpretation of the DMRSL field (Tom Yan)
       - relax the data transfer alignment (Keith Busch)
       - verbose error logging improvements (Max Gurtovoy, Chaitanya Kulkarni)
       - misc cleanups (Chaitanya Kulkarni, me)"
      
      * tag 'nvme-5.19-2022-05-18' of git://git.infradead.org/nvme:
        nvme: split the enum used for various register constants
        nvme-fabrics: add a request timeout helper
        nvme-pci: harden drive presence detect in nvme_dev_disable()
        nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
        nvme: mark internal passthru request RQF_QUIET
        nvme: remove unneeded include from constants file
        nvme: add missing status values to verbose logging
        nvme: set dma alignment to dword
        nvme: fix interpretation of DMRSL
  7. 17 May, 2022 1 commit
  8. 16 May, 2022 9 commits
  9. 10 May, 2022 4 commits
  10. 04 May, 2022 5 commits
  11. 03 May, 2022 8 commits