1. 22 Mar, 2020 4 commits
    • Coly Li's avatar
      bcache: make bch_sectors_dirty_init() to be multithreaded · b144e45f
      Coly Li authored
      When attaching a cached device (a.k.a backing device) to a cache
      device, bch_sectors_dirty_init() is called to count dirty sectors
      and stripes (see what bcache_dev_sectors_dirty_add() does) on the
      cache device.
      
      The counting is done by a single thread recursive function
      bch_btree_map_keys() to iterate all the bcache btree nodes.
      If the btree has huge number of nodes, bch_sectors_dirty_init() will
      take quite long time. In my testing, if the registering cache set has
      a existed UUID which matches a already registered cached device, the
      automatical attachment during the registration may take more than
      55 minutes. This is too long for waiting the bcache to work in real
      deployment.
      
      Fortunately when bch_sectors_dirty_init() is called, no other thread
      will access the btree yet, it is safe to do a read-only parallelized
      dirty sectors counting by multiple threads.
      
      This patch tries to create multiple threads, and each thread tries to
      one-by-one count dirty sectors from the sub-tree indexed by a root
      node key which the thread fetched. After the sub-tree is counted, the
      counting thread will continue to fetch another root node key, until
      the fetched key is NULL. How many threads in parallel depends on
      the number of keys from the btree root node, and the number of online
      CPU core. The thread number will be the less number but no more than
      BCH_DIRTY_INIT_THRD_MAX. If there are only 2 keys in root node, it
      can only be 2x times faster by this patch. But if there are 10 keys
      in the root node, with this patch it can be 10x times faster.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b144e45f
    • Coly Li's avatar
      bcache: make bch_btree_check() to be multithreaded · 8e710227
      Coly Li authored
      When registering a cache device, bch_btree_check() is called to check
      all btree nodes, to make sure the btree is consistent and not
      corrupted.
      
      bch_btree_check() is recursively executed in a single thread, when there
      are a lot of data cached and the btree is huge, it may take very long
      time to check all the btree nodes. In my testing, I observed it took
      around 50 minutes to finish bch_btree_check().
      
      When checking the bcache btree nodes, the cache set is not running yet,
      and indeed the whole tree is in read-only state, it is safe to create
      multiple threads to check the btree in parallel.
      
      This patch tries to create multiple threads, and each thread tries to
      one-by-one check the sub-tree indexed by a key from the btree root node.
      The parallel thread number depends on how many keys in the btree root
      node. At most BCH_BTR_CHKTHREAD_MAX (64) threads can be created, but in
      practice is should be min(cpu-number/2, root-node-keys-number).
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8e710227
    • Coly Li's avatar
      bcache: add bcache_ prefix to btree_root() and btree() macros · feac1a70
      Coly Li authored
      This patch changes macro btree_root() and btree() to bcache_btree_root()
      and bcache_btree(), to avoid potential generic name clash in future.
      
      NOTE: for product kernel maintainers, this patch can be skipped if
      you feel the rename stuffs introduce inconvenince to patch backport.
      Suggested-by: default avatarChristoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      feac1a70
    • Coly Li's avatar
      bcache: move macro btree() and btree_root() into btree.h · 253a99d9
      Coly Li authored
      In order to accelerate bcache registration speed, the macro btree()
      and btree_root() will be referenced out of btree.c. This patch moves
      them from btree.c into btree.h with other relative function declaration
      in btree.h, for the following changes.
      Signed-off-by: default avatarColy Li <colyli@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      253a99d9
  2. 19 Mar, 2020 1 commit
    • Gustavo A. R. Silva's avatar
      rsxx: Replace zero-length array with flexible-array member · 431d6e3e
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertenly introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      431d6e3e
  3. 17 Mar, 2020 3 commits
  4. 16 Mar, 2020 16 commits
  5. 13 Mar, 2020 1 commit
  6. 12 Mar, 2020 6 commits
  7. 10 Mar, 2020 9 commits
    • Martijn Coenen's avatar
      loop: Only freeze block queue when needed. · 0fbcf579
      Martijn Coenen authored
      __loop_update_dio() can be called as a part of loop_set_fd(), when the
      block queue is not yet up and running; avoid freezing the block queue in
      that case, since that is an expensive operation.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: default avatarMartijn Coenen <maco@android.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0fbcf579
    • Martijn Coenen's avatar
      loop: Only change blocksize when needed. · 7e81f99a
      Martijn Coenen authored
      Return early in loop_set_block_size() if the requested block size is
      identical to the one we already have; this avoids expensive calls to
      freeze the block queue.
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarMartijn Coenen <maco@android.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7e81f99a
    • Bart Van Assche's avatar
      null_blk: Add support for init_hctx() fault injection · 596444e7
      Bart Van Assche authored
      This makes it possible to test the error path in blk_mq_realloc_hw_ctxs()
      and also several error paths in null_blk.
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      596444e7
    • Bart Van Assche's avatar
      null_blk: Handle null_add_dev() failures properly · 9b03b713
      Bart Van Assche authored
      If null_add_dev() fails then null_del_dev() is called with a NULL argument.
      Make null_del_dev() handle this scenario correctly. This patch fixes the
      following KASAN complaint:
      
      null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
      Read of size 8 at addr 0000000000000000 by task find/1062
      
      Call Trace:
       dump_stack+0xa5/0xe6
       __kasan_report.cold+0x65/0x99
       kasan_report+0x16/0x20
       __asan_load8+0x58/0x90
       null_del_dev+0x28/0x280 [null_blk]
       nullb_group_drop_item+0x7e/0xa0 [null_blk]
       client_drop_item+0x53/0x80 [configfs]
       configfs_rmdir+0x395/0x4e0 [configfs]
       vfs_rmdir+0xb6/0x220
       do_rmdir+0x238/0x2c0
       __x64_sys_unlinkat+0x75/0x90
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9b03b713
    • Bart Van Assche's avatar
      null_blk: Fix the null_add_dev() error path · 2004bfde
      Bart Van Assche authored
      If null_add_dev() fails, clear dev->nullb.
      
      This patch fixes the following KASAN complaint:
      
      BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
      Read of size 8 at addr ffff88803280fc30 by task check/8409
      
      Call Trace:
       dump_stack+0xa5/0xe6
       print_address_description.constprop.0+0x26/0x260
       __kasan_report.cold+0x7b/0x99
       kasan_report+0x16/0x20
       __asan_load8+0x58/0x90
       nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7ff370926317
      Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
      RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
      RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
      R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0
      
      Allocated by task 8409:
       save_stack+0x23/0x90
       __kasan_kmalloc.constprop.0+0xcf/0xe0
       kasan_kmalloc+0xd/0x10
       kmem_cache_alloc_node_trace+0x129/0x4c0
       null_add_dev+0x24a/0xe90 [null_blk]
       nullb_device_power_store+0x1b6/0x270 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 8409:
       save_stack+0x23/0x90
       __kasan_slab_free+0x112/0x160
       kasan_slab_free+0x12/0x20
       kfree+0xdf/0x250
       null_add_dev+0xaf3/0xe90 [null_blk]
       nullb_device_power_store+0x1b6/0x270 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 2984c868 ("nullb: factor disk parameters")
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2004bfde
    • Bart Van Assche's avatar
      null_blk: Fix changing the number of hardware queues · 78b10be2
      Bart Van Assche authored
      Instead of initializing null_blk hardware queues explicitly after the
      request queue has been created, provide .init_hctx() and .exit_hctx()
      callback functions. The latter functions are not only called during
      request queue allocation but also when the number of hardware queues
      changes. Allocate nr_cpu_ids queues during initialization to support
      increasing the number of hardware queues above the initial hardware
      queue count.
      
      This change fixes increasing the number of hardware queues above the
      initial number of hardware queues and also keeps nullb->nr_queues in
      sync with the number of hardware queues.
      
      Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      78b10be2
    • Bart Van Assche's avatar
      null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed' · b9853b4d
      Bart Van Assche authored
      Although it is not clear to me why UBSAN complains when 'memory_backed'
      is set, this patch suppresses the UBSAN complaint that is triggered when
      setting that configfs attribute.
      
      UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1
      load of value 16 is not a valid value for type '_Bool'
      CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0xa5/0xe6
       ubsan_epilogue+0x9/0x26
       __ubsan_handle_load_invalid_value+0x6d/0x76
       nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b9853b4d
    • Bart Van Assche's avatar
      blk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs() · d0930bb8
      Bart Van Assche authored
      q->nr_hw_queues must only be updated once it is known that
      blk_mq_realloc_hw_ctxs() has succeeded. Otherwise it can happen that
      reallocation fails and that q->nr_hw_queues is larger than the number of
      allocated hardware queues. This patch fixes the following crash if
      increasing the number of hardware queues fails:
      
      BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x775/0x810
      Write of size 8 at addr 0000000000000118 by task check/977
      
      CPU: 3 PID: 977 Comm: check Not tainted 5.6.0-rc1-dbg+ #8
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0xa5/0xe6
       __kasan_report.cold+0x65/0x99
       kasan_report+0x16/0x20
       check_memory_region+0x140/0x1b0
       memset+0x28/0x40
       blk_mq_map_swqueue+0x775/0x810
       blk_mq_update_nr_hw_queues+0x468/0x710
       nullb_device_submit_queues_store+0xf7/0x1a0 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: ac0d6b92 ("block: Reduce the amount of memory required per request queue")
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d0930bb8
    • Bart Van Assche's avatar
      blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync · 6e66b493
      Bart Van Assche authored
      blk_mq_map_queues() and multiple .map_queues() implementations expect that
      set->map[HCTX_TYPE_DEFAULT].nr_queues is set to the number of hardware
      queues. Hence set .nr_queues before calling these functions. This patch
      fixes the following kernel warning:
      
      WARNING: CPU: 0 PID: 2501 at include/linux/cpumask.h:137
      Call Trace:
       blk_mq_run_hw_queue+0x19d/0x350 block/blk-mq.c:1508
       blk_mq_run_hw_queues+0x112/0x1a0 block/blk-mq.c:1525
       blk_mq_requeue_work+0x502/0x780 block/blk-mq.c:775
       process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
       worker_thread+0x98/0xe40 kernel/workqueue.c:2415
       kthread+0x361/0x430 kernel/kthread.c:255
      
      Fixes: ed76e329 ("blk-mq: abstract out queue map") # v5.0
      Reported-by: syzbot+d44e1b26ce5c3e77458d@syzkaller.appspotmail.com
      Signed-off-by: default avatarBart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6e66b493