1. 26 Jun, 2023 4 commits
  2. 25 Jun, 2023 2 commits
  3. 23 Jun, 2023 10 commits
    • bcache: Fix bcache device claiming · 2c555598
      Jan Kara authored
      Commit 2736e8ee ("block: use the holder as indication for exclusive
      opens") changed blkdev_put() to take the exclusive holder of the bdev
      as an argument. However, it overlooked that register_bdev() and
      register_cache() overwrite the bdev->bd_holder field in the block
      device to point to the real owning object, which was not available at
      the time we called blkdev_get_by_path(). Messing with bdev internals
      like this is a layering violation, and it also causes blkdev_put() to
      issue a warning about mismatching holders.
      
      Fix bcache to reopen the block device with the appropriate holder once
      it is available. This also restores the behavior that multiple bcache
      caches cannot claim the same device, which was broken by commit 29499ab0
      ("bcache: don't pass a stack address to blkdev_get_by_path").
      
      Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
      Acked-by: Coly Li <colyli@suse.de>
      Link: https://lore.kernel.org/r/20230622164658.12861-2-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
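The holder rule described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the bcache or block-layer code: `BlockDevice`, `register` and `unregister` are hypothetical names, the model tracks a single exclusive holder, and a mismatched `put()` merely reports the mismatch instead of warning.

```python
import errno


class BlockDevice:
    """Toy block device that tracks one exclusive holder (assumption)."""

    def __init__(self, path):
        self.path = path
        self.holder = None

    def get(self, holder):
        # An exclusive open fails while a different holder owns the device.
        if self.holder is not None and self.holder is not holder:
            raise OSError(errno.EBUSY, "device already claimed")
        self.holder = holder

    def put(self, holder):
        # Report whether the caller passed the holder that opened the device.
        matched = holder is self.holder
        if matched:
            self.holder = None
        return matched


def register(bdev, make_owner):
    # Allocate the owning object first, then claim the device with it,
    # so nobody has to overwrite the holder field afterwards.
    owner = make_owner()
    bdev.get(owner)
    return owner


def unregister(bdev, owner):
    return bdev.put(owner)
```

In this model a second `register()` on the same device raises `EBUSY`, and `put()` with any object other than the registered owner returns `False`, which corresponds to the mismatching-holder condition the commit message mentions.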
    • bcache: Alloc holder object before async registration · abcc0cbd
      Jan Kara authored
      Allocate the holder object (cache or cached_dev) before offloading the
      rest of the startup to async work. This will allow us to open the
      block device with the proper holder.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Coly Li <colyli@suse.de>
      Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
      Link: https://lore.kernel.org/r/20230622164658.12861-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'md-next-20230623' of... · c36591f6
      Jens Axboe authored
      Merge tag 'md-next-20230623' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.5/block-late
      
      Pull MD fixes from Song.
      
      * tag 'md-next-20230623' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        raid10: avoid spin_lock from fastpath from raid10_unplug()
        md: fix 'delete_mutex' deadlock
        md: use mddev->external to select holder in export_rdev()
        md/raid1-10: fix casting from randomized structure in raid1_submit_write()
        md/raid10: fix the condition to call bio_end_io_acct()
    • raid10: avoid spin_lock from fastpath from raid10_unplug() · a8d5fdd4
      Yu Kuai authored
      Commit 0c0be98b ("md/raid10: prevent unnecessary calls to wake_up()
      in fast path") missed one place. For example, with:
      
      	fio -direct=1 -rw=write/randwrite -iodepth=1 ...
      
      plug and unplug are called for each I/O, so wake_up() from
      raid10_unplug() causes lock contention as well.

      Avoid this contention by using wake_up_barrier() instead of wake_up(),
      which does not take the spinlock if the waitqueue is empty.
      
      Fio test script:
      
      [global]
      name=random reads and writes
      ioengine=libaio
      direct=1
      readwrite=randrw
      rwmixread=70
      iodepth=64
      buffered=0
      filename=/dev/md0
      size=1G
      runtime=30
      time_based
      randrepeat=0
      norandommap
      refill_buffers
      ramp_time=10
      bs=4k
      numjobs=400
      group_reporting=1
      [job1]
      
      Test results with ramdisk raid10 (by Ali):
      
      	Before this patch	With this patch
      READ	IOPS=2033k		IOPS=3642k
      WRITE	IOPS=871k		IOPS=1561K
      
      By the way, in this scenario blk_plug_cb() is allocated and freed
      for each I/O; this seems to need optimizing as well.
      Reported-and-tested-by: Ali Gholami Rudi <aligrudi@gmail.com>
      Closes: https://lore.kernel.org/all/20231606122233@laper.mirepesht/
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com
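The idea behind the fix, skipping the lock when nobody is waiting, can be sketched in a userspace model. This is an illustration only, assuming the helper simply checks for waiters before calling the locked wake-up; `WaitQueue`, `wake_up_if_waiters` and the `lock_taken` counter are hypothetical, and the model ignores the memory-ordering requirements that a real lockless check has in the kernel.

```python
import threading


class WaitQueue:
    """Toy wait queue; lock_taken is instrumentation for this example."""

    def __init__(self):
        self.lock = threading.Lock()
        self.waiters = []
        self.lock_taken = 0

    def wake_up(self):
        # Always pays for the lock, even when there is nobody to wake.
        with self.lock:
            self.lock_taken += 1
            woken, self.waiters = self.waiters, []
        return woken

    def wake_up_if_waiters(self):
        # Cheap check first: skip the lock entirely on an empty queue.
        if not self.waiters:
            return []
        return self.wake_up()


def lock_acquisitions(wake, calls):
    """Call `wake` on an empty queue `calls` times; return locks taken."""
    q = WaitQueue()
    for _ in range(calls):
        wake(q)
    return q.lock_taken
```

With an empty queue, `lock_acquisitions(WaitQueue.wake_up, n)` grows with `n`, while `lock_acquisitions(WaitQueue.wake_up_if_waiters, n)` stays at zero, which is the contention the commit is removing from the per-I/O unplug path.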
    • md: fix 'delete_mutex' deadlock · 4934b640
      Yu Kuai authored
      Commit 3ce94ce5 ("md: fix duplicate filename for rdev") introduced a
      new lock 'delete_mutex' and triggered a new deadlock:
      
      t1: remove rdev			t2: sysfs writer
      
      rdev_attr_store			rdev_attr_store
       mddev_lock
       state_store
       md_kick_rdev_from_array
        lock delete_mutex
        list_add mddev->deleting
        unlock delete_mutex
       mddev_unlock
      				 mddev_lock
      				 ...
        lock delete_mutex
        kobject_del
        // wait for sysfs writers to be done
      				 mddev_unlock
      				 lock delete_mutex
      				 // wait for delete_mutex, deadlock
      
      'delete_mutex' protects the list 'mddev->deleting'. It turns out that
      this list can be protected by 'reconfig_mutex' directly, so
      'delete_mutex' can be removed.

      Fix this problem by removing the lock and using 'reconfig_mutex' to
      protect the list. mddev_unlock() moves the list to a local list, which
      is handled after 'reconfig_mutex' is dropped.
      
      Fixes: 3ce94ce5 ("md: fix duplicate filename for rdev")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230621142933.1395629-1-yukuai1@huaweicloud.com
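The pattern the fix describes, detaching the pending list under the lock and processing it after the lock is dropped, can be sketched as a userspace model. The names mirror the commit message for readability, but `Mddev`, `kick_rdev`, the `teardown` callback and `demo` are hypothetical stand-ins, not the md implementation.

```python
import threading


class Mddev:
    def __init__(self):
        self.reconfig_mutex = threading.Lock()
        self.deleting = []  # protected by reconfig_mutex


def kick_rdev(mddev, rdev):
    # The caller must hold reconfig_mutex; only queue the rdev here.
    assert mddev.reconfig_mutex.locked()
    mddev.deleting.append(rdev)


def mddev_unlock(mddev, teardown):
    # Detach the pending list while the lock is still held...
    pending, mddev.deleting = mddev.deleting, []
    mddev.reconfig_mutex.release()
    # ...and run the slow teardown with no lock held, so the teardown
    # may wait for other users of reconfig_mutex without deadlocking.
    for rdev in pending:
        teardown(rdev)


def demo():
    m = Mddev()
    done = []

    def teardown(rdev):
        # Stands in for a step that needs the mutex to be available.
        got = m.reconfig_mutex.acquire(timeout=1)
        assert got, "teardown ran while reconfig_mutex was still held"
        done.append(rdev)
        m.reconfig_mutex.release()

    m.reconfig_mutex.acquire()
    kick_rdev(m, "sda")
    kick_rdev(m, "sdb")
    mddev_unlock(m, teardown)
    return m, done
```

Because the list is swapped out before the release, no second lock is needed to protect it, and the teardown step never runs under the mutex it might indirectly wait on.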
    • md: use mddev->external to select holder in export_rdev() · a1d76719
      Song Liu authored
      mdadm test "10ddf-create-fail-rebuild" triggers warnings like the following:
      
      [  215.526357] ------------[ cut here ]------------
      [  215.527243] WARNING: CPU: 18 PID: 1264 at block/bdev.c:617 blkdev_put+0x269/0x350
      [  215.528334] Modules linked in:
      [  215.528806] CPU: 18 PID: 1264 Comm: mdmon Not tainted 6.4.0-rc2+ #768
      [  215.529863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      [  215.531464] RIP: 0010:blkdev_put+0x269/0x350
      [  215.532167] Code: ff ff 49 8d 7d 10 e8 56 bf b8 ff 4d 8b 65 10 49 8d bc
      24 58 05 00 00 e8 05 be b8 ff 41 83 ac 24 58 05 00 00 01 e9 44 ff ff ff
      <0f> 0b e9 52 fe ff ff 0f 0b e9 6b fe ff ff1
      [  215.534780] RSP: 0018:ffffc900040bfbf0 EFLAGS: 00010283
      [  215.535635] RAX: ffff888174001000 RBX: ffff88810b1c3b00 RCX: ffffffff819a4061
      [  215.536645] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88810b1c3ba0
      [  215.537657] RBP: ffff88810dbde800 R08: fffffbfff0fca983 R09: fffffbfff0fca983
      [  215.538674] R10: ffffc900040bfbf0 R11: fffffbfff0fca982 R12: ffff88810b1c3b38
      [  215.539687] R13: ffff88810b1c3b10 R14: ffff88810dbdecb8 R15: ffff88810b1c3b00
      [  215.540833] FS:  00007f2aabdff700(0000) GS:ffff888dfb400000(0000) knlGS:0000000000000000
      [  215.541961] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  215.542775] CR2: 00007fa19a85d934 CR3: 000000010c076006 CR4: 0000000000370ee0
      [  215.543814] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  215.544840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  215.545885] Call Trace:
      [  215.546257]  <TASK>
      [  215.546608]  export_rdev.isra.63+0x71/0xe0
      [  215.547338]  mddev_unlock+0x1b1/0x2d0
      [  215.547898]  array_state_store+0x28d/0x450
      [  215.548519]  md_attr_store+0xd7/0x150
      [  215.549059]  ? __pfx_sysfs_kf_write+0x10/0x10
      [  215.549702]  kernfs_fop_write_iter+0x1b9/0x260
      [  215.550351]  vfs_write+0x491/0x760
      [  215.550863]  ? __pfx_vfs_write+0x10/0x10
      [  215.551445]  ? __fget_files+0x156/0x230
      [  215.552053]  ksys_write+0xc0/0x160
      [  215.552570]  ? __pfx_ksys_write+0x10/0x10
      [  215.553141]  ? ktime_get_coarse_real_ts64+0xec/0x100
      [  215.553878]  do_syscall_64+0x3a/0x90
      [  215.554403]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [  215.555125] RIP: 0033:0x7f2aade11847
      [  215.555696] Code: c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec
      10 e8 1b fd ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 b8 01 00 00 00 0f 05
      <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 448
      [  215.558398] RSP: 002b:00007f2aabdfeba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      [  215.559516] RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f2aade11847
      [  215.560515] RDX: 0000000000000005 RSI: 0000000000438b8b RDI: 0000000000000010
      [  215.561512] RBP: 0000000000438b8b R08: 0000000000000000 R09: 00007f2aaecf0060
      [  215.562511] R10: 000000000e3ba40b R11: 0000000000000293 R12: 0000000000000005
      [  215.563647] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000c70750
      [  215.564693]  </TASK>
      [  215.565029] irq event stamp: 15979
      [  215.565584] hardirqs last  enabled at (15991): [<ffffffff811a7432>] __up_console_sem+0x52/0x60
      [  215.566806] hardirqs last disabled at (16000): [<ffffffff811a7417>] __up_console_sem+0x37/0x60
      [  215.568022] softirqs last  enabled at (15716): [<ffffffff8277a2db>] __do_softirq+0x3eb/0x531
      [  215.569239] softirqs last disabled at (15711): [<ffffffff810d8f45>] irq_exit_rcu+0x115/0x160
      [  215.570434] ---[ end trace 0000000000000000 ]---
      
      This means export_rdev() calls blkdev_put() with a different holder than
      the one used by blkdev_get_by_dev(). This is because mddev->major_version == -2
      is not a good check for external metadata. Fix this by using
      mddev->external instead.
      
      Also, do not clear mddev->external in md_clean(), as the flag might be used
      later in export_rdev().
      
      Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Song Liu <song@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230617052405.305871-1-song@kernel.org
    • md/raid1-10: fix casting from randomized structure in raid1_submit_write() · b5a99602
      Yu Kuai authored
      The following build error is triggered when building with clang version
      17.0.0 with W=1 (this can't be reproduced with gcc 13.1.0):
      
      drivers/md/raid1-10.c:117:25: error: casting from randomized structure
      pointer type 'struct block_device *' to 'struct md_rdev *'
           117 |         struct md_rdev *rdev = (struct md_rdev *)bio->bi_bdev;
               |                                ^
      
      Fix this by casting 'bio->bi_bdev' to 'void *', as it used to be.
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202306142042.fmjfmTF8-lkp@intel.com/
      Fixes: 8295efbe ("md/raid1-10: factor out a helper to submit normal write")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230616012136.3047071-1-yukuai1@huaweicloud.com
    • md/raid10: fix the condition to call bio_end_io_acct() · 125bfc7c
      Li Nan authored
      /sys/block/[device]/queue/iostats controls whether I/O statistics are
      collected. Writing 0 to it clears QUEUE_FLAG_IO_STAT in queue_flags,
      which disables iostats. If iostats is disabled and later enabled, the
      I/O issued in between is accounted incorrectly and inflight drops
      to -1.
      
        //T1 set iostats
        echo 0 > /sys/block/md0/queue/iostats
         clear QUEUE_FLAG_IO_STAT
      
      			//T2 issue io
      			if (QUEUE_FLAG_IO_STAT) -> false
      			 bio_start_io_acct
      			  inflight++
      
        echo 1 > /sys/block/md0/queue/iostats
         set QUEUE_FLAG_IO_STAT
      
      					//T3 io end
      					if (QUEUE_FLAG_IO_STAT) -> true
      					 bio_end_io_acct
      					  inflight--	-> -1
      
      Also, if iostats is enabled when the I/O is issued but disabled when it
      completes, inflight is never decremented.

      Fix it by checking start_time at I/O completion: if start_time is not
      0, call bio_end_io_acct().
      
      Fixes: 528bc2cf ("md/raid10: enable io accounting")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20230609094320.2397604-1-linan666@huaweicloud.com
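The accounting race in this commit can be replayed with a small userspace model. It is a sketch only: `Queue`, `Bio`, `start_io`, `end_io_buggy`, `end_io_fixed` and `replay` are hypothetical names, `io_stat` stands in for QUEUE_FLAG_IO_STAT, and the timestamp is an arbitrary non-zero value.

```python
class Queue:
    """Toy request queue with an iostats switch."""

    def __init__(self):
        self.io_stat = True  # models QUEUE_FLAG_IO_STAT
        self.inflight = 0


class Bio:
    def __init__(self):
        self.start_time = 0  # 0 means accounting was never started


def start_io(q, bio, now):
    # Accounting starts only if iostats is enabled at submission time.
    if q.io_stat:
        bio.start_time = now
        q.inflight += 1


def end_io_buggy(q, bio):
    # Old condition: re-reads the flag, which may have changed meanwhile.
    if q.io_stat:
        q.inflight -= 1


def end_io_fixed(q, bio):
    # Fixed condition: rely on what was recorded at submission time.
    if bio.start_time:
        q.inflight -= 1


def replay(end_io, stat_at_issue, stat_at_end):
    """Issue one I/O, flip the flag in between, return final inflight."""
    q, bio = Queue(), Bio()
    q.io_stat = stat_at_issue
    start_io(q, bio, now=100)
    q.io_stat = stat_at_end
    end_io(q, bio)
    return q.inflight
```

In this model the old check gives `replay(end_io_buggy, False, True) == -1` and leaves one I/O stuck in flight for `replay(end_io_buggy, True, False)`, while `end_io_fixed` returns 0 in both orderings.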
    • scsi/sg: don't grab scsi host module reference · fcaa174a
      Yu Kuai authored
      To prevent the request_queue from being freed before the blktrace
      debugfs entries are cleaned up, commit db59133e ("scsi: sg: fix blktrace
      debugfs entries leakage") uses scsi_device_get(). However,
      scsi_device_get() also grabs a reference to the scsi module, so the
      module can't be removed.
      
      It's reported that blktests can't unload scsi_debug after block/001:
      
      blktests (master) # ./check block
      block/001 (stress device hotplugging) [failed]
           +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
            Running block/001
            Stressing sd
           +modprobe: FATAL: Module scsi_debug is in use.
      
      Fix this problem by grabbing a request_queue reference directly, so that
      the scsi host module can still be unloaded while the request_queue is
      pinned by the sg device.
      Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
      Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
      Fixes: db59133e ("scsi: sg: fix blktrace debugfs entries leakage")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
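The difference between the two ways of pinning can be shown with a toy reference-count model. Everything here is a hypothetical stand-in (`Module`, `RequestQueue`, `ScsiDevice`, `pin_via_device`, `pin_via_queue`); the only behavior taken from the commit message is that the device-level get also takes a module reference, while a queue-level get does not.

```python
class Module:
    def __init__(self):
        self.refs = 0

    def can_unload(self):
        # A module with outstanding references is "in use".
        return self.refs == 0


class RequestQueue:
    def __init__(self):
        self.refs = 1  # initial reference owned by the device


class ScsiDevice:
    def __init__(self, module):
        self.module = module
        self.queue = RequestQueue()
        self.refs = 1


def pin_via_device(sdev):
    # Models the old approach: pinning the device also pins the module.
    sdev.refs += 1
    sdev.module.refs += 1


def pin_via_queue(sdev):
    # Models the fix: pin only the request queue.
    sdev.queue.refs += 1
```

After `pin_via_device()` the model reports the module as not unloadable, matching the "Module scsi_debug is in use" failure quoted above; after `pin_via_queue()` the queue stays alive and the module reference count is untouched.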
    • ext4: Fix warning in blkdev_put() · a42fb5a7
      Jan Kara authored
      ext4_blkdev_remove() passes the wrong holder pointer to blkdev_put(),
      which triggers a warning there. Fix it.
      
      Fixes: 2736e8ee ("block: use the holder as indication for exclusive opens")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230622165107.13687-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 22 Jun, 2023 2 commits
  5. 21 Jun, 2023 7 commits
  6. 20 Jun, 2023 11 commits
  7. 16 Jun, 2023 4 commits