  1. 11 Sep, 2024 2 commits
  2. 10 Sep, 2024 10 commits
  3. 07 Sep, 2024 2 commits
  4. 06 Sep, 2024 10 commits
  5. 05 Sep, 2024 1 commit
  6. 04 Sep, 2024 3 commits
  7. 03 Sep, 2024 5 commits
    • MAINTAINERS: move the BFQ io scheduler to orphan state · 761e5afb
      Jens Axboe authored
      Nobody is maintaining this code, and it just falls under the umbrella
      of block layer code. But at least mark it as such, in case anyone wants
      to care more deeply about it and assume the responsibility of doing so.
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: use bfq_reassign_last_bfqq() in bfq_bfqq_move() · f45916ae
      Yu Kuai authored
      Use the bfq_reassign_last_bfqq() helper instead of open coding it; there
      are no functional changes.
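      As a rough illustration of what such a helper does, the userspace model
      below assumes (not confirmed by this log) that it redirects the
      scheduler's "last created queue" record when the queue passed in is the
      one currently recorded. The struct layouts and the name
      reassign_last_bfqq() are simplified, hypothetical stand-ins, not the
      kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct bfq_data;

struct bfq_queue {
	struct bfq_data *bfqd;	/* scheduler instance this queue belongs to */
};

struct bfq_data {
	struct bfq_queue *last_bfqq_created;	/* most recently created queue */
};

/*
 * Model of the helper's logic: if cur_bfqq is the queue recorded as the
 * last one created, redirect that record to new_bfqq (which may be NULL).
 */
static void reassign_last_bfqq(struct bfq_queue *cur_bfqq,
			       struct bfq_queue *new_bfqq)
{
	assert(cur_bfqq != NULL && cur_bfqq->bfqd != NULL);
	if (cur_bfqq == cur_bfqq->bfqd->last_bfqq_created)
		cur_bfqq->bfqd->last_bfqq_created = new_bfqq;
}
```

      Calling reassign_last_bfqq(q, NULL) on the recorded queue clears the
      record; calling it on any other queue leaves the record untouched.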
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240902130329.3787024-5-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: don't break merge chain in bfq_split_bfqq() · 42c306ed
      Yu Kuai authored
      Consider the following scenario:
      
          Process 1       Process 2       Process 3       Process 4
           (BIC1)          (BIC2)          (BIC3)          (BIC4)
            Λ               |               |                |
             \-------------\ \-------------\ \--------------\|
                            V               V                V
            bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
      ref    0              1               2                4
      
      If Process 1 issues a new IO and bfqq2 is found, bfq_init_rq() may then
      decide to split bfqq2 via bfq_split_bfqq(). However, the process
      reference count of bfqq2 is 1, so bfq_split_bfqq() just clears the coop
      flag, which breaks the merge chain.
      
      Expected result: the caller allocates a new bfqq for BIC1:
      
          Process 1       Process 2       Process 3       Process 4
           (BIC1)          (BIC2)          (BIC3)          (BIC4)
                            |               |                |
                             \-------------\ \--------------\|
                                            V                V
            bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
      ref    0              0               1                3
      
      The single-process-reference condition is only meant for the last bfqq
      (bfqq4), after the preceding bfqq2 and bfqq3 have already been split.
      Fix the problem by also checking whether bfqq is the last one in the
      merge chain.
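      A minimal userspace sketch of the corrected decision, assuming that
      "last in the merge chain" means the queue has no onward merge link. The
      struct bfqq_model type and split_can_reuse() are hypothetical
      illustrations, not the code of the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a bfq_queue in a merge chain. */
struct bfqq_model {
	int process_refs;		/* processes still referencing the queue */
	struct bfqq_model *new_bfqq;	/* queue this one is merged into, or NULL */
	bool coop;			/* "cooperating" flag */
};

/*
 * Returns true if the splitting process may keep @bfqq (only the coop
 * flag is cleared), false if the caller must allocate a fresh queue.
 * Checking process_refs alone is not enough; the extra new_bfqq test
 * limits the shortcut to the last queue of a merge chain.
 */
static bool split_can_reuse(struct bfqq_model *bfqq)
{
	assert(bfqq != NULL);
	if (bfqq->process_refs == 1 && bfqq->new_bfqq == NULL) {
		bfqq->coop = false;
		return true;
	}
	return false;
}
```

      With the chain bfqq2 -> bfqq3 -> bfqq4, bfqq2 is no longer reused even
      though it has a single process reference, so the chain stays intact.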
      
      Fixes: 36eca894 ("block, bfq: add Early Queue Merge (EQM)")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240902130329.3787024-4-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator() · 0e456dba
      Yu Kuai authored
      Consider the following merge chain:
      
      Process 1       Process 2       Process 3	Process 4
       (BIC1)          (BIC2)          (BIC3)		 (BIC4)
        Λ                |               |               |
         \--------------\ \-------------\ \-------------\|
                         V               V		   V
        bfqq1--------->bfqq2---------->bfqq3----------->bfqq4
      
      IO from Process 1 will get bfqq2 from BIC1 first; bfq_setup_cooperator()
      will then find that bfqq2 is already merged into bfqq3 and handle this IO
      from bfqq3. However, the merge chain can be much deeper, and bfqq3 can
      itself be merged into another bfqq.
      
      Fix this problem by iterating to the last bfqq in
      bfq_setup_cooperator().
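      The iteration described above can be sketched in userspace as a walk
      along the merge links until a queue that is not merged further. The
      struct bfqq_model type and last_in_merge_chain() are hypothetical, and
      the sketch assumes the chain is acyclic without checking it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each queue optionally points to the queue it was merged into. */
struct bfqq_model {
	const char *name;
	struct bfqq_model *new_bfqq;
};

/*
 * Walk the merge chain to its end. Assumes the chain is acyclic, which
 * this model does not verify.
 */
static struct bfqq_model *last_in_merge_chain(struct bfqq_model *bfqq)
{
	assert(bfqq != NULL);
	while (bfqq->new_bfqq != NULL)
		bfqq = bfqq->new_bfqq;
	return bfqq;
}
```

      With the chain bfqq2 -> bfqq3 -> bfqq4, an IO that reaches bfqq2 ends up
      on bfqq4 rather than stopping at bfqq3.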
      
      Fixes: 36eca894 ("block, bfq: add Early Queue Merge (EQM)")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240902130329.3787024-3-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix possible UAF for bfqq->bic with merge chain · 18ad4df0
      Yu Kuai authored
      1) initial state, three tasks:
      
      		Process 1       Process 2	Process 3
      		 (BIC1)          (BIC2)		 (BIC3)
      		  |  Λ            |  Λ		  |  Λ
      		  |  |            |  |		  |  |
      		  V  |            V  |		  V  |
      		  bfqq1           bfqq2		  bfqq3
      process ref:	   1		    1		    1
      
      2) bfqq1 merged to bfqq2:
      
      		Process 1       Process 2	Process 3
      		 (BIC1)          (BIC2)		 (BIC3)
      		  |               |		  |  Λ
      		  \--------------\|		  |  |
      		                  V		  V  |
      		  bfqq1--------->bfqq2		  bfqq3
      process ref:	   0		    2		    1
      
      3) bfqq2 merged to bfqq3:
      
      		Process 1       Process 2	Process 3
      		 (BIC1)          (BIC2)		 (BIC3)
      	 here -> Λ                |		  |
      		  \--------------\ \-------------\|
      		                  V		  V
      		  bfqq1--------->bfqq2---------->bfqq3
      process ref:	   0		    1		    3
      
      In this case, IO from Process 1 will get bfqq2 from BIC1 first, then get
      bfqq3 through the merge chain, and is finally handled by bfqq3. However,
      the current code assumes bfqq2 is owned by BIC1, as in the initial state,
      and sets bfqq2->bic to BIC1.
      
      bfq_insert_request
      -> by Process 1
       bfqq = bfq_init_rq(rq)
        bfqq = bfq_get_bfqq_handle_split
         bfqq = bic_to_bfqq
         -> get bfqq2 from BIC1
       bfqq->ref++
       rq->elv.priv[0] = bic
       rq->elv.priv[1] = bfqq
       if (bfqq_process_refs(bfqq) == 1)
        bfqq->bic = bic
        -> record BIC1 to bfqq2
      
        __bfq_insert_request
         new_bfqq = bfq_setup_cooperator
         -> get bfqq3 from bfqq2->new_bfqq
         bfqq_request_freed(bfqq)
         new_bfqq->ref++
         rq->elv.priv[1] = new_bfqq
         -> handle IO by bfqq3
      
      Fix the problem by first checking whether bfqq is part of a merge chain.
      This might also fix the following problem reported by our syzkaller
      (unreproducible):
      
      ==================================================================
      BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline]
      BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline]
      BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889
      Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595
      
      CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G             L     6.6.0-07439-gba2303cacfda #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Workqueue: kblockd blk_mq_requeue_work
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:364 [inline]
       print_report+0x10d/0x610 mm/kasan/report.c:475
       kasan_report+0x8e/0xc0 mm/kasan/report.c:588
       bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline]
       bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline]
       bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889
       bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757
       bfq_init_rq block/bfq-iosched.c:6876 [inline]
       bfq_insert_request block/bfq-iosched.c:6254 [inline]
       bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304
       blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593
       blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502
       process_one_work kernel/workqueue.c:2627 [inline]
       process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
       worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
       kthread+0x33c/0x440 kernel/kthread.c:388
       ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305
       </TASK>
      
      Allocated by task 20776:
       kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328
       kasan_slab_alloc include/linux/kasan.h:188 [inline]
       slab_post_alloc_hook mm/slab.h:763 [inline]
       slab_alloc_node mm/slub.c:3458 [inline]
       kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503
       ioc_create_icq block/blk-ioc.c:370 [inline]
       ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436
       bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812
       blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403
       __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517
       blk_mq_get_new_requests block/blk-mq.c:2940 [inline]
       blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042
       __submit_bio+0x331/0x6f0 block/blk-core.c:624
       __submit_bio_noacct_mq block/blk-core.c:703 [inline]
       submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732
       submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826
       xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958
       xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619
       xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330
       process_one_work kernel/workqueue.c:2627 [inline]
       process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
       worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
       kthread+0x33c/0x440 kernel/kthread.c:388
       ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305
      
      Freed by task 946:
       kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522
       ____kasan_slab_free mm/kasan/common.c:236 [inline]
       __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244
       kasan_slab_free include/linux/kasan.h:164 [inline]
       slab_free_hook mm/slub.c:1815 [inline]
       slab_free_freelist_hook mm/slub.c:1841 [inline]
       slab_free mm/slub.c:3786 [inline]
       kmem_cache_free+0x118/0x6f0 mm/slub.c:3808
       rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189
       rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462
       __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553
      
      Last potentially related work creation:
       kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
       __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492
       __call_rcu_common kernel/rcu/tree.c:2712 [inline]
       call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826
       ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105
       ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124
       process_one_work kernel/workqueue.c:2627 [inline]
       process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
       worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
       kthread+0x33c/0x440 kernel/kthread.c:388
       ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305
      
      Second to last potentially related work creation:
       kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
       __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492
       __call_rcu_common kernel/rcu/tree.c:2712 [inline]
       call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826
       ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105
       ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124
       process_one_work kernel/workqueue.c:2627 [inline]
       process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700
       worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781
       kthread+0x33c/0x440 kernel/kthread.c:388
       ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305
      
      The buggy address belongs to the object at ffff888123839d68
       which belongs to the cache bfq_io_cq of size 1360
      The buggy address is located 336 bytes inside of
       freed 1360-byte region [ffff888123839d68, ffff88812383a2b8)
      
      The buggy address belongs to the physical page:
      page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838
      head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff)
      page_type: 0xffffffff()
      raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488
      raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
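      A userspace sketch of the ownership rule the fix aims for, assuming that
      "bfqq is from a merge chain" is detected through a non-NULL onward merge
      link. The struct bic_model and struct bfqq_model types and
      maybe_set_owner() are hypothetical and do not reproduce the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-process I/O context and the queue. */
struct bic_model {
	int id;
};

struct bfqq_model {
	int process_refs;		/* processes referencing the queue */
	struct bfqq_model *new_bfqq;	/* non-NULL if merged into another queue */
	struct bic_model *bic;		/* recorded owner, if any */
};

/*
 * Record @bic as owner only if @bfqq has a single process reference AND
 * is not part of a merge chain. In step 3 above, bfqq2 has one process
 * reference but is merged into bfqq3, so BIC1 must not be recorded.
 */
static void maybe_set_owner(struct bfqq_model *bfqq, struct bic_model *bic)
{
	assert(bfqq != NULL);
	if (bfqq->process_refs == 1 && bfqq->new_bfqq == NULL)
		bfqq->bic = bic;
}
```

      In the model, a merged queue keeps whatever owner it already had, so a
      stale BIC pointer is never written into it from the request path.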
      
      Fixes: 36eca894 ("block, bfq: add Early Queue Merge (EQM)")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 30 Aug, 2024 2 commits
  9. 29 Aug, 2024 5 commits
    • Merge tag 'md-6.12-20240829' of... · 12c612e1
      Jens Axboe authored
      Merge tag 'md-6.12-20240829' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.12/block
      
      Pull MD updates from Song:
      
      "Major changes in this set are:
      
       1. md-bitmap refactoring, by Yu Kuai;
       2. raid5 performance optimization, by Artur Paszkiewicz;
       3. Other small fixes, by Yu Kuai and Chen Ni."
      
      * tag 'md-6.12-20240829' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (49 commits)
        md/raid5: rename wait_for_overlap to wait_for_reshape
        md/raid5: only add to wq if reshape is in progress
        md/raid5: use wait_on_bit() for R5_Overlap
        md: Remove flush handling
        md/md-bitmap: make in memory structure internal
        md/md-bitmap: merge md_bitmap_enabled() into bitmap_operations
        md/md-bitmap: merge md_bitmap_wait_behind_writes() into bitmap_operations
        md/md-bitmap: merge md_bitmap_free() into bitmap_operations
        md/md-bitmap: merge md_bitmap_set_pages() into struct bitmap_operations
        md/md-bitmap: merge md_bitmap_copy_from_slot() into struct bitmap_operation.
        md/md-bitmap: merge get_bitmap_from_slot() into bitmap_operations
        md/md-bitmap: merge md_bitmap_resize() into bitmap_operations
        md/md-bitmap: pass in mddev directly for md_bitmap_resize()
        md/md-bitmap: merge md_bitmap_daemon_work() into bitmap_operations
        md/md-bitmap: merge bitmap_unplug() into bitmap_operations
        md/md-bitmap: merge md_bitmap_unplug_async() into md_bitmap_unplug()
        md/md-bitmap: merge md_bitmap_sync_with_cluster() into bitmap_operations
        md/md-bitmap: merge md_bitmap_cond_end_sync() into bitmap_operations
        md/md-bitmap: merge md_bitmap_close_sync() into bitmap_operations
        md/md-bitmap: merge md_bitmap_end_sync() into bitmap_operations
        ...
    • Merge branch 'md-6.12-raid5-opt' into md-6.12 · fb16787b
      Song Liu authored
      From Artur:
      
      The wait_for_overlap wait queue is currently used in two cases, which
      are not really related:
       - waiting for actual overlapping bios, which uses R5_Overlap bit,
       - waiting for events related to reshape.
      
      Handling every write request in raid5_make_request() involves adding to
      and removing from this wait queue, which uses a spinlock. With fast
      storage and multiple submitting threads the contention on this lock is
      noticeable.
      
      This patch series aims to resolve this by separating the two cases
      mentioned above and using this wait queue only when reshape is in
      progress.
      
      The results when testing 4k random writes on raid5 with null_blk
      (8 jobs, qd=64, group_thread_cnt=8):
      before: 463k IOPS
      after:  523k IOPS
      
      The improvement from this series alone is not huge, but this is just one
      of the bottlenecks. Applied on top of some other changes I'm working on,
      it raised throughput from 845k IOPS to 975k IOPS on the same test.
      
      * md-6.12-raid5-opt:
        md/raid5: rename wait_for_overlap to wait_for_reshape
        md/raid5: only add to wq if reshape is in progress
        md/raid5: use wait_on_bit() for R5_Overlap
      Signed-off-by: Song Liu <song@kernel.org>
    • md/raid5: rename wait_for_overlap to wait_for_reshape · 6f039cc4
      Artur Paszkiewicz authored
      The only remaining uses of wait_for_overlap are related to reshape so
      rename it accordingly.
      Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Link: https://lore.kernel.org/r/20240827153536.6743-4-artur.paszkiewicz@intel.com
      Signed-off-by: Song Liu <song@kernel.org>
    • md/raid5: only add to wq if reshape is in progress · 0e4aac73
      Artur Paszkiewicz authored
      Now that actual overlaps are not handled on the wait_for_overlap wq
      anymore, the remaining cases when we wait on this wq are limited to
      reshape. If reshape is not in progress, don't add to the wq in
      raid5_make_request() because add_wait_queue() / remove_wait_queue()
      operations take a spinlock and cause noticeable contention when multiple
      threads are submitting requests to the mddev.
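      A toy model of why skipping the wait queue helps: instead of taking a
      real spinlock, the hypothetical struct wq_model counts how often the
      lock would be taken by add/remove operations. make_request_model() is an
      illustrative stand-in for raid5_make_request(), not its actual logic, and
      it ignores concurrency (for example a reshape starting mid-request)
      entirely.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a wait queue head. Instead of taking a real
 * spinlock, it counts how many times the lock would have been taken.
 */
struct wq_model {
	int waiters;
	unsigned long lock_ops;
};

static void wq_add(struct wq_model *wq)
{
	wq->lock_ops++;		/* adding to the queue takes its lock */
	wq->waiters++;
}

static void wq_remove(struct wq_model *wq)
{
	wq->lock_ops++;		/* removing takes the lock again */
	wq->waiters--;
}

/* Model of the submission path: join the wait queue only during reshape. */
static void make_request_model(struct wq_model *wq, bool reshape_in_progress)
{
	bool on_wq = reshape_in_progress;

	if (on_wq)
		wq_add(wq);
	/* ... the request would be mapped to stripes and submitted here ... */
	if (on_wq)
		wq_remove(wq);
	assert(wq->waiters >= 0);
}
```

      Outside reshape, the model's write path never touches the shared lock;
      during reshape each request still costs two lock acquisitions.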
      Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Link: https://lore.kernel.org/r/20240827153536.6743-3-artur.paszkiewicz@intel.com
      Signed-off-by: Song Liu <song@kernel.org>
    • md/raid5: use wait_on_bit() for R5_Overlap · e6a03207
      Artur Paszkiewicz authored
      Convert uses of wait_for_overlap wait queue with R5_Overlap bit to
      wait_on_bit() / wake_up_bit().
      Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Link: https://lore.kernel.org/r/20240827153536.6743-2-artur.paszkiewicz@intel.com
      Signed-off-by: Song Liu <song@kernel.org>