1. 05 Dec, 2022 7 commits
    • Kemeng Shi's avatar
      blk-throttle: remove incorrect comment for tg_last_low_overflow_time · e3031d4c
      Kemeng Shi authored
      Function tg_last_low_overflow_time is called with intermediate node as
      following:
      throtl_hierarchy_can_downgrade
        throtl_tg_can_downgrade
          tg_last_low_overflow_time
      
      throtl_hierarchy_can_upgrade
        throtl_tg_can_upgrade
          tg_last_low_overflow_time
      
      throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade will traverse
      from leaf node to sub-root node and pass traversed intermediate node
      to tg_last_low_overflow_time.
      
      No such limit could be found from context and implementation of
      tg_last_low_overflow_time, so remove this limit in comment.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKemeng Shi <shikemeng@huawei.com>
      Link: https://lore.kernel.org/r/20221205115709.251489-8-shikemeng@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e3031d4c
    • Kemeng Shi's avatar
    • Kemeng Shi's avatar
      blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade · a4d508e3
      Kemeng Shi authored
      Commit c79892c5 ("blk-throttle: add upgrade logic for LIMIT_LOW
      state") added upgrade logic for low limit and methioned that
      1. "To determine if a cgroup exceeds its limitation, we check if the cgroup
      has pending request. Since cgroup is throttled according to the limit,
      pending request means the cgroup reaches the limit."
      2. "If a cgroup has limit set for both read and write, we consider the
      combination of them for upgrade. The reason is read IO and write IO can
      interfere with each other. If we do the upgrade based in one direction IO,
      the other direction IO could be severly harmed."
      Besides, we also determine that cgroup reaches low limit if low limit is 0,
      see comment in throtl_tg_can_upgrade.
      
      Collect the information above, the desgin of upgrade check is as following:
      1.The low limit is reached if limit is zero or io is already queued.
      2.Cgroup will pass upgrade check if low limits of READ and WRITE are both
      reached.
      
      Simpfy the check code described above to removce repeat check and improve
      readability. There is no functional change.
      
      Detail equivalence proof is as following:
      All replaced conditions to return true are as following:
      condition 1
        (!read_limit && !write_limit)
      condition 2
        read_limit && sq->nr_queued[READ] &&
        (!write_limit || sq->nr_queued[WRITE])
      condition 3
        write_limit && sq->nr_queued[WRITE] &&
        (!read_limit || sq->nr_queued[READ])
      
      Transferring condition 2 as following:
        (read_limit && sq->nr_queued[READ]) &&
        (!write_limit || sq->nr_queued[WRITE])
      is equivalent to
        (read_limit && sq->nr_queued[READ]) &&
        (!write_limit || (write_limit && sq->nr_queued[WRITE]))
      is equivalent to
      condition 2.1
        (read_limit && sq->nr_queued[READ] &&
        !write_limit) ||
      condition 2.2
        (read_limit && sq->nr_queued[READ] &&
        (write_limit && sq->nr_queued[WRITE]))
      
      Transferring condition 3 as following:
        write_limit && sq->nr_queued[WRITE] &&
        (!read_limit || sq->nr_queued[READ])
      is equivalent to
        (write_limit && sq->nr_queued[WRITE]) &&
        (!read_limit || (read_limit && sq->nr_queued[READ]))
      is equivalent to
      condition 3.1
        ((write_limit && sq->nr_queued[WRITE]) &&
        !read_limit) ||
      condition 3.2
        ((write_limit && sq->nr_queued[WRITE]) &&
        (read_limit && sq->nr_queued[READ]))
      
      Condition 3.2 is the same as condition 2.2, so all conditions we get to
      return are as following:
        (!read_limit && !write_limit) (1)
        (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1)
        ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1)
        ((write_limit && sq->nr_queued[WRITE]) &&
        (read_limit && sq->nr_queued[READ])) (2.2)
      
      As we can extract conditions "(a1 || a2) && (b1 || b2)" to:
      a1 && b1
      a1 && b2
      a2 && b1
      ab && b2
      
      Considering that:
      a1 = !read_limit
      a2 = read_limit && sq->nr_queued[READ]
      b1 = !write_limit
      b2 = write_limit && sq->nr_queued[WRITE]
      
      We can pack replaced conditions to
        (!read_limit || (read_limit && sq->nr_queued[READ])) &&
        (!write_limit || (write_limit && sq->nr_queued[WRITE]))
      which is equivalent to
        (!read_limit || sq->nr_queued[READ]) &&
        (!write_limit || sq->nr_queued[WRITE])
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKemeng Shi <shikemeng@huawei.com>
      Link: https://lore.kernel.org/r/20221205115709.251489-6-shikemeng@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a4d508e3
    • Kemeng Shi's avatar
      blk-throttle: correct calculation of wait time in tg_may_dispatch · 183daeb1
      Kemeng Shi authored
      In C language, When executing "if (expression1 && expression2)" and
      expression1 return false, the expression2 may not be executed.
      For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
      tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if bps is
      limited, tg_within_bps_limit will return false and
      tg_within_iops_limit will not be called. So even bps and iops are
      both limited, iops_wait will not be calculated and is always zero.
      So wait time of iops is always ignored.
      
      Fix this by always calling tg_within_bps_limit and tg_within_iops_limit
      to get wait time for both bps and iops.
      
      Observed that:
      1. Wait time in tg_within_iops_limit/tg_within_bps_limit need always
      be stored as wait argument is always passed.
      2. wait time is stored to zero if iops/bps is limited otherwise non-zero
      is stored.
      Simpfy tg_within_iops_limit/tg_within_bps_limit by removing wait argument
      and return wait time directly. Caller tg_may_dispatch checks if wait time
      is zero to find if iops/bps is limited.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKemeng Shi <shikemeng@huawei.com>
      Link: https://lore.kernel.org/r/20221205115709.251489-5-shikemeng@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      183daeb1
    • Kemeng Shi's avatar
      blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios · eb184791
      Kemeng Shi authored
      Ignore cgroup without io queued in blk_throtl_cancel_bios for two
      reasons:
      1. Save cpu cycle for trying to dispatch cgroup which is no io queued.
      2. Avoid non-consistent state that cgroup is inserted to service queue
      without THROTL_TG_PENDING set as tg_update_disptime will unconditional
      re-insert cgroup to service queue. If we are on the default hierarchy,
      IO dispatched from child in tg_dispatch_one_bio will trigger inserting
      cgroup to service queue without erase first and ruin the tree.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKemeng Shi <shikemeng@huawei.com>
      Link: https://lore.kernel.org/r/20221205115709.251489-4-shikemeng@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      eb184791
    • Kemeng Shi's avatar
      blk-throttle: Fix that bps of child could exceed bps limited in parent · 84aca0a7
      Kemeng Shi authored
      Consider situation as following (on the default hierarchy):
       HDD
        |
      root (bps limit: 4k)
        |
      child (bps limit :8k)
        |
      fio bs=8k
      Rate of fio is supposed to be 4k, but result is 8k. Reason is as
      following:
      Size of single IO from fio is larger than bytes allowed in one
      throtl_slice in child, so IOs are always queued in child group first.
      When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED
      is set and these IOs will not be limited by tg_within_bps_limit anymore.
      Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire
      tree.
      
      There patch has no influence on situation which is not on the default
      hierarchy as each group is a single root group without parent.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKemeng Shi <shikemeng@huawei.com>
      Link: https://lore.kernel.org/r/20221205115709.251489-3-shikemeng@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      84aca0a7
    • Kemeng Shi's avatar
      blk-throttle: correct stale comment in throtl_pd_init · f56019ae
      Kemeng Shi authored
      On the default hierarchy (cgroup2), the throttle interface files don't
      exist in the root cgroup, so the ablity to limit the whole system
      by configuring root group is not existing anymore. In general, cgroup
      doesn't wanna be in the business of restricting resources at the
      system level, so correct the stale comment that we can limit whole
      system to we can only limit subtree.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarKemeng Shi <shikemeng@huawei.com>
      Link: https://lore.kernel.org/r/20221205115709.251489-2-shikemeng@huaweicloud.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f56019ae
  2. 04 Dec, 2022 2 commits
    • Jens Axboe's avatar
      Merge tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy into for-6.2/block · b1476451
      Jens Axboe authored
      Pull floppy fix from Denis:
      
      "Floppy patch for 6.2
      
       The patch from Yuan Can fixes a memory leak in floppy init code.
      
       Signed-off-by: Denis Efremov <efremov@linux.com>"
      
      * tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy:
        floppy: Fix memory leak in do_floppy_init()
      b1476451
    • Yuan Can's avatar
      floppy: Fix memory leak in do_floppy_init() · f8ace2e3
      Yuan Can authored
      A memory leak was reported when floppy_alloc_disk() failed in
      do_floppy_init().
      
      unreferenced object 0xffff888115ed25a0 (size 8):
        comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s)
        hex dump (first 8 bytes):
          00 ac 67 5b 81 88 ff ff                          ..g[....
        backtrace:
          [<000000007f457abb>] __kmalloc_node+0x4c/0xc0
          [<00000000a87bfa9e>] blk_mq_realloc_tag_set_tags.part.0+0x6f/0x180
          [<000000006f02e8b1>] blk_mq_alloc_tag_set+0x573/0x1130
          [<0000000066007fd7>] 0xffffffffc06b8b08
          [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0
          [<00000000e26d04ee>] do_init_module+0x1a4/0x680
          [<000000001bb22407>] load_module+0x6249/0x7110
          [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200
          [<000000007bddca46>] do_syscall_64+0x35/0x80
          [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
      unreferenced object 0xffff88810fc30540 (size 32):
        comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000007f457abb>] __kmalloc_node+0x4c/0xc0
          [<000000006b91eab4>] blk_mq_alloc_tag_set+0x393/0x1130
          [<0000000066007fd7>] 0xffffffffc06b8b08
          [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0
          [<00000000e26d04ee>] do_init_module+0x1a4/0x680
          [<000000001bb22407>] load_module+0x6249/0x7110
          [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200
          [<000000007bddca46>] do_syscall_64+0x35/0x80
          [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      If the floppy_alloc_disk() failed, disks of current drive will not be set,
      thus the lastest allocated set->tag cannot be freed in the error handling
      path. A simple call graph shown as below:
      
       floppy_module_init()
         floppy_init()
           do_floppy_init()
             for (drive = 0; drive < N_DRIVE; drive++)
               blk_mq_alloc_tag_set()
                 blk_mq_alloc_tag_set_tags()
                   blk_mq_realloc_tag_set_tags() # set->tag allocated
               floppy_alloc_disk()
                 blk_mq_alloc_disk() # error occurred, disks failed to allocated
      
             ->out_put_disk:
             for (drive = 0; drive < N_DRIVE; drive++)
               if (!disks[drive][0]) # the last disks is not set and loop break
                 break;
               blk_mq_free_tag_set() # the latest allocated set->tag leaked
      
      Fix this problem by free the set->tag of current drive before jump to
      error handling path.
      
      Cc: stable@vger.kernel.org
      Fixes: 302cfee1 ("floppy: use a separate gendisk for each media format")
      Signed-off-by: default avatarYuan Can <yuancan@huawei.com>
      [efremov: added stable list, changed title]
      Signed-off-by: default avatarDenis Efremov <efremov@linux.com>
      f8ace2e3
  3. 03 Dec, 2022 1 commit
  4. 02 Dec, 2022 6 commits
  5. 01 Dec, 2022 13 commits
  6. 30 Nov, 2022 7 commits
  7. 29 Nov, 2022 4 commits