Commits · e3031d4c7d2c5bff6b5944d61d4e31319739d216 · Kirill Smelkov / linux

05 Dec, 2022 7 commits

blk-throttle: remove incorrect comment for tg_last_low_overflow_time · e3031d4c

Kemeng Shi authored Dec 05, 2022

Function tg_last_low_overflow_time is called with intermediate node as
following:
throtl_hierarchy_can_downgrade
  throtl_tg_can_downgrade
    tg_last_low_overflow_time

throtl_hierarchy_can_upgrade
  throtl_tg_can_upgrade
    tg_last_low_overflow_time

throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade will traverse
from leaf node to sub-root node and pass traversed intermediate node
to tg_last_low_overflow_time.

No such limit could be found from context and implementation of
tg_last_low_overflow_time, so remove this limit in comment.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-8-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e3031d4c

blk-throttle: fix typo in comment of throtl_adjusted_limit · 009df341

Kemeng Shi authored Dec 05, 2022

lapsed time -> elapsed time
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-7-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

009df341

blk-throttle: simpfy low limit reached check in throtl_tg_can_upgrade · a4d508e3

Kemeng Shi authored Dec 05, 2022

Commit c79892c5 ("blk-throttle: add upgrade logic for LIMIT_LOW
state") added upgrade logic for low limit and methioned that
1. "To determine if a cgroup exceeds its limitation, we check if the cgroup
has pending request. Since cgroup is throttled according to the limit,
pending request means the cgroup reaches the limit."
2. "If a cgroup has limit set for both read and write, we consider the
combination of them for upgrade. The reason is read IO and write IO can
interfere with each other. If we do the upgrade based in one direction IO,
the other direction IO could be severly harmed."
Besides, we also determine that cgroup reaches low limit if low limit is 0,
see comment in throtl_tg_can_upgrade.

Collect the information above, the desgin of upgrade check is as following:
1.The low limit is reached if limit is zero or io is already queued.
2.Cgroup will pass upgrade check if low limits of READ and WRITE are both
reached.

Simpfy the check code described above to removce repeat check and improve
readability. There is no functional change.

Detail equivalence proof is as following:
All replaced conditions to return true are as following:
condition 1
  (!read_limit && !write_limit)
condition 2
  read_limit && sq->nr_queued[READ] &&
  (!write_limit || sq->nr_queued[WRITE])
condition 3
  write_limit && sq->nr_queued[WRITE] &&
  (!read_limit || sq->nr_queued[READ])

Transferring condition 2 as following:
  (read_limit && sq->nr_queued[READ]) &&
  (!write_limit || sq->nr_queued[WRITE])
is equivalent to
  (read_limit && sq->nr_queued[READ]) &&
  (!write_limit || (write_limit && sq->nr_queued[WRITE]))
is equivalent to
condition 2.1
  (read_limit && sq->nr_queued[READ] &&
  !write_limit) ||
condition 2.2
  (read_limit && sq->nr_queued[READ] &&
  (write_limit && sq->nr_queued[WRITE]))

Transferring condition 3 as following:
  write_limit && sq->nr_queued[WRITE] &&
  (!read_limit || sq->nr_queued[READ])
is equivalent to
  (write_limit && sq->nr_queued[WRITE]) &&
  (!read_limit || (read_limit && sq->nr_queued[READ]))
is equivalent to
condition 3.1
  ((write_limit && sq->nr_queued[WRITE]) &&
  !read_limit) ||
condition 3.2
  ((write_limit && sq->nr_queued[WRITE]) &&
  (read_limit && sq->nr_queued[READ]))

Condition 3.2 is the same as condition 2.2, so all conditions we get to
return are as following:
  (!read_limit && !write_limit) (1)
  (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1)
  ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1)
  ((write_limit && sq->nr_queued[WRITE]) &&
  (read_limit && sq->nr_queued[READ])) (2.2)

As we can extract conditions "(a1 || a2) && (b1 || b2)" to:
a1 && b1
a1 && b2
a2 && b1
ab && b2

Considering that:
a1 = !read_limit
a2 = read_limit && sq->nr_queued[READ]
b1 = !write_limit
b2 = write_limit && sq->nr_queued[WRITE]

We can pack replaced conditions to
  (!read_limit || (read_limit && sq->nr_queued[READ])) &&
  (!write_limit || (write_limit && sq->nr_queued[WRITE]))
which is equivalent to
  (!read_limit || sq->nr_queued[READ]) &&
  (!write_limit || sq->nr_queued[WRITE])
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-6-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

a4d508e3

blk-throttle: correct calculation of wait time in tg_may_dispatch · 183daeb1

Kemeng Shi authored Dec 05, 2022

In C language, When executing "if (expression1 && expression2)" and
expression1 return false, the expression2 may not be executed.
For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) &&
tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if bps is
limited, tg_within_bps_limit will return false and
tg_within_iops_limit will not be called. So even bps and iops are
both limited, iops_wait will not be calculated and is always zero.
So wait time of iops is always ignored.

Fix this by always calling tg_within_bps_limit and tg_within_iops_limit
to get wait time for both bps and iops.

Observed that:
1. Wait time in tg_within_iops_limit/tg_within_bps_limit need always
be stored as wait argument is always passed.
2. wait time is stored to zero if iops/bps is limited otherwise non-zero
is stored.
Simpfy tg_within_iops_limit/tg_within_bps_limit by removing wait argument
and return wait time directly. Caller tg_may_dispatch checks if wait time
is zero to find if iops/bps is limited.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-5-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

183daeb1

blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_bios · eb184791

Kemeng Shi authored Dec 05, 2022

Ignore cgroup without io queued in blk_throtl_cancel_bios for two
reasons:
1. Save cpu cycle for trying to dispatch cgroup which is no io queued.
2. Avoid non-consistent state that cgroup is inserted to service queue
without THROTL_TG_PENDING set as tg_update_disptime will unconditional
re-insert cgroup to service queue. If we are on the default hierarchy,
IO dispatched from child in tg_dispatch_one_bio will trigger inserting
cgroup to service queue without erase first and ruin the tree.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-4-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

eb184791

blk-throttle: Fix that bps of child could exceed bps limited in parent · 84aca0a7

Kemeng Shi authored Dec 05, 2022

Consider situation as following (on the default hierarchy):
 HDD
  |
root (bps limit: 4k)
  |
child (bps limit :8k)
  |
fio bs=8k
Rate of fio is supposed to be 4k, but result is 8k. Reason is as
following:
Size of single IO from fio is larger than bytes allowed in one
throtl_slice in child, so IOs are always queued in child group first.
When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED
is set and these IOs will not be limited by tg_within_bps_limit anymore.
Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire
tree.

There patch has no influence on situation which is not on the default
hierarchy as each group is a single root group without parent.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-3-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

84aca0a7

blk-throttle: correct stale comment in throtl_pd_init · f56019ae

Kemeng Shi authored Dec 05, 2022

On the default hierarchy (cgroup2), the throttle interface files don't
exist in the root cgroup, so the ablity to limit the whole system
by configuring root group is not existing anymore. In general, cgroup
doesn't wanna be in the business of restricting resources at the
system level, so correct the stale comment that we can limit whole
system to we can only limit subtree.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221205115709.251489-2-shikemeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f56019ae

04 Dec, 2022 2 commits

Merge tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy into for-6.2/block · b1476451

Jens Axboe authored Dec 04, 2022

Pull floppy fix from Denis:

"Floppy patch for 6.2

 The patch from Yuan Can fixes a memory leak in floppy init code.

 Signed-off-by: Denis Efremov <efremov@linux.com>"

* tag 'floppy-for-6.2' of https://github.com/evdenis/linux-floppy:
  floppy: Fix memory leak in do_floppy_init()

b1476451

floppy: Fix memory leak in do_floppy_init() · f8ace2e3

Yuan Can authored Oct 31, 2022

A memory leak was reported when floppy_alloc_disk() failed in
do_floppy_init().

unreferenced object 0xffff888115ed25a0 (size 8):
  comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s)
  hex dump (first 8 bytes):
    00 ac 67 5b 81 88 ff ff                          ..g[....
  backtrace:
    [<000000007f457abb>] __kmalloc_node+0x4c/0xc0
    [<00000000a87bfa9e>] blk_mq_realloc_tag_set_tags.part.0+0x6f/0x180
    [<000000006f02e8b1>] blk_mq_alloc_tag_set+0x573/0x1130
    [<0000000066007fd7>] 0xffffffffc06b8b08
    [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0
    [<00000000e26d04ee>] do_init_module+0x1a4/0x680
    [<000000001bb22407>] load_module+0x6249/0x7110
    [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200
    [<000000007bddca46>] do_syscall_64+0x35/0x80
    [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
unreferenced object 0xffff88810fc30540 (size 32):
  comm "modprobe", pid 727, jiffies 4295051278 (age 25.529s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000007f457abb>] __kmalloc_node+0x4c/0xc0
    [<000000006b91eab4>] blk_mq_alloc_tag_set+0x393/0x1130
    [<0000000066007fd7>] 0xffffffffc06b8b08
    [<0000000081f5ac40>] do_one_initcall+0xd0/0x4f0
    [<00000000e26d04ee>] do_init_module+0x1a4/0x680
    [<000000001bb22407>] load_module+0x6249/0x7110
    [<00000000ad31ac4d>] __do_sys_finit_module+0x140/0x200
    [<000000007bddca46>] do_syscall_64+0x35/0x80
    [<00000000b5afec39>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

If the floppy_alloc_disk() failed, disks of current drive will not be set,
thus the lastest allocated set->tag cannot be freed in the error handling
path. A simple call graph shown as below:

 floppy_module_init()
   floppy_init()
     do_floppy_init()
       for (drive = 0; drive < N_DRIVE; drive++)
         blk_mq_alloc_tag_set()
           blk_mq_alloc_tag_set_tags()
             blk_mq_realloc_tag_set_tags() # set->tag allocated
         floppy_alloc_disk()
           blk_mq_alloc_disk() # error occurred, disks failed to allocated

       ->out_put_disk:
       for (drive = 0; drive < N_DRIVE; drive++)
         if (!disks[drive][0]) # the last disks is not set and loop break
           break;
         blk_mq_free_tag_set() # the latest allocated set->tag leaked

Fix this problem by free the set->tag of current drive before jump to
error handling path.

Cc: stable@vger.kernel.org
Fixes: 302cfee1 ("floppy: use a separate gendisk for each media format")
Signed-off-by: Yuan Can <yuancan@huawei.com>
[efremov: added stable list, changed title]
Signed-off-by: Denis Efremov <efremov@linux.com>

f8ace2e3

03 Dec, 2022 1 commit

block: remove devnode callback from struct block_device_operations · 85d6ce58

Greg Kroah-Hartman authored Dec 03, 2022

With the removal of the pktcdvd driver, there are no in-kernel users of
the devnode callback in struct block_device_operations, so it can be
safely removed.  If it is needed for new block drivers in the future, it
can be brought back.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20221203140747.1942969-1-gregkh@linuxfoundation.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

85d6ce58

02 Dec, 2022 6 commits

Merge branch 'md-next' of... · 368c7f1f

Jens Axboe authored Dec 02, 2022

Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.2/block

Pull MD fixes from Song:

"This contains code cleanup by Christoph."

* 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fold unbind_rdev_from_array into md_kick_rdev_from_array
  md: mark md_kick_rdev_from_array static
  md: remove lock_bdev / unlock_bdev

368c7f1f

pktcdvd: remove driver. · f40eb998

Greg Kroah-Hartman authored Dec 02, 2022

Way back in 2016 in commit 5a8b187c ("pktcdvd: mark as unmaintained
and deprecated") this driver was marked as "will be removed soon".  5
years seems long enough to have it stick around after that, so finally
remove the thing now.
Reported-by: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Maier <balagi@justmail.de>
Cc: Peter Osterlund <petero2@telia.com>
Cc: linux-block@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20221202182758.1339039-1-gregkh@linuxfoundation.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

f40eb998

md: fold unbind_rdev_from_array into md_kick_rdev_from_array · b5c1acf0

Christoph Hellwig authored Nov 29, 2022

unbind_rdev_from_array is only called from md_kick_rdev_from_array, so
merge it into its only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>

b5c1acf0

md: mark md_kick_rdev_from_array static · d57d9d69

Christoph Hellwig authored Nov 29, 2022

md_kick_rdev_from_array is only used in md.c, so unexport it and mark
the symbol static.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>

d57d9d69

md: remove lock_bdev / unlock_bdev · fb541ca4

Christoph Hellwig authored Nov 29, 2022

These wrappers for blkdev_get / blkdev_put just horribly confuse the
code with their odd naming.  Remove them and improve the error unwinding
in md_import_device with the now folded code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>

fb541ca4

blk-cgroup: Fix some kernel-doc comments · 1d6df9d3

Yang Li authored Dec 02, 2022

Make the description of @gendisk to @disk in blkcg_schedule_throttle()
to clear the below warnings:

block/blk-cgroup.c:1850: warning: Function parameter or member 'disk' not described in 'blkcg_schedule_throttle'
block/blk-cgroup.c:1850: warning: Excess function parameter 'gendisk' description in 'blkcg_schedule_throttle'

Fixes: de185b56 ("blk-cgroup: pass a gendisk to blkcg_schedule_throttle")
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3338Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221202011713.14834-1-yang.lee@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

1d6df9d3

01 Dec, 2022 13 commits

null_blk: support read-only and offline zone conditions · d3a57388

Shin'ichiro Kawasaki authored Dec 01, 2022

In zoned mode, zones with write pointers can have conditions "read-only"
or "offline". In read-only condition, zones can not be written. In
offline condition, the zones can be neither written nor read. These
conditions are intended for zones with media failures, then it is
difficult to set those conditions to zones on real devices.

To test handling of zones in the conditions, add a feature to null_blk
to set up zones in read-only or offline condition. Add new configuration
attributes "zone_readonly" and "zone_offline". Write a sector to the
attribute files to specify the target zone to set the zone conditions.
For example, following command lines do it:

echo 0 > nullb1/zone_readonly
echo 524288 > nullb1/zone_offline

When the specified zones are already in read-only or offline condition,
normal empty condition is restored to the zones. These condition changes
can be done only after the null_blk device get powered, since status
area of each zone is not yet allocated before power-on.

Also improve zone condition checks to inhibit all commands for zones in
offline conditions. In same manner, inhibit write and zone management
commands for zones in read-only condition.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20221201061036.2342206-1-shinichiro.kawasaki@wdc.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

d3a57388

drbd: add context parameter to expect() macro · 677b3672

Christoph Böhmwalder authored Dec 01, 2022

Originally-from: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221201110349.1282687-6-christoph.boehmwalder@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

677b3672

drbd: introduce drbd_ratelimit() · e3fa02d7

Christoph Böhmwalder authored Dec 01, 2022

Use call site specific ratelimit instead of one single static global.
Also ratelimit ASSERTION messages generated by expect().

Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221201110349.1282687-5-christoph.boehmwalder@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e3fa02d7

drbd: introduce dynamic debug · aa034695

Christoph Böhmwalder authored Dec 01, 2022

Incorporate as many out-of-tree changes as possible without changing the
genl API.

Over the years, we restructured this several times, and also changed the
log format.

One breaking change is that DRBD 9 gained "implicit options", like a
connection name. This cannot be replayed here without changing the API,
so save it for later.

Originally-from: Andreas Gruenbacher <agruen@linbit.com>
Originally-from: Philipp Reisner <philipp.reisner@linbit.com>
Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221201110349.1282687-4-christoph.boehmwalder@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

aa034695

drbd: split polymorph printk to its own file · 136160c1

Christoph Böhmwalder authored Dec 01, 2022

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221201110349.1282687-3-christoph.boehmwalder@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

136160c1

drbd: unify how failed assertions are logged · c3f89741

Christoph Böhmwalder authored Dec 01, 2022

Unify how failed assertions from D_ASSERT() and expect() are logged.

Originally-from: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221201110349.1282687-2-christoph.boehmwalder@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

c3f89741

block: bdev & blktrace: use consistent function doc. notation · 2e833c8c

Randy Dunlap authored Nov 30, 2022

Use only one hyphen in kernel-doc notation between the function name
and its short description.

The is the documented kerenl-doc format. It also fixes the HTML
presentation to be consistent with other functions.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Link: https://lore.kernel.org/r/20221201070331.25685-1-rdunlap@infradead.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

2e833c8c

blk-iocost: Correct comment in blk_iocost_init · 7a88b1a8

Kemeng Shi authored Oct 18, 2022

There is no iocg_pd_init function. The pd_alloc_fn function pointer of
iocost policy is set with ioc_pd_init. Just correct it.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221018121932.10792-6-shikemeng@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

7a88b1a8

blk-iocost: Remove vrate member in struct ioc_now · 6c31be32

Kemeng Shi authored Oct 18, 2022

If we trace vtime_base_rate instead of vtime_rate, there is nowhere
which accesses now->vrate except function ioc_now using now->vrate locally.
Just remove it.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221018121932.10792-5-shikemeng@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

6c31be32

blk-iocost: Trace vtime_base_rate instead of vtime_rate · 63c9eac4

Kemeng Shi authored Oct 18, 2022

Since commit ac33e91e ("blk-iocost: implement vtime loss
compensation") rename original vtime_rate to vtime_base_rate
and current vtime_rate is original vtime_rate with compensation.
The current rate showed in tracepoint is mixed with vtime_rate
and vtime_base_rate:
1) In function ioc_adjust_base_vrate, the first trace_iocost_ioc_vrate_adj
shows vtime_rate, the second trace_iocost_ioc_vrate_adj shows
vtime_base_rate.
2) In function iocg_activate shows vtime_rate by calling
TRACE_IOCG_PATH(iocg_activate...
3) In function ioc_check_iocgs shows vtime_rate by calling
TRACE_IOCG_PATH(iocg_idle...

Trace vtime_base_rate instead of vtime_rate as:
1) Before commit ac33e91e ("blk-iocost: implement vtime loss
compensation"), the traced rate is without compensation, so still
show rate without compensation.
2) The vtime_base_rate is more stable while vtime_rate heavily depends on
excess budeget on current period which may change abruptly in next period.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221018121932.10792-4-shikemeng@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

63c9eac4

blk-iocost: Reset vtime_base_rate in ioc_refresh_params · c6d2efdd

Kemeng Shi authored Oct 18, 2022

Since commit ac33e91e("blk-iocost: implement vtime loss compensation")
split vtime_rate into vtime_rate and vtime_base_rate, we need reset both
vtime_base_rate and vtime_rate when device parameters are refreshed.
If vtime_base_rate is no reset here, vtime_rate will be overwritten with
old vtime_base_rate soon in ioc_refresh_vrate.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221018121932.10792-3-shikemeng@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

c6d2efdd

blk-iocost: Fix typo in comment · ecaaaabe

Kemeng Shi authored Oct 18, 2022

soley -> solely
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20221018121932.10792-2-shikemeng@huawei.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

ecaaaabe

block: Do not reread partition table on exclusively open device · 36369f46

Jan Kara authored Nov 30, 2022

Since commit 10c70d95 ("block: remove the bd_openers checks in
blk_drop_partitions") we allow rereading of partition table although
there are users of the block device. This has an undesirable consequence
that e.g. if sda and sdb are assembled to a RAID1 device md0 with
partitions, BLKRRPART ioctl on sda will rescan partition table and
create sda1 device. This partition device under a raid device confuses
some programs (such as libstorage-ng used for initial partitioning for
distribution installation) leading to failures.

Fix the problem refusing to rescan partitions if there is another user
that has the block device exclusively open.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3
Fixes: 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221130175653.24299-1-jack@suse.cz
[axboe: fold in followup fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

36369f46

30 Nov, 2022 7 commits

virtio-blk: replace ida_simple[get|remove] with ida_[alloc_range|free] · 92a34c46

Pankaj Raghav authored Nov 30, 2022

ida_simple[get|remove] are deprecated, and are just wrappers to
ida_[alloc_range|free]. Replace ida_simple[get|remove] with their
corresponding counterparts.

No functional changes.
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20221130123001.25473-1-p.raghav@samsung.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

92a34c46

block: mark blk_put_queue as potentially blocking · 63f93fd6

Christoph Hellwig authored Nov 14, 2022

We can't just say that the last reference release may block, as any
reference dropped could be the last one. So move the might_sleep() from
blk_free_queue to blk_put_queue and update the documentation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221114042637.1009333-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

63f93fd6

block: untangle request_queue refcounting from sysfs · 2bd85221

Christoph Hellwig authored Nov 14, 2022

The kobject embedded into the request_queue is used for the queue
directory in sysfs, but that is a child of the gendisks directory and is
intimately tied to it. Move this kobject to the gendisk and use a
refcount_t in the request_queue for the actual request_queue refcounting
that is completely unrelated to the device model.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221114042637.1009333-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

2bd85221

block: fix error unwinding in blk_register_queue · 40602997

Christoph Hellwig authored Nov 14, 2022

blk_register_queue fails to handle errors from blk_mq_sysfs_register,
leaks various resources on errors and accidentally sets queue refs percpu
refcount to percpu mode on kobject_add failure. Fix all that by
properly unwinding on errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221114042637.1009333-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

40602997

block: factor out a blk_debugfs_remove helper · 6fc75f30

Christoph Hellwig authored Nov 14, 2022

Split the debugfs removal from blk_unregister_queue into a helper so that
the it can be reused for blk_register_queue error handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221114042637.1009333-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

6fc75f30

blk-crypto: pass a gendisk to blk_crypto_sysfs_{,un}register · 450deb93

Christoph Hellwig authored Nov 14, 2022

Prepare for changes to the block layer sysfs handling by passing the
readily available gendisk to blk_crypto_sysfs_{,un}register.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20221114042637.1009333-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

450deb93

Revert "blk-cgroup: Flush stats at blkgs destruction path" · c62256dd

Jens Axboe authored Nov 30, 2022

This reverts commit dae590a6.

We've had a few reports on this causing a crash at boot time, because
of a reference issue. While this problem seemginly did exist before
the patch and needs solving separately, this patch makes it a lot
easier to trigger.

Link: https://lore.kernel.org/linux-block/CA+QYu4oxiRKC6hJ7F27whXy-PRBx=Tvb+-7TQTONN8qTtV3aDA@mail.gmail.com/
Link: https://lore.kernel.org/linux-block/69af7ccb-6901-c84c-0e95-5682ccfb750c@acm.org/Signed-off-by: Jens Axboe <axboe@kernel.dk>

c62256dd

29 Nov, 2022 4 commits

block: use bool as the return type of elv_iosched_allow_bio_merge · 8d283ee6

Jinlong Chen authored Nov 29, 2022

We have bool type now, update the old signature.
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/0db0a0298758d60d0f4df8b7126ac6a381e5a5bb.1669736350.git.nickyc975@zju.edu.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

8d283ee6

block: replace "len+name" with "name+len" in elv_iosched_show · c6451ede

Jinlong Chen authored Nov 29, 2022

The "pointer + offset" pattern is more resonable.
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/d9beaee71b14f7b2a39ab0db6458dc0f7d961ceb.1669736350.git.nickyc975@zju.edu.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

c6451ede

block: always use 'e' when printing scheduler name · 7a3b3660

Jinlong Chen authored Nov 29, 2022

Printing e->elevator_name in all cases improves the readability, and
'e' and 'cur' are identical in this branch.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Link: https://lore.kernel.org/r/4bae180ffbac608ea0cf46ffa9739ce0973b60aa.1669736350.git.nickyc975@zju.edu.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

7a3b3660

block: replace continue with else-if in elv_iosched_show · 5998249e

Jinlong Chen authored Nov 29, 2022

else-if is more readable than continue here.
Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn>
Link: https://lore.kernel.org/r/77ac19ba556efd2c8639a6396eb4203c59bc13d6.1669736350.git.nickyc975@zju.edu.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

5998249e