Commits · 3534e5a5ed2997ca1b00f44a0378a075bd05e8a3 · Kirill Smelkov / linux

15 Jul, 2022 1 commit

dm thin: fix use-after-free crash in dm_sm_register_threshold_callback · 3534e5a5

Luo Meng authored Jul 14, 2022

Fault inject on pool metadata device reports:
  BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80
  Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950

  CPU: 7 PID: 950 Comm: dmsetup Tainted: G        W         5.19.0-rc6 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   print_address_description.constprop.0.cold+0xeb/0x3f4
   kasan_report.cold+0xe6/0x147
   dm_pool_register_metadata_threshold+0x40/0x80
   pool_ctr+0xa0a/0x1150
   dm_table_add_target+0x2c8/0x640
   table_load+0x1fd/0x430
   ctl_ioctl+0x2c4/0x5a0
   dm_ctl_ioctl+0xa/0x10
   __x64_sys_ioctl+0xb3/0xd0
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x46/0xb0

This can be easily reproduced using:
  echo offline > /sys/block/sda/device/state
  dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10
  dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0"

If a metadata commit fails, the transaction will be aborted and the
metadata space maps will be destroyed. If a DM table reload then
happens for this failed thin-pool, a use-after-free will occur in
dm_sm_register_threshold_callback (called from
dm_pool_register_metadata_threshold).

Fix this by in dm_pool_register_metadata_threshold() by returning the
-EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr()
with a new error message: "Error registering metadata threshold".

Fixes: ac8c3f3d ("dm thin: generate event when metadata threshold passed")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

3534e5a5

14 Jul, 2022 6 commits

dm writecache: count number of blocks discarded, not number of discard bios · 2ee73ef6

Mikulas Patocka authored Jul 11, 2022

Change dm-writecache, so that it counts the number of blocks discarded
instead of the number of discard bios. Make it consistent with the
read and write statistics counters that were changed to count the
number of blocks instead of bios.

Fixes: e3a35d03 ("dm writecache: add event counters")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

2ee73ef6

dm writecache: count number of blocks written, not number of write bios · b2676e14

Mikulas Patocka authored Jul 11, 2022

Change dm-writecache, so that it counts the number of blocks written
instead of the number of write bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.

Fixes: e3a35d03 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

b2676e14

dm writecache: count number of blocks read, not number of read bios · 2c6e755b

Mikulas Patocka authored Jul 11, 2022

Change dm-writecache, so that it counts the number of blocks read
instead of the number of read bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.

Fixes: e3a35d03 ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

2c6e755b

dm writecache: return void from functions · 9bc0c92e

Mikulas Patocka authored Jul 11, 2022

The functions writecache_map_remap_origin and writecache_bio_copy_ssd
only return a single value, thus they can be made to return void.

This helps simplify the following IO accounting changes.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

9bc0c92e

dm kcopyd: use __GFP_HIGHMEM when allocating pages · 949d49ec

Mikulas Patocka authored Jul 13, 2022

dm-kcopyd doesn't access the allocated pages directly, it only passes
them to dm-io which adds them to a bio list - thus, we can allocate
the pages from high memory. This will reduce pressure on the low
memory when there are a large number of kcopyd jobs in progress.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

949d49ec

dm writecache: set a default MAX_WRITEBACK_JOBS · ca7dc242

Mikulas Patocka authored Jul 13, 2022

dm-writecache has the capability to limit the number of writeback jobs
in progress. However, this feature was off by default. As such there
were some out-of-memory crashes observed when lowering the low
watermark while the cache is full.

This commit enables writeback limit by default. It is set to 256MiB or
1/16 of total system memory, whichever is smaller.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

ca7dc242

07 Jul, 2022 11 commits

Documentation: dm writecache: Render status list as list · 11093e6f

Bagas Sanjaya authored Jul 02, 2022

The status list isn't rendered as list, but rather as normal paragraph,
because there is missing blank line between "Status:" line and the list.

Fix the issue by adding the blank line separator.

Fixes: 48debafe ("dm: add writecache target")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

11093e6f

Documentation: dm writecache: add blank line before optional parameters · 8b301af4

Mauro Carvalho Chehab authored Jul 02, 2022

Otherwise this warning occurs:
  Documentation/admin-guide/device-mapper/writecache.rst:23: WARNING: Unexpected indentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

8b301af4

dm snapshot: fix typo in snapshot_map() comment · 962c6296

Zhang Jiaming authored Jul 01, 2022

Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

962c6296

dm raid: remove redundant "the" in parse_raid_params() comment · ce92fc4b
Jiang Jian authored Jun 21, 2022
```
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
```
ce92fc4b

dm cache: fix typo in 2 comment blocks · 5c29e784

Steven Lung authored Jun 21, 2022

Replace neccessarily with necessarily.
Signed-off-by: Steven Lung <1030steven@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

5c29e784

dm verity: fix checkpatch close brace error · 20e6fc85

JeongHyeon Lee authored Jun 15, 2022

Resolves: ERROR: else should follow close brace '}'
Signed-off-by: JeongHyeon Lee <jhs2.lee@samsung.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

20e6fc85

dm table: rename dm_target variable in dm_table_add_target() · 899ab445

Mike Snitzer authored Jul 05, 2022

Rename from "tgt" to "ti" so that all of dm-table.c code uses the same
naming for dm_target variables.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

899ab445

dm table: audit all dm_table_get_target() callers · 564b5c54

Mike Snitzer authored Jul 05, 2022

All callers of dm_table_get_target() are expected to do proper bounds
checking on the index they pass.

Move dm_table_get_target() to dm-core.h to make it extra clear that only
DM core code should be using it. Switch it to be inlined while at it.

Standardize all DM core callers to use the same for loop pattern and
make associated variables as local as possible. Rename some variables
(e.g. s/table/t/ and s/tgt/ti/) along the way.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

564b5c54

dm table: remove dm_table_get_num_targets() wrapper · 2aec377a

Mike Snitzer authored Jul 05, 2022

More efficient and readable to just access table->num_targets directly.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

2aec377a

dm: add two stage requeue mechanism · 8b211aac

Ming Lei authored Jun 24, 2022

Commit 61b6e2e5 ("dm: fix BLK_STS_DM_REQUEUE handling when dm_io
represents split bio") reverted DM core's bio splitting back to using
bio_split()+bio_chain() because it was found that otherwise DM's
BLK_STS_DM_REQUEUE would trigger a live-lock waiting for bio
completion that would never occur.

Restore using bio_trim()+bio_inc_remaining(), like was done in commit
7dd76d1f ("dm: improve bio splitting and associated IO
accounting"), but this time with proper handling for the above
scenario that is covered in more detail in the commit header for
61b6e2e5.

Solve this issue by adding a two staged dm_io requeue mechanism that
uses the new dm_bio_rewind() via dm_io_rewind():

1) requeue the dm_io into the requeue_list added to struct
   mapped_device, and schedule it via new added requeue work. This
   workqueue just clones the dm_io->orig_bio (which DM saves and
   ensures its end sector isn't modified). dm_io_rewind() uses the
   sectors and sectors_offset members of the dm_io that are recorded
   relative to the end of orig_bio: dm_bio_rewind()+bio_trim() are
   then used to make that cloned bio reflect the subset of the
   original bio that is represented by the dm_io that is being
   requeued.

2) the 2nd stage requeue is same with original requeue, but
   io->orig_bio points to new cloned bio (which matches the requeued
   dm_io as described above).

This allows DM core to shift the need for bio cloning from bio-split
time (during IO submission) to the less likely BLK_STS_DM_REQUEUE
handling (after IO completes with that error).
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

8b211aac

dm: add dm_bio_rewind() API to DM core · 61cbe788

Ming Lei authored Jun 24, 2022

Commit 7759eb23 ("block: remove bio_rewind_iter()") removed
a similar API for the following reasons:
    ```
    It is pointed that bio_rewind_iter() is one very bad API[1]:

    1) bio size may not be restored after rewinding

    2) it causes some bogus change, such as 5151842b (block: reset
    bi_iter.bi_done after splitting bio)

    3) rewinding really makes things complicated wrt. bio splitting

    4) unnecessary updating of .bi_done in fast path

    [1] https://marc.info/?t=153549924200005&r=1&w=2

    So this patch takes Kent's suggestion to restore one bio into its original
    state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(),
    given now bio_rewind_iter() is only used by bio integrity code.
    ```
However, saving off a copy of the 32 bytes bio->bi_iter in case rewind
needed isn't efficient because it bloats per-bio-data for what is an
unlikely case. That suggestion also ignores the need to restore
crypto and integrity info.

Add dm_bio_rewind() API for a specific use-case that is much more narrow
than the previous more generic rewind code that was reverted:

1) most bios have a fixed end sector since bio split is done from front
   of the bio, if driver just records how many sectors between current
   bio's start sector and the original bio's end sector, the original
   position can be restored. Keeping the original bio's end sector
   fixed is a _hard_ requirement for this interface!

2) if a bio's end sector won't change (usually bio_trim() isn't
   called, or in the case of DM it preserves original bio), user can
   restore the original position by storing sector offset from the
   current ->bi_iter.bi_sector to bio's end sector; together with
   saving bio size, only 8 bytes is needed to restore to original
   bio.

3) DM's requeue use case: when BLK_STS_DM_REQUEUE happens, DM core
   needs to restore to an "original bio" which represents the current
   dm_io to be requeued (which may be a subset of the original bio).
   By storing the sector offset from the original bio's end sector and
   dm_io's size, dm_bio_rewind() can restore such original bio. See
   commit 7dd76d1f ("dm: improve bio splitting and associated IO
   accounting") for more details on how DM does this. Leveraging this,
   allows DM core to shift the need for bio cloning from bio-split
   time (during IO submission) to the less likely BLK_STS_DM_REQUEUE
   handling (after IO completes with that error).

4) Unlike the original rewind API, dm_bio_rewind() doesn't add .bi_done
   to bvec_iter and there is no effect on the fast path.

Implement dm_bio_rewind() by factoring out clear helpers that it calls:
dm_bio_integrity_rewind, dm_bio_crypt_rewind and dm_bio_rewind_iter.

DM is able to ensure that dm_bio_rewind() is used safely but, given
the constraint that the bio's end must never change, other
hypothetical future callers may not take the same care. So make
dm_bio_rewind() and all supporting code local to DM to avoid risk of
hypothetical abuse. A "dm_" prefix was added to all functions to avoid
any namespace collisions.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

61cbe788

29 Jun, 2022 6 commits

dm: improve BLK_STS_DM_REQUEUE and BLK_STS_AGAIN handling · 444fe04f

Ming Lei authored Jun 24, 2022

If either BLK_STS_DM_REQUEUE or BLK_STS_AGAIN is returned for POLLED
io, we requeue the original bio into deferred list and kick md->wq to
re-submit it to block layer.

Improve the handling in the following way:

1) Factor out dm_handle_requeue() for handling dm_io requeue.

2) Unify handling for BLK_STS_DM_REQUEUE and BLK_STS_AGAIN: clear
   REQ_POLLED for BLK_STS_DM_REQUEUE too, for the sake of simplicity,
   given BLK_STS_DM_REQUEUE is very unusual.

3) Queue md->wq explicitly in dm_handle_requeue(), so requeue handling
   becomes more robust.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

444fe04f

dm: refactor dm_md_mempool allocation · e810cb78

Christoph Hellwig authored Jun 08, 2022

The current split between dm_table_alloc_md_mempools and
dm_alloc_md_mempools is rather arbitrary, so merge the two
into one easy to follow function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

e810cb78

dm: unexport dm_get_reserved_rq_based_ios · 4ed045d8

Christoph Hellwig authored Jun 08, 2022

dm_get_reserved_rq_based_ios is only used in the core dm code, so
remove the export.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

4ed045d8

block: simplify disk_set_independent_access_ranges · 22d0c408

Christoph Hellwig authored Jun 29, 2022

Lift setting disk->ia_ranges from disk_register_independent_access_ranges
into disk_set_independent_access_ranges, and make the behavior the same
for the registered vs non-registered queue cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220629062013.1331068-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

22d0c408

block: move ->ia_ranges from the request_queue to the gendisk · 6a27d28c

Christoph Hellwig authored Jun 29, 2022

Independent access ranges only matter for file system I/O and are only
valid with a registered gendisk, so move them there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220629062013.1331068-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

6a27d28c

block: remove "select BLK_RQ_IO_DATA_LEN" from BLK_CGROUP_IOCOST dependency · b9a1c179

Ying Sun authored Jun 29, 2022

The configuration item BLK_RQ_IO_DATA_LEN is not declared in the kernel.
Select BLK_RQ_IO_DATA_LEN is meaningless which could be removed.
Signed-off-by: Ying Sun <sunying@nj.iscas.ac.cn>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220629062409.19458-1-sunying@nj.iscas.ac.cnSigned-off-by: Jens Axboe <axboe@kernel.dk>

b9a1c179

28 Jun, 2022 12 commits

blk-mq: cleanup disk sysfs registration · 8682b92e

Christoph Hellwig authored Jun 28, 2022

Pass a gendisk to the sysfs register/unregister functions and give
them descriptive names. Also move the unregistration helper next
to the one doing the registration.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-7-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

8682b92e

blk-mq: rename blk_mq_sysfs_{,un}register · eaa870f9

Christoph Hellwig authored Jun 28, 2022

Add a _hctx postfix to better describe what the functions do, match
the debugfs equivalents and release the old names for functions that
should be using them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

eaa870f9

block: remove the extra gendisk reference in __blk_mq_register_dev · 81f0c2ef

Christoph Hellwig authored Jun 28, 2022

kobject_add already grabs a reference to the parent, no need to have
another one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

81f0c2ef

block: use default groups to register the queue attributes · 4a8d14bb

Christoph Hellwig authored Jun 28, 2022

Set up the default_groups for blk_queue_ktype instead of manually calling
sysfs_create_group.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

4a8d14bb

block: remove a superflous queue kobject reference · 060f131e

Christoph Hellwig authored Jun 28, 2022

kobject_add already adds a reference to the parent that is dropped
on deletion, so don't bother grabbing another one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

060f131e

block: simplify blktrace sysfs attribute creation · cc5c516d

Christoph Hellwig authored Jun 28, 2022

Add the trace attributes to the default gendisk attributes, just like
we already do for partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220628171850.1313069-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

cc5c516d

block: remove blk_cleanup_disk · 8b9ab626

Christoph Hellwig authored Jun 19, 2022

blk_cleanup_disk is nothing but a trivial wrapper for put_disk now,
so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

8b9ab626

block: simplify disk shutdown · 6f8191fd

Christoph Hellwig authored Jun 19, 2022

Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for
all disks that do not have separately allocated queues, and thus remove
the need to call blk_cleanup_queue for them.

Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that
this function is intended only for separately allocated blk-mq queues.

This saves an extra queue freeze for devices without a separately
allocated queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

6f8191fd

block: stop setting the nomerges flags in blk_cleanup_queue · 0e353402

Christoph Hellwig authored Jun 19, 2022

These flags only apply to file system I/O, and all file system I/O is
already drained by del_gendisk and thus can't be in progress when
blk_cleanup_queue is called.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-5-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

0e353402

block: remove QUEUE_FLAG_DEAD · 1f90307e

Christoph Hellwig authored Jun 19, 2022

Disallow setting the blk-mq state on any queue that is already dying as
setting the state even then is a bad idea, and remove the now unused
QUEUE_FLAG_DEAD flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

1f90307e

mtip32xx: fix device removal · e8b58ef0

Christoph Hellwig authored Jun 19, 2022

Use the proper helper to mark a surpise removal, remove the gendisk as
soon as possible when removing the device and implement the ->free_disk
callback to ensure the private data is alive as long as the gendisk has
references.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

e8b58ef0

mtip32xx: remove the device_status debugfs file · ec5263f4

Christoph Hellwig authored Jun 19, 2022

This file is a huge mess that iterates over all devices and is in the
way of fixing the device removal in this driver, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

ec5263f4

27 Jun, 2022 4 commits

blk-mq: blk_mq_tag_busy is no need to return a value · ee78ec10

Liu Song authored Jun 25, 2022

Currently "blk_mq_tag_busy" return value has no effect, so adjust it.
Some code implementations have also been adjusted to enhance
readability.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/1656170121-1619-1-git-send-email-liusong@linux.alibaba.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

ee78ec10

block: Always initialize bio IO priority on submit · a78418e6

Jan Kara authored Jun 23, 2022

Currently, IO priority set in task's IO context is not reflected in the
bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
results in odd results where process is submitting some bios with one
priority and other bios with a different (unset) priority and due to
differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
always set on bio submission.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623074840.5960-9-jack@suse.czSigned-off-by: Jens Axboe <axboe@kernel.dk>

a78418e6

block: Initialize bio priority earlier · 9c6227e0

Jan Kara authored Jun 23, 2022

Bio's IO priority needs to be initialized before we try to merge the bio
with other bios. Otherwise we could merge bios which would otherwise
receive different IO priorities leading to possible QoS issues.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623074840.5960-8-jack@suse.czSigned-off-by: Jens Axboe <axboe@kernel.dk>

9c6227e0

blk-ioprio: Convert from rqos policy to direct call · 82b74cac

Jan Kara authored Jun 23, 2022

Convert blk-ioprio handling from a rqos policy to a direct call from
blk_mq_submit_bio(). Firstly, blk-ioprio is not much of a rqos policy
anyway, it just needs a hook in bio submission path to set the bio's IO
priority. Secondly, the rqos .track hook gets actually called too late
for blk-ioprio purposes and introducing a special rqos hook just for
blk-ioprio looks even weirder.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623074840.5960-7-jack@suse.czSigned-off-by: Jens Axboe <axboe@kernel.dk>

82b74cac