Commits · 0e46064ebebb90b02c53283106f26600aa38c986 · Kirill Smelkov / linux

06 Mar, 2024 21 commits

virtio_blk: Do not use disk_set_max_open/active_zones() · 0e46064e

Damien Le Moal authored Mar 02, 2024

In virtblk_read_zoned_limits(), setting a zoned block device maximum
number of open and active zones using the functions
disk_set_max_open_zones() and disk_set_max_active_zones() is incorrect
as setting the limits for the request queue is now done atomically when
the gendisk is created (with blk_mq_alloc_disk()). The value set by the
disk_set_max_open/active_zones() functions will be overwritten.
Fix this by setting the maximum number of open and active zones directly
in the queue_limits structure passed to virtblk_read_zoned_limits().

Fixes: 8b837256 ("virtio_blk: pass queue_limits to blk_mq_alloc_disk")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240301192639.410183-2-dlemoal@kernel.orgSigned-off-by: Jens Axboe <axboe@kernel.dk>

0e46064e

aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts · f98364e9

Chun-Yi Lee authored Mar 05, 2024

This patch is against CVE-2023-6270. The description of cve is:

  A flaw was found in the ATA over Ethernet (AoE) driver in the Linux
  kernel. The aoecmd_cfg_pkts() function improperly updates the refcnt on
  `struct net_device`, and a use-after-free can be triggered by racing
  between the free on the struct and the access through the `skbtxq`
  global queue. This could lead to a denial of service condition or
  potential code execution.

In aoecmd_cfg_pkts(), it always calls dev_put(ifp) when skb initial
code is finished. But the net_device ifp will still be used in
later tx()->dev_queue_xmit() in kthread. Which means that the
dev_put(ifp) should NOT be called in the success path of skb
initial code in aoecmd_cfg_pkts(). Otherwise tx() may run into
use-after-free because the net_device is freed.

This patch removed the dev_put(ifp) in the success path in
aoecmd_cfg_pkts(), and added dev_put() after skb xmit in tx().

Link: https://nvd.nist.gov/vuln/detail/CVE-2023-6270
Fixes: 7562f876 ("[NET]: Rework dev_base via list_head (v3)")
Signed-off-by: Chun-Yi Lee <jlee@suse.com>
Link: https://lore.kernel.org/r/20240305082048.25526-1-jlee@suse.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

f98364e9

block: move capacity validation to blkpg_do_ioctl() · b9355185

Li Lingfeng authored Mar 05, 2024

Commit 6d4e80db ("block: add capacity validation in
bdev_add_partition()") add check of partition's start and end sectors to
prevent exceeding the size of the disk when adding partitions. However,
there is still no check for resizing partitions now.
Move the check to blkpg_do_ioctl() to cover resizing partitions.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240305032132.548958-1-lilingfeng@huaweicloud.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

b9355185

block: prevent division by zero in blk_rq_stat_sum() · 93f52fbe

Roman Smirnov authored Mar 05, 2024

The expression dst->nr_samples + src->nr_samples may
have zero value on overflow. It is necessary to add
a check to avoid division by zero.

Found by Linux Verification Center (linuxtesting.org) with Svace.
Signed-off-by: Roman Smirnov <r.smirnov@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Link: https://lore.kernel.org/r/20240305134509.23108-1-r.smirnov@omp.ruSigned-off-by: Jens Axboe <axboe@kernel.dk>

93f52fbe

drbd: atomically update queue limits in drbd_reconsider_queue_parameters · e6dfe748

Christoph Hellwig authored Mar 05, 2024

Switch drbd_reconsider_queue_parameters to set up the queue parameters
in an on-stack queue_limits structure and apply the atomically. Remove
various helpers that have become so trivial that they can be folded into
drbd_reconsider_queue_parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240305134041.137006-8-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

e6dfe748

drbd: split out a drbd_discard_supported helper · 5eaee6e9

Christoph Hellwig authored Mar 06, 2024

Add a helper to check if discard is supported for a given connection /
backing device combination.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20240306140332.623759-7-philipp.reisner@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

5eaee6e9

drbd: don't set max_write_zeroes_sectors in decide_on_discard_support · e3992e02

Christoph Hellwig authored Mar 06, 2024

fixup_write_zeroes always overrides the max_write_zeroes_sectors value
a little further down the callchain, so don't bother to setup a limit
in decide_on_discard_support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20240306140332.623759-6-philipp.reisner@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e3992e02

drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters · e16344e5

Christoph Hellwig authored Mar 06, 2024

drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters
and there is no really clear boundary of responsibilities between the
two.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20240306140332.623759-5-philipp.reisner@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

e16344e5

drbd: refactor the backing dev max_segments calculation · 2828908d

Christoph Hellwig authored Mar 06, 2024

Factor out a drbd_backing_dev_max_segments helper that checks the
backing device limitation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Philipp Reisner <philipp.reisner@linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20240306140332.623759-4-philipp.reisner@linbit.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

2828908d

drbd: refactor drbd_reconsider_queue_parameters · 342d81fd

Christoph Hellwig authored Mar 05, 2024

Split out a drbd_max_peer_bio_size helper for the peer I/O size,
and condense the various checks to a nested min3(..., max())) instead
of using a lot of local variables.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240305134041.137006-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

342d81fd

drbd: pass the max_hw_sectors limit to blk_alloc_disk · aa067325

Christoph Hellwig authored Mar 05, 2024

Pass a queue_limits structure with the max_hw_sectors limit to
blk_alloc_disk instead of updating the limit on the allocated gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240305134041.137006-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

aa067325

sed-opal: Remove the ret variable from the function · 5f2ad31f

Li kunyu authored Mar 06, 2024

The ret variable in the function has not yet been effective and can be
removed.
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Link: https://lore.kernel.org/r/20240306101444.1244-1-kunyu@nfschina.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

5f2ad31f

sed-opal: Remove unnecessary ‘0’ values from ret · 2449be8c

Li kunyu authored Mar 06, 2024

ret is assigned first, so it does not need to initialize the assignment.
Signed-off-by: Li kunyu <kunyu@nfschina.com>
Link: https://lore.kernel.org/r/20240306100659.106521-1-kunyu@nfschina.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

2449be8c

sed-opal: Remove unnecessary ‘0’ values from err · 217fcc48

Li zeming authored Mar 06, 2024

err is assigned first, so it does not need to initialize the assignment.
Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20240306100216.69340-1-zeming@nfschina.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

217fcc48

sed-opal: Remove unnecessary ‘0’ values from error · 147fe613

Li zeming authored Mar 06, 2024

error is assigned first, so it does not need to initialize the assignment.
Signed-off-by: Li zeming <zeming@nfschina.com>
Link: https://lore.kernel.org/r/20240306095608.26839-1-zeming@nfschina.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

147fe613

block: make block_class constant · f8c7511d

Ricardo B. Marliere authored Mar 05, 2024

Since commit 43a7206b ("driver core: class: make class_register() take
a const *"), the driver core allows for struct class to be in read-only
memory, so move the block_class structure to be declared at build time
placing it into read-only memory, instead of having to be dynamically
allocated at boot time.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240305-class_cleanup-block-v1-1-130bb27b9c72@marliere.netSigned-off-by: Jens Axboe <axboe@kernel.dk>

f8c7511d

Merge tag 'md-6.9-20240305' of... · 33bffbb7

Jens Axboe authored Mar 06, 2024

Merge tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block

Pull MD fixes from Song:

"This set fixes two issues:

 1. dmraid regression since 6.7 kernels. This issue was initially
    reported in [1]. This set of fix has been reviewed and tested by
    md and dm folks.

 2. raid5 hang since 6.7 kernel, reported in [2]. We haven't got a
    better fix for this issue yet. This revert is a workaround. It has
    been applied to 6.7 stable kernels [3], and proved to be affective.
    We will look more into this issue for a better fix.

 [1] https://lore.kernel.org/linux-raid/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
 [2] https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/
 [3] 87165c64fe1a in linux-6.7.y branch."

* tag 'md-6.9-20240305' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  dm-raid: fix lockdep waring in "pers->hot_add_disk"
  dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape
  dm-raid: add a new helper prepare_suspend() in md_personality
  md/dm-raid: don't call md_reap_sync_thread() directly
  dm-raid: really frozen sync_thread during suspend
  md: add a new helper reshape_interrupted()
  md: export helper md_is_rdwr()
  md: export helpers to stop sync_thread
  md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
  Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d""

33bffbb7

dasd: use the atomic queue limits API · fde07a4d

Christoph Hellwig authored Feb 28, 2024

Pass the constant limits directly to blk_mq_alloc_disk, set the nonrot
flag there as well, and then use the commit API to change the transfer
size and logical block size dependent values.

This relies on the assumption that no I/O can be pending before the
devices moves into the ready state and doesn't need extra freezing
for changes to the queue limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240228133742.806274-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

fde07a4d

dasd: move queue setup to common code · 0127a47f

Christoph Hellwig authored Feb 28, 2024

Most of the code in setup_blk_queue is shared between all disciplines.
Move it to common code and leave a method to query the maximum number
of transferable blocks, and a flag to indicate discard support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240228133742.806274-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

0127a47f

dasd: cleamup dasd_state_basic_to_ready · 41463f2d

Christoph Hellwig authored Feb 28, 2024

Reflow dasd_state_basic_to_ready a bit to make it easier to modify.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Haberland <sth@linux.ibm.com>
Link: https://lore.kernel.org/r/20240228133742.806274-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

41463f2d

block: Fix page refcounts for unaligned buffers in __bio_release_pages() · 38b43539

Tony Battersby authored Feb 29, 2024

Fix an incorrect number of pages being released for buffers that do not
start at the beginning of a page.

Fixes: 1b151e24 ("block: Remove special-casing of compound pages")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Tested-by: Greg Edwards <gedwards@ddn.com>
Link: https://lore.kernel.org/r/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.comSigned-off-by: Jens Axboe <axboe@kernel.dk>

38b43539

05 Mar, 2024 11 commits

Merge branch 'dmraid-fix-6.9' into md-6.9 · 3a889fdc

Song Liu authored Mar 05, 2024

This is the second half of fixes for dmraid. The first half is available
at [1].

This set contains fixes:
 - reshape can start unexpected, cause data corruption, patch 1,5,6;
 - deadlocks that reshape concurrent with IO, patch 8;
 - a lockdep warning, patch 9;

For all the dmraid related tests in lvm2 suite, there is no new
regressions compared against 6.6 kernels (which is good baseline before
recent regressions).

[1] https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@mail.gmail.com/

* dmraid-fix-6.9:
  dm-raid: fix lockdep waring in "pers->hot_add_disk"
  dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape
  dm-raid: add a new helper prepare_suspend() in md_personality
  md/dm-raid: don't call md_reap_sync_thread() directly
  dm-raid: really frozen sync_thread during suspend
  md: add a new helper reshape_interrupted()
  md: export helper md_is_rdwr()
  md: export helpers to stop sync_thread
  md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume

3a889fdc

dm-raid: fix lockdep waring in "pers->hot_add_disk" · 95009ae9

Yu Kuai authored Mar 05, 2024

The lockdep assert is added by commit a448af25 ("md/raid10: remove
rcu protection to access rdev from conf") in print_conf(). And I didn't
notice that dm-raid is calling "pers->hot_add_disk" without holding
'reconfig_mutex'.

"pers->hot_add_disk" read and write many fields that is protected by
'reconfig_mutex', and raid_resume() already grab the lock in other
contex. Hence fix this problem by protecting "pers->host_add_disk"
with the lock.

Fixes: 9092c02d ("DM RAID: Add ability to restore transiently failed devices on resume")
Fixes: a448af25 ("md/raid10: remove rcu protection to access rdev from conf")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-10-yukuai1@huaweicloud.com

95009ae9

dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io concurrent with reshape · 41425f96

Yu Kuai authored Mar 05, 2024

For raid456, if reshape is still in progress, then IO across reshape
position will wait for reshape to make progress. However, for dm-raid,
in following cases reshape will never make progress hence IO will hang:

1) the array is read-only;
2) MD_RECOVERY_WAIT is set;
3) MD_RECOVERY_FROZEN is set;

After commit c467e97f ("md/raid6: use valid sector values to determine
if an I/O should wait on the reshape") fix the problem that IO across
reshape position doesn't wait for reshape, the dm-raid test
shell/lvconvert-raid-reshape.sh start to hang:

[root@fedora ~]# cat /proc/979/stack
[<0>] wait_woken+0x7d/0x90
[<0>] raid5_make_request+0x929/0x1d70 [raid456]
[<0>] md_handle_request+0xc2/0x3b0 [md_mod]
[<0>] raid_map+0x2c/0x50 [dm_raid]
[<0>] __map_bio+0x251/0x380 [dm_mod]
[<0>] dm_submit_bio+0x1f0/0x760 [dm_mod]
[<0>] __submit_bio+0xc2/0x1c0
[<0>] submit_bio_noacct_nocheck+0x17f/0x450
[<0>] submit_bio_noacct+0x2bc/0x780
[<0>] submit_bio+0x70/0xc0
[<0>] mpage_readahead+0x169/0x1f0
[<0>] blkdev_readahead+0x18/0x30
[<0>] read_pages+0x7c/0x3b0
[<0>] page_cache_ra_unbounded+0x1ab/0x280
[<0>] force_page_cache_ra+0x9e/0x130
[<0>] page_cache_sync_ra+0x3b/0x110
[<0>] filemap_get_pages+0x143/0xa30
[<0>] filemap_read+0xdc/0x4b0
[<0>] blkdev_read_iter+0x75/0x200
[<0>] vfs_read+0x272/0x460
[<0>] ksys_read+0x7a/0x170
[<0>] __x64_sys_read+0x1c/0x30
[<0>] do_syscall_64+0xc6/0x230
[<0>] entry_SYSCALL_64_after_hwframe+0x6c/0x74

This is because reshape can't make progress.

For md/raid, the problem doesn't exist because register new sync_thread
doesn't rely on the IO to be done any more:

1) If array is read-only, it can switch to read-write by ioctl/sysfs;
2) md/raid never set MD_RECOVERY_WAIT;
3) If MD_RECOVERY_FROZEN is set, mddev_suspend() doesn't hold
   'reconfig_mutex', hence it can be cleared and reshape can continue by
   sysfs api 'sync_action'.

However, I'm not sure yet how to avoid the problem in dm-raid yet. This
patch on the one hand make sure raid_message() can't change
sync_thread() through raid_message() after presuspend(), on the other
hand detect the above 3 cases before wait for IO do be done in
dm_suspend(), and let dm-raid requeue those IO.

Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-9-yukuai1@huaweicloud.com

41425f96

dm-raid: add a new helper prepare_suspend() in md_personality · 5625ff8b

Yu Kuai authored Mar 05, 2024

There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.

Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-8-yukuai1@huaweicloud.com

5625ff8b

md/dm-raid: don't call md_reap_sync_thread() directly · cd32b27a

Yu Kuai authored Mar 05, 2024

Currently md_reap_sync_thread() is called from raid_message() directly
without holding 'reconfig_mutex', this is definitely unsafe because
md_reap_sync_thread() can change many fields that is protected by
'reconfig_mutex'.

However, hold 'reconfig_mutex' here is still problematic because this
will cause deadlock, for example, commit 130443d6 ("md: refactor
idle/frozen_sync_thread() to fix deadlock").

Fix this problem by using stop_sync_thread() to unregister sync_thread,
like md/raid did.

Fixes: be83651f ("DM RAID: Add message/status support for changing sync action")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-7-yukuai1@huaweicloud.com

cd32b27a

dm-raid: really frozen sync_thread during suspend · 16c4770c

Yu Kuai authored Mar 05, 2024

1) commit f52f5c71 ("md: fix stopping sync thread") remove
   MD_RECOVERY_FROZEN from __md_stop_writes() and doesn't realize that
   dm-raid relies on __md_stop_writes() to frozen sync_thread
   indirectly. Fix this problem by adding MD_RECOVERY_FROZEN in
   md_stop_writes(), and since stop_sync_thread() is only used for
   dm-raid in this case, also move stop_sync_thread() to
   md_stop_writes().
2) The flag MD_RECOVERY_FROZEN doesn't mean that sync thread is frozen,
   it only prevent new sync_thread to start, and it can't stop the
   running sync thread; In order to frozen sync_thread, after seting the
   flag, stop_sync_thread() should be used.
3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, use
   it as condition for md_stop_writes() in raid_postsuspend() doesn't
   look correct. Consider that reentrant stop_sync_thread() do nothing,
   always call md_stop_writes() in raid_postsuspend().
4) raid_message can set/clear the flag MD_RECOVERY_FROZEN at anytime,
   and if MD_RECOVERY_FROZEN is cleared while the array is suspended,
   new sync_thread can start unexpected. Fix this by disallow
   raid_message() to change sync_thread status during suspend.

Note that after commit f52f5c71 ("md: fix stopping sync thread"), the
test shell/lvconvert-raid-reshape.sh start to hang in stop_sync_thread(),
and with previous fixes, the test won't hang there anymore, however, the
test will still fail and complain that ext4 is corrupted. And with this
patch, the test won't hang due to stop_sync_thread() or fail due to ext4
is corrupted anymore. However, there is still a deadlock related to
dm-raid456 that will be fixed in following patches.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
Fixes: 1af2048a ("dm raid: fix deadlock caused by premature md_stop_writes()")
Fixes: 9dbd1aa3 ("dm raid: add reshaping support to the target")
Fixes: f52f5c71 ("md: fix stopping sync thread")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-6-yukuai1@huaweicloud.com

16c4770c

md: add a new helper reshape_interrupted() · 503f9d43

Yu Kuai authored Mar 05, 2024

The helper will be used for dm-raid456 later to detect the case that
reshape can't make progress.

Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-5-yukuai1@huaweicloud.com

503f9d43

md: export helper md_is_rdwr() · 314e9af0

Yu Kuai authored Mar 05, 2024

There are no functional changes for now, prepare to fix a deadlock for
dm-raid456.

Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-4-yukuai1@huaweicloud.com

314e9af0

md: export helpers to stop sync_thread · 7a2347e2

Yu Kuai authored Mar 05, 2024

Add new helpers:

  void md_idle_sync_thread(struct mddev *mddev);
  void md_frozen_sync_thread(struct mddev *mddev);
  void md_unfrozen_sync_thread(struct mddev *mddev);

The helpers will be used in dm-raid in later patches to fix regressions
and prevent calling md_reap_sync_thread() directly.

Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-3-yukuai1@huaweicloud.com

7a2347e2

md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume · 2f03d0c2

Yu Kuai authored Mar 05, 2024

After commit 9dbd1aa3 ("dm raid: add reshaping support to the
target") raid_ctr() will set MD_RECOVERY_FROZEN before md_run() and
expect to keep array frozen until resume. However, md_run() will clear
the flag by setting mddev->recovery to 0.

Before commit 1baae052 ("md: Don't ignore suspended array in
md_check_recovery()"), dm-raid actually relied on suspending to prevent
starting new sync_thread.

Fix this problem by keeping 'MD_RECOVERY_FROZEN' for dm-raid in
md_run().

Fixes: 1baae052 ("md: Don't ignore suspended array in md_check_recovery()")
Fixes: 9dbd1aa3 ("dm raid: add reshaping support to the target")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240305072306.2562024-2-yukuai1@huaweicloud.com

2f03d0c2

Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"" · 3445139e

Song Liu authored Jan 25, 2024

This reverts commit bed9e27b.

The original set [1][2] was expected to undo a suboptimal fix in [2], and
replace it with a better fix [1]. However, as reported by Dan Moulding [2]
causes an issue with raid5 with journal device.

Revert [2] for now to close the issue. We will follow up on another issue
reported by Juxiao Bi, as [2] is expected to fix it. We believe this is a
good trade-off, because the latter issue happens less freqently.

In the meanwhile, we will NOT revert [1], as it contains the right logic.

[1] commit d6e035aa ("md: bypass block throttle for superblock update")
[2] commit bed9e27b ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
Reported-by: Dan Moulding <dan@danm.net>
Closes: https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/
Fixes: bed9e27b ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
Cc: stable@vger.kernel.org # v5.19+
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240125082131.788600-1-song@kernel.org

3445139e

01 Mar, 2024 8 commits

nbd: use the atomic queue limits API in nbd_set_size · 26828324

Christoph Hellwig authored Feb 29, 2024

Use queue_limits_start_update / queue_limits_commit_update to update
all the limits in one go and with proper sanity checking.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240229143846.1047223-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

26828324

nbd: freeze the queue for queue limits updates · 242a49e5

Christoph Hellwig authored Feb 29, 2024

nbd currently updates the logical and physical block sizes as well
as the discard_sectors on a live queue. Freeze the queue first to
make sure there are not commands in flight that can see torn or
inconsistent limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240229143846.1047223-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

242a49e5

nbd: don't clear discard_sectors in nbd_config_put · 7ea201f2

Christoph Hellwig authored Feb 29, 2024

nbd_config_put currently clears discard_sectors when unusing a device.
This is pretty odd behavior and different from the sector size
configuration which is simply left in places and then reconfigured when
nbd_set_size is as part of configuring the device. Change nbd_set_size
to clear discard_sectors if discard is not supported so that all the
queue limits changes are handled in one place.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240229143846.1047223-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

7ea201f2

pktcdvd: don't set max_hw_sectors on the underlying device · eabf5dfc

Christoph Hellwig authored Feb 29, 2024

pktcdvd sets max_hw_sectors on the queue of the underlying device that
it doesn't own (and doesn't reset it ever) since the driver was merged.
This can create all kinds of problems as the underlying driver doesn't
even know about it changing the limit.

As the state purpose is to not create I/Os larger than a single frame,
and pktcdvd never builds bios larger than that, just set REQ_NOMERGE
on the bios it submits so that largers I/Os never get built.

Note: I don't have packet writing hardware, so this is compile tested
only.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240229144408.1047967-1-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

eabf5dfc

dm: use queue_limits_set · 8e0ef412

Christoph Hellwig authored Feb 28, 2024

Use queue_limits_set which validates the limits and takes care of
updating the readahead settings instead of directly assigning them to
the queue.  For that make sure all limits are actually updated before
the assignment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20240228225653.947152-4-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

8e0ef412

block: add a queue_limits_stack_bdev helper · c1373f1c

Christoph Hellwig authored Feb 28, 2024

Add a small wrapper around blk_stack_limits that allows passing a bdev
for the bottom device and prints an error in case of misaligned
device. The name fits into the new queue limits API and the intent is
to eventually replace disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240228225653.947152-3-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

c1373f1c

block: add a queue_limits_set helper · 631d4efb

Christoph Hellwig authored Feb 28, 2024

Add a small wrapper around queue_limits_commit_update for stacking
drivers that don't want to update existing limits, but set an
entirely new set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240228225653.947152-2-hch@lst.deSigned-off-by: Jens Axboe <axboe@kernel.dk>

631d4efb

Merge tag 'md-6.9-20240301' of... · 86b1e613

Jens Axboe authored Mar 01, 2024

Merge tag 'md-6.9-20240301' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.9/block

Pull MD updates from Song:

"The major changes are:

 1. Refactor raid1 read_balance, by Yu Kuai and Paul Luse.
 2. Clean up and fix for md_ioctl, by Li Nan.
 3. Other small fixes, by Gui-Dong Han and Heming Zhao."

* tag 'md-6.9-20240301' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: (22 commits)
  md/raid1: factor out helpers to choose the best rdev from read_balance()
  md/raid1: factor out the code to manage sequential IO
  md/raid1: factor out choose_bb_rdev() from read_balance()
  md/raid1: factor out choose_slow_rdev() from read_balance()
  md/raid1: factor out read_first_rdev() from read_balance()
  md/raid1-10: factor out a new helper raid1_should_read_first()
  md/raid1-10: add a helper raid1_check_read_range()
  md/raid1: fix choose next idle in read_balance()
  md/raid1: record nonrot rdevs while adding/removing rdevs to conf
  md/raid1: factor out helpers to add rdev to conf
  md: add a new helper rdev_has_badblock()
  md/raid5: fix atomicity violation in raid5_cache_count
  md/md-bitmap: fix incorrect usage for sb_index
  md: check mddev->pers before calling md_set_readonly()
  md: clean up openers check in do_md_stop() and md_set_readonly()
  md: sync blockdev before stopping raid or setting readonly
  md: factor out a helper to sync mddev
  md: Don't clear MD_CLOSING when the raid is about to stop
  md: return directly before setting did_set_md_closing
  md: clean up invalid BUG_ON in md_ioctl
  ...

86b1e613