    dm-raid: really frozen sync_thread during suspend · 16c4770c
    Yu Kuai authored
    1) Commit f52f5c71 ("md: fix stopping sync thread") removed
       MD_RECOVERY_FROZEN from __md_stop_writes() without realizing that
       dm-raid relies on __md_stop_writes() to freeze the sync_thread
       indirectly. Fix this by setting MD_RECOVERY_FROZEN in
       md_stop_writes(), and since stop_sync_thread() is only used for
       dm-raid in this case, also move stop_sync_thread() into
       md_stop_writes().
    2) The flag MD_RECOVERY_FROZEN doesn't mean that the sync_thread is
       frozen; it only prevents a new sync_thread from starting and can't
       stop one that is already running. To freeze the sync_thread,
       stop_sync_thread() should be called after setting the flag.
    3) The flag MD_RECOVERY_FROZEN doesn't mean that writes are stopped, so
       using it as the condition for calling md_stop_writes() in
       raid_postsuspend() doesn't look correct. Since a reentrant
       stop_sync_thread() does nothing, always call md_stop_writes() in
       raid_postsuspend().
    4) raid_message() can set/clear the flag MD_RECOVERY_FROZEN at any time,
       and if MD_RECOVERY_FROZEN is cleared while the array is suspended, a
       new sync_thread can start unexpectedly. Fix this by disallowing
       raid_message() from changing the sync_thread status during suspend.
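
    The four points above can be illustrated with a small user-space model.
    This is only a sketch and not the kernel code: struct toy_mddev, every
    toy_*() helper, the boolean fields and the -1 return value are
    hypothetical stand-ins for the real mddev state, md_stop_writes(),
    stop_sync_thread(), raid_postsuspend() and raid_message().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; NOT the kernel's struct mddev. All names are hypothetical. */
struct toy_mddev {
	bool recovery_frozen;	/* stands in for MD_RECOVERY_FROZEN */
	bool sync_running;	/* a sync_thread is currently running */
	bool writes_stopped;	/* md_stop_writes() has been called */
	bool suspended;		/* the dm target is suspended */
};

/* The frozen flag only gates the start of a NEW sync thread (point 2). */
static bool toy_start_sync(struct toy_mddev *m)
{
	assert(m != NULL);
	if (m->recovery_frozen || m->sync_running)
		return false;
	m->sync_running = true;
	return true;
}

/* Setting the flag alone leaves an already running sync thread untouched. */
static void toy_set_frozen(struct toy_mddev *m)
{
	m->recovery_frozen = true;
}

/* Stops a running sync thread; calling it again is a harmless no-op. */
static void toy_stop_sync_thread(struct toy_mddev *m)
{
	m->sync_running = false;
}

/* Point 1: set the flag first, then stop the running thread. */
static void toy_md_stop_writes(struct toy_mddev *m)
{
	toy_set_frozen(m);
	toy_stop_sync_thread(m);
	m->writes_stopped = true;
}

/* Point 3: call toy_md_stop_writes() unconditionally, not based on the flag. */
static void toy_postsuspend(struct toy_mddev *m)
{
	toy_md_stop_writes(m);
	m->suspended = true;
}

/* Point 4: refuse to change the sync state while suspended (-1 is a placeholder). */
static int toy_message_unfreeze(struct toy_mddev *m)
{
	if (m->suspended)
		return -1;
	m->recovery_frozen = false;
	return 0;
}
```

    In this model, calling toy_set_frozen() while a sync is running leaves
    sync_running true, whereas toy_postsuspend() always ends with the
    thread stopped, and a later toy_message_unfreeze() is rejected, so
    toy_start_sync() cannot succeed while the toy array is suspended.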
    
    Note that after commit f52f5c71 ("md: fix stopping sync thread"), the
    test shell/lvconvert-raid-reshape.sh started to hang in
    stop_sync_thread(). With the previous fixes the test no longer hangs
    there, but it still fails and complains that ext4 is corrupted. With
    this patch, the test neither hangs in stop_sync_thread() nor fails due
    to ext4 corruption. However, there is still a deadlock related to
    dm-raid456 that will be fixed in following patches.
    Reported-by: Mikulas Patocka <mpatocka@redhat.com>
    Closes: https://lore.kernel.org/all/e5e8afe2-e9a8-49a2-5ab0-958d4065c55e@redhat.com/
    Fixes: 1af2048a ("dm raid: fix deadlock caused by premature md_stop_writes()")
    Fixes: 9dbd1aa3 ("dm raid: add reshaping support to the target")
    Fixes: f52f5c71 ("md: fix stopping sync thread")
    Cc: stable@vger.kernel.org # v6.7+
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Xiao Ni <xni@redhat.com>
    Acked-by: Mike Snitzer <snitzer@kernel.org>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20240305072306.2562024-6-yukuai1@huaweicloud.com
dm-raid.c 118 KB