    md: fix deadlock between mddev_suspend and flush bio · 611d5cbc
    Li Nan authored
    A deadlock occurs when mddev is suspended while flush bios are in
    progress. It is a complex issue involving four threads:
    
    T1. The first flush is in its final stage: it clears 'mddev->flush_bio'
        and tries to submit data, but is blocked because mddev is suspended
        by T4.
    T2. The second flush sets 'mddev->flush_bio' and attempts to queue
        md_submit_flush_data(), which is already running (T1) and won't
        execute again if queued on the same CPU as T1.
    T3. The third flush increments active_io and tries to flush, but is
        blocked because 'mddev->flush_bio' is not NULL (set by T2).
    T4. mddev_suspend() is called and waits for active_io, which was
        incremented by T3, to drop to 0.
    
      T1		T2		T3		T4
      (flush 1)	(flush 2)	(flush 3)	(suspend)
      md_submit_flush_data
       mddev->flush_bio = NULL;
       .
       .	 	md_flush_request
       .	  	 mddev->flush_bio = bio
       .	  	 queue submit_flushes
       .		 .
       .		 .		md_handle_request
       .		 .		 active_io + 1
       .		 .		 md_flush_request
       .		 .		  wait !mddev->flush_bio
       .		 .
       .		 .				mddev_suspend
       .		 .				 wait !active_io
       .		 .
       .		 submit_flushes
       .		 queue_work md_submit_flush_data
       .		 //md_submit_flush_data is already running (T1)
       .
       md_handle_request
        wait resume
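
The dependency chain in the diagram above can be checked with a small user-space model. This is only a sketch: the thread indices, the `waits_on` array and `in_wait_cycle()` are hypothetical names invented for illustration and do not exist in md.c. Each entry records which thread a given thread is blocked on, following the T1..T4 description.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: one index per thread in the diagram. */
enum { T1, T2, T3, T4, NR_THREADS };

/*
 * waits_on[i] is the thread that thread i is blocked on,
 * or -1 if thread i can make progress.
 * Returns true if following the edges from 'start' leads back to 'start'.
 */
static bool in_wait_cycle(const int waits_on[], int n, int start)
{
    int cur = start;

    for (int steps = 0; steps < n; steps++) {
        cur = waits_on[cur];
        if (cur < 0)
            return false;   /* chain ends at a runnable thread */
        if (cur == start)
            return true;    /* came back to where we started */
    }
    return false;
}

/* Wait-for edges taken from the T1..T4 description. */
static const int deadlock_graph[NR_THREADS] = {
    [T1] = T4,  /* flush 1 waits for the array to be resumed */
    [T2] = T1,  /* flush 2's work item is already running as T1 */
    [T3] = T2,  /* flush 3 waits for flush_bio set by T2 to be cleared */
    [T4] = T3,  /* suspend waits for active_io held by T3 */
};
```

Calling `in_wait_cycle(deadlock_graph, NR_THREADS, T1)` follows T1 -> T4 -> T3 -> T2 -> T1, so every thread in the model is waiting on another one in the same loop.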
    
    The root issue is the non-atomic inc/dec of active_io during the flush
    process. active_io is decremented before md_submit_flush_data() is queued,
    and incremented again soon after md_submit_flush_data() runs.
      md_flush_request
        active_io + 1
        submit_flushes
          active_io - 1
          md_submit_flush_data
            md_handle_request
            active_io + 1
              make_request
            active_io - 1
    
    If active_io is decremented after md_handle_request() instead of within
    submit_flushes(), make_request() can be called directly instead of
    md_handle_request() in md_submit_flush_data(), and active_io will be
    incremented and decremented only once in the whole flush process. This
    fixes the deadlock.
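
The difference between the two flows can be sketched as a user-space model that counts how often active_io drops to zero before the flush has finished. Everything here is an assumption made for illustration: `struct model`, `old_flow()` and `new_flow()` are hypothetical, active_io is modelled as a plain int, and only a single flush in isolation is considered.

```c
#include <stdbool.h>

/* Hypothetical model of one flush; not the real struct mddev. */
struct model {
    int active_io;      /* stand-in for mddev's active_io */
    int zero_windows;   /* times active_io hit 0 before the flush finished */
};

static void io_inc(struct model *m)
{
    m->active_io++;
}

static void io_dec(struct model *m, bool flush_done)
{
    m->active_io--;
    if (m->active_io == 0 && !flush_done)
        m->zero_windows++;  /* a suspend could slip in here */
}

/* Flow before the fix, following the call chain in the commit message. */
static int old_flow(void)
{
    struct model m = { 0, 0 };

    io_inc(&m);         /* md_flush_request */
    io_dec(&m, false);  /* submit_flushes drops the count early */
    io_inc(&m);         /* md_handle_request from md_submit_flush_data */
    /* make_request would run here */
    io_dec(&m, true);   /* flush complete */
    return m.zero_windows;
}

/* Flow after the fix: one inc, one dec, make_request called directly. */
static int new_flow(void)
{
    struct model m = { 0, 0 };

    io_inc(&m);         /* md_flush_request */
    /* submit_flushes no longer touches active_io */
    /* md_submit_flush_data calls make_request directly */
    io_dec(&m, true);   /* flush complete */
    return m.zero_windows;
}
```

In this model `old_flow()` reports one window in which active_io is zero mid-flush, while `new_flow()` reports none, which is the property the fix relies on.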
    
    The only behavioral difference after the fix is that the return value of
    make_request() is no longer handled. However, after the previous patch
    cleaned up md_write_start(), make_request() only returns an error from
    raid5_make_request() when used by dm-raid, see commit 41425f96
    ("dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
    concurrent with reshape"). Since dm always splits data and flush
    operations into two separate ios, the size of a flush bio submitted by dm
    is always 0, so make_request() will not be called in
    md_submit_flush_data(). To prevent future modifications from introducing
    issues, add a WARN_ON to ensure that make_request() returns no error in
    this context.
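
The guard described above might look roughly like the following user-space sketch. It is not the patch itself: `WARN_ON_MODEL`, `struct bio_model`, `submit_flush_data_model()` and the two sample callbacks are hypothetical stand-ins, and the convention that the callback returns false on error is an assumption of this model.

```c
#include <stdbool.h>
#include <stdio.h>

static int warn_count;  /* number of warnings raised so far */

/* Stand-in for the kernel's WARN_ON(): report and return the condition. */
static bool warn_on_model(bool cond, const char *expr, const char *file, int line)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    }
    return cond;
}
#define WARN_ON_MODEL(cond) warn_on_model((cond), #cond, __FILE__, __LINE__)

/* Hypothetical bio: only the payload size matters for this sketch. */
struct bio_model {
    unsigned int size;
};

/* Assumed convention for this model: returns false on error. */
typedef bool (*make_request_fn)(struct bio_model *bio);

static bool make_request_ok(struct bio_model *bio)   { (void)bio; return true; }
static bool make_request_fail(struct bio_model *bio) { (void)bio; return false; }

static void submit_flush_data_model(struct bio_model *bio, make_request_fn make_request)
{
    if (bio->size == 0)
        return;  /* empty flush (the dm case): make_request is never called */

    /* No error handling here any more, so at least make a failure visible. */
    WARN_ON_MODEL(!make_request(bio));
}
```

With a zero-size bio the callback is skipped entirely, so no warning can fire; a non-empty bio combined with a failing callback raises exactly one warning.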
    
    Fixes: fa2bbff7 ("md: synchronize flush io with array reconfiguration")
    Signed-off-by: Li Nan <linan122@huawei.com>
    Signed-off-by: Song Liu <song@kernel.org>
    Link: https://lore.kernel.org/r/20240525185257.3896201-3-linan666@huaweicloud.com