    md/raid1/10: fix potential deadlock · c4743e80
    Shaohua Li authored
    [ Upstream commit 61eb2b43 ]
    
    Neil Brown pointed out a potential deadlock in the raid10 code with
    bio_split/chain. The raid1 code could have the same issue, but the recent
    barrier rework makes it less likely to happen. The deadlock happens in
    the following sequence:
    
    1. generic_make_request(bio), which sets current->bio_list
    2. raid10_make_request splits bio into bio1 and bio2
    3. __make_request(bio1) calls wait_barrier and adds the underlying disk
    bio to current->bio_list
    4. __make_request(bio2) calls wait_barrier
    
    If raise_barrier happens between 3 & 4, then since wait_barrier already ran
    at 3, raise_barrier waits for IO completion from 3. And since raise_barrier
    sets the barrier, 4 waits for raise_barrier. But IO from 3 can't be
    dispatched because raid10_make_request() hasn't finished yet.
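
    The circular wait above can be replayed with a toy user-space model. This
    is only a sketch: struct toy_conf, its two counters and every toy_*
    function are hypothetical simplifications made up for illustration, not
    the kernel's real data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the barrier state; hypothetical, not the real raid10 conf. */
struct toy_conf {
    int nr_pending; /* regular IO that passed wait_barrier and has not completed */
    int barrier;    /* non-zero once raise_barrier has set the barrier */
};

/* wait_barrier has to wait while a barrier is set. */
static bool toy_wait_barrier_would_block(const struct toy_conf *c)
{
    return c->barrier > 0;
}

/* Model a wait_barrier call that gets through without waiting. */
static void toy_wait_barrier(struct toy_conf *c)
{
    assert(!toy_wait_barrier_would_block(c));
    c->nr_pending++;
}

/* raise_barrier sets the barrier, then waits for pending IO to complete.
 * Returns true if it would have to wait. */
static bool toy_raise_barrier_would_block(struct toy_conf *c)
{
    c->barrier++;
    return c->nr_pending > 0;
}

/* Replay steps 3 and 4 with raise_barrier in between.
 * Returns true if resync and bio2 end up waiting on each other. */
static bool toy_replay_deadlock(void)
{
    struct toy_conf c = { 0, 0 };
    /* bio1's lower-level bio only sits on current->bio_list; it cannot be
     * dispatched until raid10_make_request returns. */
    bool bio1_dispatched = false;

    toy_wait_barrier(&c);                                  /* step 3: bio1 */
    bool resync_waits = toy_raise_barrier_would_block(&c); /* between 3 and 4 */
    bool bio2_waits = toy_wait_barrier_would_block(&c);    /* step 4: bio2 */

    /* nr_pending drops only when bio1's IO completes, which needs dispatch,
     * which needs bio2 to get past wait_barrier first. */
    return resync_waits && bio2_waits && !bio1_dispatched;
}
```

    In this model toy_replay_deadlock() reports that both waiters are stuck,
    which mirrors the sequence described in steps 3 and 4.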
    
    The solution is to adjust the IO ordering. Quoting Neil:
    "
    It is much safer to:
    
        if (need to split) {
            split = bio_split(bio, ...)
            bio_chain(...)
            make_request_fn(split);
            generic_make_request(bio);
        } else
            make_request_fn(mddev, bio);
    
    This way we first process the initial section of the bio (in 'split')
    which will queue some requests to the underlying devices.  These
    requests will be queued in generic_make_request.
    Then we queue the remainder of the bio, which will be added to the end
    of the generic_make_request queue.
    Then we return.
    generic_make_request() will pop the lower-level device requests off the
    queue and handle them first.  Then it will process the remainder
    of the original bio once the first section has been fully processed.
    "
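
    The ordering Neil describes can be illustrated with a small user-space
    simulation. This is a sketch under stated assumptions: the queue is a
    plain FIFO standing in for current->bio_list, a "chunk" stands in for the
    part of a bio that needs no further split, and all toy_* names are
    hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_QUEUE_MAX 16
#define TOY_LOG_MAX 64

enum toy_level { TOY_MD, TOY_LOWER };

struct toy_bio {
    enum toy_level level;
    int first; /* first chunk covered by this bio */
    int count; /* number of chunks covered */
};

static struct toy_bio toy_queue[TOY_QUEUE_MAX]; /* stand-in for current->bio_list */
static size_t toy_head, toy_tail;
static int toy_active;
static char toy_log[TOY_LOG_MAX];

/* Append "<tag><chunk> " to the log: M = handled by md, D = dispatched below. */
static void toy_note(char tag, int chunk)
{
    size_t used = strlen(toy_log);
    snprintf(toy_log + used, sizeof(toy_log) - used, "%c%d ", tag, chunk);
}

static void toy_md_make_request(struct toy_bio bio);

/* Nested calls only queue the bio; the outermost call drains the FIFO. */
static void toy_generic_make_request(struct toy_bio bio)
{
    assert(toy_tail < TOY_QUEUE_MAX);
    toy_queue[toy_tail++] = bio;
    if (toy_active)
        return;
    toy_active = 1;
    while (toy_head < toy_tail) {
        struct toy_bio next = toy_queue[toy_head++];
        if (next.level == TOY_MD)
            toy_md_make_request(next);
        else
            toy_note('D', next.first);
    }
    toy_active = 0;
}

/* Handle only the first chunk, then requeue the remainder behind it. */
static void toy_md_make_request(struct toy_bio bio)
{
    struct toy_bio lower = { TOY_LOWER, bio.first, 1 };

    toy_note('M', bio.first); /* the real code would call wait_barrier here */
    toy_generic_make_request(lower);
    if (bio.count > 1) {
        struct toy_bio rest = { TOY_MD, bio.first + 1, bio.count - 1 };
        toy_generic_make_request(rest);
    }
}

/* Submit one md-level bio of the given number of chunks; return the log. */
static const char *toy_run(int chunks)
{
    struct toy_bio bio = { TOY_MD, 0, chunks };

    toy_head = toy_tail = 0;
    toy_log[0] = '\0';
    toy_generic_make_request(bio);
    return toy_log;
}
```

    For a three-chunk bio, toy_run(3) yields "M0 D0 M1 D1 M2 D2 ": each
    chunk's lower-level request is dispatched before the next chunk is
    handled, so no request is left stuck on the queue while a later chunk
    waits.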
    
    Note, this only happens in the read path. In the write path, the bio is flushed to
    the underlying disks either by blk flush (from schedule) or offloaded to raid1/10d.
    It's queued in current->bio_list.
    
    Cc: Coly Li <colyli@suse.de>
    Cc: stable@vger.kernel.org (v3.14+, only the raid10 part)
    Suggested-by: NeilBrown <neilb@suse.com>
    Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
    Signed-off-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Sasha Levin <alexander.levin@verizon.com>