    Btrfs: fix block group remaining RO forever after error during device replace · 042528f8
    Filipe Manana authored
    When doing a device replace, while at scrub.c:scrub_enumerate_chunks(), we
    set the block group to RO mode and then wait for any ongoing writes into
    extents of the block group to complete. While doing that wait we overwrite
    the value of the variable 'ret' and can break out of the loop if an error
    happens without turning the block group back into RW mode. So what happens
    is the following:
    
    1) btrfs_inc_block_group_ro() returns 0, meaning it set the block group
       to RO mode (its ->ro field set to 1 or incremented to some value > 1);
    
    2) Then btrfs_wait_ordered_roots() returns a value > 0;
    
    3) Then if either joining or committing the transaction fails, we break
       out of the loop without calling btrfs_dec_block_group_ro(), leaving
       the block group in RO mode forever.
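
The three steps above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: every `model_*` name is hypothetical, the block group is reduced to its RO counter, and the wait and commit helpers are stubs that return fixed values.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EIO 5 /* hypothetical stand-in for an I/O error code */

/* Hypothetical stand-in for a block group: only the RO counter is modeled. */
struct model_block_group {
	int ro;
};

/* Models step 1: mark the group RO, return 0 on success. */
static int model_inc_ro(struct model_block_group *bg)
{
	bg->ro++;
	return 0;
}

/* Models turning the group back to RW (dropping one RO reference). */
static void model_dec_ro(struct model_block_group *bg)
{
	assert(bg->ro > 0);
	bg->ro--;
}

/* Models step 2: the wait reports that some ordered extents were found. */
static int model_wait_ordered(void)
{
	return 1;
}

/* Models step 3: joining/committing the transaction, which may fail. */
static int model_commit(bool fail)
{
	return fail ? -MODEL_EIO : 0;
}

/* Buggy flow: the error path returns without undoing the RO increment. */
static int model_scrub_group_buggy(struct model_block_group *bg,
				   bool commit_fails)
{
	int ret = model_inc_ro(bg);

	if (ret == 0) {
		ret = model_wait_ordered(); /* overwrites ret with a value > 0 */
		if (ret > 0) {
			ret = model_commit(commit_fails);
			if (ret)
				return ret; /* group stays RO */
		}
	}
	/* ... extents would be copied here ... */
	model_dec_ro(bg);
	return 0;
}

/* Flow with the wait removed: no error path sits between inc and dec. */
static int model_scrub_group_fixed(struct model_block_group *bg)
{
	int ret = model_inc_ro(bg);

	if (ret)
		return ret;
	/* ... extents would be copied here ... */
	model_dec_ro(bg);
	return 0;
}
```

In this model, calling `model_scrub_group_buggy()` with a failing commit leaves `ro` at 1, while `model_scrub_group_fixed()` always returns the counter to its starting value.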
    
    To fix this, just remove the code that waits for ongoing writes to extents
    of the block group, since it's not needed because in the initial setup
    phase of a device replace operation, before starting to find all chunks
    and their extents, we set the target device for replace while holding
    fs_info->dev_replace->rwsem, which ensures that after releasing that
    semaphore, any writes into the source device are made to the target device
    as well (__btrfs_map_block() guarantees that). So while at
    scrub_enumerate_chunks() we only need to worry about finding and copying
    extents (from the source device to the target device) that were written
    before we started the device replace operation.
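
The write duplication that makes the wait unnecessary can be pictured with a minimal sketch. It is an assumption-laden model, not what __btrfs_map_block() actually does: the `model_*` names and integer device ids are hypothetical, and the locking around fs_info->dev_replace->rwsem is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the device replace state. */
struct model_dev_replace {
	bool ongoing; /* true once the target device has been set */
	int srcdev;   /* id of the device being replaced */
	int tgtdev;   /* id of the replacement device */
};

/*
 * Fill 'out' with the devices a write aimed at 'dev' must reach and
 * return how many there are. While a replace is ongoing, a write to
 * the source device is mirrored to the target device as well.
 */
static size_t model_map_write(const struct model_dev_replace *dr, int dev,
			      int out[2])
{
	size_t n = 0;

	out[n++] = dev;
	if (dr->ongoing && dev == dr->srcdev)
		out[n++] = dr->tgtdev;
	return n;
}
```

Under this model, any write issued after `ongoing` is set already lands on the target, so only extents written earlier remain for scrub_enumerate_chunks() to copy.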
    
    Fixes: f0e9b7d6 ("Btrfs: fix race setting block group readonly during device replace")
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
ordered-data.c 28.2 KB