• Miao Xie's avatar
    Btrfs: fix use-after-free in the finishing procedure of the device replace · c404e0dc
    Miao Xie authored
    During device replace test, we hit a null pointer deference (It was very easy
    to reproduce it by running xfstests' btrfs/011 on the devices with the virtio
    scsi driver). There were two bugs that caused this problem:
    - We might allocate new chunks on the replaced device after we updated
      the mapping tree. And we forgot to replace the source device in those
      mapping of the new chunks.
    - We might get the mapping information which including the source device
      before the mapping information update. And then submit the bio which was
      based on that mapping information after we freed the source device.
    
    For the first bug, we can fix it by doing mapping tree update and source
    device remove in the same context of the chunk mutex. The chunk mutex is
    used to protect the allocable device list, the above method can avoid
    the new chunk allocation, and after we remove the source device, all
    the new chunks will be allocated on the new device. So it can fix
    the first bug.
    
    For the second bug, we need make sure all flighting bios are finished and
    no new bios are produced during we are removing the source device. To fix
    this problem, we introduced a global @bio_counter, we not only inc/dec
    @bio_counter outsize of map_blocks, but also inc it before submitting bio
    and dec @bio_counter when ending bios.
    
    Since Raid56 is a little different and device replace dosen't support raid56
    yet, it is not addressed in the patch and I add comments to make sure we will
    fix it in the future.
    Reported-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
    Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
    Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
    Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
    c404e0dc
disk-io.c 111 KB