    btrfs: fix false EIO for missing device · 102ed2c5
    Anand Jain authored
    When one of the devices is missing, bbio_error() takes care of setting
    the error status. However, if that is the only IO still pending in the
    stripe, bbio_error() sets the error %bi_status on %orig_bio without
    first checking the status of the other IO. Fix this by checking whether
    %bbio->error has exceeded %bbio->max_errors.
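
The check described above can be sketched in plain userspace C. This is a simplified model, not the kernel patch: `struct bbio_model` and both helper functions are hypothetical names, and the tolerance of one failed stripe used for two-device RAID1 in the usage note is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two counters the commit message names. */
struct bbio_model {
    int error;       /* stripes that have failed so far */
    int max_errors;  /* failures the RAID profile can tolerate (assumed) */
};

/* Behavior before the fix: the error path always fails the original bio. */
static bool old_reports_eio(const struct bbio_model *b)
{
    (void)b; /* the counters are never consulted */
    return true;
}

/* Behavior after the fix: fail only when errors exceed the tolerance. */
static bool fixed_reports_eio(const struct bbio_model *b)
{
    return b->error > b->max_errors;
}
```

With `error = 1` and `max_errors = 1`, the old helper reports a failure while the fixed one does not. A second failed stripe (`error = 2`) makes the fixed helper report a failure as well.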
    
    With the reproducer below, an fdatasync error is seen intermittently.
    
     mount -o degraded /dev/sdc /btrfs
     dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
    
     dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
    
     The problem is intermittent because the following conditions,
     which depend on timing, have to be met:
     In btrfs_map_bio()
      - in the RAID1, the missing device has to be at %dev_nr = 1
     In bbio_error()
      - before bbio_error() is called, the bio of the non-missing
        device at %dev_nr = 0 must have completed, so that the
        condition below is true
         if (atomic_dec_and_test(&bbio->stripes_pending)) {
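
The ordering dependence can be illustrated with a small sequential model. This is a hedged sketch rather than kernel code: `struct bbio_model`, `healthy_end_io()`, `missing_dev_error()` and `run_model()` are hypothetical names, C11 atomics stand in for the kernel's atomic_t helpers, and the model assumes a two-stripe RAID1 write that tolerates one error and a healthy-stripe completion path that already compares the error count against the tolerance.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-request state shared by both stripes. */
struct bbio_model {
    atomic_int stripes_pending; /* stripes that have not completed yet */
    atomic_int error;           /* stripes that failed */
    int max_errors;             /* tolerated failures (assumed 1 for RAID1) */
    int status;                 /* final status: 0 or -EIO */
};

/* Mimics atomic_dec_and_test(): true when the counter reaches zero. */
static bool dec_and_test(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) == 1;
}

/* Completion of the healthy stripe; assumed to check the tolerance. */
static void healthy_end_io(struct bbio_model *b)
{
    if (dec_and_test(&b->stripes_pending))
        b->status = atomic_load(&b->error) > b->max_errors ? -EIO : 0;
}

/* Error path for the missing device, with or without the fix. */
static void missing_dev_error(struct bbio_model *b, bool fixed)
{
    atomic_fetch_add(&b->error, 1);
    if (dec_and_test(&b->stripes_pending)) {
        if (fixed)
            b->status = atomic_load(&b->error) > b->max_errors ? -EIO : 0;
        else
            b->status = -EIO; /* old behavior: unconditional failure */
    }
}

/* Replays one request with the given completion order. */
static int run_model(bool healthy_completes_first, bool fixed)
{
    struct bbio_model b = { .max_errors = 1, .status = 0 };

    atomic_init(&b.stripes_pending, 2);
    atomic_init(&b.error, 0);

    if (healthy_completes_first) {
        healthy_end_io(&b);
        missing_dev_error(&b, fixed);
    } else {
        missing_dev_error(&b, fixed);
        healthy_end_io(&b);
    }
    return b.status;
}
```

In this model only `run_model(true, false)` returns `-EIO`: the healthy stripe finishes first, so the error path becomes the last completer and fails the request unconditionally. The other three combinations return 0, which matches the intermittent nature of the failure.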
    Signed-off-by: Anand Jain <anand.jain@oracle.com>
    Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
volumes.c 187 KB