    btrfs: add btree read ahead for incremental send operations · 2ce73c63
    Filipe Manana authored
    Currently we do not do btree read ahead when doing an incremental send,
    even though we know that we will read and process every node or leaf in
    the send root that has a generation greater than the generation of the
    parent root. Triggering read ahead for such nodes and leaves is therefore
    beneficial for an incremental send.
    
    This change does exactly that: it triggers read ahead for any node or
    leaf in the send root that has a generation greater than the generation
    of the parent root. For the parent root no read ahead is triggered,
    because predicting which of its nodes/leaves are going to be read is not
    straightforward and there is often a large time window between visits to
    its nodes or leaves. Triggering read ahead for the parent root's
    nodes/leaves also did not seem to make a significant difference, so I
    opted to leave it out.
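
To get a rough idea of how many tree blocks match this criterion on a given filesystem, the dump-tree output can be filtered by generation. The following is only a sketch and not part of the patch: count_newer_blocks is a hypothetical helper, and it assumes that each node/leaf header line printed by dump-tree contains the word "generation" followed by the block's generation number.

```shell
#!/bin/bash

# Hypothetical helper, not part of the patch: reads dump-tree output on
# stdin and prints how many nodes/leaves have a generation greater than
# the generation passed as the first argument.
# Assumption: header lines start with "node" or "leaf" and contain
# "generation <number>" somewhere after the block address.
count_newer_blocks()
{
    local parent_gen=$1

    awk -v gen="$parent_gen" '
        $1 == "node" || $1 == "leaf" {
            # Find the "generation" keyword and compare the next field.
            for (i = 2; i < NF; i++) {
                if ($i == "generation" && $(i + 1) + 0 > gen + 0) {
                    count++
                    break
                }
            }
        }
        END { print count + 0 }
    '
}
```

It could be fed with something like "btrfs inspect-internal dump-tree -t $SNAP2_TREE_ID $DEV | count_newer_blocks $PARENT_GEN", where SNAP2_TREE_ID and PARENT_GEN are placeholders for the tree id of the send snapshot and the generation of the parent root.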
    
    The following test script was used to measure the improvement on a box
    with an average, consumer-grade spinning disk and 16GiB of RAM:
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/sdj
      MNT=/mnt/sdj
      MKFS_OPTIONS="--nodesize 16384"     # default, just to be explicit
      MOUNT_OPTIONS="-o max_inline=2048"  # default, just to be explicit
    
      mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null
      mount $MOUNT_OPTIONS $DEV $MNT
    
      # Create files with inline data to make it easier and faster to create
      # large btrees.
      add_files()
      {
          local total=$1
          local start_offset=$2
          local number_jobs=$3
          local total_per_job=$(($total / $number_jobs))
    
          echo "Creating $total new files using $number_jobs jobs"
          for ((n = 0; n < $number_jobs; n++)); do
              (
                  local start_num=$(($start_offset + $n * $total_per_job))
                  for ((i = 1; i <= $total_per_job; i++)); do
                      local file_num=$((start_num + $i))
                      local file_path="$MNT/file_${file_num}"
                      xfs_io -f -c "pwrite -S 0xab 0 2000" $file_path > /dev/null
                      if [ $? -ne 0 ]; then
                          echo "Failed creating file $file_path"
                          break
                      fi
                  done
              ) &
              worker_pids[$n]=$!
          done
    
          wait ${worker_pids[@]}
    
          sync
          echo
          echo "btree node/leaf count: $(btrfs inspect-internal dump-tree -t 5 $DEV | grep -cE '^(node|leaf) ')"
      }
    
      initial_file_count=500000
      add_files $initial_file_count 0 4
    
      echo
      echo "Creating first snapshot..."
      btrfs subvolume snapshot -r $MNT $MNT/snap1
    
      echo
      echo "Adding more files..."
      add_files $((initial_file_count / 4)) $initial_file_count 4
    
      echo
      echo "Updating 1/50th of the initial files..."
      for ((i = 1; i < $initial_file_count; i += 50)); do
          xfs_io -c "pwrite -S 0xcd 0 20" $MNT/file_$i > /dev/null
      done
    
      echo
      echo "Creating second snapshot..."
      btrfs subvolume snapshot -r $MNT $MNT/snap2
    
      umount $MNT
    
      echo 3 > /proc/sys/vm/drop_caches
      blockdev --flushbufs $DEV &> /dev/null
      hdparm -F $DEV &> /dev/null
    
      mount $MOUNT_OPTIONS $DEV $MNT
    
      echo
      echo "Testing full send..."
      start=$(date +%s)
      btrfs send $MNT/snap1 > /dev/null
      end=$(date +%s)
      echo
      echo "Full send took $((end - start)) seconds"
    
      umount $MNT
    
      echo 3 > /proc/sys/vm/drop_caches
      blockdev --flushbufs $DEV &> /dev/null
      hdparm -F $DEV &> /dev/null
    
      mount $MOUNT_OPTIONS $DEV $MNT
    
      echo
      echo "Testing incremental send..."
      start=$(date +%s)
      btrfs send -p $MNT/snap1 $MNT/snap2 > /dev/null
      end=$(date +%s)
      echo
      echo "Incremental send took $((end - start)) seconds"
    
      umount $MNT
    
    Before this change, incremental send duration:
    
      with $initial_file_count == 200000:  51 seconds
      with $initial_file_count == 500000: 168 seconds
    
    After this change, incremental send duration:
    
      with $initial_file_count == 200000:   39 seconds (-23.5%)
      with $initial_file_count == 500000:  125 seconds (-25.6%)
    
    For $initial_file_count == 200000 there are 62600 nodes and leaves in the
    btree of the first snapshot, and 77759 nodes and leaves in the btree of
    the second snapshot. The root nodes were at level 2.
    
    While for $initial_file_count == 500000 there are 152476 nodes and leaves
    in the btree of the first snapshot, and 190511 nodes and leaves in the
    btree of the second snapshot. The root nodes were at level 2 as well.

    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>