• Filipe Manana's avatar
    btrfs: improve batch insertion of delayed dir index items · 06ac264f
    Filipe Manana authored
    Currently we group delayed dir index items for insertion as a single batch
    (a single btree operation) as long as their keys are sequential in the key
    space.
    
    For example we have delayed index items for the following index keys:
    
       10, 11, 12, 15, 16, 20, 21
    
    We end up building three batches:
    
    1) First one for index keys 10, 11 and 12;
    2) Second one for index keys 15 and 16;
    3) Third one for index keys 20 and 21.
    
    However, since the dir index numbers come from a monotonically increasing
    counter and are never reused, we could group all these items into a single
    batch. The existence of holes in the sequence happens only when we had
    delayed dir index items for insertion that got deleted before they were
    flushed to the subvolume's tree.
    
    The delayed items are stored in a rbtree based on their key order, so
    we can just group items into a batch as long as they all fit in a leaf,
    and ignore if there's a gap (key offset, index number) between two
    consecutive items. This is more efficient and reduces the amount of
    time spent when running delayed items if there are gaps between dir
    index items.
    
    For example running the following test script:
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/sdj
      MNT=/mnt/sdj
    
      mkfs.btrfs -f $DEV
      mount $DEV $MNT
    
      NUM_FILES=100
    
      mkdir $MNT/testdir
      for ((i = 1; i <= $NUM_FILES; i++)); do
           echo -n > $MNT/testdir/file_$i
      done
    
      # Now delete every other file, to create gaps in the dir index keys.
      for ((i = 1; i <= $NUM_FILES; i += 2)); do
          rm -f $MNT/testdir/file_$i
      done
    
      start=$(date +%s%N)
      sync
      end=$(date +%s%N)
      dur=$(( (end - start) / 1000000 ))
    
      echo -e "\nsync took $dur milliseconds"
    
      umount $MNT
    
    While having the following bpftrace script running in another shell:
    
      $ cat bpf-delayed-items-inserts.sh
      #!/usr/bin/bpftrace
    
      /* Must add 'noinline' to btrfs_insert_delayed_items(). */
      k:btrfs_insert_delayed_items
      {
          @start_insert_delayed_items[tid] = nsecs;
      }
    
      k:btrfs_insert_empty_items
      /@start_insert_delayed_items[tid]/
      {
         @insert_batches = count();
      }
    
      kr:btrfs_insert_delayed_items
      /@start_insert_delayed_items[tid]/
      {
          $dur = (nsecs - @start_insert_delayed_items[tid]) / 1000;
          @btrfs_insert_delayed_items_total_time = sum($dur);
          delete(@start_insert_delayed_items[tid]);
      }
    
    Before this change:
    
    @btrfs_insert_delayed_items_total_time: 576
    @insert_batches: 51
    
    After this change:
    
    @btrfs_insert_delayed_items_total_time: 174
    @insert_batches: 2
    Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    06ac264f
delayed-inode.c 51 KB