• Filipe Manana's avatar
    btrfs: avoid logging all directory changes during renames · 88d2beec
    Filipe Manana authored
    When doing a rename of a file, if the file or its old parent directory
    were logged before, we log the new name of the file and then make sure
    we log the old parent directory, to ensure that after a log replay the
    old name of the file is deleted and the new name added.
    
    The logging of the old parent directory can take some time, because it
    will scan all leaves modified in the current transaction, check which
    directory entries were already logged, copy the ones that were not
    logged before, etc. In this rename context all we need to do is make
    sure that the old name of the file is deleted on log replay, so instead
    of triggering a directory log operation, we can just delete the old
    directory entry from the log if it's there, or in case it isn't there,
    just log a range item to signal log replay that the old name must be
    deleted. So change btrfs_log_new_name() to do that.
    
    This scenario is actually not uncommon to trigger, and recently on a
    5.15 kernel, an openSUSE Tumbleweed user reported package installations
    and upgrades, with the zypper tool, were often taking a long time to
    complete, much more than usual. With strace it could be observed that
    zypper was spending over 99% of its time on rename operations, and then
    with further analysis we checked that directory logging was happening
    too frequently and causing high latencies for the rename operations.
    Taking into account that installation/upgrade of some of these packages
    needed about a few thousand file renames, the slowdown was very noticeable
    for the user.
    
    The issue was caused indirectly due to an excessive number of inode
    evictions on a 5.15 kernel, about 100x more compared to a 5.13, 5.14
    or a 5.16-rc8 kernel. After an inode eviction we can't tell for sure,
    in an efficient way, if an inode was previously logged in the current
    transaction, so we are pessimistic and assume it was, because in case
    it was we need to update the logged inode. More details on that in one
    of the patches in the same series (subject "btrfs: avoid inode logging
    during rename and link when possible"). Either way, in case the parent
    directory was logged before, we currently do more work then necessary
    during a rename, and this change minimizes that amount of work.
    
    The following script mimics part of what a package installation/upgrade
    with zypper does, which is basically renaming a lot of files, in some
    directory under /usr, to a name with a suffix of "-RPMDELETE":
    
      $ cat test.sh
      #!/bin/bash
    
      DEV=/dev/nvme0n1
      MNT=/mnt/nvme0n1
    
      NUM_FILES=10000
    
      mkfs.btrfs -f $DEV
      mount $DEV $MNT
    
      mkdir $MNT/testdir
    
      for ((i = 1; i <= $NUM_FILES; i++)); do
          echo -n > $MNT/testdir/file_$i
      done
    
      sync
    
      # Do some change to testdir and fsync it.
      echo -n > $MNT/testdir/file_$((NUM_FILES + 1))
      xfs_io -c "fsync" $MNT/testdir
    
      echo "Renaming $NUM_FILES files..."
      start=$(date +%s%N)
      for ((i = 1; i <= $NUM_FILES; i++)); do
          mv $MNT/testdir/file_$i $MNT/testdir/file_$i-RPMDELETE
      done
      end=$(date +%s%N)
    
      dur=$(( (end - start) / 1000000 ))
      echo "Renames took $dur milliseconds"
    
      umount $MNT
    
    Testing this change on box using a non-debug kernel (Debian's default
    kernel config) gave the following results:
    
    NUM_FILES=10000, before this patch: 27399 ms
    NUM_FILES=10000, after this patch:   9093 ms (-66.8%)
    
    NUM_FILES=5000, before this patch:   9241 ms
    NUM_FILES=5000, after this patch:    4642 ms (-49.8%)
    
    NUM_FILES=2000, before this patch:   2550 ms
    NUM_FILES=2000, after this patch:    1788 ms (-29.9%)
    
    NUM_FILES=1000, before this patch:   1088 ms
    NUM_FILES=1000, after this patch:     905 ms (-16.9%)
    
    Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1193549Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    88d2beec
tree-log.h 2.87 KB