• Filipe Manana's avatar
    btrfs: avoid unnecessary logging of xattrs during fast fsyncs · b590b839
    Filipe Manana authored
    When logging an inode we always log all its xattrs, so that we are able
    to figure out which ones should be deleted during log replay. However this
    is unnecessary when we are doing a fast fsync and no xattrs were added,
    changed or deleted since the last time we logged the inode in the current
    transaction.
    
    So skip the logging of xattrs when the inode was previously logged in the
    current transaction and no xattrs were added, changed or deleted. If any
    changes to xattrs happened, than the inode has BTRFS_INODE_COPY_EVERYTHING
    set in its runtime flags and the xattrs get logged. This saves time on
    scanning for xattrs, allocating memory, COWing log tree extent buffers and
    adding more lock contention on the extent buffers when there are multiple
    tasks logging in parallel.
    
    The use of xattrs is common when using ACLs, some applications, or when
    using security modules like SELinux where every inode gets a security
    xattr added to it.
    
    The following test script, using fio, was used on a box with 12 cores, 64G
    of RAM, a NVMe device and the default non-debug kernel config from Debian.
    It uses 8 concurrent jobs each writing in blocks of 64K to its own 4G file,
    each file with a single xattr of 50 bytes (about the same size for an ACL
    or SELinux xattr), doing random buffered writes with an fsync after each
    write.
    
       $ cat test.sh
       #!/bin/bash
    
       DEV=/dev/nvme0n1
       MNT=/mnt/test
       MOUNT_OPTIONS="-o ssd"
       MKFS_OPTIONS="-d single -m single"
    
       NUM_JOBS=8
       FILE_SIZE=4G
    
       cat <<EOF > /tmp/fio-job.ini
       [writers]
       rw=randwrite
       fsync=1
       fallocate=none
       group_reporting=1
       direct=0
       bs=64K
       ioengine=sync
       size=$FILE_SIZE
       directory=$MNT
       numjobs=$NUM_JOBS
       EOF
    
       echo "performance" | \
           tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
       mkfs.btrfs -f $MKFS_OPTIONS $DEV > /dev/null
       mount $MOUNT_OPTIONS $DEV $MNT
    
       echo "Creating files before fio runs, each with 1 xattr of 50 bytes"
       for ((i = 0; i < $NUM_JOBS; i++)); do
           path="$MNT/writers.$i.0"
           truncate -s $FILE_SIZE $path
           setfattr -n user.xa1 -v $(printf '%0.sX' $(seq 50)) $path
       done
    
       fio /tmp/fio-job.ini
       umount $MNT
    
    fio output before this change:
    
    WRITE: bw=120MiB/s (126MB/s), 120MiB/s-120MiB/s (126MB/s-126MB/s), io=32.0GiB (34.4GB), run=272145-272145msec
    
    fio output after this change:
    
    WRITE: bw=142MiB/s (149MB/s), 142MiB/s-142MiB/s (149MB/s-149MB/s), io=32.0GiB (34.4GB), run=230408-230408msec
    
    +16.8% throughput, -16.6% runtime
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
    b590b839
tree-log.c 176 KB