1. 14 Mar, 2022 15 commits
    • btrfs: zoned: remove redundant initialization of to_add · c4bf1909
      Jiapeng Chong authored
      to_add is initialized to len, but this value is never read because
      to_add is overwritten later on. Remove the redundant initialization.
      
      Cleans up the following clang-analyzer warning:
      
      fs/btrfs/extent-tree.c:2769:8: warning: Value stored to 'to_add' during
      its initialization is never read [clang-analyzer-deadcode.DeadStores].
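      A minimal sketch of the pattern the analyzer flags. It uses a
      hypothetical helper, refill_reserve(), not the actual extent-tree.c
      code. The variable is assigned on every path that reads it, so an
      initializer would be a dead store.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the real fs/btrfs/extent-tree.c code): move up
 * to `len` bytes into a reserve of `size` bytes that already holds
 * `reserved` bytes, and return what is left of `len`.
 */
static uint64_t refill_reserve(uint64_t len, uint64_t size, uint64_t reserved,
			       int full)
{
	/*
	 * Writing "uint64_t to_add = len;" here would be a dead store: to_add
	 * is only read after the assignment below, which is the situation
	 * clang-analyzer-deadcode.DeadStores reports.
	 */
	uint64_t to_add;

	assert(reserved <= size);
	if (!full) {
		to_add = len < size - reserved ? len : size - reserved;
		len -= to_add;
	}
	return len;
}
```

      Dropping the initializer does not change behaviour. It only removes a
      value that no code path can observe.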
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: cleanup temporary variables when finding rotational device status · 823f8e5c
      Anand Jain authored
      The pointer to struct request_queue is used only to find out whether
      the device is rotational or non-rotational, so drop the temporary
      variable and use the pointer directly.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use dev_t to match device in device_matched · 330a5bf4
      Anand Jain authored
      Commit "btrfs: add device major-minor info in the struct btrfs_device"
      saved the device major-minor number in struct btrfs_device upon
      discovering the device.

      So there is no need to call lookup_bdev() again. Just match the saved
      dev_t, which means device_matched() can go away.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add device major-minor info in the struct btrfs_device · 4889bc05
      Anand Jain authored
      Internally it is common to use the major-minor number to identify a
      device, and at a few locations in btrfs we use the major-minor number
      to match the device.

      So when we identify a new btrfs device through device add, device
      replace or device scan/ready, save the device's major-minor number
      (dev_t) in struct btrfs_device so that we don't have to call
      lookup_bdev() again.
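      The idea can be sketched in userspace C under these assumptions:
      struct scanned_device is a hypothetical stand-in for struct
      btrfs_device, and stat()'s st_rdev stands in for lookup_bdev(). The
      dev_t is resolved once, when the device is recorded. Later lookups only
      compare integers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct btrfs_device: a path plus a cached dev_t. */
struct scanned_device {
	char path[256];
	dev_t devt;              /* resolved once, when the device is recorded */
};

/*
 * Record a newly discovered device. stat() plays the role of lookup_bdev()
 * here. Character devices are accepted only so the sketch can be tried on
 * /dev/null. Returns 0 on success, -1 on error.
 */
static int record_device(struct scanned_device *dev, const char *path)
{
	struct stat st;

	assert(dev != NULL && path != NULL);
	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
		return -1;
	snprintf(dev->path, sizeof(dev->path), "%s", path);
	dev->devt = st.st_rdev;
	return 0;
}

/* Later lookups compare the cached dev_t, so no path resolution is needed. */
static struct scanned_device *find_device(struct scanned_device *devs,
					  size_t n, dev_t devt)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].devt == devt)
			return &devs[i];
	return NULL;
}
```

      On Linux, /dev/null is character device 1:3, so after recording it,
      find_device(devs, 1, makedev(1, 3)) returns that entry.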
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: match stale devices by dev_t · 16cab91a
      Anand Jain authored
      After the commit "btrfs: harden identification of a stale device", we
      don't have to match the device path anymore. Instead, we match the
      dev_t. So pass in the dev_t instead of the device path in the call
      chain btrfs_forget_devices()->btrfs_free_stale_devices().
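      A minimal sketch of dropping stale entries by dev_t instead of by path.
      Several names and conventions here are assumptions made for
      illustration, not taken from the patch: struct stale_entry, the
      `opened` flag (devices in use are never dropped), and the rule that a
      zero dev_t matches every unopened entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical registry entry standing in for a scanned btrfs_device. */
struct stale_entry {
	dev_t devt;
	bool opened;             /* part of a mounted filesystem: never freed */
};

/*
 * Drop unopened entries whose dev_t equals devt. By this sketch's own
 * convention, a devt of 0 matches every unopened entry. The array is
 * compacted in place and the number of entries kept is returned.
 */
static size_t free_stale_by_devt(struct stale_entry *e, size_t n, dev_t devt)
{
	size_t kept = 0;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		bool match = (devt == 0 || e[i].devt == devt);

		if (match && !e[i].opened)
			continue;        /* stale: drop it */
		e[kept++] = e[i];
	}
	return kept;
}
```

      Callers build the key with makedev(major, minor), so the path never
      needs to travel down the call chain.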
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: harden identification of a stale device · 770c79fb
      Anand Jain authored
      Identifying and removing a stale device from the fs_uuids list is done
      by btrfs_free_stale_devices(), which in turn depends on
      device_path_matched() to check whether the device appears in more than
      one btrfs_device structure.

      The device is matched by its path. However, when device mapper is in
      use, the dm device paths are just links to the actual block device, so
      device_path_matched() fails to match.

      Fix this by matching the dev_t, as provided by lookup_bdev(), instead of
      doing a plain string compare of the device paths.
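      The failure mode can be reproduced in userspace. Two different path
      strings that name the same device node compare unequal as strings but
      resolve to the same dev_t. This sketch uses stat() and st_rdev as a
      stand-in for lookup_bdev(). The helper names are hypothetical, and
      character devices are accepted only so that it can be tried on
      /dev/null.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Resolve a device path to its dev_t. stat() follows symlinks, such as a
 * device mapper link. Returns 0 on success, or -1 if the path is missing
 * or is not a device node.
 */
static int path_to_devt(const char *path, dev_t *devt)
{
	struct stat st;

	assert(path != NULL && devt != NULL);
	if (stat(path, &st) != 0)
		return -1;
	if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
		return -1;
	*devt = st.st_rdev;
	return 0;
}

/* Old approach: plain string compare of the paths. */
static int same_device_by_path(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/* New approach: compare what the paths resolve to. */
static int same_device_by_devt(const char *a, const char *b)
{
	dev_t da, db;

	if (path_to_devt(a, &da) != 0 || path_to_devt(b, &db) != 0)
		return 0;
	return da == db;
}
```

      For example, "/dev/null" and "/dev/../dev/null" fail the string compare
      but match by dev_t. This is the same mismatch a dm link and its target
      node produce.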
      Reported-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: simplify fs_devices member access in btrfs_init_dev_replace_tgtdev · bef16b52
      Anand Jain authored
      In btrfs_init_dev_replace_tgtdev() we dereference fs_info to get
      fs_devices many times. Instead, save a pointer to fs_devices in a local
      variable.
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: reuse existing inode from btrfs_ioctl · 9ad12305
      Sahil Kang authored
      btrfs_ioctl already extracts the inode from the file, so we can pass
      that into the callbacks instead.
      Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move missing device handling in a dedicate function · ff37c89f
      Nikolay Borisov authored
      This simplifies the code flow in read_one_chunk() and makes error
      handling for missing devices a bit simpler by reducing it to a single
      check for whether something went wrong. No functional changes.
      Reviewed-by: Su Yue <l@damenly.su>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: stop trying to log subdirectories created in past transactions · de6bc7f5
      Filipe Manana authored
      When logging a directory, we try to log subdirectories that were
      changed in the current transaction but created in a past transaction.
      This behaviour was introduced by commit 2f2ff0ee ("Btrfs: fix metadata
      inconsistencies after directory fsync") to fix some metadata
      inconsistencies. Those no longer need this behaviour, due to numerous
      other changes made over the years.

      Besides no longer being needed, this behaviour is also undesirable
      because:
      
      1) It's not reliable, because it's only triggered for the directories
         of dentries (dir items) that happen to be on a leaf that was
         changed in the current transaction. If a dentry that points to a
         directory resides on a leaf that was not changed in the current
         transaction, then that directory is not logged, because
         log_dir_items() and log_new_dir_dentries() use
         btrfs_search_forward(), which skips such leaves;

      2) It's not required by POSIX or any other standard. It's undefined
         territory. The only way to guarantee that a subdirectory is logged
         is to fsync it explicitly.
      
      Making the behaviour guaranteed would require scanning all directory
      items, checking which ones point to a directory, and then fsyncing each
      subdirectory that was modified in the current transaction. This could
      be very expensive for large directories with many subdirectories
      and/or large subdirectories.
      
      So remove that obsolete logic.
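      Since only an explicit fsync gives the guarantee, an application that
      needs a new subdirectory to be persisted has to fsync it itself. Below
      is a minimal POSIX sketch with hypothetical helper names. It fsyncs the
      subdirectory, then its parent, so the new dentry is persisted as well.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* fsync a directory by path. Returns 0 on success, -1 on error. */
static int fsync_dir(const char *path)
{
	int fd = open(path, O_RDONLY | O_DIRECTORY);
	int ret;

	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	return ret == 0 ? 0 : -1;
}

/*
 * Create parent/name and make it durable: fsync the new subdirectory
 * itself, then the parent, which holds the new dentry.
 */
static int persist_new_subdir(const char *parent, const char *name)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", parent, name);

	assert(n > 0);
	if ((size_t)n >= sizeof(path))
		return -1;
	if (mkdir(path, 0755) != 0)
		return -1;
	if (fsync_dir(path) != 0)
		return -1;
	return fsync_dir(parent);
}
```

      Relying on an fsync of the parent alone to also persist the
      subdirectory is exactly the undefined behaviour this commit stops
      emulating.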
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: stop copying old dir items when logging a directory · 732d591a
      Filipe Manana authored
      When logging a directory, we go over every leaf of the subvolume tree
      that was changed in the current transaction and copy all its dir index
      keys to the log tree.

      That includes copying dir index keys created in past transactions. This
      is done mostly for simplicity, because after logging the keys we log an
      item that specifies the start and end of the range of keys we logged.
      That item is then used during log replay to figure out which keys need
      to be deleted: every key in that range that we find in the subvolume
      tree and that is not in the log tree needs to be deleted.
      
      Now that we log only dir index keys, and no longer dir item keys, when
      we remove dentries from a directory (due to unlink and rename
      operations), we can get entire leaves that were changed only to delete
      old dir index keys, or that have only a few new dir index keys. This
      happens because the offset for new index keys comes from a
      monotonically increasing counter.

      We can avoid logging dir index keys from past transactions. To track
      the deletions, we only log range items (BTRFS_DIR_LOG_INDEX_KEY key
      type) when we find gaps between consecutive index keys. This massively
      reduces the amount of logged metadata when we have deleted directory
      entries, even if they are a small percentage of the total number of
      entries. The reduction comes both from logging fewer items and from
      logging smaller ones. Instead of logging many dir index items (struct
      btrfs_dir_item), which have a size of 30 bytes plus a file name, we
      typically log just a few range items (struct btrfs_dir_log_item), which
      take only 8 bytes each.

      Even if no entries were deleted from a directory and only new entries
      were added, we typically still get a reduction in the amount of logged
      metadata, because it's very likely that the first leaf that got the new
      dir index entries also has several old dir index entries.
      
      So change the logging logic to not log dir index keys created in past
      transactions and log a range item for every gap it finds between each
      pair of consecutive index keys, to ensure deletions are tracked and
      replayed on log replay.
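      A simplified model of the new rule, under stated assumptions. struct
      index_key and struct gap_range are hypothetical stand-ins for a
      BTRFS_DIR_INDEX_KEY item and a BTRFS_DIR_LOG_INDEX_KEY range item. The
      keys of the changed leaves are given as one sorted array, and leaf
      boundaries are ignored. Only keys created in the current transaction
      are logged, and every gap between consecutive index keys becomes a
      range item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dir index key: index number and creating transaction. */
struct index_key {
	uint64_t offset;
	uint64_t transid;
};

/* Hypothetical range item; both ends inclusive. */
struct gap_range {
	uint64_t start;
	uint64_t end;
};

/*
 * keys[] must be sorted by offset. Offsets of keys created in transaction
 * `cur` go to logged[]. Each gap between two consecutive offsets becomes a
 * range in gaps[], so deletions inside it get replayed. Both output arrays
 * need room for nkeys entries.
 */
static void plan_dir_log(const struct index_key *keys, size_t nkeys,
			 uint64_t cur, uint64_t *logged, size_t *nlogged,
			 struct gap_range *gaps, size_t *ngaps)
{
	*nlogged = 0;
	*ngaps = 0;
	for (size_t i = 0; i < nkeys; i++) {
		if (i > 0) {
			assert(keys[i].offset > keys[i - 1].offset);
			if (keys[i].offset - keys[i - 1].offset > 1) {
				gaps[*ngaps].start = keys[i - 1].offset + 1;
				gaps[*ngaps].end = keys[i].offset - 1;
				(*ngaps)++;
			}
		}
		if (keys[i].transid == cur)
			logged[(*nlogged)++] = keys[i].offset;
	}
}
```

      Take keys at offsets 2, 3 and 5 from transaction 5, plus keys at 6 and
      9 from the current transaction 9. The model logs only offsets 6 and 9,
      plus two ranges, [4, 4] and [7, 8]. The old scheme would have copied
      all five dir index items.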
      
      This patch is part of a patchset comprised of the following patches:
      
       1/4 btrfs: don't log unnecessary boundary keys when logging directory
       2/4 btrfs: put initial index value of a directory in a constant
       3/4 btrfs: stop copying old dir items when logging a directory
       4/4 btrfs: stop trying to log subdirectories created in past transactions
      
      The following test was run on a branch without this patchset and on a
      branch with the first three patches applied:
      
        $ cat test.sh
        #!/bin/bash
      
        DEV=/dev/nvme0n1
        MNT=/mnt/nvme0n1
      
        NUM_FILES=1000000
        NUM_FILE_DELETES=10000
      
        MKFS_OPTIONS="-O no-holes -R free-space-tree"
        MOUNT_OPTIONS="-o ssd"
      
        mkfs.btrfs -f $MKFS_OPTIONS $DEV
        mount $MOUNT_OPTIONS $DEV $MNT
      
        mkdir $MNT/testdir
        for ((i = 1; i <= $NUM_FILES; i++)); do
            echo -n > $MNT/testdir/file_$i
        done
      
        sync
      
        del_inc=$(( $NUM_FILES / $NUM_FILE_DELETES ))
        for ((i = 1; i <= $NUM_FILES; i += $del_inc)); do
            rm -f $MNT/testdir/file_$i
        done
      
        start=$(date +%s%N)
        xfs_io -c "fsync" $MNT/testdir
        end=$(date +%s%N)
      
        dur=$(( (end - start) / 1000000 ))
        echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
        echo
      
        umount $MNT
      
      The test was run on a non-debug kernel (Debian's default kernel config),
      and the results were the following for various values of NUM_FILES and
      NUM_FILE_DELETES:
      
      ** before, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **
      
      dir fsync took 585 ms after deleting 10000 files
      
      ** after, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **
      
      dir fsync took 34 ms after deleting 10000 files   (-94.2%)
      
      ** before, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **
      
      dir fsync took 50 ms after deleting 1000 files
      
      ** after, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **
      
      dir fsync took 7 ms after deleting 1000 files    (-86.0%)
      
      ** before, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **
      
      dir fsync took 9 ms after deleting 100 files
      
      ** after, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **
      
      dir fsync took 5 ms after deleting 100 files     (-44.4%)
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: put initial index value of a directory in a constant · 528ee697
      Filipe Manana authored
      At btrfs_set_inode_index_count() we refer twice to the number 2 as the
      initial index value for a directory (when it's empty), with a proper
      comment explaining the reason for that value. In the next patch I'll
      have to use that magic value in the directory logging code, so put
      the value in a #define in btrfs_inode.h, to avoid hardcoding the
      magic value again in tree-log.c.
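      A small model of the change. The constant name BTRFS_DIR_START_INDEX
      and the helper next_index_count() are assumptions for illustration.
      The value 2 comes from the text above. An empty directory starts at
      the constant; otherwise the next index is one past the last existing
      dir index offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed name; the value 2 is the initial index of an empty directory. */
#define BTRFS_DIR_START_INDEX 2

/*
 * Hypothetical model of computing a directory's next index number.
 * offsets[] holds the existing dir index offsets in ascending order.
 */
static uint64_t next_index_count(const uint64_t *offsets, size_t n)
{
	if (n == 0)
		return BTRFS_DIR_START_INDEX;   /* no hardcoded 2 */
	assert(offsets[n - 1] >= BTRFS_DIR_START_INDEX);
	return offsets[n - 1] + 1;
}
```

      With a named constant, the logging code can refer to the same starting
      value without repeating the literal.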
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: don't log unnecessary boundary keys when logging directory · a450a4af
      Filipe Manana authored
      Before we start to log dir index keys from a leaf, we check if there is a
      previous index key, which is normally at the end of a leaf that was not
      changed in the current transaction. Then we log that key and set the
      start of the logged range (item of type BTRFS_DIR_LOG_INDEX_KEY) to the
      offset of that key. This ensures that deletions of any index keys
      between that key and the first key we are going to log are replayed,
      in case we need to replay the log after a power failure. However, we
      don't really need to log that previous key. We can just set the start
      of the logged range to that key's offset plus 1. This achieves the same
      and avoids logging one dir index key.

      The same logic is performed when we finish logging the index keys of a
      leaf and we find that the next leaf has index keys and was not changed
      in the current transaction. We log the first key of that next leaf and
      use its offset as the end of the logged range. This is just to ensure
      that if there were deleted index keys between the last index key we
      logged and the first key of that next leaf, those index keys are deleted
      if we end up replaying the log. However, that is not necessary. We can
      avoid logging that first index key of the next leaf and instead set the
      end of the logged range to that key's offset minus 1.
      
      So avoid logging those index keys at the boundaries and adjust the start
      and end offsets of the logged ranges as described above.
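      The boundary arithmetic can be checked with a small model. struct
      log_range, boundary_range() and replay_would_delete() are
      hypothetical. The replay rule encoded here is the one described for
      the range item: a key in the subvolume tree that falls inside a logged
      range but is absent from the log tree is deleted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical logged range (BTRFS_DIR_LOG_INDEX_KEY item), inclusive. */
struct log_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Range for the keys logged from a leaf whose neighbouring index keys, at
 * prev_offset and next_offset, live in unchanged leaves and are not logged.
 */
static struct log_range boundary_range(uint64_t prev_offset,
				       uint64_t next_offset)
{
	struct log_range r;

	assert(next_offset > prev_offset);
	r.start = prev_offset + 1;
	r.end = next_offset - 1;
	return r;
}

/* Replay rule: delete a key inside the range that is missing from the log. */
static bool replay_would_delete(struct log_range r, uint64_t offset,
				bool in_log)
{
	return !in_log && offset >= r.start && offset <= r.end;
}
```

      Take boundary keys at offsets 10 and 20. The range becomes [11, 19], so
      both boundary keys survive replay without being logged. An entry at 15
      that was deleted in the current transaction, and is therefore absent
      from the log, is still removed.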
      
      This patch is part of a patchset comprised of the following patches:
      
        1/4 btrfs: don't log unnecessary boundary keys when logging directory
        2/4 btrfs: put initial index value of a directory in a constant
        3/4 btrfs: stop copying old dir items when logging a directory
        4/4 btrfs: stop trying to log subdirectories created in past transactions
      
      Performance test results are listed in the changelog of patch 3/4.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: reuse existing pointers from btrfs_ioctl · dc408ccd
      Sahil Kang authored
      btrfs_ioctl already contains pointers to the inode and btrfs_root
      structs, so we can pass them into the subfunctions instead of the
      toplevel struct file.
      Signed-off-by: Sahil Kang <sahil.kang@asilaycomputing.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove write and wait of struct walk_control · c816d705
      Filipe Manana authored
      The ->write and ->wait fields of struct walk_control, used for log trees,
      have not been used since 2008, more specifically since commit d0c803c4
      ("Btrfs: Record dirty pages tree-log pages in an extent_io tree"). So
      just remove them, along with the function btrfs_write_tree_block(),
      which is also no longer used after removing the ->write member.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 13 Mar, 2022 2 commits
  3. 12 Mar, 2022 8 commits
  4. 11 Mar, 2022 15 commits