    btrfs: use a read/write lock for protecting the block groups tree · 16b0c258
    Filipe Manana authored
    Currently we use a spin lock to protect the red black tree that we use to
    track block groups. Most accesses to that tree are read only, and for
    large filesystems, with thousands of block groups, the spin lock has a
    bad impact on performance, as concurrent read only searches on the tree
    are serialized.
    
    Read only searches on the tree are very frequent and done when:
    
    1) Pinning and unpinning extents, as we need to look up the respective
       block group from the tree;
    
    2) Freeing the last reference of a tree block, regardless of whether we
       pin the underlying extent or add it back to the free space cache/tree;
    
    3) During NOCOW writes, both buffered IO and direct IO, we need to check
       if the block group that contains an extent is read only or not and to
       increment the number of NOCOW writers in the block group. For those
       operations we need to search for the block group in the tree.
       Similarly, after creating the ordered extent for the NOCOW write, we
       need to decrement the number of NOCOW writers from the same block
       group, which requires searching for it in the tree;
    
    4) Decreasing the number of extent reservations in a block group;
    
    5) When allocating extents and freeing reserved extents;
    
    6) Adding and removing free space to the free space tree;
    
    7) When releasing delalloc bytes during ordered extent completion;
    
    8) When relocating a block group;
    
    9) During fitrim, to iterate over the block groups;
    
    10) etc;
    
    Write accesses to the tree, to add or remove block groups, are much less
    frequent as they happen only when allocating a new block group or when
    deleting a block group.
    
    We also use the same spin lock to protect the list of currently caching
    block groups. Additions to this list are made when we need to cache a
    block group, because we don't have a free space cache for it (or we have
    one but it's invalid), and removals from this list are done when caching
    of the block group's free space finishes. These cases are also not very
    common, and when they do happen, they happen only once when the
    filesystem is mounted.
    
    So switch the lock that protects the tree of block groups from a spin
    lock to a read/write lock.
    
    Reviewed-by: Nikolay Borisov <nborisov@suse.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>