    btrfs: fix race setting file private on concurrent lseek using same fd · 7ee85f55
    Filipe Manana authored
    When doing concurrent lseek(2) system calls against the same file
    descriptor, using multiple threads belonging to the same process, we have
    a short time window where a race happens and can result in a memory leak.
    
    The race happens like this:
    
    1) A program opens a file descriptor for a file and then spawns two
       threads (with the pthreads library for example), let's call them
       task A and task B;
    
    2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at
       file.c:find_desired_extent() while holding a read lock on the inode;
    
    3) At the start of find_desired_extent(), it extracts the file's
       private_data pointer into a local variable named 'private', which has
       a value of NULL;
    
    4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode
       in shared mode and enters file.c:find_desired_extent(), where it also
       extracts file->private_data into its local variable 'private', which
       has a NULL value;
    
    5) Because it saw a NULL file private, task A allocates a private
       structure and assigns it to the file structure;
    
    6) Task B also saw a NULL file private so it also allocates its own file
       private and then assigns it to the same file structure, since both
       tasks are using the same file descriptor.
    
       At this point we leak the private structure allocated by task A.
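
    The interleaving above can be replayed in user space. The following is
    only a sketch: the struct and function names are hypothetical stand-ins,
    not the btrfs definitions, and the two tasks are simulated on a single
    thread in a fixed order so that the outcome is deterministic.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, not the btrfs ones. */
struct file_private { int unused; };
struct file { struct file_private *private_data; };

/*
 * Replays steps 3) to 6) in a fixed order on a single thread.
 * Returns the number of private structures that became unreachable
 * through the file, which is what the kernel would leak.
 */
static int simulate_unlocked_race(void)
{
	struct file f = { .private_data = NULL };
	struct file_private *seen_by_a, *seen_by_b;
	struct file_private *alloc_a = NULL, *alloc_b = NULL;
	int leaked = 0;

	seen_by_a = f.private_data;	/* step 3: task A reads NULL */
	seen_by_b = f.private_data;	/* step 4: task B reads NULL */

	if (!seen_by_a) {		/* step 5: task A allocates and assigns */
		alloc_a = calloc(1, sizeof(*alloc_a));
		f.private_data = alloc_a;
	}
	if (!seen_by_b) {		/* step 6: task B overwrites the pointer */
		alloc_b = calloc(1, sizeof(*alloc_b));
		f.private_data = alloc_b;
	}

	/* Task A's structure is no longer referenced by the file. */
	if (alloc_a && f.private_data != alloc_a)
		leaked++;

	/* The simulation still holds both pointers, so it can clean up. */
	free(alloc_a);
	free(alloc_b);
	return leaked;
}
```

    In the kernel nothing keeps the pointer to task A's allocation after
    step 6, which is why the structure is lost there.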
    
    Besides the memory leak, there's also the detail that both tasks end up
    using the same cached state record in the private structure (struct
    btrfs_file_private::llseek_cached_state), which can result in a
    use-after-free problem since one task can free it while the other is
    still using it (only one task took a reference count on it). Also, sharing
    the cached state is not a good idea since it could lead to incorrect
    results in the future. Right now it should not be a problem because it
    ends up being used only in extent-io-tree.c:count_range_bits(), where we
    do range validation before using the cached state.
    
    Fix this by doing both the check and the assignment of the file's private
    while holding the inode's spinlock, and by keeping track of the task that
    allocated the private, so that it's used only by that task. This prevents
    use-after-free issues with the cached state record as well as potentially
    using it incorrectly in the future.
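
    A user-space sketch of that pattern follows. It is not the code from
    this patch: a pthread mutex stands in for the inode's spinlock, a
    pthread_t stands in for the owning task, and the names struct file,
    struct file_private and get_private_for_caller() are assumptions made
    for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, not the btrfs ones. */
struct file_private {
	pthread_t owner;	/* thread that allocated this structure */
};

struct file {
	pthread_mutex_t lock;	/* stands in for the inode's spinlock */
	struct file_private *private_data;
};

/*
 * Returns the private structure the calling thread is allowed to use, or
 * NULL if the existing one belongs to another thread, if the caller lost
 * the race to install its own, or if the allocation failed.
 */
static struct file_private *get_private_for_caller(struct file *f)
{
	struct file_private *private, *new_private;

	/* Read the pointer under the lock. */
	pthread_mutex_lock(&f->lock);
	private = f->private_data;
	pthread_mutex_unlock(&f->lock);

	/* Only the thread that allocated the structure may use it. */
	if (private)
		return pthread_equal(private->owner, pthread_self()) ?
		       private : NULL;

	/* Allocate outside the lock, then recheck before publishing. */
	new_private = calloc(1, sizeof(*new_private));
	if (!new_private)
		return NULL;
	new_private->owner = pthread_self();

	pthread_mutex_lock(&f->lock);
	if (!f->private_data) {
		f->private_data = new_private;
		private = new_private;
	}
	pthread_mutex_unlock(&f->lock);

	/* Another thread installed its structure first: drop ours. */
	if (!private)
		free(new_private);
	return private;
}
```

    Rechecking the pointer under the lock before assigning is what closes
    the window: the thread that loses frees its own allocation instead of
    overwriting the winner's, so nothing is leaked.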
    
    Fixes: 3c32c721 ("btrfs: use cached state when looking for delalloc ranges with lseek")
    CC: stable@vger.kernel.org # 6.6+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
file.c 107 KB