1. 01 Oct, 2024 2 commits
    • btrfs: also add stripe entries for NOCOW writes · 97f97822
      Johannes Thumshirn authored
      NOCOW writes do not generate stripe_extent entries in the RAID stripe
      tree. The RAID stripe-tree feature was initially designed with zoned
      filesystems in mind, and zoned filesystems do not allow NOCOW writes.
      But the RAID stripe-tree feature is independent from the zoned feature,
      so we must also add stripe_extent entries for NOCOW writes on RAID
      stripe-tree filesystems.
      
      Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: send: fix buffer overflow detection when copying path to cache entry · 96c6ca71
      Filipe Manana authored
      Starting with commit c0247d28 ("btrfs: send: annotate struct
      name_cache_entry with __counted_by()") we annotated the variable length
      array "name" from the name_cache_entry structure with __counted_by() to
      improve overflow detection. However, that annotation was not correct,
      because the length of that array does not match the "name_len" field: it
      is one byte larger, to include the NUL string terminator. That makes a
      fortified kernel think there's an overflow and report a splat like this:
      
        strcpy: detected buffer overflow: 20 byte write of buffer size 19
        WARNING: CPU: 3 PID: 3310 at __fortify_report+0x45/0x50
        CPU: 3 UID: 0 PID: 3310 Comm: btrfs Not tainted 6.11.0-prnet #1
        Hardware name: CompuLab Ltd.  sbc-ihsw/Intense-PC2 (IPC2), BIOS IPC2_3.330.7 X64 03/15/2018
        RIP: 0010:__fortify_report+0x45/0x50
        Code: 48 8b 34 (...)
        RSP: 0018:ffff97ebc0d6f650 EFLAGS: 00010246
        RAX: 7749924ef60fa600 RBX: ffff8bf5446a521a RCX: 0000000000000027
        RDX: 00000000ffffdfff RSI: ffff97ebc0d6f548 RDI: ffff8bf84e7a1cc8
        RBP: ffff8bf548574080 R08: ffffffffa8c40e10 R09: 0000000000005ffd
        R10: 0000000000000004 R11: ffffffffa8c70e10 R12: ffff8bf551eef400
        R13: 0000000000000000 R14: 0000000000000013 R15: 00000000000003a8
        FS:  00007fae144de8c0(0000) GS:ffff8bf84e780000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fae14691690 CR3: 00000001027a2003 CR4: 00000000001706f0
        Call Trace:
         <TASK>
         ? __warn+0x12a/0x1d0
         ? __fortify_report+0x45/0x50
         ? report_bug+0x154/0x1c0
         ? handle_bug+0x42/0x70
         ? exc_invalid_op+0x1a/0x50
         ? asm_exc_invalid_op+0x1a/0x20
         ? __fortify_report+0x45/0x50
         __fortify_panic+0x9/0x10
         __get_cur_name_and_parent+0x3bc/0x3c0
         get_cur_path+0x207/0x3b0
         send_extent_data+0x709/0x10d0
         ? find_parent_nodes+0x22df/0x25d0
         ? mas_nomem+0x13/0x90
         ? mtree_insert_range+0xa5/0x110
         ? btrfs_lru_cache_store+0x5f/0x1e0
         ? iterate_extent_inodes+0x52d/0x5a0
         process_extent+0xa96/0x11a0
         ? __pfx_lookup_backref_cache+0x10/0x10
         ? __pfx_store_backref_cache+0x10/0x10
         ? __pfx_iterate_backrefs+0x10/0x10
         ? __pfx_check_extent_item+0x10/0x10
         changed_cb+0x6fa/0x930
         ? tree_advance+0x362/0x390
         ? memcmp_extent_buffer+0xd7/0x160
         send_subvol+0xf0a/0x1520
         btrfs_ioctl_send+0x106b/0x11d0
         ? __pfx___clone_root_cmp_sort+0x10/0x10
         _btrfs_ioctl_send+0x1ac/0x240
         btrfs_ioctl+0x75b/0x850
         __se_sys_ioctl+0xca/0x150
         do_syscall_64+0x85/0x160
         ? __count_memcg_events+0x69/0x100
         ? handle_mm_fault+0x1327/0x15c0
         ? __se_sys_rt_sigprocmask+0xf1/0x180
         ? syscall_exit_to_user_mode+0x75/0xa0
         ? do_syscall_64+0x91/0x160
         ? do_user_addr_fault+0x21d/0x630
         entry_SYSCALL_64_after_hwframe+0x76/0x7e
        RIP: 0033:0x7fae145eeb4f
        Code: 00 48 89 (...)
        RSP: 002b:00007ffdf1cb09b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fae145eeb4f
        RDX: 00007ffdf1cb0ad0 RSI: 0000000040489426 RDI: 0000000000000004
        RBP: 00000000000078fe R08: 00007fae144006c0 R09: 00007ffdf1cb0927
        R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffdf1cb1ce8
        R13: 0000000000000003 R14: 000055c499fab2e0 R15: 0000000000000004
         </TASK>
      
      Fix this by not storing the NUL string terminator, since we don't actually
      need it for name cache entries; this way "name_len" corresponds to the
      actual size of the "name" array. This requires marking the "name" array
      field with __nonstring and using memcpy() instead of strcpy(), as
      recommended by the guidelines at:
      
         https://github.com/KSPP/linux/issues/90
      
      Reported-by: David Arendt <admin@prnet.org>
      Link: https://lore.kernel.org/linux-btrfs/cee4591a-3088-49ba-99b8-d86b4242b8bd@prnet.org/
      Fixes: c0247d28 ("btrfs: send: annotate struct name_cache_entry with __counted_by()")
      CC: stable@vger.kernel.org # 6.11
      Tested-by: David Arendt <admin@prnet.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
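      
      A minimal userspace sketch of the fixed layout follows. The struct and
      helper names (name_cache_entry_sketch, nce_alloc(), nce_name_equals()) are
      hypothetical. COUNTED_BY() and NONSTRING are empty stand-ins for the
      kernel's __counted_by() and __nonstring annotations. So this shows the
      shape of the change rather than the actual fs/btrfs/send.c code: the array
      holds exactly name_len bytes, it is filled with memcpy(), and nothing
      relies on a terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Empty stand-ins for the kernel's __counted_by() and __nonstring
 * annotations (hypothetical names; they have no effect in this sketch).
 */
#define COUNTED_BY(member)
#define NONSTRING

/* Simplified, hypothetical name cache entry: name[] holds name_len bytes. */
struct name_cache_entry_sketch {
	int name_len;
	char name[] COUNTED_BY(name_len) NONSTRING;
};

/* Allocate an entry holding exactly len bytes of name, with no NUL. */
struct name_cache_entry_sketch *nce_alloc(const char *name, int len)
{
	struct name_cache_entry_sketch *nce;

	assert(len >= 0);
	nce = malloc(sizeof(*nce) + len);
	if (!nce)
		return NULL;
	nce->name_len = len;
	/*
	 * memcpy() writes exactly name_len bytes, matching the declared size.
	 * strcpy() would also write the terminating NUL (name_len + 1 bytes),
	 * which is the overflow a fortified kernel reports.
	 */
	memcpy(nce->name, name, len);
	return nce;
}

/* Compare without relying on a NUL terminator in name[]. */
int nce_name_equals(const struct name_cache_entry_sketch *nce,
		    const char *s, int len)
{
	return nce->name_len == len && memcmp(nce->name, s, len) == 0;
}
```

      Because name[] is no longer NUL-terminated, every consumer has to use
      name_len, for example printf("%.*s", nce->name_len, nce->name) rather
      than "%s".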
  2. 17 Sep, 2024 3 commits
    • btrfs: fix use-after-free on rbtree that tracks inodes for auto defrag · 7f1b63f9
      Filipe Manana authored
      When cleaning up defrag inodes at btrfs_cleanup_defrag_inodes(), called
      during remount and unmount, we are freeing every node from the rbtree
      that tracks inodes for auto defrag using
      rbtree_postorder_for_each_entry_safe(), which doesn't modify the tree
      itself. So once we unlock the lock that protects the rbtree, we have a
      tree pointing to a root that was freed (and a root pointing to freed
      nodes, and their children pointing to other freed nodes, and so on).
      This makes further access to the tree result in a use-after-free with
      unpredictable results.
      
      Fix this by initializing the rbtree to an empty root after the call to
      rbtree_postorder_for_each_entry_safe() and before unlocking.
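      
      The same pattern can be reproduced in a userspace sketch. A plain binary
      search tree stands in for the kernel rbtree, and the struct node,
      tracker_insert(), tree_count() and tracker_cleanup() names are
      hypothetical. Freeing in post-order releases every node but leaves the
      root pointer untouched, so the fix resets it to an empty tree before the
      lock would be dropped. In kernel terms that roughly corresponds to
      assigning RB_ROOT to the struct rb_root.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain BST node standing in for an rb_node embedded in a defrag record. */
struct node {
	unsigned long ino;
	struct node *left, *right;
};

/* Insert ino; returns 1 if added, 0 if already present, -1 on ENOMEM. */
int tracker_insert(struct node **root, unsigned long ino)
{
	struct node **p = root;
	struct node *n;

	while (*p) {
		if (ino < (*p)->ino)
			p = &(*p)->left;
		else if (ino > (*p)->ino)
			p = &(*p)->right;
		else
			return 0;
	}
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->ino = ino;
	*p = n;
	return 1;
}

size_t tree_count(const struct node *n)
{
	return n ? 1 + tree_count(n->left) + tree_count(n->right) : 0;
}

/* Children first, then the parent; no tree links are rewritten. */
static void free_postorder(struct node *n)
{
	if (!n)
		return;
	free_postorder(n->left);
	free_postorder(n->right);
	free(n);
}

/* The caller is assumed to hold the lock that protects *root. */
void tracker_cleanup(struct node **root)
{
	assert(root != NULL);
	free_postorder(*root);
	/*
	 * Without this reset, *root still points at freed memory and any
	 * later lookup or insert after unlocking is a use-after-free.
	 */
	*root = NULL;
}
```

      After cleanup the tree is genuinely empty, so a later remount can insert
      new records into it safely.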
      
      Fixes: 27694091 ("btrfs: clear defragmented inodes using postorder in btrfs_cleanup_defrag_inodes()")
      Reported-by: syzbot+ad7966ca1f5dd8b001b3@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/linux-btrfs/000000000000f9aad406223eabff@google.com/
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: fix the wrong output of data backref objectid · b0b595e6
      Qu Wenruo authored
      [BUG]
      There have been reports about invalid data backref objectids; a report
      looks like this:
      
        BTRFS critical (device sda): corrupt leaf: block=333654787489792 slot=110 extent bytenr=333413935558656 len=65536 invalid data ref objectid value 2543
      
      The data ref objectid is the inode number inside the subvolume.
      
      But in the above case the value is completely sane and does not show the
      problem.
      
      [CAUSE]
      The root cause of the problem is the deprecated feature, inode cache.
      
      This feature results in a special inode number, -12ULL, which is no longer
      recognized by tree-checker, triggering the error.
      
      The direct problem here is the output of data ref objectid. The value
      shown is in fact the dref_root (subvolume id), not the dref_objectid
      (inode number).
      
      [FIX]
      Fix the output to use dref_objectid instead.
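      
      A hedged sketch of the check and its message follows. It assumes (our
      reading, not stated in the commit) that the objectid is accepted only
      between BTRFS_FIRST_FREE_OBJECTID and BTRFS_LAST_FREE_OBJECTID. The
      constant values are given as we understand them from the btrfs headers,
      and check_dref_objectid() is a hypothetical helper. With the inode
      cache's -12ULL the check fails, and the reported value is that objectid
      rather than the subvolume id (dref_root) that the buggy message printed.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values as we understand them from the btrfs on-disk format headers. */
#define BTRFS_FIRST_FREE_OBJECTID 256ULL
#define BTRFS_LAST_FREE_OBJECTID  (-256ULL)
#define BTRFS_FREE_INO_OBJECTID   (-12ULL) /* deprecated inode cache */

/*
 * Hypothetical range check on a data backref objectid (an inode number).
 * On failure, *reported is set to the value included in the error message.
 */
int check_dref_objectid(uint64_t dref_objectid, uint64_t *reported)
{
	assert(reported != NULL);
	if (dref_objectid < BTRFS_FIRST_FREE_OBJECTID ||
	    dref_objectid > BTRFS_LAST_FREE_OBJECTID) {
		/*
		 * Report the value that actually failed the check. The bug
		 * printed dref_root (the subvolume id) here instead.
		 */
		*reported = dref_objectid;
		fprintf(stderr, "invalid data ref objectid value %" PRIu64 "\n",
			dref_objectid);
		return -1;
	}
	return 0;
}
```

      A normal inode number such as 257 passes. -12ULL is above
      BTRFS_LAST_FREE_OBJECTID, so it is rejected and now shows up in the
      message as itself.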
      
      Reported-by: Neil Parton <njparton@gmail.com>
      Reported-by: Archange <archange@archlinux.org>
      Link: https://lore.kernel.org/linux-btrfs/CAAYHqBbrrgmh6UmW3ANbysJX9qG9Pbg3ZwnKsV=5mOpv_qix_Q@mail.gmail.com/
      Link: https://lore.kernel.org/linux-btrfs/9541deea-9056-406e-be16-a996b549614d@archlinux.org/
      Fixes: f333a3c7 ("btrfs: tree-checker: validate dref root and objectid")
      CC: stable@vger.kernel.org # 6.11
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix race setting file private on concurrent lseek using same fd · 7ee85f55
      Filipe Manana authored
      When doing concurrent lseek(2) system calls against the same file
      descriptor, using multiple threads belonging to the same process, we have
      a short time window where a race happens and can result in a memory leak.
      
      The race happens like this:
      
      1) A program opens a file descriptor for a file and then spawns two
         threads (with the pthreads library for example), let's call them
         task A and task B;
      
      2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at
         file.c:find_desired_extent() while holding a read lock on the inode;
      
      3) At the start of find_desired_extent(), it extracts the file's
         private_data pointer into a local variable named 'private', which has
         a value of NULL;
      
      4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode
         in shared mode and enters file.c:find_desired_extent(), where it also
         extracts file->private_data into its local variable 'private', which
         has a NULL value;
      
      5) Because it saw a NULL file private, task A allocates a private
         structure and assigns it to the file structure;
      
      6) Task B also saw a NULL file private so it also allocates its own file
         private and then assigns it to the same file structure, since both
         tasks are using the same file descriptor.
      
         At this point we leak the private structure allocated by task A.
      
      Besides the memory leak, there's also the detail that both tasks end up
      using the same cached state record in the private structure (struct
      btrfs_file_private::llseek_cached_state), which can result in a
      use-after-free problem since one task can free it while the other is
      still using it (only one task took a reference count on it). Also, sharing
      the cached state is not a good idea since it could result in incorrect
      results in the future - right now it should not be a problem because it
      ends up being used only in extent-io-tree.c:count_range_bits(), where we do
      range validation before using the cached state.
      
      Fix this by protecting the private assignment and check of a file while
      holding the inode's spinlock and keep track of the task that allocated
      the private, so that it's used only by that task in order to prevent
      use-after-free issues with the cached state record, as well as potentially
      using it incorrectly in the future.
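      
      The fix pattern can be sketched in userspace C11. struct file_sketch,
      struct file_private, get_lseek_private() and the integer task id are
      hypothetical stand-ins for struct file, struct btrfs_file_private and
      current, and a small atomic_flag spinlock replaces the inode's spinlock.
      The private is read and installed only under the lock. A task that loses
      the installation race frees its own allocation instead of leaking it. A
      task that did not allocate the installed private gets NULL, meaning it
      should proceed without a cached state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Minimal spinlock standing in for the inode's spinlock. */
struct spinlock {
	atomic_flag flag;
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		; /* busy-wait until the holder releases it */
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical stand-in for struct btrfs_file_private. */
struct file_private {
	int owner_task;     /* task that allocated this private */
	void *cached_state; /* per-owner cached state, unused in this sketch */
};

/* Hypothetical stand-in for the relevant parts of struct file + inode. */
struct file_sketch {
	struct spinlock lock;
	struct file_private *private_data;
};

void file_sketch_init(struct file_sketch *f)
{
	atomic_flag_clear(&f->lock.flag);
	f->private_data = NULL;
}

void file_sketch_release(struct file_sketch *f)
{
	assert(f != NULL);
	free(f->private_data);
	f->private_data = NULL;
}

/*
 * Return the private this task may use for its cached state, or NULL if it
 * must run without one (another task owns the installed private, or the
 * allocation failed).
 */
struct file_private *get_lseek_private(struct file_sketch *f, int task)
{
	struct file_private *priv, *new_priv;

	assert(f != NULL);
	spin_lock(&f->lock);
	priv = f->private_data;
	spin_unlock(&f->lock);

	if (!priv) {
		new_priv = calloc(1, sizeof(*new_priv));
		if (!new_priv)
			return NULL;
		new_priv->owner_task = task;

		spin_lock(&f->lock);
		if (!f->private_data) {
			f->private_data = new_priv;
			new_priv = NULL; /* now owned by the file */
		}
		priv = f->private_data;
		spin_unlock(&f->lock);

		/* Lost the race: free our copy instead of leaking it. */
		free(new_priv);
	}

	/* Only the allocating task may use the cached state. */
	return priv->owner_task == task ? priv : NULL;
}
```

      Because the check-and-install happens under the lock, two racing tasks
      may both allocate, but only one allocation is installed and the other is
      freed. That closes the leak from step 6, and the owner check keeps the
      cached state private to one task.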
      
      Fixes: 3c32c721 ("btrfs: use cached state when looking for delalloc ranges with lseek")
      CC: stable@vger.kernel.org # 6.6+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 10 Sep, 2024 35 commits