1. 08 Feb, 2021 27 commits
    • btrfs: rework the order of btrfs_ordered_extent::flags · 3c198fe0
      Qu Wenruo authored
      [BUG]
      There is a long-standing bug in the last parameter of
      btrfs_add_ordered_extent(), dating back to commit 771ed689 ("Btrfs:
      Optimize compressed writeback and reads") from 2008.
      
      In that ancient commit, btrfs_add_ordered_extent() expects the @type
      parameter to be one of the following:
      
      - BTRFS_ORDERED_REGULAR
      - BTRFS_ORDERED_NOCOW
      - BTRFS_ORDERED_PREALLOC
      - BTRFS_ORDERED_COMPRESSED
      
      But we pass 0 in cow_file_range(), which means BTRFS_ORDERED_IO_DONE.
      
      Ironically, an extra check in __btrfs_add_ordered_extent() skips setting
      the bit if (type == IO_DONE || type == IO_COMPLETE), which avoids any
      obvious bug.
      
      But this still leaves regular COW ordered extents with no bit to
      indicate their type in various trace events, rendering the REGULAR bit
      useless.
      
      [FIX]
      Change the following aspects to avoid such problems:
      
      - Reorder btrfs_ordered_extent::flags
        Now the type bits go first (REGULAR/NOCOW/PREALLOC/COMPRESSED), then
        the DIRECT bit, and finally extra status bits like IO_DONE/COMPLETE/IOERR.
      
      - Add extra ASSERT() for btrfs_add_ordered_extent_*()
      
      - Remove @type parameter for btrfs_add_ordered_extent_compress()
        As the only valid @type here is BTRFS_ORDERED_COMPRESSED.
      
      - Remove the unnecessary special check for IO_DONE/COMPLETE in
        __btrfs_add_ordered_extent()
        That check only existed to make the code work; with the extra
        ASSERT()s, only a limited set of values can be passed in.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove redundant NULL check before kvfree · fe3b7bb0
      Yang Li authored
      Fix the following warning reported by coccicheck:
      ./fs/btrfs/raid56.c:237:2-8: WARNING: NULL check before some freeing
      functions is not needed.
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Yang Li <abaci-bugfix@linux.alibaba.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node · 7e2a870a
      Josef Bacik authored
      Zygo reported the following panic when testing my error handling patches
      for relocation:
      
        kernel BUG at fs/btrfs/backref.c:2545!
        invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 3 PID: 8472 Comm: btrfs Tainted: G        W 14
        Hardware name: QEMU Standard PC (i440FX + PIIX,
      
        Call Trace:
         btrfs_backref_error_cleanup+0x4df/0x530
         build_backref_tree+0x1a5/0x700
         ? _raw_spin_unlock+0x22/0x30
         ? release_extent_buffer+0x225/0x280
         ? free_extent_buffer.part.52+0xd7/0x140
         relocate_tree_blocks+0x2a6/0xb60
         ? kasan_unpoison_shadow+0x35/0x50
         ? do_relocation+0xc10/0xc10
         ? kasan_kmalloc+0x9/0x10
         ? kmem_cache_alloc_trace+0x6a3/0xcb0
         ? free_extent_buffer.part.52+0xd7/0x140
         ? rb_insert_color+0x342/0x360
         ? add_tree_block.isra.36+0x236/0x2b0
         relocate_block_group+0x2eb/0x780
         ? merge_reloc_roots+0x470/0x470
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x18f0
         ? pvclock_clocksource_read+0xeb/0x190
         ? btrfs_relocate_chunk+0x120/0x120
         ? lock_contended+0x620/0x6e0
         ? do_raw_spin_lock+0x1e0/0x1e0
         ? do_raw_spin_unlock+0xa8/0x140
         btrfs_ioctl_balance+0x1f9/0x460
         btrfs_ioctl+0x24c8/0x4380
         ? __kasan_check_read+0x11/0x20
         ? check_chain_key+0x1f4/0x2f0
         ? __asan_loadN+0xf/0x20
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? kvm_sched_clock_read+0x18/0x30
         ? check_chain_key+0x1f4/0x2f0
         ? lock_downgrade+0x3f0/0x3f0
         ? handle_mm_fault+0xad6/0x2150
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags+0x26/0x30
         ? lock_is_held_type+0xc3/0xf0
         ? syscall_enter_from_user_mode+0x1b/0x60
         ? do_syscall_64+0x13/0x80
         ? rcu_read_lock_sched_held+0xa1/0xd0
         ? __kasan_check_read+0x11/0x20
         ? __fget_light+0xae/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurs because of this check:
      
        if (RB_EMPTY_NODE(&upper->rb_node))
      	  BUG_ON(!list_empty(&node->upper));
      
      As we are dropping the backref node, if we discover that the upper node
      in the edge we just cleaned up isn't linked into the cache, we assume
      that we are now done with this node, hence the BUG_ON().
      
      However this is an erroneous assumption, as we will look up all the
      references for a node first, and then process the pending edges.  All of
      the 'upper' nodes in our pending edges won't be in the cache's rb_tree
      yet, because they haven't been processed.  We could very well have many
      edges still left to clean up on this node.
      
      The fact is we simply do not need this check; we can just process all
      of the edges of this node, because below this check we do the
      following:
      
        if (list_empty(&upper->lower)) {
      	  list_add_tail(&upper->lower, &cache->leaves);
      	  upper->lowest = 1;
        }
      
      If the upper node truly isn't used yet, then we add it to the
      cache->leaves list to be cleaned up later.  If it is still used, then
      the last child node that has it linked will add it to the leaves list,
      and then it will be cleaned up.
      
      Fix this problem by dropping this logic altogether.  With this fix I no
      longer see the panic when testing with error injection in the backref
      code.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: keep track of the root owner for relocation reads · f7ba2d37
      Josef Bacik authored
      While testing the error paths in relocation, I hit the following lockdep
      splat:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.10.0-rc3+ #206 Not tainted
        ------------------------------------------------------
        btrfs-balance/1571 is trying to acquire lock:
        ffff8cdbcc8f77d0 (&head_ref->mutex){+.+.}-{3:3}, at: btrfs_lookup_extent_info+0x156/0x3b0
      
        but task is already holding lock:
        ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (btrfs-tree-00){++++}-{3:3}:
      	 down_write_nested+0x43/0x80
      	 __btrfs_tree_lock+0x27/0x100
      	 btrfs_search_slot+0x248/0x890
      	 relocate_tree_blocks+0x490/0x650
      	 relocate_block_group+0x1ba/0x5d0
      	 kretprobe_trampoline+0x0/0x50
      
        -> #1 (btrfs-csum-01){++++}-{3:3}:
      	 down_read_nested+0x43/0x130
      	 __btrfs_tree_read_lock+0x27/0x100
      	 btrfs_read_lock_root_node+0x31/0x40
      	 btrfs_search_slot+0x5ab/0x890
      	 btrfs_del_csums+0x10b/0x3c0
      	 __btrfs_free_extent+0x49d/0x8e0
      	 __btrfs_run_delayed_refs+0x283/0x11f0
      	 btrfs_run_delayed_refs+0x86/0x220
      	 btrfs_start_dirty_block_groups+0x2ba/0x520
      	 kretprobe_trampoline+0x0/0x50
      
        -> #0 (&head_ref->mutex){+.+.}-{3:3}:
      	 __lock_acquire+0x1167/0x2150
      	 lock_acquire+0x116/0x3e0
      	 __mutex_lock+0x7e/0x7b0
      	 btrfs_lookup_extent_info+0x156/0x3b0
      	 walk_down_proc+0x1c3/0x280
      	 walk_down_tree+0x64/0xe0
      	 btrfs_drop_subtree+0x182/0x260
      	 do_relocation+0x52e/0x660
      	 relocate_tree_blocks+0x2ae/0x650
      	 relocate_block_group+0x1ba/0x5d0
      	 kretprobe_trampoline+0x0/0x50
      
        other info that might help us debug this:
      
        Chain exists of:
          &head_ref->mutex --> btrfs-csum-01 --> btrfs-tree-00
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-tree-00);
      				 lock(btrfs-csum-01);
      				 lock(btrfs-tree-00);
          lock(&head_ref->mutex);
      
         *** DEADLOCK ***
      
        5 locks held by btrfs-balance/1571:
         #0: ffff8cdb89749ff8 (&fs_info->delete_unused_bgs_mutex){+.+.}-{3:3}, at: btrfs_balance+0x563/0xf40
         #1: ffff8cdb89748838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x156/0x300
         #2: ffff8cdbc2c16650 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x413/0x5c0
         #3: ffff8cdbc135f538 (btrfs-treloc-01){+.+.}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
         #4: ffff8cdbc54adbf8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x100
      
        stack backtrace:
        CPU: 1 PID: 1571 Comm: btrfs-balance Not tainted 5.10.0-rc3+ #206
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
         dump_stack+0x8b/0xb0
         check_noncircular+0xcf/0xf0
         ? trace_call_bpf+0x139/0x260
         __lock_acquire+0x1167/0x2150
         lock_acquire+0x116/0x3e0
         ? btrfs_lookup_extent_info+0x156/0x3b0
         __mutex_lock+0x7e/0x7b0
         ? btrfs_lookup_extent_info+0x156/0x3b0
         ? btrfs_lookup_extent_info+0x156/0x3b0
         ? release_extent_buffer+0x124/0x170
         ? _raw_spin_unlock+0x1f/0x30
         ? release_extent_buffer+0x124/0x170
         btrfs_lookup_extent_info+0x156/0x3b0
         walk_down_proc+0x1c3/0x280
         walk_down_tree+0x64/0xe0
         btrfs_drop_subtree+0x182/0x260
         do_relocation+0x52e/0x660
         relocate_tree_blocks+0x2ae/0x650
         ? add_tree_block+0x149/0x1b0
         relocate_block_group+0x1ba/0x5d0
         elfcorehdr_read+0x40/0x40
         ? elfcorehdr_read+0x40/0x40
         ? btrfs_balance+0x796/0xf40
         ? __kthread_parkme+0x66/0x90
         ? btrfs_balance+0xf40/0xf40
         ? balance_kthread+0x37/0x50
         ? kthread+0x137/0x150
         ? __kthread_bind_mask+0x60/0x60
         ? ret_from_fork+0x1f/0x30
      
      As you can see, this is bogus: we never take another tree's lock under
      the csum lock.  This happens because during relocation we sometimes have
      to read tree blocks from disk without knowing which root they belong
      to.  We defaulted to an owner of 0, which translates to an fs tree.
      This is fine as all fs trees share the same lockdep class, but it
      obviously isn't fine if the block belongs to a COW-only tree.
      
      Thankfully, COW-only trees only have their owner's root as a reference
      to them, and since we already look up the extent information during
      relocation, go ahead and check whether this block might belong to a
      COW-only tree, and if so, save the owner in the tree_block struct.  This
      allows us to call read_tree_block() with the proper owner, which gets rid
      of this lockdep splat.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce helper to grab an existing extent buffer from a page · c0f0a9e7
      Qu Wenruo authored
      This patch extracts the code that grabs an extent buffer from a page
      into a helper, grab_extent_buffer_from_page().
      
      This removes one level of indentation and provides a place for later
      expansion for subpage support.
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: update comment for btrfs_dirty_pages · c0fab480
      Qu Wenruo authored
      The original comment dates from the initial merge and has several
      problems:
      
      - There is no hole check any more
      - No inline decision is made
      
      Update the out-of-date comment with a more accurate one.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: refactor __extent_writepage_io() to improve readability · 6bc5636a
      Qu Wenruo authored
      The refactoring involves the following modifications:
      
      - iosize alignment
        In fact we don't need to do the alignment manually at all.
        All extent maps should already be aligned, so a basic ASSERT() check
        is enough.
      
      - redundant variables
        We have extra variables like blocksize/pg_offset/end.
        They are all unnecessary.
      
        @blocksize can be replaced by sectorsize directly, and it's only
        used to verify that the em start/size is aligned.
      
        @pg_offset can be easily calculated using @cur and page_offset(page).
      
        @end is just assigned from @page_end and never modified; use
        "start + PAGE_SIZE - 1" directly and remove @page_end.
      
      - remove some BUG_ON()s
        The BUG_ON()s are for extent maps, which are already verified by the
        tree-checker (for on-disk extent data items) and by runtime checks.
        ASSERT() should be enough.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename parameter offset to disk_bytenr in submit_extent_page · 0c64c33c
      Qu Wenruo authored
      The parameter offset is confusing; it's supposed to be the disk bytenr
      of metadata/data.  Rename it to disk_bytenr and update the comment.
      
      Also rename each offset passed to submit_extent_page() to @disk_bytenr
      so they're consistent.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: refactor btrfs_dec_test_* functions for ordered extents · 58f74b22
      Qu Wenruo authored
      The refactoring involves the following modifications:
      
      - Return bool instead of int
      
      - Parameter update for @cached of btrfs_dec_test_first_ordered_pending()
        For btrfs_dec_test_first_ordered_pending(), @cached is only used to
        return the finished ordered extent.
        Rename it to @finished_ret.
      
      - Comment updates
      
        * Change one stale comment
          which still refers to btrfs_dec_test_ordered_pending(), but the
          context is calling btrfs_dec_test_first_ordered_pending().
        * Follow the common comment style for both functions
          Add more detailed descriptions for parameters and the return value
        * Move the reason why test_and_set_bit() is used into the call sites
      
      - Change how the return value is calculated
        The most counter-intuitive part of the return value is:
      
          if (...)
      	ret = 1;
          ...
          return ret == 0;
      
        This means that when we set ret to 1, the function returns 0.
        Change the local variable name to @finished, and directly return its
        value.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make btrfs_dio_private::bytes u32 · 523929f1
      Qu Wenruo authored
      btrfs_dio_private::bytes is only assigned from bio::bi_iter::bi_size,
      which is an unsigned int and thus never larger than U32_MAX.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove always true condition in btrfs_start_delalloc_roots · d7830b71
      Nikolay Borisov authored
      Following the rework in e076ab2a ("btrfs: shrink delalloc pages
      instead of full inodes"), the nr variable is no longer passed by
      reference to start_delalloc_inodes(), hence it cannot change.
      Additionally, it is always guaranteed to be a positive number, so it's
      redundant to have it as a loop condition. Simply remove that usage.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make btrfs_start_delalloc_root's nr argument a long · 9db4dc24
      Nikolay Borisov authored
      It's currently a u64, which gets immediately translated either to
      LONG_MAX (if U64_MAX is passed) or cast to an unsigned long (which is
      in fact wrong, because writeback_control::nr_to_write is a signed
      long).
      
      Just convert the function's argument to be of type long, which obviates
      the need to manually convert a u64 value to a long. Adjust all call
      sites which pass U64_MAX to pass LONG_MAX. Finally, ensure that in
      shrink_delalloc() the u64 is converted to a long without overflowing
      into a negative number.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: send: remove stale code when checking for shared extents · 9c4a062a
      Filipe Manana authored
      After commit 040ee612 ("Btrfs: send, improve clone range") we no longer
      use the data_offset field of struct backref_ctx, as after that we
      do all the necessary checks for the data offset of file extent items in
      clone_range(). Since there are no more users of data_offset from that
      structure, remove it.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: consolidate btrfs_previous_item ret val handling in btrfs_shrink_device · 7056bf69
      Nikolay Borisov authored
      Instead of having three 'if' statements to handle a non-zero return
      value, consolidate them into one 'if (ret)'. That way the code is more
      obvious:
      
       - Always drop delete_unused_bgs_mutex if ret is not 0
       - If ret is negative -> goto done
       - If it's 1 -> reset ret to 0, release the path and finish the loop.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: ref-verify: make sure owner is set for all refs · 1478143a
      Josef Bacik authored
      I noticed that shared ref entries in ref-verify didn't have the proper
      owner set, which caused me to think there was something seriously wrong.
      However, the problem is that if we have a parent, we simply weren't
      filling out the owner part of the reference, even though we have it.
      
      Fix this by making sure we set all the proper fields when we modify a
      reference, this way we'll have the proper owner if a problem happens and
      we don't waste time thinking we're updating the wrong level.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: ref-verify: pass down tree block level when building refs · 0d73a11c
      Josef Bacik authored
      I noticed that ref-verify would sometimes print the wrong level while I
      was testing some error injection related problems.  This is
      because we only get the level from the main extent item, but our
      references could go off the current leaf into another, and at that point
      we lose our level.
      
      Fix this by keeping track of the last tree block level that we found,
      the same way we keep track of our bytenr and num_bytes, in case we
      happen to wander into another leaf while still processing the references
      for a bytenr.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: noinline btrfs_should_cancel_balance · 1fec12a5
      Josef Bacik authored
      I was attempting to reproduce a problem that Zygo hit, but my error
      injection wasn't firing for a few of the common calls to
      btrfs_should_cancel_balance.  This is because the compiler decided to
      inline it at these spots.  Keep this from happening by explicitly
      marking the function as noinline so that error injection will always
      work.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allow error injection for btrfs_search_slot and btrfs_cow_block · f75e2b79
      Josef Bacik authored
      The following patches are going to address error handling in relocation.
      In order to test those patches, I need to be able to inject errors in
      btrfs_search_slot and btrfs_cow_block, as we call both of these pretty
      often in different cases during relocation.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove new_dirid argument from btrfs_create_subvol_root · 69948022
      Nikolay Borisov authored
      It's no longer used. While at it, also remove new_dirid in create_subvol,
      as it's used in a single place, and open code it. No functional changes.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make btrfs_root::free_objectid hold the next available objectid · 23125104
      Nikolay Borisov authored
      Adjust the way free_objectid is initialized: it now stores
      BTRFS_FIRST_FREE_OBJECTID rather than the somewhat arbitrary
      BTRFS_FIRST_FREE_OBJECTID - 1. This change also has the added benefit
      that it is no longer necessary to explicitly initialize free_objectid
      for a newly created fs root.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename btrfs_root::highest_objectid to free_objectid · 6b8fad57
      Nikolay Borisov authored
      This reflects the true purpose of the member, as it's used solely in
      contexts where a new objectid is being allocated. Future changes will
      also change the way it's used to closely follow these semantics.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename btrfs_find_free_objectid to btrfs_get_free_objectid · 543068a2
      Nikolay Borisov authored
      This better reflects the semantics of the function, i.e. no search is
      performed whatsoever.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename btrfs_find_highest_objectid to btrfs_init_root_free_objectid · 453e4873
      Nikolay Borisov authored
      This function is used to initialize the in-memory
      btrfs_root::highest_objectid member, which is used to get an available
      objectid. Rename it to better reflect its semantics.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: cleanup local variables in btrfs_file_write_iter · 14971657
      Nikolay Borisov authored
      First, replace all inode instances with a pointer to btrfs_inode. This
      removes multiple invocations of the BTRFS_I macro. Subsequently, remove
      2 local variables, as they are used only once, and refer to their values
      directly.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: clarify error returns values in __load_free_space_cache · 3cc64e7e
      Zhihao Cheng authored
      The return value in __load_free_space_cache is not properly set after
      (unlikely) memory allocation failures, and 0 is returned instead.
      This is not a problem for the caller load_free_space_cache, because only
      value 1 is considered as 'cache loaded', but for clarity it's better
      to set the errors accordingly.
      
      Fixes: a67509c3 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix error handling in commit_fs_roots · 4f4317c1
      Josef Bacik authored
      While doing error injection I would sometimes get a corrupt file system.
      This is because I was injecting errors at btrfs_search_slot, but would
      only do it one time per stack.  This uncovered a problem in
      commit_fs_roots, where if we get an error we would just break.  However,
      we're in a nested loop, the outer loop being the one that finds all the
      dirty fs roots, so subsequent root updates would succeed and clear the
      error value.
      
      This isn't likely to happen in real scenarios; however, we could
      potentially get a random ENOMEM once and then not again, and we'd end up
      with a corrupted file system.  Fix this by moving the error checking
      around a bit to the main loop, as this is the only place where something
      will fail, and return the error as soon as it occurs.
      
      With this patch my reproducer no longer corrupts the file system.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Merge tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · e0756cfc
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Fix output of top level event tracing 'enable' file.
      
        When writing a tool for enabling events in the tracing system, an
        anomaly was discovered. The top level event 'enable' file would never
        show '1' when all events were enabled.
      
        The system and event 'enable' files worked as expected.
      
        The reason was that the top level event 'enable' file included the
        'ftrace' tracer events, which are not controlled by the 'enable' file
        and would cause the output to be wrong. This appears to have been a
        bug since it was created"
      
      * tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Do not count ftrace events in top level enable output
  2. 07 Feb, 2021 9 commits
    • Linux 5.11-rc7 · 92bf2261
      Linus Torvalds authored
    • Merge tag 'libnvdimm-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · b75dba7f
      Linus Torvalds authored
      Pull libnvdimm fixes from Dan Williams:
       "A fix for a crash scenario that has been present since the initial
        merge, a minor regression in sysfs attribute visibility, and a fix for
        some flexible array warnings.
      
        The bulk of this pull is an update to the libnvdimm unit test
        infrastructure to test non-ACPI platforms. Given there is zero
        regression risk for test updates, and the tests enable validation of
        bits headed towards the next merge window, I saw no reason to hold the
        new tests back. Santosh originally submitted this before the v5.11
        window opened.
      
        Summary:
      
         - Fix a crash when sysfs accesses race 'dimm' driver probe/remove.
      
         - Fix a regression in 'resource' attribute visibility necessary for
           mapping badblocks and other physical address interrogations.
      
         - Fix some flexible array warnings
      
         - Expand the unit test infrastructure for non-ACPI platforms"
      
      * tag 'libnvdimm-fixes-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm/dimm: Avoid race between probe and available_slots_show()
        ndtest: Add papr health related flags
        ndtest: Add nvdimm control functions
        ndtest: Add regions and mappings to the test buses
        ndtest: Add dimm attributes
        ndtest: Add dimms to the two buses
        ndtest: Add compatability string to treat it as PAPR family
        testing/nvdimm: Add test module for non-nfit platforms
        libnvdimm/namespace: Fix visibility of namespace resource attribute
        libnvdimm/pmem: Remove unused header
        ACPI: NFIT: Fix flexible_array.cocci warnings
    • Merge tag 'dma-mapping-5.11-2' of git://git.infradead.org/users/hch/dma-mapping · ff92acb2
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix a 32 vs 64-bit padding issue in the new benchmark code (Barry
        Song)"
      
      * tag 'dma-mapping-5.11-2' of git://git.infradead.org/users/hch/dma-mapping:
        dma-mapping: benchmark: use u8 for reserved field in uAPI structure
    • Merge tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · fc6c0ae5
      Linus Torvalds authored
      Pull irq fixes from Borislav Petkov:
      
       - Prevent device managed IRQ allocation helpers from returning IRQ 0
      
       - A fix for MSI activation of PCI endpoints with multiple MSIs
      
      * tag 'irq_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Prevent [devm_]irq_alloc_desc from returning irq 0
        genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set
    • Merge tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c6792d44
      Linus Torvalds authored
      Pull syscall entry fixes from Borislav Petkov:
      
       - For syscall user dispatch, separate prctl operation from syscall
         redirection range specification before the API has been made official
         in 5.11.
      
       - Ensure tasks using the generic syscall code do trap after returning
         from a syscall when single-stepping is requested.
      
      * tag 'core_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        entry: Use different define for selector variable in SUD
        entry: Ensure trap after single-step on system call return
    • Merge tag 'sched_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6fed85df
      Linus Torvalds authored
      Pull scheduler fix from Borislav Petkov:
       "Revert an attempt to not spread IRQ threads on isolated CPUs which has
        a bunch of problems"
      
      * tag 'sched_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        Revert "lib: Restrict cpumask_local_spread to houskeeping CPUs"
    • Merge tag 'timers_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 814daadb
      Linus Torvalds authored
      Pull timer fixes from Borislav Petkov:
       "Two more timers-related fixes for v5.11:
      
         - Use a freezable workqueue for RTC sync because the sync can happen
           at any time and trigger suspend assertion checks in the i2c
           subsystem.
      
         - Correct a previous RTC validation change to check only bit 6 in
           register D because some Intel machines use bits 0-5"
      
      * tag 'timers_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        ntp: Use freezable workqueue for RTC synchronization
        rtc: mc146818: Dont test for bit 0-5 in Register D
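A minimal sketch of the difference between the two validation checks, assuming an 8-bit register D value has already been read. The macro and function names are hypothetical and this is not the mc146818 driver's code; the stricter variant is assumed here to test bits 0-6, whereas the corrected one tests only bit 6, as the summary above describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for illustration only. */
#define REG_D_BIT6      0x40u  /* bit 6 alone */
#define REG_D_BITS_0_6  0x7fu  /* bits 0-6 (assumed stricter check) */

/* Stricter check: any set bit in 0-6 marks the value as invalid. */
bool reg_d_ok_strict(uint8_t reg_d)
{
    return (reg_d & REG_D_BITS_0_6) == 0;
}

/* Corrected check: only bit 6 matters, so vendor use of bits 0-5 is tolerated. */
bool reg_d_ok_bit6(uint8_t reg_d)
{
    return (reg_d & REG_D_BIT6) == 0;
}
```

For example, 0x85 (bit 7 plus bits 0 and 2) fails reg_d_ok_strict() but passes reg_d_ok_bit6(), while 0x40 fails both.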
    • Merge tag 'x86_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e24f9c5f
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "I hope this is the last batch of x86/urgent updates for this round:
      
         - Remove superfluous EFI PGD range checks which lead to those
           assertions failing with certain kernel configs and LLVM.
      
         - Disable setting breakpoints on facilities involved in #DB exception
           handling to avoid infinite loops.
      
         - Add extra serialization to non-serializing MSRs (IA32_TSC_DEADLINE
           and x2APIC MSRs) to adhere to SDM's recommendation and avoid any
           theoretical issues.
      
         - Re-add the EPB MSR reading on turbostat so that it works on older
           kernels which don't have the corresponding EPB sysfs file.
      
         - Add Alder Lake to the list of CPUs which support split lock.
      
         - Fix %dr6 register handling in order to be able to set watchpoints
           with gdb again.
      
         - Disable CET instrumentation in the kernel so that gcc doesn't add
           ENDBR64 to kernel code and thus confuse tracing"
      
      * tag 'x86_urgent_for_v5.11_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/efi: Remove EFI PGD build time checks
        x86/debug: Prevent data breakpoints on cpu_dr7
        x86/debug: Prevent data breakpoints on __per_cpu_offset
        x86/apic: Add extra serialization for non-serializing MSRs
        tools/power/turbostat: Fallback to an MSR read for EPB
        x86/split_lock: Enable the split lock feature on another Alder Lake CPU
        x86/debug: Fix DR6 handling
        x86/build: Disable CET instrumentation in the kernel
    • Merge tag 'kbuild-fixes-v5.11-2' of... · 2db138bb
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Use the 'python3' command to invoke python scripts because some
         distributions do not provide the 'python' command any more.
      
       - Clean-up and update documents
      
       - Use pkg-config to search libcrypto
      
       - Fix duplicated debug flags
      
       - Ignore some more stubs in scripts/kallsyms.c
      
      * tag 'kbuild-fixes-v5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kallsyms: fix nonconverging kallsyms table with lld
        kbuild: fix duplicated flags in DEBUG_CFLAGS
        scripts/clang-tools: switch explicitly to Python 3
        kbuild: remove PYTHON variable
        Documentation/llvm: Add a section about supported architectures
        Revert "checkpatch: add check for keyword 'boolean' in Kconfig definitions"
        scripts: use pkg-config to locate libcrypto
        kconfig: mconf: fix HOSTCC call
        doc: gcc-plugins: update gcc-plugins.rst
        kbuild: simplify GCC_PLUGINS enablement in dummy-tools/gcc
        Documentation/Kbuild: Remove references to gcc-plugin.sh
        scripts: switch explicitly to Python 3
  3. 06 Feb, 2021 4 commits
    • Merge tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 825b5991
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Three small smb3 fixes for stable"
      
      * tag '5.11-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: report error instead of invalid when revalidating a dentry fails
        smb3: fix crediting for compounding when only one request in flight
        smb3: Fix out-of-bounds bug in SMB2_negotiate()
    • Merge tag 'riscv-for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · f7455e5d
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
       "A handful of fixes for this week:
      
         - A fix to avoid evaluating the VA twice in virt_addr_valid, which
           fixes some WARNs under DEBUG_VIRTUAL.
      
         - Two fixes related to STRICT_KERNEL_RWX: one that fixes some
           permissions when strict is disabled, and one to fix some alignment
           issues when strict is enabled.
      
         - A fix to disallow the selection of MAXPHYSMEM_2GB on RV32, which
           isn't valid any more but may still show up in some oldconfigs.
      
        We still have the HiFive Unleashed ethernet phy reset regression, so
        there will likely be something coming next week"
      
      * tag 'riscv-for-linus-5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Define MAXPHYSMEM_1GB only for RV32
        riscv: Align on L1_CACHE_BYTES when STRICT_KERNEL_RWX
        RISC-V: Fix .init section permission update
        riscv: virt_addr_valid must check the address belongs to linear mapping
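As a general illustration of why evaluating an address argument twice matters in a function-like macro, here is a hypothetical C sketch. It is not the RISC-V virt_addr_valid definition; the bounds, `IN_RANGE_TWICE`, `in_range_once()`, `next_addr()` and `eval_count` are all invented for the example. A macro that names its argument twice re-evaluates any side effects, while a static inline function evaluates the argument exactly once.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds standing in for a linear-mapping window. */
#define LOW_ADDR   0x1000u
#define HIGH_ADDR  0x2000u

/* Names `x` twice: an argument with side effects may run twice. */
#define IN_RANGE_TWICE(x)  ((x) >= LOW_ADDR && (x) < HIGH_ADDR)

/* Evaluates its argument exactly once, as any function call does. */
static inline bool in_range_once(uintptr_t x)
{
    return x >= LOW_ADDR && x < HIGH_ADDR;
}

/* Hypothetical argument with a side effect: counts its own evaluations. */
unsigned int eval_count;

uintptr_t next_addr(void)
{
    eval_count++;
    return 0x1800u;
}
```

Passing next_addr() to the macro calls it twice (once per comparison, since the first comparison succeeds), whereas in_range_once(next_addr()) calls it once.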
    • Merge tag 'powerpc-5.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · f06279ea
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - A fix for a change we made to __kernel_sigtramp_rt64() which confused
         glibc's backtrace logic, and also changed the semantics of that
         symbol, which was arguably an ABI break.
      
       - A fix for a stack overwrite in our VSX instruction emulation.
      
       - A couple of fixes for the Makefile logic in the new C VDSO.
      
      Thanks to Masahiro Yamada, Naveen N. Rao, Raoni Fassina Firmino, and
      Ravi Bangoria.
      
      * tag 'powerpc-5.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64/signal: Fix regression in __kernel_sigtramp_rt64() semantics
        powerpc/vdso64: remove meaningless vgettimeofday.o build rule
        powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o
        powerpc/sstep: Fix array out of bound warning
    • Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 4a7859ea
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
      
       - Fix latent bug with DC21285 (Footbridge PCI bridge) configuration
         accessors that affects GCC >= 4.9.2
      
       - Fix misplaced tegra_uart_config in decompressor
      
       - Ensure signal page contents are initialised
      
       - Fix kexec oops
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: kexec: fix oops after TLB are invalidated
        ARM: ensure the signal page contains defined contents
        ARM: 9043/1: tegra: Fix misplaced tegra_uart_config in decompressor
        ARM: footbridge: fix dc21285 PCI configuration accessors