1. 31 Mar, 2018 1 commit
  2. 30 Mar, 2018 39 commits
    • Qu Wenruo's avatar
      btrfs: qgroup: Use separate meta reservation type for delalloc · 43b18595
      Qu Wenruo authored
      Before this patch, btrfs qgroup is mixing per-transcation meta rsv with
      preallocated meta rsv, making it quite easy to underflow qgroup meta
      reservation.
      
      Since we have the new qgroup meta rsv types, apply it to delalloc
      reservation.
      
      Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
      rsv type.
      
      And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
      they will convert corresponding META_PREALLOC reservation to
      META_PERTRANS.
      
      This is mainly due to the fact that current qgroup numbers will only be
      updated in btrfs_commit_transaction(), that's to say if we don't keep
      such placeholder reservation, we can exceed qgroup limitation.
      
      And for callers freeing outstanding extent in error handler, we will
      just free META_PREALLOC bytes.
      
      This behavior makes callers of btrfs_qgroup_release_meta() or
      btrfs_qgroup_convert_meta() to be aware of which type they are.
      So in this patch, btrfs_delalloc_release_metadata() and its callers get
      an extra parameter to info qgroup to do correct meta convert/release.
      
      The good news is, even we use the wrong type (convert or free), it won't
      cause obvious bug, as prealloc type is always in good shape, and the
      type only affects how per-trans meta is increased or not.
      
      So the worst case will be at most metadata limitation can be sometimes
      exceeded (no convert at all) or metadata limitation is reached too soon
      (no free at all).
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      43b18595
    • Qu Wenruo's avatar
      btrfs: qgroup: Introduce function to convert META_PREALLOC into META_PERTRANS · 64cfaef6
      Qu Wenruo authored
      For meta_prealloc reservation users, after btrfs_join_transaction()
      caller will modify tree so part (or even all) meta_prealloc reservation
      should be converted to meta_pertrans until transaction commit time.
      
      This patch introduces a new function,
      btrfs_qgroup_convert_reserved_meta() to do this for META_PREALLOC
      reservation user.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      64cfaef6
    • Qu Wenruo's avatar
      btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup · e1211d0e
      Qu Wenruo authored
      Since qgroup has seperate metadata reservation types now, we can
      completely get rid of the old root->qgroup_meta_rsv, which mostly acts
      as current META_PERTRANS reservation type.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      e1211d0e
    • Qu Wenruo's avatar
      btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans · 733e03a0
      Qu Wenruo authored
      Btrfs uses 2 different methods to reseve metadata qgroup space.
      
      1) Reserve at btrfs_start_transaction() time
         This is quite straightforward, caller will use the trans handler
         allocated to modify b-trees.
      
         In this case, reserved metadata should be kept until qgroup numbers
         are updated.
      
      2) Reserve by using block_rsv first, and later btrfs_join_transaction()
         This is more complicated, caller will reserve space using block_rsv
         first, and then later call btrfs_join_transaction() to get a trans
         handle.
      
         In this case, before we modify trees, the reserved space can be
         modified on demand, and after btrfs_join_transaction(), such reserved
         space should also be kept until qgroup numbers are updated.
      
      Since these two types behave differently, split the original "META"
      reservation type into 2 sub-types:
      
        META_PERTRANS:
          For above case 1)
      
        META_PREALLOC:
          For reservations that happened before btrfs_join_transaction() of
          case 2)
      
      NOTE: This patch will only convert existing qgroup meta reservation
      callers according to its situation, not ensuring all callers are at
      correct timing.
      Such fix will be added in later patches.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      [ update comments ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      733e03a0
    • Qu Wenruo's avatar
      btrfs: qgroup: Cleanup the remaining old reservation counters · 5c40507f
      Qu Wenruo authored
      So qgroup is switched to new separate types reservation system.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5c40507f
    • Qu Wenruo's avatar
      64ee4e75
    • Qu Wenruo's avatar
      btrfs: qgroup: Fix wrong qgroup reservation update for relationship modification · 429d6275
      Qu Wenruo authored
      When modifying qgroup relationship, for qgroup which only owns exclusive
      extents, we will go through quick update path.
      
      In this path, we will add/subtract exclusive and reference number for
      parent qgroup, since the source (child) qgroup only has exclusive
      extents, destination (parent) qgroup will also own or lose those extents
      exclusively.
      
      The same should be the same for reservation, since later reservation
      adding/releasing will also affect parent qgroup, without the reservation
      carried from child, parent will underflow reservation or have dead
      reservation which will never be freed.
      
      However original code doesn't do the same thing for reservation.
      It handles qgroup reservation quite differently:
      
      It removes qgroup reservation, as it's allocating space from the
      reserved qgroup for relationship adding.
      But does nothing for qgroup reservation if we're removing a qgroup
      relationship.
      
      According to the original code, it looks just like because we're adding
      qgroup->rfer, the code assumes we're writing new data, so it's follows
      the normal write routine, by reducing qgroup->reserved and adding
      qgroup->rfer/excl.
      
      This old behavior is wrong, and should be fixed to follow the same
      excl/rfer behavior.
      
      Just fix it by using the correct behavior described above.
      
      Fixes: 31193213 ("Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.")
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      429d6275
    • Qu Wenruo's avatar
      btrfs: qgroup: Make qgroup_reserve and its callers to use separate reservation type · dba21324
      Qu Wenruo authored
      Since most callers of qgroup_reserve() are already defined by type,
      converting qgroup_reserve() is quite an easy work.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      dba21324
    • Qu Wenruo's avatar
      btrfs: qgroup: Introduce helpers to update and access new qgroup rsv · f59c0347
      Qu Wenruo authored
      Introduce helpers to:
      
      1) Get total reserved space
         For limit calculation
      2) Add/release reserved space for given type
         With underflow detection and warning
      3) Add/release reserved space according to child qgroup
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f59c0347
    • Qu Wenruo's avatar
      btrfs: qgroup: Skeleton to support separate qgroup reservation type · d4e5c920
      Qu Wenruo authored
      Instead of single qgroup->reserved, use a new structure btrfs_qgroup_rsv
      to store different types of reservation.
      
      This patch only updates the header needed to compile.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d4e5c920
    • Omar Sandoval's avatar
      Btrfs: delete dead code in btrfs_orphan_add() · 0a0d4415
      Omar Sandoval authored
      btrfs_orphan_add() has had this case commented out since it was first
      introduced in commit d68fc57b ("Btrfs: Metadata reservation for
      orphan inodes"). Most of the orphan cleanup code has been rewritten
      since then, so it's safe to say that this code isn't needed.
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      [ switch to bool ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0a0d4415
    • Misono, Tomohiro's avatar
    • Jeff Mahoney's avatar
      btrfs: defer adding raid type kobject until after chunk relocation · 75cb379d
      Jeff Mahoney authored
      Any time the first block group of a new type is created, we add a new
      kobject to sysfs to hold the attributes for that type.  Kobject-internal
      allocations always use GFP_KERNEL, making them prone to fs-reclaim races.
      While it appears as if this can occur any time a block group is created,
      the only times the first block group of a new type can be created in
      memory is at mount and when we create the first new block group during
      raid conversion.
      
      This patch adds a new list to track pending kobject additions and then
      handles them after we do chunk relocation.  Between relocating the
      target chunk (or forcing allocation of a new chunk in the case of data)
      and removing the old chunk, we're in a safe place for fs-reclaim to
      occur.  We're holding the volume mutex, which is already held across
      page faults, and the delete_unused_bgs_mutex, which will only stall
      the cleaner thread.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      75cb379d
    • Jeff Mahoney's avatar
      btrfs: remove dead create_space_info calls · dc2d3005
      Jeff Mahoney authored
      Since commit 2be12ef7 (btrfs: Separate space_info create/update), we've
      separated out the creation and updating of the space info structures.
      That commit was a straightforward refactoring of the two parts of
      update_space_info, but we can go a step further.  Since commits
      c59021f8 (Btrfs: fix OOPS of empty filesystem after balance) and
      b742bb82 (Btrfs: Link block groups of different raid types), we know
      that the space_info structures will be created at mount and there will
      only ever be, at most, three of them.
      
      This patch cleans out the create_space_info calls after __find_space_info
      returns NULL since __find_space_info *can't* return NULL.
      
      The initial cause for reviewing this was the kobject_add calls from
      create_space_info occuring in sites where fs-reclaim wasn't allowed.  Now
      we are certain they occur only early in the mount process and are safe.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      dc2d3005
    • Liu Bo's avatar
      Btrfs: replace: cache rbio when rebuild data on missing device · 580c6efa
      Liu Bo authored
      Rebuild on missing device is as same as recover, after it's done, rbio
      has data which is consistent with on-disk data, so it can be cached to
      avoid further reads.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      580c6efa
    • Jeff Mahoney's avatar
      btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers · 8a5a916d
      Jeff Mahoney authored
      While running btrfs/011, I hit the following lockdep splat.
      
      This is the important bit:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
      
      The percpu_counter_init call in btrfs_alloc_subvolume_writers
      uses GFP_KERNEL, which we can't do during transaction commit.
      
      This switches it to GFP_NOFS.
      
      ========================================================
      WARNING: possible irq lock inversion dependency detected
      4.12.14-kvmsmall #8 Tainted: G        W
      --------------------------------------------------------
      kswapd0/50 just changed the state of lock:
       (&delayed_node->mutex){+.+.-.}, at: [<ffffffffc06994fa>] __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
      but this lock took another, RECLAIM_FS-unsafe lock in the past:
       (pcpu_alloc_mutex){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> pcpu_alloc_mutex
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(pcpu_alloc_mutex);
                                     local_irq_disable();
                                     lock(&delayed_node->mutex);
                                     lock(&found->groups_sem);
        <Interrupt>
          lock(&delayed_node->mutex);
      
       *** DEADLOCK ***
      
      2 locks held by kswapd0/50:
       #0:  (shrinker_rwsem){++++..}, at: [<ffffffff811dc11f>] shrink_slab+0x7f/0x5b0
       #1:  (&type->s_umount_key#30){+++++.}, at: [<ffffffff8126dec6>] trylock_super+0x16/0x50
      
      the shortest dependencies between 2nd lock and 1st lock:
         -> (pcpu_alloc_mutex){+.+.+.} ops: 4904 {
            HARDIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            SOFTIRQ-ON-W at:
                                __mutex_lock+0x4e/0x8c0
                                pcpu_alloc+0x1ac/0x5e0
                                alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                __do_tune_cpucache+0x2c/0x220
                                do_tune_cpucache+0x26/0xc0
                                enable_cpucache+0x6d/0xf0
                                kmem_cache_init_late+0x42/0x75
                                start_kernel+0x343/0x4cb
                                x86_64_start_kernel+0x127/0x134
                                secondary_startup_64+0xa5/0xb0
            RECLAIM_FS-ON-W at:
                                   __kmalloc+0x47/0x310
                                   pcpu_extend_area_map+0x2b/0xc0
                                   pcpu_alloc+0x3ec/0x5e0
                                   alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                                   __do_tune_cpucache+0x2c/0x220
                                   do_tune_cpucache+0x26/0xc0
                                   enable_cpucache+0x6d/0xf0
                                   __kmem_cache_create+0x1bf/0x390
                                   create_cache+0xba/0x1b0
                                   kmem_cache_create+0x1f8/0x2b0
                                   ksm_init+0x6f/0x19d
                                   do_one_initcall+0x50/0x1b0
                                   kernel_init_freeable+0x201/0x289
                                   kernel_init+0xa/0x100
                                   ret_from_fork+0x3a/0x50
            INITIAL USE at:
                               __mutex_lock+0x4e/0x8c0
                               pcpu_alloc+0x1ac/0x5e0
                               alloc_kmem_cache_cpus.isra.70+0x25/0xa0
                               setup_cpu_cache+0x2f/0x1f0
                               __kmem_cache_create+0x1bf/0x390
                               create_boot_cache+0x8b/0xb1
                               kmem_cache_init+0xa1/0x19e
                               start_kernel+0x270/0x4cb
                               x86_64_start_kernel+0x127/0x134
                               secondary_startup_64+0xa5/0xb0
          }
          ... key      at: [<ffffffff821d8e70>] pcpu_alloc_mutex+0x70/0xa0
          ... acquired at:
         pcpu_alloc+0x1ac/0x5e0
         __percpu_counter_init+0x4e/0xb0
         btrfs_init_fs_root+0x99/0x1c0 [btrfs]
         btrfs_get_fs_root.part.54+0x5b/0x150 [btrfs]
         resolve_indirect_refs+0x130/0x830 [btrfs]
         find_parent_nodes+0x69e/0xff0 [btrfs]
         btrfs_find_all_roots_safe+0xa0/0x110 [btrfs]
         btrfs_find_all_roots+0x50/0x70 [btrfs]
         btrfs_qgroup_prepare_account_extents+0x53/0x90 [btrfs]
         btrfs_commit_transaction+0x3ce/0x9b0 [btrfs]
         transaction_kthread+0x176/0x1b0 [btrfs]
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
        -> (&fs_info->commit_root_sem){++++..} ops: 1566382 {
           HARDIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           HARDIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           SOFTIRQ-ON-W at:
                              down_write+0x3e/0xa0
                              cache_block_group+0x287/0x420 [btrfs]
                              find_free_extent+0x106c/0x12d0 [btrfs]
                              btrfs_reserve_extent+0xd8/0x170 [btrfs]
                              cow_file_range.isra.66+0x133/0x470 [btrfs]
                              run_delalloc_range+0x121/0x410 [btrfs]
                              writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                              __extent_writepage+0x19a/0x360 [btrfs]
                              extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                              extent_writepages+0x4d/0x60 [btrfs]
                              do_writepages+0x1a/0x70
                              __filemap_fdatawrite_range+0xa7/0xe0
                              btrfs_rename+0x5ee/0xdb0 [btrfs]
                              vfs_rename+0x52a/0x7e0
                              SyS_rename+0x351/0x3b0
                              do_syscall_64+0x79/0x1e0
                              entry_SYSCALL_64_after_hwframe+0x42/0xb7
           SOFTIRQ-ON-R at:
                              down_read+0x35/0x90
                              caching_thread+0x57/0x560 [btrfs]
                              normal_work_helper+0x1c0/0x5e0 [btrfs]
                              process_one_work+0x1e0/0x5c0
                              worker_thread+0x44/0x390
                              kthread+0x102/0x140
                              ret_from_fork+0x3a/0x50
           INITIAL USE at:
                             down_write+0x3e/0xa0
                             cache_block_group+0x287/0x420 [btrfs]
                             find_free_extent+0x106c/0x12d0 [btrfs]
                             btrfs_reserve_extent+0xd8/0x170 [btrfs]
                             cow_file_range.isra.66+0x133/0x470 [btrfs]
                             run_delalloc_range+0x121/0x410 [btrfs]
                             writepage_delalloc.isra.50+0xfe/0x180 [btrfs]
                             __extent_writepage+0x19a/0x360 [btrfs]
                             extent_write_cache_pages.constprop.56+0x249/0x3e0 [btrfs]
                             extent_writepages+0x4d/0x60 [btrfs]
                             do_writepages+0x1a/0x70
                             __filemap_fdatawrite_range+0xa7/0xe0
                             btrfs_rename+0x5ee/0xdb0 [btrfs]
                             vfs_rename+0x52a/0x7e0
                             SyS_rename+0x351/0x3b0
                             do_syscall_64+0x79/0x1e0
                             entry_SYSCALL_64_after_hwframe+0x42/0xb7
         }
         ... key      at: [<ffffffffc0729578>] __key.61970+0x0/0xfffffffffff9aa88 [btrfs]
         ... acquired at:
         cache_block_group+0x287/0x420 [btrfs]
         find_free_extent+0x106c/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         btrfs_create_tree+0xbb/0x2a0 [btrfs]
         btrfs_create_uuid_tree+0x37/0x140 [btrfs]
         open_ctree+0x23c0/0x2660 [btrfs]
         btrfs_mount+0xd36/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         btrfs_mount+0x18c/0xf90 [btrfs]
         mount_fs+0x3a/0x160
         vfs_kern_mount+0x66/0x150
         do_mount+0x1c1/0xcc0
         SyS_mount+0x7e/0xd0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
       -> (&found->groups_sem){++++..} ops: 2134587 {
          HARDIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          HARDIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-W at:
                            down_write+0x3e/0xa0
                            __link_block_group+0x34/0x130 [btrfs]
                            btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                            open_ctree+0x2054/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          SOFTIRQ-ON-R at:
                            down_read+0x35/0x90
                            btrfs_calc_num_tolerated_disk_barrier_failures+0x113/0x1f0 [btrfs]
                            open_ctree+0x207b/0x2660 [btrfs]
                            btrfs_mount+0xd36/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            btrfs_mount+0x18c/0xf90 [btrfs]
                            mount_fs+0x3a/0x160
                            vfs_kern_mount+0x66/0x150
                            do_mount+0x1c1/0xcc0
                            SyS_mount+0x7e/0xd0
                            do_syscall_64+0x79/0x1e0
                            entry_SYSCALL_64_after_hwframe+0x42/0xb7
          INITIAL USE at:
                           down_write+0x3e/0xa0
                           __link_block_group+0x34/0x130 [btrfs]
                           btrfs_read_block_groups+0x33d/0x7b0 [btrfs]
                           open_ctree+0x2054/0x2660 [btrfs]
                           btrfs_mount+0xd36/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           btrfs_mount+0x18c/0xf90 [btrfs]
                           mount_fs+0x3a/0x160
                           vfs_kern_mount+0x66/0x150
                           do_mount+0x1c1/0xcc0
                           SyS_mount+0x7e/0xd0
                           do_syscall_64+0x79/0x1e0
                           entry_SYSCALL_64_after_hwframe+0x42/0xb7
        }
        ... key      at: [<ffffffffc0729488>] __key.59101+0x0/0xfffffffffff9ab78 [btrfs]
        ... acquired at:
         find_free_extent+0xcb4/0x12d0 [btrfs]
         btrfs_reserve_extent+0xd8/0x170 [btrfs]
         btrfs_alloc_tree_block+0x12f/0x4c0 [btrfs]
         __btrfs_cow_block+0x110/0x5b0 [btrfs]
         btrfs_cow_block+0xd7/0x290 [btrfs]
         btrfs_search_slot+0x1f6/0x960 [btrfs]
         btrfs_lookup_inode+0x2a/0x90 [btrfs]
         __btrfs_update_delayed_inode+0x65/0x210 [btrfs]
         btrfs_commit_inode_delayed_inode+0x121/0x130 [btrfs]
         btrfs_evict_inode+0x3fe/0x6a0 [btrfs]
         evict+0xc4/0x190
         __dentry_kill+0xbf/0x170
         dput+0x2ae/0x2f0
         SyS_rename+0x2a6/0x3b0
         do_syscall_64+0x79/0x1e0
         entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> (&delayed_node->mutex){+.+.-.} ops: 5580204 {
         HARDIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         SOFTIRQ-ON-W at:
                          __mutex_lock+0x4e/0x8c0
                          btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                          btrfs_update_inode+0x83/0x110 [btrfs]
                          btrfs_dirty_inode+0x62/0xe0 [btrfs]
                          touch_atime+0x8c/0xb0
                          do_generic_file_read+0x818/0xb10
                          __vfs_read+0xdc/0x150
                          vfs_read+0x8a/0x130
                          SyS_read+0x45/0xa0
                          do_syscall_64+0x79/0x1e0
                          entry_SYSCALL_64_after_hwframe+0x42/0xb7
         IN-RECLAIM_FS-W at:
                             __mutex_lock+0x4e/0x8c0
                             __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
                             btrfs_evict_inode+0x22c/0x6a0 [btrfs]
                             evict+0xc4/0x190
                             dispose_list+0x35/0x50
                             prune_icache_sb+0x42/0x50
                             super_cache_scan+0x139/0x190
                             shrink_slab+0x262/0x5b0
                             shrink_node+0x2eb/0x2f0
                             kswapd+0x2eb/0x890
                             kthread+0x102/0x140
                             ret_from_fork+0x3a/0x50
         INITIAL USE at:
                         __mutex_lock+0x4e/0x8c0
                         btrfs_delayed_update_inode+0x46/0x6e0 [btrfs]
                         btrfs_update_inode+0x83/0x110 [btrfs]
                         btrfs_dirty_inode+0x62/0xe0 [btrfs]
                         touch_atime+0x8c/0xb0
                         do_generic_file_read+0x818/0xb10
                         __vfs_read+0xdc/0x150
                         vfs_read+0x8a/0x130
                         SyS_read+0x45/0xa0
                         do_syscall_64+0x79/0x1e0
                         entry_SYSCALL_64_after_hwframe+0x42/0xb7
       }
       ... key      at: [<ffffffffc072d488>] __key.56935+0x0/0xfffffffffff96b78 [btrfs]
       ... acquired at:
         __lock_acquire+0x264/0x11c0
         lock_acquire+0xbd/0x1e0
         __mutex_lock+0x4e/0x8c0
         __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
         btrfs_evict_inode+0x22c/0x6a0 [btrfs]
         evict+0xc4/0x190
         dispose_list+0x35/0x50
         prune_icache_sb+0x42/0x50
         super_cache_scan+0x139/0x190
         shrink_slab+0x262/0x5b0
         shrink_node+0x2eb/0x2f0
         kswapd+0x2eb/0x890
         kthread+0x102/0x140
         ret_from_fork+0x3a/0x50
      
      stack backtrace:
      CPU: 1 PID: 50 Comm: kswapd0 Tainted: G        W        4.12.14-kvmsmall #8 SLE15 (unreleased)
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0x78/0xb7
       print_irq_inversion_bug.part.38+0x19f/0x1aa
       check_usage_forwards+0x102/0x120
       ? ret_from_fork+0x3a/0x50
       ? check_usage_backwards+0x110/0x110
       mark_lock+0x16c/0x270
       __lock_acquire+0x264/0x11c0
       ? pagevec_lookup_entries+0x1a/0x30
       ? truncate_inode_pages_range+0x2b3/0x7f0
       lock_acquire+0xbd/0x1e0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       __mutex_lock+0x4e/0x8c0
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       ? btrfs_evict_inode+0x1f6/0x6a0 [btrfs]
       __btrfs_release_delayed_node+0x3a/0x1f0 [btrfs]
       btrfs_evict_inode+0x22c/0x6a0 [btrfs]
       evict+0xc4/0x190
       dispose_list+0x35/0x50
       prune_icache_sb+0x42/0x50
       super_cache_scan+0x139/0x190
       shrink_slab+0x262/0x5b0
       shrink_node+0x2eb/0x2f0
       kswapd+0x2eb/0x890
       kthread+0x102/0x140
       ? mem_cgroup_shrink_node+0x2c0/0x2c0
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x3a/0x50
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8a5a916d
    • Anand Jain's avatar
      btrfs: drop optimal argument from find_live_mirror() · 8ba0ae78
      Anand Jain authored
      Drop optimal argument from the function find_live_mirror() as we can
      deduce it in the function itself. Also rename optimal to
      preferred_mirror.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8ba0ae78
    • Anand Jain's avatar
      btrfs: drop num argument from find_live_mirror() · 99f92a7c
      Anand Jain authored
      Obtain the stripes info from the map directly and so no need
      to pass it as an argument.
      Signed-off-by: default avatarAnand Jain <anand.jain@oracle.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      99f92a7c
    • Nikolay Borisov's avatar
      btrfs: Drop fs_info parameter from __btrfs_run_delayed_refs · 0a1e458a
      Nikolay Borisov authored
      It's provided by transaction handle.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0a1e458a
    • Nikolay Borisov's avatar
      btrfs: Drop fs_info parameter from btrfs_finish_extent_commit · 5ead2dd0
      Nikolay Borisov authored
      It's provided by the transaction handle.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      5ead2dd0
    • Nikolay Borisov's avatar
      btrfs: Drop fs_info parameter from btrfs_qgroup_account_extents · 460fb20a
      Nikolay Borisov authored
      It's provided by the transaction handle.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      460fb20a
    • Nikolay Borisov's avatar
      btrfs: drop fs_info parameter from btrfs_run_delayed_refs · c79a70b1
      Nikolay Borisov authored
      It's provided by the transaction handle.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      c79a70b1
    • Nikolay Borisov's avatar
      btrfs: Remove unused flush var in shrink_delalloc · 39d7d09d
      Nikolay Borisov authored
      Added by 08e007d2 ("Btrfs: improve the noflush reservation") and
      made redundant by 17024ad0 ("Btrfs: fix early ENOSPC due to
      delalloc").
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      39d7d09d
    • Nikolay Borisov's avatar
      btrfs: Remove unused extent_root var from caching_thread · 101d2dc0
      Nikolay Borisov authored
      Added by b4570aa9 ("btrfs: fix compiling with CONFIG_BTRFS_DEBUG
      enabled.") and obsoleted by 2ff7e61e ("btrfs: take an fs_info
      directly when the root is not used otherwise").
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      101d2dc0
    • Nikolay Borisov's avatar
      btrfs: remove max_active var from open_ctree · 338dae1a
      Nikolay Borisov authored
      Introduced by 5cdc7ad3 ("btrfs: Replace fs_info->workers with
      btrfs_workqueue.") but obsoleted by 2a458198 ("btrfs: factor
      btrfs_init_workqueues() out of open_ctree()").
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      338dae1a
    • Nikolay Borisov's avatar
      btrfs: Remove unused root var from relink_file_extents · 8535dc19
      Nikolay Borisov authored
      Added in 38c227d8 ("Btrfs: snapshot-aware defrag") but subsequently
      made redundant by 0b246afa ("btrfs: root->fs_info cleanup, add
      fs_info convenience variables").
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8535dc19
    • Nikolay Borisov's avatar
      btrfs: Remove unused tot_len var from lzo_decompress · 4eeb97c6
      Nikolay Borisov authored
      Added already unused in a6fa6fae ("btrfs: Add lzo compression
      support").
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      4eeb97c6
    • Nikolay Borisov's avatar
      btrfs: Remove unused length var from scrub_handle_errored_block · d6e823a5
      Nikolay Borisov authored
      Added in b5d67f64 ("Btrfs: change scrub to support big blocks") but
      rendered redundant by be50a8dd ("Btrfs: Simplify
      scrub_setup_recheck_block()'s argument").
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d6e823a5
    • Nikolay Borisov's avatar
      btrfs: Remove unused op_key var from add_delayed_refs · a6dbceaf
      Nikolay Borisov authored
      Added as part of 86d5f994 ("btrfs: convert prelimary reference
      tracking to use rbtrees") but never used. tmp_op_key essentially
      subsumed that variable.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a6dbceaf
    • Qu Wenruo's avatar
      btrfs: volumes: Remove the meaningless condition of minimal nr_devs when allocating a chunk · ba89b802
      Qu Wenruo authored
      When checking the minimal nr_devs, there is one dead and meaningless
      condition:
      
      if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      This condition is meaningless, @devs_increment has nothing to do with
      @sub_stripes.
      
      In fact, in btrfs_raid_array[], profile with sub_stripes larger than 1
      (RAID10) already has the @devs_increment set to 2.
      So no need to multiple it by @sub_stripes.
      
      And above condition is also dead.
      For RAID10, @devs_increment * @sub_stripes equals 4, which is also the
      @devs_min of RAID10.
      For other profiles, @sub_stripes is always 1, and since @ndevs is
      rounded down to @devs_increment, the condition will always be true.
      
      Remove the meaningless condition to make later reader wander less.
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      ba89b802
    • Nikolay Borisov's avatar
      btrfs: Document parameters of btrfs_reserve_extent · 6f47c706
      Nikolay Borisov authored
      This function is the entry to the extent allocator and as such has
      quite a number of parameters. Some of those have subtle effects on the
      allocation algorithm. Document the parameters.
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarAnand Jain <anand.jain@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      6f47c706
    • Nikolay Borisov's avatar
      btrfs: Handle error from btrfs_uuid_tree_rem call in _btrfs_ioctl_set_received_subvol · d87ff758
      Nikolay Borisov authored
      As with every function which deals with modifying the btree
      btrfs_uuid_tree_rem can fail for any number of reasons (ie. EIO/ENOMEM).
      Handle return error value from this function gracefully by aborting the
      transaction.
      
      Fixes: dd5f9615 ("Btrfs: maintain subvolume items in the UUID tree")
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d87ff758
    • Nikolay Borisov's avatar
      btrfs: Use sizeof directly instead of a constant variable · 776c4a7c
      Nikolay Borisov authored
      The kernel would like to have all stack VLA usage removed[1].
      Unfortunately using an integer constant variable as the size of an
      array is still considered a VLA. Instead let's use directly sizeof(var)
      which removes the VLA usage. Use the occasion to remove csum_size
      altogether and use sizeof() also for the size passed to memcmp
      
      [1]: https://lkml.org/lkml/2018/3/7/621Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      776c4a7c
    • David Sterba's avatar
    • David Sterba's avatar
      btrfs: remove unused parameters from extent_submit_bio_done_t · 6c553435
      David Sterba authored
      Remove parameters not used by any of the callbacks.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      6c553435
    • David Sterba's avatar
      btrfs: remove unused parameters from extent_submit_bio_start_t · d0779291
      David Sterba authored
      Remove parameters not used by any of the callbacks.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d0779291
    • David Sterba's avatar
      btrfs: separate types for submit_bio_start and submit_bio_done · a758781d
      David Sterba authored
      The callbacks make use of different parameters that are passed to the
      other type unnecessarily. This patch adds separate types for each and
      the unused parameters will be removed.
      
      The type extent_submit_bio_hook_t keeps all parameters and can be used
      where the start/done types are not appropriate.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      a758781d
    • David Sterba's avatar
      btrfs: kill tree_mod_log_set_root_pointer helper · d9d19a01
      David Sterba authored
      A useless wrapper around tree_mod_log_insert_root that hides missing
      error handling. Move it to the callers.
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      d9d19a01
    • David Sterba's avatar
      btrfs: kill tree_mod_log_set_node_key helper · 0e82bcfe
      David Sterba authored
      A trivial wrapper that can be simply opencoded and makes the GFP
      allocation request more visible. The error handling is now moved to the
      callers.
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0e82bcfe