1. 28 Jan, 2014 40 commits
    • Qu Wenruo's avatar
      btrfs: Add treelog mount option. · a88998f2
      Qu Wenruo authored
      Add treelog mount option to enable tree log with
      remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a88998f2
    • Qu Wenruo's avatar
      btrfs: Add datasum mount option. · d399167d
      Qu Wenruo authored
      Add datasum mount option to enable checksum with
      remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      d399167d
    • Qu Wenruo's avatar
      btrfs: Add datacow mount option. · a258af7a
      Qu Wenruo authored
      Add datacow mount option to enable copy-on-write with
      remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a258af7a
    • Qu Wenruo's avatar
      btrfs: Add acl mount option. · bd0330ad
      Qu Wenruo authored
      Add acl mount option to enable acl with remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      bd0330ad
    • Qu Wenruo's avatar
      btrfs: Add noflushoncommit mount option. · 2c9ee856
      Qu Wenruo authored
      Add noflushoncommit mount option to disable flush on commit with
      remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      2c9ee856
    • Qu Wenruo's avatar
      btrfs: Add noenospc_debug mount option. · 53036293
      Qu Wenruo authored
      Add noenospc_debug mount option to disable ENOSPC debug with
      remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      53036293
    • Qu Wenruo's avatar
      btrfs: Add nodiscard mount option. · e07a2ade
      Qu Wenruo authored
      Add nodiscard mount option to disable discard with remount option.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e07a2ade
    • Qu Wenruo's avatar
      btrfs: Add noautodefrag mount option. · fc0ca9af
      Qu Wenruo authored
      Btrfs has autodefrag mount option but no pairing noautodefrag option,
      which makes it impossible to disable autodefrag without umount.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      fc0ca9af
    • Qu Wenruo's avatar
      btrfs: Add "barrier" option to support "-o remount,barrier" · 842bef58
      Qu Wenruo authored
      Btrfs can be remounted without barrier, but there is no "barrier" option
      so nobody can remount btrfs back with barrier on. Only umount and
      mount again can re-enable barrier.(Quite awkward)
      
      Also the mount options in the document is also changed slightly for the
      further pairing options changes.
      Reported-by: default avatarDaniel Blueman <daniel@quora.org>
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarMike Fleetwood <mike.fleetwood@googlemail.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      842bef58
    • Wang Shilong's avatar
      Btrfs: only fua the first superblock when writting supers · e8117c26
      Wang Shilong authored
      We only intent to fua the first superblock in every device from
      comments, fix it.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e8117c26
    • Liu Bo's avatar
      Btrfs: return free space to global_rsv as much as possible · 17504584
      Liu Bo authored
      @full is not protected within global_rsv.lock, so we may think global_rsv
      is already full but in fact it's not, so we miss the opportunity to return
      free space to global_rsv directly when we release other block_rsvs.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      17504584
    • Wang Shilong's avatar
      Btrfs: fix an oops when we fail to relocate tree blocks · 1708cc57
      Wang Shilong authored
      During balance test, we hit an oops:
      [ 2013.841551] kernel BUG at fs/btrfs/relocation.c:1174!
      
      The problem is that if we fail to relocate tree blocks, we should
      update backref cache, otherwise, some pending nodes are not updated
      while snapshot check @cache->last_trans is within one transaction
      and won't update it and then oops happen.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      1708cc57
    • Miao Xie's avatar
      Btrfs: fix the wrong nocow range check · e77751aa
      Miao Xie authored
      The following warning message was outputed when running the 274th case
      of xfstests with nodatacow option:
       BUG: Bad page state in process kswapd0  pfn:1c66f
       page:ffffea0000636848 count:0 mapcount:0 mapping:(null) index:0x78000
       page flags: 0x1000000000100a(error|uptodate|private_2)
      
      It is because the check of nocow range was wrong, we should compare the
      start and end position of the extent with the write position to verify
      if the write position was in the extent, but the current code just used
      the start postion to do the check, so we got the wrong extent and told
      the caller that it was a nocow write. And then when we write back the
      dirty pages, we found we should cow the extent, but at that time, there
      was no space in the fs, we had to the error flag for the page. When
      someone reclaimed that page, the above warning outputed. Fix it.
      Reported-by: default avatarTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e77751aa
    • Wang Shilong's avatar
      Btrfs: fix an oops when we fail to merge reloc roots · 25e293c2
      Wang Shilong authored
      Previously, we will free reloc root memory and then force filesystem
      to be readonly. The problem is that there may be another thread commiting
      transaction which will try to access freed reloc root during merging reloc
      roots process.
      
      To keep consistency snapshots shared space, we should allow snapshot
      finished if possible, so here we don't free reloc root memory.
      signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      25e293c2
    • Wang Shilong's avatar
      Btrfs: remove unused argument from select_reloc_root() · dc4103f9
      Wang Shilong authored
      @nr is no longer used, remove it from select_reloc_root()
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      dc4103f9
    • Filipe David Borba Manana's avatar
      Btrfs: reduce btree node locking duration on item update · eb653de1
      Filipe David Borba Manana authored
      If we do a btree search with the goal of updating an existing item
      without changing its size (ins_len == 0 and cow == 1), then we never
      need to hold locks on upper level nodes (even when slot == 0) after we
      COW their child nodes/leaves, as we won't have node splits or merges
      in this scenario (that is, no key additions, removals or shifts on any
      nodes or leaves).
      
      Therefore release the locks immediately after COWing the child nodes/leaves
      while navigating the btree, even if their parent slot is 0, instead of
      returning a path to the caller with those nodes locked, which would get
      released only when the caller releases or frees the path (or if it calls
      btrfs_unlock_up_safe).
      
      This is a common scenario, for example when updating inode items in fs
      trees and block group items in the extent tree.
      
      The following benchmarks were performed on a quad core machine with 32Gb
      of ram, using a leaf/node size of 4Kb (to generate deeper fs trees more
      quickly).
      
        sysbench --test=fileio --file-num=131072 --file-total-size=8G \
          --file-test-mode=seqwr --num-threads=512 --file-block-size=8192 \
          --max-requests=100000 --file-io-mode=sync [prepare|run]
      
      Before this change:  49.85Mb/s (average of 5 runs)
      After this change:   50.38Mb/s (average of 5 runs)
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      eb653de1
    • Wenliang Fan's avatar
      fs/btrfs: Integer overflow in btrfs_ioctl_resize() · eb8052e0
      Wenliang Fan authored
      The local variable 'new_size' comes from userspace. If a large number
      was passed, there would be an integer overflow in the following line:
      	new_size = old_size + new_size;
      Signed-off-by: default avatarWenliang Fan <fanwlexca@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      eb8052e0
    • Josef Bacik's avatar
      Btrfs: stop caching thread if extent_commit_sem is contended · c9ea7b24
      Josef Bacik authored
      We can starve out the transaction commit with a bunch of caching threads all
      running at the same time.  This is because we will only drop the
      extent_commit_sem if we need_resched(), which isn't likely to happen since we
      will be reading a lot from the disk so have already schedule()'ed plenty.  Alex
      observed that he could starve out a transaction commit for up to a minute with
      32 caching threads all running at once.  This will allow us to drop the
      extent_commit_sem to allow the transaction commit to swap the commit_root out
      and then all the cachers will start back up. Here is an explanation provided by
      Igno
      
      So, just to fill in what happens in this loop:
      
                                      mutex_unlock(&caching_ctl->mutex);
                                      cond_resched();
                                      goto again;
      
      where 'again:' takes caching_ctl->mutex and fs_info->extent_commit_sem
      again:
      
              again:
                      mutex_lock(&caching_ctl->mutex);
                      /* need to make sure the commit_root doesn't disappear */
                      down_read(&fs_info->extent_commit_sem);
      
      So, if I'm reading the code correct, there can be a fair amount of
      concurrency here: there may be multiple 'caching kthreads' per filesystem
      active, while there's one fs_info->extent_commit_sem per filesystem
      AFAICS.
      
      So, what happens if there are a lot of CPUs all busy holding the
      ->extent_commit_sem rwsem read-locked and a writer arrives? They'd all
      rush to try to release the fs_info->extent_commit_sem, and they'd block in
      the down_read() because there's a writer waiting.
      
      So there's a guarantee of forward progress. This should answer akpm's
      concern I think.
      
      Thanks,
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      c9ea7b24
    • Josef Bacik's avatar
      rwsem: add rwsem_is_contended · 4a444b1f
      Josef Bacik authored
      Btrfs needs a simple way to know if it needs to let go of it's read lock on a
      rwsem.  Introduce rwsem_is_contended to check to see if there are any waiters on
      this rwsem currently.  This is just a hueristic, it is meant to be light and not
      100% accurate and called by somebody already holding on to the rwsem in either
      read or write.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      4a444b1f
    • Miao Xie's avatar
      Btrfs: introduce the delayed inode ref deletion for the single link inode · 67de1176
      Miao Xie authored
      The inode reference item is close to inode item, so we insert it simultaneously
      with the inode item insertion when we create a file/directory.. In fact, we also
      can handle the inode reference deletion by the same way. So we made this patch to
      introduce the delayed inode reference deletion for the single link inode(At most
      case, the file doesn't has hard link, so we don't take the hard link into account).
      
      This function is based on the delayed inode mechanism. After applying this patch,
      we can reduce the time of the file/directory deletion by ~10%.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      67de1176
    • Miao Xie's avatar
      7cf35d91
    • Miao Xie's avatar
      Btrfs: remove btrfs_end_transaction_dmeta() · a56dbd89
      Miao Xie authored
      Two reasons:
      - btrfs_end_transaction_dmeta() is the same as btrfs_end_transaction_throttle()
        so it is unnecessary.
      - All the delayed items should be dealt in the current transaction, so the
        workers should not commit the transaction, instead, deal with the delayed
        items as many as possible.
      
      So we can remove btrfs_end_transaction_dmeta()
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a56dbd89
    • Miao Xie's avatar
      Btrfs: cleanup code of btrfs_balance_delayed_items() · 0353808c
      Miao Xie authored
      - move the condition check for wait into a function
      - use wait_event_interruptible instead of prepare-schedule-finish process
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      0353808c
    • Miao Xie's avatar
      Btrfs: don't run delayed nodes again after all nodes flush · 4dd466d3
      Miao Xie authored
      If the number of the delayed items is greater than the upper limit, we will
      try to flush all the delayed items. After that, it is unnecessary to run
      them again because they are being dealt with by the wokers or the number of
      them is less than the lower limit.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      4dd466d3
    • Miao Xie's avatar
      Btrfs: remove residual code in delayed inode async helper · 74c40f92
      Miao Xie authored
      Before applying the patch
        commit de3cb945
        title: Btrfs: improve the delayed inode throttling
      
      We need requeue the async work after the current work was done, it
      introduced a deadlock problem. So we wrote the code that this patch
      removes to avoid the above problem. But after applying the above
      patch, the deadlock problem didn't exist. So we should remove that
      fix code.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      74c40f92
    • Frank Holton's avatar
      Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0
      Frank Holton authored
      Convert all applicable cases of printk and pr_* to the btrfs_* macros.
      
      Fix all uses of the BTRFS prefix.
      Signed-off-by: default avatarFrank Holton <fholton@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      efe120a0
    • Filipe David Borba Manana's avatar
      Btrfs: fix tree mod logging · 5de865ee
      Filipe David Borba Manana authored
      While running the test btrfs/004 from xfstests in a loop, it failed
      about 1 time out of 20 runs in my desktop. The failure happened in
      the backref walking part of the test, and the test's error message was
      like this:
      
        btrfs/004 93s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
            --- tests/btrfs/004.out	2013-11-26 18:25:29.263333714 +0000
            +++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad	2013-12-10 15:25:10.327518516 +0000
            @@ -1,3 +1,8 @@
             QA output created by 004
             *** test backref walking
            -*** done
            +unexpected output from
            +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
            +expected inum: 405, expected address: 454656, file: /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
            +
             ...
             (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the entire diff)
        Ran: btrfs/004
        Failures: btrfs/004
        Failed 1 of 1 tests
      
      But immediately after the test finished, the btrfs inspect-internal command
      returned the expected output:
      
        $ btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
        inode 405 offset 454656 root 258
        inode 405 offset 454656 root 5
      
      It turned out this was because the btrfs_search_old_slot() calls performed
      during backref walking (backref.c:__resolve_indirect_ref) were not finding
      anything. The reason for this turned out to be that the tree mod logging
      code was not logging some node multi-step operations atomically, therefore
      btrfs_search_old_slot() callers iterated often over an incomplete tree that
      wasn't fully consistent with any tree state from the past. Besides missing
      items, this often (but not always) resulted in -EIO errors during old slot
      searches, reported in dmesg like this:
      
      [ 4299.933936] ------------[ cut here ]------------
      [ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343 btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
      [ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc ppdev binfmt_misc joydev snd_hda_codec_h
      [ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G        W  O 3.12.0-fdm-btrfs-next-16+ #70
      [ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
      [ 4299.933979]  000000000000053f ffff8806f3fd98f8 ffffffff8176d284 0000000000000007
      [ 4299.933982]  0000000000000000 ffff8806f3fd9938 ffffffff8104a81c ffff880659c64b70
      [ 4299.933984]  ffff880659c643d0 ffff8806599233d8 ffff880701e2e938 0000160000000000
      [ 4299.933987] Call Trace:
      [ 4299.933991]  [<ffffffff8176d284>] dump_stack+0x55/0x76
      [ 4299.933994]  [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
      [ 4299.933997]  [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
      [ 4299.934003]  [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0 [btrfs]
      [ 4299.934005]  [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
      [ 4299.934010]  [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0 [btrfs]
      [ 4299.934019]  [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0 [btrfs]
      [ 4299.934027]  [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
      [ 4299.934034]  [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
      [ 4299.934042]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934048]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934056]  [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250 [btrfs]
      [ 4299.934058]  [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
      [ 4299.934065]  [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
      [ 4299.934071]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934078]  [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
      [ 4299.934080]  [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
      [ 4299.934083]  [<ffffffff81075563>] ? up_read+0x23/0x40
      [ 4299.934085]  [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
      [ 4299.934088]  [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
      [ 4299.934090]  [<ffffffff81776e23>] ? error_sti+0x5/0x6
      [ 4299.934093]  [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
      [ 4299.934096]  [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
      [ 4299.934098]  [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
      [ 4299.934100]  [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 4299.934102]  [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
      [ 4299.934102]  [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
      [ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
      [ 4299.934378] btrfs bad fsid on block 0
      
      These tree mod log operations that must be performed atomically, tree_mod_log_free_eb,
      tree_mod_log_eb_copy, tree_mod_log_insert_root and tree_mod_log_insert_move, used to
      be performed atomically before the following commit:
      
        c8cc6341
        (Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
      
      That change removed the atomicity of such operations. This patch restores the
      atomicity while still not doing the GFP_ATOMIC allocations of tree_mod_elem
      structures, so it has to do the allocations using GFP_NOFS before acquiring
      the mod log lock.
      
      This issue has been experienced by several users recently, such as for example:
      
        http://www.spinics.net/lists/linux-btrfs/msg28574.html
      
      After running the btrfs/004 test for 679 consecutive iterations with this
      patch applied, I didn't ran into the issue anymore.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      5de865ee
    • David Sterba's avatar
      btrfs: check balance of send_in_progress · 66ef7d65
      David Sterba authored
      Warn if the balance goes below zero, which appears to be unlikely
      though. Otherwise cleans up the code a bit.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      66ef7d65
    • Wang Shilong's avatar
      Btrfs: remove transaction from btrfs send · 41ce9970
      Wang Shilong authored
      Since daivd did the work that force us to use readonly snapshot,
      we can safely remove transaction protection from btrfs send.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      41ce9970
    • Miao Xie's avatar
      Btrfs: fix double initialization of the raid kobject · 536cd964
      Miao Xie authored
      We met the following oops when doing space balance:
       kobject (ffff88081b590278): tried to init an initialized object, something is seriously wrong.
       ...
       Call Trace:
        [<ffffffff81937262>] dump_stack+0x49/0x5f
        [<ffffffff8137d259>] kobject_init+0x89/0xa0
        [<ffffffff8137d36a>] kobject_init_and_add+0x2a/0x70
        [<ffffffffa009bd79>] ? clear_extent_bit+0x199/0x470 [btrfs]
        [<ffffffffa005e82c>] __link_block_group+0xfc/0x120 [btrfs]
        [<ffffffffa006b9db>] btrfs_make_block_group+0x24b/0x370 [btrfs]
        [<ffffffffa00a899b>] __btrfs_alloc_chunk+0x54b/0x7e0 [btrfs]
        [<ffffffffa00a8c6f>] btrfs_alloc_chunk+0x3f/0x50 [btrfs]
        [<ffffffffa0060123>] do_chunk_alloc+0x363/0x440 [btrfs]
        [<ffffffffa00633d4>] btrfs_check_data_free_space+0x104/0x310 [btrfs]
        [<ffffffffa0069f4d>] btrfs_write_dirty_block_groups+0x48d/0x600 [btrfs]
        [<ffffffffa007aad4>] commit_cowonly_roots+0x184/0x250 [btrfs]
        ...
      
      Steps to reproduce:
       # mkfs.btrfs -f <dev>
       # mount -o nospace_cache <dev> <mnt>
       # btrfs balance start <mnt>
       # dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1
      
      The reason of this problem is that we initialized the raid kobject when we added
      a block group into a empty raid list. As we know, when we mounted a btrfs filesystem,
      the raid list was empty, we would initialize the raid kobject when we added the first
      block group. But if there was not data stored in the block group, the block group
      would be freed when doing balance, and the raid list would be empty. And then if we
      allocated a new block group and added it into the raid list, we would initialize
      the raid kobject again, the oops happened.
      
      Fix this problem by initializing the raid kobject just when mounting the fs.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Reported-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      536cd964
    • Wang Shilong's avatar
      Btrfs: fix a warning when iput a file · 180589ef
      Wang Shilong authored
      See the warning below:
      
      [ 1209.102076]  [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 [btrfs]
      [ 1209.102084]  [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 [btrfs]
      [ 1209.102089]  [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40
      [ 1209.102092]  [<ffffffff8118ab2e>] evict+0x9e/0x190
      [ 1209.102094]  [<ffffffff8118b313>] iput+0xf3/0x180
      [ 1209.102101]  [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 [btrfs]
      [ 1209.102107]  [<ffffffffa045d358>] __btrfs_end_transaction+0x268/0x350 [btrfs]
      
      clear extent bit here to avoid triggering WARN_ON() in remove_extent_mapping()
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      180589ef
    • David Sterba's avatar
      btrfs: Check read-only status of roots during send · 2c686537
      David Sterba authored
      All the subvolues that are involved in send must be read-only during the
      whole operation. The ioctl SUBVOL_SETFLAGS could be used to change the
      status to read-write and the result of send stream is undefined if the
      data change unexpectedly.
      
      Fix that by adding a refcount for all involved roots and verify that
      there's no send in progress during SUBVOL_SETFLAGS ioctl call that does
      read-only -> read-write transition.
      
      We need refcounts because there are no restrictions on number of send
      parallel operations currently run on a single subvolume, be it source,
      parent or one of the multiple clone sources.
      
      Kernel is silent when the RO checks fail and returns EPERM. The same set
      of checks is done already in userspace before send starts.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      2c686537
    • David Sterba's avatar
      btrfs: remove unused mnt from send_ctx · a8d89f5b
      David Sterba authored
      Unused since ed259095
      "Btrfs: stop using vfs_read in send".
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a8d89f5b
    • David Sterba's avatar
      btrfs: send: clean up dead code · 95bc79d5
      David Sterba authored
      Remove ifdefed code:
      
      - tlv_put for 8, 16 and 32, add a generic tempalte if needed in future
      - tlv_put_timespec - the btrfs_timespec fields are used
      - fs_path_remove obsoleted long ago
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      95bc79d5
    • Filipe David Borba Manana's avatar
      Btrfs: fix deadlock when iterating inode refs and running delayed inodes · 3fe81ce2
      Filipe David Borba Manana authored
      While running btrfs/004 from xfstests, after 503 iterations, dmesg reported
      a deadlock between tasks iterating inode refs and tasks running delayed inodes
      (during a transaction commit).
      
      It turns out that iterating inode refs implies doing one tree search and
      release all nodes in the path except the leaf node, and then passing that
      leaf node to btrfs_ref_to_path(), which in turn does another tree search
      without releasing the lock on the leaf node it received as parameter.
      
      This is a problem when other task wants to write to the btree as well and
      ends up updating the leaf that is read locked - the writer task locks the
      parent of the leaf and then blocks waiting for the leaf's lock to be
      released - at the same time, the task executing btrfs_ref_to_path()
      does a second tree search, without releasing the lock on the first leaf,
      and wants to access a leaf (the same or another one) that is a child of
      the same parent, resulting in a deadlock.
      
      The trace reported by lockdep follows.
      
      [84314.936373] INFO: task fsstress:11930 blocked for more than 120 seconds.
      [84314.936381]       Tainted: G        W  O 3.12.0-fdm-btrfs-next-16+ #70
      [84314.936383] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [84314.936386] fsstress        D ffff8806e1bf8000     0 11930  11926 0x00000000
      [84314.936393]  ffff8804d6d89b78 0000000000000046 ffff8804d6d89b18 ffffffff810bd8bd
      [84314.936399]  ffff8806e1bf8000 ffff8804d6d89fd8 ffff8804d6d89fd8 ffff8804d6d89fd8
      [84314.936405]  ffff880806308000 ffff8806e1bf8000 ffff8804d6d89c08 ffff8804deb8f190
      [84314.936410] Call Trace:
      [84314.936421]  [<ffffffff810bd8bd>] ? trace_hardirqs_on+0xd/0x10
      [84314.936428]  [<ffffffff81774269>] schedule+0x29/0x70
      [84314.936451]  [<ffffffffa0715bf5>] btrfs_tree_lock+0x75/0x270 [btrfs]
      [84314.936457]  [<ffffffff810715c0>] ? __init_waitqueue_head+0x60/0x60
      [84314.936470]  [<ffffffffa06ba231>] btrfs_search_slot+0x7f1/0x930 [btrfs]
      [84314.936489]  [<ffffffffa0731c2a>] ? __btrfs_run_delayed_items+0x13a/0x1e0 [btrfs]
      [84314.936504]  [<ffffffffa06d2e1f>] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
      [84314.936510]  [<ffffffff810bd6ef>] ? trace_hardirqs_on_caller+0x1f/0x1e0
      [84314.936528]  [<ffffffffa073173c>] __btrfs_update_delayed_inode+0x4c/0x1d0 [btrfs]
      [84314.936543]  [<ffffffffa0731c2a>] ? __btrfs_run_delayed_items+0x13a/0x1e0 [btrfs]
      [84314.936558]  [<ffffffffa0731c2a>] ? __btrfs_run_delayed_items+0x13a/0x1e0 [btrfs]
      [84314.936573]  [<ffffffffa0731c82>] __btrfs_run_delayed_items+0x192/0x1e0 [btrfs]
      [84314.936589]  [<ffffffffa0731d03>] btrfs_run_delayed_items+0x13/0x20 [btrfs]
      [84314.936604]  [<ffffffffa06dbcd4>] btrfs_flush_all_pending_stuffs+0x24/0x80 [btrfs]
      [84314.936620]  [<ffffffffa06ddc13>] btrfs_commit_transaction+0x223/0xa20 [btrfs]
      [84314.936630]  [<ffffffffa06ae5ae>] btrfs_sync_fs+0x6e/0x110 [btrfs]
      [84314.936635]  [<ffffffff811d0b50>] ? __sync_filesystem+0x60/0x60
      [84314.936639]  [<ffffffff811d0b50>] ? __sync_filesystem+0x60/0x60
      [84314.936643]  [<ffffffff811d0b70>] sync_fs_one_sb+0x20/0x30
      [84314.936648]  [<ffffffff811a3541>] iterate_supers+0xf1/0x100
      [84314.936652]  [<ffffffff811d0c45>] sys_sync+0x55/0x90
      [84314.936658]  [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
      [84314.936660] INFO: lockdep is turned off.
      [84314.936663] INFO: task btrfs:11955 blocked for more than 120 seconds.
      [84314.936666]       Tainted: G        W  O 3.12.0-fdm-btrfs-next-16+ #70
      [84314.936668] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [84314.936670] btrfs           D ffff880541729a88     0 11955  11608 0x00000000
      [84314.936674]  ffff880541729a38 0000000000000046 ffff8805417299d8 ffffffff810bd8bd
      [84314.936680]  ffff88075430c8a0 ffff880541729fd8 ffff880541729fd8 ffff880541729fd8
      [84314.936685]  ffffffff81c104e0 ffff88075430c8a0 ffff8804de8b00b8 ffff8804de8b0000
      [84314.936690] Call Trace:
      [84314.936695]  [<ffffffff810bd8bd>] ? trace_hardirqs_on+0xd/0x10
      [84314.936700]  [<ffffffff81774269>] schedule+0x29/0x70
      [84314.936717]  [<ffffffffa0715815>] btrfs_tree_read_lock+0xd5/0x140 [btrfs]
      [84314.936721]  [<ffffffff810715c0>] ? __init_waitqueue_head+0x60/0x60
      [84314.936733]  [<ffffffffa06ba201>] btrfs_search_slot+0x7c1/0x930 [btrfs]
      [84314.936746]  [<ffffffffa06bd505>] btrfs_find_item+0x55/0x160 [btrfs]
      [84314.936763]  [<ffffffffa06ff689>] ? free_extent_buffer+0x49/0xc0 [btrfs]
      [84314.936780]  [<ffffffffa073c9ca>] btrfs_ref_to_path+0xba/0x1e0 [btrfs]
      [84314.936797]  [<ffffffffa06f9719>] ? release_extent_buffer+0xb9/0xe0 [btrfs]
      [84314.936813]  [<ffffffffa06ff689>] ? free_extent_buffer+0x49/0xc0 [btrfs]
      [84314.936830]  [<ffffffffa073cb50>] inode_to_path+0x60/0xd0 [btrfs]
      [84314.936846]  [<ffffffffa073d365>] paths_from_inode+0x115/0x3c0 [btrfs]
      [84314.936851]  [<ffffffff8118dd44>] ? kmem_cache_alloc_trace+0x114/0x200
      [84314.936868]  [<ffffffffa0714494>] btrfs_ioctl+0xf14/0x2030 [btrfs]
      [84314.936873]  [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
      [84314.936877]  [<ffffffff8116598f>] ? handle_mm_fault+0x34f/0xb00
      [84314.936882]  [<ffffffff81075563>] ? up_read+0x23/0x40
      [84314.936886]  [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
      [84314.936892]  [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
      [84314.936896]  [<ffffffff81776e23>] ? error_sti+0x5/0x6
      [84314.936901]  [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
      [84314.936906]  [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
      [84314.936910]  [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
      [84314.936915]  [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [84314.936920]  [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
      [84314.936922] INFO: lockdep is turned off.
      [84434.866873] INFO: task btrfs-transacti:11921 blocked for more than 120 seconds.
      [84434.866881]       Tainted: G        W  O 3.12.0-fdm-btrfs-next-16+ #70
      [84434.866883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [84434.866886] btrfs-transacti D ffff880755b6a478     0 11921      2 0x00000000
      [84434.866893]  ffff8800735b9ce8 0000000000000046 ffff8800735b9c88 ffffffff810bd8bd
      [84434.866899]  ffff8805a1b848a0 ffff8800735b9fd8 ffff8800735b9fd8 ffff8800735b9fd8
      [84434.866904]  ffffffff81c104e0 ffff8805a1b848a0 ffff880755b6a478 ffff8804cece78f0
      [84434.866910] Call Trace:
      [84434.866920]  [<ffffffff810bd8bd>] ? trace_hardirqs_on+0xd/0x10
      [84434.866927]  [<ffffffff81774269>] schedule+0x29/0x70
      [84434.866948]  [<ffffffffa06dd2ef>] wait_current_trans.isra.33+0xbf/0x120 [btrfs]
      [84434.866954]  [<ffffffff810715c0>] ? __init_waitqueue_head+0x60/0x60
      [84434.866970]  [<ffffffffa06dec18>] start_transaction+0x388/0x5a0 [btrfs]
      [84434.866985]  [<ffffffffa06db9b5>] ? transaction_kthread+0xb5/0x280 [btrfs]
      [84434.866999]  [<ffffffffa06dee97>] btrfs_attach_transaction+0x17/0x20 [btrfs]
      [84434.867012]  [<ffffffffa06dba9e>] transaction_kthread+0x19e/0x280 [btrfs]
      [84434.867026]  [<ffffffffa06db900>] ? open_ctree+0x2260/0x2260 [btrfs]
      [84434.867030]  [<ffffffff81070dad>] kthread+0xed/0x100
      [84434.867035]  [<ffffffff81070cc0>] ? flush_kthread_worker+0x190/0x190
      [84434.867040]  [<ffffffff8177ee6c>] ret_from_fork+0x7c/0xb0
      [84434.867044]  [<ffffffff81070cc0>] ? flush_kthread_worker+0x190/0x190
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      3fe81ce2
    • Wang Shilong's avatar
      Btrfs: remove dead comments for read_csums() · 663df053
      Wang Shilong authored
      Chris introduced hleper function  read_csums() and this function
      has been removed, but we forgot to remove its corresponding comments.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      663df053
    • Filipe David Borba Manana's avatar
      Btrfs: remove field tree_mod_seq_elem from btrfs_fs_info struct · e223cfcd
      Filipe David Borba Manana authored
      It's not used anywhere, so just drop it.
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e223cfcd
    • Filipe David Borba Manana's avatar
      Btrfs: fix use of uninitialized err variable · fc28b62d
      Filipe David Borba Manana authored
      fs/btrfs/file.c: In function ‘prepare_pages.isra.18’:
      fs/btrfs/file.c:1265:6: warning: ‘err’ may be used uninitialized in this function [-Wuninitialized]
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      fc28b62d
    • Wang Shilong's avatar
      Btrfs: remove unnecessary filemap writting and waiting after block group relocation · 54eb72c0
      Wang Shilong authored
      We have commited transaction before, remove redundant filemap writting and
      waiting here, it can speed up balance relocation process.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      54eb72c0
    • Tsutomu Itoh's avatar
      Btrfs: fix error check of btrfs_lookup_dentry() · 5662344b
      Tsutomu Itoh authored
      Clean up btrfs_lookup_dentry() to never return NULL, but PTR_ERR(-ENOENT)
      instead. This keeps the return value convention consistent.
      
      Callers who use btrfs_lookup_dentry() require a trivial update.
      
      create_snapshot() in particular looks like it can also lose a BUG_ON(!inode)
      which is not really needed - there seems less harm in returning ENOENT to
      userspace at that point in the stack than there is to crash the machine.
      Signed-off-by: default avatarTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      5662344b