1. 19 Jul, 2014 1 commit
    • Liu Bo's avatar
      Btrfs: fix abnormal long waiting in fsync · 98ce2ded
      Liu Bo authored
      xfstests generic/127 detected this problem.
      
      With commit 7fc34a62, now fsync will only flush
      data within the passed range.  This is the cause of the above problem,
      -- btrfs's fsync has a stage called 'sync log' which will wait for all the
      ordered extents it've recorded to finish.
      
      In xfstests/generic/127, with mixed operations such as truncate, fallocate,
      punch hole, and mapwrite, we get some pre-allocated extents, and mapwrite will
      mmap, and then msync.  And I find that msync will wait for quite a long time
      (about 20s in my case), thanks to ftrace, it turns out that the previous
      fallocate calls 'btrfs_wait_ordered_range()' to flush dirty pages, but as the
      range of dirty pages may be larger than 'btrfs_wait_ordered_range()' wants,
      there can be some ordered extents created but not getting corresponding pages
      flushed, then they're left in memory until we fsync which runs into the
      stage 'sync log', and fsync will just wait for the system writeback thread
      to flush those pages and get ordered extents finished, so the latency is
      inevitable.
      
      This adds a flush similar to btrfs_start_ordered_extent() in
      btrfs_wait_logged_extents() to fix that.
      Reviewed-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      98ce2ded
  2. 03 Jul, 2014 11 commits
    • Filipe Manana's avatar
      Btrfs: fix crash when starting transaction · abdd2e80
      Filipe Manana authored
      Often when starting a transaction we commit the currently running transaction,
      which can end up writing block group caches when the current process has its
      journal_info set to NULL (and not to a transaction). This makes our assertion
      at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
      in a crash/hang. Therefore fix it by setting journal_info.
      
      Two different traces of this issue follow below.
      
      1)
      
          [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
          [51502.242213] ------------[ cut here ]------------
          [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
          [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
          (...)
          [51502.244010] Call Trace:
          [51502.244010]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
          [51502.244010]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
          [51502.244010]  [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs]
          [51502.244010]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
          [51502.244010]  [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40
          [51502.244010]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
          [51502.244010]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
          [51502.244010]  [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs]
          [51502.244010]  [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs]
          [51502.244010]  [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0
          [51502.244010]  [<ffffffff811baebc>] vfs_unlink+0xcc/0x150
          [51502.244010]  [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0
          [51502.244010]  [<ffffffff811a9ef4>] ? filp_close+0x64/0x90
          [51502.244010]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
          [51502.244010]  [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f
          [51502.244010]  [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40
          [51502.244010]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
          [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
          [51502.244010] RIP  [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs]
      
      2)
      
          [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
          [25405.097488] ------------[ cut here ]------------
          [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
          [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
          (...)
          [25405.100008] Call Trace:
          [25405.100008]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
          [25405.100008]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
          [25405.100008]  [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs]
          [25405.100008]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
          [25405.100008]  [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0
          [25405.100008]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
          [25405.100008]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
          [25405.100008]  [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs]
          [25405.100008]  [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs]
          [25405.100008]  [<ffffffff811bc63b>] vfs_create+0x9b/0x130
          [25405.100008]  [<ffffffff811bcf19>] do_last+0x849/0xe20
          [25405.100008]  [<ffffffff811b9409>] ? link_path_walk+0x79/0x820
          [25405.100008]  [<ffffffff811bd5b5>] path_openat+0xc5/0x690
          [25405.100008]  [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10
          [25405.100008]  [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0
          [25405.100008]  [<ffffffff811be2a3>] do_filp_open+0x43/0xa0
          [25405.100008]  [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0
          [25405.100008]  [<ffffffff811abcfc>] do_sys_open+0x13c/0x230
          [25405.100008]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
          [25405.100008]  [<ffffffff811abe12>] SyS_open+0x22/0x30
          [25405.100008]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
          [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
          [25405.100008] RIP  [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs]
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      abdd2e80
    • Josef Bacik's avatar
      Btrfs: fix btrfs_print_leaf for skinny metadata · be2c765d
      Josef Bacik authored
      We wouldn't actuall print the extent information if we had a skinny metadata
      item, this fixes that.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      be2c765d
    • Liu Bo's avatar
      Btrfs: fix race of using total_bytes_pinned · d288db5d
      Liu Bo authored
      This percpu counter @total_bytes_pinned is introduced to skip unnecessary
      operations of 'commit transaction', it accounts for those space we may free
      but are stuck in delayed refs.
      
      And we zero out @space_info->total_bytes_pinned every transaction period so
      we have a better idea of how much space we'll actually free up by committing
      this transaction.  However, we do the 'zero out' part a little earlier, before
      we actually unpin space, so we end up returning ENOSPC when we actually have
      free space that's just unpinned from committing transaction.
      
      xfstests/generic/074 complained then.
      
      This fixes it by actually accounting the percpu pinned number when 'unpin',
      and since it's protected by space_info->lock, the race is gone now.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      d288db5d
    • David Sterba's avatar
      btrfs: use E2BIG instead of EIO if compression does not help · 130d5b41
      David Sterba authored
      Return codes got updated in 60e1975a
      (btrfs: return errno instead of -1 from compression)
      lzo wrapper returns E2BIG in this case, do the same for zlib.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      130d5b41
    • David Sterba's avatar
      btrfs: remove stale comment from btrfs_flush_all_pending_stuffs · 0a4eaea8
      David Sterba authored
      Commit fcebe456 (Btrfs: rework qgroup
      accounting) removed the qgroup accounting after delayed refs.
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.cz>
      0a4eaea8
    • Filipe Manana's avatar
      Btrfs: fix use-after-free when cloning a trailing file hole · 14f59796
      Filipe Manana authored
      The transaction handle was being used after being freed.
      
      Cc: Chris Mason <clm@fb.com>
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      14f59796
    • Anand Jain's avatar
      btrfs: fix null pointer dereference in btrfs_show_devname when name is null · 0aeb8a6e
      Anand Jain authored
      dev->name is null but missing flag is not set.
      Strictly speaking the missing flag should have been set, but there
      are more places where code just checks if name is null. For now this
      patch does the same.
      
      stack:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000064
      IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs]
      
      [<ffffffff81198879>] show_vfsmnt+0x39/0x130
      [<ffffffff81178056>] m_show+0x16/0x20
      [<ffffffff8117d706>] seq_read+0x296/0x390
      [<ffffffff8115aa7d>] vfs_read+0x9d/0x160
      [<ffffffff8115b549>] SyS_read+0x49/0x90
      [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b
      
      reproducer:
      mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
      btrfstune -S 1 /dev/sdg1
      modprobe -r btrfs && modprobe btrfs
      mount -o degraded /dev/sdg1 /btrfs
      btrfs dev add /dev/sdg3 /btrfs
      Signed-off-by: default avatarAnand Jain <Anand.Jain@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      0aeb8a6e
    • Anand Jain's avatar
      btrfs: fix null pointer dereference in clone_fs_devices when name is null · e755f780
      Anand Jain authored
      when one of the device path is missing btrfs_device name is null. So this
      patch will check for that.
      
      stack:
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff812e18c0>] strlen+0x0/0x30
      [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
      [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
      [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
      [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
      [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
      [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
      [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
      [<ffffffff81193426>] ? blkdev_put+0x106/0x110
      [<ffffffff81179175>] ? mntput+0x35/0x40
      [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
      [<ffffffff8115c72e>] ? ____fput+0xe/0x10
      [<ffffffff81068033>] ? task_work_run+0xb3/0xd0
      [<ffffffff8116d547>] SyS_ioctl+0x57/0x90
      [<ffffffff817a793e>] ? do_page_fault+0xe/0x10
      [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b
      
      reproducer:
      mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
      btrfstune -S 1 /dev/sdg1
      modprobe -r btrfs && modprobe btrfs
      mount -o degraded /dev/sdg1 /btrfs
      btrfs dev add /dev/sdg3 /btrfs
      Signed-off-by: default avatarAnand Jain <Anand.Jain@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e755f780
    • Eric Sandeen's avatar
      btrfs: fix nossd and ssd_spread mount option regression · 2aa06a35
      Eric Sandeen authored
      The commit
      
      07802534 btrfs: Cleanup the btrfs_parse_options for remount.
      
      broke ssd options quite badly; it stopped making ssd_spread
      imply ssd, and it made "nossd" unsettable.
      
      Put things back at least as well as they were before
      (though ssd mount option handling is still pretty odd:
      # mount -o "nossd,ssd_spread" works?)
      Reported-by: default avatarRoman Mamedov <rm@romanrm.net>
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      2aa06a35
    • Wang Shilong's avatar
      Btrfs: fix race between balance recovery and root deletion · 5f316481
      Wang Shilong authored
      Balance recovery is called when RW mounting or remounting from
      RO to RW, it is called to finish roots merging.
      
      When doing balance recovery, relocation root's corresponding
      fs root(whose root refs is 0) might be destroyed by cleaner
      thread, this will make btrfs fail to mount.
      
      Fix this problem by holding @cleaner_mutex when doing balance
      recovery.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      5f316481
    • Filipe Manana's avatar
      Btrfs: atomically set inode->i_flags in btrfs_update_iflags · 3cc79392
      Filipe Manana authored
      This change is based on the corresponding recent change for ext4:
      
        ext4: atomically set inode->i_flags in ext4_set_inode_flags()
      
      That has the following commit message that applies to btrfs as well:
      
        "Use cmpxchg() to atomically set i_flags instead of clearing out the
         S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
         EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
         where an immutable file has the immutable flag cleared for a brief
         window of time."
      
      Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE
      and BTRFS_INODE_APPEND, respectively.
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.cz>
      Reviewed-by: default avatarSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      3cc79392
  3. 28 Jun, 2014 9 commits
  4. 19 Jun, 2014 9 commits
    • Miao Xie's avatar
      Btrfs: fix wrong error handle when the device is missing or is not writeable · 8408c716
      Miao Xie authored
      The original bio might be submitted, so we shoud increase bi_remaining to
      account for it when we deal with the error that the device is missing or
      is not writeable, or we would skip the endio handle.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      8408c716
    • Miao Xie's avatar
      Btrfs: fix deadlock when mounting a degraded fs · c55f1396
      Miao Xie authored
      The deadlock happened when we mount degraded filesystem, the reproduced
      steps are following:
       # mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1>
       # echo 1 > /sys/block/`basename <dev0>`/device/delete
       # mount -o degraded <dev1> <mnt>
      
      The reason was that the counter -- bi_remaining was wrong. If the missing
      or unwriteable device was the last device in the mapping array, we would
      not submit the original bio, so we shouldn't increase bi_remaining of it
      in btrfs_end_bio(), or we would skip the final endio handle.
      
      Fix this problem by adding a flag into btrfs bio structure. If we submit
      the original bio, we will set the flag, and we increase bi_remaining counter,
      or we don't.
      
      Though there is another way to fix it -- decrease bi_remaining counter of the
      original bio when we make sure the original bio is not submitted, this method
      need add more check and is easy to make mistake.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      c55f1396
    • Miao Xie's avatar
      Btrfs: use bio_endio_nodec instead of open code · e990f167
      Miao Xie authored
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e990f167
    • Wang Shilong's avatar
      Btrfs: fix NULL pointer crash when running balance and scrub concurrently · 298a8f9c
      Wang Shilong authored
      While running balance, scrub, fsstress concurrently we hit the
      following kernel crash:
      
      [56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132
      [56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
      [56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
      [56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0
      [56561.524325] Oops: 0000 [#1] SMP
      [....]
      [56561.527237] Call Trace:
      [56561.527309]  [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs]
      [56561.527392]  [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0
      [56561.527476]  [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs]
      [56561.527561]  [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs]
      [56561.527639]  [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0
      [56561.527712]  [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60
      [56561.527788]  [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0
      [56561.527870]  [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0
      [56561.527941]  [<ffffffff815707f7>] tracesys+0xdd/0xe2
      [...]
      [56561.528304] RIP  [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
      [56561.528395]  RSP <ffff88004c0f5be8>
      [56561.528454] CR2: 0000000000000078
      
      This is because in btrfs_relocate_chunk(), we will free @bdev directly while
      scrub may still hold extent mapping, and may access freed memory.
      
      Fix this problem by wrapping freeing @bdev work into free_extent_map() which
      is based on reference count.
      Reported-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      298a8f9c
    • Qu Wenruo's avatar
      btrfs: Skip scrubbing removed chunks to avoid -ENOENT. · ced96edc
      Qu Wenruo authored
      When run scrub with balance, sometimes -ENOENT will be returned, since
      in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but
      btrfs_lookup_block_group() will search block group in *MEMORY*, so if a
      chunk is removed but not committed, -ENOENT will be returned.
      
      However, there is no need to stop scrubbing since other chunks may be
      scrubbed without problem.
      
      So this patch changes the behavior to skip removed chunks and continue
      to scrub the rest.
      Signed-off-by: default avatarQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      ced96edc
    • Miao Xie's avatar
      Btrfs: fix broken free space cache after the system crashed · e570fd27
      Miao Xie authored
      When we mounted the filesystem after the crash, we got the following
      message:
        BTRFS error (device xxx): block group xxxx has wrong amount of free space
        BTRFS error (device xxx): failed to load free space cache for block group xxx
      
      It is because we didn't update the metadata of the allocated space (in extent
      tree) until the file data was written into the disk. During this time, there was
      no information about the allocated spaces in either the extent tree nor the
      free space cache. when we wrote out the free space cache at this time (commit
      transaction), those spaces were lost. In fact, only the free space that is
      used to store the file data had this problem, the others didn't because
      the metadata of them is updated in the same transaction context.
      
      There are many methods which can fix the above problem
      - track the allocated space, and write it out when we write out the free
        space cache
      - account the size of the allocated space that is used to store the file
        data, if the size is not zero, don't write out the free space cache.
      
      The first one is complex and may make the performance drop down.
      This patch chose the second method, we use a per-block-group variant to
      account the size of that allocated space. Besides that, we also introduce
      a per-block-group read-write semaphore to avoid the race between
      the allocation and the free space cache write out.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      e570fd27
    • Miao Xie's avatar
      Btrfs: make free space cache write out functions more readable · 5349d6c3
      Miao Xie authored
      This patch makes the free space cache write out functions more readable,
      and beisdes that, it also reduces the stack space that the function --
      __btrfs_write_out_cache uses from 194bytes to 144bytes.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      5349d6c3
    • Filipe Manana's avatar
      Btrfs: remove unused wait queue in struct extent_buffer · 46fefe41
      Filipe Manana authored
      The lock_wq wait queue is not used anywhere, therefore just remove it.
      On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320
      bytes down to 296 bytes, which means a 4Kb page can now be used for
      13 extent buffers instead of 12.
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      46fefe41
    • Chris Mason's avatar
      Btrfs: fix deadlocks with trylock on tree nodes · ea4ebde0
      Chris Mason authored
      The Btrfs tree trylock function is poorly named.  It always takes
      the spinlock and backs off if the blocking lock is held.  This
      can lead to surprising lockups because people expect it to really be a
      trylock.
      
      This commit makes it a pure trylock, both for the spinlock and the
      blocking lock.  It also reworks the nested lock handling slightly to
      avoid taking the read lock while a spinning write lock might be held.
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      ea4ebde0
  5. 13 Jun, 2014 10 commits
    • Eric Sandeen's avatar
      btrfs: fix error handling in create_pending_snapshot · 47a306a7
      Eric Sandeen authored
      fcebe456 cut and pasted some code to a later point
      in create_pending_snapshot(), but didn't switch
      to the appropriate error handling for this stage
      of the function.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      47a306a7
    • Eric Sandeen's avatar
      btrfs: fix use of uninit "ret" in end_extent_writepage() · 3e2426bd
      Eric Sandeen authored
      If this condition in end_extent_writepage() is false:
      
      	if (tree->ops && tree->ops->writepage_end_io_hook)
      
      we will then test an uninitialized "ret" at:
      
      	ret = ret < 0 ? ret : -EIO;
      
      The test for ret is for the case where ->writepage_end_io_hook
      failed, and we'd choose that ret as the error; but if
      there is no ->writepage_end_io_hook, nothing sets ret.
      
      Initializing ret to 0 should be sufficient; if
      writepage_end_io_hook wasn't set, (!uptodate) means
      non-zero err was passed in, so we choose -EIO in that case.
      Signed-of-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      3e2426bd
    • Eric Sandeen's avatar
      btrfs: free ulist in qgroup_shared_accounting() error path · d7372780
      Eric Sandeen authored
      If tmp = ulist_alloc(GFP_NOFS) fails, we return without
      freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS)
      and cause a memory leak.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      d7372780
    • Filipe Manana's avatar
      Btrfs: fix qgroups sanity test crash or hang · b050f9f6
      Filipe Manana authored
      Often when running the qgroups sanity test, a crash or a hang happened.
      This is because the extent buffer the test uses for the root node doesn't
      have an header level explicitly set, making it have a random level value.
      This is a problem when it's not zero for the btrfs_search_slot() calls
      the test ends up doing, resulting in crashes or hangs such as the following:
      
      [ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on
      (...)
      [ 6454.127760] BTRFS: selftest: Running qgroup tests
      [ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup
      [ 6454.127966] BTRFS: selftest: Qgroup basic add
      [ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383]
      [ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs]
      [ 6480.152005] irq event stamp: 188448
      [ 6480.152005] hardirqs last  enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30
      [ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80
      [ 6480.152005] softirqs last  enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450
      [ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0
      [ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4
      [ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000
      [ 6480.152005] RIP: 0010:[<ffffffff81349a63>]  [<ffffffff81349a63>] __write_lock_failed+0x13/0x20
      [ 6480.152005] RSP: 0018:ffff8800d0d038e8  EFLAGS: 00000287
      [ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852
      [ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8
      [ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb
      [ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858
      [ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0
      [ 6480.152005] FS:  00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000
      [ 6480.152005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0
      [ 6480.152005] Stack:
      [ 6480.152005]  ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8
      [ 6480.152005]  ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007
      [ 6480.152005]  ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16
      [ 6480.152005] Call Trace:
      [ 6480.152005]  [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0
      [ 6480.152005]  [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80
      [ 6480.152005]  [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs]
      [ 6480.152005]  [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs]
      [ 6480.152005]  [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
      [ 6480.152005]  [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs]
      [ 6480.152005]  [<ffffffff811a727f>] ? create_object+0x23f/0x300
      [ 6480.152005]  [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
      [ 6480.152005]  [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs]
      [ 6480.152005]  [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs]
      [ 6480.152005]  [<ffffffff8108cad6>] ? local_clock+0x16/0x30
      [ 6480.152005]  [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs]
      [ 6480.152005]  [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
      [ 6480.152005]  [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs]
      [ 6480.152005]  [<ffffffff81000352>] do_one_initcall+0x102/0x150
      [ 6480.152005]  [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50
      [ 6480.152005]  [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74
      [ 6480.152005]  [<ffffffff810d91cc>] load_module+0x1cdc/0x2630
      (...)
      
      Therefore initialize the extent buffer as an empty leaf (level 0).
      
      Issue easy to reproduce when btrfs is built as a module via:
      
          $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      b050f9f6
    • Sasha Levin's avatar
      btrfs: prevent RCU warning when dereferencing radix tree slot · f1e3c289
      Sasha Levin authored
      Mark the dereference as protected by lock. Not doing so triggers
      an RCU warning since the radix tree assumed that RCU is in use.
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      f1e3c289
    • Wang Shilong's avatar
      Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting · 5fbc7c59
      Wang Shilong authored
      Steps to reproduce:
      
       # mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5
       # mkfs.ext4 /dev/sdc --->corrupt one of btrfs device
       # mount /dev/sdb /mnt -o degraded
       # btrfs scrub start -BRd /mnt
      
      This is because readahead would skip missing device, this is not true
      for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block
      mapping. If expected data locates in missing device, readahead thread
      would not call __readahead_hook() which makes event @rc->elems=0
      wait forever.
      
      Fix this problem by checking return value of btrfs_map_block(),we
      can only skip missing device safely if there are several mirrors.
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      5fbc7c59
    • Gerhard Heift's avatar
      btrfs: new ioctl TREE_SEARCH_V2 · cc68a8a5
      Gerhard Heift authored
      This new ioctl call allows the user to supply a buffer of varying size in which
      a tree search can store its results. This is much more flexible if you want to
      receive items which are larger than the current fixed buffer of 3992 bytes or
      if you want to fetch more items at once. Items larger than this buffer are for
      example some of the type EXTENT_CSUM.
      Signed-off-by: default avatarGerhard Heift <Gerhard@Heift.Name>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Acked-by: default avatarDavid Sterba <dsterba@suse.cz>
      cc68a8a5
    • Gerhard Heift's avatar
      btrfs: tree_search, search_ioctl: direct copy to userspace · ba346b35
      Gerhard Heift authored
      By copying each found item seperatly to userspace, we do not need extra
      buffer in the kernel.
      Signed-off-by: default avatarGerhard Heift <Gerhard@Heift.Name>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Acked-by: default avatarDavid Sterba <dsterba@suse.cz>
      ba346b35
    • Gerhard Heift's avatar
      btrfs: new function read_extent_buffer_to_user · 550ac1d8
      Gerhard Heift authored
      This new function reads the content of an extent directly to user memory.
      Signed-off-by: default avatarGerhard Heift <Gerhard@Heift.Name>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Acked-by: default avatarDavid Sterba <dsterba@suse.cz>
      550ac1d8
    • Gerhard Heift's avatar
      btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW · 9b6e817d
      Gerhard Heift authored
      If an item in tree_search is too large to be stored in the given buffer, return
      the needed size (including the header).
      Signed-off-by: default avatarGerhard Heift <Gerhard@Heift.Name>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Acked-by: default avatarDavid Sterba <dsterba@suse.cz>
      9b6e817d