1. 20 Jan, 2016 4 commits
    • Filipe Manana's avatar
      Btrfs: fix deadlock running delayed iputs at transaction commit time · c2d6cb16
      Filipe Manana authored
      While running a stress test I ran into a deadlock when running the delayed
      iputs at transaction time, which produced the following report and trace:
      
      [  886.399989] =============================================
      [  886.400871] [ INFO: possible recursive locking detected ]
      [  886.401663] 4.4.0-rc6-btrfs-next-18+ #1 Not tainted
      [  886.402384] ---------------------------------------------
      [  886.403182] fio/8277 is trying to acquire lock:
      [  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.403568]
      [  886.403568] but task is already holding lock:
      [  886.403568]  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.403568]
      [  886.403568] other info that might help us debug this:
      [  886.403568]  Possible unsafe locking scenario:
      [  886.403568]
      [  886.403568]        CPU0
      [  886.403568]        ----
      [  886.403568]   lock(&fs_info->delayed_iput_sem);
      [  886.403568]   lock(&fs_info->delayed_iput_sem);
      [  886.403568]
      [  886.403568]  *** DEADLOCK ***
      [  886.403568]
      [  886.403568]  May be due to missing lock nesting notation
      [  886.403568]
      [  886.403568] 3 locks held by fio/8277:
      [  886.403568]  #0:  (sb_writers#11){.+.+.+}, at: [<ffffffff81174c4c>] __sb_start_write+0x5f/0xb0
      [  886.403568]  #1:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffffa054620d>] btrfs_file_write_iter+0x73/0x408 [btrfs]
      [  886.403568]  #2:  (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.403568]
      [  886.403568] stack backtrace:
      [  886.403568] CPU: 6 PID: 8277 Comm: fio Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [  886.403568] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [  886.403568]  0000000000000000 ffff88009f80f770 ffffffff8125d4fd ffffffff82af1fc0
      [  886.403568]  ffff88009f80f830 ffffffff8108e5f9 0000000200000000 ffff88009fd92290
      [  886.403568]  0000000000000000 ffffffff82af1fc0 ffffffff829cfb01 00042b216d008804
      [  886.403568] Call Trace:
      [  886.403568]  [<ffffffff8125d4fd>] dump_stack+0x4e/0x79
      [  886.403568]  [<ffffffff8108e5f9>] __lock_acquire+0xd42/0xf0b
      [  886.403568]  [<ffffffff810c22db>] ? __module_address+0xdf/0x108
      [  886.403568]  [<ffffffff8108eb77>] lock_acquire+0x10d/0x194
      [  886.403568]  [<ffffffff8108eb77>] ? lock_acquire+0x10d/0x194
      [  886.403568]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.489542]  [<ffffffff8148556b>] down_read+0x3e/0x4d
      [  886.489542]  [<ffffffffa0538823>] ? btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.489542]  [<ffffffffa0538823>] btrfs_run_delayed_iputs+0x36/0xbf [btrfs]
      [  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
      [  886.489542]  [<ffffffffa0521d7a>] flush_space+0x435/0x44a [btrfs]
      [  886.489542]  [<ffffffffa052218b>] ? reserve_metadata_bytes+0x26a/0x384 [btrfs]
      [  886.489542]  [<ffffffffa05221ae>] reserve_metadata_bytes+0x28d/0x384 [btrfs]
      [  886.489542]  [<ffffffffa052256c>] ? btrfs_block_rsv_refill+0x58/0x96 [btrfs]
      [  886.489542]  [<ffffffffa0522584>] btrfs_block_rsv_refill+0x70/0x96 [btrfs]
      [  886.489542]  [<ffffffffa053d747>] btrfs_evict_inode+0x394/0x55a [btrfs]
      [  886.489542]  [<ffffffff81188e31>] evict+0xa7/0x15c
      [  886.489542]  [<ffffffff81189878>] iput+0x1d3/0x266
      [  886.489542]  [<ffffffffa053887c>] btrfs_run_delayed_iputs+0x8f/0xbf [btrfs]
      [  886.489542]  [<ffffffffa0533953>] btrfs_commit_transaction+0x8f5/0x96e [btrfs]
      [  886.489542]  [<ffffffff81085096>] ? signal_pending_state+0x31/0x31
      [  886.489542]  [<ffffffffa0521191>] btrfs_alloc_data_chunk_ondemand+0x1d7/0x288 [btrfs]
      [  886.489542]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
      [  886.489542]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
      [  886.489542]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
      [  886.489542]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
      [  886.489542]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
      [  886.489542]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
      [  886.489542]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
      [  886.489542]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
      [  886.489542]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
      [  886.489542]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [ 1081.852335] INFO: task fio:8244 blocked for more than 120 seconds.
      [ 1081.854348]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [ 1081.857560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 1081.863227] fio        D ffff880213f9bb28     0  8244   8240 0x00000000
      [ 1081.868719]  ffff880213f9bb28 00ffffff810fc6b0 ffffffff0000000a ffff88023ed55240
      [ 1081.872499]  ffff880206b5d400 ffff880213f9c000 ffff88020a4d5318 ffff880206b5d400
      [ 1081.876834]  ffffffff00000001 ffff880206b5d400 ffff880213f9bb40 ffffffff81482ba4
      [ 1081.880782] Call Trace:
      [ 1081.881793]  [<ffffffff81482ba4>] schedule+0x7f/0x97
      [ 1081.883340]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
      [ 1081.895525]  [<ffffffff8108d48d>] ? trace_hardirqs_on_caller+0x16/0x1ab
      [ 1081.897419]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
      [ 1081.899251]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
      [ 1081.901063]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
      [ 1081.902365]  [<ffffffff814855bd>] down_write+0x43/0x57
      [ 1081.903846]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1081.906078]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1081.908846]  [<ffffffff8108d461>] ? mark_held_locks+0x56/0x6c
      [ 1081.910409]  [<ffffffffa0521282>] btrfs_check_data_free_space+0x40/0x59 [btrfs]
      [ 1081.912482]  [<ffffffffa05228f5>] btrfs_delalloc_reserve_space+0x1e/0x4e [btrfs]
      [ 1081.914597]  [<ffffffffa053620a>] btrfs_direct_IO+0x10c/0x27e [btrfs]
      [ 1081.919037]  [<ffffffff8111d9a1>] generic_file_direct_write+0xb3/0x128
      [ 1081.920754]  [<ffffffffa05463c3>] btrfs_file_write_iter+0x229/0x408 [btrfs]
      [ 1081.922496]  [<ffffffff8108ae38>] ? __lock_is_held+0x38/0x50
      [ 1081.923922]  [<ffffffff8117279e>] __vfs_write+0x7c/0xa5
      [ 1081.925275]  [<ffffffff81172cda>] vfs_write+0xa0/0xe4
      [ 1081.926584]  [<ffffffff811734cc>] SyS_write+0x50/0x7e
      [ 1081.927968]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      [ 1081.985293] INFO: lockdep is turned off.
      [ 1081.986132] INFO: task fio:8249 blocked for more than 120 seconds.
      [ 1081.987434]       Not tainted 4.4.0-rc6-btrfs-next-18+ #1
      [ 1081.988534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 1081.990147] fio        D ffff880218febbb8     0  8249   8240 0x00000000
      [ 1081.991626]  ffff880218febbb8 00ffffff81486b8e ffff88020000000b ffff88023ed75240
      [ 1081.993258]  ffff8802120a9a00 ffff880218fec000 ffff88020a4d5318 ffff8802120a9a00
      [ 1081.994850]  ffffffff00000001 ffff8802120a9a00 ffff880218febbd0 ffffffff81482ba4
      [ 1081.996485] Call Trace:
      [ 1081.997037]  [<ffffffff81482ba4>] schedule+0x7f/0x97
      [ 1081.998017]  [<ffffffff81485eb5>] rwsem_down_write_failed+0x2d5/0x325
      [ 1081.999241]  [<ffffffff810852a5>] ? finish_wait+0x6d/0x76
      [ 1082.000306]  [<ffffffff81269723>] call_rwsem_down_write_failed+0x13/0x20
      [ 1082.001533]  [<ffffffff81269723>] ? call_rwsem_down_write_failed+0x13/0x20
      [ 1082.002776]  [<ffffffff81089fae>] ? __down_write_nested.isra.0+0x1f/0x21
      [ 1082.003995]  [<ffffffff814855bd>] down_write+0x43/0x57
      [ 1082.005000]  [<ffffffffa05211b0>] ? btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1082.007403]  [<ffffffffa05211b0>] btrfs_alloc_data_chunk_ondemand+0x1f6/0x288 [btrfs]
      [ 1082.008988]  [<ffffffffa0545064>] btrfs_fallocate+0x7c1/0xc2f [btrfs]
      [ 1082.010193]  [<ffffffff8108a1ba>] ? percpu_down_read+0x4e/0x77
      [ 1082.011280]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
      [ 1082.012265]  [<ffffffff81174c4c>] ? __sb_start_write+0x5f/0xb0
      [ 1082.013021]  [<ffffffff811712e4>] vfs_fallocate+0x170/0x1ff
      [ 1082.013738]  [<ffffffff81181ebb>] ioctl_preallocate+0x89/0x9b
      [ 1082.014778]  [<ffffffff811822d7>] do_vfs_ioctl+0x40a/0x4ea
      [ 1082.015778]  [<ffffffff81176ea7>] ? SYSC_newfstat+0x25/0x2e
      [ 1082.016806]  [<ffffffff8118b4de>] ? __fget_light+0x4d/0x71
      [ 1082.017789]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
      [ 1082.018706]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      
      This happens because we can recursively acquire the semaphore
      fs_info->delayed_iput_sem when attempting to allocate space to satisfy
      a file write request as shown in the first trace above - when committing
      a transaction we acquire (down_read) the semaphore before running the
      delayed iputs, and when running a delayed iput() we can end up calling
      an inode's eviction handler, which in turn commits another transaction
      and attempts to acquire (down_read) again the semaphore to run more
      delayed iput operations.
      This results in a deadlock because if a task acquires multiple times a
      semaphore it should invoke down_read_nested() with a different lockdep
      class for each level of recursion.
      
      Fix this by simplifying the implementation and use a mutex instead that
      is acquired by the cleaner kthread before it runs the delayed iputs
      instead of always acquiring a semaphore before delayed references are
      run from anywhere.
      
      Fixes: d7c15171 (btrfs: Fix NO_SPACE bug caused by delayed-iput)
      Cc: stable@vger.kernel.org   # 4.1+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      c2d6cb16
    • Filipe Manana's avatar
      Btrfs: fix typo in log message when starting a balance · fedc0045
      Filipe Manana authored
      The recent change titled "Btrfs: Check metadata redundancy on balance"
      (already in linux-next) left a typo in a message for users:
      metatdata -> metadata.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      fedc0045
    • Chris Mason's avatar
      Merge branch 'misc-for-4.5' of... · 326f7842
      Chris Mason authored
      Merge branch 'misc-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
      326f7842
    • Chris Mason's avatar
      Merge branch 'misc-cleanups-4.5' of... · acc30855
      Chris Mason authored
      Merge branch 'misc-cleanups-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
      acc30855
  2. 19 Jan, 2016 1 commit
  3. 15 Jan, 2016 6 commits
    • Sebastian Andrzej Siewior's avatar
      btrfs: initialize the seq counter in struct btrfs_device · 546bed63
      Sebastian Andrzej Siewior authored
      I managed to trigger this:
      | INFO: trying to register non-static key.
      | the code is fine but needs lockdep annotation.
      | turning off the locking correctness validator.
      | CPU: 1 PID: 781 Comm: systemd-gpt-aut Not tainted 4.4.0-rt2+ #14
      | Hardware name: ARM-Versatile Express
      | [<80307cec>] (dump_stack)
      | [<80070e98>] (__lock_acquire)
      | [<8007184c>] (lock_acquire)
      | [<80287800>] (btrfs_ioctl)
      | [<8012a8d4>] (do_vfs_ioctl)
      | [<8012ac14>] (SyS_ioctl)
      
      so I think that btrfs_device_data_ordered_init() is not invoked behind
      a macro somewhere.
      
      Fixes: 7cc8e58d ("Btrfs: fix unprotected device's variants on 32bits machine")
      Signed-off-by: default avatarSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      546bed63
    • Dan Carpenter's avatar
      Btrfs: clean up an error code in btrfs_init_space_info() · 0dc924c5
      Dan Carpenter authored
      If we return 1 here, then the caller treats it as an error and returns
      -EINVAL.  It causes a static checker warning to treat positive returns
      as an error.
      
      Fixes: 1aba86d6 ('Btrfs: fix easily get into ENOSPC in mixed case')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      0dc924c5
    • Geliang Tang's avatar
      btrfs: fix iterator with update error in backref.c · 8e217858
      Geliang Tang authored
      Fix the following error:
      
      fs/btrfs/backref.c:565:1-20: iterator with update on line 577
      
      Fixes: a7ca4225('btrfs: use list_for_each_entry* in backref.c')
      Signed-off-by: default avatarGeliang Tang <geliangtang@163.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8e217858
    • Tsutomu Itoh's avatar
      Btrfs: fix output of compression message in btrfs_parse_options() · b7c47bbb
      Tsutomu Itoh authored
      The compression message might not be correctly output.
      Fix it.
      
      [[before fix]]
      
      # mount -o compress /dev/sdb3 /test3
      [  996.874264] BTRFS info (device sdb3): disk space caching is enabled
      [  996.874268] BTRFS: has skinny extents
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress-force /dev/sdb3 /test3
      [ 1035.075017] BTRFS info (device sdb3): force zlib compression
      [ 1035.075021] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress /dev/sdb3 /test3
      [ 1053.679092] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      
      [[after fix]]
      
      # mount -o compress /dev/sdb3 /test3
      [  401.021753] BTRFS info (device sdb3): use zlib compression
      [  401.021758] BTRFS info (device sdb3): disk space caching is enabled
      [  401.021760] BTRFS: has skinny extents
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress-force /dev/sdb3 /test3
      [  439.824624] BTRFS info (device sdb3): force zlib compression
      [  439.824629] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress-force=zlib,space_cache,subvolid=5,subvol=/)
      
      # mount -o remount,compress /dev/sdb3 /test3
      [  459.918430] BTRFS info (device sdb3): use zlib compression
      [  459.918434] BTRFS info (device sdb3): disk space caching is enabled
      # mount | grep /test3
      /dev/sdb3 on /test3 type btrfs (rw,relatime,compress=zlib,space_cache,subvolid=5,subvol=/)
      Signed-off-by: default avatarTsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      b7c47bbb
    • Chandan Rajendra's avatar
      Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots · f32e48e9
      Chandan Rajendra authored
      The following call trace is seen when btrfs/031 test is executed in a loop,
      
      [  158.661848] ------------[ cut here ]------------
      [  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
      [  158.664102] BTRFS: Transaction aborted (error -2)
      [  158.664774] Modules linked in:
      [  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711af #2
      [  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
      [  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
      [  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
      [  158.670769] Call Trace:
      [  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
      [  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
      [  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
      [  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
      [  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
      [  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
      [  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
      [  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
      [  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
      [  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
      [  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
      [  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
      [  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
      [  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
      [  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
      [  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
      [  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
      [  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
      [  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
      [  158.709508] BTRFS info (device loop0): forced readonly
      [  158.737113] BTRFS info (device loop0): disk space caching is enabled
      [  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
      [  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30
      
      This occurs because,
      
      Mount filesystem
      Create subvol with ID 257
      Unmount filesystem
      Mount filesystem
      Delete subvol with ID 257
        btrfs_drop_snapshot()
          Add root corresponding to subvol 257 into
          btrfs_transaction->dropped_roots list
      Create new subvol (i.e. create_subvol())
        257 is returned as the next free objectid
        btrfs_read_fs_root_no_name()
          Finds the btrfs_root instance corresponding to the old subvol with ID 257
          in btrfs_fs_info->fs_roots_radix.
          Returns error since btrfs_root_item->refs has the value of 0.
      
      To fix the issue the commit initializes tree root's and subvolume root's
      highest_objectid when loading the roots from disk.
      Signed-off-by: default avatarChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      f32e48e9
    • Jeff Mahoney's avatar
      btrfs: cleanup, stop casting for extent_map->lookup everywhere · 95617d69
      Jeff Mahoney authored
      Overloading extent_map->bdev to struct map_lookup * might have started out
      as a means to an end, but it's a pattern that's used all over the place
      now. Let's get rid of the casting and just add a union instead.
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      95617d69
  4. 11 Jan, 2016 3 commits
  5. 07 Jan, 2016 26 commits