1. 26 Feb, 2016 7 commits
  2. 23 Feb, 2016 1 commit
    • Liu Bo's avatar
      Btrfs: fix lockdep deadlock warning due to dev_replace · 73beece9
      Liu Bo authored
      Xfstests btrfs/011 complains about a deadlock warning,
      
      [ 1226.649039] =========================================================
      [ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
      [ 1226.649039] 4.1.0+ #270 Not tainted
      [ 1226.649039] ---------------------------------------------------------
      [ 1226.652955] kswapd0/46 just changed the state of lock:
      [ 1226.652955]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [ 1226.652955]  (&fs_info->dev_replace.lock){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      [ 1226.652955]
      other info that might help us debug this:
      [ 1226.652955] Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
      
      [ 1226.652955]  Possible interrupt unsafe locking scenario:
      
      [ 1226.652955]        CPU0                    CPU1
      [ 1226.652955]        ----                    ----
      [ 1226.652955]   lock(&fs_info->dev_replace.lock);
      [ 1226.652955]                                local_irq_disable();
      [ 1226.652955]                                lock(&delayed_node->mutex);
      [ 1226.652955]                                lock(&found->groups_sem);
      [ 1226.652955]   <Interrupt>
      [ 1226.652955]     lock(&delayed_node->mutex);
      [ 1226.652955]
       *** DEADLOCK ***
      
      Commit 084b6e7c ("btrfs: Fix a lockdep warning when running xfstest.") tried
      to fix a similar one that has the exactly same warning, but with that, we still
      run to this.
      
      The above lock chain comes from
      btrfs_commit_transaction
        ->btrfs_run_delayed_items
          ...
          ->__btrfs_update_delayed_inode
            ...
            ->__btrfs_cow_block
               ...
               ->find_free_extent
                  ->cache_block_group
                    ->load_free_space_cache
                      ->btrfs_readpages
                        ->submit_one_bio
                          ...
                          ->__btrfs_map_block
                            ->btrfs_dev_replace_lock
      
      However, with high memory pressure, tasks which hold dev_replace.lock can
      be interrupted by kswapd and then kswapd is intended to release memory occupied
      by superblock, inodes and dentries, where we may call evict_inode, and it comes
      to
      
      [ 1226.652955]  [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955]  [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
      [ 1226.652955]  [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
      
      delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads
      to a ABBA deadlock.
      
      To fix this, we can use "blocking rwlock" used in the case of extent_buffer, but
      things are simpler here since we only needs read's spinlock to blocking lock.
      
      With this, btrfs/011 no more produces warnings in dmesg.
      Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      73beece9
  3. 18 Feb, 2016 17 commits
  4. 16 Feb, 2016 4 commits
    • Zhao Lei's avatar
      btrfs: reada: Avoid many times of empty loop · 97d5f0e6
      Zhao Lei authored
      We can see following loop(10000 times) in trace_log:
       [   75.416137] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
       [   75.417413] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
       [   75.418611] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
       [   75.419793] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
      
       [   75.421016] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
       [   75.422324] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
       [   75.423661] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
       [   75.424882] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
      
       ...(10000 times)
      
       [  124.101672] ZL_DEBUG: reada_start_machine_dev:730: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
       [  124.102850] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
       [  124.104008] ZL_DEBUG: __readahead_hook:129: pid=771 comm=kworker/u2:3 re->ref_cnt ffff88003741e0c0 1 -> 2
       [  124.105121] ZL_DEBUG: reada_extent_put:524: pid=771 comm=kworker/u2:3 re = ffff88003741e0c0, refcnt = 2 -> 1
      
      Reason:
       If more than one user trigger reada in same extent, the first task
       finished setting of reada data struct and call reada_start_machine()
       to start, and the second task only add a ref_count but have not
       add reada_extctl struct completely, the reada_extent can not finished
       all jobs, and will be selected in __reada_start_machine() for 10000
       times(total times in __reada_start_machine()).
      
      Fix:
       For a reada_extent without job, we don't need to run it, just return
       0 to let caller break.
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      97d5f0e6
    • Zhao Lei's avatar
      btrfs: reada: Add missed segment checking in reada_find_zone · 8e9aa51f
      Zhao Lei authored
      In rechecking zone-in-tree, we still need to check zone include
      our logical address.
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      8e9aa51f
    • Zhao Lei's avatar
      btrfs: reada: reduce additional fs_info->reada_lock in reada_find_zone · c37f49c7
      Zhao Lei authored
      We can avoid additional locking-acquirment and one pair of
      kref_get/put by combine two condition.
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      c37f49c7
    • Zhao Lei's avatar
      btrfs: reada: Fix in-segment calculation for reada · 50378530
      Zhao Lei authored
      reada_zone->end is end pos of segment:
       end = start + cache->key.offset - 1;
      
      So we need to use "<=" in condition to judge is a pos in the
      segment.
      
      The problem happened rearly, because logical pos rarely pointed
      to last 4k of a blockgroup, but we need to fix it to make code
      right in logic.
      Signed-off-by: default avatarZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      50378530
  5. 14 Feb, 2016 11 commits