1. 02 Jul, 2014 29 commits
  2. 27 Jun, 2014 11 commits
    • Goldwyn Rodrigues's avatar
      ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock · fc597a30
      Goldwyn Rodrigues authored
      commit 8ed6b237 upstream.
      
      The following patches are reverted in this patch because these patches
      caused performance regression in the remote unlink() calls.
      
        ea455f8a - ocfs2: Push out dropping of dentry lock to ocfs2_wq
        f7b1aa69 - ocfs2: Fix deadlock on umount
        5fd13189 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
      
      Previous patches in this series removed the possible deadlocks from
      downconvert thread so the above patches shouldn't be needed anymore.
      
      The regression is caused because these patches delay the iput() in case
      of dentry unlocks.  This also delays the unlocking of the open lockres.
      The open lockresource is required to test if the inode can be wiped from
      disk or not.  When the deleting node does not get the open lock, it
      marks it as orphan (even though it is not in use by another
      node/process) and causes a journal checkpoint.  This delays operations
      following the inode eviction.  This also moves the inode to the orphaned
      inode which further causes more I/O and a lot of unneccessary orphans.
      
      The following script can be used to generate the load causing issues:
      
        declare -a create
        declare -a remove
        declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
        unique="`mktemp -u XXXXX`"
        script="/tmp/idontknow-${unique}.sh"
        cat <<EOF > "${script}"
        for n in {1..8}; do mkdir -p test/dir\${n}
          eval touch test/dir\${n}/foo{1.."\$1"}
        done
        EOF
        chmod 700 "${script}"
      
        function fcreate ()
        {
          exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
        }
      
        function fremove ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
        }
      
        function fcp ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
        }
      
        echo -------------------------------------------------
        echo "| # files | create #s | copy #s | remove #s |"
        echo -------------------------------------------------
        for ((x=0; x < ${#iterations[*]} ; x++)) do
          create[$x]="`fcreate ${iterations[$x]}`"
          copy[$x]="`fcp ${iterations[$x]}`"
          remove[$x]="`fremove`"
          printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
        done
        rm "${script}"
        echo "------------------------"
      Signed-off-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: default avatarGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fc597a30
    • Jan Kara's avatar
      ocfs2: avoid blocking in ocfs2_mark_lockres_freeing() in downconvert thread · 8f265718
      Jan Kara authored
      commit 84d86f83 upstream.
      
      If we are dropping last inode reference from downconvert thread, we will
      end up calling ocfs2_mark_lockres_freeing() which can block if the lock
      we are freeing is queued thus creating an A-A deadlock.  Luckily, since
      we are the downconvert thread, we can immediately dequeue the lock and
      thus avoid waiting in this case.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8f265718
    • Jan Kara's avatar
      ocfs2: implement delayed dropping of last dquot reference · cac99bee
      Jan Kara authored
      commit e3a767b6 upstream.
      
      We cannot drop last dquot reference from downconvert thread as that
      creates the following deadlock:
      
      NODE 1                                  NODE2
      holds dentry lock for 'foo'
      holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                              dquot_initialize(bar)
                                                ocfs2_dquot_acquire()
                                                  ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                                  ...
      downconvert thread (triggered from another
      node or a different process from NODE2)
        ocfs2_dentry_post_unlock()
          ...
          iput(foo)
            ocfs2_evict_inode(foo)
              ocfs2_clear_inode(foo)
                dquot_drop(inode)
                  ...
      	    ocfs2_dquot_release()
                    ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                     - blocks
                                                  finds we need more space in
                                                  quota file
                                                  ...
                                                  ocfs2_extend_no_holes()
                                                    ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                      - deadlocks waiting for
                                                        downconvert thread
      
      We solve the problem by postponing dropping of the last dquot reference to
      a workqueue if it happens from the downconvert thread.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cac99bee
    • Jan Kara's avatar
      quota: provide function to grab quota structure reference · b5258061
      Jan Kara authored
      commit 9f985cb6 upstream.
      
      Provide dqgrab() function to get quota structure reference when we are
      sure it already has at least one active reference.  Make use of this
      function inside quota code.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b5258061
    • Jan Kara's avatar
      ocfs2: move dquot_initialize() in ocfs2_delete_inode() somewhat later · 2eb0658f
      Jan Kara authored
      commit bd62ad7a upstream.
      
      Move dquot_initalize() call in ocfs2_delete_inode() after the moment we
      verify inode is actually a sane one to delete.  We certainly don't want
      to initialize quota for system inodes etc.  This also avoids calling
      into quota code from downconvert thread.
      
      Add more details into the comment why bailing out from
      ocfs2_delete_inode() when we are in downconvert thread is OK.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Reviewed-by: default avatarSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2eb0658f
    • Lidong Zhong's avatar
      dlm: keep listening connection alive with sctp mode · 34b6c049
      Lidong Zhong authored
      commit 883854c5 upstream.
      
      The connection struct with nodeid 0 is the listening socket,
      not a connection to another node.  The sctp resend function
      was not checking that the nodeid was valid (non-zero), so it
      would mistakenly get and resend on the listening connection
      when nodeid was zero.
      Signed-off-by: default avatarLidong Zhong <lzhong@suse.com>
      Signed-off-by: default avatarDavid Teigland <teigland@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      34b6c049
    • Miao Xie's avatar
      Btrfs: fix BUG_ON() casued by the reserved space migration · 9bf37c05
      Miao Xie authored
      commit 20dd2cbf upstream.
      
      When we did space balance and snapshot creation at the same time, we might
      meet the following oops:
       kernel BUG at fs/btrfs/inode.c:3038!
       [SNIP]
       Call Trace:
       [<ffffffffa0411ec7>] btrfs_orphan_cleanup+0x293/0x407 [btrfs]
       [<ffffffffa042dc45>] btrfs_mksubvol.isra.28+0x259/0x373 [btrfs]
       [<ffffffffa042de85>] btrfs_ioctl_snap_create_transid+0x126/0x156 [btrfs]
       [<ffffffffa042dff1>] btrfs_ioctl_snap_create_v2+0xd0/0x121 [btrfs]
       [<ffffffffa0430b2c>] btrfs_ioctl+0x414/0x1854 [btrfs]
       [<ffffffff813b60b7>] ? __do_page_fault+0x305/0x379
       [<ffffffff811215a9>] vfs_ioctl+0x1d/0x39
       [<ffffffff81121d7c>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff81057fe7>] ? finish_task_switch+0x80/0xb8
       [<ffffffff81121e88>] SyS_ioctl+0x57/0x83
       [<ffffffff813b39ff>] ? do_device_not_available+0x12/0x14
       [<ffffffff813b99c2>] system_call_fastpath+0x16/0x1b
       [SNIP]
       RIP  [<ffffffffa040da40>] btrfs_orphan_add+0xc3/0x126 [btrfs]
      
      The reason of the problem is that the relocation root creation stole
      the reserved space, which was reserved for orphan item deletion.
      
      There are several ways to fix this problem, one is to increasing
      the reserved space size of the space balace, and then we can use
      that space to create the relocation tree for each fs/file trees.
      But it is hard to calculate the suitable size because we doesn't
      know how many fs/file trees we need relocate.
      
      We fixed this problem by reserving the space for relocation root creation
      actively since the space it need is very small (one tree block, used for
      root node copy), then we use that reserved space to create the
      relocation tree. If we don't reserve space for relocation tree creation,
      we will use the reserved space of the balance.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9bf37c05
    • Josef Bacik's avatar
      Btrfs: fix two use-after-free bugs with transaction cleanup · 8b16b61c
      Josef Bacik authored
      commit 724e2315 upstream.
      
      I was noticing the slab redzone stuff going off every once and a while during
      transaction aborts.  This was caused by two things
      
      1) We would walk the pending snapshots and set their error to -ECANCELED.  We
      don't need to do this, the snapshot stuff waits for a transaction commit and if
      there is a problem we just free our pending snapshot object and exit.  Doing
      this was causing us to touch the pending snapshot object after the thing had
      already been freed.
      
      2) We were freeing the transaction manually with wanton disregard for it's
      use_count reference counter.  To fix this I cleaned up the transaction freeing
      loop to either wait for the transaction commit to finish if it was in the middle
      of that (since it will be cleaned and freed up there) or to do the cleanup
      oursevles.
      
      I also moved the global "kill all things dirty everywhere" stuff outside of the
      transaction cleanup loop since that only needs to be done once.  With this patch
      I'm no longer seeing slab corruption because of use after frees.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8b16b61c
    • Josef Bacik's avatar
      Btrfs: don't delete ordered roots from list during cleanup · 445d1c3a
      Josef Bacik authored
      commit 1de2cfde upstream.
      
      During transaction cleanup after an abort we are just removing roots from the
      ordered roots list which is incorrect.  We have a BUG_ON() to make sure that the
      root is still part of the ordered roots list when we put our ordered extent
      which we were tripping in this case.  So do like we do everywhere else and just
      move it to the tail of the ordered roots list and allow the normal cleanup to
      take care of stuff.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      445d1c3a
    • Josef Bacik's avatar
      Btrfs: cleanup transaction on abort · bd32872f
      Josef Bacik authored
      commit 4e121c06 upstream.
      
      If we abort not during a transaction commit we won't clean up anything until we
      unmount.  Unfortunately if we abort in the middle of writing out an ordered
      extent we won't clean it up and if somebody is waiting on that ordered extent
      they will wait forever.  To fix this just make the transaction kthread call the
      cleanup transaction stuff if it notices theres an error, and make
      btrfs_end_transaction wake up the transaction kthread if there is an error.
      Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bd32872f
    • Josef Bacik's avatar
      Btrfs: do not release metadata for space cache inodes · 4b6d66d1
      Josef Bacik authored
      commit b6d08f06 upstream.
      
      I've been testing our error paths and I was tripping the BUG_ON() in
      drop_outstanding_extent because our outstanding_extents is 0 for space cache
      inodes.  This is because we don't reserve metadata space for these inodes since
      we depend on the global block reserve for our space.  To fix this we need to
      make sure the DO_ACCOUNTING stuff doesn't actually call release_metadata for
      space cache inodes.  With this patch I'm no longer panicing.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4b6d66d1