1. 08 Jul, 2011 26 commits
  2. 06 Jul, 2011 1 commit
    • Dave Chinner's avatar
      xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Dave Chinner authored
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with the writing the underlying buffer back to disk.
      The was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return a LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, the returning of the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet is that insertion is to a position -before- the commit_lsn so it
      is doing a AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAlex Elder <aelder@sgi.com>
      1316d4da
  3. 24 Jun, 2011 3 commits
    • Dave Chinner's avatar
      xfs: prevent bogus assert when trying to remove non-existent attribute · 4a338212
      Dave Chinner authored
      If the attribute fork on an inode is in btree format and has
      multiple levels (i.e node format rather than leaf format), then a
      lookup failure will trigger an assert failure in xfs_da_path_shift
      if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to
      indicate to the directory btree code that not finding an entry is
      not a fatal error. In the case of doing a lookup for a directory
      name removal, this is valid as a user cannot insert an arbitrary
      name to remove from the directory btree.
      
      However, in the case of the attribute tree, a user has direct
      control over the attribute name and can ask for any random name to
      be removed without any validation. In this case, fsstress is asking
      for a non-existent user.selinux attribute to be removed, and that is
      causing xfs_da_path_shift() to fall off the bottom of the tree where
      it asserts that a lookup failure is allowed. Because the flag is not
      set, we die a horrible death on a debug enable kernel.
      
      Prevent this assert from firing on attribute removes by adding the
      op_flag XFS_DA_OP_OKNOENT to atribute removal operations.
      
      Discovered when testing on a SELinux enabled system by fsstress in
      test 070 by trying to remove a non-existent user.selinux attribute.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAlex Elder <aelder@sgi.com>
      4a338212
    • Dave Chinner's avatar
      xfs: clear XFS_IDIRTY_RELEASE on truncate down · df4368a1
      Dave Chinner authored
      When an inode is truncated down, speculative preallocation is
      removed from the inode. This should also reset the state bits for
      controlling whether preallocation is subsequently removed when the
      file is next closed. The flag is not being cleared, so repeated
      operations on a file that first involve a truncate (e.g. multiple
      repeated dd invocations on a file) give different file layouts for
      the second and subsequent invocations.
      
      Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the
      XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure
      that speculative delalloc is removed on files that have been
      truncated down.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAlex Elder <aelder@sgi.com>
      df4368a1
    • Dave Chinner's avatar
      xfs: reset inode per-lifetime state when recycling it · 778e24bb
      Dave Chinner authored
      XFS inodes has several per-lifetime state fields that determine the
      behaviour of the inode. These state fields are not all reset when an
      inode is reused from the reclaimable state.
      
      This can lead to unexpected behaviour of the new inode such as
      speculative preallocation not being truncated away in the expected
      manner for local files until the inode is subsequently truncated,
      freed or cycles out of the cache. It can also lead to an inode being
      considered to be a filestream inode or having been truncated when
      that is not the case.
      
      Rework the reinitialisation of the inode when it is recycled to
      ensure that it is pristine before it is reused. While there, also
      fix the resetting of state flags in the recycling error paths so the
      inode does not become unreclaimable.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarAlex Elder <aelder@sgi.com>
      778e24bb
  4. 16 Jun, 2011 1 commit
    • Christoph Hellwig's avatar
      xfs: make log devices with write back caches work · a27a263b
      Christoph Hellwig authored
      There's no reason not to support cache flushing on external log devices.
      The only thing this really requires is flushing the data device first
      both in fsync and log commits.  A side effect is that we also have to
      remove the barrier write test during mount, which has been superflous
      since the new FLUSH+FUA code anyway.  Also use the chance to flush the
      RT subvolume write cache before the fsync commit, which is required
      for correct semantics.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAlex Elder <aelder@sgi.com>
      a27a263b
  5. 14 Jun, 2011 1 commit
  6. 06 Jun, 2011 5 commits
  7. 04 Jun, 2011 3 commits