1. 27 Mar, 2012 1 commit
    • Dave Chinner's avatar
      xfs: fix fstrim offset calculations · a66d6363
      Dave Chinner authored
      xfs_ioc_fstrim() doesn't treat the incoming offset and length
      correctly. It treats them as a filesystem block address, rather than
      a disk address. This is wrong because the range passed in is a
      linear representation, while the filesystem block address notation
      is a sparse representation. Hence we cannot convert the range direct
      to filesystem block units and then use that for calculating the
      range to trim.
      
      While this sounds dangerous, the problem is limited to calculating
      what AGs need to be trimmed. The code that calcuates the actual
      ranges to trim gets the right result (i.e. only ever discards free
      space), even though it uses the wrong ranges to limit what is
      trimmed. Hence this is not a bug that endangers user data.
      
      Fix this by treating the range as a disk address range and use the
      appropriate functions to convert the range into the desired formats
      for calculations.
      
      Further, fix the first free extent lookup (the longest) to actually
      find the largest free extent. Currently this lookup uses a <=
      lookup, which results in finding the extent to the left of the
      largest because we can never get an exact match on the largest
      extent. This is due to the fact that while we know it's size, we
      don't know it's location and so the exact match fails and we move
      one record to the left to get the next largest extent. Instead, use
      a >= search so that the lookup returns the largest extent regardless
      of the fact we don't get an exact match on it.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      a66d6363
  2. 26 Mar, 2012 3 commits
    • Dave Chinner's avatar
      xfs: Account log unmount transaction correctly · 3948659e
      Dave Chinner authored
      There have been a few reports of this warning appearing recently:
      
      XFS (dm-4): xlog_space_left: head behind tail
       tail_cycle = 129, tail_bytes = 20163072
       GH   cycle = 129, GH   bytes = 20162880
      
      The common cause appears to be lots of freeze and unfreeze cycles,
      and the output from the warnings indicates that we are leaking
      around 8 bytes of log space per freeze/unfreeze cycle.
      
      When we freeze the filesystem, we write an unmount record and that
      uses xlog_write directly - a special type of transaction,
      effectively. What it doesn't do, however, is correctly account for
      the log space it uses. The unmount record writes an 8 byte structure
      with a special magic number into the log, and the space this
      consumes is not accounted for in the log ticket tracking the
      operation. Hence we leak 8 bytes every unmount record that is
      written.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      3948659e
    • Dave Chinner's avatar
      xfs: don't cache inodes read through bulkstat · 5132ba8f
      Dave Chinner authored
      When we read inodes via bulkstat, we generally only read them once
      and then throw them away - they never get used again. If we retain
      them in cache, then it simply causes the working set of inodes and
      other cached items to be reclaimed just so the inode cache can grow.
      
      Avoid this problem by marking inodes read by bulkstat not to be
      cached and check this flag in .drop_inode to determine whether the
      inode should be added to the VFS LRU or not. If the inode lookup
      hits an already cached inode, then don't set the flag. If the inode
      lookup hits an inode marked with no cache flag, remove the flag and
      allow it to be cached once the current reference goes away.
      
      Inodes marked as not cached will get cleaned up by the background
      inode reclaim or via memory pressure, so they will still generate
      some short term cache pressure. They will, however, be reclaimed
      much sooner and in preference to cache hot inodes.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      5132ba8f
    • Christoph Hellwig's avatar
      xfs: trace xfs_name strings correctly · f6161375
      Christoph Hellwig authored
      Strings store in an xfs_name structure are often not NUL terminated,
      print them using the correct printf specifiers that make use of the
      string length store in the xfs_name structure.
      Reported-by: default avatarBrian Candler <B.Candler@pobox.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      f6161375
  3. 22 Mar, 2012 4 commits
    • Dave Chinner's avatar
      xfs: introduce an allocation workqueue · c999a223
      Dave Chinner authored
      We currently have significant issues with the amount of stack that
      allocation in XFS uses, especially in the writeback path. We can
      easily consume 4k of stack between mapping the page, manipulating
      the bmap btree and allocating blocks from the free list. Not to
      mention btree block readahead and other functionality that issues IO
      in the allocation path.
      
      As a result, we can no longer fit allocation in the writeback path
      in the stack space provided on x86_64. To alleviate this problem,
      introduce an allocation workqueue and move all allocations to a
      seperate context. This can be easily added as an interposing layer
      into xfs_alloc_vextent(), which takes a single argument structure
      and does not return until the allocation is complete or has failed.
      
      To do this, add a work structure and a completion to the allocation
      args structure. This allows xfs_alloc_vextent to queue the args onto
      the workqueue and wait for it to be completed by the worker. This
      can be done completely transparently to the caller.
      
      The worker function needs to ensure that it sets and clears the
      PF_TRANS flag appropriately as it is being run in an active
      transaction context. Work can also be queued in a memory reclaim
      context, so a rescuer is needed for the workqueue.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      c999a223
    • Dave Chinner's avatar
      xfs: Fix open flag handling in open_by_handle code · 1a1d7724
      Dave Chinner authored
      Sparse identified some unsafe handling of open flags in the xfs open
      by handle ioctl code. Update the code to use the correct access
      macros to ensure that we handle the open flags correctly.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarMark Tinguely <tinguely@sgi.com>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      1a1d7724
    • Kamal Dasu's avatar
      xfs: fix deadlock in xfs_rtfree_extent · 5575acc7
      Kamal Dasu authored
      To fix the deadlock caused by repeatedly calling xfs_rtfree_extent
      
       - removed xfs_ilock() and xfs_trans_ijoin() from xfs_rtfree_extent(),
         instead added asserts that the inode is locked and has an inode_item
         attached to it.
       - in xfs_bunmapi() when dealing with an inode with the rt flag
         call xfs_ilock() and xfs_trans_ijoin() so that the
         reference count is bumped on the inode and attached it to the
         transaction before calling into xfs_bmap_del_extent, similar to
         what we do in xfs_bmap_rtalloc.
      Signed-off-by: default avatarKamal Dasu <kdasu.kdev@gmail.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@infradead.org>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      5575acc7
    • Gerard Snitselaar's avatar
      fs: xfs: fix section mismatch in linux-next · 1c2ccc66
      Gerard Snitselaar authored
      xfs_qm_exit() is called in init_xfs_fs().
      Signed-off-by: default avatarGerard Snitselaar <dev@snitselaar.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      1c2ccc66
  4. 15 Mar, 2012 4 commits
  5. 14 Mar, 2012 6 commits
  6. 13 Mar, 2012 5 commits
  7. 05 Mar, 2012 3 commits
  8. 29 Feb, 2012 3 commits
  9. 25 Feb, 2012 1 commit
    • Alex Elder's avatar
      xfs: only take the ILOCK in xfs_reclaim_inode() · ad637a10
      Alex Elder authored
      At the end of xfs_reclaim_inode(), the inode is locked in order to
      we wait for a possible concurrent lookup to complete before the
      inode is freed.  This synchronization step was taking both the ILOCK
      and the IOLOCK, but the latter was causing lockdep to produce
      reports of the possibility of deadlock.
      
      It turns out that there's no need to acquire the IOLOCK at this
      point anyway.  It may have been required in some earlier version of
      the code, but there should be no need to take the IOLOCK in
      xfs_iget(), so there's no (longer) any need to get it here for
      synchronization.  Add an assertion in xfs_iget() as a reminder
      of this assumption.
      
      Dave Chinner diagnosed this on IRC, and Christoph Hellwig suggested
      no longer including the IOLOCK.  I just put together the patch.
      Signed-off-by: default avatarAlex Elder <elder@dreamhost.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarBen Myers <bpm@sgi.com>
      ad637a10
  10. 23 Feb, 2012 10 commits