1. 12 Mar, 2011 1 commit
      Btrfs: break out of shrink_delalloc earlier · 36e39c40
      Chris Mason authored
      Josef had changed shrink_delalloc to exit after three shrink
      attempts, which wasn't quite enough because new writers could
      race in and steal free space.
      
      But it also fixed deadlocks and stalls as we tried to recover
      delalloc reservations.  The code was tweaked to loop 1024
      times, and would reset the counter any time a small amount
      of progress was made.  This was too drastic, and with a
      lot of writers we can end up stuck in shrink_delalloc forever.
      
      The shrink_delalloc loop is fairly complex because the caller is looping
      too, and the caller will go ahead and force a transaction commit to make
      sure we reclaim space.
      
      This reworks things to exit shrink_delalloc when we've forced some
      writeback and the delalloc reservations have gone down.  This means
      the writeback has not just started but has also finished at
      least some of the metadata changes required to reclaim delalloc
      space.
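
The exit condition described above can be modeled in userspace. This is a minimal sketch, not the btrfs code: `space_model`, `force_writeback` and `shrink_model` are hypothetical names, and the per-pass numbers are invented to show the idea of leaving the loop only once writeback was forced and the reservation actually dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for filesystem state; not the btrfs structures. */
struct space_model {
    uint64_t delalloc_reserved; /* bytes currently reserved for delalloc */
    uint64_t reclaim_per_pass;  /* bytes one writeback pass can free */
    uint64_t stolen_per_pass;   /* bytes new writers reserve during a pass */
};

/* Simulates one round of forced writeback; returns the bytes written back. */
static uint64_t force_writeback(struct space_model *m)
{
    uint64_t freed = m->reclaim_per_pass;

    if (freed > m->delalloc_reserved)
        freed = m->delalloc_reserved;
    m->delalloc_reserved -= freed;
    /* Racing writers may take the space right back. */
    m->delalloc_reserved += m->stolen_per_pass;
    return freed;
}

/*
 * Returns true as soon as a pass both forced some writeback and left the
 * reservation lower than before.  Gives up after max_passes so the caller
 * (which loops too) can decide what to do next.
 */
static bool shrink_model(struct space_model *m, int max_passes, int *passes_used)
{
    assert(m != NULL && passes_used != NULL);
    *passes_used = 0;

    for (int pass = 1; pass <= max_passes; pass++) {
        uint64_t before = m->delalloc_reserved;
        uint64_t written = force_writeback(m);

        *passes_used = pass;
        if (written > 0 && m->delalloc_reserved < before)
            return true;
    }
    return false;
}
```

With no racing writers the model exits after a single pass. When writers steal everything that was freed, it gives up after `max_passes` instead of spinning.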
      
      If we've got this wrong, we're returning ENOSPC too early, which
      is a big improvement over the current behavior of hanging the machine.
      
      Test 224 in xfstests hammers on this nicely, and with 1000 writers
      trying to fill a 1GB drive we get our first ENOSPC at 93% full.  The
      other writers are able to continue until we get 100%.
      
      This is a worst case test for btrfs because the 1000 writers are doing
      small IO, and the small FS size means we don't have a lot of room
      for metadata chunks.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  2. 10 Mar, 2011 2 commits
  3. 08 Mar, 2011 1 commit
  4. 07 Mar, 2011 2 commits
      Btrfs: deal with short returns from copy_from_user · 31339acd
      Chris Mason authored
      When copy_from_user is only able to copy some of the bytes we requested,
      we may end up creating a partially up to date page.  To avoid garbage in
      the page, we need to treat a partial copy as a zero length copy.
      
      This makes the rest of the file_write code drop the page and
      retry the whole copy instead of marking the partially up to
      date page as dirty.
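
The "partial copy counts as zero" rule can be sketched in plain C. This is an illustrative model only: `model_copy` is a hypothetical stand-in for copy_from_user that stops after `fault_at` bytes, and `copy_all_or_nothing` is an invented name for the wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user: copies at most fault_at bytes
 * and returns how many bytes were actually copied.
 */
static size_t model_copy(char *dst, const char *src, size_t len, size_t fault_at)
{
    size_t n = len < fault_at ? len : fault_at;

    memcpy(dst, src, n);
    return n;
}

/*
 * Reports either the full length or zero.  A short copy is reported as
 * zero so the caller drops the page and retries the whole copy rather
 * than dirtying a partially filled page.
 */
static size_t copy_all_or_nothing(char *page, const char *src, size_t len,
                                  size_t fault_at)
{
    size_t copied;

    assert(page != NULL && src != NULL);
    copied = model_copy(page, src, len, fault_at);
    if (copied < len)
        return 0;
    return copied;
}
```

A caller that sees zero treats the page as untouched, which is what keeps garbage out of the file in this model.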
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      cc: stable@kernel.org
      Btrfs: fix regressions in copy_from_user handling · b1bf862e
      Chris Mason authored
      Commit 914ee295 fixed deadlocks in
      btrfs_file_write where we would catch page faults on pages we had
      locked.
      
      But, there were a few problems:
      
      1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy
      data when the amount to copy is more than 4K and the offset to start
      copying from is not page aligned.  The result was btrfs_file_write
      looping forever retrying the iov_iter_copy_from_user_atomic call.
      
      We deal with this by changing btrfs_file_write to drop down to single
      page copies when iov_iter_copy_from_user_atomic starts returning failure.
      
      2) The btrfs_file_write code was leaking delalloc reservations when
      iov_iter_copy_from_user_atomic returned zero.  The looping above would
      result in the entire filesystem running out of delalloc reservations and
      constantly trying to flush things to disk.
      
      3) btrfs_file_write will lock down page cache pages, make sure
      any writeback is finished, do the copy_from_user and then release them.
      Before the loop runs we check the first and last pages in the write to
      see if they are only being partially modified.  If the start or end of
      the write isn't aligned, we make sure the corresponding pages are
      up to date so that we don't introduce garbage into the file.
      
      With the copy_from_user changes, we're allowing the VM to reclaim the
      pages after a partial update from copy_from_user, but we're not
      making sure the page cache page is up to date when we loop around to
      resume the write.
      
      We deal with this by pushing the up to date checks down into the page
      prep code.  This fits better with how the rest of file_write works.
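
The fallback from problem 1 can be modeled as follows. This is a sketch under stated assumptions, not the btrfs_file_write code: `model_atomic_copy` is a hypothetical copy routine that fails in exactly the case described (more than one page, unaligned start), and `model_write` is an invented loop that switches to single-page copies after the first failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical atomic copy: fails (returns 0) when asked for more than
 * one page starting at an unaligned offset, otherwise copies everything.
 */
static size_t model_atomic_copy(size_t offset, size_t len)
{
    if (len > MODEL_PAGE_SIZE && (offset % MODEL_PAGE_SIZE) != 0)
        return 0;
    return len;
}

/*
 * Writes total bytes starting at pos in chunks of up to batch bytes.
 * After the first failed copy, chunks are limited to the current page.
 * Returns the number of copy calls made, or -1 if no progress is possible.
 */
static int model_write(size_t pos, size_t total, size_t batch)
{
    int calls = 0;
    bool single_page = false;

    assert(batch > 0);
    while (total > 0) {
        size_t chunk = total < batch ? total : batch;
        size_t copied;

        if (single_page) {
            size_t room = MODEL_PAGE_SIZE - (pos % MODEL_PAGE_SIZE);

            if (chunk > room)
                chunk = room;
        }
        copied = model_atomic_copy(pos, chunk);
        calls++;
        if (copied == 0) {
            if (single_page)
                return -1;      /* stuck even one page at a time */
            single_page = true; /* retry with smaller copies */
            continue;
        }
        pos += copied;
        total -= copied;
    }
    return calls;
}
```

An unaligned 10000-byte write costs one failed call plus three single-page copies in this model, instead of retrying the same failing copy forever.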
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
      cc: stable@kernel.org
  5. 23 Feb, 2011 1 commit
      Btrfs: fix fiemap bugs with delalloc · ec29ed5b
      Chris Mason authored
      The Btrfs fiemap code wasn't properly returning delalloc extents,
      so applications that trust fiemap to decide if there are holes in the
      file see holes instead of delalloc.
      
      This reworks the btrfs fiemap code, adding a get_extent helper that
      searches for delalloc ranges and also adding a helper for extent_fiemap
      that skips past holes in the file.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  6. 16 Feb, 2011 6 commits
  7. 14 Feb, 2011 6 commits
      Btrfs: check return value of alloc_extent_map() · c26a9203
      Tsutomu Itoh authored
      This adds checks on the return value of alloc_extent_map() in several places.
      alloc_extent_map() returns either a valid address or NULL, never an error
      pointer, so the IS_ERR() checks are unnecessary and are removed.
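
Why a NULL check is needed can be shown with a userspace model. This is a sketch only: `model_is_err` imitates the range test the kernel's IS_ERR() performs, and `alloc_model` is a hypothetical allocator that, like alloc_extent_map() as described in this commit, returns either a pointer or NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace model of the IS_ERR() test: true only for the top error range. */
static bool model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

struct extent_map_model {
    uint64_t start;
    uint64_t len;
};

/* Hypothetical allocator: returns a valid pointer or NULL, never an error pointer. */
static struct extent_map_model *alloc_model(bool fail)
{
    if (fail)
        return NULL;
    return calloc(1, sizeof(struct extent_map_model));
}

/* Returns 0 on success or -ENOMEM when the allocation fails. */
static int use_map(bool fail)
{
    struct extent_map_model *em = alloc_model(fail);

    if (!em)                    /* the check that matters */
        return -ENOMEM;
    assert(!model_is_err(em));  /* an IS_ERR-style test never fires here */
    em->start = 0;
    em->len = 4096;
    free(em);
    return 0;
}
```

Because NULL is outside the error-pointer range, an IS_ERR-style test alone would let a failed allocation through.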
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Btrfs - Fix memory leak in btrfs_init_new_device() · 67100f25
      Ilya Dryomov authored
      Memory allocated by calling kstrdup() should be freed.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      btrfs: prevent heap corruption in btrfs_ioctl_space_info() · 51788b1b
      Dan Rosenberg authored
      Commit bf5fc093 refactored
      btrfs_ioctl_space_info() and introduced several security issues.
      
      space_args.space_slots is an unsigned 64-bit type controlled by a
      possibly unprivileged caller.  The comparison as a signed int type
      allows providing values that are treated as negative and cause the
      subsequent allocation size calculation to wrap, or be truncated to 0.
      By providing a size that's truncated to 0, kmalloc() will return
      ZERO_SIZE_PTR.  It's also possible to provide a value smaller than the
      slot count.  The subsequent loop ignores the allocation size when
      copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.
      
      The fix changes the slot count type and comparison typecast to u64,
      which prevents truncation or signedness errors, and also ensures that we
      don't copy more data than we've allocated in the subsequent loop.  Note
      that zero-size allocations are no longer possible since there is already
      an explicit check for space_args.space_slots being 0 and truncation of
      this value is no longer an issue.
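
The two parts of the fix can be illustrated in userspace. This is a sketch, not the ioctl code: `clamp_slots_signed`, `clamp_slots_u64` and `fill_slots` are hypothetical helpers. The signed variant shows the shape of the problem (casting a 64-bit value to int is implementation-defined and typically truncates), and the u64 variant and bounded copy show the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Problem shape: the 64-bit request is compared as a signed int.  On
 * common compilers a large value truncates, possibly to 0 or a negative
 * number, and that result is then used to size the allocation.
 */
static int clamp_slots_signed(uint64_t requested, int available)
{
    int req = (int)requested;

    return req < available ? req : available;
}

/* Fix shape: keep everything in u64 so nothing truncates or goes negative. */
static uint64_t clamp_slots_u64(uint64_t requested, uint64_t available)
{
    return requested < available ? requested : available;
}

/*
 * Copies at most slot_count entries into dest, even when the source has
 * more, so the copy never runs past what was allocated.
 */
static uint64_t fill_slots(uint64_t *dest, uint64_t slot_count,
                           const uint64_t *src, uint64_t src_count)
{
    uint64_t copied = 0;

    assert(dest != NULL && src != NULL);
    for (uint64_t i = 0; i < src_count; i++) {
        if (copied == slot_count)
            break;
        dest[copied++] = src[i];
    }
    return copied;
}
```

With the u64 clamp, a request of `UINT64_MAX` or `1ULL << 32` is simply limited to the available slot count.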
      Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Btrfs: Fix balance panic · 6848ad64
      Yan, Zheng authored
      Mark the cloned backref_node as checked in clone_backref_node()
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Btrfs: don't release pages when we can't clear the uptodate bits · e3f24cc5
      Chris Mason authored
      Btrfs tracks uptodate state in an rbtree as well as in the
      page bits.  This is supposed to enable us to use block sizes other than
      the page size, but there are a few parts still missing before that
      completely works.
      
      But, our readpage routine trusts this additional range based tracking
      of uptodateness, much in the same way the buffer head up to date bits
      are trusted for the other filesystems.
      
      The problem is that sometimes we need to allocate memory in order to
      split records in the rbtree, even when we are just clearing bits.  This
      can be difficult when our clearing function is called with GFP_ATOMIC,
      which can happen in the releasepage path.
      
      So, what happens today looks like this:
      
      releasepage called with GFP_ATOMIC
      btrfs_releasepage calls clear_extent_bit
      clear_extent_bit fails to allocate ram, leaving the up to date bit set
      btrfs_releasepage returns success
      
      The end result is the page being gone, but btrfs thinking the range is
      up to date.  Later on if someone tries to read that same page, the
      btrfs readpage code will return immediately thinking the page is already
      up to date.
      
      This commit fixes things to fail the releasepage when we can't clear the
      extent state bits.  It covers both data pages and metadata tree blocks.
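
The corrected sequence can be sketched as a small state model. This is not the btrfs code: `page_model`, `clear_range_bits` and `release_page_model` are hypothetical, and the `alloc_ok` flag stands in for whether the memory needed to split an rbtree record could be allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cached page and its range-based state. */
struct page_model {
    bool range_uptodate; /* range tracking says the data is up to date */
    bool present;        /* the page is still in the page cache */
};

/* Clearing the bit may need an allocation; returns 0 or -1 on failure. */
static int clear_range_bits(struct page_model *p, bool alloc_ok)
{
    if (!alloc_ok)
        return -1;
    p->range_uptodate = false;
    return 0;
}

/* Refuses to release the page when the range bits could not be cleared. */
static bool release_page_model(struct page_model *p, bool alloc_ok)
{
    assert(p != NULL && p->present);
    if (clear_range_bits(p, alloc_ok) < 0)
        return false;   /* keep the page; state stays consistent */
    p->present = false;
    return true;
}

/* Invariant: a page that is gone must not still be marked up to date. */
static bool consistent(const struct page_model *p)
{
    return p->present || !p->range_uptodate;
}
```

In this model the stale "up to date" range described above cannot occur, because a failed clear leaves the page in place.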
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Btrfs: fix page->private races · eb14ab8e
      Chris Mason authored
      There is a race where btrfs_releasepage can drop the
      page->private contents just as alloc_extent_buffer is setting
      up pages for metadata.  Because of how the Btrfs page flags work,
      this results in us skipping the crc on the page during IO.
      
      This patch solves the race by waiting until after the extent buffer
      is inserted into the radix tree before it sets page private.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  8. 07 Feb, 2011 1 commit
  9. 06 Feb, 2011 4 commits
  10. 01 Feb, 2011 3 commits
  11. 31 Jan, 2011 2 commits
      Btrfs: catch errors from btrfs_sync_log · b31eabd8
      Chris Mason authored
      btrfs_sync_log returns -EAGAIN when we need full transaction commits
      instead of small log commits, but sometimes we were dropping the return
      value.
      
      In practice, we check for this a few different ways, but this is still a
      bug that can leave off full log commits when we really need them.
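
The class of bug can be shown with a small model. This is a sketch under assumptions: `model_sync_log` is a hypothetical stand-in that returns -EAGAIN when a full commit is needed, and the two callers are invented to contrast dropping the return value with acting on it. How the real callers fall back is not shown in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log sync: asks for a full commit by returning -EAGAIN. */
static int model_sync_log(bool need_full_commit)
{
    return need_full_commit ? -EAGAIN : 0;
}

/* Buggy shape: the return value is dropped, so no full commit happens. */
static int model_fsync_buggy(bool need_full_commit, int *full_commits)
{
    assert(full_commits != NULL);
    (void)model_sync_log(need_full_commit);
    return 0;
}

/* Fixed shape: -EAGAIN is caught and turned into a full commit. */
static int model_fsync_fixed(bool need_full_commit, int *full_commits)
{
    int ret;

    assert(full_commits != NULL);
    ret = model_sync_log(need_full_commit);
    if (ret == -EAGAIN) {
        (*full_commits)++;  /* stands in for a full transaction commit */
        ret = 0;
    }
    return ret;
}
```

Both callers report success, but only the fixed one performs the full commit when the log sync asks for it.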
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Btrfs: make shrink_delalloc a little friendlier · b1953bce
      Josef Bacik authored
      Xfstests 224 will just sit there and spin forever until eventually we give up
      flushing delalloc and exit.  On my box this took several hours.  I could not
      interrupt this process either, even though we use INTERRUPTIBLE.  So do two things:
      
      1) Keep us from looping over and over again without reclaiming anything
      2) If we get interrupted exit the loop
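
The two changes listed above can be sketched as a bounded loop. This is a userspace model with invented names (`shrink_friendly`, `stop_reason`). The per-pass reclaim amounts and the interruption point are supplied by the caller, and counting idle passes without ever resetting the counter is an assumption of this sketch rather than something the commit states.

```c
#include <assert.h>
#include <stddef.h>

enum stop_reason {
    STOP_DONE,        /* reclaimed at least the target */
    STOP_NO_PROGRESS, /* too many passes reclaimed nothing */
    STOP_INTERRUPTED, /* a signal arrived */
    STOP_GAVE_UP      /* ran out of passes */
};

/*
 * freed_per_pass[i] is what pass i reclaims.  interrupt_at is the pass
 * at which a signal is seen (-1 for never).  max_idle bounds the number
 * of passes that reclaim nothing.
 */
static enum stop_reason shrink_friendly(const unsigned long *freed_per_pass,
                                        int n_passes, unsigned long target,
                                        int interrupt_at, int max_idle)
{
    unsigned long reclaimed = 0;
    int idle = 0;

    assert(freed_per_pass != NULL && max_idle > 0);
    for (int pass = 0; pass < n_passes; pass++) {
        if (pass == interrupt_at)
            return STOP_INTERRUPTED;   /* 2) exit when interrupted */
        if (freed_per_pass[pass] == 0) {
            if (++idle >= max_idle)
                return STOP_NO_PROGRESS; /* 1) stop spinning for nothing */
            continue;
        }
        reclaimed += freed_per_pass[pass];
        if (reclaimed >= target)
            return STOP_DONE;
    }
    return STOP_GAVE_UP;
}
```

A run of empty passes ends the loop after `max_idle` attempts, and an interruption ends it immediately regardless of progress.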
      
      I tested this and the test now exits in a reasonable amount of time, and can be
      interrupted with ctrl+c.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  12. 28 Jan, 2011 11 commits