1. 23 May, 2011 12 commits
    • Btrfs: don't try to allocate from a block group that doesn't have enough space · cca1c81f
      Josef Bacik authored
      If we have a very large filesystem, we can spend a lot of time in
      find_free_extent just trying to allocate from empty block groups.  So instead
      check to see if the block group even has enough space for the allocation, and if
      not go on to the next block group.
      Signed-off-by: Josef Bacik <josef@redhat.com>
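      A minimal userspace sketch of the check (mock struct and field names, not the kernel's; the real test lives in find_free_extent and uses the block group's space accounting):

        #include <stdbool.h>
        #include <stdint.h>

        /* Mock of the accounting a btrfs block group carries. */
        struct block_group {
                uint64_t total_bytes;     /* capacity of the block group */
                uint64_t bytes_used;      /* already allocated */
                uint64_t bytes_reserved;  /* claimed by in-flight allocations */
        };

        /* Skip the group up front if it cannot possibly hold num_bytes,
         * instead of walking its free-space entries only to fail. */
        static bool block_group_has_room(const struct block_group *bg,
                                         uint64_t num_bytes)
        {
                return bg->total_bytes - bg->bytes_used - bg->bytes_reserved
                       >= num_bytes;
        }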
    • Btrfs: don't always do readahead · 026fd317
      Josef Bacik authored
      Our readahead is sort of sloppy, and really isn't always needed.  For example,
      if ls is stat()ing everything it lists (which is the default), it's going to
      stat in non-disk order, so if you have a directory with a stupid number of
      files, readahead is going to do nothing but waste time in the stat case.
      Taking the unconditional readahead out took my test from 57 minutes to 36
      minutes.  This means that everywhere we loop through the tree we want to make
      sure we set path->reada properly, so I went through and found all of the
      places where we loop through the path and set reada to 1.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
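      The call-site pattern looks roughly like this (a fragment, not standalone code; btrfs_alloc_path, path->reada, and btrfs_search_slot are the real kernel identifiers):

        struct btrfs_path *path;

        path = btrfs_alloc_path();
        if (!path)
                return -ENOMEM;
        /* Readahead is now opt-in: only loops that actually walk the
         * tree in order ask for it before searching. */
        path->reada = 1;
        ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);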
    • Btrfs: try not to sleep as much when doing slow caching · 589d8ade
      Josef Bacik authored
      When the fs is super full and we unmount it, we can get stuck in a state where
      unmount is waiting for the caching kthread to make progress and the caching
      kthread keeps scheduling away because we're in the middle of a commit.  So
      instead just let the caching kthread keep going and only yield if
      need_resched().  This takes my horrible umount case from up to 10 minutes
      down to less than 20 seconds.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
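      A sketch of the loop shape after the change (more_extents_to_cache and cache_one_extent are hypothetical stand-ins; need_resched and cond_resched are the real kernel helpers):

        /* Keep making progress even while a commit is running; give up
         * the CPU only when the scheduler actually wants it back. */
        while (more_extents_to_cache(ctl)) {
                cache_one_extent(ctl);
                if (need_resched())
                        cond_resched();  /* yield instead of unconditionally sleeping */
        }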
    • Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Josef Bacik authored
      Originally this was going to be used as a way to give hints to the allocator,
      but frankly we can get much better hints elsewhere, and it isn't actually used
      for anything useful.  On top of being useless, when we initialize an inode we
      try to find a freeish block group to set as the inode's block group, and with
      a completely full 40GB fs this takes _forever_, so I imagine on, say, a 1TB fs
      it is just unbearable.  So just axe the thing altogether; we don't need it,
      and it saves us 8 bytes in the inode and 500 microseconds per inode lookup in
      my testcase.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: don't look at the extent buffer level 3 times in a row · 7e2355ba
      Josef Bacik authored
      We have a bit of debugging in btrfs_search_slot to make sure the level of the
      cow block is the same as that of the original block we were cow'ing.  I don't
      think I've ever seen this trip, so kill it.  This saves us 2 kmaps per level
      in our search.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: map the node block when looking for readahead targets · cb25c2ea
      Josef Bacik authored
      If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
      times, which is 32 pairs of kmap/kunmap, which _sucks_.  So go ahead and map the
      extent buffer while we look for readahead targets.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
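      A model of the batching (all helper names here are hypothetical; in the kernel the win is mapping the extent buffer once instead of letting each btrfs_node_blockptr call kmap on its own):

        /* Before: up to 32 kmap()/kunmap() pairs, one per slot.
         * After: map once, read every candidate pointer, unmap once. */
        void *kaddr = map_node(node);                /* one kmap */
        for (int slot = 0; slot < nritems; slot++) {
                uint64_t blockptr = read_blockptr(kaddr, slot);
                maybe_queue_readahead(blockptr);
        }
        unmap_node(node, kaddr);                     /* one kunmap */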
    • Btrfs: set range_start to the right start in count_range_bits · af60bed2
      Josef Bacik authored
      In count_range_bits we are adjusting total_bytes based on the range we are
      searching for, but we don't adjust the range start according to the range we are
      searching for, which makes for weird results.  For example, if the range
      
      [0-8192]
      
      is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
      of bytes found, but the range_start will be 0, which makes it look like the
      range is [0-4096].  So instead set range_start = max(cur_start, state->start).
      This makes everything come out right.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
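      The fix is the clamp the message quotes; as a self-contained model (names mirror the count_range_bits locals):

        #include <stdint.h>

        /* Report the start of the found range clamped to the search
         * start, so state [0, 8192) searched as [4096, 8192) yields
         * range_start == 4096 rather than 0. */
        static uint64_t found_range_start(uint64_t cur_start,
                                          uint64_t state_start)
        {
                return state_start > cur_start ? state_start : cur_start;
        }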
    • Btrfs: fix how we do space reservation for truncate · fcb80c2a
      Josef Bacik authored
      The ceph guys keep running into problems where we have space reserved in our
      orphan block rsv when freeing it up.  This is because they tend to do
      snapshots a lot, so their truncates tend to use a bunch of space, so when we
      go to do things like update the inode we have to steal reservation space in
      order to make the reservation happen.  This happens because truncate can use
      as much space as it freaking feels like, but we still have to hold space for
      removing the orphan item and updating the inode, which will always happen.
      So in order to fix this we need to split up all of the reservation stuff.  So
      with this patch we have:
      
      1) The orphan block reserve which only holds the space for deleting our orphan
      item when everything is over.
      
      2) The truncate block reserve which gets allocated and used specifically for the
      space that the truncate will use on a per truncate basis.
      
      3) The transaction will always have 1 item's worth of data reserved so we can
      update the inode normally.
      
      Hopefully this will make the ceph problem go away.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
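      A toy model of the split (illustrative names only; the kernel uses struct btrfs_block_rsv, but the point is that each concern now has its own reserve instead of everything competing for the orphan reserve):

        #include <stdint.h>

        struct block_rsv { uint64_t reserved; };  /* stand-in for btrfs_block_rsv */

        struct truncate_space {
                struct block_rsv *orphan_rsv;   /* (1) only the orphan item deletion */
                struct block_rsv truncate_rsv;  /* (2) sized and used per truncate */
                struct block_rsv trans_rsv;     /* (3) one item's worth, inode update */
        };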
    • Btrfs: kill trans_mutex · a4abeea4
      Josef Bacik authored
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really, serializing trans_handle joins is not too hard; the cost mostly comes
      down to acquiring a reference to the transaction.  So replace the trans_mutex
      with a trans_lock spinlock and use it to do the following:
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much;
      basically it just holds the current transactions, which will usually be at
      most the currently committing transaction and the currently running
      transaction.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit, we need a way to block new transactions from
      starting while we do our work.  So we set no_trans_join, and in
      join_transaction we test whether it is set; if it is, we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
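      A rough userspace model of the new join path (pthread spinlock and C11 atomics standing in for the kernel's spinlock_t and atomic_t; every name here is illustrative, and the numbers in comments refer to the list above):

        #include <pthread.h>
        #include <stdatomic.h>
        #include <stddef.h>

        struct transaction {
                atomic_int use_count;           /* refs dropped without a lock (6) */
                pthread_mutex_t commit_lock;    /* one committer wins per transaction (7) */
        };

        struct fs_info {
                pthread_spinlock_t trans_lock;  /* replaces the old trans_mutex */
                struct transaction *running_transaction;  /* (1) */
                atomic_bool no_trans_join;      /* commit excludes new joiners (5) */
        };

        /* Join: hold trans_lock only long enough to take a reference. */
        struct transaction *join_transaction(struct fs_info *fs)
        {
                struct transaction *t;

                if (atomic_load(&fs->no_trans_join))
                        return NULL;  /* caller falls back to wait_on_commit */
                pthread_spin_lock(&fs->trans_lock);
                t = fs->running_transaction;
                if (t)
                        atomic_fetch_add(&t->use_count, 1);
                pthread_spin_unlock(&fs->trans_lock);
                return t;
        }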
    • Btrfs: if we've already started a trans handle, use that one · 2a1eb461
      Josef Bacik authored
      We currently track trans handles in current->journal_info, but we don't
      actually use it.  This patch fixes that.  It covers the case where we have
      multiple people starting transactions down the call chain: instead of having
      to allocate a new handle and all of that, we just increase the use count of
      the current handle, save the old block_rsv, and return.  I tested this with
      xfstests and it worked out fine.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
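      The reuse path is roughly this (a simplified sketch; current->journal_info is the real field, while the use_count and orig_rsv field names are assumptions based on the description above):

        if (current->journal_info) {
                struct btrfs_trans_handle *h = current->journal_info;

                h->use_count++;              /* nested start: reuse the live handle */
                h->orig_rsv = h->block_rsv;  /* save the caller's block_rsv ... */
                h->block_rsv = NULL;         /* ... and restore it when this start ends */
                return h;
        }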
    • Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Josef Bacik authored
      I keep forgetting that btrfs_join_transaction() just ignores the num_items
      argument, which leads me to send pointless patches and look stupid :).  So
      just kill the num_items argument from btrfs_join_transaction and
      btrfs_start_ioctl_transaction, since neither of them uses it.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
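      The API change, per the description (new prototypes; the old ones carried an int num_items that was ignored):

        /* old: btrfs_join_transaction(root, num_items) — num_items was ignored */
        struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root);
        struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root);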
    • Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075
      Josef Bacik authored
      In the prealloc filling code and the compressed code we don't properly set
      trans->block_rsv to the delalloc block reserve, which is going to make us use
      metadata from the wrong pool.  This patch fixes that.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
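      The shape of the fix (a fragment, hedged: delalloc_block_rsv is, to the best of my knowledge, the delalloc reserve hanging off fs_info in this era's code):

        /* Charge the metadata for the prealloc/compressed writeout to the
         * delalloc pool rather than whatever block_rsv the handle had. */
        trans->block_rsv = &root->fs_info->delalloc_block_rsv;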
  2. 19 May, 2011 1 commit
  3. 18 May, 2011 22 commits
  4. 17 May, 2011 5 commits