- 19 Oct, 2011 34 commits
-
-
Josef Bacik authored
While looking for a performance regression a user was complaining about, I noticed that we had a regression with the varmail test of filebench. This was introduced by 0d10ee2e, which keeps us from calling writepages in writepage. That is a correct change; however, the old behavior happened to help the varmail test because we wrote out in larger chunks. This largely has to do with how we write out dirty pages for each transaction. If you run filebench with

  load varmail
  set $dir=/mnt/btrfs-test
  run 60

prior to 0d10ee2e you would get ~1420 ops/second, but with it you get ~1200 ops/second. This is a decrease of roughly 15.5%. Since we know the range of dirty pages we want to write out, don't write out in one-page chunks; write out in ranges. To do this we call filemap_fdatawrite_range() on the range of bytes, then convert the DIRTY extents to NEED_WAIT extents. When we then call btrfs_wait_marked_extents() we only have to filemap_fdatawait_range() on that range and clear the NEED_WAIT extents. This doesn't get us back to our original speeds, but I've been seeing ~1380 ops/second, which is a <5% regression as opposed to a >15% regression. That is acceptable given that the original commit greatly reduces our latency to begin with.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
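The throughput figures quoted in this message can be checked with a few lines of arithmetic. The helper below is only an illustrative sketch (the function name is hypothetical and not part of the patch): going from ~1420 to ~1200 ops/second is a drop of about 15.5%, while ~1380 ops/second is about 2.8% below the baseline.

```c
/*
 * Illustrative helper, not kernel code: percentage drop from a baseline
 * throughput to a measured throughput.
 */
double toy_regression_pct(double baseline_ops, double measured_ops)
{
	return (baseline_ops - measured_ops) / baseline_ops * 100.0;
}
```

Calling toy_regression_pct(1420.0, 1200.0) and toy_regression_pct(1420.0, 1380.0) gives the two figures compared in the message.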
-
Josef Bacik authored
If I have a range where I know a certain bit is set and I want to change it to another bit, the only option I have is to call set bit and then clear bit, which results in 2 tree searches. This is inefficient, so introduce convert_extent_bit, which will go through the range, set the bit I want and clear the old bit I don't want.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
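The idea of converting a bit in one pass instead of two can be sketched in userspace. This is a toy model under loud assumptions: the real extent state tree is an rbtree and splits states at range boundaries, whereas here the states live in a flat array, partially overlapping states are simply skipped, and every name and bit value is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define TOY_DIRTY     (1u << 0)
#define TOY_NEED_WAIT (1u << 1)

/* A toy "extent state": an inclusive byte range plus a bitmask. */
struct toy_state {
	uint64_t start;
	uint64_t end;
	unsigned bits;
};

/*
 * Walk the states once, setting set_bits and clearing clear_bits on every
 * state that lies fully inside [start, end]. Returns the number of states
 * examined, used here as a stand-in for search cost.
 */
size_t toy_convert_bits(struct toy_state *s, size_t n,
			uint64_t start, uint64_t end,
			unsigned set_bits, unsigned clear_bits)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		visited++;
		if (s[i].start < start || s[i].end > end)
			continue; /* simplification: no splitting of states */
		s[i].bits = (s[i].bits | set_bits) & ~clear_bits;
	}
	return visited;
}

/* The older approach: one full walk to set, then a second walk to clear. */
size_t toy_set_then_clear(struct toy_state *s, size_t n,
			  uint64_t start, uint64_t end,
			  unsigned set_bits, unsigned clear_bits)
{
	size_t visited = toy_convert_bits(s, n, start, end, set_bits, 0);

	visited += toy_convert_bits(s, n, start, end, 0, clear_bits);
	return visited;
}
```

Both functions leave the states with the same bits; the single-pass version just examines each state once instead of twice.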
-
Josef Bacik authored
There is a bug that may lead to early ENOSPC in our reservation code. We've been checking against num_bytes, which may be above and beyond what we actually want to reserve, which could give us a false ENOSPC. Fix this by making sure the unused space is above how much we want to reserve and not how much we're trying to flush.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
In fixing how we deal with bad inodes, we had a regression in the orphan cleanup code, since it expects to get a bad inode back. So fix it to deal with getting -ESTALE back by deleting the orphan item manually and moving on.

Thanks,

Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Johannes pointed out we were allocating only kernel pages for doing writes, which is kind of a big deal if you are on 32-bit and have more than a gig of RAM. So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we don't re-enter.

Thanks,

Reported-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
I kept getting warnings from evict because we were calling btrfs_start_transaction() with a transaction already started when doing a balance. This is because we remove a block group, which requires a transaction, and then put the last reference on the cache inode. Instead of doing this we need to delay the iput so that it is not done inside a started transaction. This gets rid of our warnings.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Checksums are charged in 2 different ways. The first case is when we're writing to the disk: we account for the new checksums with the delalloc block rsv. In order for this to work we check if we're allocating a block for the csum root and if trans->block_rsv == the delalloc block rsv. But when we're deleting the csums because of cow, this is charged to the global block rsv, and is done when we run the delayed refs. So we need to make sure that trans->block_rsv == NULL when running the delayed refs. So set it to NULL and reset it in should_end_transaction, and set it to NULL in commit_transaction. This got rid of the ridiculous amount of warnings I was seeing when trying to do a balance.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
The only thing that we need to have a trans handle for is in reserve_metadata_bytes, and that's to know how much flushing we can do. So instead of passing it around, just check current->journal_info for a trans_handle so we know if we can commit a transaction to try and free up space or not.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Since the durable block rsv stuff has been killed there is no need to get the block_rsv in btrfs_free_tree_block anymore.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
The alloc warnings everybody has been seeing are because we have been reserving space for csums, but we weren't actually using that space. So make get_block_rsv() return the trans->block_rsv if we're modifying the csum root. Also set the trans->block_rsv to NULL so that if we modify the csum root when running delayed refs, that comes out of the global reserve like it's supposed to. With this patch I'm not seeing those alloc warnings anymore.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Since free space inodes now use normal checksumming we need to make sure to account for their metadata use. So reserve metadata space, and then if we fail to write out the metadata we can just release it; otherwise it will be freed up when the io completes.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
In moving some enospc stuff around I noticed that when we unmount we are often evicting the free space cache inodes before we do our last commit. This isn't bad, but it makes us constantly have to re-read the inodes. So instead don't evict the cache until after we do our last commit; this will make things a little less crappy and makes a future enospc change work properly.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
While debugging a different issue I noticed that we were always reserving space when we tried to use our truncate block rsvs. This is because they didn't have a ->size value, so use_block_rsv just assumes there is nothing reserved and does a reserve_metadata_bytes. This is because btrfs_check_block_rsv() doesn't actually add to the size of the block rsv. That seems to be the right thing to do, so set ->size to the minimum truncate size we need, since we will always only refill to that size anyway, and this way everything works out correctly.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
If we have to emergency reserve space we must not increase the block_rsv size, otherwise we'll leak space. Take delalloc for instance: say we reserve 4k and use that 4k, and then we have to emergency allocate another 4k. We bump the size up to 8k, but we've only accounted for 4k in reservations in all of our supporting logic, so we'll go to free the 4k and end up with a size of 4k, which will cause us to later not free as much space. I saw this while testing a case where I wasn't reserving enough space for something but was still leaking space, very frustrating.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
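The 4k example in this message can be replayed with a toy model of a block reservation. Everything here is an assumption made for illustration: the struct, the function names and the release rule (shrink the size, then hand back whatever is reserved beyond it) are a simplified userspace stand-in, not the kernel's btrfs_block_rsv code.

```c
#include <stdint.h>

/* Toy reservation: how much we are entitled to, and how much we hold. */
struct toy_rsv {
	uint64_t size;
	uint64_t reserved;
};

void toy_reserve(struct toy_rsv *r, uint64_t bytes)
{
	r->size += bytes;
	r->reserved += bytes;
}

/* Consume previously reserved bytes; assumes bytes <= r->reserved. */
void toy_use(struct toy_rsv *r, uint64_t bytes)
{
	r->reserved -= bytes;
}

/* Emergency allocation: the bytes come from elsewhere and are used at once. */
void toy_emergency_use(struct toy_rsv *r, uint64_t bytes, int bump_size)
{
	if (bump_size)
		r->size += bytes; /* the behavior the message describes as buggy */
}

/* Shrink the size and return any reserved bytes beyond the new size. */
uint64_t toy_release(struct toy_rsv *r, uint64_t bytes)
{
	uint64_t freed = 0;

	r->size -= bytes;
	if (r->reserved > r->size) {
		freed = r->reserved - r->size;
		r->reserved = r->size;
	}
	return freed;
}

/* Replay the scenario; returns the bytes freed by the final release. */
uint64_t toy_scenario(int bump_size)
{
	struct toy_rsv r = { 0, 0 };

	toy_reserve(&r, 4096);
	toy_use(&r, 4096);
	toy_emergency_use(&r, 4096, bump_size);
	toy_release(&r, 4096);   /* only 4k was ever accounted for */

	toy_reserve(&r, 4096);   /* a later reservation that goes unused */
	return toy_release(&r, 4096);
}
```

In this model, toy_scenario(1) ends with a stale 4k left in size, so the later unused reservation is not handed back, while toy_scenario(0) returns the full 4096 bytes.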
-
Josef Bacik authored
When changing back to using a spin_lock to protect the extent counters I decided that since we would only be dropping our original extent, it was ok to just drop the extent and return. However, since somebody else could have come in and done a reservation, we need to do the normal song and dance to clear the reservation out properly. So calculate how much space we need to free, and then subtract what we just attempted to reserve. If it's more, then we know we need to drop those bytes from the delalloc block rsv.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We are setting ins_len to 1 even though we are just modifying an item that should be there already. This may cause the search code to split nodes on the way down needlessly. Set this to 0 since we aren't inserting anything.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
If you run xfstests 224 you will get lots of messages about not being able to delete inodes and that they will be cleaned up next mount. This is because btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to flush, so if there was not enough space, it simply failed. But in the truncate and evict cases we could easily flush space to try and get enough space to do our work, so make btrfs_block_rsv_check take a flush argument to pass down to reserve_metadata_bytes. Now xfstests 224 runs fine without all those complaints.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
With btrfs_truncate_inode_items we always return if we have to go to another leaf, which makes us do our reservation again. This means we will only ever modify one leaf at a time, so we only need 1 item's worth of slack space. Also, since we are deleting we will not be creating nodes as we go down; if anything we'll be freeing them as we merge them together. So make a different calculation for truncate which only covers the worst-case usage of COW'ing the entire path down to the leaf.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
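The "COW the entire path down to the leaf" worst case can be written out as a small calculation. This is a hedged sketch, not the patch itself: it assumes the path costs one leaf plus one node for each remaining tree level, takes the maximum tree height as a parameter rather than asserting the kernel's constant, and uses a hypothetical function name.

```c
#include <stdint.h>

/*
 * Worst-case bytes needed to COW one root-to-leaf path per item:
 * one leaf plus a node at each of the other (max_level - 1) levels.
 * Assumes max_level >= 1.
 */
uint64_t toy_trunc_metadata_size(uint64_t leafsize, uint64_t nodesize,
				 unsigned max_level, unsigned num_items)
{
	uint64_t path_bytes = leafsize + nodesize * (uint64_t)(max_level - 1);

	return path_bytes * num_items;
}
```

For example, with 4096-byte leaves and nodes and an assumed maximum height of 8, one item's worth of slack is 32768 bytes. Because no new nodes are created while deleting, no extra multiplier for splits appears in this estimate.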
-
Josef Bacik authored
Lukas found a problem where if he tries to fallocate over the same region twice and the first fallocate took up all the space, we would fail with ENOSPC. This is because we reserve the total space we want to use for fallocate, regardless of whether or not we will actually have to preallocate. So instead move the check into the loop where we actually have to do the preallocate.

Thanks,

Tested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Since we've optimized the truncate path, we no longer require this function.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Currently we're starting and stopping a transaction for no real reason, so kill that and just reserve enough space as if we can truncate all in one transaction. Also use btrfs_block_rsv_check() for our reserve to minimize the amount of space we may have to allocate for our slack space.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We will try and reserve metadata bytes in btrfs_block_rsv_check, and if we cannot because we have a transaction open it will return EAGAIN, so we do not need to try and commit the transaction again.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
The priority and refill_used flags are not used anymore, and neither is the usage counter, so just remove them from btrfs_block_rsv.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
A user reported getting spammed by this message when moving to 3.0. Since we switched to the normal checksumming infrastructure all old free space caches will be wrong and need to be regenerated, so people are likely to see this message a lot. Ratelimit it so it doesn't fill up their logs and freak them out.

Thanks,

Reported-by: Andrew Lutomirski <luto@mit.edu>
Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
I converted btrfs_truncate to do sane reservations for truncate, but didn't convert btrfs_evict_inode. Basically we need to save the orphan_rsv for deleting the orphan item, and do normal reservations for our truncate.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
This is confusing code and isn't used by anything anymore, so delete it.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
This patch kills off the calculation for the amount of space needed for the orphan operations during a snapshot. The thing is we only do snapshots on commit, so any space that is in the block_rsv->freed[] isn't going to be in the new snapshot anyway, so there isn't any reason to require that space to be reserved for the snapshot to occur.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We have not been reserving enough space for checksums. We were just reserving bytes for the checksum items themselves; we were not taking into account having to cow the tree and such. This patch adds a csum_bytes counter to the inode for keeping track of the number of bytes outstanding we have for checksums. Then we calculate how many leaves would be required for the checksums we are given and use that to reserve space. This adds a significant amount of bytes to our reservations, but we will handle this later.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
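The "how many leaves would be required" step can be sketched as a ceiling division. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, every size is passed in by the caller rather than taken from a real filesystem, and the sketch assumes one checksum per data sector with all checksums in a leaf sharing a single item header.

```c
#include <stdint.h>

/*
 * Estimate how many tree leaves are needed to hold the checksums for
 * data_bytes of outstanding data.
 *
 * Assumes csum_size > 0, sectorsize > 0 and item_size < leaf_data_size.
 */
uint64_t toy_csum_leaves(uint64_t data_bytes, uint64_t sectorsize,
			 uint64_t leaf_data_size, uint64_t item_size,
			 uint64_t csum_size)
{
	/* Checksums that fit in one leaf after the item header. */
	uint64_t per_leaf = (leaf_data_size - item_size) / csum_size;
	/* One checksum per sector, rounding partial sectors up. */
	uint64_t num_csums = (data_bytes + sectorsize - 1) / sectorsize;

	/* Round up to whole leaves. */
	return (num_csums + per_leaf - 1) / per_leaf;
}
```

With illustrative values of 4096-byte sectors, 3995 usable bytes per leaf, a 25-byte item header and 4-byte checksums, one leaf holds 992 checksums, so 993 sectors of data already need two leaves. The reservation then has to cover COW'ing the path to each of those leaves, which is why the message says this adds a significant amount of bytes.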
-
Josef Bacik authored
We always look for delalloc bytes in our io_tree so we can fill in delalloc. This is fine in most cases, but if we're writing out the btree_inode this is just a superfluous tree search on the io_tree, and if we have a lot of metadata dirty this could be an expensive check. So instead check to see if our io_tree has a ->fill_delalloc op, and if not don't even bother doing the lookup.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We have been using bytes_reserved for metadata reservations, which is wrong since we use that to keep track of outstanding reservations from the allocator. This resulted in us doing a lot of silly things to make sure we don't allocate a bunch of metadata chunks, since we never had a real view of how much space was actually in use by metadata. This passes Arne's enospc test and xfstests as well as my own enospc tests. Hopefully this will get us moving in the right direction.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
We've only been able to mount with subvol=<whatever> where whatever was a subvol within whatever root we had as the default. This allows us to mount -o subvol=path/to/subvol/you/want relative to the normal fs_tree root.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Currently what we do is just wrong. We either

1) Alloc a new "root" dentry with sb->s_root as its parent, which is just wrong as we could walk into this subvol later on via another path and hilarity could ensue. Also we don't check the return value of d_splice_alias, which isn't good either.

or

2) Do a d_find_alias(), which could find nothing if we have lost our dentry from cache at this point.

So use d_obtain_alias(). In the case that we already have the inode/dentry in cache we will get the correct dentry. If not we will get a disconnected dentry tree, so if we walk into it later on everything will be connected up properly.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
reserved_bytes is not used for anything in the inode, remove it.

Signed-off-by: Josef Bacik <josef@redhat.com>
-
Josef Bacik authored
Move things around to give us better packing in the btrfs_inode. This reduces the size of our inode by 8 bytes.

Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
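Why reordering members can shrink a structure is easy to show with a toy struct. The two structs below are hypothetical and have nothing to do with the actual btrfs_inode layout; they only illustrate how grouping small members together removes alignment padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Small members on both sides of a 64-bit member force padding twice. */
struct toy_inode_loose {
	uint8_t  flag_a;
	uint64_t counter;
	uint8_t  flag_b;
};

/* Same members, with the small ones grouped after the 64-bit member. */
struct toy_inode_packed {
	uint64_t counter;
	uint8_t  flag_a;
	uint8_t  flag_b;
};

/* Bytes saved by the reordering on the ABI this is compiled for. */
size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_inode_loose) - sizeof(struct toy_inode_packed);
}
```

On common 64-bit ABIs the loose layout typically occupies 24 bytes and the packed one 16, though the exact sizes depend on the compiler and target.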
-
- 05 Oct, 2011 1 commit
-
-
Linus Torvalds authored
-
- 04 Oct, 2011 5 commits
-
-
git://github.com/davem330/net
Linus Torvalds authored
* git://github.com/davem330/net:
  pch_gbe: Fixed the issue on which a network freezes
  pch_gbe: Fixed the issue on which PC was frozen when link was downed.
  make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
  net: xen-netback: correctly restart Tx after a VM restore/migrate
  bonding: properly stop queuing work when requested
  can bcm: fix incomplete tx_setup fix
  RDSRDMA: Fix cleanup of rds_iw_mr_pool
  net: Documentation: Fix type of variables
  ibmveth: Fix oops on request_irq failure
  ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
  cxgb4: Fix EEH on IBM P7IOC
  can bcm: fix tx_setup off-by-one errors
  MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
  dp83640: reduce driver noise
  ptp: fix L2 event message recognition
-
git://github.com/tiwai/sound
Linus Torvalds authored
* 'fix/asoc' of git://github.com/tiwai/sound:
  ASoC: omap_mcpdm_remove cannot be __devexit
  ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
  ASoC: use a valid device for dev_err() in Zylonite
-
git://people.freedesktop.org/~airlied/linux
Linus Torvalds authored
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/kms: fix channel_remap setup (v2)
  drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
  drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
  drm/radeon: Simplify cursor x/yorigin calculation.
  drm/radeon/kms: fix cursor image off-by-one error
  drm/radeon/kms: Fix logic error in DP HPD handler
  drm/radeon/kms: add retry limits for native DP aux defer
  drm/radeon/kms: fix regression in DP aux defer handling
-
git://git.secretlab.ca/git/linux-2.6
Linus Torvalds authored
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
  spi-topcliff-pch: Fix overrun issue
  spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
  spi-topcliff-pch: Fix CPU read complete condition issue
  spi-topcliff-pch: Fix SSN Control issue
  spi-topcliff-pch: add tx-memory clear after complete transmitting
-
Jon Mason authored
Add the ability to disable PCI-E MPS tuning and use the BIOS-configured MPS defaults. Due to the number of issues recently discovered on some x86 chipsets, make this the default behavior. Also, add the option for peer-to-peer DMA MPS configuration. Peer-to-peer DMA is outside the scope of this patch, but MPS configuration could prevent it from working by having the MPS on one root port different from the MPS on another. To work around this, simply make the system-wide MPS the smallest possible value (128B).

Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
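The two policies described here can be sketched as a small selection function. The mode names and functions below are hypothetical and are not the identifiers used by the patch; the only outside fact relied on is that PCIe encodes Max_Payload_Size as a small code where the payload is 128 bytes shifted left by that code.

```c
/* PCIe Max_Payload_Size encoding: bytes = 128 << code (code 0 is 128B). */
unsigned toy_mps_code_to_bytes(unsigned code)
{
	return 128u << code;
}

/* Hypothetical policy names for illustration only. */
enum toy_mps_mode {
	TOY_MPS_BIOS_DEFAULT, /* leave whatever the BIOS configured */
	TOY_MPS_PEER2PEER     /* force the smallest value everywhere */
};

/* Pick the payload size for a device given the policy and the BIOS code. */
unsigned toy_pick_mps_bytes(enum toy_mps_mode mode, unsigned bios_code)
{
	if (mode == TOY_MPS_PEER2PEER)
		return toy_mps_code_to_bytes(0); /* 128B system-wide */
	return toy_mps_code_to_bytes(bios_code);
}
```

Forcing 128 bytes everywhere gives up some throughput, but it guarantees that two root ports never disagree on the payload size, which is the failure mode the message describes for peer-to-peer DMA.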
-