- 11 Dec, 2016 1 commit
-
Chris Mason authored
This is exposing an existing deadlock between fsync and AIO. Until we have the deadlock fixed, I'm pulling this one out. This reverts commit a23eaa87. Signed-off-by: Chris Mason <clm@fb.com>
-
- 09 Dec, 2016 1 commit
-
Chris Mason authored
btrfs_abort_transaction() has a WARN() to help us nail down whatever problem led to the abort. But most of the time, we're aborting for EIO, and the warning just adds noise. Signed-off-by: Chris Mason <clm@fb.com>
-
- 06 Dec, 2016 20 commits
-
David Sterba authored
The helpers are trivial and we don't use them consistently. Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Now we only use the root parameter to print the root objectid in a tracepoint. We can use the root parameter from the transaction handle for that. It's also used to join the transaction with async commits, so we remove the comment that it's just for checking. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call btrfs_wait_marked_extents, which provides a core loop and then handles errors differently based on whether it's a log root or not. This means that btrfs_write_and_wait_marked_extents needs to take a root because btrfs_wait_marked_extents requires one, even though it's only used to determine whether the root is a log root. The log root code won't ever call into the transaction commit code using a log root, so we can factor out the core loop and provide the error handling appropriate to each waiter in new routines. This allows us to eventually remove the root argument from btrfs_commit_transaction, and as a result, btrfs_end_transaction. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
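A minimal before/after sketch of that conversion pattern; all demo_* names and types below are made up stand-ins (in btrfs they would be struct btrfs_root and struct btrfs_fs_info), and only the signature change is the point:

/* Stand-in types for illustration only. */
struct demo_fs_info {
	unsigned long flags;
};

struct demo_root {
	struct demo_fs_info *fs_info;
};

/* Before: the root parameter exists only to reach root->fs_info. */
static unsigned long demo_get_flags_old(struct demo_root *root)
{
	return root->fs_info->flags;
}

/* After: callers that already hold the fs_info pass it directly. */
static unsigned long demo_get_flags(struct demo_fs_info *fs_info)
{
	return fs_info->flags;
}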
-
Jeff Mahoney authored
With the exception of the one case where btrfs_wait_cache_io is called without a block group, it's called with the same arguments. The root argument is only used in the special case, so let's factor out the core and simplify the call in the normal case to require a trans, block group, and path. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
The extent-tree tracepoints all operate on the extent root, regardless of which root is passed in. Let's just use the extent root objectid instead. If it turns out that nobody is depending on the format of this tracepoint, we can drop the root printing entirely. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
This results in btrfs_assert_delayed_root_empty and btrfs_destroy_delayed_inode taking an fs_info instead of a root. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
In routines where someptr->fs_info is referenced multiple times, we introduce a convenience variable. This makes the code considerably more readable. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
The io_ctl->root member was only being used to access root->fs_info. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
The root is never used. We substitute extent_root in for the reada_find_extent call, since it's only ever used to obtain the node size. This call site will be changed to use fs_info in a later patch. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
The root member is never used except for obtaining an fs_info pointer. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
Even though a separate root is passed in, we're still operating on the extent root. Let's use that for the trace point. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
btrfs_init_new_device only uses the root passed in via the ioctl to start the transaction. Nothing else that happens is related to whatever root the user used to initiate the ioctl. We can drop the root requirement and just use fs_info->dev_root instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
There are many functions that are always called with the same root argument. Rather than passing the same root every time, we can pass an fs_info pointer instead and have the function get the root pointer itself. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
There are 11 functions that accept a root parameter and immediately overwrite it. We can pass those an fs_info pointer instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
- 30 Nov, 2016 18 commits
-
David Sterba authored
-
Wang Xiaoguang authored
This issue was found when I tried to delete a heavily reflinked file. When deleting such files, other transaction operations do not get a chance to make progress; for example, start_transaction() blocks in wait_current_trans(root) for a long time, sometimes even triggering soft lockups, and the time taken to delete such a heavily reflinked file is also very large, often hundreds of seconds. Using perf top, it reports:

   PerfTop: 7416 irqs/sec  kernel:99.8%  exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
---------------------------------------------------------------------------------------
    84.37%  [btrfs]   [k] __btrfs_run_delayed_refs.constprop.80
    11.02%  [kernel]  [k] delay_tsc
     0.79%  [kernel]  [k] _raw_spin_unlock_irq
     0.78%  [kernel]  [k] _raw_spin_unlock_irqrestore
     0.45%  [kernel]  [k] do_raw_spin_lock
     0.18%  [kernel]  [k] __slab_alloc

It seems __btrfs_run_delayed_refs() takes most of the CPU time. After some debugging, I found select_delayed_ref() to be the cause: in our case a delayed head will be full of BTRFS_DROP_DELAYED_REF nodes, but select_delayed_ref() first iterates the whole node list looking for BTRFS_ADD_DELAYED_REF nodes, which is obviously a disaster here and wastes a lot of time. To fix this, we introduce a new ref_add_list in struct btrfs_delayed_ref_head; then in select_delayed_ref(), if this list is not empty, we can use its nodes directly. With this patch, deleting the same file takes only about 10~15 seconds. perf top now reports:

   PerfTop: 2734 irqs/sec  kernel:99.5%  exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs)
----------------------------------------------------------------------------------------
    20.74%  [kernel]  [k] _raw_spin_unlock_irqrestore
    16.33%  [kernel]  [k] __slab_alloc
     5.41%  [kernel]  [k] lock_acquired
     4.42%  [kernel]  [k] lock_acquire
     4.05%  [kernel]  [k] lock_release
     3.37%  [kernel]  [k] _raw_spin_unlock_irq

This patch also helps normal files: at the very least we no longer need to iterate the whole list to find BTRFS_ADD_DELAYED_REF nodes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>
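A rough sketch of the data-structure change; the demo_* types and helpers below are simplified stand-ins, as the real struct btrfs_delayed_ref_head and select_delayed_ref() carry much more state:

/* Illustrative only -- not the real btrfs delayed-ref structures. */
#include <linux/list.h>
#include <linux/types.h>

struct demo_ref_node {
	struct list_head list;		/* linked on the head's full ref list */
	struct list_head add_list;	/* linked on ref_add_list only for ADD refs */
	int action;			/* ADD vs DROP */
};

struct demo_ref_head {
	struct list_head ref_list;	/* all pending refs, ADD and DROP */
	struct list_head ref_add_list;	/* only the ADD refs */
};

/* When queuing an ADD ref, also link it on the dedicated add list. */
static void demo_queue_ref(struct demo_ref_head *head,
			   struct demo_ref_node *ref, bool is_add)
{
	list_add_tail(&ref->list, &head->ref_list);
	if (is_add)
		list_add_tail(&ref->add_list, &head->ref_add_list);
}

/*
 * Selection no longer walks a list full of DROP refs hunting for an ADD
 * ref: if ref_add_list is non-empty, its first entry is used directly.
 */
static struct demo_ref_node *demo_select_ref(struct demo_ref_head *head)
{
	if (!list_empty(&head->ref_add_list))
		return list_first_entry(&head->ref_add_list,
					struct demo_ref_node, add_list);
	if (!list_empty(&head->ref_list))
		return list_first_entry(&head->ref_list,
					struct demo_ref_node, list);
	return NULL;
}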
-
Qu Wenruo authored
Commit 62b99540 (btrfs: relocation: Fix leaking qgroups numbers on data extents) only fixes the problem partly. The previous fix traces all new data extents at transaction commit time when balance finishes. However, balance is not done in one large transaction; every path replacement can happen in its own transaction. This makes the fix useless if a transaction commits during relocation. For example:

relocate_block_group()
|-merge_reloc_roots()
|  |- merge_reloc_root()
|     |- btrfs_start_transaction()         <- Trans X
|     |- replace_path()                    <- Causes leak
|     |- btrfs_end_transaction_throttle()  <- Trans X commits here,
|     |                                       leak not fixed
|     |
|     |- btrfs_start_transaction()         <- Trans Y
|     |- replace_path()                    <- Causes leak
|     |- btrfs_end_transaction_throttle()  <- Trans Y ends
|                                             but is not committed
|-btrfs_join_transaction()                 <- Still trans Y
|-qgroup_fix()                             <- Only fixes the data leak
|                                             in trans Y
|-btrfs_commit_transaction()               <- Trans Y commits

In that case, the qgroup fixup can only fix the data leak in trans Y; the data leak in trans X is out of reach. So the correct fix must happen in the same transaction as replace_path(). This patch does that by tracing both subtrees of the tree block swap, ensuring that each leak and its fix happen in the same transaction, so nothing leaks again. Reported-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Move account_shared_subtree() to qgroup.c and rename it to btrfs_qgroup_trace_subtree(). Do the same thing for account_leaf_items() and rename it to btrfs_qgroup_trace_leaf_items(). Since all these functions are used only for qgroups, it is more appropriate to move them to qgroup.c and export them there. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Rename btrfs_qgroup_insert_dirty_extent(_nolock) to btrfs_qgroup_trace_extent(_nolock), according to the new reserve/trace/account naming schema. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Add an explanation of how btrfs qgroups work. Qgroup operation is split into 3 main phases:
1) Reserve: ensure a qgroup doesn't exceed its limit
2) Trace: tell qgroups which extents to trace
3) Account: calculate the qgroup number change for each traced extent
This should save quite some time for new developers. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
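As a toy illustration of the three phases named above; the demo_qgroup struct, its counters, and the helpers are all made up and far simpler than the real btrfs qgroup code, which traces individual extents rather than a byte counter:

/* Toy sketch only -- not the real btrfs qgroup implementation. */
#include <linux/errno.h>
#include <linux/types.h>

struct demo_qgroup {
	u64 limit;
	u64 used;
	u64 reserved;
	u64 traced;	/* bytes changed in this transaction, not yet accounted */
};

/* 1) Reserve: fail up front if the write would push the group over its limit. */
static int demo_qgroup_reserve(struct demo_qgroup *qg, u64 bytes)
{
	if (qg->used + qg->reserved + bytes > qg->limit)
		return -EDQUOT;
	qg->reserved += bytes;
	return 0;
}

/* 2) Trace: record how much data actually changed. */
static void demo_qgroup_trace(struct demo_qgroup *qg, u64 bytes)
{
	qg->traced += bytes;
}

/* 3) Account: at transaction commit, fold the traced bytes into the usage
 *    numbers and drop the matching reservation. */
static void demo_qgroup_account(struct demo_qgroup *qg)
{
	qg->used += qg->traced;
	qg->reserved -= qg->traced;
	qg->traced = 0;
}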
-
Christoph Hellwig authored
And remove the bogus check for a NULL return value from kmap, which can't happen. While we're at it: I don't think that kmapping up to 256 pages will work without deadlocks on highmem machines; a better idea would be to use vm_map_ram to map all of them into a single virtual address range. Incidentally, that would also simplify the code a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
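A hedged sketch of the vm_map_ram() alternative suggested in that message, assuming the 4.x-era signature that still takes a pgprot_t (later kernels dropped it); map_workspace_pages/unmap_workspace_pages are hypothetical helpers, and page allocation and error handling are omitted:

#include <linux/mm.h>
#include <linux/numa.h>
#include <linux/vmalloc.h>

/* Map all workspace pages into one contiguous kernel virtual range. */
static void *map_workspace_pages(struct page **pages, unsigned int nr_pages)
{
	return vm_map_ram(pages, nr_pages, NUMA_NO_NODE, PAGE_KERNEL);
}

static void unmap_workspace_pages(void *addr, unsigned int nr_pages)
{
	vm_unmap_ram(addr, nr_pages);
}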
-
Christoph Hellwig authored
Rework the loop a little bit to use the generic bio_for_each_segment_all helper for iterating over the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
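For reference, a small sketch of the iteration pattern, using the 4.x-era three-argument form of bio_for_each_segment_all(); the byte-sum body is just a placeholder for whatever per-segment work a real caller does:

#include <linux/bio.h>
#include <linux/highmem.h>

/* Walk every segment of a completed bio, one page at a time. */
static u32 demo_sum_bio_bytes(struct bio *bio)
{
	struct bio_vec *bvec;
	u32 sum = 0;
	int i;

	bio_for_each_segment_all(bvec, bio, i) {
		u8 *kaddr = kmap_atomic(bvec->bv_page);
		unsigned int off;

		for (off = 0; off < bvec->bv_len; off++)
			sum += kaddr[bvec->bv_offset + off];

		kunmap_atomic(kaddr);
	}
	return sum;
}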
-
Christoph Hellwig authored
Use the bvec offset and len members to prepare for multipage bvecs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Instead of using bi_vcnt to calculate it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Use bio_for_each_segment_all to iterate over the segments instead. This requires a bit of reshuffling so that we only look up the ordered item once inside the bio_for_each_segment_all loop. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Just use bio_for_each_segment_all to iterate over all segments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Just use bio_for_each_segment_all to iterate over all segments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Christoph Hellwig authored
Pass the full bio to the decompression routines and use bio iterators to iterate over the data in the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
This fixes the WARN_ON on BTRFS_I(inode)->reserved_extents in btrfs_destroy_inode and the WARN_ON on nonzero delalloc bytes on umount with qgroups enabled. I was able to reproduce this by setting up a small (~500kb) quota limit and writing a file one byte at a time until I hit the limit. The warnings would all hit on umount. The root cause is that we would reserve a block-sized range in both the reservation and the quota in btrfs_check_data_free_space, but if we encountered a problem (e.g. EDQUOT), we would only release the single byte in the qgroup reservation. That caused an io tree state split, which increased the number of outstanding extents, in turn preventing the metadata reservation from being released. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
At this point we will have dropped extent entries from the file, so if we fail to insert the new hole entries then we are leaving the fs in a corrupt state (albeit an easily fixed one). Abort the transaction if this happens so we can avoid corrupting the fs. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
In order to do hole punching we have a block reserve to hold the reservation we need to drop the extents in our range. Since we could end up dropping a lot of extents we set rsv->failfast so we can just loop around again and drop the remainder of the range. Unfortunately we unconditionally fill the hole extents in and start from the last extent we encountered, which we may or may not have dropped. So this can result in overlapping file extent entries, which can be tripped over in a variety of ways, either by hitting BUG_ON(!ret) in fill_holes() after the search, or in btrfs_set_item_key_safe() in btrfs_drop_extents() at a later time by an unrelated task. Fix this by only setting drop_end to the last extent we actually dropped. This way our holes are filled in properly for the range that we did drop, and the rest of the range that remains to be dropped is actually dropped. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Jeff Mahoney authored
If we process the last item in the leaf and hit an I/O error while reading the next leaf, we return -EIO without having adjusted the position. Since we have emitted dirents, getdents() will return the byte count to the user instead of the error. Subsequent callers will emit the last successful dirent again, and return -EIO again, with the same result. Callers loop forever. Instead, if we always increment ctx->pos after emitting or skipping the dirent, we'll be sure that we won't hit the same one again. When we go to process the next leaf, we won't have emitted any dirents and the -EIO will be returned to the user properly. We also don't need to track if we've emitted a dirent already or if we've changed the position yet. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
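A minimal sketch of the position bookkeeping described above; struct demo_dirent and the next() callback are made up (the real btrfs readdir walks leaf items), and only the ctx->pos handling mirrors the idea:

#include <linux/err.h>
#include <linux/fs.h>

struct demo_dirent {
	const char *name;
	int name_len;
	u64 ino;
	unsigned int type;
	loff_t pos;	/* position of this entry in the directory */
	bool skip;	/* e.g. already emitted from elsewhere */
};

static int demo_readdir(struct dir_context *ctx,
			struct demo_dirent *(*next)(loff_t pos))
{
	while (1) {
		struct demo_dirent *de = next(ctx->pos);

		if (!de)
			break;			/* end of directory */
		if (IS_ERR(de))
			return PTR_ERR(de);	/* e.g. -EIO reading the next leaf */

		if (!de->skip && !dir_emit(ctx, de->name, de->name_len,
					   de->ino, de->type))
			return 0;		/* user buffer full, retry this entry later */

		/*
		 * Advance after every emitted or skipped dirent so a later
		 * error is never hit on the same entry again.
		 */
		ctx->pos = de->pos + 1;
	}
	return 0;
}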
-