- 20 Jan, 2020 40 commits
-
-
Dennis Zhou authored
We probably should call sysfs_remove_group() on debug/. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
The prior two patches added discarding via a background workqueue. This just piggybacked off of the fstrim code to trim the whole block at once. Well inevitably this is worse performance wise and will aggressively overtrim. But it was nice to plumb the other infrastructure to keep the patches easier to review. This adds the real goal of this series which is discarding slowly (ie. a slow long running fstrim). The discarding is split into two phases, extents and then bitmaps. The reason for this is two fold. First, the bitmap regions overlap the extent regions. Second, discarding the extents first will let the newly trimmed bitmaps have the highest chance of coalescing when being readded to the free space cache. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
block_group removal is a little tricky. It can race with the extent allocator, the cleaner thread, and balancing. The current path is for a block_group to be added to the unused_bgs list. Then, when the cleaner thread comes around, it starts a transaction and then proceeds with removing the block_group. Extents that are pinned are subsequently removed from the pinned trees and then eventually a discard is issued for the entire block_group. Async discard introduces another player into the game, the discard workqueue. While it has none of the racing issues, the new problem is ensuring we don't leave free space untrimmed prior to forgetting the block_group. This is handled by placing fully free block_groups on a separate discard queue. This is necessary to maintain discarding order as in the future we will slowly trim even fully free block_groups. The ordering helps us make progress on the same block_group rather than say the last fully freed block_group or needing to search through the fully freed block groups at the beginning of a list and insert after. The new order of events is a fully freed block group gets placed on the unused discard queue first. Once it's processed, it will be placed on the unusued_bgs list and then the original sequence of events will happen, just without the final whole block_group discard. The mount flags can change when processing unused_bgs, so when flipping from DISCARD to DISCARD_ASYNC, the unused_bgs must be punted to the discard_list to be trimmed. If we flip off DISCARD_ASYNC, we punt free block groups on the discard_list to the unused_bg queue which will do the final discard for us. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
When discard is enabled, everytime a pinned extent is released back to the block_group's free space cache, a discard is issued for the extent. This is an overeager approach when it comes to discarding and helping the SSD maintain enough free space to prevent severe garbage collection situations. This adds the beginning of async discard. Instead of issuing a discard prior to returning it to the free space, it is just marked as untrimmed. The block_group is then added to a LRU which then feeds into a workqueue to issue discards at a much slower rate. Full discarding of unused block groups is still done and will be addressed in a future patch of the series. For now, we don't persist the discard state of extents and bitmaps. Therefore, our failure recovery mode will be to consider extents untrimmed. This lets us handle failure and unmounting as one in the same. On a number of Facebook webservers, I collected data every minute accounting the time we spent in btrfs_finish_extent_commit() (col. 1) and in btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit() is where we discard extents synchronously before returning them to the free space cache. discard=sync: p99 total per minute p99 total per minute Drive | extent_commit() (ms) | commit_trans() (ms) --------------------------------------------------------------- Drive A | 434 | 1170 Drive B | 880 | 2330 Drive C | 2943 | 3920 Drive D | 4763 | 5701 discard=async: p99 total per minute p99 total per minute Drive | extent_commit() (ms) | commit_trans() (ms) -------------------------------------------------------------- Drive A | 134 | 956 Drive B | 64 | 1972 Drive C | 59 | 1032 Drive D | 62 | 1200 While it's not great that the stats are cumulative over 1m, all of these servers are running the same workload and and the delta between the two are substantial. We are spending significantly less time in btrfs_finish_extent_commit() which is responsible for discarding. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
There is a cap in btrfs in the amount of free extents that a block group can have. When it surpasses that threshold, future extents are placed into bitmaps. Instead of keeping track of if a certain bit is trimmed or not in a second bitmap, keep track of the relative state of the bitmap. With async discard, trimming bitmaps becomes a more frequent operation. As a trade off with simplicity, we keep track of if discarding a bitmap is in progress. If we fully scan a bitmap and trim as necessary, the bitmap is marked clean. This has some caveats as the min block size may skip over regions deemed too small. But this should be a reasonable trade off rather than keeping a second bitmap and making allocation paths more complex. The downside is we may overtrim, but ideally the min block size should prevent us from doing that too often and getting stuck trimming pathological cases. BTRFS_TRIM_STATE_TRIMMING is added to indicate a bitmap is in the process of being trimmed. If additional free space is added to that bitmap, the bit is cleared. A bitmap will be marked BTRFS_TRIM_STATE_TRIMMED if the trimming code was able to reach the end of it and the former is still set. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
Async discard will use the free space cache as backing knowledge for which extents to discard. This patch plumbs knowledge about which extents need to be discarded into the free space cache from unpin_extent_range(). An untrimmed extent can merge with everything as this is a new region. Absorbing trimmed extents is a tradeoff to for greater coalescing which makes life better for find_free_extent(). Additionally, it seems the size of a trim isn't as problematic as the trim io itself. When reading in the free space cache from disk, if sync is set, mark all extents as trimmed. The current code ensures at transaction commit that all free space is trimmed when sync is set, so this reflects that. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
This series introduces async discard which will use the flag DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as it is synchronously done in transaction commit. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Dennis Zhou authored
Bitmaps are fairly popular for their space efficiency, but we don't have generic iterators available. Make percpu's bitmap region iterators available to everyone. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
[PROBLEM] There is a user report in the mail list, showing the following corrupted tree blocks: item 62 key (486836 DIR_ITEM 2543451757) itemoff 6273 itemsize 74 location key (4065004 INODE_ITEM 1073741824) type FILE transid 21397 data_len 0 name_len 44 name: FILENAME Note that location key, its offset should be 0 for all INODE_ITEMS. This caused failed lookup of the inode. [CAUSE] That offending value, 1073741824, is 0x40000000. So this looks like a memory bit flip. [FIX] This patch will enhance tree-checker to check location key of DIR_INDEX/DIR_ITEM/XATTR_ITEM. There are several different combinations needs to check: - item_key.type == DIR_INDEX/DIR_ITEM * location_key.type == BTRFS_INODE_ITEM_KEY This location_key should follow the check in inode_item check. * location_key.type == BTRFS_ROOT_ITEM_KEY Despite the existing check, DIR_INDEX/DIR_ITEM can only points to subvolume trees. * All other keys are not allowed. - item_key.type == XATTR_ITEM location_key should be all 0. Reported-by: Mike Gilbert <floppymaster@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
ROOT_ITEM key check itself is not as simple as single line check, and will be reused for both ROOT_ITEM and DIR_ITEM/DIR_INDEX location key check, so refactor such check into check_root_key(). Also since we are here, fix a comment error about ROOT_ITEM offset, which is transid of snapshot creation, not some "older kernel behavior". Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Inode key check is not as easy as several lines, and it will be called in more than one location (INODE_ITEM check and DIR_ITEM/DIR_INDEX/XATTR_ITEM location key check). So here refactor such check into check_inode_key(). And add extra checks for XATTR_ITEM. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The @fs_info parameter can be extracted from extent_buffer structure, and there are already some wrappers getting rid of the @fs_info parameter. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Inspired by btrfs-progs github issue #208, where chunk item in chunk tree has invalid num_stripes (0). Although that can already be caught by current btrfs_check_chunk_valid(), that function doesn't really check item size as it needs to handle chunk item in super block sys_chunk_array(). This patch will add two extra checks for chunk items in chunk tree: - Basic chunk item size If the item is smaller than btrfs_chunk (which already contains one stripe), exit right now as reading num_stripes may even go beyond eb boundary. - Item size check against num_stripes If item size doesn't match with calculated chunk size, then either the item size or the num_stripes is corrupted. Error out anyway. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
zhengbin authored
Fixes coccicheck warning: fs/btrfs/print-tree.c:320:3-4: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
This hasn't been used since it was first introduced in commit b4bd745d ("btrfs: Introduce find_free_extent_ctl structure for later rework"). Passing that to btrfs_add_reserved_bytes in find_free_extent is not strictly necessary and using the local ram_bytes instead seems cleaner. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
Commit 7087a9d8 ("btrfs: Remove extent_io_ops::writepage_end_io_hook") left this logic in a confusing state. Simplify it. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
We only pass this as 1 from __extent_writepage_io(). The parameter basically means "pretend I didn't pass in a page". This is silly since we can simply not pass in the page. Get rid of the parameter from btrfs_get_extent(), and since it's used as a get_extent_t callback, remove it from get_extent_t and btree_get_extent(), neither of which need it. While we're here, let's document btrfs_get_extent(). Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
In __extent_writepage_io(), we check whether i_size <= page_offset(page). Note that if i_size < page_offset(page), then i_size >> PAGE_SHIFT < page->index. If i_size == page_offset(page), then i_size >> PAGE_SHIFT == page->index && offset_in_page(i_size) == 0. __extent_writepage() already has a check for these cases that returns without calling __extent_writepage_io(): end_index = i_size >> PAGE_SHIFT pg_offset = offset_in_page(i_size); if (page->index > end_index || (page->index == end_index && !pg_offset)) { page->mapping->a_ops->invalidatepage(page, 0, PAGE_SIZE); unlock_page(page); return 0; } Get rid of the one in __extent_writepage_io(), which was obsoleted in 211c17f5 ("Fix corners in writepage and btrfs_truncate_page"). Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
Since 40f76580 ("Btrfs: split up __extent_writepage to lower stack usage"), done_unlocked is simply a return 0. Get rid of it. Mid-statement block returns don seem to make the code less readable here. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
We're initializing pg_offset to 0, setting it immediately, then reassigning it to 0 again after. The former became unnecessary in 211c17f5 ("Fix corners in writepage and btrfs_truncate_page"). The latter is a leftover that should've been removed in 40f76580 ("Btrfs: split up __extent_writepage to lower stack usage"). Remove both. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
ordered->start, ordered->len, and ordered->disk_len correspond to fi->disk_bytenr, fi->num_bytes, and fi->disk_num_bytes, respectively. It's confusing to translate between the two naming schemes. Since a btrfs_ordered_extent is basically a pending btrfs_file_extent_item, let's make the former use the naming from the latter. Note that I didn't touch the names in tracepoints just in case there are scripts depending on the current naming. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
Snapshot-aware defrag has been disabled since commit 8101c8db ("Btrfs: disable snapshot aware defrag for now") almost 6 years ago. Let's remove the dead code. If someone is up to the task of bringing it back, they can dig it up from git. This is logically a revert of commit 38c227d8 ("Btrfs: snapshot-aware defrag") except that now we have to clear the EXTENT_DEFRAG bit to avoid need_force_cow() returning true forever. The reasons to disable were caused by runtime problems (like long stalls or memory consumption) on heavily referenced extents (eg. thousands of snapshots). There were attempts to fix that but never finished. Current defrag breaks the extent references and some users prefer that behaviour over the one implemented by snapshot aware (ie. keeping links for defragmentation). To enable both usecases we'd need to extend defrag ioctl but let's do that properly from scratch. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhance ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
We can encode this in the offset parameter: -1 means use the page offsets, anything else is a valid offset. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
Currently, we have two wrappers for __btrfs_lookup_bio_sums(): btrfs_lookup_bio_sums_dio(), which is used for direct I/O, and btrfs_lookup_bio_sums(), which is used everywhere else. The only difference is that the _dio variant looks up csums starting at the given offset instead of using the page index, which isn't actually direct I/O-specific. Let's clean up the signature and return value of __btrfs_lookup_bio_sums(), rename it to btrfs_lookup_bio_sums(), and get rid of the trivial helpers. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
When closing a device, btrfs_close_one_device() first allocates a new device, copies the device to close's name, replaces it in the dev_list with the copy and then finally frees it. This involves two memory allocation, which can potentially fail. As this code path is tricky to unwind, the allocation failures where handled by BUG_ON()s. But this copying isn't strictly needed, all that is needed is resetting the device in question to it's state it had after the allocation. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
In btrfs_close_one_device we're decrementing the number of open devices before we're calling btrfs_close_bdev(). As there is no intermediate exit between these points in this function it is technically OK to do so, but it makes the code a bit harder to understand. Move both operations closer together and move the decrement step after btrfs_close_bdev(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Omar Sandoval authored
When you snapshot a subvolume containing a subvolume, you get a placeholder directory where the subvolume would be. These directories have their own btrfs_dir_ro_inode_operations. Al pointed out [1] that these directories can use simple_lookup() instead of btrfs_lookup(), as they are always empty. Furthermore, they can use the default generic_permission() instead of btrfs_permission(); the additional checks in the latter don't matter because we can't write to the directory anyways. Finally, they can use the default generic_update_time() instead of btrfs_update_time(), as the inode doesn't exist on disk and doesn't need any special handling. All together, this means that we can get rid of btrfs_dir_ro_inode_operations and use simple_dir_inode_operations instead. 1: https://lore.kernel.org/linux-btrfs/20190929052934.GY26530@ZenIV.linux.org.uk/ Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
We have a user report, that cppcheck is complaining about a possible NULL-pointer dereference in btrfs_destroy_dev_replace_tgtdev(). We're first dereferencing the 'tgtdev' variable and the later check for the validity of the pointer with a WARN_ON(!tgtdev); But all callers of btrfs_destroy_dev_replace_tgtdev() either explicitly check if 'tgtdev' is non-NULL or directly allocate 'tgtdev', so the WARN_ON() is impossible to hit. Just remove it to silence the checker's complains. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205003Signed-off-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
btrfsic_process_superblock() BUG_ON()s if 'state' is NULL. But this can never happen as the only caller from btrfsic_process_superblock() is btrfsic_mount() which allocates 'state' some lines above calling btrfsic_process_superblock() and checks for the allocation to succeed. Let's just remove the impossible to hit BUG_ON(). Signed-off-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
A user reports a possible NULL-pointer dereference in btrfsic_process_superblock(). We are assigning state->fs_info to a local fs_info variable and afterwards checking for the presence of state. While we would BUG_ON() a NULL state anyways, we can also just remove the local fs_info copy, as fs_info is only used once as the first argument for btrfs_num_copies(). There we can just pass in state->fs_info as well. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205003Signed-off-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This is a relic from a time before we had a proper reservation mechanism and you could end up with really full chunks at chunk allocation time. This doesn't make sense anymore, so just kill it. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We have the space_info, we can just check its flags to see if it's the system chunk space info. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's a simple wrapper over btrfs_panic and is called only once. Just open code it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
There are two relocation stages but both print the same message. Add the description of the stage. This can help debugging or provides informative message to users. BTRFS info (device dm-5): balance: start -d -m -s BTRFS info (device dm-5): relocating block group 30408704 flags metadata|dup BTRFS info (device dm-5): found 2 extents, stage: move data extents BTRFS info (device dm-5): relocating block group 22020096 flags system|dup BTRFS info (device dm-5): found 1 extents, stage: move data extents BTRFS info (device dm-5): relocating block group 13631488 flags data BTRFS info (device dm-5): found 1 extents, stage: move data extents BTRFS info (device dm-5): found 1 extents, stage: update data pointers BTRFS info (device dm-5): balance: ended with status: 0 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Yunfeng Ye authored
The condition '!ret2' is always true. commit 717beb96 ("Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion") left behind the check after moving this code out of the goto, so remove the unused condition check. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
level <0 and level >= BTRFS_MAX_LEVEL are already performed upon extent buffer read by tree checker in btrfs_check_node. go. As far as 'level <= 0' we are guaranteed that level is '> 0' because the value of level _before_ reading 'next' is larger than 1 (otherwise we wouldn't have executed that code at all) this in turn guarantees that 'level' after btrfs_read_buffer is 'level - 1' since we verify this invariant in: btrfs_read_buffer btree_read_extent_buffer_pages btrfs_verify_level_key This guarantees that level can never be '<= 0' so the warn on is never triggered. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The log_root passed to walk_log_tree is guaranteed to have its root_key.objectid always be BTRFS_TREE_LOG_OBJECTID. This is by merit that all log roots of an ordinary root are allocated in alloc_log_tree which hard-codes objectid to be BTRFS_TREE_LOG_OBJECTID. In case walk_log_tree is called for a log tree found by btrfs_read_fs_root in btrfs_recover_log_trees, that function already ensures found_key.objectid is BTRFS_TREE_LOG_OBJECTID. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
__btrfs_free_reserved_extent now performs the actions of btrfs_free_and_pin_reserved_extent. But this name is a bit of a misnomer, since the extent is not really freed but just pinned. Reflect this in the new name. No semantics changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
__btrfs_free_reserved_extent performs 2 entirely different operations depending on whether its 'pin' argument is true or false. This patch lifts the 2nd case (pin is false) into it's sole caller btrfs_free_reserved_extent. No semantics changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
All callers of btrfs_free_reserved_extent (respectively __btrfs_free_reserved_extent with in set to 0) pass in extents which have only been reserved but not yet written to. Namely, * in cow_file_range that function is called only if create_io_em fails or btrfs_add_ordered_extent fail, both of which happen _before_ any IO is submitted to the newly reserved range * in submit_compressed_extents the code flow is similar - out_free_reserve can be called only before btrfs_submit_compressed_write which is where any writes to the range could occur * btrfs_new_extent_direct also calls btrfs_free_reserved_extent only if extent_map fails, before any IO is issued * __btrfs_prealloc_file_range also calls btrfs_free_reserved_extent in case insertion of the metadata fails * btrfs_alloc_tree_block again can only be called in case in-memory operations fail, before any IO is submitted * btrfs_finish_ordered_io - this is the only caller where discarding the extent could have a material effect, since it can be called for an extent which was partially written. With this change the submission of discards is optimised since discards are now not being created for extents which are known to not have been touched on disk. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-