- 08 Dec, 2020 40 commits
-
-
Pavel Begunkov authored
Most delay calculations are done in ns or ms, so store discard_ctl->delay in ms and convert the final delay to jiffies only at the end. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Pavel Begunkov authored
Instead of using iops_limit only for cutting off extremes, calculate the discard delay directly from it, so it closely follows iops_limit and doesn't under-discard even though quotas are not saturated. The iops limit could be hit more often in some cases and could increase the discard rate. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Function scrub_find_csum() is to locate the csum for bytenr @logical from sctx->csum_list. However it lacks a lot of comments to explain things like how the csum_list is organized and why we need to drop csum range which is before us. Refactor the function by: - Add more comments explaining the behavior - Add comment explaining why we need to drop the csum range - Put the csum copy in the main loop This is mostly for the incoming patches to make scrub_find_csum() able to find multiple checksums. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The @force parameter for scrub_pages() is to indicate whether we want to force bio submission. Currently it's only used for the super block, and it can be easily determined by the @flags, so we can remove the parameter. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
There are several call sites where we declare something like "struct scrub_page *page". This is confusing as we also use regular page in this code, rename it to 'spage' where applicable. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Currently csum_dirty_buffer() uses page to grab extent buffer, but that only works for sector size == PAGE_SIZE case. For subpage we need page + page_offset to grab extent buffer. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Currently btrfs_validate_metadata_buffer() only needs to handle one extent buffer as currently one page maps to at most one extent buffer. For incoming subpage support, we need to extend the support where one page could contain multiple extent buffers. Split the function so we can call validate_extent_buffer on extent buffers independently. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
For subpage size support, metadata blocks of nodesize are smaller than one page and this needs to be handled when calculating the checksum. The checksummed start and length need to be adjusted but only for the first page: - start is simply offset in the page - length is nodesize (subpage) or PAGE_SIZE for all other cases Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Since commit f28491e0 ("Btrfs: move the extent buffer radix tree into the fs_info"), fs_info can be grabbed from extent_buffer directly. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
For subpage sector size support, one page can contain multiple tree blocks. The entries cannot be based on page size and index must be derived from the sectorsize. No change for page size == sector size. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
When calling attach_extent_buffer_page(), either we're attaching anonymous pages, called from btrfs_clone_extent_buffer(), or we're attaching btree inode pages, called from alloc_extent_buffer(). For the latter case, we should hold page->mapping->private_lock to avoid parallel changes to page->private. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
I got the following lockdep splat ====================================================== WARNING: possible circular locking dependency detected 5.9.0+ #101 Not tainted ------------------------------------------------------ btrfs-cleaner/3445 is trying to acquire lock: ffff89dbec39ab48 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170 but task is already holding lock: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&fs_info->commit_root_sem){++++}-{3:3}: down_write+0x3d/0x70 btrfs_cache_block_group+0x2d5/0x510 find_free_extent+0xb6e/0x12f0 btrfs_reserve_extent+0xb3/0x1b0 btrfs_alloc_tree_block+0xb1/0x330 alloc_tree_block_no_bg_flush+0x4f/0x60 __btrfs_cow_block+0x11d/0x580 btrfs_cow_block+0x10c/0x220 commit_cowonly_roots+0x47/0x2e0 btrfs_commit_transaction+0x595/0xbd0 sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 deactivate_locked_super+0x36/0xa0 cleanup_mnt+0x12d/0x190 task_work_run+0x5c/0xa0 exit_to_user_mode_prepare+0x1df/0x200 syscall_exit_to_user_mode+0x54/0x280 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&space_info->groups_sem){++++}-{3:3}: down_read+0x40/0x130 find_free_extent+0x2ed/0x12f0 btrfs_reserve_extent+0xb3/0x1b0 btrfs_alloc_tree_block+0xb1/0x330 alloc_tree_block_no_bg_flush+0x4f/0x60 __btrfs_cow_block+0x11d/0x580 btrfs_cow_block+0x10c/0x220 commit_cowonly_roots+0x47/0x2e0 btrfs_commit_transaction+0x595/0xbd0 sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 deactivate_locked_super+0x36/0xa0 cleanup_mnt+0x12d/0x190 task_work_run+0x5c/0xa0 exit_to_user_mode_prepare+0x1df/0x200 syscall_exit_to_user_mode+0x54/0x280 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (btrfs-root-00){++++}-{3:3}: __lock_acquire+0x1167/0x2150 lock_acquire+0xb9/0x3d0 down_read_nested+0x43/0x130 __btrfs_tree_read_lock+0x32/0x170 __btrfs_read_lock_root_node+0x3a/0x50 btrfs_search_slot+0x614/0x9d0 btrfs_find_root+0x35/0x1b0 btrfs_read_tree_root+0x61/0x120 btrfs_get_root_ref+0x14b/0x600 find_parent_nodes+0x3e6/0x1b30 btrfs_find_all_roots_safe+0xb4/0x130 btrfs_find_all_roots+0x60/0x80 btrfs_qgroup_trace_extent_post+0x27/0x40 btrfs_add_delayed_data_ref+0x3fd/0x460 btrfs_free_extent+0x42/0x100 __btrfs_mod_ref+0x1d7/0x2f0 walk_up_proc+0x11c/0x400 walk_up_tree+0xf0/0x180 btrfs_drop_snapshot+0x1c7/0x780 btrfs_clean_one_deleted_snapshot+0xfb/0x110 cleaner_kthread+0xd4/0x140 kthread+0x13a/0x150 ret_from_fork+0x1f/0x30 other info that might help us debug this: Chain exists of: btrfs-root-00 --> &space_info->groups_sem --> &fs_info->commit_root_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fs_info->commit_root_sem); lock(&space_info->groups_sem); lock(&fs_info->commit_root_sem); lock(btrfs-root-00); *** DEADLOCK *** 3 locks held by btrfs-cleaner/3445: #0: ffff89dbeaf28838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140 #1: ffff89dbeb6c7640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0 #2: ffff89dbeaf28a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80 stack backtrace: CPU: 0 PID: 3445 Comm: btrfs-cleaner Not tainted 5.9.0+ #101 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x8b/0xb0 check_noncircular+0xcf/0xf0 __lock_acquire+0x1167/0x2150 ? __bfs+0x42/0x210 lock_acquire+0xb9/0x3d0 ? __btrfs_tree_read_lock+0x32/0x170 down_read_nested+0x43/0x130 ? __btrfs_tree_read_lock+0x32/0x170 __btrfs_tree_read_lock+0x32/0x170 __btrfs_read_lock_root_node+0x3a/0x50 btrfs_search_slot+0x614/0x9d0 ? find_held_lock+0x2b/0x80 btrfs_find_root+0x35/0x1b0 ? do_raw_spin_unlock+0x4b/0xa0 btrfs_read_tree_root+0x61/0x120 btrfs_get_root_ref+0x14b/0x600 find_parent_nodes+0x3e6/0x1b30 btrfs_find_all_roots_safe+0xb4/0x130 btrfs_find_all_roots+0x60/0x80 btrfs_qgroup_trace_extent_post+0x27/0x40 btrfs_add_delayed_data_ref+0x3fd/0x460 btrfs_free_extent+0x42/0x100 __btrfs_mod_ref+0x1d7/0x2f0 walk_up_proc+0x11c/0x400 walk_up_tree+0xf0/0x180 btrfs_drop_snapshot+0x1c7/0x780 ? btrfs_clean_one_deleted_snapshot+0x73/0x110 btrfs_clean_one_deleted_snapshot+0xfb/0x110 cleaner_kthread+0xd4/0x140 ? btrfs_alloc_root+0x50/0x50 kthread+0x13a/0x150 ? kthread_create_worker_on_cpu+0x40/0x40 ret_from_fork+0x1f/0x30 while testing another lockdep fix. This happens because we're using the commit_root_sem to protect fs_info->caching_block_groups, which creates a dependency on the groups_sem -> commit_root_sem, which is problematic because we will allocate blocks while holding tree roots. Fix this by making the list itself protected by the fs_info->block_group_cache_lock. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
While documenting the usage of the commit_root_sem, I noticed that we do not actually take the commit_root_sem in the case of the free space cache. This is problematic because we're supposed to hold that sem while we're reading the commit roots, which is what we do for the free space cache. The reason I did it inline when I originally wrote the code was because there's the case of unpinning where we need to make sure that the free space cache is loaded if we're going to use the free space cache. But we can accomplish the same thing by simply waiting for the cache to be loaded. Rework this code to load the free space cache asynchronously. This allows us to greatly cleanup the caching code because now it's all shared by the various caching methods. We also are now in a position to have the commit_root semaphore held while we're loading the free space cache. And finally our modification of ->last_byte_to_unpin is removed because it can be handled in the proper way on commit. Some care must be taken when replaying the log, when we expect that the free space cache will be read entirely before we start excluding space to replay. This could lead to overwriting space during replay. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
Historically we've allowed recursive locking specifically for the free space inode. This is because we are only doing reads and know that it's safe. However we don't actually need this feature, we can get away with reading the commit root for the extents. In fact if we want to allow asynchronous loading of the free space cache we have to use the commit root, otherwise we will deadlock. Switch to using the commit root for the file extents. These are only read at load time, and are replaced as soon as we start writing the cache out to disk. The cache is never read again, so this is legitimate. This matches what we do for the inode itself, as we read that from the commit root as well. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
The free space cache has been special in that we would load it right away instead of farming the work off to a worker thread. This resulted in some weirdness that had to be taken into account for this fact, namely that if we every found a block group being cached the fast way we had to wait for it to finish, because we could get the cache before it had been validated and we may throw the cache away. To handle this particular case instead create a temporary btrfs_free_space_ctl to load the free space cache into. Then once we've validated that it makes sense, copy it's contents into the actual block_group->free_space_ctl. This allows us to avoid the problems of needing to wait for the caching to complete, we can clean up the discard extent handling stuff in __load_free_space_cache, and we no longer need to do the merge_space_tree() because the space is added one by one into the real free_space_ctl. This will allow further reworks of how we handle loading the free space cache. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This passes in the block_group and the free_space_ctl, but we can get this from the block group itself. Part of this is because we call it from __load_free_space_cache, which can be called for the inode cache as well. Move that call into the block group specific load section, wrap it in the right lock that we need for the assertion (but otherwise this is safe without the lock because this happens in single-thread context). Fix up the arguments to only take the block group. Add a lockdep_assert as well for good measure to make sure we don't mess up the locking again. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
Currently unpin_extent_range happens in the transaction commit context, so we are protected from ->last_byte_to_unpin changing while we're unpinning, because any new transactions would have to wait for us to complete before modifying ->last_byte_to_unpin. However in the future we may want to change how this works, for instance with async unpinning or other such TODO items. To prepare for that future explicitly protect ->last_byte_to_unpin with the commit_root_sem so we are sure it won't change while we're doing our work. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
While writing an explanation for the need of the commit_root_sem for btrfs_prepare_extent_commit, I realized we have a slight hole that could result in leaked space if we have to do the old style caching. Consider the following scenario commit root +----+----+----+----+----+----+----+ |\\\\| |\\\\|\\\\| |\\\\|\\\\| +----+----+----+----+----+----+----+ 0 1 2 3 4 5 6 7 new commit root +----+----+----+----+----+----+----+ | | | |\\\\| | |\\\\| +----+----+----+----+----+----+----+ 0 1 2 3 4 5 6 7 Prior to this patch, we run btrfs_prepare_extent_commit, which updates the last_byte_to_unpin, and then we subsequently run switch_commit_roots. In this example lets assume that caching_ctl->progress == 1 at btrfs_prepare_extent_commit() time, which means that cache->last_byte_to_unpin == 1. Then we go and do the switch_commit_roots(), but in the meantime the caching thread has made some more progress, because we drop the commit_root_sem and re-acquired it. Now caching_ctl->progress == 3. We swap out the commit root and carry on to unpin. The race can happen like: 1) The caching thread was running using the old commit root when it found the extent for [2, 3); 2) Then it released the commit_root_sem because it was in the last item of a leaf and the semaphore was contended, and set ->progress to 3 (value of 'last'), as the last extent item in the current leaf was for the extent for range [2, 3); 3) Next time it gets the commit_root_sem, will start using the new commit root and search for a key with offset 3, so it never finds the hole for [2, 3). So the caching thread never saw [2, 3) as free space in any of the commit roots, and by the time finish_extent_commit() was called for the range [0, 3), ->last_byte_to_unpin was 1, so it only returned the subrange [0, 1) to the free space cache, skipping [2, 3). In the unpin code we have last_byte_to_unpin == 1, so we unpin [0,1), but do not unpin [2,3). However because caching_ctl->progress == 3 we do not see the newly freed section of [2,3), and thus do not add it to our free space cache. This results in us missing a chunk of free space in memory (on disk too, unless we have a power failure before writing the free space cache to disk). Fix this by making sure the ->last_byte_to_unpin is set at the same time that we swap the commit roots, this ensures that we will always be consistent. CC: stable@vger.kernel.org # 5.8+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> [ update changelog with Filipe's review comments ] Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
While fixing up our ->last_byte_to_unpin locking I noticed that we will shorten len based on ->last_byte_to_unpin if we're caching when we're adding back the free space. This is correct for the free space, as we cannot unpin more than ->last_byte_to_unpin, however we use len to adjust the ->bytes_pinned counters and such, which need to track the actual pinned usage. This could result in WARN_ON(space_info->bytes_pinned) triggering at unmount time. Fix this by using a local variable for the amount to add to free space cache, and leave len untouched in this case. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
After the rwsem replaced the tree lock implementation, the extent buffer got smaller but leaving some holes behind. By changing log_index type and reordering, we can squeeze the size further to 240 bytes, measured on release config on x86_64. Log_index spans only 3 values and needs to be signed. Before: struct extent_buffer { u64 start; /* 0 8 */ long unsigned int len; /* 8 8 */ long unsigned int bflags; /* 16 8 */ struct btrfs_fs_info * fs_info; /* 24 8 */ spinlock_t refs_lock; /* 32 4 */ atomic_t refs; /* 36 4 */ atomic_t io_pages; /* 40 4 */ int read_mirror; /* 44 4 */ struct callback_head callback_head __attribute__((__aligned__(8))); /* 48 16 */ /* --- cacheline 1 boundary (64 bytes) --- */ pid_t lock_owner; /* 64 4 */ bool lock_recursed; /* 68 1 */ /* XXX 3 bytes hole, try to pack */ struct rw_semaphore lock; /* 72 40 */ short int log_index; /* 112 2 */ /* XXX 6 bytes hole, try to pack */ struct page * pages[16]; /* 120 128 */ /* size: 248, cachelines: 4, members: 14 */ /* sum members: 239, holes: 2, sum holes: 9 */ /* forced alignments: 1 */ /* last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); After: struct extent_buffer { u64 start; /* 0 8 */ long unsigned int len; /* 8 8 */ long unsigned int bflags; /* 16 8 */ struct btrfs_fs_info * fs_info; /* 24 8 */ spinlock_t refs_lock; /* 32 4 */ atomic_t refs; /* 36 4 */ atomic_t io_pages; /* 40 4 */ int read_mirror; /* 44 4 */ struct callback_head callback_head __attribute__((__aligned__(8))); /* 48 16 */ /* --- cacheline 1 boundary (64 bytes) --- */ pid_t lock_owner; /* 64 4 */ bool lock_recursed; /* 68 1 */ s8 log_index; /* 69 1 */ /* XXX 2 bytes hole, try to pack */ struct rw_semaphore lock; /* 72 40 */ struct page * pages[16]; /* 112 128 */ /* size: 240, cachelines: 4, members: 14 */ /* sum members: 238, holes: 1, sum holes: 2 */ /* forced alignments: 1 */ /* last cacheline: 48 bytes */ } __attribute__((__aligned__(8))); Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We no longer distinguish between blocking and spinning, so rip out all this code. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Josef Bacik authored
Now that we're using a rw_semaphore we no longer need to indicate if a lock is blocking or not, nor do we need to flip the entire path from blocking to spinning. Remove these helpers and all the places they are called. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The context structure unnecessarily stores copy of the checksum size, that can be now easily obtained from fs_info. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The state structure unnecessarily stores copy of the checksum size, that can be now easily obtained from fs_info. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Remove local variable that is then used just once and replace it with fs_info::csum_size. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The fs_info value is 32bit, switch also the local u16 variables. This leads to a better assembly code generated due to movzwl. This simple change will shave some bytes on x86_64 and release config: text data bss dec hex filename 1090000 17980 14912 1122892 11224c pre/btrfs.ko 1089794 17980 14912 1122686 11217e post/btrfs.ko DELTA: -206 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
btrfs_get_16 shows up in the system performance profiles (helper to read 16bit values from on-disk structures). This is partially because of the checksum size that's frequently read along with data reads/writes, other u16 uses are from item size or directory entries. Replace all calls to btrfs_super_csum_size by the cached value from fs_info. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
btrfs_csum_bytes_to_leaves shows up in system profiles, which makes it a candidate for optimizations. After the 64bit division has been replaced by shift, there's still a calculation done each time the function is called: checksums per leaf. As this is a constant value for the entire filesystem lifetime, we can calculate it once at mount time and reuse. This also allows to reduce the division to 64bit/32bit as we know the constant will always fit the 32bit type. Replace the open-coded rounding up with a macro that internally handles the 64bit division and as it's now a short function, make it static inline (slight code increase, slight stack usage reduction). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
In many places we need the checksum size and it is inefficient to read it from the raw superblock. Store the value into fs_info, actual use will be in followup patches. The size is u32 as it allows to generate better assembly than with u16. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The value of super_block::s_blocksize_bits is the same as fs_info::sectorsize_bits, but we don't need to do the extra dereferences in many functions and storing the bits as u32 (in fs_info) generates shorter assembly. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Change free_space_bitmap_size to take btrfs_fs_info so we can get the sectorsize_bits to do calculations. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
We do a lot of calculations where we divide or multiply by sectorsize. We also know and make sure that sectorsize is a power of two, so this means all divisions can be turned to shifts and avoid eg. expensive u64/u32 divisions. The type is u32 as it's more register friendly on x86_64 compared to u8 and the resulting assembly is smaller (movzbl vs movl). There's also superblock s_blocksize_bits but it's usually one more pointer dereference farther than fs_info. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The variable @page_size in submit_extent_page() is not related to page size. It can already be smaller than PAGE_SIZE, so rename it to io_size to reduce confusion, this is especially important for later subpage support. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
If we're reading partial page, btrfs will warn about this as read/write is always done in sector size, which now equals page size. But for the upcoming subpage read-only support, our data read is only aligned to sectorsize, which can be smaller than page size. Thus here we change the warning condition to check it against sectorsize, the behavior is not changed for regular sectorsize == PAGE_SIZE case, and won't report error for subpage read. Also, pass the proper start/end with bv_offset for check_data_csum() to handle. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Function process_pages_contig() does not only handle page locking but also other operations. Rename the local variable pages_locked to pages_processed to reduce confusion. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
For check_data_csum(), the page we're using is directly from the inode mapping, thus it has valid page_offset(). We can use (page_offset() + pg_off) to replace @start parameter completely, while the @len should always be sectorsize. Since we're here, also add some comment, as there are quite some confusion in words like start/offset, without explaining whether it's file_offset or logical bytenr. This should not affect the existing behavior, as for current sectorsize == PAGE_SIZE case, @pgoff should always be 0, and len is always PAGE_SIZE (or sectorsize from the dio read path). Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
All callers of btrfs_wq_submit_bio() pass struct inode as @private_data, so there is no need for it to be (void *), replace it with "struct inode *inode". While we can extract fs_info from struct inode, also remove the @fs_info parameter. Since we're here, also replace all the (void *private_data) into (struct inode *inode). Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The @failed_start parameter is only paired with @exclusive_bits, and those parameters are only used for EXTENT_LOCKED bit, which have their own wrappers lock_extent_bits(). Thus for regular set_extent_bit() calls, the failed_start makes no sense, just sink the parameter. Also, since @failed_start and @exclusive_bits are used in pairs, add an assert to make it obvious. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The pitfall here is, if the parameter @bits has multiple bits set, we will return the first range which just has one of the specified bits set. This is a little tricky if we want an exact match. Anyway, update the comment to make that clear. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
The return value of that function is completely wrong. That function only returns 0 if the extent buffer doesn't need to be submitted. The "ret = 1" and "ret = 0" are determined by the return value of "test_and_clear_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)". And if we get ret == 1, it's because the extent buffer is dirty, and we set its status to EXTENT_BUFFER_WRITE_BACK, and continue to page locking. While if we get ret == 0, it means the extent is not dirty from the beginning, so we don't need to write it back. The caller also follows this, in btree_write_cache_pages(), if lock_extent_buffer_for_io() returns 0, we just skip the extent buffer completely. So the comment is completely wrong. Since we're here, also change the description a little. The write bio flushing won't be visible to the caller, thus it's not an major feature. In the main description, only describe the locking part to make the point more clear. For reference, added in commit 2e3c2513 ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-