An error occurred fetching the project authors.
- 15 Oct, 2019 1 commit
-
-
Qu Wenruo authored
[Background] Btrfs qgroup uses two types of reserved space for METADATA space, PERTRANS and PREALLOC. PERTRANS is metadata space reserved for each transaction started by btrfs_start_transaction(). While PREALLOC is for delalloc, where we reserve space before joining a transaction, and finally it will be converted to PERTRANS after the writeback is done. [Inconsistency] However there is inconsistency in how we handle PREALLOC metadata space. The most obvious one is: In btrfs_buffered_write(): btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true); We always free qgroup PREALLOC meta space. While in btrfs_truncate_block(): btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0)); We only free qgroup PREALLOC meta space when something went wrong. [The Correct Behavior] The correct behavior should be the one in btrfs_buffered_write(), we should always free PREALLOC metadata space. The reason is, the btrfs_delalloc_* mechanism works by: - Reserve metadata first, even it's not necessary In btrfs_delalloc_reserve_metadata() - Free the unused metadata space Normally in: btrfs_delalloc_release_extents() |- btrfs_inode_rsv_release() Here we do calculation on whether we should release or not. E.g. for 64K buffered write, the metadata rsv works like: /* The first page */ reserve_meta: num_bytes=calc_inode_reservations() free_meta: num_bytes=0 total: num_bytes=calc_inode_reservations() /* The first page caused one outstanding extent, thus needs metadata rsv */ /* The 2nd page */ reserve_meta: num_bytes=calc_inode_reservations() free_meta: num_bytes=calc_inode_reservations() total: not changed /* The 2nd page doesn't cause new outstanding extent, needs no new meta rsv, so we free what we have reserved */ /* The 3rd~16th pages */ reserve_meta: num_bytes=calc_inode_reservations() free_meta: num_bytes=calc_inode_reservations() total: not changed (still space for one outstanding extent) This means, if btrfs_delalloc_release_extents() determines to free some space, then those space should be freed NOW. So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other than btrfs_qgroup_convert_reserved_meta(). The good news is: - The callers are not that hot The hottest caller is in btrfs_buffered_write(), which is already fixed by commit 336a8bb8 ("btrfs: Fix wrong btrfs_delalloc_release_extents parameter"). Thus it's not that easy to cause false EDQUOT. - The trans commit in advance for qgroup would hide the bug Since commit f5fef459 ("btrfs: qgroup: Make qgroup async transaction commit more aggressive"), when btrfs qgroup metadata free space is slow, it will try to commit transaction and free the wrongly converted PERTRANS space, so it's not that easy to hit such bug. [FIX] So to fix the problem, remove the @qgroup_free parameter for btrfs_delalloc_release_extents(), and always pass true to btrfs_inode_rsv_release(). Reported-by:
Filipe Manana <fdmanana@suse.com> Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc") CC: stable@vger.kernel.org # 4.19+ Reviewed-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 01 Oct, 2019 1 commit
-
-
Josef Bacik authored
A user reported a lockdep splat ====================================================== WARNING: possible circular locking dependency detected 5.2.11-gentoo #2 Not tainted ------------------------------------------------------ kswapd0/711 is trying to acquire lock: 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500 but task is already holding lock: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}: kmem_cache_alloc+0x1f/0x1c0 btrfs_alloc_inode+0x1f/0x260 alloc_inode+0x16/0xa0 new_inode+0xe/0xb0 btrfs_new_inode+0x70/0x610 btrfs_symlink+0xd0/0x420 vfs_symlink+0x9c/0x100 do_symlinkat+0x66/0xe0 do_syscall_64+0x55/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (sb_internal){.+.+}: __sb_start_write+0xf6/0x150 start_transaction+0x3a8/0x500 btrfs_commit_inode_delayed_inode+0x59/0x110 btrfs_evict_inode+0x19e/0x4c0 evict+0xbc/0x1f0 inode_lru_isolate+0x113/0x190 __list_lru_walk_one.isra.4+0x5c/0x100 list_lru_walk_one+0x32/0x50 prune_icache_sb+0x36/0x80 super_cache_scan+0x14a/0x1d0 do_shrink_slab+0x131/0x320 shrink_node+0xf7/0x380 balance_pgdat+0x2d5/0x640 kswapd+0x2ba/0x5e0 kthread+0x147/0x160 ret_from_fork+0x24/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(sb_internal); lock(fs_reclaim); lock(sb_internal); *** DEADLOCK *** 3 locks held by kswapd0/711: #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30 #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380 #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0 stack backtrace: CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2 Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017 Call Trace: dump_stack+0x85/0xc7 print_circular_bug.cold.40+0x1d9/0x235 __lock_acquire+0x18b1/0x1f00 lock_acquire+0xa6/0x170 ? start_transaction+0x3a8/0x500 __sb_start_write+0xf6/0x150 ? start_transaction+0x3a8/0x500 start_transaction+0x3a8/0x500 btrfs_commit_inode_delayed_inode+0x59/0x110 btrfs_evict_inode+0x19e/0x4c0 ? var_wake_function+0x20/0x20 evict+0xbc/0x1f0 inode_lru_isolate+0x113/0x190 ? discard_new_inode+0xc0/0xc0 __list_lru_walk_one.isra.4+0x5c/0x100 ? discard_new_inode+0xc0/0xc0 list_lru_walk_one+0x32/0x50 prune_icache_sb+0x36/0x80 super_cache_scan+0x14a/0x1d0 do_shrink_slab+0x131/0x320 shrink_node+0xf7/0x380 balance_pgdat+0x2d5/0x640 kswapd+0x2ba/0x5e0 ? __wake_up_common_lock+0x90/0x90 kthread+0x147/0x160 ? balance_pgdat+0x640/0x640 ? __kthread_create_on_node+0x160/0x160 ret_from_fork+0x24/0x30 This is because btrfs_new_inode() calls new_inode() under the transaction. We could probably move the new_inode() outside of this but for now just wrap it in memalloc_nofs_save(). Reported-by:
Zdenek Sojka <zsojka@seznam.cz> Fixes: 712e36c5 ("btrfs: use GFP_KERNEL in btrfs_alloc_inode") CC: stable@vger.kernel.org # 4.16+ Reviewed-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 09 Sep, 2019 18 commits
-
-
Omar Sandoval authored
Since commit fee187d9 ("Btrfs: do not set EXTENT_DIRTY along with EXTENT_DELALLOC"), we never set EXTENT_DIRTY in inode->io_tree, so we can simplify and stop trying to clear it. Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Omar Sandoval <osandov@fb.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
Further simplifaction of the get/set helpers is possible when the token is uniquely tied to an extent buffer. A condition and an assignment can be avoided. The initializations are moved closer to the first use when the extent buffer is valid. There's one exception in __push_leaf_left where the token is reused. Signed-off-by:
David Sterba <dsterba@suse.com>
-
David Sterba authored
The file ctree.h serves as a header for everything and has become quite bloated. Split some helpers that are generic and create a new file that should be the catch-all for code that's not btrfs-specific. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Christophe Leroy authored
Various notifications of type "BUG kmalloc-4096 () : Redzone overwritten" have been observed recently in various parts of the kernel. After some time, it has been made a relation with the use of BTRFS filesystem and with SLUB_DEBUG turned on. [ 22.809700] BUG kmalloc-4096 (Tainted: G W ): Redzone overwritten [ 22.810286] INFO: 0xbe1a5921-0xfbfc06cd. First byte 0x0 instead of 0xcc [ 22.810866] INFO: Allocated in __load_free_space_cache+0x588/0x780 [btrfs] age=22 cpu=0 pid=224 [ 22.811193] __slab_alloc.constprop.26+0x44/0x70 [ 22.811345] kmem_cache_alloc_trace+0xf0/0x2ec [ 22.811588] __load_free_space_cache+0x588/0x780 [btrfs] [ 22.811848] load_free_space_cache+0xf4/0x1b0 [btrfs] [ 22.812090] cache_block_group+0x1d0/0x3d0 [btrfs] [ 22.812321] find_free_extent+0x680/0x12a4 [btrfs] [ 22.812549] btrfs_reserve_extent+0xec/0x220 [btrfs] [ 22.812785] btrfs_alloc_tree_block+0x178/0x5f4 [btrfs] [ 22.813032] __btrfs_cow_block+0x150/0x5d4 [btrfs] [ 22.813262] btrfs_cow_block+0x194/0x298 [btrfs] [ 22.813484] commit_cowonly_roots+0x44/0x294 [btrfs] [ 22.813718] btrfs_commit_transaction+0x63c/0xc0c [btrfs] [ 22.813973] close_ctree+0xf8/0x2a4 [btrfs] [ 22.814107] generic_shutdown_super+0x80/0x110 [ 22.814250] kill_anon_super+0x18/0x30 [ 22.814437] btrfs_kill_super+0x18/0x90 [btrfs] [ 22.814590] INFO: Freed in proc_cgroup_show+0xc0/0x248 age=41 cpu=0 pid=83 [ 22.814841] proc_cgroup_show+0xc0/0x248 [ 22.814967] proc_single_show+0x54/0x98 [ 22.815086] seq_read+0x278/0x45c [ 22.815190] __vfs_read+0x28/0x17c [ 22.815289] vfs_read+0xa8/0x14c [ 22.815381] ksys_read+0x50/0x94 [ 22.815475] ret_from_syscall+0x0/0x38 Commit 69d24804 ("btrfs: use copy_page for copying pages instead of memcpy") changed the way bitmap blocks are copied. But allthough bitmaps have the size of a page, they were allocated with kzalloc(). Most of the time, kzalloc() allocates aligned blocks of memory, so copy_page() can be used. But when some debug options like SLAB_DEBUG are activated, kzalloc() may return unaligned pointer. On powerpc, memcpy(), copy_page() and other copying functions use 'dcbz' instruction which provides an entire zeroed cacheline to avoid memory read when the intention is to overwrite a full line. Functions like memcpy() are writen to care about partial cachelines at the start and end of the destination, but copy_page() assumes it gets pages. As pages are naturally cache aligned, copy_page() doesn't care about partial lines. This means that when copy_page() is called with a misaligned pointer, a few leading bytes are zeroed. To fix it, allocate bitmaps through kmem_cache instead of using kzalloc() The cache pool is created with PAGE_SIZE alignment constraint. Reported-by:
Erhard F. <erhard_f@mailbox.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204371 Fixes: 69d24804 ("btrfs: use copy_page for copying pages instead of memcpy") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by:
Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by:
David Sterba <dsterba@suse.com> [ rename to btrfs_free_space_bitmap ] Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Correctly handle failure cases when adding an ordered extents in case of REGULAR or PREALLOC extents. Remove the BUG_ON. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Add a comment explaining why we keep the BUG also use the already read and cached value of extent ram bytes stored in 'ram_bytes'. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The extent range check right after the "out_check" label is redundant, because the only way it can trigger is if we have an inline extent. In this case it makes more sense to actually move it in the branch explictly dealing with inlines extents. What's more, the nested 'if (nocow)' can never be true because for inline extents we always do COW and there is no chance 'nocow' can be true, just remove that check. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
There is no point in checking the type of the extent again just to set the 'type' variable, when this check has already been performed before. Instead, extend the original if branch with an 'else' clause. This allows to remove one local variable and make it obvious how the code flow differs for prealloc/regular extents. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
run_delalloc_nocow contains numerous, somewhat subtle, checks when figuring out whether a particular extent should be CoW'ed or not. This patch explicitly states the assumptions those checks verify. As a result also document 2 of the more subtle checks in check_committed_ref as well. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Of the 22 (!!!) local variables declared in this function only 9 have function-wide context. Of the remaining 13, 12 are needed in the main while loop of the function and 1 is needed in a tiny if branch, only in case we have prealloc extent. This commit reduces the lifespan of every variable to its bare minimum. It also renames the 'nolock' boolean to freespace_inode to clearly indicate its purpose. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Josef Bacik authored
btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that it doesn't take into account any splitting at the levels, because truncate will never split nodes. However truncate _and_ changing will never split nodes, so rename btrfs_calc_trunc_metadata_size to btrfs_calc_metadata_size. Also btrfs_calc_trans_metadata_size is purely for inserting items, so rename this to btrfs_calc_insert_metadata_size. Making these clearer will help when I start using them differently in upcoming patches. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Josef Bacik authored
We have this weird space flushing loop inside inode.c for evict where we'll do the normal LIMIT flush, and then commit the transaction and hope we get our space. This is super janky, and in fact there's really nothing stopping us from using FLUSH_ALL except that we run delayed iputs, which means we could deadlock. So introduce a new flush state for eviction that does the normal priority flushing with all of the states that are safe for eviction. The nice side-effect of this is that we'll try harder for evictions. Previously if (for example generic/269) you had a bunch of other operations happening on the fs you could race with those reservations when committing the transaction, and eventually miss getting a reservation for the evict. With this code we'll have our ticket in place through the transaction commit, so any pinned bytes will go to our pending evictions first. Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Josef Bacik authored
This is prep work for moving all of the block group cache code into its own file. Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
David Sterba <dsterba@suse.com> [ minor comment updates ] Signed-off-by:
David Sterba <dsterba@suse.com>
-
Jia-Ju Bai authored
In insert_inline_extent(), the case that checks compressed_size > 0 and compressed_pages = NULL cannot occur, otherwise a null-pointer dereference may occur on line 215: cpage = compressed_pages[i]; To catch this incorrect case, an assertion is added. Reviewed-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It's unlikely in-band dedupe is going to land so just remove any leftovers - dedupe.h header as well as the 'dedupe' parameter to btrfs_set_extent_delalloc. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It was added in ba8b04c1 ("btrfs: extend btrfs_set_extent_delalloc and its friends to support in-band dedupe and subpage size patchset") as a preparatory patch for in-band and subapge block size patchsets. However neither of those are likely to be merged anytime soon and the code has diverged significantly from the last public post of either of those patchsets. It's unlikely either of the patchests are going to use those preparatory steps so just remove the variables. Since cow_file_range also took delalloc_end to pass it to extent_clear_unlock_delalloc remove the parameter from that function as well. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This label is only executed if compress_file_range fails to create an inline extent. So move its code in the semantically related inline extent handling branch. No functional changes. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
compress_file_range returns a void, yet uses a function parameter as a return value. Make that more idiomatic by simply returning the number of compressed extents directly. Also track such extents in more aptly named variables. No functional changes. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 17 Jul, 2019 1 commit
-
-
Qu Wenruo authored
As btrfs(5) specified: Note If nodatacow or nodatasum are enabled, compression is disabled. If NODATASUM or NODATACOW set, we should not compress the extent. Normally NODATACOW is detected properly in run_delalloc_range() so compression won't happen for NODATACOW. However for NODATASUM we don't have any check, and it can cause compressed extent without csum pretty easily, just by: mkfs.btrfs -f $dev mount $dev $mnt -o nodatasum touch $mnt/foobar mount -o remount,datasum,compress $mnt xfs_io -f -c "pwrite 0 128K" $mnt/foobar And in fact, we have a bug report about corrupted compressed extent without proper data checksum so even RAID1 can't recover the corruption. (https://bugzilla.kernel.org/show_bug.cgi?id=199707) Running compression without proper checksum could cause more damage when corruption happens, as compressed data could make the whole extent unreadable, so there is no need to allow compression for NODATACSUM. The fix will refactor the inode compression check into two parts: - inode_can_compress() As the hard requirement, checked at btrfs_run_delalloc_range(), so no compression will happen for NODATASUM inode at all. - inode_need_compress() As the soft requirement, checked at btrfs_run_delalloc_range() and compress_file_range(). Reported-by:
James Harvey <jamespharvey20@gmail.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by:
Qu Wenruo <wqu@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 04 Jul, 2019 1 commit
-
-
Josef Bacik authored
We have code for data and metadata reservations for delalloc. There's quite a bit of code here, and it's used in a lot of places so I've separated it out to it's own file. inode.c and file.c are already pretty large, and this code is complicated enough to live in its own space. Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 02 Jul, 2019 2 commits
-
-
Josef Bacik authored
We have been seeing issues in production where a cleaner script will end up unlinking a bunch of files that have pending iputs. This means they will get their final iput's run at btrfs-cleaner time and thus are not throttled, which impacts the workload. Since we are unlinking these files we can just drop the delayed iput at unlink time. We are already holding a reference to the inode so this will not be the final iput and thus is completely safe to do at this point. Doing this means we are more likely to be doing the final iput at unlink time, and thus will get the IO charged to the caller and get throttled appropriately without affecting the main workload. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Presently btrfs_map_block is used not only to do everything necessary to map a bio to the underlying allocation profile but it's also used to identify how much data could be written based on btrfs' stripe logic without actually submitting anything. This is achieved by passing NULL for 'bbio_ret' parameter. This patch refactors all callers that require just the mapping length by switching them to using btrfs_io_geometry instead of calling btrfs_map_block with a special NULL value for 'bbio_ret'. No functional change. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 01 Jul, 2019 4 commits
-
-
Johannes Thumshirn authored
btrfs_print_data_csum_error() still assumed checksums to be 32 bit in size. Make it size agnostic. Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
Currently btrfs_csum_data() relied on the crc32c() wrapper around the crypto framework for calculating the CRCs. As we have our own crypto_shash structure in the fs_info now, we can directly call into the crypto framework without going trough the wrapper. This way we can even remove the btrfs_csum_data() and btrfs_csum_final() wrappers. The module dependency on crc32c is preserved via MODULE_SOFTDEP("pre: crc32c"), which was previously provided by LIBCRC32C config option doing the same. Signed-off-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
There several functions which open code btrfs_lock_and_flush_ordered_range, just replace them with a call to the function. No functional changes. Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Liu Bo authored
This code was first introduced in 5f39d397 ("Btrfs: Create extent_buffer interface for large blocksizes") and the function was named btrfs_unlink_trans. It later got renamed to __btrfs_unlink_inode and finally commit 16cdcec7 ("btrfs: implement delayed inode items operation") changed the way inodes are deleted and obviated the need for those two members. Signed-off-by:
Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by:
David Sterba <dsterba@suse.com> [ replace changelog by Nikolay's version ] Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 28 May, 2019 1 commit
-
-
Filipe Manana authored
When replaying a log that contains a new file or directory name that needs to be added to its parent directory, we end up updating the mtime and the ctime of the parent directory to the current time after we have set their values to the correct ones (set at fsync time), efectivelly losing them. Sample reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/dir $ touch /mnt/dir/file # fsync of the directory is optional, not needed $ xfs_io -c fsync /mnt/dir $ xfs_io -c fsync /mnt/dir/file $ stat -c %Y /mnt/dir 1557856079 <power failure> $ sleep 3 $ mount /dev/sdb /mnt $ stat -c %Y /mnt/dir 1557856082 --> should have been 1557856079, the mtime is updated to the current time when replaying the log Fix this by not updating the mtime and ctime to the current time at btrfs_add_link() when we are replaying a log tree. This could be triggered by my recent fsync fuzz tester for fstests, for which an fstests patch exists titled "fstests: generic, fsync fuzz tester with fsstress". Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations") CC: stable@vger.kernel.org # 4.4+ Reviewed-by:
Nikolay Borisov <nborisov@suse.com> Signed-off-by:
Filipe Manana <fdmanana@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- 02 May, 2019 9 commits
-
-
Nikolay Borisov authored
Recent refactoring of cow_file_range_async means it's now possible to request a rather large physically contiguous memory via kmalloc. The size is dependent on the number of 512k chunks that the compressed range consists of. David reported multiple OOM messages on such large allocations. Fix it by switching to using kvmalloc. Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Irrespective of whether the compress code fell back to uncompressed or a compressed extent has to be submitted, the extent range is always locked. So factor out the common lock_extent call at the beginning of the loop. No functional changes just removes one duplicate lock_extent call. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The inode never changes so it's sufficient to dereference it and get the iotree only once, before the execution of the main loop. No functional changes, only the size of the function is decreased: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44) Function old new delta submit_compressed_extents 1240 1196 -44 Total: Before=88476, After=88432, chg -0.05% Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
All context this function needs is held within struct async_chunk. Currently we not only pass the struct but also every individual member. This is redundant, simplify it by only passing struct async_chunk and leaving it to compress_file_range to extract the values it requires. No functional changes. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The associated btrfs_work already contains a reference to the fs_info so use that instead of passing it via async_chunk. No functional changes. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
Now that we have an explicit async_chunk struct rename references to variables of this type to async_chunk. No functional changes. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This commit changes the implementation of cow_file_range_async in order to get rid of the BUG_ON in the middle of the loop. Additionally it reworks the inner loop in the hopes of making it more understandable. The idea is to make async_cow be a top-level structured, shared amongst all chunks being sent for compression. This allows to perform one memory allocation at the beginning and gracefully fail the IO if there isn't enough memory. Now, each chunk is going to be described by an async_chunk struct. It's the responsibility of the final chunk to actually free the memory. Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Nikolay Borisov <nborisov@suse.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Al Viro authored
a lot of stuff remains in ->destroy_inode() Acked-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 01 May, 2019 1 commit
-
-
Linus Torvalds authored
The 'extent_type' variable does seem to be reliably initialized, but it's _very_ non-obvious, since there's a "goto next" case that jumps over the normal initialization. That will then always trigger the "start >= extent_end" test, which will end up never falling through to the use of that variable. But the code is certainly not obvious, and the compiler warning looks reasonable. Make 'extent_type' an int, and initialize it to an invalid negative value, which seems to be the common pattern in other places. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 Apr, 2019 1 commit
-
-
Christoph Hellwig authored
We only have two callers that need the integer loop iterator, and they can easily maintain it themselves. Suggested-by:
Matthew Wilcox <willy@infradead.org> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Acked-by:
David Sterba <dsterba@suse.com> Reviewed-by:
Hannes Reinecke <hare@suse.com> Acked-by:
Coly Li <colyli@suse.de> Reviewed-by:
Matthew Wilcox <willy@infradead.org> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-