- 06 Aug, 2018 40 commits
-
Filipe Manana authored
If we end up logging an inode reference item which has the same name but a
different index from the one we have persisted, we end up failing when
replaying the log with an errno value of -EEXIST. The error comes from
btrfs_add_link(), which is called from add_inode_ref() when we are replaying
an inode reference item.

Example scenario where this happens:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ touch /mnt/foo
  $ ln /mnt/foo /mnt/bar

  $ sync

  # Rename the first hard link (foo) to a new name and rename the second
  # hard link (bar) to the old name of the first hard link (foo).
  $ mv /mnt/foo /mnt/qwerty
  $ mv /mnt/bar /mnt/foo

  # Create a new file, in the same parent directory, with the old name of
  # the second hard link (bar) and fsync this new file.
  # We do this instead of calling fsync on foo/qwerty because if we did
  # that, the fsync would result in a full transaction commit, not
  # triggering the problem.
  $ touch /mnt/bar
  $ xfs_io -c "fsync" /mnt/bar

  <power fail>

  $ mount /dev/sdb /mnt
  mount: mount /dev/sdb on /mnt failed: File exists

So fix this by checking if a conflicting inode reference exists (same name,
same parent but different index) and, if it does, removing it (along with the
associated dir index entries from the parent inode) before attempting to add
the new reference.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
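A minimal sketch of the idea behind the fix, under the assumption of made-up helper names (lookup_conflicting_ref() and drop_conflicting_ref() do not exist in btrfs; the real logic lives in the log replay path around add_inode_ref()):

  /*
   * Sketch only, not the actual btrfs code.  Before replaying the
   * reference (name, parent, index) for 'inode', drop any persisted
   * reference with the same name and parent but a different index, so
   * that the subsequent btrfs_add_link() cannot fail with -EEXIST.
   */
  static int replay_inode_ref_sketch(struct btrfs_trans_handle *trans,
                                     struct btrfs_root *root,
                                     struct btrfs_inode *dir,
                                     struct btrfs_inode *inode,
                                     const char *name, int name_len,
                                     u64 new_index)
  {
          u64 old_index;
          int ret;

          /* hypothetical: returns 1 if a ref with this name/parent exists */
          ret = lookup_conflicting_ref(root, dir, inode, name, name_len,
                                       &old_index);
          if (ret < 0)
                  return ret;

          if (ret == 1 && old_index != new_index) {
                  /* hypothetical: unlink stale ref and its dir index items */
                  ret = drop_conflicting_ref(trans, root, dir, inode,
                                             name, name_len, old_index);
                  if (ret)
                          return ret;
          }

          /* now the regular add_inode_ref()/btrfs_add_link() path can run */
          return 0;
  }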
-
Josef Bacik authored
If we're trying to make a data reservation and we have to allocate a data chunk, we could leak ret == 1, as do_chunk_alloc() returns 1 if it allocated a chunk. Since the end of the function is the success path, just return 0. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
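A sketch of the pattern being fixed, assuming simplified stand-in functions (the *_sketch() names below are not the real btrfs functions; the real code is the data reservation path that calls do_chunk_alloc()):

  /* Sketch only: *_sketch() functions stand in for the real btrfs code. */
  static int do_chunk_alloc_sketch(void)
  {
          /* returns 1 if a new chunk was allocated, 0 if not, <0 on error */
          return 1;
  }

  static int reserve_data_sketch(void)
  {
          int ret;

          ret = do_chunk_alloc_sketch();
          if (ret < 0)
                  return ret;

          /* ... finish the data reservation ... */

          /*
           * Success path: return 0 explicitly.  A plain "return ret;" here
           * would leak the 1 from the chunk allocation to callers that
           * treat any non-zero value as an error.
           */
          return 0;
  }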
-
David Sterba authored
The exported helper just calls the static one. There's no obvious reason to keep them separate, e.g. for performance, where the static one could be better optimized within the same compilation unit. There's a slight decrease in code size and stack consumption. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Lock owner and nesting level have been unused since day 1, probably copy&pasted from the extent_buffer locking scheme without much thinking. The locking of device replace is simpler and does not need any lock nesting. Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Added in 58176a96 ("Btrfs: Add per-root block accounting and sysfs entries") in 2007, the roots had names exported in sysfs. The code was commented out in 4df27c4d ("Btrfs: change how subvolumes are organized") and cleaned by 182608c8 ("btrfs: remove old unused commented out code"). Signed-off-by: David Sterba <dsterba@suse.com>
-
Adam Borowski authored
Requiring a read-write descriptor conflicts both ways with exec: you get ETXTBSY whenever you try to defrag a program that's currently being run, or intermittent exec failures on a live system being defragged. As defrag doesn't change the file's contents in any way, there's no reason to consider it a rw operation. Thus, let's check only whether the file could have been opened rw. Such access control is still needed as currently defrag can use extra disk space and might trigger bugs. EINVAL is meant for an invalid request; here the request is fine, it's merely the user who has insufficient privileges. Thus, the EPERM return value reflects the error better -- as discussed in the identical case for dedupe. According to codesearch.debian.net, no userspace program distinguishes these values beyond strerror(). Signed-off-by: Adam Borowski <kilobyte@angband.pl> Reviewed-by: David Sterba <dsterba@suse.com> [ fold the EPERM patch from Adam ] Signed-off-by: David Sterba <dsterba@suse.com>
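A rough sketch of the intended check (hedged: not necessarily the exact code in the defrag ioctl handler, just the shape of it): instead of requiring a writable descriptor, verify that the caller could have opened the file read-write, and return -EPERM otherwise.

  /*
   * Sketch: defrag does not modify file contents, so don't demand a
   * read-write descriptor.  Only check that the caller has (or could
   * have) write permission; privileged callers are always allowed.
   */
  if (!capable(CAP_SYS_ADMIN) &&
      inode_permission(file_inode(file), MAY_WRITE)) {
          ret = -EPERM;   /* valid request, insufficient privileges */
          goto out;
  }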
-
Josef Bacik authored
We recently ran into the following deadlock involving btrfs_write_inode():

  [ +0.005066]  __schedule+0x38e/0x8c0
  [ +0.007144]  schedule+0x36/0x80
  [ +0.006447]  bit_wait+0x11/0x60
  [ +0.006446]  __wait_on_bit+0xbe/0x110
  [ +0.007487]  ? bit_wait_io+0x60/0x60
  [ +0.007319]  __inode_wait_for_writeback+0x96/0xc0
  [ +0.009568]  ? autoremove_wake_function+0x40/0x40
  [ +0.009565]  inode_wait_for_writeback+0x21/0x30
  [ +0.009224]  evict+0xb0/0x190
  [ +0.006099]  iput+0x1a8/0x210
  [ +0.006103]  btrfs_run_delayed_iputs+0x73/0xc0
  [ +0.009047]  btrfs_commit_transaction+0x799/0x8c0
  [ +0.009567]  btrfs_write_inode+0x81/0xb0
  [ +0.008008]  __writeback_single_inode+0x267/0x320
  [ +0.009569]  writeback_sb_inodes+0x25b/0x4e0
  [ +0.008702]  wb_writeback+0x102/0x2d0
  [ +0.007487]  wb_workfn+0xa4/0x310
  [ +0.006794]  ? wb_workfn+0xa4/0x310
  [ +0.007143]  process_one_work+0x150/0x410
  [ +0.008179]  worker_thread+0x6d/0x520
  [ +0.007490]  kthread+0x12c/0x160
  [ +0.006620]  ? put_pwq_unlocked+0x80/0x80
  [ +0.008185]  ? kthread_park+0xa0/0xa0
  [ +0.007484]  ? do_syscall_64+0x53/0x150
  [ +0.007837]  ret_from_fork+0x29/0x40

Writeback calls:

  btrfs_write_inode
    btrfs_commit_transaction
      btrfs_run_delayed_iputs

If iput() is called on that same inode, evict() will wait for writeback forever.

btrfs_write_inode() was originally added way back in 4730a4bc ("btrfs_dirty_inode") to support O_SYNC writes. However, ->write_inode() hasn't been used for O_SYNC since 148f948b ("vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode"), so btrfs_write_inode() is actually unnecessary (and leads to a bunch of unnecessary commits). Get rid of it, which also gets rid of the deadlock.

CC: stable@vger.kernel.org # 3.2+
Signed-off-by: Josef Bacik <jbacik@fb.com>
[Omar: new commit message]
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It can be referenced from the passed transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
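The pattern behind this series of cleanups, as a sketch (assumption: some_replace_helper() is a made-up name for illustration; the point is that struct btrfs_trans_handle carries an fs_info pointer):

  /* Sketch: drop the explicit fs_info argument ... */
  static int some_replace_helper(struct btrfs_trans_handle *trans)
  {
          /* ... and fetch it from the transaction handle instead */
          struct btrfs_fs_info *fs_info = trans->fs_info;

          /* use fs_info exactly as before */
          return 0;
  }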
-
Nikolay Borisov authored
It can be referenced from the passed transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
This function is always passed a well-formed tgtdevice so the fs_info can be referenced from there. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It can be referenced from the passed 'device' argument which is always a well-formed device. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It can be referenced from the passed transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It can be referenced from the passed srcdev argument. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
It can be referenced from the passed transaction handle. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
In find_free_extent(), under the checks: label, we have the following code:

  search_start = ALIGN(offset, fs_info->stripesize);

  /* move on to the next group */
  if (search_start + num_bytes >
      block_group->key.objectid + block_group->key.offset) {
          btrfs_add_free_space(block_group, offset, num_bytes);
          goto loop;
  }

  if (offset < search_start)
          btrfs_add_free_space(block_group, offset,
                               search_start - offset);
  BUG_ON(offset > search_start);

However, ALIGN() rounds up, thus @search_start >= @offset and that BUG_ON() can never be triggered.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
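For reference, ALIGN() in the kernel rounds its first argument up to the next multiple of a power-of-two alignment, so the result can never be smaller than the input. A simplified equivalent (named ALIGN_SKETCH to make clear it is not the kernel macro itself):

  /* simplified sketch of round-up alignment for a power-of-two 'a' */
  #define ALIGN_SKETCH(x, a)      (((x) + (a) - 1) & ~((a) - 1))

  /* e.g. ALIGN_SKETCH(5, 4) == 8 and ALIGN_SKETCH(8, 4) == 8, never < x */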
-
Filipe Manana authored
At send.c:full_send_tree() we were setting the 'key' variable in the loop but never using it afterwards. We were also using two btrfs_key variables, one to store the initial key for the search and one for the key found in every iteration of the loop. So remove the useless key assignment and use the same btrfs_key variable to store both the initial search key and the key found in each iteration. This was introduced in the initial send commit 31db9f7c ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive") but the assigned value was never used. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The data and metadata callback implementation both use the same function. We can remove the call indirection and intermediate helper completely. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The data and metadata callback implementation both use the same function. We can remove the call indirection completely. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
All implementations of the callback are trivial and do the same and there's only one user. Merge everything together. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The end_io callbacks passed to btrfs_wq_submit_bio (btrfs_submit_bio_done and btree_submit_bio_done) are effectively the same code, so there's no point in the indirection. Export btrfs_submit_bio_done and call it directly. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
After splitting the start and end hooks in a758781d ("btrfs: separate types for submit_bio_start and submit_bio_done"), some of the function arguments were dropped but not removed from the structure. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
Introduced by c6100a4b ("Btrfs: replace tree->mapping with tree->private_data") to be used in run_one_async_done, but it became unused after 736cd52e ("Btrfs: remove nr_async_submits and async_submit_draining"). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Gu Jinxiang authored
Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199839: an image with an invalid chunk type does not return an error. Add a chunk type check in btrfs_check_chunk_valid to detect wrong type combinations.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839
Reported-by: Xu Wen <wen.xu@gatech.edu>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
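The shape of such a validation, as a hedged sketch (these are not necessarily the exact conditions used in btrfs_check_chunk_valid(); the mask macros come from the btrfs on-disk format headers): the type must carry at least one of the DATA/METADATA/SYSTEM bits, no unknown bits, and at most one RAID/DUP profile bit.

  /* Sketch of the kind of chunk type validation meant here. */
  if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
          btrfs_err(fs_info, "chunk type 0x%llx has no type bit set", type);
          return -EIO;
  }
  if (type & ~(BTRFS_BLOCK_GROUP_TYPE_MASK | BTRFS_BLOCK_GROUP_PROFILE_MASK)) {
          btrfs_err(fs_info, "chunk type 0x%llx has unknown bits set", type);
          return -EIO;
  }
  if (hweight64(type & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1) {
          btrfs_err(fs_info, "chunk type 0x%llx mixes profiles", type);
          return -EIO;
  }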
-
Nikolay Borisov authored
EXTENT_BUFFER_DUMMY is an awful name for this flag. Buffers which have it set are not in any way dummy. Rather, they are private in the sense that they are not mapped and not linked to the global buffer tree. This flag has subtle implications for the way free_extent_buffer works, for example, and it controls whether page->mapping->private_lock is held during extent_buffer release. Pages of an unmapped buffer cannot be under IO, nor can they be written by a 3rd party, so taking the lock is unnecessary. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ EXTENT_BUFFER_UNMAPPED, update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>
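A sketch of the distinction the flag enables during extent buffer release (hedged: simplified, not verbatim from extent_io.c):

  /* Sketch: only mapped buffers need page->mapping->private_lock. */
  bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);

  if (mapped)
          spin_lock(&page->mapping->private_lock);
  /* ... detach the page from the extent buffer ... */
  if (mapped)
          spin_unlock(&page->mapping->private_lock);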
-
Nikolay Borisov authored
Remove stale comment since there is no longer an eb->eb_lock and document the locking expectation with a lockdep_assert_held statement. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
The function used to release one page (and always the first one), but not anymore since a50924e3 ("btrfs: drop constant param from btrfs_release_extent_buffer_page"). Update the name and comment. Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The purpose of the function is to free all the pages comprising an extent buffer. This can be achieved with a simple for loop rather than the slightly more involved 'do {} while' construct. So rewrite the loop using a 'for' construct. Additionally we can never have an extent_buffer that has 0 pages so remove the check for index == 0. No functional changes. The reversed order used to have a meaning in the past where the first page served as a blocking point for several callers. See eg 4f2de97a ("Btrfs: set page->private to the eb"). Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
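A sketch of the resulting loop shape (hedged: detach_page_sketch() is a made-up placeholder for the actual detach/put steps, and num_extent_pages() is used loosely, as its signature has changed over time):

  int i;
  int num_pages = num_extent_pages(eb);

  for (i = 0; i < num_pages; i++) {
          struct page *page = eb->pages[i];

          if (!page)
                  continue;
          /* detach the page from the eb and drop the reference we hold */
          detach_page_sketch(eb, page);
          put_page(page);
  }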
-
Nikolay Borisov authored
Commit eb14ab8e ("Btrfs: fix page->private races") fixed a genuine race between extent buffer initialisation and btree_releasepage. Unfortunately, as the code evolved, the comments weren't updated, which made them slightly wrong, and they weren't very clear in the first place. Fix this by (hopefully) rewording them in a more approachable manner. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Nikolay Borisov authored
The current version of the page unlocking code was added in 727011e0 ("Btrfs: allow metadata blocks larger than the page size"), but even in that commit this particular flag was never used per se. In fact, btrfs only uses PageChecked for data pages, to identify pages which have been dirtied but don't have the ORDERED bit set. For more information see 247e743c ("Btrfs: Use async helpers to deal with pages that have been improperly dirtied"). However, this doesn't apply to extent buffer pages. The important bit here is that the pages are unlocked AFTER the extent buffer has been properly recorded in the radix tree, to avoid races with btree_releasepage. Let's exploit this fact and simplify the page unlocking sequence by unlocking the pages in order and removing the redundant PageChecked flag setting/clearing. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
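A sketch of the simplified sequence (hedged, not verbatim): once the buffer is recorded in the radix tree, the pages can simply be unlocked in order, with no PageChecked manipulation.

  /* Sketch: eb has already been inserted into the radix tree here. */
  for (i = 0; i < num_pages; i++)
          unlock_page(eb->pages[i]);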
-
Qu Wenruo authored
Remove the remaining code that misused the page cache pages during device replace and could cause data corruption for compressed nodatasum extents. Such files do not normally exist but there's a bug that allows this combination and the corruption was exposed by device replace fixup code. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
David Sterba authored
There are many places that open code the duplicity factor of the block group profiles, so create a common helper. This can be easily extended for more copies. Signed-off-by: David Sterba <dsterba@suse.com>
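A sketch of what such a helper can look like (hedged: the name and exact profile coverage of the real helper in btrfs may differ):

  /* Sketch: how many copies of the data a block group profile implies. */
  static int bg_type_to_factor_sketch(u64 flags)
  {
          if (flags & (BTRFS_BLOCK_GROUP_DUP |
                       BTRFS_BLOCK_GROUP_RAID1 |
                       BTRFS_BLOCK_GROUP_RAID10))
                  return 2;
          return 1;       /* SINGLE, RAID0, RAID5, RAID6 */
  }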
-
Anand Jain authored
We have already assigned %fs_info->fs_devices to %fs_devices and it's not modified, so just use it for the mutex_lock(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
The fs_info can be fetched from the transaction handle directly. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
It can be fetched from the transaction handle. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Lu Fengqi authored
It can be fetched from the transaction handle. In addition, remove the WARN_ON(trans == NULL) because it's not possible to hit this condition. Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
-