- 12 Nov, 2013 40 commits
-
-
Wang Shilong authored
Originally, we introduced scrub_super_lock to synchronize tree log code with scrubbing super. However we can replace scrub_super_lock with device_list_mutex, because writing super will hold this mutex, this will reduce an extra lock holding when writing supers in sync log code. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Wang Shilong authored
We define a 'int' to get extent's generation by mistake,fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Miao Xie authored
After running space balance on a new fs, the fs check program outputed the following warning message: free space inode generation (0) did not match free space cache generation (20) Steps to reproduce: # mkfs.btrfs -f <dev> # mount <dev> <mnt> # btrfs balance start <mnt> # umount <mnt> # btrfs check <dev> It was because there was no data space after the space balance, and the free space write out task didn't try to allocate a new data chunk for the free space inode when doing the reservation. So the data space reservation failed, and in order to tell the free space loader that this free space inode could not be trusted, the generation of the free space inode wasn't updated. Then the check program found this problem and outputed the above message. But in fact, it is safe that we try to allocate a new data chunk when we find the data space is not enough. The patch fixes the above problem by this way. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
I noticed with my horrible snapshot excercisor that we were taking forever to relocate the larger the file system got. This appeared to be because we were committing the transaction _constantly_. There were a few places where we do braindead things with metadata reservation, like start a transaction and then try to refill the block rsv, which not only keeps us from committing a transaction during the enospc stuff, but keeps us from doing some of the harder flushing work which will make us more likely to need to commit the transaction. We also were checking the block rsv and committing the transaction if the block rsv was below a certain threshold, but we were doing this in a place where we don't actually keep anything in the block rsv so this was always ending up false so we always committed the transaction in this case. I tested this to make sure it didn't break anything, but it takes about 10 hours to get the box to this state so I don't know how much of an impact it will really make. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
When using delalloc workers in a non-waiting way (like for enospc handling) we can end up not actually waiting for the dirty pages to be started if we have compression. We need to add an extra filemap flush to make sure any async extents that have started are actually moved along before returning. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
A user reported a list corruption warning from btrfs_remove_ordered_extent, it is because we aren't taking the ordered_root_lock when we remove the inode from the ordered operations list. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
This is just the write path, the only reason we start a transaction is so we can check cross references, we don't make any actual changes, so there is no reason to abort the transaction if we fail. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
We can just return an error and we'll bail out properly. We still want to catch this case to make sure we don't have a bug somewhere, so just warn if this pops up. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
I noticed that if the free space cache has an error writing out it's data it won't actually error out, it will just carry on. This is because it doesn't check the return value of btrfs_wait_ordered_range, which didn't actually return anything. So fix this in order to keep us from making free space cache look valid when it really isnt. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
Apparently we don't actually close the files until we return to userspace, so stop using vfs_read in send. This is actually better for us since we can avoid all the extra logic of holding the file we're sending open and making sure to clean it up. This will fix people who have been hitting too many files open errors when trying to send. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Stefan Behrens authored
In mixed-mode, when a data-block was later reused for metadata, a warning was printed. This condition is now filtered out and the warning is eliminated in this case. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Stefan Behrens authored
Yet another cleanup patch broke code for which no xfstest exists. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
Instead of doing another extent tree search if the first search failed to find a metadata item, check if the previous item in the leaf is an extent item and use it if it is, otherwise do the second tree search for an extent item. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Jeff Mahoney authored
When debugging ENOSPC issues, it's nice to be able to see which reservations failed as well as the ones which succeeded. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Zach Brown authored
fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Zach Brown authored
move_pages() has an inefficient backwards byte copy of regions of two different pages. They're different pages so the regions won't overlap and it could use memcpy(). At that point, though, move_pages() would be a slightly dimmer re-implementation of copy_pages() that lacked the test for overlapping page regions. So remove move_pages() and just call copy_pages(). Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Zach Brown authored
Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
When a directory has a default ACL and a subdirectory is created under that directory, btrfs_init_acl() is called when the subdirectory's inode is created to initialize the inode's ACL (inherited from the parent directory) but it was clearing the ACL from the inode after setting it if posix_acl_create() returned success, instead of clearing it only if it returned an error. To reproduce this issue: $ mkfs.btrfs -f /dev/loop0 $ mount /dev/loop0 /mnt $ mkdir /mnt/acl $ setfacl -d --set u::rwx,g::rwx,o::- /mnt/acl $ getfacl /mnt/acl user::rwx group::rwx other::r-x default:user::rwx default:group::rwx default:other::--- $ mkdir /mnt/acl/dir1 $ getfacl /mnt/acl/dir1 user::rwx group::rwx other::--- After unmounting and mounting again the filesystem, fgetacl returned the expected ACL: $ umount /mnt/acl $ mount /dev/loop0 /mnt $ getfacl /mnt/acl/dir1 user::rwx group::rwx other::--- default:user::rwx default:group::rwx default:other::--- Meaning that the underlying xattr was persisted. Reported-by: Giuseppe Fierro <giuseppe@fierro.org> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Stefan Behrens authored
Due to an off-by-one error, it is possible to reproduce a bug when the inode cache is used. The same inode number is assigned twice, the second time this leads to an EEXIST in btrfs_insert_empty_items(). The issue can happen when a file is removed right after a subvolume is created and then a new inode number is created before the inodes in free_inode_pinned are processed. unlink() calls btrfs_return_ino() which calls start_caching() in this case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by searching for the highest inode (which already cannot find the unlinked one anymore in btrfs_find_free_objectid()). So if this unlinked inode's number is equal to the highest_ino + 1 (or >= this value instead of > this value which was the off-by-one error), we mustn't add the inode number to free_ino_pinned (caching_thread() does it right). In this case we need to try directly to add the number to the inode_cache which will fail in this case. When this inode number is allocated while it is still in free_ino_pinned, it is allocated and still added to the free inode cache when the pinned inodes are processed, thus one of the following inode number allocations will get an inode that is already in use and fail with EEXIST in btrfs_insert_empty_items(). One example which was created with the reproducer below: Create a snapshot, work in the newly created snapshot for the rest. In unlink(inode 34284) call btrfs_return_ino() which calls start_caching(). start_caching() calls add_free_space [34284, 18446744073709517077]. In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong. mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284. btrfs_unpin_free_ino calls add_free_space [34284, 1]. mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284. EEXIST when the new inode is inserted. One possible reproducer is this one: #!/bin/sh # preparation TEST_DEV=/dev/sdc1 TEST_MNT=/mnt umount ${TEST_MNT} 2>/dev/null || true mkfs.btrfs -f ${TEST_DEV} mount ${TEST_DEV} ${TEST_MNT} -o \ rw,relatime,compress=lzo,space_cache,inode_cache btrfs subv create ${TEST_MNT}/s1 for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2 FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'` rm ${TEST_MNT}/s2/$FILENAME touch ${TEST_MNT}/s2/$FILENAME # the following steps can be repeated to reproduce the issue again and again [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3 btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3 rm ${TEST_MNT}/s3/$FILENAME touch ${TEST_MNT}/s3/$FILENAME ls -alFi ${TEST_MNT}/s?/$FILENAME touch ${TEST_MNT}/s3/_1 || logger FAILED ls -alFi ${TEST_MNT}/s?/_1 touch ${TEST_MNT}/s3/_2 || logger FAILED ls -alFi ${TEST_MNT}/s?/_2 touch ${TEST_MNT}/s3/__1 || logger FAILED ls -alFi ${TEST_MNT}/s?/__1 touch ${TEST_MNT}/s3/__2 || logger FAILED ls -alFi ${TEST_MNT}/s?/__2 # if the above is not enough, add the following loop: for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done # one of the touch(1) calls in s3 fail due to EEXIST because the inode is # already in use that btrfs_find_ino_for_alloc() returns. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
If we decrement the key type, we must reset its offset to the largest possible offset (u64)-1. If we decrement the key's objectid, then we must reset the key's type and offset to their largest possible values, (u8)-1 and (u64)-1 respectively. Not doing so can make us miss an items in the tree. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
Avoid repeated tree searches by processing all inode ref items in a leaf at once instead of processing one at a time, followed by a path release and a tree search for a key with a decremented offset. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Geyslan G. Bem authored
Use memdup_user rather than duplicating its implementation This is a little bit restricted to reduce false positives The semantic patch that makes this report is available in scripts/coccinelle/api/memdup_user.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
chandan authored
Comparison of an inode's last modified transaction with the last committed transaction is incorrect. Fix it. Signed-off-by: chandan <chandan@linux.vnet.ibm.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
If the path allocation failed, we would return without decrementing the reference count in the delayed node we got before, resulting in a leak. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Stefan Behrens authored
If the user remounts the filesystem read-only while the uuid-tree scan and rebuild task is still running (this happens once after the filesystem was mounted with an old kernel, or when forced with the mount options), the remount should wait on the tasks completion before setting the filesystem read-only. Otherwise the background task continues to write to the filesystem which is apparently not what users expect. The reproducer: TEST_DEV=/dev/sdzzzzz1 TEST_MNT=/mnt mkfs.btrfs -f $TEST_DEV mount $TEST_DEV $TEST_MNT for i in `seq 50000`; do btrfs subvolume create ${TEST_MNT}/$i; done umount $TEST_MNT mount $TEST_DEV $TEST_MNT -o rescan_uuid_tree sleep 1 ps -elf | fgrep '[btrfs-uuid]' | grep -v grep mount $TEST_DEV $TEST_MNT -o ro,remount ps -elf | fgrep '[btrfs-uuid]' | grep -v grep sleep 1 umount $TEST_MNT Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Stefan Behrens authored
Device stats are only initialized (read from tree items) on mount. Trying to read device stats after adding or replacing new devices will return errors. btrfs_init_new_device() and btrfs_init_dev_replace_tgtdev() are the two functions that allocate and initialize new btrfs_device structures after a filesystem is mounted. They set the device stats to zero by using kzalloc() which is correct for new devices. The only missing thing was to declare these stats as being valid (device->dev_stats_valid = 1) and this patch adds this missing code. This is the reproducer: TEST_DEV1=/dev/sdzzzzz1 TEST_DEV2=/dev/sdzzzzz2 TEST_DEV3=/dev/sdzzzzz3 TEST_MNT=/mnt mkfs.btrfs $TEST_DEV1 mount $TEST_DEV1 $TEST_MNT btrfs device add $TEST_DEV2 $TEST_MNT btrfs device stat $TEST_MNT btrfs replace start -B $TEST_DEV2 $TEST_DEV3 $TEST_MNT btrfs device stat $TEST_MNT umount $TEST_MNT Reported-by: Ondrej Kunc <kunc88@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Liu Bo authored
When we fail to add a reference after a non-inline insertion by some reasons, eg. ENOSPC, we'll abort the transaction, but we don't return this error to the caller who has to walk around again to find something wrong, that's unnecessary. Also fixup other error paths to keep it simple. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Ilya Dryomov authored
For both balance and replace, cancelling involves changing the on-disk state and committing a transaction, which is not a good thing to do on read-only filesystems. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Ilya Dryomov authored
struct btrfs_ioctl_dev_replace_args memory is leaked if replace is requested on a read-only filesystem. Fix it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Ilya Dryomov authored
On mount failures, __btrfs_close_devices can be called well before dev-replace state is read and ->is_tgtdev_for_dev_replace is set. This leads to a bogus decrement of ->rw_devices and sets off a WARN_ON in __btrfs_close_devices if replace target device happens to be on the lists and we fail early in the mount sequence. Fix this by checking the devid instead of ->is_tgtdev_for_dev_replace before the decrement: for replace targets devid is always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Geyslan G. Bem authored
In add_inode_ref() function: Initializes local pointers. Reduces the logical condition with the __add_inode_ref() return value by using only one 'goto out'. Centralizes the exiting, ensuring the freeing of all used memory. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Liu Bo authored
After commit de78b51a (btrfs: remove cache only arguments from defrag path), @blockptr is no more used. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Liu Bo authored
@is_extent is no more needed since we don't defrag extent root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Filipe David Borba Manana authored
The btrfs_insert_empty_item() function doesn't modify its key argument. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Chandra Seetharaman authored
alloc_extent_buffer() uses radix_tree_lookup() when radix_tree_insert() fails with EEXIST. That part of the code is very similar to the code in find_extent_buffer(). This patch replaces radix_tree_lookup() and surrounding code in alloc_extent_buffer() with find_extent_buffer(). Note that radix_tree_lookup() does not need to be protected by tree->buffer_lock. It is protected by eb->refs. While at it, this patch - changes the other usage of radix_tree_lookup() in alloc_extent_buffer() with find_extent_buffer() to reduce redundancy. - removes the unused argument 'len' to find_extent_buffer(). Signed-Off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
Whoever wrote this was braindead. Also it doesn't work right if you have VACANCY's since we assumed you would only have that at the end of the file, which won't be the case in the near future. I tested this with generic/285 and generic/286 as well as the btrfs tests that use fssum since it uses seek_hole/seek_data to verify things are ok. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
I was hitting weird issues when trying to remove hole extents and it turned out it was because I was sending non-aligned offsets down to btrfs_lookup_csums_range. So add an assert for this in case somebody trips over this in the future. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
I added an assert to make sure we were looking up aligned offsets for csums and I tripped it when running xfstests. This is because log_one_extent was checking if block_start == 0 for a hole instead of EXTENT_MAP_HOLE. This worked out fine in practice it seems, but it adds a lot of extra work that is uneeded. With this fix I'm no longer tripping my assert. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
Btrfs_get_extent was not handling this case properly, add a test to make sure we don't regress. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Josef Bacik authored
While trying to kill our hole extents I noticed I was seeing problems where we seek into a file and then start writing and then try to fiemap that file later. This is because we search for offset 0, don't find anything and so back up one slot, which puts us at the inode ref or something like that, which means we goto not_found and create an extent map for our entire search area. This isn't quite what we want, we want to move forward one slot and see if there is an extent there so we can limit our hole extent. This patch fixes this problem, I will add a testcase for this as well. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-