- 26 May, 2012 8 commits
-
-
Jan Schmidt authored
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
The tree modification log needs two ways to create dummy extent buffers: once by allocating a fresh one (to rebuild an old root) and once by cloning an existing one (to make private rewind modifications to it). Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed the parameter for_cow = 1. In fact, these two functions should never mark their tree modification operations as for_cow, because they can change the number of blocks referenced by a tree. Hence, we remove the extra for_cow parameter from these functions and make them pass a zero down. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
Before this patch we called find_all_leafs for a data extent, then called find_all_roots and then looked into the extent to grab the information we were seeking. This was done without holding the leaves locked to avoid deadlocks. However, this can obviously race with concurrent tree modifications. Instead, we now look into the extent while we're holding the lock during find_all_leafs and store this information together with the leaf list. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
The key we store with a tree block backref is only a hint. It is set when the ref is created and can remain correct for a long time. As the tree is rebalanced, however, eventually the key no longer points to the correct destination. With this patch, we change find_parent_nodes to no longer add keys unless it knows for sure they're correct (e.g. because they're for an extent data backref). Then, when we later encounter a backref with no parent and no key set, we grab the block and take the first key from the block itself. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
That one has been around since the addition of backref.c. Due to the way we calculate our slot numbers, after adding inline refs we're missing one keyed ref unless it's located at the beginning of a new leaf. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
ulist_next gets the pointer to the previously returned element to find the next element from there. However, when we call ulist_add while iteration with ulist_next is in progress (ulist explicitly supports this), we can realloc the ulist internal memory, which makes the pointer to the previous element useless. Instead, we now use an iterator parameter that's independent from the internal pointers. Reported-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
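The underlying pitfall is general: an iterator that hands back raw pointers into a buffer that may be reallocated mid-iteration. A minimal user-space sketch of the idea (illustrative names only, not the btrfs ulist API) shows how an index-based iterator survives a realloc where a saved element pointer would not:

    #include <stdio.h>
    #include <stdlib.h>

    struct list {
        long *elems;
        size_t len, cap;
    };

    struct list_iter {      /* index-based iterator: survives a realloc */
        size_t pos;
    };

    static void list_add(struct list *l, long v)
    {
        if (l->len == l->cap) {
            size_t ncap = l->cap ? l->cap * 2 : 4;
            long *tmp = realloc(l->elems, ncap * sizeof(*tmp));
            if (!tmp)
                abort();
            l->elems = tmp;   /* the array may have moved: any saved element
                                 pointer from before this point is now stale */
            l->cap = ncap;
        }
        l->elems[l->len++] = v;
    }

    static long *list_next(struct list *l, struct list_iter *it)
    {
        return it->pos < l->len ? &l->elems[it->pos++] : NULL;
    }

    int main(void)
    {
        struct list l = { 0 };
        struct list_iter it = { 0 };
        long *e;

        list_add(&l, 1);
        list_add(&l, 2);
        while ((e = list_next(&l, &it)) != NULL) {
            printf("%ld\n", *e);
            if (*e == 1)
                list_add(&l, 3);   /* adding mid-iteration is safe: the iterator
                                      holds an index, not a pointer */
        }
        free(l.elems);
        return 0;
    }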
-
- 06 May, 2012 1 commit
-
-
Chris Mason authored
verify_parent_transid needs to lock the extent range to make sure no IO is underway, and so it can safely clear the uptodate bits if our checks fail. But, a few callers are using it with spinlocks held. Most of the time, the generation numbers are going to match, and we don't want to switch to a blocking lock just for the error case. This adds an atomic flag to verify_parent_transid, and changes it to return EAGAIN if it needs to block to properly verify things. Signed-off-by: Chris Mason <chris.mason@oracle.com>
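The pattern here is a caller-supplied "don't block" hint: if the slow path would have to block, bail out with EAGAIN and let the caller retry from a context where sleeping is allowed. A rough user-space analogue (pthread names are purely illustrative; this is not the btrfs code):

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t range_lock = PTHREAD_MUTEX_INITIALIZER;

    /*
     * 'atomic' means the caller must not sleep: use trylock and report
     * -EAGAIN instead of blocking, so the caller can retry later from a
     * context where blocking is fine.
     */
    static int verify_thing(int expected, int actual, int atomic)
    {
        if (expected == actual)
            return 0;                        /* common case: no locking needed */

        if (atomic) {
            if (pthread_mutex_trylock(&range_lock) != 0)
                return -EAGAIN;              /* would block: punt to the caller */
        } else {
            pthread_mutex_lock(&range_lock);
        }

        /* ... clear the cached "uptodate" state under the lock ... */

        pthread_mutex_unlock(&range_lock);
        return -EIO;
    }

    int main(void)
    {
        int ret;

        pthread_mutex_lock(&range_lock);     /* simulate contention */
        ret = verify_thing(1, 2, 1);         /* atomic caller: gets -EAGAIN */
        pthread_mutex_unlock(&range_lock);

        if (ret == -EAGAIN)
            ret = verify_thing(1, 2, 0);     /* retry where blocking is allowed */
        printf("ret=%d\n", ret);
        return 0;
    }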
-
- 04 May, 2012 4 commits
-
-
Stefan Behrens authored
Fix a crash in bio_add_page that occurs when scrub tries to repair an I/O or checksum error and one of the devices containing the mirror is missing: the bdev is a NULL pointer for missing devices. Reported-by: Marco L. Crociani <marco.crociani@gmail.com> Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Alexander Block authored
Fix the size members of btrfs_ioctl_ino_path_args and btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used __u64 and the kernel headers used __u32 before. Signed-off-by: Alexander Block <ablock84@googlemail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
If we happen to alloc an extent buffer and then alloc a page and notice that page is already attached to an extent buffer, we will only unlock it and free our existing eb. Any pages currently attached to that eb will be properly freed, but we don't do the page_cache_release() on the page where we noticed the other extent buffer, which can cause us to leak pages and, I hope, is the cause of the weird issues we've been seeing in this area. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
add_root_to_dirty_list happens once at the very beginning of the transaction, but it is still racy. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 27 Apr, 2012 7 commits
-
-
Chris Mason authored
We're spending huge amounts of time on lock contention during end_io processing because we unconditionally assume we are overwriting an existing extent in the file for each IO. This checks to see if we are outside i_size, and if so, it uses a less expensive readonly search of the btree to look for existing extents. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Chris Mason authored
Btrfs has an optimization where it will preallocate dentries during readdir to fill in enough information to open the inode without an extra lookup. But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and that leads to deadlocks because our readdir code has tree locks held. For now, disable this optimization. We'll fix the gfp mask in the next merge window. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Daniel J Blueman authored
Fix out-of-space checking, addressing a warning and potential resource leak when resizing the filesystem down while allocating blocks. Signed-off-by: Daniel J Blueman <daniel@quora.org> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Stefan Behrens authored
may_commit_transaction() calls
    spin_lock(&space_info->lock);
    spin_lock(&delayed_rsv->lock);
and update_global_block_rsv() calls
    spin_lock(&block_rsv->lock);
    spin_lock(&sinfo->lock);
Lockdep complains about this at run time. Everywhere except in update_global_block_rsv(), the space_info lock is the outer lock, therefore the locking order in update_global_block_rsv() is changed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
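The fix follows the standard rule for lock-order inversions: pick one global order for the two locks and take them in that order on every path. A minimal sketch of the rule (generic names, not the btrfs structures):

    #include <pthread.h>

    static pthread_mutex_t space_info_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t block_rsv_lock  = PTHREAD_MUTEX_INITIALIZER;

    /* Rule: space_info_lock is always the outer lock. */

    static void path_a(void)      /* the may_commit_transaction()-style path */
    {
        pthread_mutex_lock(&space_info_lock);
        pthread_mutex_lock(&block_rsv_lock);
        /* ... */
        pthread_mutex_unlock(&block_rsv_lock);
        pthread_mutex_unlock(&space_info_lock);
    }

    static void path_b(void)      /* the path that used to take them reversed */
    {
        pthread_mutex_lock(&space_info_lock);   /* outer lock first, as in path_a */
        pthread_mutex_lock(&block_rsv_lock);
        /* ... */
        pthread_mutex_unlock(&block_rsv_lock);
        pthread_mutex_unlock(&space_info_lock);
    }

    int main(void)
    {
        path_a();
        path_b();
        return 0;
    }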
-
Daniel J Blueman authored
I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add correct locking to address this. Signed-off-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Jan Schmidt authored
btrfs_map_block sets mirror_num, so that the repair code knows eventually which device gave us the read error. For RAID10, mirror_num must be 1 or 2. Before this fix mirror_num was incorrectly related to our stripe index. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Josef Bacik authored
btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and start writing them out, but it doesn't splice the list or anything, so as long as somebody is doing work on the box you could end up in this section _forever_. So just remove it; it's not needed, since sync will start writeback on all inodes anyway. All we need to do is wait for ordered extents, and then we can commit the transaction. In my horrible torture test sync goes from taking 4 minutes to about 1.5 minutes. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 18 Apr, 2012 19 commits
-
-
Stefan Behrens authored
The bitfield member mount_opt was one bit too small to hold the mount option that enables including data extents in the integrity checker. Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR option was added (git rebase merges silently, so the increase in the size of the bitfield member was lost), the bit limit was removed entirely. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
-
Stefan Behrens authored
Each CRC or header error was counted twice; this is now fixed. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
-
Stefan Behrens authored
When a filesystem is mounted with the degraded option, it is possible that some of the devices are not there. btrfs_ioctl_dev_info() crashes in this case because the device name is a NULL pointer. This ioctl was only used for scrub. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
-
Arne Jansen authored
It is basically a good thing if we are interruptible when waiting for free space, but the generality of the current implementation makes system calls interruptible that are not documented as such. For example, git can't handle an interrupted unlink(), leading to corrupt repos under space pressure. Instead, we raise the bar so the wait is only interruptible by SIGKILL. Thanks to David Sterba for suggesting this. Signed-off-by: Arne Jansen <sensille@gmx.net>
-
Dan Carpenter authored
The caller expects this function to return with the lock held and releases it immediately on error. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
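The contract in question is easy to get wrong: a function documented to return with a lock held must hold it on every return path, including errors, because the caller unconditionally unlocks. A small sketch of that contract (generic names, purely illustrative):

    #include <errno.h>
    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Contract: returns with 'lock' held, on success and on failure alike. */
    static int prepare_locked(int simulate_failure)
    {
        pthread_mutex_lock(&lock);
        if (simulate_failure)
            return -ENOMEM;       /* do NOT unlock here: the caller will */
        return 0;
    }

    static void caller(void)
    {
        int ret = prepare_locked(1);

        if (ret) {
            pthread_mutex_unlock(&lock);   /* released immediately on error */
            return;
        }
        /* ... work under the lock ... */
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        caller();
        return 0;
    }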
-
Josef Bacik authored
A user reported a panic where we were trying to fix a bad mirror but the mirror number we were giving was 0, which is invalid. This is because we don't do the transid verification until after the read, so as far as the read code is concerned the read was a success. So instead store the mirror we read from, so that if there is some failure post-read we know which mirror to try next and which mirror needs to be fixed if we find a good copy of the block. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
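The approach generalizes to any redundant read path: record which copy was actually read so that a failure detected after the read (here, a bad transid) can retry another copy and point repair at the bad one. A simplified stand-alone sketch (illustrative only, not the btrfs I/O path):

    #include <stdio.h>

    #define NUM_MIRRORS 2

    /* Pretend mirror 1 holds a corrupted copy and mirror 2 a good one. */
    static int read_mirror(int mirror, int *data)
    {
        *data = (mirror == 1) ? -1 : 42;
        return 0;                  /* the read itself "succeeds" either way */
    }

    static int data_is_valid(int data)
    {
        return data == 42;         /* stand-in for the post-read transid check */
    }

    int main(void)
    {
        int data, failed_mirror = 0;

        for (int mirror = 1; mirror <= NUM_MIRRORS; mirror++) {
            read_mirror(mirror, &data);
            if (data_is_valid(data)) {
                if (failed_mirror)
                    printf("good copy on mirror %d, repair mirror %d\n",
                           mirror, failed_mirror);
                return 0;
            }
            failed_mirror = mirror;   /* remember which mirror gave bad data */
        }
        fprintf(stderr, "no good copy found\n");
        return 1;
    }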
-
Julia Lawall authored
Free fs_devices as done in the error-handling code just below. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
-
Sergei Trofimovich authored
Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Josef Bacik <josef@redhat.com> Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
-
Ilya Dryomov authored
Fix a bug where, when stripe_size needs to be adjusted so that the length of the resulting chunk is less than or equal to max_chunk_size, DUP chunks turn out to be only half as big as they could be. Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Jan Schmidt authored
iref_to_path and iterate_irefs both increment the eb's refcount to use it after releasing the path. Both depend on consistent data remaining in the extent buffer and need a read lock to protect it. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jan Schmidt authored
Avoid calling free_extent_buffer more than once when the iterator function returns non-zero. The only code that uses this is scrub repair for corrupted nodatasum blocks. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
-
Jesper Juhl authored
Make free_ipath() behave like most other freeing functions in the kernel and gracefully do nothing when passed a NULL pointer. Besides making the behaviour consistent with functions such as kfree(), vfree(), btrfs_free_path() etc., this also fixes a real NULL deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that function we have this code:
    ...
    ipath = init_ipath(size, root, path);
    if (IS_ERR(ipath)) {
            ret = PTR_ERR(ipath);
            ipath = NULL;
            goto out;
    }
    ...
out:
    btrfs_free_path(path);
    free_ipath(ipath);
    ...
If we ever take the true branch of that 'if' statement we'll end up passing a NULL pointer to free_ipath() which will subsequently dereference it and we'll go "Boom" :-( This patch avoids that. Signed-off-by: Jesper Juhl <jj@chaosbits.net>
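The fix described boils down to the usual NULL guard at the top of the freeing function. A tiny user-space sketch of the pattern (illustrative names, not the exact patch):

    #include <stdlib.h>

    struct ipath_like {
        char *buf;
    };

    /* Freeing helpers should tolerate NULL, just like free() and kfree(). */
    static void free_ipath_like(struct ipath_like *ip)
    {
        if (!ip)
            return;        /* gracefully do nothing, so "ip = NULL; goto out;"
                              error paths can call us unconditionally */
        free(ip->buf);
        free(ip);
    }

    int main(void)
    {
        free_ipath_like(NULL);   /* no crash */
        return 0;
    }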
-
Li Zefan authored
    clear_extent_bit()
    {
            next_node = rb_next(&state->rb_node);
            ...
            clear_state_bit(state);    <-- this may free next_node
            if (next_node) {
                    state = rb_entry(next_node);
                    ...
            }
    }
clear_state_bit() calls merge_state(), which may free the next node of the passed extent_state, so clear_extent_bit() may end up referencing freed memory. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
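The general lesson is the "saved successor" trap: don't cache a pointer to the next node before calling something that may free it; read the link afterwards instead. A stand-alone illustration with a plain linked list (generic code, not the extent-state fix itself):

    #include <stdio.h>
    #include <stdlib.h>

    struct node {
        int val;
        struct node *next;
    };

    /* Merges cur with its successor when the values are adjacent; this frees
     * cur->next, so any pointer to it saved beforehand becomes stale. */
    static void maybe_merge(struct node *cur)
    {
        struct node *next = cur->next;

        if (next && next->val == cur->val + 1) {
            cur->val = next->val;
            cur->next = next->next;
            free(next);
        }
    }

    int main(void)
    {
        struct node *c = malloc(sizeof(*c));
        struct node *b = malloc(sizeof(*b));
        struct node *a = malloc(sizeof(*a));

        *c = (struct node){ .val = 5, .next = NULL };
        *b = (struct node){ .val = 2, .next = c };
        *a = (struct node){ .val = 1, .next = b };

        struct node *cur = a;
        while (cur) {
            /* WRONG: struct node *next = cur->next;  -- maybe_merge() may free it */
            maybe_merge(cur);
            struct node *next = cur->next;   /* safe: read the link afterwards */
            printf("%d\n", cur->val);
            free(cur);
            cur = next;
        }
        return 0;
    }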
-
Li Zefan authored
Currently it returns the set of bits that were cleared, but this return value is not used at all. Moreover, it doesn't seem to be useful, because we may clear the bits of several extent_states, but only the cleared bits of the last one are returned. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
-
David Sterba authored
Added in commit 49b25e05 ("btrfs: enhance transaction abort infrastructure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
-
Liu Bo authored
Our code is not ready to cope with a sectorsize that's not equal to PAGE_SIZE. It leads to a hang while writing. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
-
Arne Jansen authored
Normally when there are 2 copies of a block, we add both to the reada extent tree and prefetch only the one that is easier to reach. This way we can better utilize multiple devices. In case of DUP this makes no sense as both copies reside on the same device. Signed-off-by: Arne Jansen <sensille@gmx.net>
-
Arne Jansen authored
When inserting into the radix tree returns EEXIST, get the existing entry without giving up the spinlock in between. There was a race for both the zones trees and the extent tree. Signed-off-by: Arne Jansen <sensille@gmx.net>
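The race pattern is common with insert-then-lookup: if the code drops the lock between the failed insert (EEXIST) and the lookup, the existing entry can vanish in the window. Doing the lookup under the same lock closes it. A simplified single-lock sketch (a plain array stands in for the radix tree; names are illustrative):

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>

    #define SLOTS 16

    static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
    static int tree[SLOTS];                  /* 0 means "empty slot" */

    static int tree_insert(unsigned idx, int val)
    {
        if (tree[idx])
            return -EEXIST;
        tree[idx] = val;
        return 0;
    }

    /* Insert-or-get: on EEXIST, pick up the existing entry while still
     * holding the lock, instead of unlocking and looking it up again. */
    static int insert_or_get(unsigned idx, int val)
    {
        pthread_mutex_lock(&tree_lock);
        if (tree_insert(idx, val) == -EEXIST)
            val = tree[idx];                 /* no window where the entry
                                                could disappear in between */
        pthread_mutex_unlock(&tree_lock);
        return val;
    }

    int main(void)
    {
        printf("%d\n", insert_or_get(3, 10));   /* inserts 10 */
        printf("%d\n", insert_or_get(3, 20));   /* finds the existing 10 */
        return 0;
    }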
-
Li Zefan authored
Follow those instructions, and you'll trigger a warning in the beginning of d_set_d_op():
    # mkfs.btrfs /dev/loop3
    # mount /dev/loop3 /mnt
    # btrfs sub create /mnt/sub
    # btrfs sub snap /mnt /mnt/snap
    # touch /mnt/snap/sub
    touch: cannot touch `tmp': Permission denied
__d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and then simple_lookup() reset it to simple_dentry_operations, which triggered the warning. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
-
- 13 Apr, 2012 1 commit
-
-
Josef Bacik authored
A user reported that booting his box up with btrfs root on 3.4 was way slower than on 3.3 because I removed the ideal caching code. It turns out that we don't load the free space cache if we're in a commit, for deadlock reasons, but since we're reading the cache and it hasn't changed yet we are safe reading the inode and free space item from the commit root. So do that, and remove all of the deadlock checks so we don't unnecessarily skip loading the free space cache. The user reported this fixed the slowness. Thanks, Tested-by: Calvin Walton <calvin.walton@kepstin.ca> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-