Commits · f395694c2cd76cb1882fa82dd37e761598367fe9 · nexedi / linux

31 May, 2012 4 commits

Btrfs: fix tree mod log del_ptr · f395694c

Jan Schmidt authored May 31, 2012

Logging for del_ptr when we're not deleting the last pointer was wrong. This
fixes both duplicate log entries and the log sequence.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add tree_mod_dont_log helper · e9b7fd4d

Jan Schmidt authored May 31, 2012

Replace duplicate code with a small inline helper function.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add missing spin_lock for insertion into tree mod log · 926dd8a6

Jan Schmidt authored May 31, 2012

tree_mod_alloc calls __get_tree_mod_seq and must acquire a spinlock before
doing so.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add inodes before dropping the extent lock in find_all_leafs · 3301958b

Jan Schmidt authored May 30, 2012

We must build up the inode list with the extent lock held after following
indirect refs.

This also requires an extension to ulists that allows modifying the stored
aux value in case a key already exists in the list.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
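
The ulist extension mentioned above can be pictured as an "add or merge" operation. The following is a userspace toy, not the kernel's ulist: `toy_ulist`, `toy_ulist_add_merge` and the fixed capacity are assumptions made for illustration. When the key is already present, the caller receives a pointer to the stored aux value and can update it in place.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ULIST_CAP 16  /* fixed capacity keeps the sketch allocation-free */

/* Hypothetical element: a unique key plus a caller-defined payload. */
struct toy_node {
    uint64_t val;
    uint64_t aux;
};

struct toy_ulist {
    struct toy_node nodes[TOY_ULIST_CAP];
    size_t nnodes;
};

/*
 * Add val with aux unless val is already in the list.
 * Returns 1 if a new element was added, 0 if val already existed and
 * -1 if the list is full. When val already exists and old_aux is not
 * NULL, *old_aux points at the stored aux so the caller can modify it.
 */
int toy_ulist_add_merge(struct toy_ulist *ul, uint64_t val, uint64_t aux,
                        uint64_t **old_aux)
{
    for (size_t i = 0; i < ul->nnodes; i++) {
        if (ul->nodes[i].val == val) {
            if (old_aux)
                *old_aux = &ul->nodes[i].aux;
            return 0;
        }
    }
    if (ul->nnodes == TOY_ULIST_CAP)
        return -1;
    ul->nodes[ul->nnodes].val = val;
    ul->nodes[ul->nnodes].aux = aux;
    ul->nnodes++;
    return 1;
}
```

A caller that wants to accumulate flags per key would add the key, and on a return value of 0 OR the new flags into `**old_aux` instead of inserting a duplicate.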

30 May, 2012 8 commits

Btrfs: use delayed ref sequence numbers for all fs-tree updates · 95a06077

Jan Schmidt authored May 29, 2012

The sequence number for delayed refs is needed to postpone certain delayed
refs for a very short period while walking backrefs. Before the tree
modification log, we thought we'd only have to hold back those references
that don't have a counter operation.

Now that we have the tree mod log, we're rewinding fs tree blocks to a
defined consistent state. We cannot know in advance for which tree block
we'll be doing rewind operations later. Therefore, we must postpone all the
delayed refs for fs-tree blocks, even those having a counter operation.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: tree mod log sanity checks in join_transaction · 20b297d6

Jan Schmidt authored May 20, 2012

When a fresh transaction begins, the tree mod log must be clean. Users of
the tree modification log must ensure they never span across transaction
boundaries.

We reset the sequence to 0 in this safe situation to make absolutely sure
overflow can't happen.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: fs_info variable for join_transaction · 19ae4e81

Jan Schmidt authored May 20, 2012

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: use the tree modification log for backref resolving · 8445f61c

Jan Schmidt authored May 16, 2012

This enables backref resolving on live trees while they are changing. This
is a prerequisite for quota groups and just nice to have for everything
else.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add btrfs_search_old_slot · 5d9e75c4

Jan Schmidt authored May 16, 2012

The tree modification log together with the current state of the tree gives
a consistent, old version of the tree. btrfs_search_old_slot is used to
search through this old version and return old (dummy!) extent buffers.
Naturally, this function cannot do any tree modifications.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add del_ptr and insert_ptr modifications to the tree mod log · f3ea38da

Jan Schmidt authored May 26, 2012

Record all relevant modifications to block pointers in the tree mod log so
that we can rewind them later on for backref walking.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: put all block modifications into the tree mod log · f230475e

Jan Schmidt authored May 26, 2012

When running functions that can make changes to the internal trees
(e.g. btrfs_search_slot), we check if somebody may be interested in the
block we're currently modifying. If so, we record our modification to be
able to rewind it later on.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: add tree modification log functions · bd989ba3

Jan Schmidt authored May 16, 2012

The tree mod log will log modifications made to fs-tree nodes. Most
modifications are done by autobalance of the tree. Such changes are recorded
as long as a block entry exists. When released, the log is cleaned.

With the tree modification log, it's possible to reconstruct a consistent
old state of the tree. This is required to do backref walking on a busy
file system.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
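
The idea of reconstructing an old state from the current state plus a modification log can be sketched in a few lines. This is a simplified userspace model, not the kernel code: the node is a flat array of block pointers, and every name here (`toy_node`, `toy_mod_log`, `toy_rewind`) is hypothetical. Each change gets a sequence number, and a rewind undoes all entries newer than the requested sequence on a private copy, newest first.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS   8
#define TOY_LOG_MAX 32

enum toy_op { TOY_OP_INSERT, TOY_OP_REMOVE };

struct toy_node {
    uint64_t ptrs[TOY_SLOTS];
    int nritems;
};

struct toy_log_entry {
    uint64_t seq;   /* monotonically increasing sequence number */
    enum toy_op op;
    int slot;
    uint64_t ptr;   /* pointer value that was inserted or removed */
};

struct toy_mod_log {
    struct toy_log_entry entries[TOY_LOG_MAX];
    int nentries;
    uint64_t last_seq;
};

static void toy_shift_in(struct toy_node *n, int slot, uint64_t ptr)
{
    memmove(&n->ptrs[slot + 1], &n->ptrs[slot],
            (size_t)(n->nritems - slot) * sizeof(n->ptrs[0]));
    n->ptrs[slot] = ptr;
    n->nritems++;
}

static uint64_t toy_shift_out(struct toy_node *n, int slot)
{
    uint64_t old = n->ptrs[slot];

    memmove(&n->ptrs[slot], &n->ptrs[slot + 1],
            (size_t)(n->nritems - slot - 1) * sizeof(n->ptrs[0]));
    n->nritems--;
    return old;
}

static void toy_record(struct toy_mod_log *log, enum toy_op op, int slot,
                       uint64_t ptr)
{
    struct toy_log_entry *e = &log->entries[log->nentries++];

    e->seq = ++log->last_seq;
    e->op = op;
    e->slot = slot;
    e->ptr = ptr;
}

/* Insert a pointer and log the change. Returns 0 on success, -1 otherwise. */
int toy_insert_ptr(struct toy_node *n, struct toy_mod_log *log, int slot,
                   uint64_t ptr)
{
    if (n->nritems == TOY_SLOTS || log->nentries == TOY_LOG_MAX ||
        slot < 0 || slot > n->nritems)
        return -1;
    toy_shift_in(n, slot, ptr);
    toy_record(log, TOY_OP_INSERT, slot, ptr);
    return 0;
}

/* Delete a pointer and log the change. Returns 0 on success, -1 otherwise. */
int toy_del_ptr(struct toy_node *n, struct toy_mod_log *log, int slot)
{
    if (log->nentries == TOY_LOG_MAX || slot < 0 || slot >= n->nritems)
        return -1;
    toy_record(log, TOY_OP_REMOVE, slot, toy_shift_out(n, slot));
    return 0;
}

/* Return a private copy of cur as it looked right after sequence time_seq. */
struct toy_node toy_rewind(const struct toy_node *cur,
                           const struct toy_mod_log *log, uint64_t time_seq)
{
    struct toy_node old = *cur;

    for (int i = log->nentries - 1; i >= 0; i--) {
        const struct toy_log_entry *e = &log->entries[i];

        if (e->seq <= time_seq)
            break;
        if (e->op == TOY_OP_INSERT)
            toy_shift_out(&old, e->slot);         /* undo an insertion */
        else
            toy_shift_in(&old, e->slot, e->ptr);  /* undo a deletion */
    }
    return old;
}
```

A reader would note `log.last_seq` before starting its walk and later call `toy_rewind()` with that value. The live node is never modified by the rewind, which loosely corresponds to working on a private copy rather than on the real block.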

26 May, 2012 8 commits

Btrfs: add tree mod log to fs_info · f29021b2

Jan Schmidt authored May 16, 2012

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: dummy extent buffers for tree mod log · 815a51c7

Jan Schmidt authored May 16, 2012

The tree modification log needs two ways to create dummy extent buffers:
allocating a fresh one (to rebuild an old root) and cloning an existing one
(to make private rewind modifications to it).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: move struct seq_list to ctree.h · 64947ec0

Jan Schmidt authored May 16, 2012

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: don't set for_cow parameter for tree block functions · 5581a51a

Jan Schmidt authored May 16, 2012

Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed
parameter for_cow = 1. In fact, these two functions should never mark
their tree modification operations as for_cow, because they can change
the number of blocks referenced by a tree.

Hence, we remove the extra for_cow parameter from these functions and
make them pass a zero down.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: look into the extent during find_all_leafs · 976b1908

Jan Schmidt authored May 17, 2012

Before this patch we called find_all_leafs for a data extent, then called
find_all_roots and then looked into the extent to grab the information
we were seeking. This was done without holding the leaves locked to avoid
deadlocks. However, this can obviously race with concurrent tree
modifications.

Instead, we now look into the extent while we're holding the lock during
find_all_leafs and store this information together with the leaf list.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: bugfix: ignore the wrong key for indirect tree block backrefs · d5c88b73

Jan Schmidt authored May 15, 2012

The key we store with a tree block backref is only a hint. It is set when
the ref is created and can remain correct for a long time. As the tree is
rebalanced, however, eventually the key no longer points to the correct
destination.

With this patch, we change find_parent_nodes to no longer add keys unless it
knows for sure they're correct (e.g. because they're for an extent data
backref). Then when we later encounter a backref with no parent and no
key set, we grab the block and take the first key from the block itself.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: bugfix in btrfs_find_parent_nodes · dadcaf78

Jan Schmidt authored May 22, 2012

That one has been around since the addition of backref.c. Due to the way we
calculate our slot numbers, after adding inline refs we're missing one keyed
ref unless it's located at the beginning of a new leaf.
Reported-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: ulist realloc bugfix · cd1b413c

Jan Schmidt authored May 22, 2012

ulist_next gets the pointer to the previously returned element to find the
next element from there. However, when we call ulist_add while iteration
with ulist_next is in progress (ulist explicitly supports this), we can
realloc the ulist internal memory, which makes the pointer to the previous
element useless.

Instead, we now use an iterator parameter that's independent from the
internal pointers.
Reported-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
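
The failure mode and the fix can be illustrated with a small growable list. This is a userspace sketch under assumed names (`toy_ulist`, `toy_iter`), not the kernel's ulist code. The iterator stores an index rather than a pointer into the element array, so it stays valid when an add during iteration triggers a `realloc`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_ulist {
    uint64_t *vals;  /* may move whenever the list grows */
    size_t nnodes;
    size_t cap;
};

/* Iterator state that does not reference the list's internal memory. */
struct toy_iter {
    size_t i;  /* index of the next element to return */
};

/* Returns 1 if val was added, 0 if already present, -1 on allocation failure. */
int toy_ulist_add(struct toy_ulist *ul, uint64_t val)
{
    for (size_t i = 0; i < ul->nnodes; i++)
        if (ul->vals[i] == val)
            return 0;

    if (ul->nnodes == ul->cap) {
        size_t new_cap = ul->cap ? ul->cap * 2 : 2;
        uint64_t *p = realloc(ul->vals, new_cap * sizeof(*p));

        if (!p)
            return -1;
        ul->vals = p;
        ul->cap = new_cap;
    }
    ul->vals[ul->nnodes++] = val;
    return 1;
}

/*
 * Returns the next element or NULL at the end of the list. The returned
 * pointer is only valid until the next toy_ulist_add(); the iterator
 * itself remains valid across adds.
 */
const uint64_t *toy_ulist_next(const struct toy_ulist *ul, struct toy_iter *it)
{
    if (it->i >= ul->nnodes)
        return NULL;
    return &ul->vals[it->i++];
}
```

In a loop that adds elements while iterating, the caller copies the value out of the returned pointer before calling `toy_ulist_add()`, then continues with `toy_ulist_next()` as usual.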

06 May, 2012 1 commit

Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919

Chris Mason authored May 06, 2012

verify_parent_transid needs to lock the extent range to make
sure no IO is underway, and so it can safely clear the
uptodate bits if our checks fail.

But, a few callers are using it with spinlocks held.  Most
of the time, the generation numbers are going to match, and
we don't want to switch to a blocking lock just for the error
case.  This adds an atomic flag to verify_parent_transid,
and changes it to return EAGAIN if it needs to block to
properly verify things.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
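
The control flow described in this message might look roughly like the sketch below. It is a userspace model with hypothetical names and return codes, not the kernel function: a counter stands in for the blocking extent lock, and `-EIO` is an arbitrary choice for the confirmed-mismatch case. The point is the split between a lock-free fast path and a slow path that atomic callers are not allowed to enter.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent buffer. */
struct toy_buffer {
    uint64_t generation;
    bool uptodate;
    int blocking_locks_taken;  /* models a lock that may sleep */
};

/*
 * Returns 0 if the generation matches, -EAGAIN if the caller is in
 * atomic context and the check would have to block, and -EIO if the
 * mismatch was confirmed and the buffer was marked stale.
 */
int toy_verify_parent_transid(struct toy_buffer *eb, uint64_t parent_transid,
                              bool atomic)
{
    /* Fast path: the common case needs no lock at all. */
    if (eb->generation == parent_transid)
        return 0;

    /* Slow path would sleep; a caller holding a spinlock must retry later. */
    if (atomic)
        return -EAGAIN;

    eb->blocking_locks_taken++;  /* placeholder for taking the blocking lock */

    /* Recheck under the lock before declaring the buffer stale. */
    if (eb->generation == parent_transid)
        return 0;

    eb->uptodate = false;
    return -EIO;
}
```

A caller holding a spinlock would pass `atomic = true`, and on `-EAGAIN` drop the spinlock and repeat the call with `atomic = false`.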

04 May, 2012 4 commits

Btrfs: fix crash in scrub repair code when device is missing · ea9947b4

Stefan Behrens authored May 04, 2012

Fix a crash in bio_add_page that occurs when scrub tries to repair an I/O or
checksum error and one of the devices containing the mirror is missing. The
bdev is a NULL pointer for missing devices.
Reported-by: Marco L. Crociani <marco.crociani@gmail.com>
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: Fix mismatching struct members in ioctl.h · d04b1deb

Alexander Block authored May 04, 2012

Fix the size members of btrfs_ioctl_ino_path_args and
btrfs_ioctl_logical_ino_args. The user space btrfs-progs utilities used
__u64 and the kernel headers used __u32 before.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
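
Why a width mismatch in a shared ioctl struct matters can be shown with two toy definitions. The structs below are assumptions for illustration only and do not claim to reproduce the real btrfs headers. The check compares the size and offset of every member, which is the property both sides of an ioctl interface rely on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel-side view with a 32-bit size member. */
struct toy_args_kernel_old {
    uint64_t inum;
    uint32_t size;
    uint64_t reserved[4];
    uint64_t fspath;
};

/* Hypothetical userspace view with a 64-bit size member. */
struct toy_args_userspace {
    uint64_t inum;
    uint64_t size;
    uint64_t reserved[4];
    uint64_t fspath;
};

#define MEMBER_SIZE(type, member) sizeof(((type *)0)->member)
#define SAME_MEMBER(a, b, m) \
    (offsetof(a, m) == offsetof(b, m) && MEMBER_SIZE(a, m) == MEMBER_SIZE(b, m))

/* Returns true only if both views agree on every member and the total size. */
bool toy_layouts_match(void)
{
    return sizeof(struct toy_args_kernel_old) ==
               sizeof(struct toy_args_userspace) &&
           SAME_MEMBER(struct toy_args_kernel_old, struct toy_args_userspace, inum) &&
           SAME_MEMBER(struct toy_args_kernel_old, struct toy_args_userspace, size) &&
           SAME_MEMBER(struct toy_args_kernel_old, struct toy_args_userspace, reserved) &&
           SAME_MEMBER(struct toy_args_kernel_old, struct toy_args_userspace, fspath);
}
```

On ABIs that align 64-bit integers to 8 bytes, padding can hide such a mismatch because the total size and the following offsets happen to coincide. On ABIs with 4-byte alignment the later members shift, so the two sides disagree about where `reserved` and `fspath` live.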

Btrfs: fix page leak when allocing extent buffers · 17de39ac

Josef Bacik authored May 04, 2012

If we happen to alloc an extent buffer and then alloc a page and notice that
page is already attached to an extent buffer, we will only unlock it and
free our existing eb. Any pages currently attached to that eb will be
properly freed, but we don't do the page_cache_release() on the page where
we noticed the other extent buffer which can cause us to leak pages and I
hope cause the weird issues we've been seeing in this area. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Add properly locking around add_root_to_dirty_list · e5846fc6

Chris Mason authored May 03, 2012

add_root_to_dirty_list happens once at the very beginning of the
transaction, but it is still racy.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

27 Apr, 2012 7 commits

Btrfs: reduce lock contention during extent insertion · dc7fdde3

Chris Mason authored Apr 27, 2012

We're spending huge amounts of time on lock contention during
end_io processing because we unconditionally assume we are overwriting
an existing extent in the file for each IO.

This checks to see if we are outside i_size, and if so, it uses a
less expensive readonly search of the btree to look for existing
extents.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: avoid deadlocks from GFP_KERNEL allocations during btrfs_real_readdir · fede766f

Chris Mason authored Apr 27, 2012

Btrfs has an optimization where it will preallocate dentries during
readdir to fill in enough information to open the inode without an extra
lookup.

But, we're calling d_alloc, which is doing GFP_KERNEL allocations, and
that leads to deadlocks because our readdir code has tree locks held.

For now, disable this optimization.  We'll fix the gfp mask in the next
merge window.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: Fix space checking during fs resize · 7654b724

Daniel J Blueman authored Apr 27, 2012

Fix out-of-space checking, addressing a warning and potential resource
leak when resizing the filesystem down while allocating blocks.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix block_rsv and space_info lock ordering · 1f699d38

Stefan Behrens authored Apr 27, 2012

may_commit_transaction() calls
        spin_lock(&space_info->lock);
        spin_lock(&delayed_rsv->lock);
and update_global_block_rsv() calls
        spin_lock(&block_rsv->lock);
        spin_lock(&sinfo->lock);

Lockdep complains about this at run time.
Everywhere except in update_global_block_rsv(), the space_info lock is
the outer lock, therefore the locking order in update_global_block_rsv()
is changed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
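
The kind of check that flags such an inversion can be modeled with lock ranks. The sketch below is a toy and not how lockdep is implemented: every lock class gets a rank, and taking a lock whose rank is not above all locks already held counts as a violation. All names and the rank values are hypothetical; only the two orderings are taken from the message above.

```c
#define TOY_MAX_HELD 4

/* Outer locks get lower ranks: space_info is the outer lock. */
enum toy_lock_rank {
    TOY_RANK_SPACE_INFO = 1,
    TOY_RANK_BLOCK_RSV  = 2,
};

struct toy_lock_ctx {
    int held[TOY_MAX_HELD];
    int nheld;
    int violations;
};

void toy_lock(struct toy_lock_ctx *ctx, int rank)
{
    /* Any held lock with an equal or higher rank means wrong nesting. */
    for (int i = 0; i < ctx->nheld; i++) {
        if (ctx->held[i] >= rank) {
            ctx->violations++;
            break;
        }
    }
    if (ctx->nheld < TOY_MAX_HELD)
        ctx->held[ctx->nheld++] = rank;
}

void toy_unlock(struct toy_lock_ctx *ctx)
{
    if (ctx->nheld > 0)
        ctx->nheld--;
}

/* Ordering used by most callers: space_info outside, block_rsv inside. */
void toy_may_commit(struct toy_lock_ctx *ctx)
{
    toy_lock(ctx, TOY_RANK_SPACE_INFO);
    toy_lock(ctx, TOY_RANK_BLOCK_RSV);
    toy_unlock(ctx);
    toy_unlock(ctx);
}

/* Inverted ordering as described for the code before the fix. */
void toy_update_global_rsv_old(struct toy_lock_ctx *ctx)
{
    toy_lock(ctx, TOY_RANK_BLOCK_RSV);
    toy_lock(ctx, TOY_RANK_SPACE_INFO);
    toy_unlock(ctx);
    toy_unlock(ctx);
}

/* Same work with the ordering brought in line with the other callers. */
void toy_update_global_rsv_fixed(struct toy_lock_ctx *ctx)
{
    toy_lock(ctx, TOY_RANK_SPACE_INFO);
    toy_lock(ctx, TOY_RANK_BLOCK_RSV);
    toy_unlock(ctx);
    toy_unlock(ctx);
}
```

With real spinlocks, two CPUs running the two orderings concurrently can each hold one lock while waiting for the other, which is the deadlock that a consistent order rules out.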

Btrfs: Prevent root_list corruption · 1daf3540

Daniel J Blueman authored Apr 27, 2012

I was seeing root_list corruption on unmount during fs resize in 3.4-rc4; add
correct locking to address this.
Signed-off-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix repair code for RAID10 · 3e74317a

Jan Schmidt authored Apr 27, 2012

btrfs_map_block sets mirror_num so that the repair code eventually knows
which device gave us the read error. For RAID10, mirror_num must be 1 or 2.
Before this fix mirror_num was incorrectly related to our stripe index.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: do not start delalloc inodes during sync · 996d282c

Josef Bacik authored Apr 23, 2012

btrfs_start_delalloc_inodes will just walk the list of delalloc inodes and
start writing them out, but it doesn't splice the list or anything so as
long as somebody is doing work on the box you could end up in this section
_forever_.  So just remove it, it's not needed anyway since sync will start
writeback on all inodes anyway, all we need to do is wait for ordered
extents and then we can commit the transaction.  In my horrible torture test
sync goes from taking 4 minutes to about 1.5 minutes.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

18 Apr, 2012 8 commits

Btrfs: fix that check_int_data mount option was ignored · 25cd999e

Stefan Behrens authored Mar 30, 2012

The bitfield member mount_opt was too small by one bit to hold the mount
option that enables including data extents in the integrity checker.
Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR
option was added (git rebase silently merges so that the increase of the
size of the bitfield member is lost), the bit limit was removed entirely.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
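
How an option can be silently ignored by a too-narrow bitfield is easy to reproduce in isolation. The sketch below uses made-up names and a 3-bit width chosen only for illustration; the real member and option values differ. Storing a flag above the field width into an unsigned bitfield truncates it, so a later test for that flag fails without any error.

```c
#include <stdbool.h>

/* Hypothetical option flags; bit 3 is the newly added one. */
#define TOY_MOUNT_OPT_A          (1u << 0)
#define TOY_MOUNT_OPT_B          (1u << 1)
#define TOY_MOUNT_OPT_C          (1u << 2)
#define TOY_MOUNT_CHECK_INT_DATA (1u << 3)

/* Before: the bitfield has room for bits 0..2 only. */
struct toy_fs_info_old {
    unsigned int mount_opt : 3;
};

/* After: the width limit is removed entirely. */
struct toy_fs_info_new {
    unsigned int mount_opt;
};

void toy_set_opt_old(struct toy_fs_info_old *fs, unsigned int opt)
{
    fs->mount_opt |= opt;  /* bits above the field width are dropped */
}

bool toy_test_opt_old(const struct toy_fs_info_old *fs, unsigned int opt)
{
    return (fs->mount_opt & opt) != 0;
}

void toy_set_opt_new(struct toy_fs_info_new *fs, unsigned int opt)
{
    fs->mount_opt |= opt;
}

bool toy_test_opt_new(const struct toy_fs_info_new *fs, unsigned int opt)
{
    return (fs->mount_opt & opt) != 0;
}
```

Setting `TOY_MOUNT_CHECK_INT_DATA` on the old struct leaves `mount_opt` unchanged, while the same call on the new struct keeps the bit. A fixed width also has to be bumped by hand for every new option, which is easy to lose in a merge.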

Btrfs: don't count CRC or header errors twice while scrubbing · 5c84fc3c

Stefan Behrens authored Mar 30, 2012

Each CRC or header error was counted twice; this is now fixed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Btrfs: fix btrfs_ioctl_dev_info() crash on missing device · 99ba55ad

Stefan Behrens authored Mar 19, 2012

When a filesystem is mounted with the degraded option, it is
possible that some of the devices are not there.
btrfs_ioctl_dev_info() crashes in this case because the device
name is a NULL pointer. This ioctl was only used for scrub.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

btrfs: don't return EINTR · b9688bb8

Arne Jansen authored Apr 18, 2012

It is basically a good thing if we are interruptible when waiting for
free space, but the generality in which it is implemented currently
leads to system calls being interruptible that are not documented this
way. For example git can't handle interrupted unlink(), leading to
corrupt repos under space pressure.
Instead we raise the bar to only be interruptible by SIGKILL.
Thanks to David Sterba for suggesting this.
Signed-off-by: Arne Jansen <sensille@gmx.net>

Btrfs: double unlock bug in error handling · 253beebd

Dan Carpenter authored Apr 18, 2012

The caller expects this function to return with the lock held and
releases it immediately on error.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Btrfs: always store the mirror we read the eb from · 5cf1ab56

Josef Bacik authored Apr 16, 2012

A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were giving was 0, which is invalid. This is because we
don't do the transid verification until after the read, so as far as the
read code is concerned the read was a success. So instead store the mirror
we read from so that if there is some failure post read we know which mirror
to try next and which mirror needs to be fixed if we find a good copy of the
block. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

fs/btrfs/volumes.c: add missing free_fs_devices · 48d28232

Julia Lawall authored Apr 14, 2012

Free fs_devices as done in the error-handling code just below.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

btrfs: fix early abort in 'remount' · 8a3db184

Sergei Trofimovich authored Apr 16, 2012

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Josef Bacik <josef@redhat.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
